Author Archives: admin

Is Power BI ready for the enterprise?

With the latest Power BI releases in mid-2016, Microsoft finally plugs the gap between on-premises Business Intelligence solutions and the cloud. On-premises enterprise BI solutions can now be re-used seamlessly with a low level of development and infrastructure changes. If you had asked Microsoft, they would probably have told you that Power BI was enterprise ready years ago. Even though the product was visually appealing and worked well for self-service and for small businesses up to SMEs, it lacked important features for enterprise BI. Scalability and stability have also been an issue. Until now. Over the last year Microsoft has made quite a commitment to Power BI, and some important features have been added which make Power BI more mature and maybe a game changer for enterprises that want to start moving Business Intelligence into the cloud. Among the exciting features released last year are:

  • Live connection from Power BI to existing on-premise multidimensional cubes using On-Premise Data Gateway
    – No need to rebuild cubes to tabular cubes anymore. This also removes any scalability issues of having data models stored in the cloud (limits on data size and data refresh timeout). Re-use your existing multidimensional cubes and infrastructure.
  • Analyze in Excel
    – Users can now connect Excel directly to the Power BI service for data either stored directly in the cloud or in your on-premise multidimensional or tabular cubes. Functionality for Power BI Excel connectivity to multidimensional cubes was added last month. This is a very important step, since we are now able to benefit from the interactive visuals and collaborative features of Power BI while at the same time keeping the existing Excel functionality for end users.
  • Reporting Services Integration
    – Re-use your existing Reporting Services reports in Power BI by pinning them to your dashboards. Currently you will only be able to access them when you are within your internal network.
  • Integration with Azure Machine Learning, Azure Data Factory and statistical analytics using Microsoft R, which will open more business opportunities through your data
  • Integration with Big Data/Hadoop/HDInsight and all kinds of other sources
  • Cloud based BI solution for a single point of entry to all our enterprise data at an affordable price
  • Improved stability and reliability
  • Good drill-down functionality within the same dashboard. (This is something that PowerView and Power BI lacked previously)
  • Mobile BI and customization of reports for mobile view (this part still leaves some room for improvement; for example, Android tablets are not supported yet, though Android phones work well)
  • Role and row level security. Possible through a live connection to on-premise Analysis Services, re-using existing security models, or directly within Power BI itself (the latter is still in preview).

With the features above we have the ability to create BI solutions that cover the whole span of analytics: traditional reports, interactive dashboards, mobile BI and self-service BI, both for advanced users (using Excel/R) and mainstream end users (using the easy drag-and-drop functionality in Power BI).

I find that many enterprise organizations that use Microsoft as a BI platform tend to rely heavily on Excel, Reporting Services and SSAS multidimensional cubes. Hence I find it very strange that full integration and compatibility with those on-premise solutions have not been incorporated into Power BI from the first release. However, now it is here, so if you belong to the organizations who use Microsoft BI and are considering moving into the cloud, Power BI can be the perfect match.

Over the years the Microsoft BI portfolio has been going in all directions, with multiple solutions that often overlap each other, making it hard to know which tool to use where and for what. Hopefully we can now start to see the end of it. Just to mention a few BI visualization tools from Microsoft:

  • Performance Point Server
  • Reporting Services (standalone)
  • Reporting Services (through SharePoint)
  • Performance Point Services (through SharePoint)
  • Excel Services (through SharePoint)
  • PowerPivot
  • PowerView (through Excel)
  • PowerView (through SharePoint)
  • PowerMap (through Excel)
  • Plain Excel
  • DataZen (which is now part of Reporting Services)

*By SharePoint I refer to SharePoint Server on-premises

Microsoft has for some time had a strong backend for BI solutions using Microsoft SQL Server and Analysis Services. However, before Power BI, Microsoft was behind its competitors when it came to visualizing data in an easy and good-looking way. Organizations have either used another vendor or relied on multiple solutions from Microsoft to achieve the same features, such as combining SharePoint, Excel Services, Power View and Performance Point Services. The former usually requires additional licensing costs, and the latter adds not only licensing costs but also costs for maintenance and overhead, since having a flora of multiple solutions increases complexity.

I think that with Power BI getting more mature it is a good opportunity to see how existing solutions can be consolidated to reduce cost and make maintenance and scalability easier. Combining Power BI, Reporting Services and Excel would cover most organisations' visualization needs and could be used to replace various on-premise solutions such as SharePoint BI and third-party solutions. If you already are on Office 365 you might consider scrapping the on-premise SharePoint farms altogether. Far too often, I believe, SharePoint farms are installed with limited knowledge and never used to their full potential. The product is rather complex and tends to be maintenance heavy and error prone if not taken care of properly. Hence Power BI may be an appealing option, since we do not need to care for infrastructure, configuration, security patches etc. There might be cases where data is required to be kept on-premise for legal reasons, or where complex customized solutions mean we cannot move away from on-premise SharePoint. However, with Power BI we are now able to let data stay on-premise and do the data visualization, gather insights, and collaborate and share them in the cloud.

When I first got in contact with Microsoft Power BI in 2013, the product felt like an unfinished implementation of PowerPivot and Power View in Office 365 rebranded as “Power BI”. Even though the solution provided a nice facelift to the Microsoft BI portfolio, it had a lot of stability and scalability issues and lacked functionality that would be required for the typical enterprise, as well as important features for end users, such as:

  • Not being able to re-use existing multidimensional cubes. Only tabular cubes were supported, and they had to be created within Excel and PowerPivot and then uploaded to the cloud. Later, functionality to access on-premise tabular cubes was added, but multidimensional cubes were left out. I find this very strange, since if Microsoft wanted a quick adoption of their product I believe they should have supported multidimensional cubes first, since that is what most of their existing clients would already be using.
  • Not being able to re-use existing reporting services reports.
  • Limited visual customization due to the use of Power View (for example you could not even properly customize the colors of a chart, you could only use pre-defined themes)
  • When keeping PowerPivot data models in the cloud there was initially a limit of 2 GB, and scheduled refreshes had a 30-minute timeout. Hence analytics on larger datasets were completely out of the picture. Going with this approach usually also meant that existing multidimensional cube solutions had to be rebuilt as tabular models in PowerPivot.
  • A lot of bugs and random errors (I have a 22-page Word document with screenshots of different errors which I sent to Microsoft)
  • No native apps for Android or iOS, only Windows
  • Based on Silverlight; an effort was then made to make it work in HTML5, but it lacked functionality compared to the Silverlight version
  • The HTML5 version did not work properly on the iPad 2 (it crashed when you did any kind of pinch zooming in Safari)
  • No proper drill-down functionality (All visual objects could only show data from the selected level and drill-down had to be achieved by having multiple reports without connections to each other)
  • No good way to restrict data for certain roles/users other than creating multiple reports/data sources

The current version of Power BI has solved the issues above and feels like a much more mature product. Microsoft already had a strong backend for BI solutions, and with Power BI they have now bridged the gap between on-premises BI and the cloud. The service comes with a 99.9%-uptime, financially backed service level agreement (SLA), which will be enough for most organizations. Enterprise organizations are now able to transition into the service by re-using their on-premise data warehouse (DW) and Analysis Services solutions at an affordable price, making Power BI a competitive choice in the BI market.

 

Raspberry PI 2 Hadoop 2 Cluster

Set up your own Raspberry PI 2 Hadoop 2 cluster (Raspberry PI 2 Model B and Hadoop 2.7.2) on Raspbian Linux. For some background and general information about Hadoop, please see my previous post:

http://www.widriksson.com/raspberry-pi-hadoop-cluster/

The Setup

  • 3x Raspberry PI 2 Model B (4-core CPU, 1GB RAM)
  • 3x 16gb MicroSDHC cards (Sandisk UHS-I 10x)
  • Raspbian Jessie Lite
    (Linux 4.1.17-v7+)
  • Oracle Java 1.8.0_65
  • Hadoop 2.7.2 (ARM native compiled from source)

Difference Hadoop 1.x vs. Hadoop 2.x

Some of the more major differences are:

  • YARN (Yet Another Resource Negotiator) – Next generation MapReduce (MRv2)
    • Separation of the processing engine and resource management, which were coupled in the Hadoop 1.x MapReduce framework
    • In Hadoop 1.x all processing was done through the MapReduce framework. With Hadoop 2.x the use of other data processing frameworks is possible
    • TaskTracker slots are replaced with containers, which are more generic
    • Hadoop 2.x MapReduce programs are backward compatible with Hadoop 1.x MapReduce
    • Overall increased scalability and performance
  • HDFS Federation – Possible to use multiple namenode servers to manage the namespace, which allows for horizontal scaling

[Figure: Hadoop 1.x vs. Hadoop 2.x architecture]

Deprecated properties

See the following url for properties that are deprecated from Hadoop 1.x:
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/DeprecatedProperties.html

Web Interface (default ports)

With YARN and MapReduce 2 (MR2), the HTTP ports for monitoring have changed:

TCP Port Application Configuration property
8032 ResourceManager Client RPC yarn.resourcemanager.address
8030 ResourceManager Scheduler RPC yarn.resourcemanager.scheduler.address
8033 ResourceManager Admin RPC yarn.resourcemanager.admin.address
8088 ResourceManager Web UI and REST APIs yarn.resourcemanager.webapp.address
8031 ResourceManager Resource Tracker RPC yarn.resourcemanager.resource-tracker.address
8040 NodeManager Localizer RPC yarn.nodemanager.localizer.address
8042 NodeManager Web UI and REST APIs yarn.nodemanager.webapp.address
10020 Job History RPC mapreduce.jobhistory.address
19888 Job History Web UI and REST APIs mapreduce.jobhistory.webapp.address
13562 Shuffle HTTP mapreduce.shuffle.port
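Once the services are up and running (later in this tutorial) you can use these ports for a quick sanity check. A minimal example, assuming the ResourceManager ends up on node1 with the default web UI port 8088:

curl http://node1:8088/ws/v1/cluster/info

This hits the ResourceManager REST API and should return a small JSON document with the cluster state and Hadoop version.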

Install Raspbian and prepare environment for Hadoop

Raspbian installation

Download Raspbian Jessie Lite:

https://www.raspberrypi.org/downloads/raspbian/

Write the image to the SD card using any tool of your choice; on Windows I use:

https://sourceforge.net/projects/win32diskimager/

Plug in the SD card and fire up your PI.

For initial configuration, run raspi-config (launched as shown after the list below) and set:

  • Expand filesystem
  • Under 9 Advanced Options -> A3 Memory Split
    Choose 16MB to give as much RAM as possible for Hadoop
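raspi-config is started from a terminal on the PI:

sudo raspi-config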

If you wish you may also try to overclock the PI a bit to improve performance. For this tutorial I use the default settings.

Configure network

Login as pi (default password: raspberry)

Edit (as root with sudo -s):

 /etc/dhcpcd.conf

At the bottom of the file add:

interface eth0
static ip_address=192.168.0.110/24
# static ip_address=192.168.0.111/24
# static ip_address=192.168.0.112/24
static routers=192.168.0.1
static domain_name_servers=123.123.123.123 123.123.123.123

Change the domain name servers to match your environment. Note that we do not need to make any changes to /etc/network/interfaces in Raspbian Jessie.
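To make the static address take effect, reboot (or restart the dhcpcd service) and then verify the interface; a minimal check:

sudo reboot
# after logging back in:
ip addr show eth0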

Update system and install Oracle Java

sudo apt-get update && sudo apt-get install oracle-java8-jdk

Run update-alternatives, ensure jdk-8-oracle-*** is selected:

sudo update-alternatives --config java

Configure Hadoop user

Create a new user for use with Hadoop:

sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo

Create an SSH key pair with a blank password. This will enable the nodes to communicate with each other in the cluster.

su hduser
mkdir ~/.ssh
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys

Login as hduser (answer yes when prompted to trust the host key – otherwise Hadoop will fail to login later)

su hduser
ssh localhost
exit

Compile Native Hadoop 2.7.2 for Raspberry PI (ARM)

Ensure you have logged out as hduser and logged in as the pi user (so that the sudo commands below work properly).

Install protobuf 2.5.0

This is required to build Hadoop.

wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar xzvf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure --prefix=/usr
make
make check
sudo make install
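Before moving on it is worth verifying that the protobuf compiler was installed correctly; it should report version 2.5.0:

protoc --version
# expected output: libprotoc 2.5.0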

Install Hadoop 2.7.2

Download and build

wget http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.2/hadoop-2.7.2-src.tar.gz
tar xzvf hadoop-2.7.2-src.tar.gz

Java 8 applies stricter Javadoc checks than previous versions. We need to relax these to be able to compile Hadoop 2.7.2. To fix this, edit:

hadoop-2.7.2-src/pom.xml

Between <properties></properties> tags insert the following:

<additionalparam>-Xdoclint:none</additionalparam>

For Hadoop 2.7.2 to build properly we also need to apply a patch.

cd hadoop-2.7.2-src/hadoop-common-project/hadoop-common/src
wget https://issues.apache.org/jira/secure/attachment/12570212/HADOOP-9320.patch
patch < HADOOP-9320.patch

cd ~/hadoop-2.7.2-src/

To start building Hadoop run the following (note that this may take ~1.5 hours):

sudo mvn package -Pdist,native -DskipTests -Dtar

Install

Copy compiled binaries to /opt

cd hadoop-dist/target/
sudo cp -R hadoop-2.7.2 /opt/hadoop

Give access to hduser

sudo chown -R hduser.hadoop /opt/hadoop/

Verify installation and native libraries

su hduser
cd /opt/hadoop/bin
hadoop checknative -a

16/03/24 20:20:03 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
16/03/24 20:20:03 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /opt/hadoop/lib/native/libhadoop.so.1.0.0
zlib: true /lib/arm-linux-gnueabihf/libz.so.1
snappy: true /usr/lib/libsnappy.so.1
lz4: true revision:99
bzip2: true /lib/arm-linux-gnueabihf/libbz2.so.1
openssl: true /usr/lib/arm-linux-gnueabihf/libcrypto.so

Check version

hadoop version

Hadoop 2.7.2
Subversion Unknown -r Unknown
Compiled by root on 2016-02-21T19:05Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-2.7.2.jar

Configure environment variables

In /etc/bash.bashrc, add to bottom of file:

export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:jre/bin/java::")
export HADOOP_INSTALL=/opt/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_HOME=$HADOOP_INSTALL
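For the variables to be available in your current shell without logging out and back in, the file can simply be sourced (a new login picks them up automatically):

source /etc/bash.bashrc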

Edit and change the variables in hadoop-env.sh (/opt/hadoop/etc/hadoop/).

Find out your Java home (readlink -f /usr/bin/java | sed "s:jre/bin/java::") and set it:

export JAVA_HOME=/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt

Enable the use of native hadoop library and IPv4 stack:

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_INSTALL/lib/native -Djava.net.preferIPv4Stack=true"

Hadoop 2.7.x YARN and MapReduce memory and resources configuration

One of the challenges of running Hadoop on the Raspberry PI is its limited resources. In order for it to run properly we will need to adjust the memory configuration of the YARN and MapReduce frameworks.

Slots in Hadoop 1.x vs. containers in Hadoop 2.x

In Hadoop 1.x each map or reduce task occupied a slot regardless of how much memory or CPU it used. In Hadoop 2.x, YARN coordinates resources (such as RAM, CPU and disk) among the different applications running on the cluster. This allows for more dynamic and efficient use of resources. YARN allocates resources to applications (MapReduce, Spark, Impala etc.) by using containers. A container is a unit of processing capacity in YARN. Each application in the cluster requests processing capacity from YARN, which then provides it in the form of containers. When the application requests a container, the request also contains information about how much resources it would like to use. YARN then balances and negotiates the resource requests on the cluster. For the purpose of this tutorial we will look at the memory configuration.
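As a small worked example with the values we will configure below (64 MB minimum allocation, 256 MB map containers):

# A request for 256 MB is already a multiple of 64 MB and is granted as-is.
# A request for e.g. 200 MB would be rounded up to the next multiple of 64 MB, i.e. a 256 MB container.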

YARN Configuration (yarn-site.xml)

For more details see: https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

Property Default Our Description
yarn.nodemanager.resource.memory-mb 8192 768 Amount of physical memory, in MB, that can be allocated for containers
yarn.scheduler.minimum-allocation-mb 1024 64 Minimum amount of memory YARN will allocate for a container. An application should ask for at least this amount of memory or more. If it asks for more, YARN will round it up to the closest multiple of yarn.scheduler.minimum-allocation-mb. For example, if we have a minimum allocation of 2048 MB and an application asks for 3072 MB it will receive a container of 4096 MB
yarn.scheduler.maximum-allocation-mb 8192 256 Maximum amount of memory YARN will allocate for a container
yarn.nodemanager.vmem-check-enabled true true true/false, whether the enforced virtual memory limit will be enabled
yarn.nodemanager.vmem-pmem-ratio 2.1 2.1 Ratio of virtual to physical memory when applying the memory limit for containers
yarn.nodemanager.pmem-check-enabled true true true/false, whether the enforced physical memory limit will be enabled

In case a container exceeds the physical/virtual memory limits you will receive errors similar to the below when running MapReduce jobs:

16/05/12 10:07:02 INFO mapreduce.Job: Task Id : attempt_1463046730550_0002_m_000006_0, Status : FAILED
 Container [pid=20945,containerID=container_1463046730550_0002_01_000047] is running beyond virtual memory limits. Current usage: 38.2 MB of 128 MB physical memory used; 484.2 MB of 268.8 MB virtual memory used. Killing container.
 Dump of the process-tree for container_1463046730550_0002_01_000047 :
 |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
 |- 20945 20941 20945 20945 (bash) 1 2 3121152 493 /bin/bash -c /usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx404m -Djava.io.tmpdir=/hdfs/tmp/nm-local-dir/usercache/hduser/appcache/application_1463046730550_0002/container_1463046730550_0002_01_000047/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/hadoop/logs/userlogs/application_1463046730550_0002/container_1463046730550_0002_01_000047 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 192.168.0.110 48748 attempt_1463046730550_0002_m_000006_0 47 1>/opt/hadoop/logs/userlogs/application_1463046730550_0002/container_1463046730550_0002_01_000047/stdout 2>/opt/hadoop/logs/userlogs/application_1463046730550_0002/container_1463046730550_0002_01_000047/stderr
 |- 20952 20945 20945 20945 (java) 668 36 504578048 9288 /usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx404m -Djava.io.tmpdir=/hdfs/tmp/nm-local-dir/usercache/hduser/appcache/application_1463046730550_0002/container_1463046730550_0002_01_000047/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/hadoop/logs/userlogs/application_1463046730550_0002/container_1463046730550_0002_01_000047 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 192.168.0.110 48748 attempt_1463046730550_0002_m_000006_0 47

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

MapReduce 2 configuration properties (mapred-site.xml)

For more details see:
https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

Property Default Our Description
mapreduce.framework.name local yarn The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn
mapreduce.map.memory.mb 1024 256 The amount of memory to request from the scheduler for each map task
mapreduce.map.java.opts -Xmx1024M -Xmx204M Java opts (max JVM heap size for a map task; should be less than mapreduce.map.memory.mb, common recommendation 0.8 * mapreduce.map.memory.mb)
mapreduce.map.cpu.vcores 1 2 Number of CPUs available for map tasks
mapreduce.reduce.memory.mb 1024 128 Max amount of memory for a reduce task
mapreduce.reduce.java.opts -Xmx2560M -Xmx102M Java opts (max JVM heap size for a reduce task; should be less than mapreduce.reduce.memory.mb, common recommendation 0.8 * mapreduce.reduce.memory.mb)
mapreduce.reduce.cpu.vcores 1 2 Number of CPUs available for reduce tasks
yarn.app.mapreduce.am.resource.mb 1536 128 Max amount of memory for the App Master
yarn.app.mapreduce.am.command-opts -Xmx1024m -Xmx102M Java opts (max JVM heap size for the App Master)
yarn.app.mapreduce.am.resource.cpu-vcores 1 1 Max number of CPUs for the App Master
mapreduce.job.maps 2 4 The default number of map tasks per job. Ignored when mapreduce.jobtracker.address is “local”
mapreduce.job.reduces 1 4 The default number of reduce tasks per job. Ignored when mapreduce.jobtracker.address is “local”

YARN and MapReduce memory configuration overview

With our setup using the values above, the YARN resource manager could potentially give us scenarios like the ones below when running MapReduce jobs:

 

[Figures: hadoop 2 yarn mapreduce 2 memory configuration – example scenarios 1 and 2]

The idea is that we try to squeeze as much RAM and CPU power out of our PIs as possible.
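As a rough sketch of one allocation that fits on a single PI, using the values configured further down (768 MB allocatable per node, a 128 MB ApplicationMaster, 256 MB map containers and 128 MB reduce containers):

# 1 x MapReduce ApplicationMaster container : 128 MB
# 2 x map containers (2 x 256 MB)           : 512 MB
# 1 x reduce container                      : 128 MB
# Total                                     : 768 MB = yarn.nodemanager.resource.memory-mb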

Configure Single Node

Edit configuration files in /opt/hadoop/etc/hadoop

core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hdfs/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>5242880</value>
  </property>
</configuration>
Property Default Our Description
dfs.replication 3 1 By default hadoop replicates data on 3 nodes. Since we only have one node we set it to 1 and later switch it to 3.
dfs.blocksize 134217728 5242880 Default block size for new files, in bytes. The default value is 128 MB. In this tutorial we will use files smaller than that, so in order for the work to spread across the Raspberry PIs we set this property to 5 MB instead. This means that files will be split into chunks of 5 MB in our cluster when distributed among the nodes, and each of our MapReduce tasks will work with 5 MB at a time.

For a complete list of properties, see hdfs-default.xml: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

slaves

Currently we only have one node. Keep default value of localhost. We will later add all our nodes here for a multi-node setup.

localhost

yarn-site.xml

<configuration>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address</name>
		<value>node1:8025</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address</name>
		<value>node1:8035</value>
	</property>
	<property>
		<name>yarn.resourcemanager.address</name>
		<value>node1:8050</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.nodemanager.resource.cpu-vcores</name>
		<value>4</value>
	</property>
	<property>
		<name>yarn.nodemanager.resource.memory-mb</name>
		<value>768</value>
	</property>
	<property>
		<name>yarn.scheduler.minimum-allocation-mb</name>
		<value>64</value>
	</property>
	<property>
		<name>yarn.scheduler.maximum-allocation-mb</name>
		<value>256</value>
	</property>
	<property>
		<name>yarn.scheduler.minimum-allocation-vcores</name>
		<value>1</value>
	</property>
	<property>
		<name>yarn.scheduler.maximum-allocation-vcores</name>
		<value>4</value>
	</property>
	<property>
		<name>yarn.nodemanager.vmem-check-enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.nodemanager.pmem-check-enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.nodemanager.vmem-pmem-ratio</name>
		<value>2.1</value>
	</property>
</configuration>

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>256</value>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx204m</value>
    </property>
    <property>
        <name>mapreduce.map.cpu.vcores</name>
        <value>2</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>128</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx102m</value>
    </property>
    <property>
        <name>mapreduce.reduce.cpu.vcores</name>
        <value>2</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>128</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.command-opts</name>
        <value>-Xmx102m</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.resource.cpu-vcores</name>
        <value>1</value>
    </property>
    <property>
        <name>mapreduce.job.maps</name>
        <value>4</value>
    </property>
    <property>
        <name>mapreduce.job.reduces</name>
        <value>4</value>
    </property>
</configuration>

Format HDFS filesystem

sudo mkdir -p /hdfs/tmp
sudo chown hduser:hadoop /hdfs/tmp
sudo chmod 750 /hdfs/tmp
hadoop namenode -format

Start Hadoop

Run the following commands as hduser:

start-dfs.sh
start-yarn.sh

Verify that all services started correctly:

jps

Result should show the following active processes:
1413 NodeManager
838 NameNode
1451 Jps
1085 SecondaryNameNode
942 DataNode
1311 ResourceManager

If not all processes are visible, review the log files in /opt/hadoop/logs

Run hadoop hello world – wordcount

Download the sample files (from my previous blog post)

Run all commands as hduser. Extract the files to home directory of hduser.

tar -xvzf hadoop_sample_txtfiles.tar.gz -C ~/

Upload files to HDFS:

hadoop fs -put smallfile.txt /smallfile.txt
hadoop fs -put mediumfile.txt /mediumfile.txt
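If you want to see how the files are split into blocks (recall the 5 MB dfs.blocksize we configured), hdfs fsck can list the blocks and where they are stored:

hdfs fsck /mediumfile.txt -files -blocks -locations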

Execute wordcount example from source code folder (~/hadoop-2.7.2-src/hadoop-mapreduce-project/hadoop-mapreduce-examples/target/):

time hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /smallfile.txt /smallfile-out
time hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /mediumfile.txt /mediumfile-out

Result:

File Size Using Raspberry PI 1 Raspberry PI 2 (this setup)
smallfile.txt 2MB 2:17 1:47
mediumfile.txt 35MB 9:19 5:50
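To inspect the actual word counts, look in the output directory given on the command line; the part file name may vary with the number of reducers:

hadoop fs -ls /smallfile-out
hadoop fs -cat /smallfile-out/part-r-00000 | head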

When you execute the job you will also see that there is an http url to track progress:

lynx http://node1:8088/proxy/application_1463048023890_0002 (the URL is unique to your execution – see the output when you start the job; you may use any browser, in this case I used Lynx)

Task Tracker

When this snapshot was taken we can see that we have one running map task and 2 running reduce tasks, which looks good considering our anticipated memory allocation outcome above.

Configure Multiple Nodes

Clear HDFS filesystem

rm -rf /hdfs/tmp/*
hdfs namenode -format

Edit configuration files

/etc/hostname

node1 (use node2 and node3 respectively on the other nodes)

/etc/hosts

192.168.0.110 node1
192.168.0.111 node2
192.168.0.112 node3

/etc/dhcpcd.conf

interface eth0
static ip_address=192.168.0.110/24
# static ip_address=192.168.0.111/24
# static ip_address=192.168.0.112/24
static routers=192.168.0.1
static domain_name_servers=123.123.123.123 123.123.123.123

/opt/hadoop/etc/hadoop/slaves

node1
node2
node3

/opt/hadoop/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hdfs/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:54310</value>
  </property>
</configuration>

/opt/hadoop/etc/hadoop/hdfs-site.xml

By changing this to 3 we will ensure that we have data locally on each node when we add new files (assuming we have 3 nodes in our cluster)

<property>
<name>dfs.replication</name>
<value>3</value>
</property>

/opt/hadoop/etc/hadoop/yarn-site.xml

<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>node1:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>node1:8035</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>node1:8050</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

Clone SD-Card

Use Win32 Disk Imager or any other software of your choice to clone the SD card. After cloning, ensure each node is configured with its new hostname and IP address.

That is, make sure /etc/hostname and /etc/dhcpcd.conf on each node are configured with the correct hostname and IP.
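A minimal sketch of what that adjustment could look like on the clone that becomes node2 (adapt the hostname and IP address for each node):

echo "node2" | sudo tee /etc/hostname
sudo sed -i 's|static ip_address=192.168.0.110/24|static ip_address=192.168.0.111/24|' /etc/dhcpcd.conf
sudo reboot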

Login with ssh to all nodes to fix host key verification

Log on with ssh to all nodes from the namenode (node1). Answer yes when asked to confirm each host's authenticity (otherwise you will later run into “Host key verification failed” errors). This is important in order for the namenode to be able to communicate with the other nodes without entering a password.
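Since the home directories were cloned along with the SD card, the same key pair and authorized_keys already exist on every node, so something like this from node1 (as hduser) is enough to get each host key accepted:

ssh node2 exit
ssh node3 exit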

 Start services

Run the following commands as hduser:

start-dfs.sh
start-yarn.sh

Run jps command on all nodes to ensure all services started fine. On the data/processing nodes you should have the following services:

hduser@node2:~ $ jps
1349 DataNode
1465 NodeManager
1582 Jps

And on the namenode (node1) you should have the following:

hduser@node1:/home/pi $ jps
3329 ResourceManager
2851 NameNode
3435 NodeManager
3115 SecondaryNameNode
3742 Jps
2959 DataNode
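Optionally, the HDFS side can be double-checked from node1 as well; with replication working you should see three live datanodes in the report:

hdfs dfsadmin -report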

Performance comparison to Raspberry PI 1 Model B (512mb)

Run wordcount as done in the previous step. As we can see there is a significant difference using the newer Raspberry PI 2:

File Size Using Raspberry PI 1 Raspberry PI 2 (this setup – 3 nodes)
smallfile.txt 2MB 2:17 1:41
mediumfile.txt 35MB 9:19 2:22
(larger file) 431MB N/A 17:10

For fun I also tried a larger file to see if the PIs could put up with it. They did fine, although I got some containers killed due to memory issues; that could probably be tweaked further.

Happy data crunching! 🙂

References

http://www.instructables.com/id/Native-Hadoop-260-Build-on-Pi/

http://stackoverflow.com/questions/32823375/compiling-hadoop-2-7-1-on-arm-raspbian

http://www.becausewecangeek.com/building-a-raspberry-pi-hadoop-cluster-part-1/

https://www.mapr.com/blog/managing-monitoring-and-testing-mapreduce-jobs-managing-jobs-and-tasks

http://arturmkrtchyan.com/how-to-setup-multi-node-hadoop-2-yarn-cluster

Update url and server references in SharePoint content using PowerShell

Updating URL and server references in SharePoint content can be quite challenging when you have a lot of Excel Services reports, Reporting Services reports and data connections (ODC, PPSDC and RSDS files). This post assumes you have migrated to a new SharePoint server with a new hostname/DNS name and have new database and Analysis Services servers which are used as sources for reports within your SharePoint Business Intelligence portal. Manually going through all reports to update database and server references can be a quite tedious task. To our rescue: PowerShell! I have put together a PowerShell script which can update URL and server references in SharePoint content, as described below:

Instructions

To control which action(s) the script should perform there are a few variables in the beginning of the script that need to be configured correctly. You will also have to configure how your server names and URLs should be replaced. The script only updates references/text in Office Open XML format Excel (XLSX) files. Hence, if you have old XLS files you will first need to download them to disk, convert them to XLSX and upload them to your SharePoint portal again. This process is also partly automated using multiple scripts. The SP_UpdateUrlRef script can download the XLS files and upload the converted XLSX files. The Convert_XLS script can automate the conversion but has to be run on a computer with Microsoft Office 2013 (the only version I have tested).

PowerShell script execution steps

  1. Download the PowerShell script zip package (includes the update script (SP_UpdateUrlRef), the XLS conversion script (Convert_XLS) and DocumentFormat.OpenXml.dll)
  2. Open SP_UpdateUrlRef and configure variables as described in PowerShell script parameters. Review the script and what it does and make sure to modify it to fit your needs and environment.
    I do not take any responsibility or any kind of warranty for any damage or issue that might be caused or related to the script.
  3. If you have old XLS files in your SharePoint portal configure the SP_UpdateUrlRef script to only download these to a folder using $downloadXLS=$true and other variables set to $false. Otherwise you may go to step 7.
  4. Edit the Convert_XLS PowerShell script to use same XLS folder as you did in the previous script
  5. Run Convert_XLS  to convert the XLS files to XLSX (Note that you will need to have Office 2013 installed on the machine where you run this script)
  6. If you converted old XLS files, configure the SP_UpdateUrlRef script to only upload these using $uploadConvertedXLSFiles=$true and the other variables set to $false. You may optionally set $deleteOldXLSFileAfterUpload = $true to delete the old XLS files.
  7. Set $deleteOldXLSFileAfterUpload and $downloadXLS to $false. Set the “run for” variables that you wish to use to $true.
  8. Run the script and review whether any issues occurred

PowerShell script parameters

Parameter Description
$runForText Use configuration in $replaceItems to replace text within ODC, RSDS and PPSDC files. (Can easily be extended for other text files as well if you look through the conditions in the GetFilesFromSite function.)
$runForExcelWebParts  Uses configuration in $replaceItems to replace text in Excel Web Access Web Parts Workbook Url property:
[Screenshot: Excel Web Access Web Part URL reference]
$runForExcelXLSX Use configuration in $replaceItems to replace text in Excel workbook connection string. Uses configuration in $pureUrlReplaceItems to replace text within Excel workbook connection file property:
[Screenshot: Excel connection string referencing an ODC file/cube]
*The connection file property is usually a URL pointing to an ODC file within your SharePoint site
$runForSSRS  Use configuration in $ssrsDSMappings to match report data source names with the URL of an SSRS data connection file. When RDL report files are migrated to a new environment the old URL cannot be found and will become blank in the new environment:
[Screenshots: SharePoint “Manage data sources” page for an SSRS report and the data source URL dialog]
$xlsWorkDirRoot  Path to directory where XLS/XLSX files are downloaded/uploaded from
$downloadXLS  If true download all found XLS and XLM files to path defined in $xlsWorkDirRoot
$uploadConvertedXLSFiles  If true upload all XLSX/XLSM files from $xlsWorkDirRoot to original location (matching on file name and document library name without file extension)
$deleteOldXLSFileAfterUpload  If true old XLS files will be deleted when new XLSX files are uploaded
$SitesToIgnore  List of sites to ignore (based on site title)
$webAppUrls  List of web applications to iterate
$reportServerUri  Reporting Services Web Service (http://newportal.com/_vti_bin/ReportServer/ReportService2010.asmx?WSDL)
$ssrsReportDocLib  Root location for service $reportServerUri to look for reports (/ for root directory or http://newportal.com/site/doclib)
$username  User to access Reporting Services Web Service
$password  Password for above
$openXMLLibDir  Location of DocumentFormat.OpenXml.dll

Convert_XLS.ps1

PowerShell script to convert old XLS files to the newer Office Open XML format (XLSX).

Download PowerShell script (bundle)

Function SetEnglishCulture {
    $cultureName = "en-US"

	$currentThread = [System.Threading.Thread]::CurrentThread 
	$culture = [System.Globalization.CultureInfo]::CreateSpecificCulture($cultureName) 
	$currentThread.CurrentCulture = $culture 
	$currentThread.CurrentUICulture = $culture 
	Write-Host Culture changed to: $cultureName -ForegroundColor Green
}

# Run once again if this does not work...
SetEnglishCulture

$xlFixedFormat = [Microsoft.Office.Interop.Excel.XlFileFormat]::xlOpenXMLWorkbook
write-host $xlFixedFormat
$excel = New-Object -ComObject excel.application
$excel.visible = $true
$folderpath = "C:\SP_UpdateUrlRef\FilesToConvert"
$filetype ="*xls"
Get-ChildItem -Path $folderpath -Include $filetype -recurse |
ForEach-Object `
{
	$path = ($_.fullname).substring(0, ($_.FullName).lastindexOf("."))
	
	"Converting $path"
	$workbook = $excel.workbooks.open($_.fullname)
	
	$path += ".xlsx"
	$workbook.saveas($path, $xlFixedFormat)
	$workbook.close()
	
	$oldFolder = $path.substring(0, $path.lastIndexOf("\")) + "\_original_xls"
	
	write-host $oldFolder
	if(-not (test-path $oldFolder))
	{
		new-item $oldFolder -type directory
	}
	
	move-item $_.fullname $oldFolder
	
}
$excel.Quit()
$excel = $null
[gc]::collect()
[gc]::WaitForPendingFinalizers()

SP_UpdateUrlRef.ps1

PowerShell script to update URL and server references in SharePoint content. Useful after a change of database or Analysis Services (OLAP) server, or if the SharePoint server has been migrated to a new domain.

Download PowerShell script (bundle)

# Script to Update url and server references in SharePoint content after SharePoint/sql server/sql analysis server migration
#
# Jonas Widriksson, 2015
#
# The script handles the following scenarios:
# - Search and replace text in files (ODC, RSDS (and whatever extension you like to include)
# - Search and replace text in Excel files embedded connection string
# - Search and replace text in Excel ODC Url connection string
# - Search and replace text Excel Viewer WebPart Url (reference to Excel file)
# - Map SSRS report data sources to mapping list below and assign the correct shared data source

# Configuration variables
# -----------------------

# Run for:
$runForText = $false
$runForExcelWebParts = $false
$runForExcelXLSX = $true
$runForSSRS = $false

#XLS Management
$xlsWorkDirRoot = "C:\SP_UpdateUrlRef\FilesToConvert"
$downloadXLS = $false
$uploadConvertedXLSFiles = $false
$deleteOldXLSFileAfterUpload = $false

#Sites to ignore by site title
$SitesToIgnore = "John Doe", "Special Site"

# List of site collections to run script on:
$webAppUrls = "http://mysite.newdomain.net" # "http://mysite.newdomain.net"; #"http://newportal.newdomain.net" #"http://newportal.newdomain.net" , "http://mysite.newdomain.net";

# Url of reporting services webservice and site collection to run ssrs mapping on:
$reportServerUri = "http://newportal.newdomain.net/_vti_bin/ReportServer/ReportService2010.asmx?WSDL"

# Url to ssrs site/doclib where to look in all children for ssrs reports
$ssrsReportDocLib = "http://newportal.newdomain.net/site/doclib"

# Used to authenticate with SSRS SharePoint webservice
$username = "SP_Farm"
$password = "********"

$openXMLLibDir = "C:\SP_UpdateUrlRef\Lib\DocumentFormat.OpenXml.dll"
$excelWebUIDLL = "C:\Windows\Microsoft.NET\assembly\GAC_MSIL\Microsoft.Office.Excel.WebUI\v4.0_15.0.0.0__71e9bce111e9429c\Microsoft.Office.Excel.WebUI.dll"

# Name of file log:
$logFileName = "output-last-run.csv";

# All text files will be searched and the values defined here replaced:
# Note that replacements are applied by plain search-and-replace in list order, so already-replaced values can be matched again (hence it can get a bit messed up if your connections use both server names and FQDN names)
# This is why the sample list below has multiple "invalid" urls at the bottom to address this issue
$replaceItems = @(	
    # OLD_SQL_SERVER
    @{ 
		old="OLD_SQL_SERVER";
		new="OLD_SQL_SERVER.newdomain.net";       
	},    
	@{
		old="10.32.150.2"; 
		new="OLD_SQL_SERVER.newdomain.net";
	},        
    @{
        old="OLD_SQL_SERVER.old-domain.net";
        new="OLD_SQL_SERVER.newdomain.net";
    },

    #OLD_OLAP_SERVER
    @{ 
		old="OLD_OLAP_SERVER";
		new="OLD_OLAP_SERVER.newdomain.net";       
	},    
	@{
		old="10.32.150.3"; 
		new="OLD_OLAP_SERVER.newdomain.net";
	},        
    @{
        old="OLD_OLAP_SERVER.old-domain.net";
        new="OLD_OLAP_SERVER.newdomain.net";
    },

    # OLD_SQL_SERVER (Fix of text fixed above - might be needed)
    @{
        old="OLD_SQL_SERVER.newdomain.net.old-domain.net";
        new="OLD_SQL_SERVER.newdomain.net";
    },
    @{
        old="OLD_SQL_SERVER.newdomain.net.newdomain.net.old-domain.net";
        new="OLD_SQL_SERVER.newdomain.net";
    },
    @{       
        old="OLD_SQL_SERVER.newdomain.net.newdomain.net";
        new="OLD_SQL_SERVER.newdomain.net";
    },
    @{
        old="OLD_SQL_SERVER.newdomain.net.newdomain.net.newdomain.net";
        new="OLD_SQL_SERVER.newdomain.net";
    },

    # OLD_OLAP_SERVER (Fix of text fixed above - might be needed)
    @{
        old="OLD_OLAP_SERVER.newdomain.net.old-domain.net";
        new="OLD_OLAP_SERVER.newdomain.net";
    },
    @{
        old="OLD_OLAP_SERVER.newdomain.net.newdomain.net.old-domain.net";
        new="OLD_OLAP_SERVER.newdomain.net";
    },
    @{             
        old="OLD_OLAP_SERVER.newdomain.net.newdomain.net";
        new="OLD_OLAP_SERVER.newdomain.net";
    },
    @{
        old="OLD_OLAP_SERVER.newdomain.net.newdomain.net.newdomain.net";
        new="OLD_OLAP_SERVER.newdomain.net";
    },
    
    # Force MSOLAP provider version
    #@{
    #    old="MSOLAP.6";
    #    new="MSOLAP";
    #},

    # Special fix for RSDS files, somehow it seems that they get malformed with ??? in beginning of XML after first time script is run. Ensure to apply this to avoid this issue if it happens
    @{
        old="\?\?\?";
        new="";
    }
)

# All SSRS reports (found in webservice/sitecoll defined in reportServerUri) will be assigned rsds file from below:
$ssrsDSMappings = @(
    @{ dsName="OLD_SQL_SERVER"; dsNewUrl="http://newportal.newdomain.net/Data Sources For Reporting Services/OLD_SQL_SERVER.rsds"; },
    @{ dsName="OLD_SQL_SERVER_SupplyChain"; dsNewUrl="http://newportal.newdomain.net/Data Sources For Reporting Services/OLD_SQL_SERVER_SupplyChain.rsds"; },    
    @{ dsName="OLD_SQL_SERVER_Finance"; dsNewUrl="http://newportal.newdomain.net/Data Sources For Reporting Services/OLD_SQL_SERVER_Finance.rsds"; }
)

# Text replace mappings for the Excel ODC connection URL (note this is not for the actual connection string in Excel, only the ODC Url;
# the actual connection string should be defined in replaceItems above)
# This will also be used for replacement of urls in excel viewer webparts
$pureUrlReplaceItems = @(
    @{
        old="http://oldportal.old-domain.net";
        new="http://newportal.newdomain.net";
    },
    @{
        old="http://oldportal";        
        new="http://newportal.newdomain.net";
    },
    @{
        old="http://10.32.150.1";
        new="http://newportal.newdomain.net";
    },
    
    # XLS -> XLSX (To solve after XLS files have been converted to XLSX)
    @{
         old=".xls";
         new=".xlsx";
    }
)

# Other variables
# -----------------

$global:itemList = New-Object System.Collections.Generic.List[PSObject]

# Methods
# -------

Function CheckOutItem($item)
{
    # Check out the underlying file when the library requires it
    if ($item.ParentList.ForceCheckout) {
        Write-Host "Checking out: " $item.Title -ForegroundColor Gray
        $item.File.CheckOut()
    }
}

Function CheckInItem($item)
{
    # Check the file back in (and publish it when versioning is enabled)
    if ($item.ParentList.ForceCheckout) {
        Write-Host "Checking in: " $item.Title -ForegroundColor Gray
        $item.File.CheckIn($true)

        if ($item.ParentList.EnableVersioning) {
            Write-Host "Publishing: " $item.Title -ForegroundColor Gray
            $item.File.Publish("File checked/modified by migration script")
        }
    }
}

Function ReplaceTextInFile($item)
{      
    CheckOutItem $item

    $itemUpdated = $false

    $file = $item.File;
    
    $fileContent = [System.Text.Encoding]::ASCII.GetString($file.OpenBinary())
    $newFileContent = $fileContent

    $replaceItems | foreach-object  {      
        $newFileContent = $newFileContent -ireplace $_.old, $_.new
    }
        
    if ($fileContent -ne $newFileContent) {
        Write-Host "Updating: " $item.Url -ForegroundColor Green
        $file.SaveBinary([System.Text.Encoding]::ASCII.GetBytes($newFileContent))
        $itemUpdated = $true
    }
    
    CheckInItem $item

    return $itemUpdated
}

Function ReplaceUrlInExcelWebPArt($item)
{
    CheckOutItem $item

    $itemUpdated = $false

    $webPartManager = $web.GetLimitedWebPartManager($item.Url, [System.Web.UI.WebControls.WebParts.PersonalizationScope]::Shared);
    $webParts = $webPartManager.WebParts;
    foreach($webPart in $webParts) 
    {
        if ($webPart.GetType() -eq [Microsoft.Office.Excel.WebUI.ExcelWebRenderer]) {

            $excelUri = $webPart.WorkbookUri
            $newExcelUri = $excelUri

            Write-Host "Found WorkbookUri: " $excelUri -ForegroundColor Yellow

            $pureUrlReplaceItems | foreach-object  { 
                $newExcelUri = $newExcelUri -ireplace $_.old, $_.new
            }

            if ($excelUri -ne $newExcelUri) {
                Write-Host "Updating: " $item.Url -ForegroundColor Green
                $webPart.WorkbookUri = $newExcelUri;
                $webPartManager.SaveChanges($webPart);
                $itemUpdated = $true
            }
            
        }
    }

    CheckInItem $item

    return $itemUpdated
}

Function ReplaceTextInExcelConnStrings($item)
{    
     $fileStream = $item.File.OpenBinarystream()
     $xlPackage = [DocumentFormat.OpenXml.Packaging.SpreadsheetDocument]::Open($fileStream,$True)

     $doUpdate = $false

     # Check if document has any connections
     if ($xlPackage.WorkbookPart.ConnectionsPart) {


         $connStream = $xlPackage.WorkbookPart.ConnectionsPart.GetStream()
         $xmlDoc = New-Object System.Xml.XmlDocument
         $xmlDoc.Load($connStream) #= [System.Xml.XmlDocument](Get-Content $connStream)
         #$xmlDoc

         $nsManager = [System.Xml.XmlNamespaceManager]$xmlDoc.NameTable;
         $nsManager.AddNamespace("default", $xmlDoc.DocumentElement.NamespaceURI);
         $nodelist = [System.Xml.XmlNodeList]$xmlDoc.SelectNodes("//default:connections/default:connection", $nsManager);                 
         
         foreach ($node in $nodelist) {
            # Fix ODC connection
            if ($node.Attributes["odcFile"]) {               

                $odcFileUrl = $node.Attributes["odcFile"].Value
                $newOdcFileUrl = $odcFileUrl 


                $pureUrlReplaceItems | foreach-object  { 
                    $newOdcFileUrl = $newOdcFileUrl -ireplace $_.old, $_.new
                }

                if ($odcFileUrl -ne $newOdcFileUrl) {
                    Write-Host "Updating: " $item.Url " odc connection string" -ForegroundColor DarkGreen                        
                    $node.Attributes["odcFile"].Value = $newOdcFileUrl
                    $doUpdate = $true
                }                
                
            }
            else {
                Write-Host "Connection: " $node.name " ODC Url will not be modified" -ForegroundColor Gray
            }

            # Fix embedded connection string
            if ($node.dbPr) {
                if ($node.dbPr.attributes["connection"]) {

                    $connStr = $node.dbPr.attributes["connection"].Value
                    $newConnStr = $connStr

                    $replaceItems | foreach-object  { 
                        $newConnStr = $newConnStr -replace $_.old, $_.new
                    }

                    if ($connStr -ne $newConnStr) {
                        Write-Host "Updating: " $item.Url " embedded connection string" -ForegroundColor DarkGreen                        
                        $node.dbPr.attributes["connection"].Value = $newConnStr
                        $doUpdate = $true
                    }
                }
            }          
            else {
                Write-Host "Connection: " $node.name " connection string will not be modified" -ForegroundColor Gray
            }            
         }

         if ($doUpdate) {
            Write-Host "Updating excel file: " $item.Url -ForegroundColor Green

             $streamWriter = new-object System.IO.StreamWriter($xlPackage.WorkbookPart.ConnectionsPart.GetStream([Io.FileMode]::Create))
             $xmlDoc.Save($streamWriter.BaseStream)

             $item.File.SaveBinary([System.IO.Stream]$fileStream)

             $streamWriter.Dispose()
             $xlPackage.Dispose()
         }
     }
     else {
        Write-Host "No connections found in: " $item.Url -ForegroundColor Gray
     }

     return $doUpdate
}

Function LogItem($currentSiteUrl, $currentWebStr, $listStr, $itemUrlStr, $commentStr)
{
    $listItem = New-Object PSObject -Property @{
        webAppUrl = $currentSiteUrl;
        SiteUrl = $currentWebStr; 
        DocumentLibrary = $listStr;        
        Url = $itemUrlStr;
        Comment = $commentStr
        }
    $global:itemList.Add($listItem)
}


Function GetFilesFromSite([Microsoft.SharePoint.SPWeb] $currentWeb) {

    # Check if current web is in ignore list
    $isIgnored = $false

    foreach ($ignoreSite in $SitesToIgnore) {
        if ($currentWeb.Title -eq $ignoreSite) {
            $isIgnored=$true
        }
    }

    if (-not $isIgnored) {    

    $itemsToDelete = @()        

    foreach($list in $currentWeb.lists)    {      
      if ( ($list.BaseType -eq "DocumentLibrary") -and ( ($list.DefaultViewUrl -notlike "*_catalogs*") -and ($list.DefaultViewUrl -notlike "*Style Library*")))
      {     
        try {   
            $itemCount = $list.ItemCount
            $itemCollection = $list.Items

            for ($i=0; $i -lt $itemCount; $i++) {
                try {

                $item = $itemCollection[$i]

                    # ODC, RSDS, PPSDC Files
                    if ($runForText) {
                        if ($item.Url -like "*.ppsdc" -or $item.Url -like "*.rsds" -or $item.Url -like "*.odc") {                        
                            Write-Host "Processing file: " $item.Url -ForegroundColor Gray                 
                            $updated = ReplaceTextInFile $item $global:oldText $global:newText

                            if ($updated) {
                                LogItem $currentWeb.Site.Url $currentWeb.Url $list.Title $item.Url "File modified"
                            }
                        }
                    }

                    # Excel Viewer Web Parts
                    if ($runForExcelWebParts) {
                        if ($item.Url -like "*.aspx") {
                            Write-Host "Processing page: " $item.Url -ForegroundColor Gray
                            $updated = ReplaceUrlInExcelWebPArt $item
                    
                            if ($updated) {
                                LogItem $currentWeb.Site.Url $currentWeb.Url $list.Title $item.Url "Page ExcelWebRenderer WebPart modified"
                            }
                        }
                    }

                    # Excel Data Connections (embedded in Excel files)
                    if ($runForExcelXLSX) {
                        if ($item.Url -like "*.xlsx" -or $item.Url -like "*.xlsm")  { #-or $item.Url -like "*.xls") {
                            Write-Host "Processing Excel file: " $item.Url -ForegroundColor Gray
                            $updated = ReplaceTextInExcelConnStrings $item

                            if ($updated) {
                                LogItem $currentWeb.Site.Url $currentWeb.Url $list.Title $item.Url "Excel connection modified"
                            }
                        }
                    }

                    # Download XLS Files   
                    if ($downloadXLS) {
                        if ($item.Url -ilike "*.xls" -or $item.Url -like "*.xlm") {
                            Write-Host "Downloading old format Excel workbook: " $item.Url -ForegroundColor Gray

                            $isDuplicate = $false
							
							$folderName = "";

							if ($item.File.ParentFolder) {
								$folderName = "/" + $list.Title + "/" + $item.File.ParentFolder
								Write-Host "ParentFolder: " + $folderName -ForegroundColor Yellow
							}
							else {
								$folderName = "/" + $list.Title
							}
							
                            $destDir = $xlsWorkDirRoot +"/" + $currentWeb.Title  + $folderName

                            if (!(Test-Path -path $destDir))
                            {
                                $destCreate = New-Item $destDir -type directory 
                            }

                            Get-ChildItem -Path $destDir -Include $filetype -recurse | ForEach-Object {
                                if ($_.Name -eq $item.File.Name) {
                                    Write-Host "File already exists!" -ForegroundColor Red
                                    LogItem $currentWeb.Site.Url $currentWeb.Url $list.Title $item.Url "Duplicate XLS Name!"  
                                    $isDuplicate = $true                          
                                }
                            }

                            if (-not $isDuplicate) {
                                #Download file
                                $binary = $item.File.OpenBinary()
                                $stream = New-Object System.IO.FileStream(($destDir + "/" + $item.File.Name), [System.IO.FileMode]::Create)
                                $writer = New-Object System.IO.BinaryWriter($stream)
                                $writer.write($binary)
                                $writer.Close()

                                LogItem $currentWeb.Site.Url $currentWeb.Url $list.Title $item.Url "Downloaded XLS"       
                            }                     
                        }
                    }

                    # Upload converted XLS (to XLSX) files
                    # This assumes we have *.xlsx files in $xlsWorkDir with same name as XLS files downloaded with $downloadXLS=true
                    # The conversion has to be done manually (or with other script) hence we cannot run both $downloadXLS=true and 
                    # $uploadConvertedXLSFiles at the same time
                    # Download XLS Files   
                    if ($uploadConvertedXLSFiles) {

                    
                        if ($item.Url -like "*.xlsx" -or $item.Url -like "*.xlm") {
                                         
                            Write-Host "Found fixed XLSX workbook: " $item.Url -ForegroundColor Gray
							
							$folderName = "";

							if ($item.File.ParentFolder) {
								$folderName = "/" + $list.Title + "/" + $item.File.ParentFolder								
							}
							else {
								$folderName = "/" + $list.Title
							}
							
                            $xlsWorkDir = $xlsWorkDirRoot +"/" + $currentWeb.Title  + $folderName + "/"

                            if (Test-Path $xlsWorkDir) {
                                Write-Host "Looking for new format XLSX workbook on disk at: " $xlsWorkDir -ForegroundColor Gray

                                Get-ChildItem -Path $xlsWorkDir | ForEach-Object {
                                    if ($_.Name -eq $item.File.Name) {
                                                                    
                                        Write-Host "Found converted XLSX file! Try to upload..." -ForegroundColor Green  
                                    
                                        if ($item.File.ParentFolder) {
                                            $uploadFolder = $item.File.ParentFolder
                                        }
                                        else {
                                            $uploadFolder = $list
                                        }
                                                                        
                                        # Upload the converted workbook to the target library/folder
                                        $doesFileExist = $false
                                        foreach ($f in $uploadFolder.Files) {
                                            if($f.Name -eq $_.Name) {
                                                $doesFileExist = $true
                                            }
                                        }

                                        if ($doesFileExist) {                                        
                                            Write-Host "Note: File already exists will overwrite..." -ForegroundColor Red
                                        }
                                    
                                        #Get the file stream of each file, assign it to a variable.
                                        $Stream = (Get-Item $_.fullname).openread()                                            
                                        $NewFile = $uploadFolder.Files.Add($_.Name, $Stream, $True)
                                            
                                        CheckInItem($NewFile)

                                        #Commit these changes
                                        $NewFile.Update()
                                        Write-Host "Done!" -ForegroundColor Green

                                        LogItem $currentWeb.Site.Url $currentWeb.Url $list.Title $item.Url "Uploaded Converted XLSX file"
                                                                                        
                                        if ($deleteOldXLSFileAfterUpload) {
                                            Write-Host "Queue item for delete: " $item.Url -ForegroundColor Gray
                                            $itemsToDelete += $item
                                        }

                                    }
                                }
                            }
                            else {
                                Write-Host "Directory: " $xlsWorkDir " does not exist..." -ForegroundColor White
                            }
                            
                        }
                    }
                    
                }
                catch [Exception] {
                    Write-Host "Error occured and was logged for item: " $item.Url -ForegroundColor Red
                    $comment = "X Error: " + $_.Exception.Message
                    Write-Host $comment
                    LogItem $currentWeb.Site.Url $currentWeb.Url $list.Title $item.Url $comment
                }
            
            }
          }
          catch [Exception] {            
            Write-Host "Error occured and was logged for list: " $list.Url -ForegroundColor Red
            $comment = "X List Error: " + $_.Exception.Message
            Write-Host $comment
            LogItem $currentWeb.Site.Url $currentWeb.Url $list.Title $list.Description $comment
          } 
        }
    }

    if ($deleteOldXLSFileAfterUpload) {
        foreach ($item in $itemsToDelete) {
            Write-Host "Deleting old XLS file: " $item.Url -ForegroundColor Yellow
            LogItem $currentWeb.Site.Url $currentWeb.Url $item.File.ParentFolder $item.Url "Deleting old XLS file..."

             $r = $item.Recycle()
                                        
            Write-Host "Done!" -ForegroundColor Yellow
        }
    }
}
else {
    Write-Host "This site is ignored: " $currentWeb.Title -ForegroundColor Yellow
    }
}

Function UpdateReportDataSources()
{
    Write-Host "Updating SSRS reports data sources (using SSRS webservice)"  -ForegroundColor White
    Write-Host "Using SSRS web service: " $reportServerUri -ForegroundColor White

    $secstr = New-Object -TypeName System.Security.SecureString
    $password.ToCharArray() | ForEach-Object {$secstr.AppendChar($_)}
    $cred = new-object -typename System.Management.Automation.PSCredential -argumentlist $username, $secstr
    $Proxy = New-WebServiceProxy -Uri $reportServerUri -Credential $cred
    
    # Gather the webservice types for later use 
    $WebServiceTypes = @{}
    foreach ($Type in $Proxy.GetType().Assembly.GetExportedTypes())
    {
        $WebServiceTypes.Add($Type.Name, $Type.FullName);
    }

    $Proxy.Timeout = 600000

    $ReportItems = $Proxy.ListChildren($ssrsReportDocLib, $true) | Where-Object { $_.Name -Like "*.rdl" }

    foreach ($ReportItem in $ReportItems)
    {
        Write-Host "Processing report: " $ReportItem.Path -ForegroundColor Gray

        $reportDataSources = $Proxy.GetItemDataSources($ReportItem.Path)

        $doDSUpdate = $TRUE;  
        
        try {  
                
            for ($i=0; $i -lt $reportDataSources.length; $i++) {
                $foundDS = $FALSE;

                foreach ($dsMapping in $ssrsDSMappings) {
                    if ($reportDataSources[$i].Name -eq $dsMapping.dsName) {                        
                        $foundDS = $TRUE;

                        # Declare DataSourceReference object
                        $ref = New-Object $WebServiceTypes.DataSourceReference
                        $ref.Reference = $dsMapping.dsNewUrl;

                        # Declare DataSource object and point it at the new reference
                        $ds = New-Object $WebServiceTypes.DataSource
                        $ds.Name = $dsMapping.dsName
                        $ds.Item = $ref

                        $reportDataSources[$i] = $ds
                    }                
                }

                if (!$foundDS) {                
                    Write-Host "Miss mapping for: " $reportDataSources[$i].Name -ForegroundColor Red
                
                    $comment = "Miss mapping for: " + $reportDataSources[$i].Name
                    LogItem "Problem SSRS Update" "N/A" "N/A" $ReportItem.Path $comment

                    $doDSUpdate = $FALSE                
                }
            }

            if ($doDSUpdate) {
                Write-Host "Updating data source(s) on: " $ReportItem.Path -ForegroundColor Green

                LogItem "SSRS Update" "N/A" "N/A" $ReportItem.Path "Updated data sources"

                $Proxy.SetItemDataSources($ReportItem.Path, $reportDataSources)
            }
            else {
                Write-Host "Skipped update of data source(s) on: " $ReportItem.Path -ForegroundColor Red            
            }
        
        }
        catch [Exception] {
            Write-Host "Error occured and was logged for report/datasource: " $ReportItem.Path -ForegroundColor Red
            $comment = "X SSRS DS Error: " + $_.Exception.Message
            Write-Host $comment
            LogItem "SSRS Update" "N/A" "N/A" $ReportItem.Path $comment
        }
    }


    Write-Host "Logging not used DS mappings"  -ForegroundColor White
    $ssrsDSMappings | ForEach-Object { 
        LogItem "SSRS DS Mapping not used" "N/A" "N/A" $_.dsName $_.dsNewUrl
    }

    $Proxy.Dispose()
}



Function LoadDependencies()
{
    Write-Host "Loading dependencies" -ForegroundColor White

    if ((Get-PSSnapin -Name Microsoft.SharePoint.Powershell -ErrorAction SilentlyContinue) -eq $null) {
        Add-PSSnapin Microsoft.SharePoint.Powershell
    }

    $openXMLlib = [System.Reflection.Assembly]::LoadFrom($openXMLLibDir)
    $excelWebUI = [System.Reflection.Assembly]::LoadFrom($excelWebUIDLL)

    #[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.Office.Excel.WebUI.ExcelWebRenderer')
}

# Script start
# -----------------------------


#[Microsoft.SharePoint.SPSecurity]::RunWithElevatedPrivileges({

    Write-Host "Script started at" (Get-Date).ToString()

    LoadDependencies

    if ($runForText -or $runForExcelWebParts -or $runForExcelXLSX -or $downloadXLS -or $uploadConvertedXLSFiles -or $deleteOldXLSFileAfterUpload) {
        foreach($webAppUrl in $webAppUrls)
        {                        
            $SPWebApp = Get-SPWebApplication $webAppUrl

            Write-Host "Web Application: " $SPWebApp.Url -ForegroundColor Cyan

            foreach ($siteCollection in $SPWebApp.Sites ) {
                 
                # If you would like to limit on certain site/mysite                      
                #if ($siteCollection.Url -ilike "*personal/U123456") {

                    Write-Host "Site Collection: " $siteCollection.Url -ForegroundColor Cyan
        
                    foreach($web in $siteCollection.AllWebs) {
                        Write-Host "Web: " $web.Url -ForegroundColor White
                        LogItem "MySite:" "N/A" "N/A" $web.Title $web.Url
                        GetFilesFromSite $web    
                        $web.Dispose()
                    }        
                #}

                $siteCollection.Dispose()
            }            
        }
    }
    else {
        Write-Host "No variable configured to run action against. Check if statement in script." -ForegroundColor Cyan
    }

    if ($runForSSRS) {
        UpdateReportDataSources
    }

    
    Write-Host "Writing log file: " $logFileName
    $itemList | Select-Object webAppUrl, SiteUrl, DocumentLibrary, Url, Comment | Export-Csv $logFileName -NoType

    # release the array since it will be quite large
    $itemList = $null

    Write-Host "Script finished at" (Get-Date).ToString()
#});


Upload large files to Azure file share

This tutorial will describe how to upload large files to an Azure file share using azCopy and PowerShell. The first time I tried to upload large files to Azure I was a bit optimistic and tried to copy and paste my files directly through Remote Desktop. My files were database files with around 200GB of data, and my internet connection had an upload speed of about 5Mbit/s. It did not work out very well and I had issues with timed out/aborted file transfers. Then I tried to use plain PowerShell and Set-AzureStorageBlobContent to upload my files to an Azure file share, however it still timed out or got interrupted.

To solve this issue I used an Azure file share, PowerShell and azCopy to do the task. azCopy is a command line utility that lets you copy data to Azure Blob, File, and Table storage. Recent versions also support resuming failed and interrupted uploads.

This tutorial assumes you already have a working Azure subscription and that you have configured Azure for use with PowerShell:

https://azure.microsoft.com/en-us/documentation/articles/powershell-install-configure/

Before we can upload any files to Azure we will need to store them somewhere so go ahead and create an Azure file share.

Create Azure file share

azure create storage account

Do not choose any premium storage since it does not work as a file share. Choose a standard storage type. (Standard-RAGRS for example):

azure create storage account choose type

The file share can be used as a network drive on your virtual machine(s). When your file share is created select it and click connect to get the command to mount it as a network drive:

azure create storage account file share connect

If you click properties you will see the URL of the file share:

azure file share url

In order to connect to the file share you will need the access key which can be found on the main page of the storage account under access keys:

azure create storage account find keys

Keep the access key and the URL. They will be needed when we run our PowerShell script and azCopy later on.
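
If you prefer to create the share from PowerShell instead of the portal, the classic (service management) Azure PowerShell module can retrieve the key and create the file share. This is only a minimal sketch, assuming the classic module is installed, you are logged in with Add-AzureAccount, and using the same account and share names as in the script further down:

# Retrieve the primary access key for the storage account
$accountName = "azcopydemoclassic"
$accountKey  = (Get-AzureStorageKey -StorageAccountName $accountName).Primary

# Create a storage context and the file share ("myfiles")
$ctx = New-AzureStorageContext -StorageAccountName $accountName -StorageAccountKey $accountKey
New-AzureStorageShare -Name "myfiles" -Context $ctx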

Install azCopy

Download and install:

http://aka.ms/downloadazcopy

Run azCopy PowerShell script

azCopy will by default install to C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy. When transferring files azCopy keeps a journal in %LocalAppData%\Microsoft\Azure\AzCopy which can be used to resume failed file transfers.

Modify the script below to fit your needs:

$azCopyPath = "C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy"
Set-Location $azCopyPath

$accountName = "azcopydemoclassic"
$accountKey = "ABC123-YOUR-KEY-GOES-HERE"
$dest = "https://azcopydemoclassic.file.core.windows.net/myfiles/"

$filesToTransfer = 
"C:\Projects\Azure\test1.zip",
"C:\Projects\Azure\test2.zip";

foreach ($file in $filesToTransfer) {	
	$fileItem = Get-ChildItem $file
	$path = $fileItem.DirectoryName + "\"
	$fileName = $fileItem.Name
	Write-Host "Path: " $path -ForegroundColor Yellow
	Write-Host "Filename: " $fileName -ForegroundColor Yellow
	
	# Initial transfer attempt
	$transferResult = .\AzCopy.exe /Source:$path /Pattern:$fileName /Dest:$dest /destkey:$accountKey /Y	
	$transferResult
	
	# The script assumes the 6th line of the AzCopy summary output contains the failed-transfer count
	[int]$f = $transferResult[5].Split(":")[1].Trim()
	$azJournal = "$env:LocalAppData\Microsoft\Azure\AzCopy"
	$i = 1
	# Resume from the AzCopy journal until no transfers are reported as failed
	while ($f -gt 0) {
		$i++
		$transferResult = .\AzCopy.exe /Z:$azJournal /Y
		$transferResult
		[int]$f = $transferResult[5].Split(":")[1].Trim()
		Write-Host "Resume attempt: " $i
	}	
}

Download script (azCopyToFileShare.ps1)

Azure IO Performance

Cloud services are getting more and more mature. Before it was only buzz, but now everyone wants to get on the train: flexibility, availability, pricing. So you are thinking about moving that database, data warehouse or those cubes to the cloud? On the Azure pricing list you can clearly see the number of CPUs, memory, storage etc. But what about the underlying disk system? This can be a rather important factor to look up if you have disk intensive ETL (Extract Transform Load) processes or other disk intensive tasks that you intend to move to Azure. How is Azure IO performance? What is the difference between the tiers and the A, D and G series? I got a little curious and did some quick tests with HDTune on the following virtual machine configurations:

  • Basic Tier, A2 (2 cores, 3,5gb ram, North Europe)
  • Standard Tier, A2 (2 cores, 3,5gb ram, Central US)
  • Standard Tier, D2 (2 cores, 7gb ram, Central US)
  • Standard Tier, G2 (4 cores, 56gb ram, East US 2)

All are virtual machines (VMs) with the default Windows 2012 R2 Datacenter image. Please note that there are heaps of configurations and scenarios to consider for services in Azure. The intention of this post is not to be in any way complete coverage of that – just a quick test of disk performance in Azure VMs.

Contents

Basic Tier, A2 (2 cores, 3,5gb ram, North Europe)

Azure Basic Tier A2

Standard Tier, A2 (2 cores, 3,5gb ram, Central US)

Azure Standard Tier A2

Standard Tier, D2 (2 cores, 7gb ram, Central US)

Azure Standard Tier D2

Standard Tier, G2 (4 cores, 56gb ram, East US 2)

Azure Standard Tier G2

My laptop

For fun I will include my laptop, a Macbook Pro Core i5-3210m, 16gb, Samsung SSD 840 EVO 500 GB.

My Laptop IO

Conclusion

It seems as if my laptop still outperforms most of the Azure VM options. However, cloud services are constantly improving, so we are likely to see faster alternatives in the near future. Also note that the test might not be completely accurate since I only ran the tests once on each VM. Still, when comparing the available services it is important not to look only at the price and the more common specs like CPU, RAM and disk size. For more information on Azure subscription limits you may want to have a look here:

https://azure.microsoft.com/sv-se/documentation/articles/azure-subscription-service-limits/

Here you can see that the target IOPS for Azure is 300 for Basic Tier and 500 for Standard Tier:

Max 8 KB IOPS per persistent disk (Basic Tier virtual machine): 300
Max 8 KB IOPS per persistent disk (Standard Tier virtual machine): 500
Total Request Rate (assuming 1KB object size) per storage account: Up to 20,000 IOPS, entities per second, or messages per second
Target Throughput for Single Blob: Up to 60 MB per second, or up to 500 requests per second
Target Throughput for Single Queue (1 KB messages): Up to 2,000 messages per second
Target Throughput for Single Table Partition (1 KB entities): Up to 2,000 entities per second
Target Throughput for Single File Share (Preview): Up to 60 MB per second

You may also configure multiple disks and stripe them in order to improve performance, as described here:

http://blogs.technet.com/b/canitpro/archive/2014/11/18/azure-and-disk-iops-and-virtual-machines-in-iaas.aspx
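
As a rough illustration of the striping approach described in the link above, something along these lines could be run inside the VM using Storage Spaces on Windows Server 2012 R2. This is only a sketch – the pool and disk names are made up and it assumes the extra data disks are already attached to the VM and empty:

# Pool all poolable data disks and create a simple (striped) virtual disk across them
$disks = Get-PhysicalDisk -CanPool $true
$subSystem = Get-StorageSubSystem | Select-Object -First 1
New-StoragePool -FriendlyName "DataPool" -StorageSubSystemFriendlyName $subSystem.FriendlyName -PhysicalDisks $disks
New-VirtualDisk -StoragePoolFriendlyName "DataPool" -FriendlyName "StripedDisk" -ResiliencySettingName Simple -NumberOfColumns $disks.Count -UseMaximumSize

# Initialize, partition and format the new striped disk
Get-VirtualDisk -FriendlyName "StripedDisk" | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel "Data"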

For data warehouse solutions the recently launched SQL Data Warehouse service (“elastic data warehouse-as-a-service”) might be of interest:

http://azure.microsoft.com/en-us/services/sql-data-warehouse/

Power BI refresh errors

This article applies when you use Power BI with the Data Management Gateway. Sometimes reports that work locally on my machine give Power BI refresh errors when they are scheduled in Power BI/Office 365. There seems to be quite a difference between what works locally and what works in the cloud. Unfortunately, far too often the error that you receive on the Power BI refresh status page is not very descriptive. The errors that I describe in this post occurred when using Data Management Gateway version 1.2.

General guidelines

Please see my previous post: (Power BI Limitations and Considerations)

http://www.widriksson.com/power-bi-limitations-considerations/

In general, make sure that your data sources are stated as supported by Microsoft. Do not write any SQL statements in the sources when defining them in Excel, and do not use any “fancy” data types; try to keep to as standardized types as possible. A best practice is to cleanse your data as close to the source as possible. Also avoid views with complicated queries that might cause the refresh to time out. Currently the timeout limit for the data management gateway is 30 minutes. See here for some more details on SQL statements and unsupported data types:

http://www.widriksson.com/power-bi-limitations-considerations/#Not_supported_to_use_SQL_statements_as_source_in_PowerQuery

and

http://www.widriksson.com/power-bi-limitations-considerations/#Unsupported_data_types

Errors in the high-level relational engine

errors in the high relational engine - power bi

Errors in the high-level relational engine. The following exception occurred while the managed IDataReader interface was being used: ‘Type=System.ArgumentException,Message=The value of prefixLength argument must be 1, 2, or 4. Parameter name: prefixLength,Source=Microsoft.DataTransfer.Common,’;transfer service job status is invalid Response status code does not indicate success: 400 (Bad Request).. The current operation was cancelled because another operation in the transaction failed.

In my case this occurred when I used a view as data source in Teradata. The view used LEFT JOIN to merge multiple tables together.

CREATE VIEW Example AS
SELECT
  MyTable1.Name,
  MyTable2.CodeDesc
FROM
  MyTable1
LEFT JOIN MyTable2
ON MyTable1.Code = MyTable2.Code

This worked perfectly fine when I refreshed the data of my report locally, however not when it was uploaded and scheduled for refresh within Power BI in Office 365. At first I thought the issue was that the LEFT JOIN caused null values, so I changed the view:

CREATE VIEW Example AS
SELECT
  MyTable1.Name,
  COALESCE(MyTable2.CodeDesc, 'Unknown') AS CodeDesc
FROM
  MyTable1
LEFT JOIN MyTable2
ON MyTable1.Code = MyTable2.Code

This did not work either however the error message became slightly different:

error in the blob storage - power bi

Errors in the high-level relational engine. The following exception occurred while the managed IDataReader interface was being used: Unexpected data read from blob.. The current operation was cancelled because another operation in the transaction failed.

The problem seems to be that Power BI is not able to tell the data type of the MyTable2.CodeDesc column in the view. In my case CodeDesc was a varchar(50) in MyTable2, so not any special kind of data type, and I was also confused since it worked properly when refreshed locally on my computer. When I changed the view to explicitly cast CodeDesc to varchar(50) it worked and refreshed successfully:

CREATE VIEW Example AS
SELECT
  MyTable1.Name,
  CAST(MyTable2.CodeDesc AS VARCHAR(50)) AS CodeDesc
FROM
  MyTable1
LEFT JOIN MyTable2
ON MyTable1.Code = MyTable2.Code

ExcelServices operation ended with error

excel services operation ended with error - power bi

ExcelServices operation ended with error: Correlation ID: <GUID>

When this error occurred I had an Excel spreadsheet with a standard Excel PivotTable linked to a PowerPivot model. From the standard PivotTable I created an Excel table that I added to Power Pivot (as a linked table). Then I added this table from Power Pivot into a new Power View report. This worked fine on my local machine but not when trying to schedule the report for refresh in Power BI in Office 365, as seen in the screenshot above.

Access Office 365 list with Power BI

This post will describe how you can access an Office 365 list with Power BI using the Office 365/SharePoint REST API. This can be very useful in scenarios where we have forms that store data in lists, or other information in our Office 365 environment that we would like to report on. When I first tried to access an Office 365 list with Power BI using the REST API I was somewhat confused about which type of connection I should use in Excel/Power BI. In this guide I will show some of the pitfalls of trying out the different options as well as the solution that works with Power BI and scheduled data refresh.

If you only want the solution, scroll to the end of the page.

Contents

Different sources in Excel to import data

Within Excel there are three different places where you could use OData/web services to import data:

In PowerQuery:

Here we have two options, one to import from a SharePoint list and one to import from an OData feed:

import office 365 sharepoint list power query

In PowerPivot:

Import OData feed webservice Power Pivot

In the data tab of Excel:

Import office 365 sharepoint list excel data tab

Office 365 REST Service URL

To make it a bit more confusing there are also different approaches to accessing list data within SharePoint 2013 and the Office 365 REST services. One is the newer SharePoint 2013 REST service, the other is ListData.svc, which is still there for backward compatibility with SharePoint 2010.

SharePoint 2013 REST service url

http://server/site/_api/lists/getbytitle('listname')/items

Note the /items at the end, which tells the service that we want the list items and not just the list itself. Also, you might want to change the URL to use HTTPS.
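
If you only need a subset of the list, standard OData query options can be appended to the same URL. For example (the column names below are just hypothetical, shown to illustrate the syntax):

http://server/site/_api/lists/getbytitle('listname')/items?$select=Title,Modified&$top=500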

More information:

https://msdn.microsoft.com/en-us/library/office/fp142380.aspx

SharePoint 2010 REST service url

http://server/site/_vti_bin/ListData.svc

More information:

https://msdn.microsoft.com/en-us/library/office/hh134614(v=office.14).aspx

Which method work?

PowerQuery with from SharePoint list – No

What I guess would be natural to most people is to use “From SharePoint List” in Power Query (which Microsoft also heavily promotes for use with Power BI). This was also what I tried first:

powerquery sharepoint list error

It does not work.

Note: It might be possible to get this working if you fiddle around with the security settings under the advanced settings of the connection. However, this is not something we go into here. We want something easy.

PowerQuery with OData Feed – No

When you create the connection you may use either the SharePoint 2010 or the SharePoint 2013 REST API (both are available in Office 365). After you have entered the correct URL (and made sure to use your organisational account to log in when you get the configuration dialog) the data will load nicely into Power Query. So far so good, you thought! When you upload it to the Power BI site, enable the report and schedule it for refresh you will get this:

onpremise error the data source for this connection error

And if you ignore the error and still try to refresh it you will get this:

powerquery odata feed error

The solution – OData Feed from Excel Data Tab

Use the Office 365/SharePoint 2013 REST API and change the URL/listname to match your environment:

http://server/site/_api/lists/getbytitle('listname')/items

Note: It will work with the older SharePoint 2010 ListData.svc REST service as well. However, use this one since it is the newer approach and more future proof.

Step 1

Import office 365 sharepoint list excel data tab

Step 2

data tab excel odata feed connection

Step 3

data tab excel odata feed connection

Step 4

data tab excel odata feed connection

Step 5

Define your PowerPivot/transform your data in Excel and create your report.

Step 6

Upload the Report to Office 365 and enable it in Power BI.

Step 7

data tab excel odata feed connection

Enable the report for scheduled refresh and configure date/time.

Note that you will get the error message as seen in screenshot above when you test the connection – IGNORE IT!

Step 8

data tab excel odata feed connection

Save and refresh the report! Success, we have now connected an Office 365 list with Power BI! It is a bit ridiculous that it is not more straightforward to achieve this without error messages etc… Let’s hope Microsoft improves things like this in the next release of Power BI.

Power BI limitations & considerations

Please note that this post is regarding Power BI for Office 365 (the old experience), which will be deprecated by March 31, 2016. Although some content might still be relevant.

This is intended as a technical introduction to Power BI limitations & considerations which hopefully will allow you to avoid the most common implementation pitfalls. Please take into consideration that Power BI is in fast-paced development and that features and functionality might change in the near future.

Contents

Required Services

The current version of Power BI (2015-03-08) is available at two subscription levels: either as an add-on to an existing E3/E4 Office 365 subscription, or as a stand-alone subscription which includes Office 365 as well.

Read more about Power BI licensing options and pricing here:
https://www.powerbi.com/dashboards/pricing

Difference between Office 365 and SharePoint online?

You may wonder what the difference is between Power BI and Office 365/SharePoint Online (Office 365 includes a SharePoint Online subscription plus additional features). At first there might not seem to be that much of a difference except the more visual and interactive Power BI site, however there are a few important differences:

Feature | Office 365/SharePoint Online | Power BI
Access on-premise data | No | Yes
Max Excel Services workbook size (if the file is larger the user may still download it and use it in the desktop version of Excel) | 10mb | 250mb (250mb for Excel file content and 250mb for the PowerPivot Data Model)
Scheduled Data Refresh | No | Yes
Power BI Q&A (ask questions with “natural language” and get answers on the fly; works but is still somewhat limited in many business cases and also requires well named and structured data) | No | Yes
Access through Windows Store app (works on Windows 8 based tablets and PC’s) | No | Yes

More on differences between Power BI and Office 365/SharePoint online:

https://technet.microsoft.com/en-us/library/dn198235.aspx

https://support.office.com/en-sg/article/File-size-limits-for-workbooks-in-SharePoint-Online-9e5bc6f8-018f-415a-b890-5452687b325e?ui=en-US&rs=en-SG&ad=SG

Required Software

  • Office 2010 Professional
  • Office 2013 Professional
  • Office 365 Pro Plus
    • Includes additional features such as adding synonyms to your PowerPivot data model for discovery in Power BI Q&A

You will also need to ensure that you have the following add-ins installed and enabled in Excel to be able to use all features of Power BI:

  • PowerPivot (required by PowerView and used as the data & analytics engine within Excel)
  • PowerView (required to build highly interactive and visual reports/dashboards)
  • PowerQuery (required for import of certain data sources, see next section)
  • PowerMap (only for desktop use in Excel at the moment)

To view the currently enabled add-ins in Excel click File -> Options -> Add-Ins -> choose Manage: COM Add-ins -> Go…

Power BI Enable Plugins Excel

For Office 2010 you will need to download and install all plugins above from Microsoft web site.

By default only PowerPivot and PowerView are installed with Excel 2013, but they are NOT enabled. If you have Office 2013 or Office 365 Pro Plus, PowerMap might also be installed but it also needs to be enabled.

To install PowerQuery for Office 2013 go to the following link:

http://www.microsoft.com/en-us/download/details.aspx?id=39379

Power BI Data Management Gateway

To source on-premise data to the cloud a Data Management Gateway (DMG) is required. This is an intermediate piece of software that runs on a server inside your corporate network and sources data from internal resources to the Power BI cloud.

The DMG will copy your on-premise data to the cloud. The data is always stored within your Excel documents. When using PowerPivot models there is a “hidden” tabular engine in the cloud that will use the data from the uploaded and enabled Excel files.

PowerBI Data Management Gateway Architecture

You can read more about how to install the data management gateway and configure data sources here:

https://support.office.com/en-nz/article/Power-BI-for-Office-365-Admin-Center-Help-5e391ecb-500c-47a3-bd0f-a6173b541044?ui=en-US&rs=en-NZ&ad=NZ

Important things to consider when installing the data management gateway

  • The installation account and the service account that runs the data management gateway service require full administrative access to the server
  • The Windows firewall must be enabled for installation and uninstallation. Yes, this is stupid, but this is the way it is. (DMG versions 1.2 and 1.4)
  • Power BI Data Management Gateway TCP ports (see the example firewall rule below):
    Outgoing TCP ports: 9350-9354 (falls back to ports 443/80 on failure)
    OData feeds require incoming ports 8050 or 8051 to be open by default

Although the installation and configuration might be straightforward, many enterprise organisations have security rules/processes that affect the requirements above.
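
If the incoming OData ports listed above need to be opened on the gateway server, a Windows Firewall rule along the following lines could be used. This is only a sketch – the rule name is made up and your organisation may require the change to go through its own firewall process:

New-NetFirewallRule -DisplayName "Power BI DMG OData feeds" -Direction Inbound -Protocol TCP -LocalPort 8050,8051 -Action Allow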

Supported data sources & scheduled data refresh

In Power BI you have the ability to schedule data sources for automatic refresh on a daily or weekly basis. At the time of writing scheduled data refresh is supported only for the following data sources:

Data Source Location Auto-Refresh Data Management Gateway Required
SQL Azure Database Azure Yes No
Azure VM running SQL Server Azure Yes Yes/No[1]
SQL Server (2005 and later) On-Premises Yes Yes
Oracle (10g and later) On-Premises Yes Yes
OData feed[2] Web/On-Premises/Azure Yes No
Power Query[3][4] On-Premises/Azure Yes Yes

[1] An Azure VM may be configured with access to SQL Server by using a public IP and adding an exception in the firewall/network configuration. However, when it is configured on a virtual network with a private IP address the data management gateway is required. For a production environment, use the data management gateway with a secure connection.

[2] Note that no gateway is required for an OData feed. However, if you connect to an OData feed through PowerQuery, a data management gateway is required. Hence, if you would like to access an OData feed on the internet or on your Office 365 site, do NOT use PowerQuery.

[3] All data sources in PowerQuery must be hosted on the same data management gateway.

[4] Read here for more details on supported Power Query data sources: https://support.office.com/en-au/article/Supported-Data-Sources-and-Data-Types-cb69a30a-2225-451f-a9d0-59d24419782e

To make things even more confusing there are three places in Excel to import data:

  • Data Tab
    power bi import data sources data tab
  • Within PowerPivot
    power bi import data sources powerpivot
  • PowerQuery tab
    power bi import data sources power query tab

As long as you stick to the supported data sources you should not have any problems using any of the three different methods. Power BI will use the connection string that you find under Data tab -> Connections.

Before going ahead. Carefully plan which data sources you intend to use and DO READ the information in the following links.

Supported data sources for scheduled data refresh:

https://support.office.com/en-au/article/Supported-Data-Sources-and-Data-Types-cb69a30a-2225-451f-a9d0-59d24419782e

Power Query pre-requisites:

https://support.office.com/en-us/article/Data-source-prerequisites-6062cf52-c764-45d0-a1c6-fbf8fc05b05a?CorrelationId=c378d239-021b-4a58-8711-d4058d315f69&ui=en-US&rs=en-US&ad=US

Power BI max workbook size

Beware that Power BI currently does not allow files larger than 250mb. More precisely, you are allowed 250mb of workbook content and 250mb of PowerPivot data in your workbooks.
Power BI 250mb workbook size limit

Tip: To avoid building unnecessarily large workbooks, load data directly into Power Pivot without saving it to an actual Excel sheet first. In Power Query, right click the connection, choose Load To… and then choose to only load the data to the data model:
powerquery load data to model

Power BI data source limitations and issues

Not able to change data source type in Power BI report

Keep in mind that once you have chosen a data source there is no support for changing to another data source type without rebuilding the connection, which will also force you to rebuild the report and data models. For example, if you have created a SQL connection to a Microsoft SQL Server instance using the data tab in Excel, you are not able to easily switch this to a PowerQuery Teradata connection.

Not supported to use SQL statements as source in PowerQuery

When building your query use the application interface (navigation) to select a table or view. Writing custom SQL statements is currently not supported.

power query navigation

Use Power Query navigation when creating a connection

Error message received when trying to create a PowerQuery connection that uses a SQL statement as source definition:

“Unsupported data source types detected”

unsupported data source types detected

Unsupported data types

Also be careful with data types that are not supported, for example hour-to-minute and other time related data types in Teradata. Read more here:

https://support.office.com/en-au/article/Supported-Data-Sources-and-Data-Types-cb69a30a-2225-451f-a9d0-59d24419782e

Error message received when trying to refresh a report with unsupported data source types:

“The following system error occurred: Type mismatch. Errors in the back-end database access module. OLE DB was unable to convert a value to the data type requested for column ‘XX’. The current operation was cancelled because another operation in the transaction failed.”

power bi data refresh type mismatch

Fields in Power View reports are not updated when changed in PowerPivot

When you modify a field in PowerPivot, e.g. change a field name, the change does not propagate to your Power View reports. Instead the field gets removed from the reports. This means that for any name changes you do within PowerPivot you have to re-add those fields to your reports.

Not able to update fields in PowerPivot when using PowerQuery

If you make changes to a source table or view, those fields are detected and added to your PowerQuery connection when it is refreshed from within PowerQuery. However, this does NOT propagate properly to PowerPivot. Hence, in order to get your added or updated field into PowerPivot you will need to remove your PowerPivot table and re-add it. This implies that all your changes for that table in PowerPivot have to be re-developed. It also forces you to re-add all affected fields to your Power View reports (as described in the previous section). This is massively annoying, however Microsoft states that they are aware of it and are currently working on a fix.

Note: If you try to edit the data source in Power Pivot you will get the following error message
“The connection you’re trying to edit was created with Power Query. To change this connection, use Power Query”. 

to change this connection use power query

Scheduled data refresh max time span

In the current version of Power BI you are only able to schedule data refresh 3 months into the future. Hence you will need to manually update the schedule after 3 months, which is rather inconvenient. Note that the little forward button is grayed out in the date selector of the image below, which is a screenshot from the data refresh page in Power BI. There is also no way to trick the date selector by using JavaScript or injecting any other value into the date text-box (it will fail to save the schedule).

Power BI maximum month scheduled data refresh

Not able to cancel a scheduled data refresh

When you have scheduled a data refresh you are not able to turn it off by default. Or rather, there is a switch to turn it off, but a bug prevents the save button from showing, so you are not able to save your changes. Resolution: use a document object model inspector or another browser developer tool to enable the disabled save button and you will be able to save your changes.

Not able to disable a report once enabled in Power BI

Currently there does not seem to be any function to disable a report once it has been enabled. The only option is to delete the report and upload it again:

power bi not possible disable report

Platform support

Usually you would visualize Power BI reports with Power View, which is currently supported on the following platforms and browsers:

Windows 8 with:
• Internet Explorer 10+ desktop
• Internet Explorer 10+ modern UI

Windows 8.1 with:
• Internet Explorer 11 desktop
• Internet Explorer 11 modern UI

Surface Pro with:
• Internet Explorer 10+ desktop

Windows 7 with:
• Internet Explorer 10+

Note: There is an HTML5 version of Power View available. This version could theoretically work on any device with an HTML5 compatible browser. However, it currently has at least the following limitations and bugs:

  • On an iPad (any version) with Safari the browser crashes when zooming (pinch-zooming on the touch screen) within the browser window. Using an alternative browser such as Google Chrome makes the experience better.
  • The bubble chart diagram is not able to play using the play axis.
  • Text-based filters lack search boxes, which makes it difficult for users to filter on larger data sets. The user has to scroll a lot to find the desired item to filter on, and the filter list will not show all items if the data set is too large.
  • Text and design of Power View reports may differ from the Silverlight version, hence you might need to adjust the design to make it work well in HTML5 mode.

Future & opportunities – Power BI preview

Microsoft recently released the preview version of Power BI which will add a number of interesting features which among others are:

  • SSAS Connector
    You will no longer be limited to copying local data to the cloud but will be able to query an on-premise SSAS server directly through the SSAS Connector, which works similarly to the DMG but queries data live.
  • App for IOS and Android
  • Power BI designer
    New application to create reports and dashboards without Excel
  • New chart components (speed meters etc…)
  • Interactive dashboards linking to underlying reports

You might expect that all previous features of Power BI would be implemented in the new version, however that does not currently seem to be the case. For example, I have noticed the following with the Power BI preview:

  • The right filter pane of Power View reports is not available in the new iPad app.
  • Teradata data sources through Power Query are not supported for scheduled data refresh
  • There are probably other limitations as well. I have asked Microsoft whether the filter pane and Teradata data sources will be supported in the next general availability release of Power BI but have not received any response yet.

Checkout more on:

https://www.powerbi.com/dashboards

Raspberry PI Hadoop Cluster

hadoop pi boards

If you like Raspberry Pi’s and would like to get into distributed computing and Big Data processing, what could be better than creating your own Raspberry Pi Hadoop cluster?

The tutorial does not assume that you have any previous knowledge of Hadoop. Hadoop is a framework for storage and processing of large amounts of data – or “Big Data”, which is a pretty common buzzword these days. The performance of running Hadoop on a Raspberry Pi is probably terrible, but I hope to be able to build a small and fully functional little cluster to see how it works and performs.

For  a tutorial on Hadoop 2 please see my newer post:
http://www.widriksson.com/raspberry-pi-2-hadoop-2-cluster/

In this tutorial we start with one Raspberry Pi and then add two more once we have a working single node. We will also do some simple performance tests to compare the impact of adding more nodes to the cluster. Lastly, we try to improve and optimize Hadoop for the Raspberry Pi cluster.

Contents

Fundamentals of Hadoop

What is Hadoop?

“The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.”

– http://hadoop.apache.org/

Components of Hadoop

Hadoop is built up of a number of components and open source frameworks, which makes it quite flexible and modular. However, before diving deeper into Hadoop it is easier to view it as two main parts – data storage (HDFS) and data processing (MapReduce):

  • HDFS – Hadoop Distributed File System
    The Hadoop Distributed File System (HDFS) was designed to run on low cost hardware and is highly fault tolerant. Files are split up into blocks that are replicated to the DataNodes. By default blocks have a size of 64MB and are replicated to 3 nodes in the cluster. However, those settings can be adjusted to specific needs.
    Overview of the HDFS file system architecture:
    Hadoop HDFS
  • MapReduce
    MapReduce is a software framework written in Java that is used to create applications that can process large amounts of data. Although it is written in Java there are other languages available for writing MapReduce applications. As with HDFS it is built to be fault tolerant and to work in large-scale cluster environments. The framework has the ability to split up input data into smaller tasks (map tasks) that can be executed in parallel. The output from the map tasks is then reduced (reduce task) and usually saved to the file system.

    Below you will see the MapReduce flow of the WordCount sample program that we will use later. WordCount takes a text file as input, divides it into smaller parts, counts each word and outputs a file with a count of all words within the file.

    MapReduce flow overview (WordCount example):
    Hadoop MapReduce WordCount

Daemons/services

Daemon/service Description
NameNode Runs on a Master node. Manages the HDFS file system on the cluster.
Secondary NameNode Very misleading name. It is NOT a backup for the NameNode. It makes periodic checkpoints/updates so that if the NameNode fails it can be restarted without the need to restart the data nodes. – http://wiki.apache.org/hadoop/FAQ#What_is_the_purpose_of_the_secondary_name-node.3F
JobTracker Manages MapReduce jobs and distributes them to the nodes in the cluster.
DataNode Runs on a slave node. Act as HDFS file storage.
TaskTracker Runs MapReduce jobs which are received from the JobTracker.

Master and Slaves

  • Master
    Is the node in the cluster that has the namenode and jobtracker. In this tutorial we will also configure our master node to act as both master and slave.
  • Slave
    Node in the cluster that act as a DataNode and TaskTracker.

Note: When a node is running a job the TaskTracker will try to use local data (in its “own” DataNode) if possible. Hence the benefit of having both the DataNode and TaskTracker on the same node, since there will be no network overhead. This also implies that it is important to know how data is distributed and stored in HDFS.

Start/stop scripts

Script Description
start-dfs.sh Starts NameNode, Secondary NameNode and DataNode(s)
stop-dfs.sh Stops NameNode, Secondary NameNode and DataNode(s)
start-mapred.sh Starts JobTracker and TaskTracker(s)
stop-mapred.sh Stops JobTracker and TaskTracker(s)

The above scripts should be executed from the NameNode. Through SSH connections daemons will be started on all the nodes in the cluster (all nodes defined in conf/slaves)

Configuration files

Configuration file Description
conf/core-site.xml General site settings such as location of NameNode and JobTracker
conf/hdfs-site.xml Settings for HDFS file system
conf/mapred-site.xml Settings for MapReduce daemons and jobs
conf/hadoop-env.sh Environment configuration settings. Java, SSH and others
conf/master Defines master node
conf/slaves Defines computing nodes in the cluster (slaves). On a slave this file has the default value of localhost

Web Interface (default ports)

Status and information for the Hadoop daemons can be viewed in a web browser through each daemon’s web interface:

Daemon/service Port
NameNode 50070
Secondary NameNode 50090
JobTracker 50030
DataNode(s) 50075
TaskTracker(s) 50060

hadoop cluster in a shoebox

The setup

  • Three Raspberry PI’s model B
    (Or you could do with one if you only do first part of tutorial)
  • Three 8GB class 10 SD cards
  • An old PC Power Supply
  • An old 10/100 router used as network switch
  • Shoebox from my latest SPD bicycle shoes
  • Raspbian Wheezy 2014-09-09
  • Hadoop 1.2.1

Name  | IP            | Hadoop roles
node1 | 192.168.0.110 | NameNode, Secondary NameNode, JobTracker, DataNode, TaskTracker
node2 | 192.168.0.111 | DataNode, TaskTracker
node3 | 192.168.0.112 | DataNode, TaskTracker

Ensure that you adjust the names and IP numbers to fit your environment.

Single Node Setup

Install Raspbian

Download Raspbian from:
http://downloads.raspberrypi.org/raspbian_latest

For instructions on how to write the image to an SD card and download SD card flashing program please see:
http://www.raspberrypi.org/documentation/installation/installing-images/README.md

For more detailed instructions on how to setup the Pi see:
http://elinux.org/RPi_Hub

Write 2014-09-09-wheezy-raspbian.img to your SD card. Insert the card to your Pi, connect keyboard, screen and network and power it up.

Go through the setup and ensure the following configuration or adjust it to your choice:

  • Expand SD card
  • Set password
  • Choose console login
  • Choose keyboard layout and locales
  • Overclocking, High, 900MHz CPU, 250MHz Core, 450MHz SDRAM (If you do any voltmodding ensure you have a good power supply for the PI)
  • Under advanced options:
    • Hostname: node1
    • Memory split: 16mb
    • Enable SSH Server

Restart the PI.

Configure Network

Install a text editor of your choice and edit as root or with sudo:
/etc/network/interfaces

iface eth0 inet static
address 192.168.0.110
netmask 255.255.255.0
gateway 192.168.0.1

Edit /etc/resolv.conf and ensure your nameservers (DNS) are configured properly.

Restart the PI.

Configure Java Environment

With the image 2014-09-09-wheezy-raspbian.img Java comes pre-installed. Verify by typing:

java -version

java version "1.8.0"
Java(TM) SE Runtime Environment (build 1.8.0-b132)
Java HotSpot(TM) Client VM (build 25.0-b70, mixed mode)

Prepare Hadoop User Account and Group

sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo

Configure SSH

Create an SSH RSA key pair with a blank password so that the Hadoop nodes can talk to each other without prompting for a password.

su hduser
mkdir ~/.ssh
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys

Verify that hduser can login to SSH

su hduser
ssh localhost

Go back to previous shell (pi/root).

Install Hadoop

Download and install

cd ~/
wget http://apache.mirrors.spacedump.net/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
sudo mkdir /opt
sudo tar -xvzf hadoop-1.2.1.tar.gz -C /opt/
cd /opt
sudo mv hadoop-1.2.1 hadoop
sudo chown -R hduser:hadoop hadoop

Configure Environment Variables

This configuration assumes that you are using the pre-installed version of Java in 2014-09-09-wheezy-raspbian.img.

Add hadoop to environment variables by adding the following lines to the end of /etc/bash.bashrc:

export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
export HADOOP_INSTALL=/opt/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin

Alternatively, you can add the configuration above to ~/.bashrc in the home directory of hduser.

Exit and reopen the hduser shell to verify that the hadoop executable is accessible outside the /opt/hadoop/bin folder:

exit
su hduser
hadoop version

hduser@node1 /home/hduser $ hadoop version
Hadoop 1.2.1
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152
Compiled by mattf on Mon Jul 22 15:23:09 PDT 2013
From source with checksum 6923c86528809c4e7e6f493b6b413a9a
This command was run using /opt/hadoop/hadoop-core-1.2.1.jar

Configure Hadoop environment variables

As root/sudo edit /opt/hadoop/conf/hadoop-env.sh, uncomment and change the following lines:

# The java implementation to use. Required.
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=250

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS -client"

Note 1: If you forget to add the -client option to HADOOP_DATANODE_OPTS you will get the following error message in hadoop-hduser-datanode-node1.out:

Error occurred during initialization of VM
Server VM is only supported on ARMv7+ VFP

Note 2: If you run SSH on a different port than 22 then you need to change the following parameter:

# Extra ssh options. Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"
export HADOOP_SSH_OPTS="-p <YOUR_PORT>"

Or you will get the error:

connect to host localhost port 22: Address family not supported by protocol

Configure Hadoop

In /opt/hadoop/conf edit the following configuration files:

core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hdfs/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>

mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Create HDFS file system

sudo mkdir -p /hdfs/tmp
sudo chown hduser:hadoop /hdfs/tmp
sudo chmod 750 /hdfs/tmp
hadoop namenode -format

Start services

Login as hduser. Run:

/opt/hadoop/bin/start-dfs.sh
/opt/hadoop/bin/start-mapred.sh

Run the jps command to check that all services started as expected:

jps

16640 JobTracker
16832 Jps
16307 NameNode
16550 SecondaryNameNode
16761 TaskTracker
16426 DataNode

If you cannot see all of the processes above review the log files in /opt/hadoop/logs to find the source of the problem.

Run sample test

Upload a sample file to HDFS (feel free to grab any other text file you like instead of LICENSE.txt):

hadoop dfs -copyFromLocal /opt/hadoop/LICENSE.txt /license.txt

Run wordcount example:

hadoop jar /opt/hadoop/hadoop-examples-1.2.1.jar wordcount /license.txt /license-out.txt

When completed you will see some statistics about the job. If you would like to see the output file, copy it from HDFS to the local file system:

hadoop dfs -copyToLocal /license-out.txt ~/

Open the ~/license-out.txt/part-r-00000 file in any text editor to see the result. (You should see all the words in the LICENSE.txt file and their number of occurrences.)

Single node performance test

For the performance test I put together a few sample files by concatenating textbooks from Project Gutenberg and ran them in the same manner as the sample test above.
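
To give an idea of how this was run – roughly like the following, where the Gutenberg file names are just placeholders for whatever text files you use:

# Concatenate a few Project Gutenberg text files into one sample file
cat pg-book1.txt pg-book2.txt pg-book3.txt > mediumfile.txt

# Upload to HDFS and time the WordCount job
hadoop dfs -copyFromLocal mediumfile.txt /mediumfile.txt
time hadoop jar /opt/hadoop/hadoop-examples-1.2.1.jar wordcount /mediumfile.txt /mediumfile-out.txt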

Result:

File Size Wordcount execution time (mm:ss)
smallfile.txt 2MB  2:17
mediumfile.txt 35MB  9:19

Download sample text files for performance test.

I also tried some larger files but then the Pi ran out of memory.

Hadoop Raspberry Pi Cluster Setup

Prepare Node1 for cloning

Since we will make a clone of node1 later the settings made here will be the “base” for all new nodes.

Edit configuration files

/etc/hosts

192.168.0.110 node1
192.168.0.111 node2
192.168.0.112 node3

In a more serious setup you should use real DNS for name lookup, however to make it easy we will just go with the hosts file.

/opt/hadoop/conf/masters

node1

Note: The conf/masters file actually tells which node is the Secondary NameNode. node1 will become the NameNode when we start the NameNode service on that machine.

In /opt/hadoop/conf edit the following configuration files and change from localhost to node1:

core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hdfs/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://node1:54310</value>
  </property>
</configuration>

mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>node1:54311</value>
  </property>
</configuration>

Wipe HDFS

Note: In the next step we will completely wipe out the current HDFS storage – all files and data that you have used in HDFS will be lost. When you format the NameNode there is also an issue causing the error message: Incompatible namespaceIDs in path/to/hdfs. This can happen when starting or doing file operations on a DataNode after the NameNode has been formatted. This issue is explained in more detail here.

rm -rf /hdfs/tmp/*

Later on we will format the namenode but we do this to ensure the hdfs filesystem is clean on all the nodes.

Clone Node1 and setup slaves

Clone the SD card of node1 to the other SD cards you plan to use for the other nodes. There are various programs that can do this; I used Win32DiskImager.

For each cloned node make sure to do the following (see the example below):

  • Change hostame in /etc/hostname
  • Change IP Adress in /etc/network/interfaces
  • Restart the Pi.
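
For example, on node2 the two files would end up like this (values taken from the node table earlier in this post):

/etc/hostname

node2

/etc/network/interfaces

iface eth0 inet static
address 192.168.0.111
netmask 255.255.255.0
gateway 192.168.0.1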

Configure Node1

/opt/hadoop/conf/slaves

node1
node2
node3

Note: The masters and slaves configuration files are only read by the hadoop start/stop scripts such as: start-all.sh, start-dfs.sh and start-mapred.sh.

On node1, ensure that you can reach node2 and node3 over SSH as hduser without needing to enter a password. If this does not work, copy /home/hduser/.ssh/id_rsa.pub on node1 to /home/hduser/.ssh/authorized_keys on the node that you are trying to connect to (a one-liner for this is shown after the test commands below).

su hduser
ssh node1
exit
ssh node2
exit
ssh node3
exit

Enter yes when you are asked to confirm the authenticity of each host.
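
If the password-less login towards node2 or node3 does not work, the public key can be appended from node1 with something like this (run as hduser on node1; ssh-copy-id hduser@node2 should also do the job if it is available on your image):

cat ~/.ssh/id_rsa.pub | ssh hduser@node2 'cat >> ~/.ssh/authorized_keys'
cat ~/.ssh/id_rsa.pub | ssh hduser@node3 'cat >> ~/.ssh/authorized_keys'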

Format hdfs and start services

On node1:

hadoop namenode -format
/opt/hadoop/bin/start-dfs.sh
/opt/hadoop/bin/start-mapred.sh

Verify that daemons are running correctly

On node1:

jps
3729 SecondaryNameNode
4003 Jps
3607 DataNode
3943 TaskTracker
3819 JobTracker
3487 NameNode

On the other nodes:

jps
2307 TaskTracker
2227 DataNode
2363 Jps

Note: If you have issues you can examine the log files in /opt/hadoop/logs, or you can try to start each service manually on the node that is failing, for example:

On node1:
hadoop namenode
hadoop datanode

You may now also try to access Hadoop from the web interfaces to see which nodes are active and other statistics:

http://node1:50030
http://node1:50070

Hadoop Raspberry Pi performance tests and optimization

For these tests I used the same sample text files as for the single node setup.

Download sample files

These tests highlight some of the issues that can occur when you run Hadoop for the first time, especially in a Raspberry Pi cluster, since it is very limited. The tests deliberately do some things “very wrong” in order to point out the problems that can occur. If you just want to optimize for the Raspberry Pi you can skip ahead to the changes made in the last test. Also please note that these tests are done for the mediumfile.txt sample file provided above and are not “general-purpose” optimizations. If you have used Hadoop before these tests are probably of no use to you, since you have already figured out what to do 🙂

First run

Start three SSH terminal windows – one for each node. Then start a monitoring program in each of them. I used nmon, but you could just as well go with top or any other monitor of your choice. Now you will be able to watch the load put on your Pis by the WordCount MapReduce program.
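If nmon is not installed on your nodes yet, it is normally available from Raspbian's package repositories (an assumption about your image; top is always there as a fallback):

sudo apt-get install nmon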

Go back to your main terminal window (for node1) and upload files to HDFS and run the WordCount program:

hadoop dfs -copyFromLocal mediumfile.txt /mediumfile2.txt
hadoop jar /opt/hadoop/hadoop-examples-1.2.1.jar wordcount /mediumfile2.txt /mediumfile2-out.txt

Then watch the monitors of your nodes. Not much going on on node2 and node3, while node1 is running the whole job? The JobTracker is not distributing the work to our other nodes. This is because HDFS is by default configured for really large files and the block size is set to 64MB. Our file is only 35MB (mediumfile.txt), hence it is split into a single block, and only one node can work on it.
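You can verify this yourself by asking HDFS for the block layout of the uploaded file (the same fsck command is used again later in this tutorial):

hadoop fsck /mediumfile2.txt -files -blocks -racks
# With the default 64MB block size the 35MB file is reported as a single block,
# so only one TaskTracker can get map work for it.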

Second run

Optimize block size

In order to tackle the block-size problem above, edit conf/hdfs-site.xml on all your nodes and add the following:

hdfs-site.xml

<configuration>
 <property>
 <name>dfs.replication</name>
 <value>1</value>
 </property>
 <property>
 <name>dfs.block.size</name>
 <value>1048576</value>
 </property>
</configuration>

The above configuration sets the block size to 1MB. Note that the block size only applies to files written after the change, so the sample file has to be uploaded to HDFS again before it gets re-split. Let's re-upload, make another run and see what happens:
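A minimal re-upload sketch, assuming mediumfile.txt is still in the hduser home directory on node1:

hadoop dfs -rmr /mediumfile2.txt
hadoop dfs -copyFromLocal mediumfile.txt /mediumfile2.txt

Then start the WordCount job again: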

hadoop jar /opt/hadoop/hadoop-examples-1.2.1.jar wordcount /mediumfile2.txt /mediumfile3-out.txt
File            Size   WordCount execution time (mm:ss)
mediumfile.txt  35MB   14:24

Hadoop Terminal Monitoring

Still not very impressive, right? It is even worse than the single node setup… This is because when you upload a file to HDFS locally, e.g. from a DataNode (which we are doing, since node1 is a DataNode), the data is copied locally. Hence all our blocks are now on node1. Hadoop also tries to run jobs as close as possible to where the data is stored to avoid network overhead. Some of the blocks might get copied over to node2 and node3 for processing, but node1 is most likely to get the most load. Node1 also runs the NameNode and JobTracker and has additional work to do. I also noticed that in several of the jobs the job failed with an out of memory exception, as seen in the picture to the right. A 1MB block size might simply be too small, even on the Pis, depending on the file size. Our file is now split into 31 blocks, where each block causes a bit of overhead. (The fewer blocks we need the better – as long as we can still spread the blocks evenly across our nodes.)

Third run

Optimize block size

Let's make another attempt. This time we change the block size to 10MB in conf/hdfs-site.xml:

hdfs-site.xml

<property>
 <name>dfs.block.size</name>
 <value>10485760</value>
 </property>

Format NameNode

Node1 got a bit overloaded in the previous scenario, so we will now remove its role as TaskTracker and DataNode. Before we can remove node1 as a DataNode we format the NameNode (otherwise we would end up with data loss: since dfs.replication is set to 1, our data is not redundant).

On all nodes:

rm -rf /hdfs/tmp/*

On node1:

hadoop namenode -format

Configure Node1 to only be master

Edit conf/slaves and remove node1. Then stop and start the cluster again:

stop-mapred.sh
stop-dfs.sh
start-dfs.sh
start-mapred.sh

Then upload our sample data and start the job again:

hadoop dfs -copyFromLocal mediumfile.txt /mediumfile.txt
hadoop jar /opt/hadoop/hadoop-examples-1.2.1.jar wordcount /mediumfile.txt /mediumfile-out.txt
File            Size   WordCount execution time (mm:ss)
mediumfile.txt  35MB   6:26

So now we actually got a bit of improvement compared to the single node setup. This is because when you upload a file to HDFS from a client, i.e. not locally on a DataNode, Hadoop tries to spread the blocks evenly among the nodes, unlike in our previous test. However, this is still not optimal, since we are not using node1 to its full processing potential. What we would like is to have all nodes as DataNodes and TaskTrackers, with the file blocks spread nice and evenly across all of them.

Also, if you go to http://node1:50030 and click on the number 3 under “nodes” in the table, you will see that our nodes are set up to handle 2 map tasks each (see picture below). However, the Raspberry Pi has a single (and pretty slow) processor core, so it will most likely not perform well running multiple tasks. Let's set this correctly in the last run.

hadoop web task trackers 2

Fourth run

Re-format NameNode (again)

On all nodes:

rm -rf /hdfs/tmp/*

On node1:

hadoop namenode -format

Optimize block size

Let's make the block size a bit smaller than before and lower it to 5MB (conf/hdfs-site.xml on all nodes):

<configuration>
 <property>
 <name>dfs.replication</name>
 <value>1</value>
 </property>
 <property>
 <name>dfs.block.size</name>
 <value>5242880</value>
 </property>
</configuration>

Configure TaskTrackers max tasks

As mentioned at the end of the previous test, if you go to http://node1:50030 and look at your nodes you will see that max map and reduce tasks are set to 2. This is too much for the Raspberry Pis. We will change the maximum number of map and reduce tasks to the number of CPU cores each device has: 1.

On all your nodes:

mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>node1:54311</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
</configuration>

Configure Node1 back to act as both slave and master

Edit conf/slaves and add node1. Then stop and start the cluster again:

stop-mapred.sh
stop-dfs.sh
start-dfs.sh
start-mapred.sh

Verify Max Map Tasks and Max Reduce Tasks

Go to http://node1:50030, click your nodes in the cluster summary table and ensure max map and max reduce tasks are set to 1:

hadoop web task trackers

Upload Sample file (again)

hadoop dfs -copyFromLocal mediumfile.txt /mediumfile.txt

Balance HDFS file system

Of course it is possible to upload data on one node and then distribute it evenly across all nodes. Run the following to see how our mediumfile.txt is currently stored:

hadoop fsck /mediumfile.txt -files -blocks -racks

As you most likely will see all the blocks are stored on node1. In order to spread the blocks evenly on all nodes run the following:

hadoop balancer -threshold 0.1

The threshold parameter is a float value from 0 to 100 (a percentage). The lower it is, the more balanced your blocks will be. Since we only have one file, and that file is a very small percentage of our total storage, we need to set the threshold really low to put the balancer to work. After the balancing is complete, verify the file blocks again with:

hadoop fsck /mediumfile.txt -files -blocks -racks

Last run

hadoop jar /opt/hadoop/hadoop-examples-1.2.1.jar wordcount /mediumfile.txt /mediumfile-out.txt
File            Size   WordCount execution time (mm:ss)
mediumfile.txt  35MB   5:26

Finally we got a bit better performance! There are probably lots of other things we could fine-tune, but for this tutorial we are happy with this. If you want to go further there is plenty to find on Google and elsewhere. Hope you enjoyed it! Now go write some cool MapReduce jobs and put your cluster to work! 🙂

SharePoint 2013 BI Development Environment

This tutorial shows you how to set up a complete SharePoint 2013 BI development environment. The goal is a SharePoint environment with all Business Intelligence features enabled and a more production-like setup with dedicated service accounts and kerberos configuration enabled. The installation process will be partly automated using AutoSPInstaller and PowerShell scripts for installation and provisioning of service applications. The reason for this is to have a clearer and more consistent method for future installations that can also be adapted to a real-world production environment.

The environment will consist of three servers: a domain controller, a database server and a SharePoint server. To host those three servers we will configure a virtual environment using VirtualBox.

The environment will be a “two-tier” farm setup as shown in the diagram below:

sharepoint2013_virtual_environment

SharePoint 2013 BI Development Environment

The tutorial is divided into four different posts/parts. Although they are all part of the tutorial, they can also be used independently depending on your needs – for example if you only want to configure kerberos, or only want to automate an installation with AutoSPInstaller.

Content

  1. SharePoint 2013 development environment using Virtualbox
  2. Install SharePoint 2013 SP1 using AutoSPInstaller
  3. Configure SharePoint 2013 Business Intelligence Services
  4. SharePoint 2013 Business Intelligence kerberos configuration

SharePoint 2013 kerberos configuration

SharePoint 2013 kerberos configuration is required in a SharePoint setup when user delegation is needed to access external data sources or other resources. This post goes through the steps you need to configure SharePoint 2013 kerberos for the business intelligence services and web applications.

Most often when SharePoint is used for business intelligence this is needed, since data sources, cubes etc. are stored on another server than the SharePoint machine. In the previous post Configure SharePoint 2013 BI Services we had one of those setups. However, there we avoided using kerberos by using service accounts or data access accounts to access our data sources, and we were not able to use the integrated security option for our report/analysis data sources.

Although this post is the fourth post in the series Setup SharePoint 2013 BI Development Environment and assumes that you have completed all the previous tutorials, it can also be applied generally as long as you change names and URLs to fit your own environment.

SharePoint 2013 BI Development Environment tutorial

  1. SharePoint 2013 development environment using Virtualbox
  2. Install SharePoint 2013 SP1 using AutoSPInstaller
  3. Configure SharePoint 2013 Business Intelligence Services
  4. >SharePoint 2013 Business Intelligence kerberos configuration<


Getting started

For this tutorial we have an environment resembling the following diagram:

sharepoint2013_virtual_environment

As you can see in the picture above, when using integrated security and a client connects to GRAMMI, we would like GRAMMI to use the same user credentials to access TUMMI. For example, the user opens an Excel document (with Excel Services in the browser) that has an external data source behind a graph/pivot etc. and presses the refresh data button. We would then expect GRAMMI to connect to TUMMI using the same credentials as the client to access the data. However, this does not work when using NTLM (which is what is enabled by default) and we get what is called an NTLM double hop scenario:

ntlm double hop

With NTLM the client credential is only able to “hop” one step, hence we get a failure when the SharePoint server tries to access our database server. We will not go into the details of kerberos in this post, but if you would like to dig deeper you can take a look at:

SharePoint 2010 Kerberos Configuration Guide
http://www.microsoft.com/en-us/download/details.aspx?id=23176

Plan for kerberos authentication in SharePoint 2013 and links to other resources
http://technet.microsoft.com/en-us/library/ee806870(v=office.15).aspx

In the previous post Configure SharePoint 2013 BI services we avoided this double hop scenario by specifying a service account in our data source or by using a pre-configured unattended service account in the SharePoint Secure Store service application. This solution has a few issues, among others:

  • End users must configure their data source to use an unattended service account, or know the name of and have access to another pre-configured credential in Secure Store
  • End users must know the username/password of the service account for the data source they would like to use
  • There is no easy way to filter which data should be available to the user based on their login (as with, for example, roles and dimension filters in Analysis Services)
  • NTLM is less secure and causes more overhead on the network and more traffic to the domain server that handles authentication

Test that user delegation is not working without kerberos

  • Create a workbook with a connection to Analysis Services using the current user in the authentication settings for the data connection (this is the default setting when adding a new server connection) and create a PivotTable. (Your current user will need access to the Analysis Services server and the cube.)
  • Upload/save the workbook to SharePoint.
  • Open the workbook in SharePoint using Excel Services -> try to drill down/change the PivotTable. You should get the following error message:
    external data refresh failed excel sercvices sharepoint 2013

If you look in the SharePoint logs you will also find an entry for:

SPSecurityContext: Could not retrieve a valid windows identity for username 'GUMMIS\SP2013_Install' with UPN 'SP2013_Install@gummis.com'. UPN is required when Kerberos constrained delegation is used. Exception: System.ServiceModel.FaultException`1[System.ServiceModel.ExceptionDetail]: WTS0003: The caller is not authorized to access the service. (Fault Detail is equal to An ExceptionDetail, likely created by IncludeExceptionDetailInFaults=true, whose value is: System.UnauthorizedAccessException: WTS0003: The caller is not authorized to access the service.   
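If you prefer searching the ULS logs from PowerShell instead of opening the log files, a quick sketch (the one hour time window is just an assumption, adjust as needed):

Add-PSSnapin Microsoft.SharePoint.PowerShell
# Find recent entries for the failed identity delegation
Get-SPLogEvent -StartTime (Get-Date).AddHours(-1) |
    Where-Object { $_.Message -like "*Could not retrieve a valid windows identity*" } |
    Select-Object Timestamp, Category, Message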

So this is what we will solve in the rest of this article.

Service account permissions for kerberos

This post assumes you have configured all service accounts below as managed accounts and also provisioned all BI service applications with a dedicated account as described in Configure SharePoint 2013 BI Services. If not make sure to adapt further settings to your environment.

Before continuing, please configure all service accounts in your environment according to the table below:

Account Security settings
SP2013_C2WTS

Claims to windows token service
(Service account for: Claims To Windows Token Service Application)

Permissions:

  • Need to be in local administrators group
  • In local security policy
    (Open gpedit.msc as administrator -> Windows Settings -> Security Settings -> Local Policies > User rights assignment)
    • Act as part of operating system
    • Log on as a service
    • Impersonate a client after authentication
  • Edit the file:
    C:\Program Files\Windows Identity Foundation\v3.5\c2wtshost.exe.config
    Ensure the file has a section configured as below:

    <configuration>
      <windowsTokenService>
        <allowedCallers>
          <clear/>
          <add value="WSS_WPG" />
        </allowedCallers>
      </windowsTokenService>
    </configuration>

SP2013_Farm

Farm account

Permissions:

  • Nothing “special” needed if service applications and everything else is configured using instructions in SharePoint 2013 BI Development Environment.
SP2013_ExcelServices

Excel Services
(Service account for: Excel Service Service Application)

Permissions:

  • Admin on the Analysis Services (SSAS) for SharePoint instance (GRAMMI\POWERPIVOT).
  • On the SQL Server, locate the Central Administration database
    (usually SharePoint_AdminContent).
    Expand the security folder -> add/open settings for the SP2013_ExcelServices account -> in the mapping settings tick the SPDataAccess role checkbox -> OK.

 

SP2013_PowerPivot

PowerPivot Services
(Service account for: PowerPivot Service Application)

Security rights:

  • Admin on the Analysis Services (SSAS) for SharePoint instance (GRAMMI\POWERPIVOT).
  • In local security policy
    • Act as part of operating system
SP2013_PPS

Performance Point Services
(Service account for: PerformancePoint Services Service Application)

Permissions:

  • No additional settings except what is done automatically when provisioning service application.

 

SP2013_RS

Reporting Services

(When using Reporting Services in integrated mode with the Reporting Services service application on the same server as your site web application, both the web application and the Reporting Services service application need to use the same service account!)

In this setup SP2013_WebApp will be used as the service account for Reporting Services and no dedicated Reporting Services account will be used.

Permissions:

  • Admin rights on the Analysis Services (SSAS) for SharePoint instance (GRAMMI\POWERPIVOT). (Needed if you would like to use PowerPivot as a source for PowerView reports)
  • Read/admin rights on the Analysis Services (SSAS) tabular instances/databases that you wish to use as sources for PowerView.
  • Edit the file:
    C:\Program Files\Common Files\microsoft shared\Web Server Extensions\15\WebServices\Reporting\rsreportserver.config

    Change AuthenticationTypes from RSWindowsNTLM to RSWindowsNegotiate and ensure the file has a section configured as below:

    <Authentication>
      <AuthenticationTypes>
        <RSWindowsNegotiate/>
      </AuthenticationTypes>
    </Authentication>

SP2013_WebApp

Site web application pools
(Service account for: Reporting Services, and application pool account for portal web applications)

Security rights:

  • See the note for the SP2013_RS account above; this account also needs the security rights listed for SP2013_RS.
SP2013_Install

Installation account

This account will be used for installation and configuration of the BI services, with the exception of the PowerPivot Configuration Tool, which has some issues detecting the proper user permissions (described later).

Permissions:

  • Member of Local Administrators security group
  • Member of Central Administrators group in Central Administration.
SP2013_DataAccess

Unattended Data Access account
(If you like to use unattended data access or secure store account for BI services)

Permissions:

  • Read access to all data sources used by BI services.
    (TUMMI default SQL, TUMMI default SSAS)

Note: If you are going for kerberos configuration later you might not need this account.

*End users*

Permissions:

  • Read access to all data sources
  • Access to SharePoint BI portal site
  • The BI portal site should be added as trusted site in Internet Explorer

Configure Claims To Windows Token Service (C2WTS)

Ensure the C2WTS service is running

Open Central Administration -> Services on Server:

claims to windows token service central administration

Change the service account for C2WTS service

Ensure that you have added your C2WTS service account (SP2013_C2WTS) as a managed account, then run the following PowerShell script as administrator in the SharePoint 2013 Management Shell:

# Change Claims to Windows Token Services service account 
$c2twsAccount = "GUMMIS\SP2013_C2WTS"
 
Add-PSSnapin Microsoft.SharePoint.PowerShell

[string] $Identity = $c2twsAccount
[string] $ServiceTypeName = "Claims to Windows Token Service"
 
#Get Reference to Service
$Service = (Get-SPFarm).services | where {$_.typename -eq $ServiceTypeName}
 
#Get Reference to Managed Account
$IdentityManagedAcct = Get-SPManagedAccount -Identity $Identity
 
#Get Reference to Process Identity and Update Identity
$SvcProcessIdentity = $Service.ProcessIdentity
$SvcProcessIdentity.CurrentIdentityType = [Microsoft.SharePoint.Administration.IdentityType]::SpecificUser
$SvcProcessIdentity.Username = $IdentityManagedAcct.UserName
$SvcProcessIdentity.Update()
$SvcProcessIdentity.Deploy()
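
To verify that the change took effect, a quick check from the same shell (it re-uses the same type name as the script above):

$svc = (Get-SPFarm).Services | Where-Object { $_.TypeName -eq "Claims to Windows Token Service" }
$svc.ProcessIdentity.Username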

You can also download the script Configure SharePoint 2013 BI Services.ps1, which has functions to create the BI service applications and change the C2WTS service account.

Add/verify DNS A records

For kerberos to work you will need to add DNS A records for the web applications that you would like to set up with SharePoint 2013 kerberos delegation. Please verify that alternate access mappings are properly configured for your web applications as well. For more information see the previous post Install SharePoint 2013 SP1 using AutoSpInstaller – Configure DNS.

We currently have the following A records:
portal.gummis.com
mysites.gummis.com

Create SPN’s

Create fake SPN’s (this will enable the delegation tab in Active Directory Users and Computers, which is not shown by default).

Log on to your domain server (ZUMMI) and start a command prompt as domain administrator.

Create fake SPN’s

Setspn -S SP/C2WTS GUMMIS\SP2013_C2WTS
Setspn -S SP/SSRS GUMMIS\SP2013_RS
Setspn -S SP/PPS GUMMIS\SP2013_PPS
Setspn -S SP/Excel GUMMIS\SP2013_ExcelServices
Setspn -S SP/PowerPivot GUMMIS\SP2013_PowerPivot

Please note that SP2013_RS is only needed if you do not run the SSRS service application as the web application account (SP2013_WebApp)

Create web applications SPN’s

Setspn -S HTTP/portal GUMMIS\SP2013_WebApp
Setspn -S HTTP/portal.gummis.com GUMMIS\SP2013_WebApp
Setspn -S HTTP/mysites GUMMIS\SP2013_WebApp
Setspn -S HTTP/mysites.gummis.com GUMMIS\SP2013_WebApp

Create SQL server services SPN’s

SPN For SQL Default Instance
Setspn -S MSSQLSvc/tummi:1433 GUMMIS\SQL2012_Services
Setspn -S MSSQLSvc/tummi.gummis.com:1433 GUMMIS\SQL2012_Services
SPN For SSAS Default Instance
Setspn -S MSOLAPSvc.3/tummi GUMMIS\SQL2012_Services
Setspn -S MSOLAPSvc.3/tummi.gummis.com GUMMIS\SQL2012_Services

SPN For SSAS TABULAR Named Instance
(We have not used a TABULAR instance in our installation, but it is included here as a reference if needed)

Setspn -S MSOLAPSvc.3/tummi:TABULAR GUMMIS\SQL2012_Services
Setspn -S MSOLAPSvc.3/tummi.gummis.com:TABULAR GUMMIS\SQL2012_Services

In order for a named instance to be found you also need to register the SQL Browser service for kerberos:

Setspn -S MSOLAPDisco.3/tummi GUMMIS\SQL2012_Services

When creating the SPN’s you will get a response indicating that the object is updated/registered, as below:

setspn sharepoint 2013 kerberos
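If you want to double-check what has been registered, you can list the SPN’s per service account (shown here for the SQL and web application accounts; the same works for the others):

Setspn -L GUMMIS\SQL2012_Services
Setspn -L GUMMIS\SP2013_WebApp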

Setup delegation

Trust SharePoint 2013 server for kerberos delegation

Open Active Directory Users and Computers on the domain server (Zummi). Browse to the SharePoint 2013 server. (gummis.com/Computers/GRAMMI).

Right click on server (GRAMMI) -> Properties -> Delegation tab -> Trust this computer for delegation to any service (kerberos only):

ad enable kerberos delegation sharepoint server

Web application(s)

Do the following section for all web applications you wish to use with kerberos.

On SharePoint 2013 server (GRAMMI) open up Central Administration -> Web Applications -> Select web application -> Authentication provider -> Click Default -> Change from NTLM to Kerberos:

sharepoint 2012 sp1 central administration web application kerberos authentication 

Verify IIS settings

Open Internet Information Services Manager (inetmgr). Select the SharePoint 2013 web application. Verify that Windows authentication is enabled:

sharepoint 2013 kerberos iis auth configuration

Note: If you get a warning message complaining that both forms and Windows authentication are enabled, just ignore the message.

Select Windows Authentication. In the right action pane select Advanced Settings, then verify that Extended Protection and Kernel-mode authentication are switched off. If this gives you another warning – ignore it.

sharepoint 2013 kerberos iis windows auth configuration advanced

 

Also make sure you have kerberos and NTLM in the providers configuration:

sharepoint 2013 kerberos iis windows auth configuration providers

 

Also verify that IIS access mappings and bindings are correct.

Delegate access to services

In Active Directory Users and Computers, find the accounts for the services that you configured fake SPN’s for in the previous steps, e.g. the accounts for:

  • Claims to windows token service (GUMMIS\SP2013_C2WTS)
  • Reporting Services (GUMMIS\SP2013_WebApp or SP2013_RS)
    (See Service Account Permissions for Kerberos)
  • Excel Services (GUMMIS\SP2013_ExcelServices)
  • PowerPivot (GUMMIS\SP2013_PowerPivot)
  • Performance Point Services (GUMMIS\SP2013_PPS)

Right click the user -> Properties -> Delegation -> make sure to use the “Trust this user for delegation to specified services only” option -> add the SPN’s you created earlier for SQL, SSAS, SSAS Tabular etc. (You can find the SPN’s by searching for the service account, e.g. GUMMIS\SQL2012_Services.)

SharePoint 2013 kerberos service accounts delegate services

Test

Open your SharePoint web application in the browser. Verify that a kerberos ticket has been issued by running the klist command in a command prompt (as administrator). You should see a list of all kerberos tickets issued for your user. Make sure that a ticket has been issued for your web application, similar to below:

sharepoint 2013 kerberos ticket klist
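If you want to re-test from a clean state, you can purge the cached tickets first, reload the page and then run klist again:

klist purge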

You can also look in the event logs of the domain server for logon events related to kerberos:

  • 4768 – A Kerberos authentication ticket (TGT) was requested.
  • 4771 – Kerberos pre-authentication failed.
  • 4772 – A Kerberos authentication ticket request failed.

event kerberos ticket

If you do not get a ticket issued when logging on to your site, make sure that you have given the C2WTS (SP2013_C2WTS) and web application (SP2013_WebApp) accounts the proper permissions (see Service Account Permissions for Kerberos).

If it still does not work, ensure that you have created all kerberos SPN’s correctly and that no duplicate or invalid entries exist.

When kerberos works for your web applications, verify that your BI services work correctly by accessing data sources using “current user” and integrated security.

 

Configure SharePoint 2013 BI Services

This guide will show you how to set up and configure the SharePoint 2013 BI (Business Intelligence) services. The guide covers the following BI services and features:

  • Excel Services
  • Reporting Services
  • Performance Point Services
  • PowerView

We will be using PowerShell for most of the setup and configuration tasks. This not only speeds up the process but also ensures that it is done the same way the next time you do it. We will also use PowerShell to provision application pools and user accounts so that each BI service application has its own service account. This post is the third post in the series of posts with the theme SharePoint 2013 BI Development Environment:

SharePoint 2013 BI Development Environment tutorial

  1. SharePoint 2013 development environment using Virtualbox
  2. Install SharePoint 2013 SP1 using AutoSPInstaller
  3. >Configure SharePoint 2013 Business Intelligence Services<
  4. SharePoint 2013 Business Intelligence kerberos configuration

Even though this post assumes you have completed the previous guides above, you can still adapt it to your own environment. Just be sure to change computer/account names etc.

If you have followed the previous articles you are now familiar with the picture below, presenting a diagram of the servers in our setup:

sharepoint2013_virtual_environment

If you are a bit familiar with SharePoint setup and BI configuration you have already noticed that in order for our BI services to work properly with user delegation (to be able to use “current user” login for our BI data sources, cubes, reports etc.) we will need to configure kerberos. This is due to limitations with NTLM and the double hop scenario. This will be described in the 4th post: SharePoint 2013 Business Intelligence kerberos configuration (upcoming).

However, we will still be able to use our BI data sources if we configure unattended service access accounts for our BI services, which is described in the additional configuration section of this post.


Prerequisites

Before continuing, ensure you already have the following set up (please note that certain combinations of the software below with lower versions, e.g. not SP1 of SQL/SharePoint, might not work properly together):

  • Microsoft SQL Server 2012 SP1 Installation medium
  • Office 2013 Professional Installation medium (For testing purposes)
  • SharePoint Server/Virtual machine with:
    • Windows Server 2012 R2
    • SharePoint 2013 SP1
  • SQL Server/virtual machine with:
    • Windows Server 2012 R2
    • SQL Server 2012 SP1 CU8 (Cumulative Update 8)
    • SQL Services instances for intended BI services you would like to use:
      • Analysis Services (Multidimensional/Tabular)
      • PowerPivot for SharePoint instance

Prepare service accounts

Make sure you have set up the service accounts below from the previous post, Install SharePoint 2013 SP1 using AutoSPInstaller, then configure their security rights as specified in the table below. Access rights on service application databases etc. will be taken care of when they are provisioned using the PowerShell script later.

Account Security settings
SP2013_ExcelServices

Excel Services

Security rights:
Admin on the Analysis Services (SSAS) for SharePoint instance (GRAMMI\POWERPIVOT).

On the SQL Server, locate the Central Administration database (usually SharePoint_AdminContent). Expand the security folder –> add/open settings for the SP2013_ExcelServices account –> in the mapping settings tick the SPDataAccess role checkbox –> OK.

SP2013_PowerPivot

PowerPivot Services

Security rights:
Admin on the Analysis Services (SSAS) for SharePoint instance (GRAMMI\POWERPIVOT).

SP2013_PPS

Performance Point Services

Security rights:
No additional settings except what is done automatically when provisioning service application.

SP2013_RS

Reporting Services

(Note: if you plan to configure kerberos later, read this: when using Reporting Services in integrated mode with the Reporting Services service application on the same server as your site web application, the web application and the Reporting Services service application need to use the same service account!)

In this setup SP2013_WebApp will be used as the service account for Reporting Services and no dedicated Reporting Services account will be used.

Security rights:
Admin rights on the Analysis Services (SSAS) for SharePoint instance (GRAMMI\POWERPIVOT).

(Needed if you would like to use PowerPivot as source for PowerView reports)

Admin rights on the Analysis Services (SSAS) tabular instances that you wish to use as sources for PowerView.

SP2013_WebApp

Site web application pools

Security rights:
See the note for the SP2013_RS account above; this account also needs the security rights listed for SP2013_RS.

SP2013_Install

Installation account

This account will be used for installation and configuration of the BI services, with the exception of the PowerPivot Configuration Tool, which has some issues detecting the proper user permissions (described later).

Security rights:
Member of Local Administrators security group.
Member of Central Administrators group in Central Administration.

SP2013_DataAccess

Unattended Data Access account

Security rights:
Read access to all data sources used by BI services.
(TUMMI default SQL, TUMMI default SSAS)

Note: If you are going for kerberos configuration later you might not need this account.

To add an account as administrator on an Analysis Services instance, connect to the instance in Management Studio -> right click the instance name -> Properties -> Security tab -> add accounts.

Note: If you have problems opening the properties window of Analysis Services, it might be because you have not applied the latest cumulative update (CU) of SQL Server. (This tutorial uses CU8.)

Install required software

SharePoint 2013 relies on several add-ins that are a part of Microsoft SQL Server. Download and install the add-ins/software that is required for the services that you wish to use. All software in this post should be installed on your SharePoint 2013 application server running the service(s) you wish to install.

Reporting Services add-in

Install Reporting Services for SharePoint and the Reporting Services add-in from the SQL Server 2012 SP1 installation media on your SharePoint application server (GRAMMI).

sql server reporting services feature installation

Note: The Reporting Services – SharePoint feature adds the shared features for Reporting Services. Even though you are able to install only the Reporting Services add-in without the Reporting Services – SharePoint feature, this will not work, since you will run into issues later when creating the service applications etc.

This tutorial is tested with SQL Server CU8, where the Reporting Services add-in is bundled. Go to the link below and request the SSRS add-in (2012_SP1_RSShrPnt_CU8_2917531_11_0_3401_x64). (You will get a download link by e-mail.)

http://support.microsoft.com/kb/2917531

Install and upgrade the Reporting Services add-in to CU8.

PowerPivot for SharePoint

  • Install SQL Server PowerPivot for SharePoint instance 

    For more detailed information see:
    http://msdn.microsoft.com/en-us/library/jj219067.aspx

    Run setup from your SQL Server 2012 SP1 installation medium.
    Select SQL Server PowerPivot for SharePoint:
    sql_7_powerpivot

  • Add Management Tools Complete feature during features selection
  • Configure service accounts:
    sql_8_powerpivot_service_accounts
  • Do not forget to add yourself/domain admin account as administrator account.
  • Finish the rest of the steps in the SQL Server Feature installation wizard.
  • Install and apply SQL Server CU8
  • Download and install PowerPivot for SharePoint 2013 Add-In from:
    http://www.microsoft.com/en-us/download/details.aspx?id=35577

Performance Point SQL 2008R2 ADOMD.Net provider

Although the SQL 2012 ADOMD.Net provider is installed when installing other SQL Server features (such as PowerPivot above), Performance Point in SharePoint 2013 still needs ADOMD.Net for SQL Server 2008 R2. Download and install it from:

http://www.microsoft.com/en-us/download/details.aspx?id=16978

Click download and scroll down to:
Microsoft® SQL Server® 2008 R2 ADOMD.NET
Install both the 32 & 64 bit version.

Configure SharePoint 2013 BI Services

Run PowerPivot Configuration Tool for SharePoint 2013

For detailed information on PowerPivot configuration please see:
http://msdn.microsoft.com/en-us/library/jj682085(v=sql.110).aspx

We will configure PowerPivot in three steps:

  1. Initial configuration (done in this step)
  2. Provision Powerpivot Service Application (done in next step)
  3. Configure the unattended data refresh account and activate site features

The PowerPivot Configuration tool will be used in step 1 and 2 above. Find PowerPivot Configuration tool in the start menu/start screen (make sure you start the one for SharePoint 2013 and not 2010).

Note: You might get the error message:

“The user is not a farm administrator. please address the validation failures and try again.”

This is a bit misleading, since even if your installation account is a farm administrator and local administrator the configuration tool does not seem to pick it up. To solve this issue, add your farm account to the local administrators group. Log out and log in again. Right click the PowerPivot Configuration tool and run as administrator. (Remove the farm account from the local administrators group when done with the rest of the steps in this post.)

Go through the configuration settings.

Configure the default account (GUMMIS\SP2013_Farm) and type in the password.

Make sure to not create the service application in this step:

PowerPivot Configuration Tool SP2013 Wait Service Application

Click verify and then run to complete this task.

Provision BI Service applications using powershell

You may configure all service applications in this tutorial through the user interface in Central Administration (or have them configured automatically when installing SharePoint using the default installer). However, for more complex setups I like to use PowerShell and configure each service application with its own application pool and service account. This makes troubleshooting easier, improves security and gives more flexibility to scale up your farm.

I have put together a PowerShell Script to configure and provision SharePoint 2013 BI service applications. Download it here:

Configure SharePoint 2013 BI Services.ps1 (zipped PowerShell script)

At the beginning of the script there are parameters to configure which service applications you wish to provision. There is also a parameter to configure the Claims To Windows Token account – we will do this later in the upcoming post: SharePoint 2013 Business Intelligence kerberos configuration.

For this tutorial configure the script to provision:

  • Reporting Services
  • Excel Services
  • PowerPivot Services
  • PerformancePoint Services

Run the script from PowerShell with the SharePoint installation account or another account with enough permissions (farm administrator).

Note: You might get some error messages/warnings indicating that some services are already started or enabled, depending on the state of your environment.

If everything goes OK you will now have one application pool with one service account for each service. You can look up the SharePoint application pools with the following PowerShell command:

Get-SPServiceApplicationPool | Select Id, Name, ProcessAccountName

Also verify that all the service applications and proxies have been created on the “Manage Service Applications” page in Central Administration.
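A quick way to list them from the same shell is a one-liner like the sketch below (the properties used are the standard ones exposed by the cmdlet):

Get-SPServiceApplication | Select-Object TypeName, DisplayName, Status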

If anything goes wrong with the script, try to figure out the error messages, remove the created databases, service applications and application pools, correct the problem and run the script again. (Use Get-SPServiceApplicationPool to find the names of your application pools and then remove them from IIS Manager (inetmgr).)

Additional configuration steps to complete setup

The following steps create unattended service accounts for each BI service and configure some other settings. When done you will have 3 unattended accounts in Central Administration -> Manage Service Applications -> Secure Store:

secure store accounts

 

If you do not want one account for each service, you may also configure one custom Secure Store account and use that single account for all desired services.

Reporting Services

  • Add the service account for Reporting Services as sysadmin on the PowerPivot instance (GRAMMI\LOCALHOST)
    (Open SQL Management Studio, right click the instance, Security tab, add the user (SP2013_WebApp))
  • In central administration -> Application Management -> Manage Service applications -> Reporting Services Service application, configure the following:
    • Mail
    • Provision Subscriptions and Alerts
      • Download Script
      • Logon to SQL Server and run the downloaded script
      • Back in central administration: Enter Username and password with account (SP2013_Install) that has sufficient permissions on the SQL server and press OK.
  • Verify that Reporting Services and PowerView features are enabled in site collections and sites where you wish to use them.

Excel Services

  • Add service account for excel services as sysadmin to PowerPivot instance (GRAMMI\LOCALHOST)
  • In central administration -> Application Management -> Manage Service applications -> Excel Services Service application, configure/verify the following:
    • Unattended refresh account:
      Secure Store Excel Services
    • Data Model Server
      Set to LOCALHOST\POWERPIVOT
    • Trusted Data Sources
    • Trusted File Locations
  • Verify that SharePoint Enterprise Site Collection and Site Features is enabled in site collections and sites where you wish to use Excel Services.

PowerPivot Services

  • Add service account for PowerPivot service application as sysadmin on PowerPivot instance (GRAMMI\LOCALHOST)
  • Login as farm administrator account. (make sure account also is local admin)
  • Start PowerPivot for SharePoint 2013 Configuration tool.
  • Enter password for default account (SP2013_Farm)
  • Select your target web application:
    powerpivot site configuration
  •  Configure account for unattended data refresh (SP2013_DataAccess):
    powerpivot unattened data refresh account
  • Click verify (Run will not be enabled) -> if everything goes well you will see that the PowerPivot features have been enabled and that your unattended account has been created in the Secure Store service application.
  • Ensure PowerPivot Feature Integration for Site Collections feature is enabled in the site collections where you wish to use it.

Performance Point Services

  • Configure secure store account (SP2013_DataAccess)
  • Configure PerformancePoint services trusted file location and authentication settings
  • Ensure PerformancePoint Site Collection and site features are enabled in the site collections and sites where you wish to use it.

Test and verify

Note: This test will be without kerberos, which is discussed in another upcoming post. Since kerberos is not enabled yet we need to use a combination of service accounts (SQL/Windows/Secure Store) to make all services work. This is only covered briefly, since there are several different authentication types, data source providers and data sources available. Do the tests that suit your own needs.

Add portal to trusted sites in Internet Explorer

Configure service and access accounts on your SQL/SSAS instances

  • Ensure your unattended access account (SP2013_DataAccess) has read access to all data sources you intend to use
  • Ensure your SSAS service account (SQL2012_Services) has administrator rights on your SQL Server data instance (TUMMI default instance)

Install software

Install Office 2013 on the SharePoint machine (GRAMMI) or any other machine that you would like to use as a client for testing.

Install SQL Server Data Tools (formerly BIDS) on the SQL machine (TUMMI, or any other machine that you wish to use for SQL/SSRS/SSAS development).

Install sample data

Microsoft provides its classic AdventureWorks database with models for multidimensional and tabular cubes, which you can use for testing purposes in your environment if you do not have any other data available. I will only provide the basic steps here, so please read the instructions for how to deploy AdventureWorksDW and the projects properly.

http://msftdbprodsamples.codeplex.com/releases/view/55330

  • Download: AdventureWorksDW2012_Data.mdf
  • Attach mdf file to default SQL instance on TUMMI (or if you have other instances/names) with the following script:
    CREATE DATABASE AdventureWorksDW2012
    ON (FILENAME = '<PATH__TO>\AdventureWorksDW2012_Data.mdf')
    FOR ATTACH_REBUILD_LOG;

Note: You might need to enable named pipes in SQL Configuration Manager for the script to work.

  • Download AdventureWorks Multidimensional Models SQL Server 2012.zip
  • You will need to re-configure the data source and enter deployment properties for the projects in Visual Studio before you deploy them.
  • Deploy the multidimensional project to the default SSAS instance (TUMMI)

Configure test site

  • Create a new subsite, use the PowerPivot Site Template
  • Enable Enterprise, Performance Point, BI Data Connection Library and other features you wish to use.
  • Add the following apps:
    • Data Connection Library
    • PerformancePoint Content Library
    • PerformancePoint Data Connection Library
  • Add Reporting Services Content Type to the existing PowerPivot Library (Included in PowerPivot site template)
    (PowerPivot Library -> Library tab -> Library Settings -> Advanced settings -> Allow management of content types -> back to Library settings -> Add Content Type -> add Report Builder Report.)
  • Add BI Semantic Data model, Reporting Data Source and Report Data Model Content Types to Data Connections Library

Test Excel Services

Add an excel document to PowerPivot gallery and verify that you can open the document in the web-browser.

Test PowerPivot Services

  • Create an excel document with a PowerPivot model that has your SQL server (TUMMI default instance) as data source. 
  • In Excel: Data tab -> Connections -> Select connection to SQL server (TUMMI) -> Properties -> Definition -> Authentication Settings -> None.
  • This will make the workbook data refresh use the unattended account that you configured earlier.
  • Add the document to the PowerPivot gallery.
  • Verify that you can open and refresh the data source.

Test Reporting Services and PowerView

  • In the PowerPivot Gallery, hover over the document you created before and, in the upper right corner, click Create PowerView Report.

    Note: If report cannot load make sure that you have configured the service account for reporting services service application as system administrator on the PowerPivot instance.

Test reporting services and report builder

  • In PowerPivot Gallery -> File tab -> New -> Report Builder Report (if the Report Builder tool does not download, make sure you have installed CU8) -> run and install Report Builder -> create a test report against a SQL/SSAS data source on the SQL server (TUMMI), and make sure to use a system account (SP2013_DataAccess) to access the data source. Enter the user information under the credentials tab.

    Note: You might need to add your current user to have read access to the datasource when building the report.

  • You will need to use a system account if your current user is not added to the data source you wish to use. Save and test the report in the PowerPivot Gallery.

Test Performance Point Services

  • Open the PerformancePoint Content list
  • On the PerformancePoint tab -> open Dashboard Designer
  • Create a new data source and use the unattended service account as credentials
  • Add a new report
  • Save and publish to the site and verify that they work.

    Note: You might need to add your site to Internet Explorer Trusted sites for Performance Point Dashboard Designer to work properly.


Install SharePoint 2013 SP1 using AutoSPInstaller

AutoSPInstaller is a set of PowerShell scripts and a directory structure that helps automate SharePoint installations. This tutorial assumes that you have already installed Windows Server 2012 R2 and SQL Server SP1 as described in the post Setup virtual SharePoint 2013 development environment using Oracle Virtualbox. This is however not a necessity, as long as you change computer names, references and user accounts to match your own setup.

Some of the benefits of using AutoSPInstaller to install your SharePoint servers are:

  • The configuration XML file of AutoSPInstaller works like a checklist, which can improve the quality of your installations. (For example properly configured service accounts and other security rights, fewer “quick-fixes” after installation is complete etc.)
  • Automated installation reduces installation and configuration time.
  • Since less manual work is involved, you know that the next time you do the installation it will be exactly like last time (hopefully – I would not make any promises here. After all, it is still SharePoint :))

The architecture of the SharePoint 2013 farm:

sharepoint2013_virtual_environment

This post is the second post in the series of posts with the theme SharePoint 2013 BI Development Environment which includes the following topics:

SharePoint 2013 BI Development Environment tutorial

  1. SharePoint 2013 development environment using Virtualbox
  2. >Install SharePoint 2013 SP1 using AutoSPInstaller<
  3. Configure SharePoint 2013 Business Intelligence Services
  4. SharePoint 2013 Business Intelligence kerberos configuration


Prerequisites

Below are the required software installation media/ISO files:

  • Microsoft Windows Server 2012 R2
  • Microsoft SharePoint Enterprise Server 2013 SP1 (Note that enterprise is required for Business Intelligence features)

Prepare environment

SharePoint 2013 service accounts

In Active Directory Users and Computers create the following service accounts (a PowerShell sketch for creating them follows after the table):

Account Description
SP2013_C2WTS Claims to windows token services
SP2013_Crawl Search crawl
SP2013_ExcelServices Excel Services
SP2013_Farm Farm account
SP2013_PowerPivot Power Pivot Services
SP2013_PPS Performance Point Services
SP2013_Profiles Profile Services Application & sync
SP2013_RS Reporting Services
SP2013_Search Search Services
SP2013_Services Various services
SP2013_WebApp Used for web application pools
SP2013_Install Installation account
SP2013_DataAccess Used for unattended data access for BI services
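If you prefer scripting the account creation over clicking through Active Directory Users and Computers, a minimal sketch could look like the following (a single shared password and the default Users container are assumptions for this lab setup – adjust to your own policies):

Import-Module ActiveDirectory
$securePwd = Read-Host -AsSecureString "Password for the service accounts"
$accounts = @(
    "SP2013_C2WTS","SP2013_Crawl","SP2013_ExcelServices","SP2013_Farm",
    "SP2013_PowerPivot","SP2013_PPS","SP2013_Profiles","SP2013_RS",
    "SP2013_Search","SP2013_Services","SP2013_WebApp","SP2013_Install","SP2013_DataAccess"
)
$accounts | ForEach-Object {
    # Create each account enabled and with a non-expiring password (lab convenience)
    New-ADUser -Name $_ -SamAccountName $_ -UserPrincipalName "$_@gummis.com" `
        -AccountPassword $securePwd -Enabled $true -PasswordNeverExpires $true
}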

Configure Installation account access

  • Add the account SP2013_Install to the local administrators group on the SharePoint server (Grammi).
  • Create a login for SP2013_Install on the SQL Server (Tummi) and assign the login the dbcreator and securityadmin server roles (a T-SQL sketch follows below):

sp2013 sql install account permissions
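If you prefer T-SQL over the Management Studio UI, a minimal sketch of the same permissions (run on TUMMI) could be:

-- Create the Windows login and grant the two server roles used by AutoSPInstaller
CREATE LOGIN [GUMMIS\SP2013_Install] FROM WINDOWS;
ALTER SERVER ROLE [dbcreator] ADD MEMBER [GUMMIS\SP2013_Install];
ALTER SERVER ROLE [securityadmin] ADD MEMBER [GUMMIS\SP2013_Install];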

 

AutoSPInstaller will take care of any other permissions needed.

Note: If you are lazy and try to run as the domain administrator account you might get errors like this:
secure store error

So take the time to create the installation account 🙂

Prepare for user profile synchronization

Login to your domain controller machine. (created as virtual server Zummi in previous post)

  1. Open Active Directory Users and Computers.
  2. Right click your domain (gummis.com in this tutorial)
  3. Select Delegate Control…
  4. Next -> add SP2013_Profiles (created in the previous post). Click next.
  5. Select create a custom task to delegate. Click next.
  6. Select This folder, existing objects in this folder and creation of new objects in this folder. Click next.
  7. Select replicating directory changes:
    ad delegation replicating directory changes
    Click next, finish.

Configure DNS

Prepare app domain

Login to your domain controller machine (created as virtual server Zummi in the previous post). Apps in SharePoint 2013 run in a different domain, hence we need to set up that domain in the DNS Manager. Right click Forward Lookup Zones and create a new zone. Go through the wizard, accepting the defaults except for the zone name, where you enter your app domain: gummiapps.com (or whatever name you would like to use).

dns manager sharepoint app domain

Right click your newly created domain. Select new alias (CNAME). Fill in properties as shown below:

dns manager sharepoint app domain

 

Add web application(s) DNS A records

To properly be able to access your web applications with alternate access mappings, and to later configure kerberos, add A records for your web applications (portal and mysites – these will be configured in AutoSPInstaller later) by right clicking your domain (gummis.com) –> Add A record (a PowerShell alternative is sketched after the screenshots below):

dns manager add portal

 

When done the configuration should look as below:

dns manager a records
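If you prefer PowerShell over the DNS Manager UI, the same A records can be added with the DnsServer module on the domain controller (the IPv4 address below is a placeholder for GRAMMI's address – replace it with the IP of your SharePoint server):

Add-DnsServerResourceRecordA -ZoneName "gummis.com" -Name "portal" -IPv4Address "192.168.10.12"
Add-DnsServerResourceRecordA -ZoneName "gummis.com" -Name "mysites" -IPv4Address "192.168.10.12"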

Disable UAC and IE Enhanced protection

Login to your SharePoint server machine. (created as virtual server Grammi in previous post). In Server Manager –> Local Server disable IE enhanced protection:

disable_IE_enhanced_security

In User Account Control Settings disable UAC:

disable uac

Download and prepare sources

Login to your SharePoint machine using a domain account. In this tutorial I log in as the domain administrator; for a production environment it would be recommended to use an installation account. For more information on SharePoint user accounts and requirements see: http://absolute-sharepoint.com/2013/01/sharepoint-2013-service-accounts-best-practices-explained.html or google for more information. Before you can use AutoSPInstaller you need to prepare it with the base installation of SharePoint 2013 and the prerequisites. For offline installation you might also want to complement it with installation files from the Windows installation media. Follow the steps below to do this:

  1. Create a working directory for AutoSPInstaller and AutoSPSourceBuilder for example: C:\AutoSP.
  2. Download AutoSPInstaller from http://autospinstaller.codeplex.com/ and extract the zip file into C:\AutoSP\
  3. Download AutoSPSourceBuilder from https://autospsourcebuilder.codeplex.com/ and extract the zip file into C:\AutoSP\
  4. Since we use Service Pack 1 of SharePoint 2013 we do not want any of the older cumulative updates that exist in AutoSPSourceBuilder (version Apr 28, 2014), so comment out the following lines in AutoSPSourceBuilder.xml:
    AutoSPSourceBuilder.xml
  5. Open up PowerShell as administrator. If you have not run any PowerShell scripts before you might need to change your execution policy to allow scripts by typing the following command:
    Set-ExecutionPolicy unrestricted

    Note: For a production environment, or if you have any security concerns, you might want to change your execution policy back to restricted or another policy. Read more at: http://msdn.microsoft.com/en-us/library/microsoft.powershell.executionpolicy(v=vs.85).aspx

  6. Mount your SharePoint media. (G: in this case)
  7. Cd to the AutoSPSourceBuilder directory and run the script:
    cd C:\AutoSP\AutoSPSourceBuilder
    .\AutoSPSourceBuilder.ps1 -SourceLocation "G:" -Destination "C:\AutoSP\AutoSPInstaller\SP\2013\SharePoint" -GetPrerequisites $true

    Note: Since we commented out the cumulative update section before we will get an error message – ignore this, the script will continue.

  8. Mount your Windows 2012 R2 media. (G: in this case)
  9. Copy the contents from G:\sources\sxs to C:\AutoSP\AutoSPInstaller\SP\2013\SharePoint\PrerequisiteInstallerFiles\sxs
    This will make any additional features that might be needed for offline installation available to the prerequisite installer.

Configure AutoSPInstaller for installation

AutoSPInstaller uses an XML file to install and configure your SharePoint installation. If you have followed the tutorial you will have the file at C:\AutoSP\SP\AutoSPInstaller\AutoSPInstallerInput.xml.

The default AutoSPInstaller.xml is quite well structured with comments. Use an XML text editor of your choice to edit the file, or download, review and use the AutoSPInstaller.xml used in this tutorial.

If you like, you can also use AutoSPInstallerGUI, which is a graphical tool that helps you administer the AutoSPInstaller.xml file. Follow the images below for using AutoSPInstallerGUI.

Note: If you get the following issue with AutoSPInstallerGUI:

net 35 error

 

Then run the following command (where the source is the sxs directory from the Windows 2012 R2 installation media that you prepared earlier):

dism /online /enable-feature /featurename:NetFX3 /all /Source:C:\AutoSP\SP\2013\SharePoint\PrerequisiteInstallerFiles\sxs /LimitAccess

Install tab

AutoSPInstallerGUI Install Tab

 Farm tab

AutoSPInstallerGUI Farm Tab

AutoSPInstallerGUI Farm-Services Tab

Web applications tab

AutoSPInstallerGUI WebApplications Tab

Service Applications tab

AutoSPInstallerGUI ServiceApplications-Search-SearchServiceApplication Tab

The settings not shown above are left at their defaults. If you go with AutoSPInstallerGUI, do NOT forget to check all the dropdown menus for service accounts, web applications etc. I would recommend that you open AutoSPInstaller.xml in a text editor and search for contoso/gummis to ensure that you have configured all URLs and service accounts.

Install SharePoint 2013 SP1 using AutoSPInstaller

In the virtual machine menubar: File –> Take Snapshot. (If anything goes wrong with the installation it will be easy to revert it to its previous state)

Run C:\AutoSP\SP\AutoSPInstaller\AutoSPInstaller.bat as administrator to start the installation.

The installer will make a few tests, with user account verification etc. The installation might restart the computer when installing new features/configurations; this is normal. If there is an error you can go back to your AutoSPInstaller.xml, fix it and then re-run the installation script. It will check which steps have been completed and continue where it stopped last time.

Tip! If the installation stops at some step and takes an unusually long time, click in the PowerShell window and press Enter. (It sometimes seems to wait for a keypress.)

When the script is ready you will hopefully see the mysites and portal site in front of you:

sharepoint almost ready

Configure user profile service application

Open Central Administration –> Application Management –> Manage Service Applications

Open User Profile Service Application –> Configure Synchronization Connections –> Create New Connection:

SharePoint 2013 User Profile Sync Connection

Go back to User Profile Service Application –> Click Start and see that it is working:

SharePoint 2013 User Profile Sync Status

Optimize SharePoint 2013 for development

If you are low on resources on your development machine you might want to adjust some performance settings such as cache size, search service memory usage, database logging etc. to improve performance. Note that this can cause problems and should not be used in a production environment as is. Below are some PowerShell scripts that you can adjust to fit your needs:

# Run from the SharePoint 2013 Management Shell, or load the snap-in first:
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
# Set search performance level to reduced
Set-SPEnterpriseSearchService -PerformanceLevel Reduced
# Set noderunner memory limit to 50mb
$nrFile = "C:\Program Files\Microsoft Office Servers\15.0\Search\Runtime\1.0\noderunner.exe.config"
$nrConfig = New-Object XML
$nrConfig.Load($nrFile)
$nrConfig.configuration.nodeRunnerSettings.memoryLimitMegabytes = "50"
$nrConfig.Save($nrFile)
# Set distributed cache size to 500 MB (default is 10% of RAM)
$cacheService = "SPDistributedCacheService Name=AppFabricCachingService"
$cacheServiceInstance = Get-SPServiceInstance | ? {($_.service.tostring()) -eq $cacheService -and ($_.server.name) -eq $env:computername}
$cacheServiceInstance.Unprovision()
Use-CacheCluster
Set-CacheHostConfig -Hostname localhost -cacheport 22233 -cachesize 500 | Out-Null
$cacheServiceInstance.Provision()
# Set SharePoint databases to simple recovery mode (less logging, fewer recovery options)
$srv = "localhost"
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SMO") | out-null
$msmServer = New-Object ('Microsoft.SqlServer.Management.Smo.Server') -argumentlist $srv
$msmServer.Databases | where {$_.IsSystemObject -eq $false} | foreach {$_.RecoveryModel = [Microsoft.SqlServer.Management.Smo.RecoveryModel]::Simple; $_.Alter()}
$msmServer.Configuration.MaxServerMemory.ConfigValue = 1500
$msmServer.Configuration.Alter()

Download SharePoint-2013-Dev-Optimization.zip

SharePoint 2013 development environment using Virtualbox

This post is the first in a series on setting up a SharePoint 2013 Business Intelligence development environment. It covers the base setup: the virtual servers and a virtual SharePoint 2013 development environment using Virtualbox.

Later posts will cover a complete SharePoint 2013 environment configured with all Business Intelligence services (Excel Services, Reporting Services, Power View and PerformancePoint Services) and Kerberos authentication.

Kerberos authentication is usually needed when you are running SharePoint 2013's Business Intelligence services in an environment where the data (data warehouse(s)/cubes) is stored on a different machine than the SharePoint server where reports and analyses are viewed.

SharePoint 2013 BI Development Environment Tutorial

  1. SharePoint 2013 development environment using Virtualbox (this post)
  2. Install SharePoint 2013 SP1 using AutoSPInstaller
  3. Configure SharePoint 2013 Business Intelligence Services
  4. SharePoint 2013 Business Intelligence kerberos configuration


Please note that this post will only cover the installation and setup of the virtual machines and not the actual SharePoint 2013 installation. The SharePoint 2013 installation will follow later in the post Install SharePoint 2013 SP1 using AutoSPInstaller.

The setup will look as the diagram below:

sharepoint2013_virtual_environment

Note: The resources assigned to the machines in this setup are at the bare minimum and only recommended for testing. For any serious development and heavier use you will most likely need to increase the amount of RAM and vCPUs; I would recommend about twice the resources for all the machines if possible. If you are not using a fast SSD disk, go get one – you will need it! If you wonder where the lame server names come from, they are taken from Disney's The Gummi Bears.

Prerequisites

Below are the required software installation media/ISO files. Please note that you should use the R2 version of Windows 2012 and SP1 of both SharePoint 2013 and SQL Server. If you use lower versions, some of the Business Intelligence features to be installed at a later stage might not work!

For this post:

  • Microsoft Windows Server 2012 R2
  • Microsoft SQL Server 2012 SP1
    (Standard edition or better)
  • Microsoft SQL Server 2012 CU 8 (Cumulative Update 8)
    Download from: http://support.microsoft.com/kb/2917531

For the upcoming series of SharePoint 2013 Development Environment posts

  • Microsoft SharePoint Enterprise Server 2013 SP1
    (Note that enterprise is required for Business Intelligence features)

Install Oracle Virtualbox

Oracle Virtualbox is software that lets you run virtual machines (multiple virtual operating system installations) on your PC. There are several alternatives around, like Microsoft Hyper-V and VMWare vSphere, however I chose to write this post using Oracle Virtualbox since it is cross-platform and open source. To get started, grab the installer package for your system from:
https://www.virtualbox.org/wiki/Downloads
(This post is written and tested using Virtualbox 4.3.10 r93012 with Windows 8.1 as host system)

  1. Install Virtualbox from downloaded installation file
  2. Start virtualbox
  3. Click File –> Preferences –> Verify or change the default machine folder.
    This is where Virtualbox will put virtual machines and their corresponding virtual hard disk files by default. Preferably use a hard drive with enough disk space – it should be able to host Windows Server 2012 R2, SharePoint 2013 Enterprise Server SP1, SQL Server 2012 SP1 and any additional content and software that you plan to use, e.g. Visual Studio, sample data etc. For this guide you will probably need at least 50 GB of free space. (Since we use an immutable disk image as base we only need to reserve space for one Windows installation.)
    virtualbox_vm_directory 

Prepare Virtualbox immutable base image

A base image in Virtualbox is called an immutable image. An immutable image is a virtual hard disk file that cannot be changed. When you create an immutable image in Virtualbox, that file can be used as the base for several other virtual machines, saving both time and disk space. In this setup we will create an immutable hard disk file with a base installation of Windows Server 2012 R2 and then use it as the base for the virtual machines that we create. Each of those virtual machines will only hold its file differences from the immutable image.

To create the immutable hard disk file we first need to create a new virtual machine that creates and uses that file. When we are done we can remove the virtual machine but keep its hard disk file (VDI file) and use it as a base image for other virtual machines.

  1. Create a new virtual machine for use as base image. Click Machine in the menu bar –> New –>
    Name: Windows Server 2012 R2 Immutable.vdi (or whatever you want), Type: Windows, Version: Windows 2012 (64-bit) –> Next –>
    Memory size: 2048 –> Next –>
    Create virtual harddrive now –> Select VDI –> Next –>
    Select Dynamically allocated –> Choose location, Drive size: 500 GB (since it is dynamically allocated it will not reserve this amount of space on your drive, so there is no reason to be sparse on disk size) –> Click Create
  2. Mount the Windows 2012 R2 ISO file. Select your newly created machine in the list and click Settings –>
    Storage –>  Select empty CD/DVD icon –> Click the small CD icon to the upper right:
    virtualbox_choose_cd_dvd_iso
    Browse to and select your Windows 2012 R2 ISO file. Click OK.
  3. Click Start to run your virtual Machine (VM)
  4. Install Windows
  5. Install the Virtualbox Guest Additions CD image. Click Devices –> Insert Guest Additions CD Image.
    Open Windows Explorer –> CD/DVD device –> Run through the installation.
    (The installation provides Windows with device drivers for Virtualbox.)
  6. Install Windows Updates and other software that you would like to have in all of your virtual machines
    Run C:\Windows\System32\Sysprep\sysprep.exe –> Click generalize –> Shutdown:
    sysprep
  7. When sysprep is complete, the next time the virtual machine is started it will boot into a clean Windows installation, i.e. you have to fill in an administrator password and the initial setup will run (just as when you buy a new computer and start it for the first time).

Make virtual disk immutable

  1. In Virtualbox main window: Click File –> Virtual Media Manager –> Select the disk –> Modify –> Check Immutable:
    virtualbox_make_disk_immutable
  2. Select your created VM and right click –> Remove. Then make sure to click Remove ONLY. This will remove your VM but not its virtual hard drive file, which leaves us with an immutable VDI file to use as our base image.
  3. We now have an immutable disk file for use as base image when we create the rest of our virtual machines. (The same thing can also be done from the command line; see the sketch below.)
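
If you prefer the command line over the Virtual Media Manager, VBoxManage can flip the disk type as well. A minimal sketch, assuming VBoxManage is on your PATH; the path to the VDI file is just an example and should point to the file in your default machine folder:

# Mark the base VDI as immutable (run on the host)
VBoxManage modifyhd "D:\VirtualBox VMs\Windows Server 2012 R2 Immutable.vdi" --type immutable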

Configure Virtualbox virtual network

This section creates a virtual network where our virtual machines can communicate with each other. The network has access to your computer's network through NAT (Network Address Translation), which also provides internet access for your virtual machines.

  1. In Virtualbox main window: Click File –> Preferences –> Network tab –> Click the small “circuit board” icon to the upper right to add a new NAT Network –> Enter details as below (the same network can also be created with VBoxManage; see the sketch after this step):
    virtualbox_virtual_network
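
For reference, here is roughly what the same NAT network looks like when created from the command line. A minimal sketch: the network name is a placeholder (use whatever name you entered in the dialog), while the 10.0.2.0/24 range matches the static addresses assigned later in this post:

# Create the NAT network used by the virtual machines (DHCP off since we assign static IPs)
VBoxManage natnetwork add --netname "GummisNatNetwork" --network "10.0.2.0/24" --enable --dhcp off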

 

Setup AD/DNS virtual machine

  1. Create new virtual machine
    Instead of creating a new disk file during the wizard, select Use an existing virtual hard drive file and choose the file that we made immutable in the previous step:
    use_existing_vdi_file
  2. In Virtualbox main window: Select the newly created machine –> Settings:
    – General tab –> Advanced tab: Configure Shared Clipboard and Drag’n Drop if you prefer to be able to use it
    – System tab: Adjust RAM and CPU to fit your needs. (I run with 1vCPU, 1GB RAM)
    – Network tab: Select the nat network that we created earlier:
    virtualbox_configure_vm_natnet
    – Click OK.
  3. Start the VM
  4. In menubar: Select Machine –> Take Snapshot –> OK
    (Important! The snapshot creates a new VDI file that holds the VM's file differences from the base image file, so without it all changes will be lost upon reboot)
  5. Enter user information and password
  6. Change computer name (Zummi)
  7. In Control Panel –> Network and Sharing Center –> Change adapter settings –>
    Right click ethernet adapter –> Properties –> Select Internet Protocol Version 4 TCP/IPv4:
    Change the network settings to static and use IP addresses that match the virtual network created earlier (steps 7-10 can also be scripted with PowerShell; see the sketch after this list):
    windows_network
  8. In Server Manager: click Add Roles and features and add Active Directory (AD) and DNS roles to server:
    add_role_ad_dns
  9. In Server Manager: Click the “flag” and Promote this server to a domain controller:
    promote_domain_controller
    – Add new forest
    – Root domain name: gummis.com
    – Type in the domain password
    – Click Next through the additional steps to the prerequisites check:
    domain_check
    – Install
  10. In Active Directory Users and Computers create a SQL service account user:
    SQL2012_Services: used for the SQL Server services (SQL Server, SSAS, tabular, PowerPivot, SQL Agent etc.)
  11. Done
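
For those who prefer scripting, steps 7-10 can be approximated with the built-in Windows Server 2012 R2 cmdlets. This is a minimal sketch to run in an elevated PowerShell prompt on ZUMMI; the interface alias, password prompts and account settings are assumptions that you should adjust to your environment:

# Step 7: static IP configuration (ZUMMI will also act as DNS server for the network)
New-NetIPAddress -InterfaceAlias "Ethernet" -IPAddress 10.0.2.230 -PrefixLength 24 -DefaultGateway 10.0.2.1
Set-DnsClientServerAddress -InterfaceAlias "Ethernet" -ServerAddresses 10.0.2.230

# Steps 8-9: add the AD DS and DNS roles and promote the server to a domain controller
Install-WindowsFeature AD-Domain-Services, DNS -IncludeManagementTools
Install-ADDSForest -DomainName "gummis.com" -InstallDns `
    -SafeModeAdministratorPassword (Read-Host "DSRM password" -AsSecureString)

# Step 10 (after the reboot): create the SQL service account
New-ADUser -Name "SQL2012_Services" -SamAccountName "SQL2012_Services" `
    -AccountPassword (Read-Host "Account password" -AsSecureString) `
    -Enabled $true -PasswordNeverExpires $true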

Setup database virtual machine

  1. Create new virtual machine (using the immutable disk file as before)
  2. Take Snapshot
  3. Enter user information and password
  4. Change TCP/IP network settings:
    IP: 10.0.2.231
    Netmask: 255.255.255.0
    Gateway: 10.0.2.1
    DNS: 10.0.2.230
  5. Change computer name (Tummi)
  6. Join gummis.com domain

    Note, if you get the following problem:
    join_domain_error
    – Run C:\Windows\System32\Sysprep\sysprep.exe
    – Select enter system audit mode
    – Reboot

  7. In the menubar: Select Devices –> CD/DVD –> Choose a virtual CD/DVD disk file –> Select your Microsoft SQL Server 2012 SP1 ISO file
  8. Install the default SQL Server and Analysis Services instance (MSSQLSERVER). This instance will be used for Business Intelligence content in this scenario, not for the SharePoint databases. Open the installation media from the CD/DVD drive in Windows Explorer, run setup.exe and go through the installation wizard with the configuration shown in the images below. For more details on SQL Server installation please see:
    http://msdn.microsoft.com/en-us/library/ms143219.aspx

    Select features:
    sql_1_feature_selection

  9. Configure service accounts.
    Use the SQL2012_Services account for the SQL services:
    sql_3_user_accounts
    Do not forget to add yourself/the domain admin account as an administrator:
    sql_3_user_admins
  10. In the SSAS configuration, choose to install Analysis Services in multidimensional mode.
  11. Finish the next steps of the SQL installation wizard.
  12. Install another SQL Server instance named SHAREPOINT for use with the SharePoint content, administration and service application databases. Feature selection:
    sql_4_sharepoint_instance
    Change the name of the instance:
    sql_4_sharepoint_instance_change_name
    Configure service accounts:
    sql_4_sharepoint_service_accounts
    Do not forget to add yourself/the domain admin account as an administrator:
    sql_3_user_admins
    Finish the rest of the steps in the SQL Server Feature installation wizard.
  13. Install SQL Server CU 8
    – Download from http://support.microsoft.com/kb/2917531
    – Run SQLServer2012-KB2917531-x64.exe to apply the update. Apply the update for all SQL/SSAS instances.
  14. Open and enable service ports. 

    Note: If you disable the firewall you will not need to add firewall rules and configure dynamic ports and can skip those settings.

    Open Windows Firewall with Advanced Security and add the following inbound rules (a PowerShell sketch of these rules follows after this list):
    windows firewall sql services

  15. Open SQL Server Configuration Manager and enable TCP/IP for all instances:

    sql server configuration manager enable tcpip

    Named instances use dynamic ports by default, which we need to change to static ports for the firewall rules above to work. For more information on how to change a SQL Server instance to use static ports please see: http://technet.microsoft.com/en-us/library/ms177440.aspx.

    Configure static ports for our SharePoint named instance as below. Do not forget to scroll down and change all of the bindings to be static.

    sql server named instance static port

  16. Create a SQL alias in SQL Server Configuration Manager (good practice in case you move or migrate the server in the future):
    sql create alias
  17. Done
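
Because the exact ports depend on your configuration, here is a hedged PowerShell sketch of the inbound rules from step 14. The port numbers for SSAS and the SHAREPOINT named instance are assumptions: 2383 is the usual SSAS default-instance port, and 2433 is just an example of a static port you might have assigned to the named instance in step 15:

# SQL Server default instance (Database Engine)
New-NetFirewallRule -DisplayName "SQL Server (MSSQLSERVER)" -Direction Inbound -Protocol TCP -LocalPort 1433 -Action Allow
# Analysis Services default instance
New-NetFirewallRule -DisplayName "SQL Analysis Services" -Direction Inbound -Protocol TCP -LocalPort 2383 -Action Allow
# SHAREPOINT named instance, using the static port chosen in step 15 (example: 2433)
New-NetFirewallRule -DisplayName "SQL Server (SHAREPOINT)" -Direction Inbound -Protocol TCP -LocalPort 2433 -Action Allow
# SQL Server Browser service (used to resolve named instances)
New-NetFirewallRule -DisplayName "SQL Server Browser" -Direction Inbound -Protocol UDP -LocalPort 1434 -Action Allow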

Setup SharePoint 2013 machine

  1. Create new virtual machine (using the immutable disk file as before)
  2. Do not forget to assign more resources to this machine. In Virtualbox main window: select the VM –> Settings:
    – System Tab:
    – Increase CPU (I chose 2 vCPUs)
    – Increase RAM (I chose 8 GB)
    (As mentioned before these are absolute minimum settings and only recommended for testing – at least double it for heavier use)
  3. Take Snapshot
  4. Enter user information and password
  5. Change TCP/IP network settings:
    IP: 10.0.2.232
    Netmask: 255.255.255.0
    Gateway: 10.0.2.1
    DNS: 10.0.2.230
  6. Change computer name (GRAMMI)
  7. Join the gummis.com domain.
    Note: if you get the duplicate SID issue, run sysprep as mentioned in step 6 of Setup database virtual machine.
  8. If you are going to access your SharePoint server from the network as well, do not forget to add firewall inbound rules for TCP ports 80 and 443 (see the sketch after this list).
  9. Done (Actual SharePoint 2013 SP1 installation will be done using AutoSPinstaller in upcoming posts)
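
A minimal sketch of the firewall rules from step 8, run in an elevated PowerShell prompt on the SharePoint machine (the rule names are just examples):

# Allow inbound HTTP/HTTPS to the SharePoint web applications
New-NetFirewallRule -DisplayName "SharePoint HTTP" -Direction Inbound -Protocol TCP -LocalPort 80 -Action Allow
New-NetFirewallRule -DisplayName "SharePoint HTTPS" -Direction Inbound -Protocol TCP -LocalPort 443 -Action Allow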

 

SSRS scale-out deployment configuration error

Introduction

This article describes an SSRS scale-out deployment configuration error that can occur during migration of the ReportServer database from SSRS 2008 Enterprise/Developer to SSRS 2012 Standard (Native mode). When importing encryption keys from the source server, a scale-out deployment server is added to the configuration on the new target server that cannot be removed through the user interface. The scale-out deployment configuration is also not supported by SSRS 2012 Standard edition. I have not verified the reason for this behaviour in detail, but the affected systems in this article are:
Microsoft SQL Server 2008 SP1 10.0.2841.0 (x64) Developer Edition (source system)
Microsoft SQL Server 2012 SP1 11.0.3401.0 (x64) Standard Edition (target system)
Microsoft Windows Server 2012 R2 Version 6.3 (Build 9600) (target system)

Problem

You have your old server (SQL 2008 SP1) with a lot of reports that you would like to migrate to a new server running a newer version of Reporting Services (SQL 2012). To accomplish this you can migrate the ReportServer database from the old server to the new one.

Migration steps:

  1. Backup ReportServer database on source machine
  2. Backup encryption keys on the source machine
  3. Restore the ReportServer database on the target machine
  4. Change/connect the database to Reporting Services on the target machine using Microsoft Reporting Services Configuration Manager
  5. Restore encryption keys on the target machine

For more detailed explanations please see:
Migrate a Reporting Services Installation (Native Mode)
http://msdn.microsoft.com/en-us/library/ms143724.aspx

Moving the Report Server Databases to Another Computer
http://msdn.microsoft.com/en-us/library/ms156421.aspx

Scale-out deployment configuration error

However, when doing step 5 the old server is added to the scale-out deployment on the target machine. If the source and target machine use different editions of Reporting Services, you might find that some features are not supported when migrating to a lower SQL Server edition.

One such issue can be when you try to browse to the report manager url and get the following error:
ssrs scale out deployment error
The feature: “Scale-out deployment” is not supported in this edition of Reporting Services. (rsOperationNotSupported)

Normally this should not pose a problem since you would be able to remove the old server from the scale-out deployment list in Microsoft Reporting Services Configuration Manager:
SSRS Delete Scale Out Deployment Server Error

At least one would think so… however, as seen in the image above, this might not work as expected. If you click the link “Tell me more about the problem and how to resolve it” you will get the following:

Microsoft.ReportingServices.WmiProvider.WMIProviderException: No report servers were found. ---> System.Management.ManagementException: Invalid namespace
 at System.Management.ManagementException.ThrowWithExtendedInfo(ManagementStatus errorCode)
 at System.Management.ManagementScope.InitializeGuts(Object o)
 at System.Management.ManagementScope.Initialize()
 at System.Management.ManagementScope.Connect()
 at ReportServicesConfigUI.WMIProvider.RSInstances.GetInstances(String machineName)
 --- End of inner exception stack trace ---
 at ReportServicesConfigUI.WMIProvider.RSInstances.GetInstances(String machineName)
 at ReportServicesConfigUI.WMIProvider.RSInstances.GetInstance(String machineName, String instanceName)
 at ReportServicesConfigUI.Panels.ClusterManagementPanel.ConfigureWebFarm(Object sender, RSReportServerInfo[] rsInfos)

Solution

Fortunately there is an easy solution to this problem. Open the table dbo.Keys in the ReportServer database on the target server and look at its contents:
ReportServer dbo.Keys table

Delete the old server from the list and the scale-out deployment error should be gone when you open the Report Manager URL (at least if you do not have any additional Enterprise/Developer features referenced from the ReportServer database):

DELETE FROM [ReportServer].[dbo].[Keys]
WHERE MachineName = 'YourSourceServerName'
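
If you prefer to do the inspection and cleanup from PowerShell, something along these lines should also work. A minimal sketch assuming the SQL Server PowerShell module (Invoke-Sqlcmd) is available on the target server and that the database uses the default ReportServer name; 'YourSourceServerName' is a placeholder just as in the query above:

# Inspect the rows in dbo.Keys, then remove the entry for the old source server
Invoke-Sqlcmd -ServerInstance "localhost" -Database "ReportServer" -Query "SELECT MachineName FROM dbo.Keys"
Invoke-Sqlcmd -ServerInstance "localhost" -Database "ReportServer" -Query "DELETE FROM dbo.Keys WHERE MachineName = 'YourSourceServerName'"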

Reporting Services in SharePoint 2013 SP1 – There is a compatibility range mismatch between the Web server and database

Do you ever get tired of misleading error messages? I did my first installation of SharePoint 2013 SP1 on Windows Server 2012 R2 today. After I installed the Reporting Services add-in and configured the Reporting Services service application, I got the following error when clicking System Settings on the Manage Reporting Services Application – Reporting Services Service Application page in Central Admin:

Throwing Microsoft.ReportingServices.Diagnostics.Utilities.InternalCatalogException: , Microsoft.ReportingServices.Diagnostics.Utilities.InternalCatalogException: An internal error occurred on the report server. See the error log for more details. ---> Microsoft.SharePoint.Upgrade.SPUpgradeCompatibilityException: There is a compatibility range mismatch between the Web server and database "SharePoint_AdminContent", and connections to the data have been blocked to due to this incompatibility. This can happen when a content database has not been upgraded to be within the compatibility range of the Web server, or if the database has been upgraded to a higher level than the web server. The Web server and the database must be upgraded to the same version and build level to return to compatibility range. 
 at Microsoft.SharePoint.Administration.SPPersistedUpgradableObject.ValidateBackwardsCompatibility() 
 at Microsoft.SharePoint.SPSite.PreinitializeServer(SPRequest request) 
 at Microsoft.SharePoint.SPWeb.InitializeSPRequest() 
 at Microsoft.SharePoint.SPWeb.EnsureSPRequest() 
 at Microsoft.SharePoint.SPWeb.InitWebPublic() 
 at Microsoft.SharePoint.SPWeb.get_AppInstanceId() 
 at Microsoft.SharePoint.SPWeb.get_Url() 
 at Microsoft.ReportingServices.SharePoint.Server.Utility.GetSPItemMetaDataAndContent(ItemSpecifier itemSpecifier, UserContext userContext, Boolean returnContent, Boolean wrapFileNotFoundOnOpenWeb, Byte[]& content) 
 at Microsoft.ReportingServices.SharePoint.Server.SharePointAuthorizationExtension.InternalCheckAccess(UserContext userContext, ExternalItemPath itemPath, RSSPBasePermissions requiredRights) 
 at Microsoft.ReportingServices.SharePoint.Server.SharePointAuthorizationExtension.CheckAccess(UserContext userContext, ExternalItemPath path, CatalogOperation requiredOperation) 
 at Microsoft.ReportingServices.Library.GetSystemPropertiesAction.PerformActionNow() 
 at Microsoft.ReportingServices.Library.RSSoapAction`1.Execute() -

Solution:
The user that you have configured as the service application account for Reporting Services does not have enough permissions on the SharePoint_AdminContent database. Adding the service application account to the SPDataAccess role of the SharePoint_AdminContent database seems to solve the problem.
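
For reference, a hedged sketch of how that role membership could be granted with Invoke-Sqlcmd from a machine that has sysadmin rights on the SharePoint SQL instance. The instance name, database name and service account below are placeholders, so use the actual names from your environment (the admin content database often has a GUID suffix):

# Add the Reporting Services service application account to the SPDataAccess role
$account  = "DOMAIN\svc_reporting"       # placeholder: your RS service application account
$database = "SharePoint_AdminContent"    # placeholder: your admin content database name
Invoke-Sqlcmd -ServerInstance "SQLSERVER\SHAREPOINT" -Database $database -Query @"
IF NOT EXISTS (SELECT 1 FROM sys.database_principals WHERE name = N'$account')
    CREATE USER [$account] FOR LOGIN [$account];
ALTER ROLE [SPDataAccess] ADD MEMBER [$account];
"@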

Installing mongodb on Raspberry Pi (using pre-compiled binaries)

For some time I have been struggling to install mongodb on a Raspberry Pi v2 (512 MB) by compiling it from source. This guide shows you how to download, install and configure a pre-compiled version of mongodb instead.

I am using Raspbian (Debian Wheezy):
Linux <hostname> 3.10.25+ #622 PREEMPT Fri Jan 3 18:41:00 GMT 2014 armv6l GNU/Linux

At first I could not find any suitable pre-compiled binaries and so tried several tutorials that involved compiling mongodb from source:
http://c-mobberley.com/wordpress/2013/10/14/raspberry-pi-mongodb-installation-the-working-guide/
and
http://mongopi.wordpress.com/2012/11/25/installation/

However, no matter what I did, the poor Raspberry seemed to die or crash each time I tried to compile the source, without any useful errors or explanations. It simply just died. I tried adjusting my overclocking settings and swap size, freeing up RAM etc., but it still would not work.

After days and hours of struggling with compiling the sources I decided to give up and went looking for a pre-compiled version instead, and found one that I got working:
https://github.com/brice-morin/ArduPi

Thank you Brice Morin!


Step-by-step guide

Update system

Run as root or with sudo:

apt-get update
apt-get upgrade

Download mongodb

I have put together a package with the binaries from Brice which you can download here or you can use git to fetch the files directly from his repository.
Download mongodb-rpi_20140207 binaries

Install and configure

Run as root or with sudo:

# Create a mongodb system user with no login shell
adduser --firstuid 100 --ingroup nogroup --shell /bin/false --disabled-password --gecos "" --no-create-home mongodb

cp -R mongodb-rpi/mongo /opt
chmod +x /opt/mongo/bin/*

mkdir /var/log/mongodb 
chown mongodb:nogroup /var/log/mongodb
mkdir /var/lib/mongodb
chown mongodb:nogroup /var/lib/mongodb

cp mongodb-rpi/debian/init.d /etc/init.d/mongod
cp mongodb-rpi/debian/mongodb.conf /etc/

ln -s /opt/mongo/bin/mongod /usr/bin/mongod
chmod u+x /etc/init.d/mongod

update-rc.d mongod defaults
/etc/init.d/mongod start

Browse to http://localhost:28017/ to see the status of mongodb.

Mongod Status

Enjoy!