<?xml version='1.0' encoding='UTF-8'?><rss xmlns:atom='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:blogger='http://schemas.google.com/blogger/2008' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0' version='2.0'><channel><atom:id>tag:blogger.com,1999:blog-3124192946915692460</atom:id><lastBuildDate>Wed, 17 Apr 2013 15:19:27 +0000</lastBuildDate><category>Configuresoft</category><category>Visual Studio</category><category>MOF/ITIL</category><category>Cloud Computing</category><category>Data Mining</category><category>SQL Server</category><category>EMC</category><category>SharePoint</category><category>MVP</category><category>Visual Studio Team System</category><category>Reporting Services</category><category>Big Data</category><category>Business Intelligence</category><category>TechEd</category><category>Statistical Analysis</category><category>The Cloud</category><category>PDC2008</category><category>Community</category><category>ALM</category><category>Scrum</category><category>SQL Azure</category><category>Windows Azure</category><category>Virtual Conference</category><category>Virtualization</category><category>Agile Development</category><category>General Ramblings</category><category>R</category><category>Decision Tree</category><title>Business Intelligence and Agile Development</title><description>Agile methods for the Database and BI Developer</description><link>http://blog.sqltrainer.com/</link><managingEditor>noreply@blogger.com (Ted Malone)</managingEditor><generator>Blogger</generator><openSearch:totalResults>188</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-3519820865243962858</guid><pubDate>Tue, 17 Jan 2012 21:41:00 
+0000</pubDate><atom:updated>2012-01-17T14:41:04.709-07:00</atom:updated><title>Installing and Configuring Apache Hadoop for Windows</title><description>&lt;p&gt;As mentioned in previous posts, I’ve been spending a lot of time recently working with “Big Data” technologies, and most recently with &lt;a href="http://hadoop.apache.org/" target="_blank"&gt;Apache Hadoop&lt;/a&gt; and associated distributed computing and analytics mechanisms.&lt;/p&gt;  &lt;h4&gt;What Exactly Is Hadoop?&lt;/h4&gt;  &lt;p&gt;If you haven’t been exposed to it, Apache Hadoop is an open-source framework that enables distributed processing of very large data sets. In a nutshell, Hadoop consists of a distributed filesystem and a framework that allows you to execute distributed MapReduce jobs.&lt;/p&gt;  &lt;p&gt;Hadoop has been developed for various flavors of Linux, but it is built in such a way that it can also run on Windows as long as you have a shell that can support running Linux commands and scripts. While I don’t think anyone would argue that you’d want to do this for a large-scale production environment, running Hadoop on Windows allows folks like me who are very comfortable with installing, using and maintaining Windows servers to also work with exciting technologies such as Hadoop.&lt;/p&gt;  &lt;p&gt;The purpose of this post is to demonstrate how you can install and configure a Hadoop cluster using Windows Server 2008 R2. In future posts I’ll demonstrate using Hadoop and the Hadoop Distributed File System (HDFS) to perform large-scale analytics on unstructured data (which is what it excels at!).&lt;/p&gt;  &lt;h4&gt;Preparing to Install Hadoop&lt;/h4&gt;  &lt;p&gt;The first thing you’ll need to do in order to install Hadoop is to prepare a clean Windows Server. While I suppose it’s not really necessary, from my perspective the fewer things that are installed on the server the better. 
In my case, I’m using Windows Server 2008 R2 with SP1 installed and nothing else. No roles are enabled and no extra software has been installed. &lt;/p&gt;  &lt;p&gt;Once you have a server ready (multiple servers if you want to install an actual Hadoop cluster) you’ll want to install &lt;a href="http://www.cygwin.com/" target="_blank"&gt;Cygwin&lt;/a&gt;. If you aren’t familiar with it, Cygwin is essentially a Unix-like environment, including a bash shell, for Windows. Cygwin is open source and is available as a free download from: &lt;a href="http://cygwin.com/install.html"&gt;http://cygwin.com/install.html&lt;/a&gt;. Keep in mind that this is just a web installer. To support Hadoop, we will need to ensure that we install the openssh package and its associated prerequisites. In order to do this, start the setup.exe program, select c:\cygwin as the root folder, and then click next. When you get to the screen that asks you to select a package, search for openssh and then click the “skip” text (not exactly intuitive, but it works) to enable the checkbox for install as shown below:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-GgBz4tXtfzs/TxXqtPEUqVI/AAAAAAAABCM/eS395iBLALw/s1600-h/image3.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-I-SHnQjawUE/TxXqtfX6jHI/AAAAAAAABCU/laFlD6J3XHQ/image_thumb1.png?imgmax=800" width="554" height="403" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Once you have selected the openssh package, click next and then answer “Yes” when asked if you want to install the package prerequisites. Click next and then finish the install wizard. 
This will take some time, so be patient.&lt;/p&gt;  &lt;p&gt;Once the install is complete, you’ll want to start the Cygwin terminal as administrator (right-click on the icon and select “run as administrator”). This will then set up your shell environment as follows:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-yeJoTxdpZpI/TxXqttPyDGI/AAAAAAAABCc/160QUbDwNKU/s1600-h/image7.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; margin: 0px 0px 5px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-6FBxX-xENgg/TxXqtyWEkbI/AAAAAAAABCk/okc6utQt4Pg/image_thumb3.png?imgmax=800" width="554" height="279" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Once Cygwin is installed and running properly, you’ll need to configure the ssh components in order for the Hadoop scripts to execute properly.&lt;/p&gt;  &lt;h4&gt;Configuring OpenSSH&lt;/h4&gt;  &lt;p&gt;The first step in configuring the ssh server is to run the configuration wizard. Do this by executing &lt;strong&gt;&lt;em&gt;ssh-host-config&lt;/em&gt;&lt;/strong&gt; from the Cygwin terminal window, which will start the wizard. 
Answer the questions as follows:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-02Lv525DA6k/TxXquH1qm0I/AAAAAAAABCs/jwg25CdK4lA/s1600-h/image%25255B20%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px 0px 5px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-VtXF1dT2SCQ/TxXqutah3xI/AAAAAAAABC0/wQGckND8-gI/image_thumb%25255B9%25255D.png?imgmax=800" width="554" height="304" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;(No to Privilege separation, Yes to install as a service, and CYGWIN as the value)&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-98MU6Y5CGCE/TxXqu6pbKAI/AAAAAAAABC8/UDNDPGpBZ50/s1600-h/image%25255B28%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-ZVHYPCgRCAM/TxXqvCKe_9I/AAAAAAAABDE/Spwc3wmipyE/image_thumb%25255B13%25255D.png?imgmax=800" width="554" height="304" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;(yes to different name, and sshd for the name, yes to new privileged user, and a password you can remember)&lt;/p&gt;  &lt;p&gt;Once the configuration is complete, open the Services control panel (start/administrative tools/services) and right-click on the Cygwin sshd Service and select start. 
It should start.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-SGlXea56YAc/TxXqvq9jm6I/AAAAAAAABDM/DIsrCiNU8U0/s1600-h/image%25255B32%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/-5RsSQuPsuTQ/TxXqv5hd0EI/AAAAAAAABDU/lwJLb55okEs/image_thumb%25255B15%25255D.png?imgmax=800" width="554" height="407" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;If the service doesn’t start, the most likely cause is that the ssh user was not created properly. You can manually create the user and then add it to the service startup and it should work just fine.&lt;/p&gt;  &lt;p&gt;Once the service is started, you can test it by entering the following command in the Cygwin terminal:&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;ssh localhost&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-vHMv9dRSpXE/TxXqwD6m6lI/AAAAAAAABDc/CUmj-W-GDok/s1600-h/image%25255B36%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-MOTUSwlzBEw/TxXqwhRoTsI/AAAAAAAABDk/B-S_bV7MVMU/image_thumb%25255B17%25255D.png?imgmax=800" width="554" height="302" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Answer “yes” when prompted about the fingerprint, and you should be ready to go. 
&lt;/p&gt;  &lt;p&gt;The next step in configuring ssh for use with Hadoop is to configure the authentication mechanisms (otherwise you’ll be typing your password a lot when running Hadoop commands).&lt;/p&gt;  &lt;p&gt;Since the Hadoop processes are all invoked via shell scripts and make use of ssh for all operations on the machine (including local operations), you’ll want to set up key-based authentication so that ssh doesn’t require a password every time it’s invoked. In order to do this, execute the following command in the Cygwin terminal:&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-X9JK0nz5Qws/TxXqwzMWS3I/AAAAAAAABDs/kj978sQFupI/s1600-h/image%25255B40%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/-Vp_gLiityZ0/TxXqxH8wYnI/AAAAAAAABD0/WFBARBjwLrM/image_thumb%25255B19%25255D.png?imgmax=800" width="554" height="302" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Once you have the key generated and saved, you’ll need to add its public half to the authorized keys file with the following command:&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;cat ~/.ssh/id_dsa.pub &amp;gt;&amp;gt; ~/.ssh/authorized_keys&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-O45dLBH7Rqg/TxXqxa27OXI/AAAAAAAABD8/Dmh2YSOnFFs/s1600-h/image%25255B46%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/-U9SR5oyAyeM/TxXqxu87iPI/AAAAAAAABEE/GvJ_uOUutCs/image_thumb%25255B23%25255D.png?imgmax=800" width="554" height="109" /&gt;&lt;/a&gt;&lt;/p&gt;  
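&lt;p&gt;As a rough sketch, the two key commands above can be run together from the Cygwin terminal, followed by a quick check that key-based login works. The chmod lines and the final BatchMode check are additions that are not part of the steps above; they assume the default ~/.ssh location and that you have already accepted the host fingerprint for localhost:&lt;/p&gt;  &lt;pre class="code"&gt;# Generate a DSA key pair with an empty passphrase (same command as above)&lt;br /&gt;ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa&lt;br /&gt;# Append the public key to the list of keys that are allowed to log in&lt;br /&gt;cat ~/.ssh/id_dsa.pub &amp;gt;&amp;gt; ~/.ssh/authorized_keys&lt;br /&gt;# Assumption: sshd may reject the key if these permissions are too open&lt;br /&gt;chmod 700 ~/.ssh&lt;br /&gt;chmod 600 ~/.ssh/authorized_keys&lt;br /&gt;# BatchMode disables password prompts, so this only succeeds with a working key&lt;br /&gt;ssh -o BatchMode=yes localhost echo ok&lt;br /&gt;&lt;/pre&gt;  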
&lt;p&gt;The cat command above appends the public key you just generated to the list of authorized keys for ssh.&lt;/p&gt;  &lt;p&gt;Now that ssh is properly configured, you can move on to installing and configuring Hadoop.&lt;/p&gt;  &lt;h4&gt;Downloading and installing Hadoop&lt;/h4&gt;  &lt;p&gt;Apache Hadoop is available for download from one of many mirror sites linked from the following page: &lt;a href="http://www.apache.org/dyn/closer.cgi/hadoop/common/"&gt;http://www.apache.org/dyn/closer.cgi/hadoop/common/&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Just choose one of the sites, and it should bring you to a page that looks similar to the following figure. Note that I am going to use version 1.0.0, which as of this writing is the latest, but the release train for Hadoop sometimes moves pretty fast, so there will likely be more releases available very soon.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-TtwS_OOEdjM/TxXqxxE7QXI/AAAAAAAABEM/-HqRqCj1Rps/s1600-h/image11.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; margin: 0px 0px 5px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-8jzr0obUjEU/TxXqyLqaQwI/AAAAAAAABEU/RF0USII9u70/image_thumb5.png?imgmax=800" width="554" height="604" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Click on the version you want to install (again, I am going to be using 1.0.0 for this post) and then download the appropriate file. 
In my case, that will be &lt;strong&gt;&lt;em&gt;hadoop-1.0.0-bin.tar.gz&lt;/em&gt;&lt;/strong&gt; as shown in the following figure:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-xF4liGPWlcM/TxXqyrUeMgI/AAAAAAAABEc/0n1aWZ2gmYE/s1600-h/image15.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-aQsxPInSRcU/TxXqy4GjqfI/AAAAAAAABEk/84wovB52-qo/image_thumb7.png?imgmax=800" width="554" height="550" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Once you have the file downloaded, you’ll want to use a program such as &lt;a href="http://www.rarlab.com/" target="_blank"&gt;Winrar&lt;/a&gt; which can understand the .tar.gz file format. &lt;/p&gt;  &lt;p&gt;The actual install process is very simple. You simply open the downloaded file in Winrar (or other program that can understand the format) and extract all files to c:\cygwin\usr\local as shown below:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-oaMTRCWNnN8/TxXqzdynUmI/AAAAAAAABEs/lOLItMk4TxA/s1600-h/image19.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/-B0PtUevxBvM/TxXqzi8vdiI/AAAAAAAABE0/aaRwrkf_ZJg/image_thumb9.png?imgmax=800" width="554" height="400" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Once you are done, you’ll want to rename the c:\cygwin\usr\local\hadoop-1.0.0 folder to hadoop. 
(This just makes things easier, as you’ll see once we start configuring and testing Hadoop.)&lt;/p&gt;  &lt;p&gt;You will also need the latest version of the Java SDK installed, which can be downloaded from &lt;a href="http://www.java.com"&gt;http://www.java.com&lt;/a&gt; (make sure you download and install the SDK and not the runtime, as you’ll need the server JVM). To make things easier, you can change the target folder for the Java install to c:\java (this is not a requirement, but I find it easier to use than the default path, which requires you to escape all of the spaces and parens when adding the folder to the configuration files).&lt;/p&gt;  &lt;h4&gt;Configuring Hadoop&lt;/h4&gt;  &lt;p&gt;For the purposes of this post, I’m just going to configure a single-node Hadoop cluster. This may seem counter-intuitive, but the point here is that we get a single node up and running properly and then we can add additional nodes later. &lt;/p&gt;  &lt;p&gt;One of the key configuration needs for Hadoop is the location where it can find the Java Runtime. Note that the configuration files we’re going to work with are Unix/Linux files and thus don’t look very good (or work very well) when you use a standard Windows text editor like Notepad. To keep things simple, use a text editor that supports Unix formats such as &lt;a href="http://liquidninja.com/metapad/" target="_blank"&gt;MetaPad&lt;/a&gt;. Assuming that you extracted the Hadoop files as described above and renamed the root folder to hadoop, open the C:\cygwin\usr\local\hadoop\etc\hadoop\hadoop-env.sh file. Locate the line that contains JAVA_HOME, remove the # in front of it, and replace the folder with the location where you installed the Java SDK (in my example, C:\java\jre). Note that this is a Unix file, so special characters must be escaped with a “\”; in my case the path is c:\\java\\jre.
&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-YGXZz1ppXXg/TxXqz2ao-PI/AAAAAAAABE8/oYy5mzP-9as/s1600-h/image%25255B86%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-4smRmzP06Lw/TxXq0XgarcI/AAAAAAAABFE/BCvwefcr_CY/image_thumb%25255B43%25255D.png?imgmax=800" width="554" height="460" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;This is really all that needs to change in this file, but if you want to know more about the contents you can check out the Hadoop documentation here: &lt;a href="http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html"&gt;http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html&lt;/a&gt; (&lt;strong&gt;&lt;em&gt;Note that this guide is based on the 0.20.2 release and *not* the 1.0.0 release I am detailing here. The docs haven’t quite caught up with the release at the time of this posting&lt;/em&gt;&lt;/strong&gt;)&lt;/p&gt;  &lt;p&gt;Once you save and close that file, you can verify that Hadoop runs properly by executing the following command inside of Cygwin. 
(Make sure you change to the /usr/local/hadoop directory first.)&lt;/p&gt;  &lt;p&gt;&lt;em&gt;&lt;strong&gt;bin/hadoop version&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-lKAwEqJ9rws/TxXq0j_40lI/AAAAAAAABFM/QRAaN7CpCB4/s1600-h/image%25255B8%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-BNbRleYTRNs/TxXq06xXYII/AAAAAAAABFU/3dgMzpo2iYM/image_thumb%25255B3%25255D.png?imgmax=800" width="554" height="306" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;You should see a Cygwin warning about MSDOS file paths, followed by the Hadoop version and Subversion information as shown in the figure above. If you do not see similar output, you likely do not have the path to your Java home set correctly. If you installed Java in the default path, remember that all spaces, backslashes and parens must be escaped first. The default path would look like: &lt;em&gt;&lt;strong&gt;C:\\Program\ Files\ \(x86\)\\Javaxxxx&lt;/strong&gt;&lt;/em&gt; (you get the idea and probably understand now why I said it would be better to put it in a simple folder).&lt;/p&gt;  &lt;p&gt;Once you have the basic Hadoop configuration working, the next step is to configure the site environment settings. This is done via the C:\cygwin\usr\local\hadoop\etc\hadoop\hdfs-site.xml file. Again, open this file with MetaPad or a similar editor, and add the following configuration items to the file. 
Of course replace “hadoop2” with your host name as appropriate:&lt;/p&gt;  &lt;pre class="code"&gt;&lt;span style="color: blue"&gt;&amp;lt;?&lt;/span&gt;&lt;span style="color: #a31515"&gt;xml &lt;/span&gt;&lt;span style="color: red"&gt;version&lt;/span&gt;&lt;span style="color: blue"&gt;=&lt;/span&gt;&amp;quot;&lt;span style="color: blue"&gt;1.0&lt;/span&gt;&amp;quot; &lt;span style="color: red"&gt;encoding&lt;/span&gt;&lt;span style="color: blue"&gt;=&lt;/span&gt;&amp;quot;&lt;span style="color: blue"&gt;utf-8&lt;/span&gt;&amp;quot;&lt;span style="color: blue"&gt;?&amp;gt;&lt;br /&gt;&amp;lt;?&lt;/span&gt;&lt;span style="color: #a31515"&gt;xml-stylesheet &lt;/span&gt;&lt;span style="color: gray"&gt;type=&amp;quot;text/xsl&amp;quot; href=&amp;quot;configuration.xsl&amp;quot;&lt;/span&gt;&lt;span style="color: blue"&gt;?&amp;gt;&lt;br /&gt;&amp;lt;!-- &lt;/span&gt;&lt;span style="color: green"&gt;Put site-specific property overrides in this file. &lt;/span&gt;&lt;span style="color: blue"&gt;--&amp;gt;&lt;br /&gt;&amp;lt;&lt;/span&gt;&lt;span style="color: #a31515"&gt;configuration&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;  &amp;lt;&lt;/span&gt;&lt;span style="color: #a31515"&gt;property&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;    &amp;lt;&lt;/span&gt;&lt;span style="color: #a31515"&gt;name&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;/span&gt;fs.default.name&lt;span style="color: blue"&gt;&amp;lt;/&lt;/span&gt;&lt;span style="color: #a31515"&gt;name&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;    &amp;lt;&lt;/span&gt;&lt;span style="color: #a31515"&gt;value&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;/span&gt;hdfs://hadoop2:47110&lt;span style="color: blue"&gt;&amp;lt;/&lt;/span&gt;&lt;span style="color: #a31515"&gt;value&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;  &amp;lt;/&lt;/span&gt;&lt;span style="color: #a31515"&gt;property&lt;/span&gt;&lt;span 
style="color: blue"&gt;&amp;gt;&lt;br /&gt;  &amp;lt;&lt;/span&gt;&lt;span style="color: #a31515"&gt;property&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;    &amp;lt;&lt;/span&gt;&lt;span style="color: #a31515"&gt;name&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;/span&gt;mapred.job.tracker&lt;span style="color: blue"&gt;&amp;lt;/&lt;/span&gt;&lt;span style="color: #a31515"&gt;name&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;    &amp;lt;&lt;/span&gt;&lt;span style="color: #a31515"&gt;value&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;/span&gt;hadoop2:47111&lt;span style="color: blue"&gt;&amp;lt;/&lt;/span&gt;&lt;span style="color: #a31515"&gt;value&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;  &amp;lt;/&lt;/span&gt;&lt;span style="color: #a31515"&gt;property&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;  &amp;lt;&lt;/span&gt;&lt;span style="color: #a31515"&gt;property&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;    &amp;lt;&lt;/span&gt;&lt;span style="color: #a31515"&gt;name&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;/span&gt;dfs.replication&lt;span style="color: blue"&gt;&amp;lt;/&lt;/span&gt;&lt;span style="color: #a31515"&gt;name&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;    &amp;lt;&lt;/span&gt;&lt;span style="color: #a31515"&gt;value&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;/span&gt;2&lt;span style="color: blue"&gt;&amp;lt;/&lt;/span&gt;&lt;span style="color: #a31515"&gt;value&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;  &amp;lt;/&lt;/span&gt;&lt;span style="color: #a31515"&gt;property&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;&amp;lt;/&lt;/span&gt;&lt;span style="color: #a31515"&gt;configuration&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;&lt;/span&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;This 
basically configures the filesystem and job tracker on the local host (in my case it’s named hadoop2), and sets the DFS replication factor to 2, meaning that each block is stored as two copies. You can read about this file and its values here: &lt;a href="http://wiki.apache.org/hadoop/HowToConfigure"&gt;http://wiki.apache.org/hadoop/HowToConfigure&lt;/a&gt; (&lt;strong&gt;&lt;em&gt;again remember that the docs are outdated for my particular installation, but they still work&lt;/em&gt;&lt;/strong&gt;)&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;We will also need to configure the mapred-site.xml file to specify the configuration for the MapReduce service:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="code"&gt;&lt;span style="color: blue"&gt;&amp;lt;?&lt;/span&gt;&lt;span style="color: #a31515"&gt;xml &lt;/span&gt;&lt;span style="color: red"&gt;version&lt;/span&gt;&lt;span style="color: blue"&gt;=&lt;/span&gt;&amp;quot;&lt;span style="color: blue"&gt;1.0&lt;/span&gt;&amp;quot;&lt;span style="color: blue"&gt;?&amp;gt;&lt;br /&gt;&amp;lt;?&lt;/span&gt;&lt;span style="color: #a31515"&gt;xml-stylesheet &lt;/span&gt;&lt;span style="color: gray"&gt;type=&amp;quot;text/xsl&amp;quot; href=&amp;quot;configuration.xsl&amp;quot;&lt;/span&gt;&lt;span style="color: blue"&gt;?&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;!-- &lt;/span&gt;&lt;span style="color: green"&gt;Put site-specific property overrides in this file. 
&lt;/span&gt;&lt;span style="color: blue"&gt;--&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;&lt;/span&gt;&lt;span style="color: #a31515"&gt;configuration&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;  &amp;lt;&lt;/span&gt;&lt;span style="color: #a31515"&gt;property&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;&lt;br /&gt;    &amp;lt;&lt;/span&gt;&lt;span style="color: #a31515"&gt;name&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;/span&gt;mapred.job.tracker&lt;span style="color: blue"&gt;&amp;lt;/&lt;/span&gt;&lt;span style="color: #a31515"&gt;name&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;    &amp;lt;&lt;/span&gt;&lt;span style="color: #a31515"&gt;value&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;/span&gt;hadoop2:8021&lt;span style="color: blue"&gt;&amp;lt;/&lt;/span&gt;&lt;span style="color: #a31515"&gt;value&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;  &amp;lt;/&lt;/span&gt;&lt;span style="color: #a31515"&gt;property&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;&amp;lt;/&lt;/span&gt;&lt;span style="color: #a31515"&gt;configuration&lt;/span&gt;&lt;span style="color: blue"&gt;&amp;gt;&lt;br /&gt;&lt;/span&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;This basically configures the MapReduce job tracker to use port 8021 on the local host. There is a LOT more to both of these configuration files; I’m only presenting the basics to get things up and running here.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Now that we have the basic environment set up, we need to format the HDFS filesystem that Hadoop will use. In the configuration file above, I did not specify a location for the DFS files. This means that they will be stored in /tmp. 
This is OK for our testing and for a small cluster, but for production systems you’ll want to make sure that you specify the location by using the dfs.name.dir and dfs.data.dir configuration items (which you can read about in the link I provided above).&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;To format the filesystem, enter the following command in Cygwin:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;em&gt;&lt;strong&gt;bin/hadoop namenode -format&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh3.ggpht.com/-3zgEnAUcWWc/TxXq1PiBoJI/AAAAAAAABFc/MQAsWCvG8Lc/s1600-h/image%25255B12%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/-XluTUAw1J5w/TxXq1lFW0QI/AAAAAAAABFk/FUWwNDW41Vk/image_thumb%25255B5%25255D.png?imgmax=800" width="554" height="304" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;If you have properly configured the hdfs-site.xml file, you should see output that is similar to the above. Note that in my case the default folder location is &lt;em&gt;&lt;strong&gt;/tmp/hadoop-tmalone/dfs/name&lt;/strong&gt;&lt;/em&gt;. Remember this directory name, as you will need it later.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Now that we have the configuration in place and the filesystem formatted, we can start the Hadoop subsystems. The first thing we’ll want to do is start the DFS subsystem. 
Do this with the following command in the Cygwin terminal:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;sbin/start-dfs.sh &lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/-0Q-Lmkk9TX8/TxXq1spVm2I/AAAAAAAABFs/5gA1LKVRvcU/s1600-h/image%25255B50%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-jns4EqtGkXw/TxXq2PjeAGI/AAAAAAAABF0/GuOJUn2QY-A/image_thumb%25255B25%25255D.png?imgmax=800" width="554" height="304" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;This should take a few moments, and you should see output as shown in the figure above. Note that the logs are stored in &lt;strong&gt;&lt;em&gt;/usr/local/hadoop/logs&lt;/em&gt;&lt;/strong&gt;. You can verify that DFS is running by examining the namenode log:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh3.ggpht.com/-rnlDEUtaYR8/TxXq27kOZSI/AAAAAAAABF8/uBkB8kCh2A0/s1600-h/image%25255B54%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/-ZS9NeUnSW2I/TxXq3vGH3ZI/AAAAAAAABGE/c2JmCcXiNvo/image_thumb%25255B27%25255D.png?imgmax=800" width="554" height="327" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;You can also verify that the DFS service is running by checking the monitor (assuming you used the ports as I described them in the hdfs-site.xml configuration above). 
To check the monitor, open a web browser and navigate to the following site:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://localhost:50070"&gt;http://localhost:50070&lt;/a&gt; &lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh6.ggpht.com/-Cgj1Pc2yxiw/TxXq3_ct2hI/AAAAAAAABGM/FtR-8TUgecA/s1600-h/image%25255B90%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-DFEm1AZYmxQ/TxXq4TsBB8I/AAAAAAAABGU/ZJkDBMRVoR4/image_thumb%25255B45%25255D.png?imgmax=800" width="554" height="525" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;If the DFS service is properly running, you will see a status screen like the above. If you get an error when attempting to open that page, it is likely that DFS is not running and you will need to check the log file to determine what has gone wrong. 
The most common problem is a misconfiguration of the hdfs-site.xml file, so double-check that the file is correct.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Once DFS is up and running, you can start the MapReduce process as follows:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;sbin/start-mapred.sh&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/-r0SVW8ofZhc/TxXq4Vq-UVI/AAAAAAAABGc/-zadqTWgpKU/s1600-h/image%25255B66%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-TYopiQXGRFU/TxXq4rGRYbI/AAAAAAAABGk/i-TEeUniwVY/image_thumb%25255B33%25255D.png?imgmax=800" width="554" height="121" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;You can test that the MapReduce process is running by checking the monitor. Open a web browser and navigate to:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://localhost:50030"&gt;http://localhost:50030&lt;/a&gt; &lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh6.ggpht.com/-rwB14hZPN_Y/TxXq5LLs4CI/AAAAAAAABGs/hZpKsZ33qz8/s1600-h/image%25255B94%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-Fvme4kUcrTg/TxXq5vLMsUI/AAAAAAAABG0/4TG7qC0lEu4/image_thumb%25255B47%25255D.png?imgmax=800" width="554" height="410" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Now we have success! 
We’ve installed and configured Hadoop, formatted the DFS filesystem, and started the basic processes necessary to use the power of Hadoop for processing!&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;h4&gt;Testing the installation&lt;/h4&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Each Hadoop distribution comes with a set of samples that can be used to verify that the system is functional. The samples are stored in Java “jar” files and are located in the hadoop/share/hadoop directory. One simple test would be to copy some text files into DFS and then use the sample MapReduce job to enumerate them. First though, you will likely want to set up an alias to make entering the commands a little easier. In my case, I will alias the hadoop dfs command to simply “hdfs”. To accomplish this, type the following command in the Cygwin terminal window:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;alias hdfs="bin/hadoop dfs"&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/-mmdYwrErvnc/TxXq50Uc3jI/AAAAAAAABG8/zoqmCBNwGEw/s1600-h/image%25255B74%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/-aFEqlwEuj1w/TxXq6OQcxyI/AAAAAAAABHE/uGEQ_DR_gkc/image_thumb%25255B37%25255D.png?imgmax=800" width="554" height="89" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;For the first part of our test, we will copy the configuration files from the hadoop directory into DFS. 
In order to do this, we will use the dfs -put command (for more information on the put command, see the docs here: &lt;a href="http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html"&gt;http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html&lt;/a&gt;; again, remember that the docs are a little behind).&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;In the Cygwin terminal window (still in the /usr/local/hadoop directory) execute the following command:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;hdfs -put etc/hadoop conf&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh6.ggpht.com/-_gcJZz2kWZk/TxXq6LN0IOI/AAAAAAAABHQ/xNVvwWN8ol0/s1600-h/image%25255B78%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/-CY-K4DAEJ7E/TxXq6sUKOrI/AAAAAAAABHU/j1YfBXuts7g/image_thumb%25255B39%25255D.png?imgmax=800" width="554" height="91" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;This will load all of the files in the /usr/local/hadoop/etc/hadoop directory into HDFS in the conf folder. Since the conf folder doesn’t exist, the -put command will create it. 
Note that you receive a warning about the platform, but that is OK; the files will still copy.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;You can verify that the files were copied by using the dfs -ls command as follows:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;hdfs -ls conf&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/-EOYIqzBL418/TxXq7MFDgtI/AAAAAAAABHg/5Ed0A7IBWKw/s1600-h/image%25255B82%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-niuw0l5W-Ag/TxXq7fPmlRI/AAAAAAAABHo/ROHbzfeMnGo/image_thumb%25255B41%25255D.png?imgmax=800" width="554" height="304" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Now that we are sure the files are stored in HDFS, we can use one of the examples that is shipped with Hadoop to analyze the text in the files for a certain pattern. The samples are located in the /usr/local/hadoop/share/hadoop folder, and it’s easiest to change to that folder and execute the sample there. In the Cygwin terminal, execute the following command:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;cd /usr/local/hadoop/share/hadoop&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Once we’re in the folder, we can run a simple IO test to determine how well our cluster DFS IO will perform. In my case, since I’m running this on a VM with a slow disk, I don’t expect much out of the cluster, but it’s a very nice way to test that DFS is indeed functioning as it should. 
Execute the following command to test DFS IO:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;../../bin/hadoop jar hadoop-test-1.0.0.jar TestDFSIO -write -nrFiles 10 -fileSize 1000&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh4.ggpht.com/-21crASTYcEA/TxXq7t_kRiI/AAAAAAAABHw/aB8rbVrzb7o/s1600-h/image%25255B102%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/-OhpO9s4wZnk/TxXq70EqpSI/AAAAAAAABH4/O7lhS4A-uwc/image_thumb%25255B51%25255D.png?imgmax=800" width="554" height="142" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;If you don’t see any exceptions in the output, you have successfully configured Hadoop and the DFS cluster is operational.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;h4&gt;Conclusion&lt;/h4&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;While it isn’t exactly a simple process, you can indeed get Apache Hadoop up and running on a Windows platform. I’ve taken the path of configuring Hadoop with Cygwin as the shell, but there are those who claim to have installed and configured Hadoop on Windows without the use of Cygwin. 
Either way, I’m just glad it works and folks who don’t want to invest in building a Linux environment have the ability to play around with and use Hadoop.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2012/01/installing-and-configuring-apache.html</link><author>noreply@blogger.com (Ted Malone)</author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/-I-SHnQjawUE/TxXqtfX6jHI/AAAAAAAABCU/laFlD6J3XHQ/s72-c/image_thumb1.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-6450073502124042302</guid><pubDate>Fri, 30 Dec 2011 23:13:00 +0000</pubDate><atom:updated>2011-12-30T16:13:30.213-07:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Decision Tree</category><category domain='http://www.blogger.com/atom/ns#'>Data Mining</category><category domain='http://www.blogger.com/atom/ns#'>SQL Server</category><category domain='http://www.blogger.com/atom/ns#'>R</category><title>Fun with Decision Trees using R and SQL Server</title><description>&lt;p&gt;As those who have been reading this blog know, I’ve recently been spending a lot of time doing statistical analysis with R and SQL Server and have really been enjoying digging in to the various bits of functionality that is offered. &lt;/p&gt;  &lt;p&gt;One thing I’ve learned over the years is that I am a very “classification-oriented” individual when it comes to working with data. I work best with data sets that I understand and that I’ve been able to sort into the various buckets that make sense to me based on what I’m trying to accomplish with the data.&lt;/p&gt;  &lt;p&gt;The problem with this need for nice and tidy data classification is that it doesn’t work well when you don’t really have a complete understanding of the data set you’re working with. 
This especially becomes an issue if you’re trying to mine the data you have in order to predict future outcomes based on past performance (a requirement that is becoming more and more important to “Data Scientists” as more and more organizations make the shift to true data-driven processes).&lt;/p&gt;  &lt;h4&gt;Understanding Data&lt;/h4&gt;  &lt;p&gt;If you read my &lt;a href="http://blog.sqltrainer.com/2011/12/building-large-real-world-sql-server.html" target="_blank"&gt;previous post&lt;/a&gt; on creating a large SQL Server database, you’ve seen some of the data that I am playing around with. Obviously there is a lot of interesting information locked within the mortgage loan application data stored in the HMDA database. One specific use-case with this data might be to look at some of the deciding factors related to home loans being granted and create some predictive models based on the data. For example, I might want to predict the outcome of a home loan application process based on factors like the purpose of the loan, the income of the applicant, and some other factors such as race. This could be very useful for a mortgage company deciding who to target for an ad campaign, or for researching questions such as whether race or sex has any correlation with the amount of a home loan granted. In order to answer this use case, we need to have a good understanding of the data we’re working with and see what it looks like when shaped into a “Decision Tree” (for more information on what exactly a decision tree is, take a look at: &lt;a href="http://en.wikipedia.org/wiki/Decision_tree"&gt;http://en.wikipedia.org/wiki/Decision_tree&lt;/a&gt;).&lt;/p&gt;  &lt;h4&gt;Creating a Data Set to Work With&lt;/h4&gt;  &lt;p&gt;If we want to use the home mortgage disclosure data to look at past performance of home loans, the first thing we need to do is create a manageable data set to use to create a model. 
Since I live in El Paso County, Colorado, I’ll create a table of data that just details information on home loans in this area. Given the database that was created earlier (see &lt;a href="http://blog.sqltrainer.com/2011/12/building-large-real-world-sql-server.html" target="_blank"&gt;previous post&lt;/a&gt;), we can create a subset of the data with the following query:&lt;/p&gt;  &lt;pre class="code"&gt;&lt;span style="color: blue"&gt;SELECT &lt;br /&gt;    &lt;/span&gt;&lt;span style="color: teal"&gt;loan_purpose &lt;/span&gt;&lt;span style="color: blue"&gt;AS &lt;/span&gt;&lt;span style="color: teal"&gt;[purpose]&lt;br /&gt;    &lt;/span&gt;&lt;span style="color: gray"&gt;,&lt;/span&gt;&lt;span style="color: teal"&gt;loan_amount &lt;/span&gt;&lt;span style="color: blue"&gt;AS &lt;/span&gt;&lt;span style="color: teal"&gt;[amount]&lt;br /&gt;    &lt;/span&gt;&lt;span style="color: gray"&gt;,&lt;/span&gt;&lt;span style="color: blue"&gt;CASE WHEN &lt;/span&gt;&lt;span style="color: teal"&gt;applicant_race &lt;/span&gt;&lt;span style="color: gray"&gt;= &lt;/span&gt;&lt;span style="color: red"&gt;'White' &lt;/span&gt;&lt;span style="color: blue"&gt;THEN &lt;/span&gt;&lt;span style="color: teal"&gt;applicant_race &lt;/span&gt;&lt;span style="color: blue"&gt;ELSE &lt;/span&gt;&lt;span style="color: red"&gt;'Non-White' &lt;/span&gt;&lt;span style="color: blue"&gt;END AS &lt;/span&gt;&lt;span style="color: teal"&gt;[race]&lt;br /&gt;    &lt;/span&gt;&lt;span style="color: gray"&gt;,&lt;/span&gt;&lt;span style="color: teal"&gt;applicant_sex &lt;/span&gt;&lt;span style="color: blue"&gt;AS &lt;/span&gt;&lt;span style="color: teal"&gt;[sex]&lt;br /&gt;    &lt;/span&gt;&lt;span style="color: gray"&gt;,&lt;/span&gt;&lt;span style="color: teal"&gt;applicant_income &lt;/span&gt;&lt;span style="color: blue"&gt;AS &lt;/span&gt;&lt;span style="color: teal"&gt;[income]&lt;br /&gt;    &lt;/span&gt;&lt;span style="color: gray"&gt;,&lt;/span&gt;&lt;span style="color: blue"&gt;CASE WHEN 
&lt;/span&gt;&lt;span style="color: teal"&gt;denial_reason &lt;/span&gt;&lt;span style="color: gray"&gt;= &lt;/span&gt;&lt;span style="color: red"&gt;'Approved' &lt;/span&gt;&lt;span style="color: blue"&gt;THEN &lt;/span&gt;&lt;span style="color: teal"&gt;denial_reason &lt;/span&gt;&lt;span style="color: blue"&gt;ELSE &lt;/span&gt;&lt;span style="color: red"&gt;'Denied' &lt;/span&gt;&lt;span style="color: blue"&gt;END AS &lt;/span&gt;&lt;span style="color: teal"&gt;[status]&lt;br /&gt;&lt;/span&gt;&lt;span style="color: blue"&gt;INTO &lt;br /&gt;    &lt;/span&gt;&lt;span style="color: teal"&gt;tblElPasoCountyLoanStatus&lt;br /&gt;&lt;/span&gt;&lt;span style="color: blue"&gt;FROM&lt;br /&gt;    &lt;/span&gt;&lt;span style="color: teal"&gt;vColoradoLoans&lt;br /&gt;&lt;/span&gt;&lt;span style="color: blue"&gt;WHERE &lt;/span&gt;&lt;span style="color: teal"&gt;county&lt;/span&gt;&lt;span style="color: gray"&gt;=&lt;/span&gt;&lt;span style="color: red"&gt;'El Paso County'&lt;br /&gt;&lt;/span&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;What this will do is create a simple table that has six columns. I’ve simplified the data slightly so that two of the factors (race and denial reason) are binary values rather than multi-valued categories. While this is not a necessary transformation, it helps simplify the output for the purposes of this discussion.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;h4&gt;Loading the Appropriate Packages and Libraries in R&lt;/h4&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Once I’ve created the table above in SQL Server, I can load the data into R and begin analysis. Before we get too deep into the creation of the decision tree, I should mention that I am going to use a package called “rpart” as well as a package called “rpart.plot”. 
If you are really curious and would like to know the science behind the rpart package, there is a very dry document here that explains the algorithm in detail: &lt;a href="http://www.mayo.edu/hsr/techrpt/61.pdf"&gt;http://www.mayo.edu/hsr/techrpt/61.pdf&lt;/a&gt;&amp;#160;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;The rpart package can be found here: &lt;a title="http://cran.stat.ucla.edu/web/packages/rpart/index.html" href="http://cran.stat.ucla.edu/web/packages/rpart/index.html"&gt;http://cran.stat.ucla.edu/web/packages/rpart/index.html&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;and rpart.plot can be found here: &lt;a title="http://cran.stat.ucla.edu/web/packages/rpart.plot/index.html" href="http://cran.stat.ucla.edu/web/packages/rpart.plot/index.html"&gt;http://cran.stat.ucla.edu/web/packages/rpart.plot/index.html&lt;/a&gt;&amp;#160;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Of course, you don’t really need to know exactly where the packages are; you can install them from within R as follows:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh6.ggpht.com/-OQYY6WnWS-s/Tv5FgQfaC1I/AAAAAAAAA_E/yFsX3doOC0k/s1600-h/image%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px 0px 5px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/-2U9TfUCm_bA/Tv5FguOR8pI/AAAAAAAAA_M/Pn168niNsq0/image_thumb%25255B1%25255D.png?imgmax=800" width="554" height="52" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;When you execute the above command in the R console, you will be prompted for the mirror site you wish to use, and the package will be downloaded, unpacked and installed. 
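&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;For readers who cannot see the screenshots, the install and load steps might be typed roughly as follows. This is a minimal sketch rather than a transcript of the screenshots; it assumes both packages are fetched from CRAN by name in a single call.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="code"&gt;# Assumption: install both packages from CRAN in one call (R prompts for a mirror)&lt;br /&gt;install.packages(c("rpart", "rpart.plot"))&lt;br /&gt;&lt;br /&gt;# Load both packages into the current R session&lt;br /&gt;library(rpart)&lt;br /&gt;library(rpart.plot)&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;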
Once the packages are installed, you can load them into your R environment as follows:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh3.ggpht.com/-6Uzb95DcKg0/Tv5Fg6tpJJI/AAAAAAAAA_U/mBJSmqCM4e8/s1600-h/image%25255B7%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-XWsOP9ZCglI/Tv5FhCghocI/AAAAAAAAA_c/bvP8iQ1lL9c/image_thumb%25255B3%25255D.png?imgmax=800" width="554" height="48" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;h4&gt;Connecting to SQL Server Data&lt;/h4&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Once the libraries are loaded, we need to obtain the data we’re going to work with. If you haven’t worked with SQL Server data in R before, you might want to read my &lt;a href="http://blog.sqltrainer.com/2011/12/statistical-analysis-with-r-and.html" target="_blank"&gt;previous post&lt;/a&gt; on connecting R to SQL Server via the ODBC library. First we need to set up an ODBC channel to connect to the table we created above. 
This is done via the following command:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh6.ggpht.com/-Fk8Ea-hxCc4/Tv5FhO3vv3I/AAAAAAAAA_k/JTEWQmjIk6A/s1600-h/image%25255B15%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px 0px 5px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-dZ_rsdE8Ag0/Tv5FhU4JHDI/AAAAAAAAA_s/42hNfrzUh9k/image_thumb%25255B9%25255D.png?imgmax=800" width="554" height="91" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;(Remember that “HMDAData” is the name of the ODBC DSN I created to connect to my database)&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Once the channel is created, I can load the data from the table via the sqlFetch command in R as follows:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/-sy7NPjpRHcI/Tv5FhoQsdwI/AAAAAAAAA_0/PplaWjjsX5I/s1600-h/image%25255B19%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/-PwpX13_XP7I/Tv5Fh-WQMfI/AAAAAAAAA_8/wDzLz3DL_Mg/image_thumb%25255B11%25255D.png?imgmax=800" width="554" height="25" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;h4&gt;Examining the Data&lt;/h4&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;This loads the data from the table into the R variable “loanstatus”. 
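&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Taken together, the connect and fetch steps above can be sketched as follows. This is an illustration under a few assumptions, not a copy of the screenshots: it assumes the ODBC functions come from the RODBC package (which is where sqlFetch lives), and the variable name “channel” is simply one I picked for the example. The DSN name, table name and “loanstatus” variable are the ones used in this post.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="code"&gt;# Assumption: the ODBC functions come from the RODBC package&lt;br /&gt;library(RODBC)&lt;br /&gt;&lt;br /&gt;# Open a channel through the "HMDAData" DSN ("channel" is an illustrative name)&lt;br /&gt;channel &amp;lt;- odbcConnect("HMDAData")&lt;br /&gt;&lt;br /&gt;# Read the whole table into a data frame&lt;br /&gt;loanstatus &amp;lt;- sqlFetch(channel, "tblElPasoCountyLoanStatus")&lt;br /&gt;&lt;br /&gt;# Close the channel once the data is in memory&lt;br /&gt;odbcClose(channel)&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;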
You can view a summary of the loanstatus as follows:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/-ysyx_spXEDk/Tv5Fi0DrclI/AAAAAAAABAE/Usw_okGWm9Y/s1600-h/image%25255B23%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-ju6Rs-hTwc4/Tv5FjVjlB5I/AAAAAAAABAM/_IEZmO-vIF8/image_thumb%25255B13%25255D.png?imgmax=800" width="554" height="256" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Already you can see that we’ve extracted some good information out of the database. (As a side note, I think this is &lt;strong&gt;VERY COOL&lt;/strong&gt;! Think of all the SQL queries I’d have to run to get this same information from the table I created earlier.)&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Now that we have the data in an R variable, we can start to create a decision tree. Since we ultimately want to use this data to predict the amount of a loan based on certain factors such as race, sex and income, we’ll create a regression tree shaped to those parameters. We will use the rpart function in order to create the tree. 
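&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Before going through the parameters, here is a rough sketch of the fit, inspect and plot sequence that the rest of this section works through. Treat it as an approximation: the exact formula is my guess based on the columns of the table created earlier and the splits discussed below, and I am assuming the plot is drawn with the prp function from rpart.plot, since type=4 and extra=1 are options it accepts.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="code"&gt;# Assumed formula: predict loan amount from the other columns of interest&lt;br /&gt;tree &amp;lt;- rpart(amount ~ race + sex + income + purpose, data = loanstatus, method = "anova")&lt;br /&gt;&lt;br /&gt;# Text summary of the fitted tree, node by node&lt;br /&gt;summary(tree)&lt;br /&gt;&lt;br /&gt;# Chart of cross-validated error against cp and tree size&lt;br /&gt;plotcp(tree)&lt;br /&gt;&lt;br /&gt;# Assumed plotting call: label all nodes, show observation counts&lt;br /&gt;prp(tree, type = 4, extra = 1)&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;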
rpart is a very simple function to call, and accepts the following parameters:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;  &lt;li&gt;&lt;strong&gt;formula&lt;/strong&gt; – The formula is specified in the following format: &lt;em&gt;outcome&lt;/em&gt; ~ &lt;em&gt;predictor1&lt;/em&gt; + &lt;em&gt;predictor2&lt;/em&gt; + &lt;em&gt;predictor3&lt;/em&gt; etc.&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;  &lt;li&gt;&lt;strong&gt;data&lt;/strong&gt; – The specific data frame to use&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;  &lt;li&gt;&lt;strong&gt;method&lt;/strong&gt; – “class” for a classification tree or “anova” for a regression tree&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Remember that within R, you can always type ? &amp;lt;&lt;em&gt;function&lt;/em&gt;&amp;gt; to get a full description of a particular function.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;In our case, the rpart command would look like this:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/-weP6zEzpS3A/Tv5Fjq4wplI/AAAAAAAABAU/3wDX-UflDBM/s1600-h/image%25255B27%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-UukvUWqZzMQ/Tv5Fjzc_isI/AAAAAAAABAc/f3AXmKyt3O4/image_thumb%25255B15%25255D.png?imgmax=800" width="554" height="43" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Once we’ve created the tree variable (this can be named anything; I just kept it simple and named it “tree” here) we can look at a summary and determine what it looks like:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/-BrtgWNCATa4/Tv5FkESem8I/AAAAAAAABAk/UcQEIGBR56M/s1600-h/image%25255B31%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; 
border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-YOC_zQPdJK0/Tv5Fktj8WQI/AAAAAAAABAs/qEBS1puNOPg/image_thumb%25255B17%25255D.png?imgmax=800" width="554" height="233" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;(There are a total of 13 nodes in my specific example, and I can’t paste the entire tree here in text form.)&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;One big factor in determining how useful a particular decision tree is going to be is the “complexity parameter” (“cp” for short) for each of the nodes. The cp is used together with cross-validation to make sure that the tree is pruned to remove the portions that are “over fitted” (you can read about the terminology here: &lt;a href="http://www.statmethods.net/advstats/cart.html"&gt;http://www.statmethods.net/advstats/cart.html&lt;/a&gt;). Since I am a very visual person, I like to see this cross-validation data in chart form as follows:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh4.ggpht.com/-0BIAKjBERlw/Tv5Fks9fK7I/AAAAAAAABA0/ds21STkoZgw/s1600-h/image%25255B35%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-xZGYPduNL-8/Tv5Fk4qu__I/AAAAAAAABA8/GKzmr7wPjOA/image_thumb%25255B19%25255D.png?imgmax=800" width="554" height="54" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;This generates a nice graph showing me the relative size of the tree, the cross-validated error that is generated, and the resulting cp:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh4.ggpht.com/-8elRGRIzhN4/Tv5FlCmVfaI/AAAAAAAABBE/mDrD9vQkS_0/s1600-h/image%25255B40%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; 
padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-PC2cO2mAqNk/Tv5FlbRAC-I/AAAAAAAABBM/f6MX99FPOXY/image_thumb%25255B22%25255D.png?imgmax=800" width="554" height="565" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Since the tree that I am working with is relatively small, I am not going to worry about pruning it here and removing the “bad” cp values. To generate the decision tree, use the following command:&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh4.ggpht.com/-HkxCYsaegkM/Tv5Flg6Ln9I/AAAAAAAABBU/Z6Biww3-S_o/s1600-h/image%25255B44%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/-9A6ar0xmgXA/Tv5FltricTI/AAAAAAAABBc/rB4aV_-3ukc/image_thumb%25255B24%25255D.png?imgmax=800" width="554" height="52" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;(Remember that you can use the ? command to get a full listing of all options for a given function. 
In this case, by using type=4 I am instructing R to generate a plot that labels all nodes, not just the leaves, and by using extra=1 I am instructing R to include the number of observations that fall in each node.)&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://lh3.ggpht.com/-DvhjrFqJLa8/Tv5Fl-QB42I/AAAAAAAABBk/2VroMyTzAa8/s1600-h/image%25255B48%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/-01GTVaUzG30/Tv5FmbDzqUI/AAAAAAAABBs/7r9_sugKZxQ/image_thumb%25255B26%25255D.png?imgmax=800" width="554" height="568" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;With this visual of the tree, I can see that income seems to be a deciding factor, and it splits at approximately 90,000. Following the tree to the left, for those with less than 90K income, we see a split for Home Improvement loans versus Home Purchase and Refinance. For the purchase and refinance loans, we see another split at approximately 52K income. Back on the right side of the tree we see a split at approximately 208K income, with the same split for home improvement loans versus purchase and refinance.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;h4&gt;Conclusion&lt;/h4&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Being the data geek that I am, I could continually refine this model and graph it to start finding the patterns and determining exactly how the data is shaped. At that point I could feed additional data into the model and use it to predict outcomes based on past performance. There are many things that can be done with Decision Trees and I’ve only scratched the surface here with this post. 
I’ll be writing many more of these posts in the future as I continue to explore the magic of data mining with R.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2011/12/fun-with-decision-trees-using-r-and-sql.html</link><author>noreply@blogger.com (Ted Malone)</author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-2U9TfUCm_bA/Tv5FguOR8pI/AAAAAAAAA_M/Pn168niNsq0/s72-c/image_thumb%25255B1%25255D.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-5387329019384605162</guid><pubDate>Wed, 28 Dec 2011 18:53:00 +0000</pubDate><atom:updated>2011-12-28T11:53:49.995-07:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>SQL Azure</category><category domain='http://www.blogger.com/atom/ns#'>SQL Server</category><category domain='http://www.blogger.com/atom/ns#'>Windows Azure</category><category domain='http://www.blogger.com/atom/ns#'>Cloud Computing</category><title>To The Cloud and Back Again! – SQL Saturday # 104</title><description>&lt;p&gt;For those that are not aware, the &lt;a href="http://www.sqlpass.org" target="_blank"&gt;Professional Association for SQL Server&lt;/a&gt; (PASS) has chapters throughout the world that put on one day events called “&lt;a href="http://www.sqlsaturday.com/" target="_blank"&gt;SQL Saturday&lt;/a&gt;”. As the name implies, these events take place on a Saturday and generally are a full day of targeted learning for those who want to know more about SQL Server and SQL Server technologies.&lt;/p&gt;  &lt;p&gt;This year, the first US SQL Saturday event (There is also an event in Bangalore that same day, and given the time zones, I’d say they qualify as the “first” one of the year!)&amp;#160; is happening right here in Colorado Springs! 
&lt;a href="http://www.sqlsaturday.com/104/eventhome.aspx" target="_blank"&gt;SQL Saturday #104&lt;/a&gt; has a very distinguished list of speakers, including Jason Horner, TJ Belt, Chris Shaw, Thomas LaRock, Karen Lopez and a whole host of others. There are going to be five simultaneous tracks, and somehow they even invited me to speak as well, so I’ll be speaking at 0830 in room #4 on “&lt;a href="http://www.sqlsaturday.com/viewsession.aspx?sat=104&amp;amp;sessionid=5968" target="_blank"&gt;To the Cloud and Back Again!&lt;/a&gt;”.&lt;/p&gt;  &lt;h4&gt;Session Description&lt;/h4&gt;  &lt;p&gt;In this session, I’ll be introducing some basic Cloud Computing patterns and will talk about some specific cloud computing security concerns. I’ll then talk about some of the specific technologies that accompany the Windows Azure and SQL Azure platforms and enable a hybrid approach to cloud computing. I’ll demonstrate how Windows Azure roles can be “Domain Joined”, which then allows Azure-based applications to use SQL Server Trusted Connections to connect to on-premises SQL Server databases. All in all I hope it will be a very informative session on Cloud Computing technologies. 
Hope to see you there!&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-o-Wgwekf-jc/Tvtlu3xWOOI/AAAAAAAAA-0/MzuyD9tc7Vc/s1600-h/image%25255B5%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px auto 5px; padding-left: 0px; padding-right: 0px; display: block; float: none; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/-N6i4SYQ1fhs/TvtlvWKVqoI/AAAAAAAAA-8/8HmXw0D4bEE/image_thumb%25255B3%25255D.png?imgmax=800" width="554" height="348" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2011/12/to-cloud-and-back-again-sql-saturday.html</link><author>noreply@blogger.com (Ted Malone)</author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-N6i4SYQ1fhs/TvtlvWKVqoI/AAAAAAAAA-8/8HmXw0D4bEE/s72-c/image_thumb%25255B3%25255D.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-5462150722258417123</guid><pubDate>Mon, 26 Dec 2011 22:04:00 +0000</pubDate><atom:updated>2011-12-26T15:04:15.703-07:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Big Data</category><category domain='http://www.blogger.com/atom/ns#'>Data Mining</category><category domain='http://www.blogger.com/atom/ns#'>SQL Server</category><title>Building a large real-world SQL Server demo database</title><description>&lt;p&gt;In my previous post, I discussed a large data set that I was using for demonstrations and working with R. Instead of using the normal AdventureWorks or Northwind databases that Microsoft makes available to us via &lt;a href="http://www.codeplex.com" target="_blank"&gt;Codeplex&lt;/a&gt;, I wanted something a bit more real-world as well as large. 
I also wanted something that I could continue to build on as more data became available. I had never really found anything that I liked and basically relegated the thought to a background task.&lt;/p&gt;  &lt;p&gt;As I mentioned in the last post, I had the opportunity to attend an alpha delivery of the &lt;a href="http://www.emc.com" target="_blank"&gt;EMC&lt;/a&gt; “Data Science and Big Data Analytics” training course. As we were working through the labs, I couldn’t help but think that the data set being used was very interesting, and would almost fit what I had been looking for. In speaking with the creators of the class, I learned that they had built the database based on information obtained from the publicly available “Home Mortgage Disclosure Act” reporting site. Here in the US, when you apply for a home mortgage loan you must provide certain information to the lending institution, and they in turn must report information on all applications that they process, whether they are approved or denied. The information itself is stripped of personal identifiers when it is submitted, but the overall type of data that is reported is extremely interesting from both a volume and a content perspective. 
I decided to spend a bit of time looking into how I could gain access to this data, and thanks to the magic of the interwebs, I was able to piece together everything I needed in order to build the database I was looking for.&lt;/p&gt;  &lt;h4&gt;Gaining Access to the Home Mortgage Disclosure Act Data&lt;/h4&gt;  &lt;p&gt;(I realize that the information posted here is really only relevant to those of us in the United States; however, I believe the resulting data is useful and relevant worldwide for purposes of learning or demo.) &lt;/p&gt;  &lt;p&gt;The Home Mortgage Disclosure Act was enacted by Congress in 1975 and is administered by the “&lt;a href="http://www.ffiec.gov/" target="_blank"&gt;Federal Financial Institutions Examination Council&lt;/a&gt;”. Because it is a government institution and funded by US taxpayer dollars, the data that they collect and maintain is made available to the public free of charge. Basically what they do is chunk the data into yearly “drops” that are made available according to a specific timeline. You can read about the timeline here: &lt;a title="http://www.ffiec.gov/hmda/timeline.htm" href="http://www.ffiec.gov/hmda/timeline.htm"&gt;http://www.ffiec.gov/hmda/timeline.htm&lt;/a&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;The data itself is available from the following link: &lt;/p&gt;  &lt;p&gt;&lt;a title="http://www.ffiec.gov/hmda/" href="http://www.ffiec.gov/hmda/"&gt;http://www.ffiec.gov/hmda/&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;There are multiple ways to obtain the data for a given year. The easiest way to get to the data is to download the “LAR/TS Windows Application” for a given year (LAR = “Loan/Application Register”, TS = “Transmittal Sheet”). You can download the application from the following link: &lt;a title="http://www.ffiec.gov/hmda/hmdaproducts.htm" href="http://www.ffiec.gov/hmda/hmdaproducts.htm"&gt;http://www.ffiec.gov/hmda/hmdaproducts.htm&lt;/a&gt; (note the Windows application download links towards the bottom of the page). 
When downloaded, the application contains all of the data for a given year within a SQL Server Compact Edition (SQLCE) database. The database itself is about 5GB per year. Of course the problem with this format is that it’s strictly a SQLCE database, which means you’ll want to extract the data for use with SQL Server. Another issue with the data is that it is organized by a seemingly random collection of states, meaning that there is no single dataset that contains the data within a given year for the entire country. &lt;/p&gt;  &lt;p&gt;Another way to obtain the data is to download the text files directly from the site. The files are zipped and can be found by downloading the “ALL” file in the LAR table for the year that you are interested in. (At the time of this posting, there are 3 years available, 2008, 2009 and 2010). These files are tab-delimited and will need to be imported to SQL Server.&lt;/p&gt;  &lt;h4&gt;&lt;/h4&gt;  &lt;h4&gt;Creating the HMDA Database&lt;/h4&gt;  &lt;p&gt;Because I want to create a database that will have a single “fact” table containing all of the LAR records, I will need to first create the database and then the table structure necessary. The HMDA data is very “flat” and denormalized, so it works very well as a fact table. There are 45 fields contained in the text file and column names are NOT included in the first row. 
The data dictionary can be found here: &lt;a href="http://www.ffiec.gov/pmicrawdata/FORMATS/2010PMICLARRecordFormat.pdf" target="_blank"&gt;LAR Record Format&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Since the resulting database will work out to be about 5GB per year, and since we’ll be importing 3 years of data, I’ll start by creating a 15GB database with the following T-SQL command (I am using SQL Server 2012 as my destination, so some of the syntax might be slightly different from what you are used to):&lt;/p&gt;  &lt;p&gt;CREATE DATABASE [HMDAData]   &lt;br /&gt; CONTAINMENT = NONE    &lt;br /&gt; ON&amp;#160; PRIMARY     &lt;br /&gt;( NAME = N'HMDAData', FILENAME = N'C:\SQLData\MDF\HMDAData.mdf' , SIZE = 15GB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024MB )    &lt;br /&gt; LOG ON     &lt;br /&gt;( NAME = N'HMDAData_log', FILENAME = N'C:\SQLData\LDF\HMDAData_log.ldf' , SIZE = 8GB , MAXSIZE = 2048GB , FILEGROWTH = 10%);&lt;/p&gt;  &lt;p&gt;Once the database is created, we can create the table to hold the LAR records. 
The table can be created with the following command:&lt;/p&gt;  &lt;p&gt;   &lt;br /&gt;CREATE TABLE [dbo].[lar_data](    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [year] [int] NOT NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [respid] [nchar](10) NOT NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [agycd] [nchar](1) NOT NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [loan_type] [int] NOT NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [property_type] [int] NOT NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [loan_purpose] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [occupancy] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [loan_amount] [nchar](5) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [preapproval] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [action_type] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [msa_md] [nchar](5) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [state_code] [int] NOT NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [county_code] [int] NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [tract_code] [nchar](7) NOT NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [applicant_ethnicity] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [co_applicant_ethnicity] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [applicant_race_1] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [applicant_race_2] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [applicant_race_3] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [applicant_race_4] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [applicant_race_5] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [co_applicant_race_1] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [co_applicant_race_2] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [co_applicant_race_3] 
[nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [co_applicant_race_4] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [co_applicant_race_5] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [applicant_sex] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [co_applicant_sex] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [applicant_income] [nchar](4) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [purchaser_type] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [denial_reason_1] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [denial_reason_2] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [denial_reason_3] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [rate_spread] [nchar](5) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [HOEPA_status] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [lien_status] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [edit_status] [nchar](1) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [seq_number] [nchar](7) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [population] [int] NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [minority_population_percent] [numeric](18, 0) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [median_income] [int] NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [tract_msa_income_percent] [numeric](18, 0) NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [owner_occ_units] [int] NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [owner_occ_1_to_4_family] [int] NULL,    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; [app_date_pre_2004] [nchar](1) NULL    &lt;br /&gt;) ON [PRIMARY];&lt;/p&gt;  &lt;p&gt;Once the table is created, you can either use the SQL Server Import/Export wizard or SSIS to import each of the files to the lar_data table. 
Since it is a simple data load process without any conversions needed, the Import/Export Wizard works just fine. Here’s an example of importing the 2010 file to the lar_data table:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-cuINkWTmxGM/TvjvTMmiP_I/AAAAAAAAA80/VTNGf5mswOY/s1600-h/image%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px 0px 5px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/-qHJJvQRYO-s/TvjvTmmIg6I/AAAAAAAAA88/NLEAlVAnBAA/image_thumb%25255B1%25255D.png?imgmax=800" width="554" height="565" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-z7CVAOBRLCg/TvjvT1O6p6I/AAAAAAAAA9E/ANNWxATDA0k/s1600-h/image%25255B7%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-DUHFpL4_Pm8/TvjvUHkLQVI/AAAAAAAAA9M/IQpbUy781yw/image_thumb%25255B3%25255D.png?imgmax=800" width="554" height="563" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-R52VloaoKlM/TvjvUfqv-4I/AAAAAAAAA9U/x-C9mSey0jM/s1600-h/image%25255B11%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/-BP9cZCmPci8/TvjvUhwn8jI/AAAAAAAAA9c/5JwT_X9bHJ4/image_thumb%25255B5%25255D.png?imgmax=800" width="554" height="565" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-C9tRl4izYd8/TvjvVK8KCkI/AAAAAAAAA9k/hIVGeJLb_4Q/s1600-h/image%25255B15%25255D.png"&gt;&lt;img 
style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-LUCECIIFOiQ/TvjvVj_WaQI/AAAAAAAAA9s/6SDrPT293r4/image_thumb%25255B7%25255D.png?imgmax=800" width="554" height="706" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Because the text file will contain character data there will be CASTs required for most columns. In the event that an error occurs within a specific CAST, it’s best to just ignore it. We’re not trying to create a perfect database, just one that works for most conditions. &lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-8sfcpBUXTv4/TvjvV8zY4yI/AAAAAAAAA90/LGQkRgraFo4/s1600-h/image%25255B19%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/-tXz8G_lwVAY/TvjvWXKNjEI/AAAAAAAAA98/LjTJHo8IeO8/image_thumb%25255B9%25255D.png?imgmax=800" width="554" height="565" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-ikfaI8SbtTA/TvjvWqeDrfI/AAAAAAAAA-E/h3HLQ_ps3Dg/s1600-h/image%25255B23%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-1FIWf3Jk0FU/TvjvXNXAOqI/AAAAAAAAA-M/4ASEJSwcOQ8/image_thumb%25255B11%25255D.png?imgmax=800" width="554" height="561" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;When the import for 2010 is complete, there will be approximately 16.3 million rows of data inserted into the table.&lt;/p&gt;  &lt;p&gt;To complete the task, import the remaining years into the table. 
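&lt;/p&gt;  &lt;p&gt;If you would rather script the remaining loads than repeat the wizard, BULK INSERT is one alternative. What follows is only a minimal sketch: the file path is hypothetical, and it assumes the unzipped file is tab-delimited with no header row, as described above. MAXERRORS lets the load skip rows that fail conversion instead of stopping at the first one, and the second query is a quick sanity check on how many rows landed for each year:&lt;/p&gt;  &lt;p&gt;-- Hypothetical path: point this at wherever you unzipped the 2009 file   &lt;br /&gt;BULK INSERT [dbo].[lar_data]    &lt;br /&gt;FROM 'C:\HMDA\lar_2009.txt'    &lt;br /&gt;WITH (    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; FIELDTERMINATOR = '\t' -- assumes tab-delimited fields    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,ROWTERMINATOR = '\n'    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,MAXERRORS = 1000 -- tolerate up to 1000 bad rows before giving up    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,TABLOCK    &lt;br /&gt;);    &lt;br /&gt;    &lt;br /&gt;-- Row count per year loaded so far    &lt;br /&gt;SELECT [year], COUNT_BIG(*) AS [row_count]    &lt;br /&gt;FROM [dbo].[lar_data]    &lt;br /&gt;GROUP BY [year]    &lt;br /&gt;ORDER BY [year];&lt;/p&gt;  &lt;p&gt;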
You should end up with approximately 53.2 million rows in the table if you copy 2008, 2009 and 2010 data.&lt;/p&gt;  &lt;p&gt;You may notice that several of the columns are codes and not well described. You can find the information for each of the columns in the data dictionary linked above; however, the lookup values for each column can be a pain to enter by hand. Since I have already extracted the data and created the appropriate tables, you can download the ZIP file below. It contains text files in .csv format, with column names in the first row and each file named after the table it comes from, so they are simple to import using the Import/Export wizard. There is also a .sql file to create the remaining tables and insert the descriptive data: &lt;/p&gt;  &lt;p&gt;&lt;a href="http://www.sqltrainer.com/pics/blogpics/HMDADimensions.zip" target="_blank"&gt;Zip file containing dimension tables and descriptive data&lt;/a&gt;&lt;/p&gt;  &lt;h4&gt;Creating an Appropriate “State” View&lt;/h4&gt;  &lt;p&gt;Now that you have the fact table data loaded, and you’ve used the files I’ve supplied to create the dimensional tables, you’ll likely want to create a subset of the data for specific analysis. Since I live in Colorado, I decided to create a view that shows only Colorado data. 
The view definition is included below, and you can modify it accordingly to isolate the data for the state you are interested in:&lt;/p&gt;  &lt;p&gt;CREATE VIEW [dbo].[vColoradoLoans]   &lt;br /&gt;AS    &lt;br /&gt;SELECT    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; lt.description AS [loan_type]    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,lpt.description AS [property_type]    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,lp.description AS [loan_purpose]    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,o.description AS [occupancy_type]    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,CAST(loan_amount AS money) * 1000 AS [loan_amount]&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,p.description AS [preapproval]    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,at.description AS [action_taken]    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,s.state_name AS [state]    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,c.county_name AS [county]    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,e.description AS [applicant_ethnicity]&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,r.description AS [applicant_race]    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,sx.description AS [applicant_sex]    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,CAST(applicant_income AS money) * 1000 AS [applicant_income]&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,pt.description AS [purchaser_type]    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,dr.description AS [denial_reason]    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,CASE WHEN rate_spread = 'NA' THEN 0 ELSE CAST(rate_spread AS numeric(5, 2)) END AS [rate_spread] -- explicit scale so fractional spreads are not rounded to whole numbers&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,population AS [tract_population]    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,minority_population_percent    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,median_income    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ,owner_occ_units    
&lt;br /&gt;FROM     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; lar_data ld    &lt;br /&gt;JOIN    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; tblLoanType lt    &lt;br /&gt;ON     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ld.loan_type = lt.loan_type    &lt;br /&gt;JOIN    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; tblPropertyType lpt    &lt;br /&gt;ON    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ld.property_type = lpt.property_type    &lt;br /&gt;JOIN    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; tblLoanPurpose lp    &lt;br /&gt;ON    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ld.loan_purpose = lp.loan_purpose    &lt;br /&gt;JOIN    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; tblOwnerOccupancy o    &lt;br /&gt;ON     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ld.occupancy = o.owner_occupancy    &lt;br /&gt;JOIN    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; tblPreapproval p    &lt;br /&gt;ON    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ld.preapproval = p.preapproval    &lt;br /&gt;JOIN    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; tblAction at    &lt;br /&gt;ON     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ld.action_type = at.action_taken    &lt;br /&gt;JOIN    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; tblState s    &lt;br /&gt;ON     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ld.state_code = s.state_code    &lt;br /&gt;JOIN    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; tblCounty c    &lt;br /&gt;ON     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ld.county_code = c.county_code AND s.state_name=c.state_name    &lt;br /&gt;JOIN    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; tblEthnicity e    &lt;br /&gt;ON     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ld.applicant_ethnicity = e.ethnicity    &lt;br /&gt;JOIN    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; tblRace r    &lt;br /&gt;ON    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ld.applicant_race_1 = r.race    &lt;br /&gt;JOIN    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; tblSex sx    &lt;br /&gt;ON    &lt;br 
/&gt;&amp;#160;&amp;#160;&amp;#160; ld.applicant_sex = sx.sex    &lt;br /&gt;JOIN    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; tblPurchaserType pt    &lt;br /&gt;ON    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ld.purchaser_type = pt.purchaser_type    &lt;br /&gt;JOIN    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; tblDenialReason dr    &lt;br /&gt;ON    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ld.denial_reason_1 = dr.reason    &lt;br /&gt;WHERE     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ld.state_code = '08' -- Colorado    &lt;br /&gt;AND    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ld.property_type = 1 -- Single Family Homes    &lt;br /&gt;AND    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ld.applicant_income &amp;lt;&amp;gt; 'NA' -- remove invalid income reports    &lt;br /&gt;;&lt;/p&gt;  &lt;h4&gt;Conclusion&lt;/h4&gt;  &lt;p&gt;Once you have the view in place, you have a large, flexible database with a real-world use case that you can use for demos, performance tuning work, statistical analysis, etc. 
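&lt;/p&gt;  &lt;p&gt;As one possible starting point for tuning, note that the view above filters lar_data on state_code and property_type, so a nonclustered index on those two columns is a reasonable first candidate. This is only a sketch under that assumption, and the index name is made up for illustration:&lt;/p&gt;  &lt;p&gt;-- Hypothetical index name; key columns match the view's WHERE clause   &lt;br /&gt;CREATE NONCLUSTERED INDEX [IX_lar_data_state_property]    &lt;br /&gt;ON [dbo].[lar_data] ([state_code], [property_type]);&lt;/p&gt;  &lt;p&gt;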
Of course you’ll want to add your own indexes and possibly partitions depending on your use case.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-u9RUJhrN87A/TvjvXPUMASI/AAAAAAAAA-U/WnppUZJnHZc/s1600-h/image%25255B27%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px 0px 5px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-RmNxx4ulJMM/TvjvXjhaKtI/AAAAAAAAA-c/iTOfuPPvP3A/image_thumb%25255B13%25255D.png?imgmax=800" width="554" height="575" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;I know that I’ve often wanted such a database when presenting or demonstrating specific functions within SQL Server, so I hope this database proves useful.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2011/12/building-large-real-world-sql-server.html</link><author>noreply@blogger.com (Ted Malone)</author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/-qHJJvQRYO-s/TvjvTmmIg6I/AAAAAAAAA88/NLEAlVAnBAA/s72-c/image_thumb%25255B1%25255D.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-6121755229709945631</guid><pubDate>Fri, 23 Dec 2011 18:00:00 +0000</pubDate><atom:updated>2011-12-23T11:00:05.984-07:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Big Data</category><category domain='http://www.blogger.com/atom/ns#'>Data Mining</category><category domain='http://www.blogger.com/atom/ns#'>SQL Server</category><category domain='http://www.blogger.com/atom/ns#'>Statistical Analysis</category><category domain='http://www.blogger.com/atom/ns#'>R</category><title>Statistical Analysis with R and Microsoft SQL Server 2012</title><description>&lt;p&gt;It’s been 
a while since I’ve written a blog post, but that doesn’t mean that I haven’t been thinking about things to write about and discuss here. Recently, I had the opportunity to attend an alpha delivery of EMC’s “Data Science and Big Data Analytics” course (read about the course on the &lt;a href="http://education.emc.com" target="_blank"&gt;EMC Education Services&lt;/a&gt; site here: &lt;a href="http://education.emc.com/guest/campaign/data_science.aspx"&gt;http://education.emc.com/guest/campaign/data_science.aspx&lt;/a&gt;) and was really taken by a couple of points that the course brought home:&lt;/p&gt;  &lt;p&gt;1) There’s much more to statistical analysis than I had ever thought about. (Being a Microsoft SQL Server and Microsoft BI Stack kinda guy, I always figured that you needed Excel and SSAS to do real statistical analysis.)&lt;/p&gt;  &lt;p&gt;2) Big Data Analytics is a really cool technology discipline!&lt;/p&gt;  &lt;p&gt;The course itself was based on the &lt;a href="http://www.greenplum.com/products/greenplum-database" target="_blank"&gt;EMC Greenplum Database&lt;/a&gt; (Community Edition, which you can download and use for free!), which is an amazing piece of technology (I am very impressed with its features and functionality and its integration with things like Hadoop for real parallel computing capabilities), as well as the open source “R” statistical analysis language. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://www.r-project.org/" target="_blank"&gt;&lt;img style="margin: 0px 0px 5px" alt="R logo" src="http://www.r-project.org/Rlogo.jpg" /&gt;&lt;/a&gt;&lt;/p&gt;            &lt;p&gt;While it’s true that my role at &lt;a href="http://www.emc.com" target="_blank"&gt;EMC&lt;/a&gt; means that I focus more on the non-Microsoft stack these days, that doesn’t stop me from thinking about how I can apply things that I learn to the Microsoft platform. 
With that in mind, one of the things that I kept coming back to in the class was, “How would I do this using SQL Server?” As it turns out, using R with SQL Server isn’t all that difficult, and it really does open up an entirely new way of thinking about statistical analysis (for me anyway).&lt;/p&gt;  &lt;h4&gt;R and Statistical Analysis&lt;/h4&gt;  &lt;p&gt;R is an open source “software environment” that is used primarily for statistical analysis. A huge part of “Data Science” is of course statistical analysis, so the two go hand-in-hand. One very cool aspect of R is the fact that the graphics environment is “built in” (I put that in quotes because R is very modular and requires you to load packages for just about anything you do, although a basic “plot” command is included in the base distribution) and allows you not only to analyze data but also to visualize it “on the fly”. You can read about (and download) R from the main website here: &lt;a href="http://www.r-project.org/"&gt;http://www.r-project.org/&lt;/a&gt;. If you are really interested in R, you should make a point of reading the R Journal here: &lt;a title="http://journal.r-project.org/current.html" href="http://journal.r-project.org/current.html"&gt;http://journal.r-project.org/current.html&lt;/a&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;One thing that is very clear though about R is that it’s a “data source agnostic” environment, but many of the examples that use data either deal with flat files or connect to open source databases like MySQL or Postgres. 
This of course doesn’t mean you can’t use R with SQL Server; it just means you have to dig a little deeper and understand how to connect the R environment to your SQL Server database.&lt;/p&gt;  &lt;h4&gt;R and SQL Server&lt;/h4&gt;  &lt;p&gt;Once you download and install the R environment (the screen shots and examples I provide here will be from the Windows version of RGUI version 2.14.0, which I downloaded from the UC Berkeley mirror here: &lt;a href="http://cran.cnr.berkeley.edu/"&gt;http://cran.cnr.berkeley.edu/&lt;/a&gt;), you will need to decide whether you will use the 32-bit or 64-bit RGUI client. This is a very important distinction, since R connects to databases via ODBC, and ODBC drivers are very platform (32 versus 64 bit) specific. &lt;/p&gt;  &lt;p&gt;In my case, I am going to use the 64-bit GUI and will be using SQL Native Client 11 to connect to SQL Server 2012. (There is no specific reason for me to use SQL Server 2012 here, other than I’ve been playing around with the release candidate and my development environment is all set up for it.) I have a large database that I use for “big data” type demonstrations that also works well for statistical analysis work. I will likely write another article on how this database was constructed, but know that the data is very real world (it is built from 2010 data collected via the US Home Mortgage Disclosure Act) and well-suited for testing statistical analysis theories and data mining. &lt;/p&gt;  &lt;p&gt;Once you decide what client you will be using, you will need to configure an ODBC DSN (I decided to use a System DSN for my work, so I’ll walk through the creation of that DSN) to connect to your database. 
To create a new DSN, use the platform-specific version of the ODBC control panel. On Windows Server 2008 R2, simply go to Control Panel and search for “ODBC”; you will then see the “Set up data sources (ODBC)” link, as shown in the following figure: &lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-p6_t99nl3o0/TvTBjOdTCpI/AAAAAAAAA5g/Ifu9TQ-dNVo/s1600-h/image%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px 0px 5px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/-3aMwATaOuu4/TvTBjZDRSoI/AAAAAAAAA5o/SgBXEvk0V28/image_thumb%25255B1%25255D.png?imgmax=800" width="554" height="364" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Click the “System DSN” tab, and then click the Add button and walk through the wizard to connect to your database. Remember the name of the DSN you create, as you’ll need to specify it from within R in order to connect. In my case, the DSN is named “HMDAData”.&lt;/p&gt;  &lt;p&gt;In order to use the ODBC connection within R, you’ll need to download the “RODBC” library, which can be found here: &lt;a title="http://cran.cnr.berkeley.edu/web/packages/RODBC/index.html" href="http://cran.cnr.berkeley.edu/web/packages/RODBC/index.html"&gt;http://cran.cnr.berkeley.edu/web/packages/RODBC/index.html&lt;/a&gt;. Select the appropriate zip file and download it to a folder on the machine where you installed R. 
Once it is downloaded, from within the RGUI, select Packages, and then select “Install packages from local ZIP file” as shown in the following figure:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-pehYYL6JAto/TvTBjmd65-I/AAAAAAAAA5w/6Bhwy3qYjaE/s1600-h/image%25255B7%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-bZdwx7evhf0/TvTBj65hmuI/AAAAAAAAA54/NtJkwT0_tuw/image_thumb%25255B3%25255D.png?imgmax=800" width="554" height="215" /&gt;&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;Point to the zip file you just downloaded and R will install the appropriate package and make it available. Once it is available, you can connect to SQL Server by using the following R commands as shown in the figure:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-pWUhesJzuYo/TvTBkA4_5pI/AAAAAAAAA6A/VmZiLuMKoJs/s1600-h/image%25255B15%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-PtbyaeAFs9g/TvTBkui1WiI/AAAAAAAAA6I/15AoOgF0Qu0/image_thumb%25255B7%25255D.png?imgmax=800" width="554" height="442" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Note: R is case sensitive for most operations. Also note that I am entering the commands directly into the R console. Another way to do this is to use the File command and create a new R script, and submit commands from the script to the console. I’ll show this in later posts. Also note that assignment is done in R by using “&amp;lt;-”; this line is basically saying, “assign the output of the odbcConnect function, which has been passed the value ‘HMDA’, to an object named ‘ch’”. This will make more sense as you get into R more. 
What I have done with these commands is load the RODBC library and create a “channel” object that I will use to query SQL Server. &lt;/p&gt;  &lt;h4&gt;&lt;/h4&gt;  &lt;h4&gt;Data Mining with R&lt;/h4&gt;  &lt;p&gt;In my database, I have a table named “tblIncome” that has 2 columns. Each row is a county in Colorado and the average salary of all people who have applied for a home loan in 2010 within that county. If I wanted to find some “clusters” of salaries within Colorado and see how the income among potential home buyers/refinancers is grouped together, I would take the data and apply K-means clustering techniques to identify the clusters. Normally I’d use SSAS Data Mining, or maybe Excel with the Predixion add-in, but now thanks to R, I can do that analysis directly within R. &lt;/p&gt;  &lt;p&gt;The first step is to obtain the data from the SQL Server table and load it into a matrix in R. This can be accomplished using the following command:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-V58Tn0X8A2U/TvTBk00eG9I/AAAAAAAAA6Q/v5kLmI12JM8/s1600-h/image%25255B19%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-m0gaVlx1KNI/TvTBlHWXzYI/AAAAAAAAA6Y/Ig5qbDuo_jc/image_thumb%25255B9%25255D.png?imgmax=800" width="554" height="176" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The “sqlFetch” command simply attaches to a table and does a SELECT * from that table. The “as.matrix” ensures that the data is loaded into a matrix that matches the table structure. 
You can get a summary of the data with the following command:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-jvxUlxpNVgM/TvTBlVepDhI/AAAAAAAAA6g/Dl26989gEs0/s1600-h/image%25255B23%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/-cjnx-UimdY0/TvTBlklgU2I/AAAAAAAAA6o/Lr2bT6Kq5nI/image_thumb%25255B11%25255D.png?imgmax=800" width="554" height="167" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;If you just want to see what the income object looks like, you can issue the following command:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-BqF_slJdx70/TvTBloMPbEI/AAAAAAAAA6w/W8KXSF0fwRo/s1600-h/image%25255B27%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/--cT1dgUlQ00/TvTBmG3vL0I/AAAAAAAAA64/kiXKzo4bSFU/image_thumb%25255B13%25255D.png?imgmax=800" width="554" height="279" /&gt;&lt;/a&gt;&lt;/p&gt;          &lt;p&gt;Now that we have the data loaded into a matrix, we should sort it to make it easier to cluster. 
Issue the sort command as follows:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-vnbGQRiyR9o/TvTBmMcNzlI/AAAAAAAAA7A/JEonnBXpgjY/s1600-h/image%25255B31%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/-iNAeOr6rkQc/TvTBmYucJLI/AAAAAAAAA7I/w2oToc0BInE/image_thumb%25255B15%25255D.png?imgmax=800" width="554" height="43" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Now that we have the matrix loaded and sorted, we can feed it into the kmeans clustering algorithm. As a note here, anything that you want help with in R you can simply use the ? followed by the command. For example, issue ? kmeans to read all about the kmeans command. For the purposes of this blog entry, I’m just going to use the default algorithm and I’m going to make a guess at 3 clusters to start with and iterate 15 times. I’ll assign the output to the object “km”. 
The command looks like:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-SkHFXPTLwCE/TvTBmgi8roI/AAAAAAAAA7Q/f4zPDPZ-2cc/s1600-h/image%25255B35%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-wkBxKN5o6Fc/TvTBmw7ArZI/AAAAAAAAA7Y/iLWdRJkt7x0/image_thumb%25255B17%25255D.png?imgmax=800" width="554" height="323" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;If you want to know what the km object contains, you can issue the following command:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/-ftPcawGRKcU/TvTBnD504XI/AAAAAAAAA7g/Rj5lpIS-A4Y/s1600-h/image%25255B39%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/-cW6GDA2q2t0/TvTBnXeYQ9I/AAAAAAAAA7o/bKvhFkEpUUQ/image_thumb%25255B19%25255D.png?imgmax=800" width="554" height="382" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Note that the output has given us 3 clusters with means at 83K, 371K and 161K. We also can see that the object contains various components. To statisticians, this information is very easy to understand, but if you’re like me you probably want to visualize the data. Since I am interested in seeing the cluster associations, I can plot the cluster component. 
I can use the following command to create the plot:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-plomWbE-gBY/TvTBnj95_4I/AAAAAAAAA7w/xEVW3DKgO9Q/s1600-h/image%25255B43%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-RPt8LfQd1EI/TvTBnxeBL5I/AAAAAAAAA74/3moRjIZ4JQg/image_thumb%25255B21%25255D.png?imgmax=800" width="554" height="36" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The command generates a plot graph that looks like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-kLMVTsJAzKI/TvTBoH3Wb8I/AAAAAAAAA8A/qPT87pFb9Mw/s1600-h/image%25255B47%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-pg9C51s539c/TvTBoZk_ypI/AAAAAAAAA8I/SwUvr-aaJNo/image_thumb%25255B23%25255D.png?imgmax=800" width="554" height="568" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The colors represent our clusters. Since I just guessed that 3 income clusters would be appropriate, the graphic is likely not a very good representation of true income clusters. In order to determine what the true number of clusters should be, I can take the income matrix and compute the sum of squares of each group and determine how many clusters I should have. (You can read about this at &lt;a href="http://www.statmethods.net/advstats/cluster.html"&gt;http://www.statmethods.net/advstats/cluster.html&lt;/a&gt; ) &lt;/p&gt;  &lt;p&gt;R has the capability of creating loops, so we can iterate through the matrix and plot the resulting sum of squares within the group. 
We can then plot the results and look for an “elbow” to determine how many clusters would be appropriate with the data that we have. You can accomplish this with the following command:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/-juaDQQIjwZI/TvTBohiNHLI/AAAAAAAAA8Q/jL0uugjgwqo/s1600-h/image%25255B51%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/-XkK9X1tRdkE/TvTBo09HS6I/AAAAAAAAA8Y/hw8ylDtkRvY/image_thumb%25255B25%25255D.png?imgmax=800" width="554" height="77" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;which generates the following plot:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-_2OchzeUjX0/TvTBpAdmZ-I/AAAAAAAAA8g/MCLgkrd9DOo/s1600-h/image%25255B55%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-XLgMP091nOc/TvTBpXl_vuI/AAAAAAAAA8o/dsH6Sfx2jTs/image_thumb%25255B27%25255D.png?imgmax=800" width="554" height="577" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;which tells us that the appropriate number of clusters is 4. &lt;/p&gt;  &lt;h4&gt;&lt;/h4&gt;              &lt;h4&gt;Conclusion&lt;/h4&gt;  &lt;p&gt;The intent of this post wasn’t to teach you how to perform statistical analysis using k-means clustering, but rather to demonstrate how some very advanced statistical analysis can be performed from SQL Server data and R without SSAS modeling or advanced Excel use. 
&lt;/p&gt;  &lt;p&gt;Since I am spending a lot of time in the Data Science discipline, I will be posting a lot of R examples using SQL Server data.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2011/12/statistical-analysis-with-r-and.html</link><author>noreply@blogger.com (Ted Malone)</author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/-3aMwATaOuu4/TvTBjZDRSoI/AAAAAAAAA5o/SgBXEvk0V28/s72-c/image_thumb%25255B1%25255D.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-8940734671724516610</guid><pubDate>Fri, 08 Jul 2011 21:55:00 +0000</pubDate><atom:updated>2011-07-08T15:55:36.012-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Big Data</category><title>Big (say it with me now) Data</title><description>&lt;p&gt;Over the last few months I’ve been working with more and more people who have the buzz-phrase, “Big Data” on their minds. For some reason, many people think that “Big Data” (in this context, I’m referring to “Big Data Analytics”) has magical powers and doesn’t require the same careful planning and modeling process that traditional analytics does. &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Let’s go ahead and put this thought to rest now. “Big Data” is still “Data” and the data model will make or break your project. 
The Data Architect (note that I didn’t say “Data Scientist”, that’s a bit different and we’ll get into that in a later post) is still a key position on any Big Data project, and as a matter of fact due to the overwhelming amount of data that is crunched, analyzed and stored, the Data Architect is likely your most important asset.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;It is very true that traditional, narrowly-focused data modeling techniques will not work in the Big Data world. Data Architects working with Big Data must understand that Big Data requires thought on an entirely new dimension (forgive the pun) in the modeling of data structures. Not only must you consider the traditional modeling of relationships, structures and indexes, you must also understand how parallel processing will impact your queries, how data will be loaded in partitioned systems, and how adding new nodes into a Big Data cluster will impact existing data models. These are new skills that Data Architects who plan to work with Big Data must learn.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2011/07/big-say-it-with-me-now-data.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-3556210492758309302</guid><pubDate>Fri, 01 Jul 2011 17:55:00 +0000</pubDate><atom:updated>2011-07-01T11:55:24.089-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>MVP</category><category domain='http://www.blogger.com/atom/ns#'>ALM</category><category domain='http://www.blogger.com/atom/ns#'>Visual Studio</category><title>Microsoft MVP for another year!</title><description>&lt;p&gt;Was honored to receive “the email” from the &lt;a href="http://blogs.msdn.com/b/mvpawardprogram/archive/2011/06/29/congratulations-to-the-new-and-re-awarded-microsoft-mvps.aspx" 
target="_blank"&gt;Microsoft MVP Award Program&lt;/a&gt; letting me know that I have been awarded MVP for another year in Visual Studio ALM. Have to say that I am very honored to be a part of this program!&lt;/p&gt;  &lt;p&gt;The upcoming year will be a very good one for ALM folks, and I think those of us with a data focus will especially be happy with new and improved tools.&lt;/p&gt;  &lt;p&gt;Thanks Microsoft!&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2011/07/microsoft-mvp-for-another-year.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-3394780227114918524</guid><pubDate>Thu, 30 Jun 2011 15:05:00 +0000</pubDate><atom:updated>2011-06-30T09:05:19.360-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>The Cloud</category><title>The Cloud and Software Engineering</title><description>&lt;p&gt;In my role here at &lt;a href="http://www.emc.com" target="_blank"&gt;EMC&lt;/a&gt;, I have the opportunity to speak to many of our software development teams about how software development is changing. One of the presentations that I give discusses industry trends that are impacting the way we think about designing and developing software. 
In that particular presentation, there’s a rather boring graphic that looks like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-uQs6AU7xlDs/TgyQqndwF5I/AAAAAAAAAzw/lVfGW5_1inQ/s1600-h/image%25255B9%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px auto 5px; padding-left: 0px; padding-right: 0px; display: block; float: none; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-fLwfpNWRHDs/TgyQrlwVnKI/AAAAAAAAAz0/uOMcQeel3Wc/image_thumb%25255B5%25255D.png?imgmax=800" width="338" height="277" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p align="left"&gt;This graphic represents the fact that “the cloud” is an important concept that software developers (especially those of us working in the IT Infrastructure management realm) need to be aware of and need to understand.&lt;/p&gt;  &lt;h4&gt;How “The Cloud” Impacts Software Engineering&lt;/h4&gt;  &lt;p&gt;Many people would argue that “The Cloud” (be it a private cloud, a public cloud or a hybrid) doesn’t really change anything about how software is engineered since it’s all about infrastructure, and after all, infrastructure is not the concern of the software engineer, right? The problem with that mode of thinking is that one big promise of the cloud movement is flexibility, and software that runs “in the cloud” needs to embrace that flexibility. Consider these points:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Cloud offerings billing themselves as “Platform as a Service (PaaS)” generally attempt to abstract the physical hardware environment from the virtualized application platform that software is built on. At a very high level, what this means is that developers who build applications for PaaS offerings don’t have any control over how the hardware is configured, maintained and protected. 
&lt;/li&gt;    &lt;li&gt;Cloud offerings billing themselves as “Infrastructure as a Service (IaaS)” generally provide the ability to host virtual machines that are fully controlled by the customer but, like PaaS offerings, abstract the physical implementation and associated maintenance from the customer. &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;In both cases above, software engineers need to understand that the environment their software executes on is subject to change without notice. In the case of PaaS offerings, developers need to think about persistent storage and volatile storage rather than simply writing to disk. For example, consider a normal application that writes data to a storage device. In many cases, the developer would choose to write that data to a logical disk hosted by the server that the application is executing on. The issue here is that for PaaS, in most cases the “disk” storage available to your application is volatile in nature. In the event that the environment hosting your app changes (failover, patch maintenance, network maintenance, etc.), the state of the “disk” is not guaranteed, so for data that must be persisted, developers need to create and maintain non-volatile storage. In the case of IaaS offerings, in the event of a failover or other need to move the hosted virtual machine, there is no guarantee that the state of the VM will be maintained (i.e., it’s possible that the VM could be reset to its initial state), and thus developers need to consider building in mechanisms to deal with that possibility.&lt;/p&gt;  &lt;p&gt;The point of the above is that software engineers need to consider the inherent flexibility requirements of the cloud infrastructure when designing and building software that is deployed “to the cloud”. 
&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2011/06/cloud-and-software-engineering.html</link><author>noreply@blogger.com (Ted Malone)</author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-fLwfpNWRHDs/TgyQrlwVnKI/AAAAAAAAAz0/uOMcQeel3Wc/s72-c/image_thumb%25255B5%25255D.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-2899870241054668241</guid><pubDate>Wed, 29 Jun 2011 16:52:00 +0000</pubDate><atom:updated>2011-06-29T10:52:57.333-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Business Intelligence</category><category domain='http://www.blogger.com/atom/ns#'>Virtual Conference</category><title>SQL Server Worldwide User Group (SSWUG) PowerPivot Analytics Expo</title><description>&lt;p&gt;The folks at &lt;a href="http://www.sswug.org" target="_blank"&gt;SSWUG&lt;/a&gt; are putting on a PowerPivot Analytics expo on July 15th. I’ll be speaking at the event on “SharePoint 2010 Business Intelligence Feature Review”. My session will be broadcast from 11:30am – 1:00pm Mountain Daylight Time. The session will cover Excel Services and how it fits into the larger Business Intelligence ecosystem for SharePoint 2010. I will also give an introduction to PowerPivot and how it helps to deliver “BI for the masses” when coupled with SharePoint as a delivery mechanism.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;These expos are designed to give people a one-day seminar of focused content, and they also introduce the Virtual Conference system, which works very well. People can sit at their desks and still have that “going to a technical conference” feeling.&lt;/p&gt;  &lt;p&gt;Check it out! 
To learn more and to register for the event, see &lt;a href="http://www.vconferenceonline.com/event/home.aspx?id=281"&gt;http://www.vconferenceonline.com/event/home.aspx?id=281&lt;/a&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2011/06/sql-server-worldwide-user-group-sswug.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-2352104887325043440</guid><pubDate>Wed, 29 Jun 2011 16:42:00 +0000</pubDate><atom:updated>2011-06-29T10:42:07.043-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>EMC</category><category domain='http://www.blogger.com/atom/ns#'>Community</category><title>Moving the Blog Back Here!</title><description>&lt;p&gt;Well, after playing around with the blog on my local server, I’ve decided that it really does work best here on blogger. I have changed the url to &lt;a href="http://blog.sqltrainer.com"&gt;http://blog.sqltrainer.com&lt;/a&gt; so hopefully that won’t confuse things too much.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;I always say this, but I’m hoping to spend more time blogging about stuff here over the next year. Work has been exceptionally exciting and I’ve had the awesome opportunity to work with some very cool technologies that I’d love to spend more time educating others about. 
Time is always an issue though, so we’ll see how it goes.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2011/06/moving-blog-back-here.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-4456510478816948875</guid><pubDate>Thu, 06 Jan 2011 15:45:00 +0000</pubDate><atom:updated>2011-01-06T08:45:57.649-07:00</atom:updated><title>Moving the Blog to a New Home</title><description>&lt;p&gt;Blogger has been very good to me for this blog, but I am moving it to a new home on my personal website. I’ve been playing around with the incredibly functional &lt;a href="http://www.mojoportal.com/" target="_blank"&gt;mojoPortal&lt;/a&gt; (check it out, it’s VERY cool) for my website and the blog technology seems very useful, so I’m going to start using that platform instead.&lt;/p&gt;  &lt;p&gt;Check it out: &lt;a href="http://www.sqltrainer.com/blog.aspx" target="_blank"&gt;Business Intelligence and Agile Development blog&lt;/a&gt; on &lt;a href="http://www.sqltrainer.com" target="_blank"&gt;SQLTrainer.com&lt;/a&gt;.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2011/01/moving-blog-to-new-home.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-4926098844448624672</guid><pubDate>Thu, 05 Aug 2010 01:02:00 +0000</pubDate><atom:updated>2010-08-04T19:02:11.377-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Agile Development</category><category domain='http://www.blogger.com/atom/ns#'>Visual Studio</category><title>A Couple of Major Visual Studio Announcements</title><description>&lt;p&gt;This week Microsoft is hosting the 
&lt;a href="http://vslive.com/events/vslive-summer-2010/home.aspx" target="_blank"&gt;Visual Studio Live&lt;/a&gt; conference in Redmond. This conference is one of those that is fun to get to if you can, but I have unfortunately not been able to attend this time around. Microsoft has chosen this conference as a platform to make some pretty interesting announcements.&lt;/p&gt;  &lt;h4&gt;Developing Line of Business Applications from Templates&lt;/h4&gt;  &lt;p&gt;The first major announcement came yesterday when “&lt;a href="http://www.microsoft.com/visualstudio/en-us/lightswitch" target="_blank"&gt;Visual Studio Lightswitch&lt;/a&gt;” was announced. In short, Lightswitch is a Rapid Application Development platform. The marketing states, “Visual Studio Lightswitch enables you to quickly create professional-quality line of business applications regardless of your development skills” and from what I can tell, it definitely does that. While this tool isn’t targeted at developers who like to build code from the ground up, it certainly does make the lives of those who want to quickly build and deploy applications easier.. I could see this as a great prototyping platform, or even as a mechanism to quickly solve specific problems. I’ll play around with the bits and maybe post a few more articles on it here as I get the time.&lt;/p&gt;  &lt;h4&gt;Test and Lab Management&lt;/h4&gt;  &lt;p&gt;If you’ve spent any time at all looking at the Application Lifecycle Management (ALM) features of Visual Studio 2010, you have heard about “&lt;a href="http://www.microsoft.com/visualstudio/en-us/products/2010-editions/lab-management" target="_blank"&gt;Test and Lab Management&lt;/a&gt;” and are likely excited about it. When Visual Studio 2010 released, the Lab management capabilities were unfortunately not quite ready, and were released in “Release Candidate” form. 
There are a number of stories and rumors behind this, but the reality is that the team just couldn’t be comfortable with where they were based on the amount of customer feedback that they had received. There was also a lot of confusion around how much this was going to cost organizations to deploy, and how things were going to be licensed. (The idea was you needed the client, the server, and agents for each of the machines deployed, which meant that larger organizations potentially were going to have to make a sizable and serious investment)&lt;/p&gt;  &lt;p&gt;The announcement today is that the “RTM” bits for Test and Lab Management will be available by the end of August! The other announcement is that the licensing has been very simplified. The client portion will be available by purchasing either Visual Studio 2010 Ultimate or Visual Studio 2010 Test Professional. The server portion is automatically included with Team Foundation Server, and the agents will NOT require a CAL (basically they are included with your TFS license).&lt;/p&gt;  &lt;p&gt;For those of us who’ve deployed Test and Lab Management already, there will be an update package that will upgrade the Release Candidate bits to RTM, but will also include fixes for each of the components (Client, Server, Agent) and the “things” they are packaged with (Team Foundation Server, etc).&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2010/08/couple-of-major-visual-studio.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-5154639640771242210</guid><pubDate>Wed, 21 Jul 2010 01:33:00 +0000</pubDate><atom:updated>2010-07-20T19:33:44.366-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>MVP</category><category domain='http://www.blogger.com/atom/ns#'>Agile 
Development</category><category domain='http://www.blogger.com/atom/ns#'>Community</category><title>A Very Interesting “Community-Focused” Project.</title><description>&lt;p&gt;As a &lt;a href="http://blogs.msdn.com/b/mvpawardprogram/" target="_blank"&gt;Microsoft Most Valuable Professional&lt;/a&gt; (MVP), I’ve had the opportunity to work with some pretty amazing people over the years, and have always been impressed with how “Community-Focused” this group of people can be.&lt;/p&gt;  &lt;p&gt;I was, however, recently blown away by a friend of mine who’s also an MVP, and who I’ve had the pleasure of working with on numerous occasions. &lt;a href="http://sqlblog.com/blogs/arnie_rowland" target="_blank"&gt;Arnie Rowland&lt;/a&gt;, who runs a consulting company based in the Pacific Northwest called &amp;quot;Westwood Consulting, Inc.” came up with what I think is a brilliant idea. (More on this in a second, but let me set the stage first)&lt;/p&gt;  &lt;p&gt;One of the perks of being awarded the MVP award from Microsoft is you receive special attention from the product teams from time to time. Once in awhile a product team will decide to do something very nice for MVPs, like send some special SWAG, put on a special chat session, or offer an invite to take part in face to face meetings. Sometimes, in conjunction with the marketing teams, they offer SWAG that can be very impressive. Well, this year the Developer Division decided that developer MVPs would receive Not-For-Resale MSDN Subscription vouchers that they could give out any way that they chose. (Since many MVPs spend a fair amount of time in public speaking engagements, offering one as a giveaway for the event is likely what they had in mind). These things retail for just over $12K each, so it was certainly a very generous give-away. 
Of course it now begs the question, how do you maximize the value of these things and get them to people who could really benefit from them.&lt;/p&gt;  &lt;p&gt;Here’s where Arnie comes in. He came up with this great idea, which in a nutshell says, “If you’re an unemployed or underemployed developer, we’ll give you free software and the information you need to use it if you’re willing to use it to help out a non-profit agency --- Oh, and you have to prove that you are willing to treat this seriously by submitting a proposal for the work you’ll do”. Arnie discusses this all in a blog post here: &lt;a title="http://sqlblog.com/blogs/arnie_rowland/archive/2010/07/12/while-you-don-t-get-a-free-lunch-you-will-get-your-just-deserts.aspx" href="http://sqlblog.com/blogs/arnie_rowland/archive/2010/07/12/while-you-don-t-get-a-free-lunch-you-will-get-your-just-deserts.aspx"&gt;http://sqlblog.com/blogs/arnie_rowland/archive/2010/07/12/while-you-don-t-get-a-free-lunch-you-will-get-your-just-deserts.aspx&lt;/a&gt; All in all this is an amazing “Win-Win” type project. The un-or-underemployed developer receives over $12K worth of software (there’s more than just the MSDN subscription on the table) and some deserving non-profit organization gets a problem solved!&lt;/p&gt;  &lt;p&gt;When I talked to Arnie about this, I realized that I definitely wanted to be involved, so I donated the MSDN subscriptions that I had been given to him for this project. As it turns out several other MVPs have decided to do the same, so this is starting to almost go viral. 
The guys on the &lt;a href="http://channel9.msdn.com/shows/PingShow/Ping-66-Kinect-WPC-Bing-Next-of-Kin-Free-MSDN-Universal-Subscription/" target="_blank"&gt;ping show&lt;/a&gt; over on &lt;a href="http://channel9.msdn.com/" target="_blank"&gt;MSDN Channel 9&lt;/a&gt; picked it up in a recent episode, and it was also mentioned in a recent MSDN Flash.&lt;/p&gt;  &lt;p&gt;So, if you’re reading this and you’re interested in helping out a non-profit organization, head on over to Arnies Blog, and submit an idea. Who knows, you may end up with a very cool pack of software.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2010/07/very-interesting-community-focused.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-8643956644909708761</guid><pubDate>Mon, 19 Jul 2010 23:46:00 +0000</pubDate><atom:updated>2010-07-19T17:46:21.075-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Business Intelligence</category><category domain='http://www.blogger.com/atom/ns#'>Agile Development</category><category domain='http://www.blogger.com/atom/ns#'>Scrum</category><category domain='http://www.blogger.com/atom/ns#'>EMC</category><title>It’s been awhile – and Visual Studio Scrum 1.0 is Released!</title><description>&lt;p&gt;It has really been a LONG time since I’ve posted here. In my defense, I’ve spent the last year blogging about my dogs and their fight with cancer (See &lt;a href="http://nikkitherott.tripawds.com"&gt;http://nikkitherott.tripawds.com&lt;/a&gt; and &lt;a href="http://buddytherott.tripawds.com"&gt;http://buddytherott.tripawds.com&lt;/a&gt; for more info). 
We lost Buddy a few weeks back, and Nikki seems to have beaten it for now, so we’re getting back to normal in a lot of ways.&lt;/p&gt;  &lt;p&gt;A number of things have happened in my professional life since the last post (which, incidentally, I’m still not happy about; I have since had to replace my HP Tablet. Guess which brand I DID NOT buy?) and here they are in no particular order:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;My product, “EMC Ionix Configuration Analytics Manager”, was sold to VMware, along with all of the former Configuresoft assets, including a number of my team members. My Product Manager, a couple of key architects and I stayed behind at EMC and are now working to define a new Business Intelligence product that is currently being called “EMC Ionix Infrastructure Insight”, or I3. More on this in future posts, because it’s been an extremely exciting time and we’re really having some fun delving into the depths of the Storage Resource Management (SRM) domain.&lt;/li&gt;    &lt;li&gt;Because the building we had occupied is now the property of VMware, my team and I have relocated to a new office just a bit southwest of where we were. My new office has a much better view of Pikes Peak, and given that I mostly work East Coast hours these days, I’m in the office early enough to enjoy the deer and other wildlife that roam through the grounds before most folks get in.&lt;/li&gt;    &lt;li&gt;I was re-awarded the Microsoft MVP Award (they have retired the “Visual Studio Team System” name, so now I’m a “Visual Studio ALM MVP”) for another year. As I told my lead, I didn’t really deserve it, but they gave it to me, so I’m dedicated to earning it this year.&lt;/li&gt;    &lt;li&gt;I was able to speak at a number of Visual Studio launch events, most notably the “.NET Forever” event in Stockholm, Sweden. (Thanks Tibi!) 
&lt;/li&gt;    &lt;li&gt;EMC announced the acquisition of &lt;a href="http://www.greenplum.com/" target="_blank"&gt;Greenplum, Inc&lt;/a&gt;. If you haven’t heard about Greenplum, you might want to read up on them. This is definitely a major game changer in the world of Cloud Computing. I really can’t say much at this point about the acquisition, but don’t be surprised if you start seeing a lot more “Massively Scalable Data Warehouse” type posts here in the future.&lt;/li&gt;    &lt;li&gt;This may be the first year since the mid-90s that I have not flown enough to maintain top-tier status on my airline of choice (American Airlines). I’m not yet sure how I feel about this.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;Anyway, I think that’s enough of that. &lt;/p&gt;  &lt;p&gt;On to the real reason behind this post.&lt;/p&gt;  &lt;p&gt;Today marks an important day for developers who practice agile development. Microsoft has released version 1.0 of the &lt;a href="http://blogs.msdn.com/b/aaronbjork/archive/2010/07/19/announcing-microsoft-visual-studio-scrum-1-0.aspx" target="_blank"&gt;Visual Studio Scrum&lt;/a&gt; template and guidance. The reason that this is an exciting and important announcement is that Microsoft has finally seen fit to release a template that can be used with Visual Studio and Team Foundation Server out of the box! (This is a point of debate for many people I know, but I’ve always seen the existing MSF guidance as a starting point that must be customized. Now we have a solution that is good to go from the first install!).&lt;/p&gt;  &lt;p&gt;This template was developed from the start by Microsoft, who engaged with some very well-known heavy hitters in the Scrum community. This means that the template and guidance provided should hold up to real-world development, since they have been vetted by those who live and breathe Scrum on a daily basis. &lt;/p&gt;  &lt;p&gt;Check it out! 
&lt;a title="http://blogs.msdn.com/b/aaronbjork/archive/2010/07/19/announcing-microsoft-visual-studio-scrum-1-0.aspx" href="http://blogs.msdn.com/b/aaronbjork/archive/2010/07/19/announcing-microsoft-visual-studio-scrum-1-0.aspx"&gt;http://blogs.msdn.com/b/aaronbjork/archive/2010/07/19/announcing-microsoft-visual-studio-scrum-1-0.aspx&lt;/a&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2010/07/its-been-awhile-and-visual-studio-scrum.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-1081123492485817103</guid><pubDate>Fri, 21 Aug 2009 20:37:00 +0000</pubDate><atom:updated>2009-08-21T14:37:39.286-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>General Ramblings</category><title>Insane(ly stupid!) HP Customer Service</title><description>&lt;p&gt;Those that know me are aware of the fact that I tend to be rather vocal in my support for the things that I use. I’m an advocate for most Microsoft Technologies, an advocate for the much-maligned Zune MP3 player, Harley Davidson motorcycles, etc etc etc.. If I own/use something and I like it, I let people know about it…&lt;/p&gt;  &lt;p&gt;I’ve been a huge fan of HP hardware for many years, and when I became unhappy about the performance of my work-provided laptop, I decided that I was going to take matters into my own hands and bought my own HP Tablet PC. I love this thing; it has been one of the most versatile (and downright useful) portable devices that I have ever owned. Coupled with Windows 7, I couldn’t be happier. I purchased it at Costco, and honestly can’t imagine ever being unhappy with the process. 
&lt;/p&gt;  &lt;p&gt;As circumstances would have it, I am in need of a more powerful portable computing platform (in addition to the tablet mind you, not in place of it) for some field-work that I’m going to be doing. Rather than try and navigate the hardware requisition maze to get a new work-provided one, I decided that I would buy another one of my own. Of course since I love HP products, I decided to see what HP has available. I found *exactly* what I was looking for on the HP site, and unfortunately found out that Costco wouldn’t carry that model with the options that I wanted, so I decided to go ahead and order it from the HP Website. I configured the beast with all the custom options, paid the price, and had it shipped.&lt;/p&gt;  &lt;p&gt;I received the laptop (An HP HDX 16T Premium “CTO” model with every imaginable bell/whistle) on Monday of this week. Did my normal pave/rebuild with Windows 7, and after a couple of false starts was pretty happy with my purchase.&lt;/p&gt;  &lt;p&gt;And then the fun began…..&lt;/p&gt;  &lt;p&gt;On Tuesday evening, I received an email from the HP store talking about their latest/greatest offer… Turns out it was an offer for the exact laptop that I just bought…. &lt;strong&gt;AND IT’S $500 CHEAPER THAN WHAT I PAID!&lt;/strong&gt; Ok, probably some hidden tricks and nothing to get excited about, right? Well, it turns out that one of the guys who works for me decided he too wants a new laptop and decided to buy one, so I sent him the link from the email I’d received, and sure enough, he configured &lt;strong&gt;THE EXACT SAME&lt;/strong&gt; laptop with the &lt;strong&gt;EXACT SAME&lt;/strong&gt; options, and paid exactly $500 less than I did…&lt;/p&gt;  &lt;p&gt;No worries I’m sure. I can just contact them, explain the situation (hey, just 1 day after receiving this you drop the price?) and I’m sure they will at least offer me a store credit…. 
Well, I send an email and I get the following response:&lt;/p&gt;  &lt;p&gt;“&lt;em&gt;I understand that you are inquiring about the received coupon after receiving your ordered HP HDX 16t laptop and would like to know if you can avail the said discount on the recently received order, right. I appreciate your inquiry, HP continues to provide you the best in all computer technology. Thank you for choosing HP. Ted, that coupon code is applicable on your next purchase. However, you also need to check on the validity of the coupon before using. &lt;/em&gt;”&lt;/p&gt;  &lt;p&gt;To which I respond that the answer is not acceptable… I receive the following response:&lt;/p&gt;  &lt;p&gt;“&lt;em&gt;I am very sorry to hear that we were not able to provide you with a satisfactory purchase experience. I want to apologize for the inconvenience that you have experienced due of the $500.00 coupon that you would like to add on your recent purchase.&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;Feedback like yours is always appreciated, and it lets us know how well our website and staff have assisted you. I have registered your complaint with our management team and we will assess the corrective action that needs to take place.&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;Ted, we have the 21-day Price Protection Policy. As much as I wanted to process the $500.00 credit on your order, it is beyond my credit limit. However, I would refer you to the Resolutions team to process the credit, if applicable. They will validate the coupon and provide an option for you to avail such discount.&lt;/em&gt; ”&lt;/p&gt;  &lt;p&gt;OK, so now we’re getting somewhere!&lt;/p&gt;  &lt;p&gt;I call the number that they suggest, and basically am told that the email is wrong, that I cannot use the discount. 
When I tell them that due to their satisfaction policy I could easily box this laptop up, return it, and then buy it again for $500 less, it just doesn’t make sense to do it this way… Their response, “That’s the way the policy works”…&lt;/p&gt;  &lt;p&gt;By now I’m highly irritated, and can’t believe that a loyal customer is being treated this way in this economy… So, I send another email basically stating that I will return the laptop and find something else to buy…&lt;/p&gt;  &lt;p&gt;Their response:&lt;/p&gt;  &lt;p&gt;“&lt;em&gt;Thank you for contacting the HP Home &amp;amp; Home Office Sales Center. &lt;/em&gt;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;I understand that you got an email with the coupon code after receiving the hdx16t laptop that you ordered, and you wanted to know if you can just get the credit for that coupon code.&amp;#160; I'm really sorry if this has caused you a hassle.&amp;#160; Don't worry since HP is willing to assist you with this.&amp;#160; &lt;/em&gt;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;I just want to let you know that there is no retroactive application on any of our coupon codes.&amp;#160; Also, all of our coupon codes have&amp;#160; limited redemptions.&amp;#160; What if we credit the amount, but the limit has been reached already?&amp;#160; You really have to apply the coupon code to the actual order to see if it's valid or not.&amp;#160; In this case, your only option is to return the laptop and place a new order using that coupon code.&amp;#160; I'm really sorry for this inconvenience.&amp;#160; &lt;/em&gt;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;I know that this has caused you a trouble, but I still want to thank and commend you for choosing HP and considering our products.&amp;#160; We have received a lot of wonderful comments and recognition about our products from customers through emails, chats, and phone calls.&lt;/em&gt;” &lt;/p&gt;  &lt;p&gt;So now it becomes a matter of what to do… I can box and return this laptop, order a new one for $500 less, 
and be done with it, &lt;strong&gt;OR&lt;/strong&gt; I can choose to simply return it and go elsewhere…..&lt;/p&gt;  &lt;p&gt;Given this display of customer service, I don’t think there is really much of a debate….&lt;/p&gt;  &lt;p&gt;Guess I’ll be checking out the Dell laptops….&lt;/p&gt;  &lt;p&gt;If anyone from HP is reading this, please fix your customer service process. If you drive away your satisfied and happy customers, what will you be left with? How can I, with a clear conscience, recommend your products to anyone at this point? &lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2009/08/insanely-stupid-hp-customer-service.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>3</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-580998149268162905</guid><pubDate>Tue, 23 Jun 2009 13:45:00 +0000</pubDate><atom:updated>2009-06-23T07:45:11.341-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Business Intelligence</category><category domain='http://www.blogger.com/atom/ns#'>Agile Development</category><title>Cross-Domain Business Intelligence</title><description>&lt;p&gt;One problem that seems to plague organizations these days is a lack of understanding of how Business Intelligence techniques apply to more than just the typical “what’s our sales performance” question. (See my previous post for more on applying BI techniques to IT data). &lt;/p&gt;  &lt;p&gt;I think the root of this problem is much deeper than simply trying to understand BI technologies. I believe the problem stems from a fundamental lack of understanding of how to apply a technology solution to a business problem. 
This may sound a bit like Dilbert-esque crazy talk, but I think I can make a pretty good case for this argument.&lt;/p&gt;  &lt;p&gt;There’s a lot of talk and “noise” in the industry right now about “Cross Domain BI” (some call this pervasive BI, but I don’t agree with applying that term to this problem) where BI techniques are being used to tie together data from multiple dissimilar sources within an organization. This provides a unified view of just how well each aspect of the organization is performing. I think that this movement is destined for a very bumpy road unless organizations fundamentally change how they approach problem solving in general, and “BI” in particular.&lt;/p&gt;  &lt;h4&gt;Distilling the Problem&lt;/h4&gt;  &lt;p&gt;Anyone who’s been in an engineering role (not necessarily limited to software engineering by the way) for a while has been faced with the problem of imprecise requirements or specifications. As engineers, we tend to understand how to deal with that problem and move on. (How depends a lot on the engineer: sometimes the lack of a good spec makes for a great excuse not to get the job done, or worse, leads to a product simply “built to spec”, and sometimes it forces the engineer to become more involved in understanding the problem they are trying to solve.) Unfortunately the trait doesn’t always hold true with those outside of engineering who typically drive Business Intelligence projects.&lt;/p&gt;  &lt;h4&gt;Agile Business Intelligence&lt;/h4&gt;  &lt;p&gt;I’ve made the case before that BI projects *must* be driven by Agile methodologies if they are going to succeed. The main point of my argument there is that a successful BI project must be able to adapt to changing requirements along the way, and must be extremely flexible in terms of the data provided to the end-user. I believe it’s also true that for “Cross-Domain” BI to succeed, there must be an Agile component to the business as a whole. 
If an organization is rigidly structured, with well-defined “silos” of information, any attempt to develop cross-domain BI will likely end up in several BI silos that ultimately become useless when combined. For a cross-domain BI project to succeed, each of the silos of information must understand how data from other silos can be used to improve their own performance. In order to accomplish this, there needs to be an over-arching description of the business goals for the BI project, as well as a description of the goals for each silo. Generally speaking, this is done by following the “Business Scenario”-focused process such as the Microsoft Solutions Framework (MSF).&lt;/p&gt;  &lt;p&gt;This brings me back to my original point. In order to properly apply BI techniques to the “Cross Domain” problem, organizations must first understand the problem that they are trying to solve. If they do this by creating an over-arching “cross domain” Business Scenario that contains the following steps:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;What questions are you trying to answer?&lt;/li&gt;    &lt;li&gt;What data do you need to answer the questions?&lt;/li&gt;    &lt;li&gt;Where does the data exist?&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;They are more likely to succeed at delivering a useful solution. 
If they don’t follow this simple approach, they are likely to be left wondering what happened.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2009/06/cross-domain-business-intelligence.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>5</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-1057817195220075376</guid><pubDate>Tue, 16 Jun 2009 11:56:00 +0000</pubDate><atom:updated>2009-06-16T05:56:54.431-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Business Intelligence</category><category domain='http://www.blogger.com/atom/ns#'>General Ramblings</category><title>BI for the IT Guy (or Gal)</title><description>&lt;p&gt;One of the things that I have struggled with in the past in explaining “my” product (&lt;a href="http://www.emc.com/products/detail/software/configuration-analytics-manager.htm" target="_blank"&gt;Configuration Analytics Manager&lt;/a&gt; – CAM) to people who’ve not seen it, is connecting the dots from the “elevator pitch” to the real business proposition. When most IT people think of Business Intelligence (BI) products, they automatically think sales management or maybe they think “Advanced Reporting”. Since people generally tend to classify things into buckets that they can understand, once they hear “BI”, they’re automatically framing everything else said about the product into one of the above categories. When people hear about CAM and they hear, “It’s BI for IT data”, there is either a blank look of, “Why do I need *that*?” or even, “That’s cool, I can have charts and graphs on my reports now”.&lt;/p&gt;  &lt;p&gt;The problem that I have with the above is IT management is a maturing industry. IT used to be the cost center that simply provided the business with the tools it needs to prosper. 
IT used to be “those people” that you only had to talk to when something was wrong, or when something new was needed. (Unless of course you are in IT management, in which case you only got to talk to people when something was wrong or when something new was needed). These days IT has become a first class citizen in most corporate environments, and is being seen as much more than a simple cost center. IT is being measured by much more than just how well they managed their budget. &lt;/p&gt;  &lt;p&gt;Given all of the above, sometimes it feels like I’m on a crusade, and the first step is to try and get IT managers and directors to start thinking of their work in terms of business profitability. If IT managers start thinking in terms of a given process (be it keeping an application running, managing a service desk, monitoring an application, or any of the other day to day tasks an IT person performs) as their “product”, and improvement of that process as being their “profit”, then BI solutions built using IT management data will start to become a pervasive requirement for IT organizations. (What better way to help improve profit than to analyze what is and isn’t working?) CAM is poised to be *the* product to help IT when that day comes.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2009/06/bi-for-it-guy-or-gal.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>2</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-2676617363193182101</guid><pubDate>Fri, 12 Jun 2009 16:45:00 +0000</pubDate><atom:updated>2009-06-12T10:45:05.272-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>EMC</category><title>More on the transition to EMC</title><description>&lt;p&gt;If you follow this blog, you know that I’ve recently become an employee of EMC. As a matter of fact, I just got my badge today. 
(As a side note, why do badge photos have to be so horrible?) It has been a whirlwind transition, and has actually been only my second experience of being part of an acquisition by a larger entity. &lt;/p&gt;  &lt;p&gt;I realize that it’s early in the game, but I have to honestly state that I have been impressed by the attention that I’ve received from various folks throughout the EMC organization. From the management of my new organization, down to bloggers and tweeters who’ve sent their “Welcome Aboard” messages, I’ve felt as if I’ve personally been welcomed to the EMC family.&lt;/p&gt;  &lt;p&gt;Which brings me to the point of this post..&lt;/p&gt;  &lt;p&gt;Joe Tucci (CEO of EMC) wrote an &lt;a title="open letter to Data Domain employees" href="http://www.emc.com/about/announcements/0509-data-domain.htm"&gt;open letter to Data Domain employees&lt;/a&gt; basically telling them what life at EMC would be like. (If you haven’t heard the story, EMC has made a bid for Data Domain – read about that here: &lt;a title="http://www.theregister.co.uk/2009/06/01/emc_bids_for_data_domain/" href="http://www.theregister.co.uk/2009/06/01/emc_bids_for_data_domain/"&gt;http://www.theregister.co.uk/2009/06/01/emc_bids_for_data_domain/&lt;/a&gt; ) Several EMC employees have added to the conversation by stating why they feel EMC is a great place to work. (See Polly Pearson’s blog here: &lt;a title="http://www.pollypearson.com/main/2009/06/emc-folks-add-to-the-discussion-why-do-i-work-at-emc.html" href="http://www.pollypearson.com/main/2009/06/emc-folks-add-to-the-discussion-why-do-i-work-at-emc.html"&gt;http://www.pollypearson.com/main/2009/06/emc-folks-add-to-the-discussion-why-do-i-work-at-emc.html&lt;/a&gt;). As a brand-new-to-the-EMC-culture person, I feel that I don’t really have much to add to the discussion, but I can say one thing for sure: if all the people that EMC has absorbed through acquisition are treated as I have been (yeah yeah, I know that it’s still early!) 
then the folks over at Data Domain are in for a very pleasant surprise!&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2009/06/more-on-transition-to-emc.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>1</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-8292703134618234220</guid><pubDate>Wed, 10 Jun 2009 19:54:00 +0000</pubDate><atom:updated>2009-06-10T13:54:13.114-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Business Intelligence</category><category domain='http://www.blogger.com/atom/ns#'>General Ramblings</category><category domain='http://www.blogger.com/atom/ns#'>Configuresoft</category><title>and Hello EMC!</title><description>&lt;p&gt;If you read &lt;a href="http://portal.sqltrainer.com/2009/06/so-long-configuresoft.html" target="_blank"&gt;my previous post&lt;/a&gt;, you know that I still had some issues to clean up around the &lt;a href="http://www.configuresoft.com" target="_blank"&gt;Configuresoft&lt;/a&gt; to &lt;a href="http://www.EMC.com" target="_blank"&gt;EMC&lt;/a&gt; transition. I’m happy to say that these issues are all now nicely put to bed, so I can now move forward as a member of the EMC team!&lt;/p&gt;  &lt;p&gt;My official title is, “&lt;strong&gt;&lt;em&gt;Principal Software Engineer&lt;/em&gt;&lt;/strong&gt;”, which, in an organization like EMC, is not a bad title to have. I will be part of the (soon to be rebranded, more on that in future posts) Resource Management Software Group (RMSG), which is essentially the group that &lt;a href="http://www.emc.com/products/category/it-management.htm" target="_blank"&gt;manages the datacenter&lt;/a&gt;. Effectively my world went from building tools that satisfy a small slice of the data center management problem, to now being a part of a much larger and more exciting data center management strategy. 
(Kind of a hoot to click &lt;a href="http://www.emc.com/products/category/subcategory/data-center-automation-compliance.htm" target="_blank"&gt;here&lt;/a&gt; and see “my” product at the top of the list)&lt;/p&gt;  &lt;p&gt;Anyway, I’m still learning my way around, and will likely keep a low profile for the next several weeks. Having said that though, I’ll be travelling to the “mother ship” (kind of cool to have a “mother ship” after so long *being* part of the mother ship) next week for more integration meetings.&lt;/p&gt;  &lt;p&gt;Exciting times await!&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2009/06/and-hello-emc.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>1</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-2381986993222729797</guid><pubDate>Thu, 04 Jun 2009 14:09:00 +0000</pubDate><atom:updated>2009-06-04T08:09:49.479-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Business Intelligence</category><category domain='http://www.blogger.com/atom/ns#'>General Ramblings</category><category domain='http://www.blogger.com/atom/ns#'>Configuresoft</category><title>So long, Configuresoft…….</title><description>&lt;p&gt;For those that know me, you know I’ve spent untold hours over the last seven years working with a fantastic company and some of the best people I’ve ever had the privilege of knowing. When I first came aboard, &lt;a href="http://www.configuresoft.com" target="_blank"&gt;Configuresoft&lt;/a&gt; was a very-small-yet-vibrant company trying hard to make a mark in the IT Configuration Management field. Over the years, I’d say we hit that goal and then some.&lt;/p&gt;  &lt;p&gt;With all of the consolidation in the Configuration Management market, and the economic realities of the day, it was only a matter of time before we became part of something bigger. 
That day came last week as it was announced that &lt;a href="http://itmanagement2.com/?p=498" target="_blank"&gt;Configuresoft would be acquired&lt;/a&gt; by EMC Corporation. My personal thoughts on this vary: it’s sad to see the company that has been such a huge part of my life over the last several years disappear, but on the other hand, the forward-looking opportunities are simply amazing. We became part of the (soon to be rebranded) RMSG group within EMC, which means that we have the resources of a $15B, 47,000-employee company behind us, and a suite of products in our arsenal that are second to none in the industry.&lt;/p&gt;  &lt;p&gt;I don’t yet know what this means for me personally. I think I am going to stay, but there are some legal issues that need to be worked out. I do know that “my” product, which has been branded by EMC as “Configuration Analytics Manager” (CAM) will play a huge role in RMSG’s strategy moving forward, and if it works out, I’m going to be very excited to play a role in all of that.&lt;/p&gt;  &lt;p&gt;So, goodbye Configuresoft, it’s been a wonderful ride!&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2009/06/so-long-configuresoft.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>1</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-7869108633288045971</guid><pubDate>Thu, 23 Apr 2009 02:42:00 +0000</pubDate><atom:updated>2009-04-22T20:42:43.059-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Business Intelligence</category><category domain='http://www.blogger.com/atom/ns#'>Virtual Conference</category><category domain='http://www.blogger.com/atom/ns#'>General Ramblings</category><category domain='http://www.blogger.com/atom/ns#'>Agile Development</category><category domain='http://www.blogger.com/atom/ns#'>Visual Studio</category><category 
domain='http://www.blogger.com/atom/ns#'>Visual Studio Team System</category><title>What a Surreal Day!</title><description>&lt;p&gt;Today was one of those VERY strange days.. &lt;/p&gt;  &lt;p&gt;For starters, today was the first day of the &lt;a href="https://www.vconferenceonline.com/shows/spring09/sql/register/multireg.asp?show=sql" target="_blank"&gt;SSWUG Ultimate Virtual Conference&lt;/a&gt; (by the way, there’s still 2 days left and you can still register. Use code &lt;strong&gt;VCTAF457840-140&lt;/strong&gt; when you do!) during which I presented 3 sessions on Visual Studio Team System Database Edition. They had been filmed a couple of weeks ago, but this was the day they went live.&lt;/p&gt;  &lt;p&gt;Next up, today was also the Microsoft &lt;strong&gt;Team System Big Event &lt;/strong&gt;in Denver (see my previous post about that). This event had a lot of great speakers, including &lt;a href="http://blogs.msdn.com/slange/" target="_blank"&gt;Steve Lange&lt;/a&gt;, &lt;a href="http://jerrytech.blogspot.com/" target="_blank"&gt;Jerry Nixon&lt;/a&gt;, &lt;a href="http://joeshirey.com/" target="_blank"&gt;Joe Shirey&lt;/a&gt;, &lt;a href="http://blogs.msdn.com/bags/" target="_blank"&gt;Rob Bagby&lt;/a&gt; and &lt;a href="http://www.peterprovost.org/blog/" target="_blank"&gt;Peter Provost&lt;/a&gt;. For some reason they also let me present a session.&lt;/p&gt;  &lt;p&gt;So you may wonder, what’s so surreal about that? Well, it turns out Rob was presenting the session on “Data Dude” (or Database Edition as he called it) at the Big Event. My 3 sessions at the SSWUG conference were also on Data Dude. The timing worked out such that Rob presented his session about the same time as my first one was being repeated at the SSWUG event. We used slides built from the same master deck, so I was literally watching my video (muted) while in the room with Rob and noticing that he was talking about many of the same topics at the same time…. 
Quite surreal!&lt;/p&gt;  &lt;p&gt;Anyway, days 2 and 3 of the SSWUG conference are up now, and I’ve still got 3 Business Intelligence sessions to “deliver”.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2009/04/what-surreal-day.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-6151750652536453985</guid><pubDate>Mon, 06 Apr 2009 13:38:00 +0000</pubDate><atom:updated>2009-04-06T07:38:26.860-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Business Intelligence</category><category domain='http://www.blogger.com/atom/ns#'>Virtual Conference</category><category domain='http://www.blogger.com/atom/ns#'>Visual Studio</category><title>SSWUG Online FREE Community Event</title><description>&lt;p&gt;&lt;a href="https://www.vconferenceonline.com/shows/spring09/sql/s09event.asp"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="SSWUGBanner" border="0" alt="SSWUGBanner" src="http://www.sqltrainer.com/pics/blogpics/SSWUGOnlineFREECommunityEvent_6B70/SSWUGBanner.jpg" width="727" height="103" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;The good people over at the &lt;a href="http://www.sswug.org" target="_blank"&gt;SQL Server Worldwide Users Group&lt;/a&gt; (SSWUG) are hosting a FREE online event to showcase some of the best sessions from past Virtual Conferences. The event will be held online Friday, April 17th, starting at 9am Pacific time. Info about the event:&lt;/p&gt;  &lt;p&gt;&lt;b&gt;About the SSWUG Community Virtual Event&lt;/b&gt;    &lt;br /&gt;We're working to bring you real-world information about SQL Server 2008, SharePoint, and Silverlight, with tips about new features, functionality and much more. 
&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;b&gt;Microsoft SQL Server 2008 Analysis Services - Designing and Managing High Performance Cubes: Donald Farmer&lt;/b&gt;. With Microsoft SQL Server 2008, Analysis Services offers advanced features for design and manageability. This session will explore in detail two of these features: the best practices design alerts and dynamic management views. Design alerts guide you with important advice throughout the development stage of a cube. We'll show how to work with the alerts, and how to manage their various subtleties. Having deployed a more efficient cube, the dynamic management views enable the administrator to query for information regarding connections, sessions, and server performance. We will introduce these views and drill down into many examples of their usage. &lt;/li&gt;    &lt;li&gt;&lt;b&gt;Useful, effective, pre-made SharePoint templates-- from Microsoft, for free: Callahan&lt;/b&gt;. Need a help desk, timecard, or vacation request site? Considering just creating them yourself? Don't. Not until you explore the pre-existing application templates available from Microsoft. Free for download, Microsoft has 40 fantastic application templates, not to mention the Community Kit for SharePoint with offerings such as enhanced blog templates, or sites for user groups. So before you invest time and technology in rolling your own, check out this session and get an idea of what's been rolled for you. &lt;/li&gt;    &lt;li&gt;&lt;b&gt;Silverlight for Beginners: Tim Heuer&lt;/b&gt;. XAML, WPF, VS, Blend...what are all these acronyms? Let’s take a deep breath and step back to look at the spectrum of what Silverlight is (and isn’t) and what you need to know starting from ground zero. No knowledge of WPF or Silverlight is required and we’ll get you started building your first Silverlight application in no time! &lt;/li&gt;    &lt;li&gt;&lt;b&gt;Introduction to Data Dude: Ted Malone&lt;/b&gt;. 
In this session, attendees will learn about Microsoft Visual Studio Team Edition for Database Professionals, a.k.a. Data Dude. This product provides database developers with tools for database development, change management and testing. This session will walk through the available features of Data Dude and detail where they can be best utilized. &lt;/li&gt;    &lt;li&gt;&lt;b&gt;SQL Server Round Table: Paul Nielsen, Chris Shaw, Stephen Wynkoop&lt;/b&gt;. Stephen, Chris and Paul sit down in an open forum and discuss questions about SQL Server 2008 and questions that have come up from the past conferences. This session will show some of the different opinions that developers and administrators have when working with SQL Server. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;To register for this FREE event, check out the URL here: &lt;a title="https://www.vconferenceonline.com/shows/spring09/sql/s09event.asp" href="https://www.vconferenceonline.com/shows/spring09/sql/s09event.asp"&gt;https://www.vconferenceonline.com/shows/spring09/sql/s09event.asp&lt;/a&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2009/04/sswug-online-free-community-event.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-6042059390008632023</guid><pubDate>Fri, 13 Mar 2009 00:07:00 +0000</pubDate><atom:updated>2009-03-13T11:02:14.416-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>Agile Development</category><category domain='http://www.blogger.com/atom/ns#'>Visual Studio</category><title>The Visual Studio Team System Big Event!</title><description>&lt;p&gt;One of the great things about being a Team System MVP is you get to participate in many different types of events. 
I really have to hand it to Steve Lange and Microsoft on this one, though; this looks to be a fantastic idea and should make for a fun and educational time.&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.sqltrainer.com/pics/blogpics/TheVisualStudioTeamSystemBigEvent_FC1E/image.png"&gt;&lt;img style="BORDER-BOTTOM: 0px; BORDER-LEFT: 0px; DISPLAY: block; FLOAT: none; MARGIN-LEFT: auto; BORDER-TOP: 0px; MARGIN-RIGHT: auto; BORDER-RIGHT: 0px" title="image" border="0" alt="image" src="http://www.sqltrainer.com/pics/blogpics/TheVisualStudioTeamSystemBigEvent_FC1E/image_thumb.png" width="498" height="74" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;The Visual Studio Team System Big Event is an &lt;strong&gt;invitation only&lt;/strong&gt; event being held in several cities throughout the west. If you are involved with Team System at all, or if you’ve always wondered what VSTS was all about, this is an event you won’t want to pass up. Here is some more information on the Denver event, which is being held on April 22 at the Denver Office. This is an &lt;strong&gt;invite only&lt;/strong&gt; event, but the good news is, well, here’s your invite! Click the “Register Online” button below, and when prompted for the secret code, use &lt;strong&gt;DD1A7F. &lt;/strong&gt;Oh, and I will be presenting the “Bang for Your Buck: Getting the Most out of Team Foundation Server” session. 
Hope to see you all there…&lt;/p&gt;&lt;p&gt;&lt;a href="http://msevents.microsoft.com/CUI/InviteOnly.aspx?EventID=B3-17-E7-0F-3D-A9-CF-93-D5-46-14-B9-D3-7C-38-D5&amp;amp;Culture=en-US"&gt;Register Online&lt;/a&gt; (Remember to use code &lt;strong&gt;DD1A7F&lt;/strong&gt; when prompted)&lt;/p&gt;&lt;p&gt;Wednesday, April 22, 2009 8:30 AM - Wednesday, April 22, 2009 5:00 PM Mountain Time (US &amp;amp; Canada)&lt;br /&gt;Welcome Time: 8:00 AM&lt;/p&gt;&lt;p&gt;Microsoft Corporation &lt;/p&gt;&lt;p&gt;7595 Technology Way, Suite 400&lt;br /&gt;Denver Colorado 80237&lt;br /&gt;United States&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Language(s):&lt;/strong&gt;&lt;br /&gt;English.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Product(s):&lt;/strong&gt;&lt;br /&gt;Microsoft Visual Studio, Microsoft Visual Studio 2008 and Microsoft Visual Studio 2010.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Audience(s):&lt;/strong&gt;&lt;br /&gt;Developer, IT Professional and Professional Developer/Coder.&lt;/p&gt;&lt;p&gt;&lt;b&gt;Event Overview&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;i&gt;How do you take an idea from conception to completion? How can you truly do more with less? &lt;/i&gt;&lt;/p&gt;&lt;p&gt;Please join us for this unique, invitation-only event to discover how both product and processes help your organization succeed in today’s environment. We will explore how Team System assists teams across the board to be successful in today’s tough times. This “break through” event will not only provide you with best practices around development and testing, but will demonstrate key capabilities of both Visual Studio Team System 2008 and the upcoming 2010 release. 
It’s a day that promises to have something for everyone!&lt;/p&gt;&lt;h4&gt;SESSIONS&lt;/h4&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;&lt;em&gt;&lt;strong&gt;Test Driven Development: Improving .NET Application Performance &amp;amp; Scalability&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;&lt;p&gt;This session will demonstrate how to leverage Test Driven Development in Team System. We’ll highlight both writing unit tests up front as well as creating test stubs for existing code.&lt;/p&gt;&lt;p&gt;&lt;em&gt;&lt;strong&gt;"It Works on My Machine!" Closing the Loop Between Development &amp;amp; Testing&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;&lt;p&gt;In this session, we will examine the traditional barriers between the developer and tester; and how Team System can help remove those walls.&lt;/p&gt;&lt;p&gt;&lt;em&gt;&lt;strong&gt;Treating Databases as First-Class Citizens in Development&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;&lt;p&gt;Team System Database Edition elevates database development to the same level as code development. See how Database Edition enables database change management, automation, comparison, and deployment.&lt;/p&gt;&lt;p&gt;&lt;em&gt;&lt;strong&gt;Architecture without Big Design Up Front &lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;&lt;p&gt;Microsoft Visual Studio Team System 2010 Architecture Edition introduces new UML designers, use cases, activity diagrams, sequence diagrams that can visualize existing code, layering to enforce dependency rules, and physical designers to visualize, analyze, and refactor your software. See how VSTS extends UML logical views into physical views of your code. 
Learn how to create relationships from these views to work items and project metrics, how to extend these designers, and how to programmatically transform models into patterns for other domains and disciplines.&lt;/p&gt;&lt;p&gt;&lt;em&gt;&lt;strong&gt;Development Best Practices &amp;amp; How Microsoft Helps &lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;&lt;p&gt;Sometimes development teams get too bogged down with the details. Take a deep breath, step back, and re-acquaint yourself with a review of current development best practice trends, including continuous integration, automation, and requirements analysis; and see how Microsoft tools map to those practices.&lt;/p&gt;&lt;p&gt;&lt;em&gt;&lt;strong&gt;"Bang for Your Buck" Getting the Most out of Team Foundation Server&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;&lt;p&gt;Today’s IT budgets are forcing teams to do as much as they can with as little as possible. Why not leverage Team Foundation Server to its full potential? In this session we’ll highlight some capabilities of TFS that you may or may not already know about to help you maximize productivity.&lt;/p&gt;&lt;p&gt;&lt;b&gt;Registration Options&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;b&gt;Event ID: &lt;/b&gt;&lt;br /&gt;1032408398&lt;/p&gt;&lt;p&gt;Register by Phone&lt;br /&gt;1-877-673-8368&lt;/p&gt;&lt;p&gt;There are other cities on the tour as well:&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Mountain View, CA April 28, 2009&lt;/strong&gt;&lt;br /&gt;Click &lt;a href="http://msevents.microsoft.com/CUI/InviteOnly.aspx?EventID=74-21-6B-AF-11-28-B7-87-29-81-DA-38-74-DD-AC-92&amp;amp;Culture=en-US" target="_blank"&gt;here&lt;/a&gt; to register with invitation code: &lt;strong&gt;80D459&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Irvine, CA April 30, 2009&lt;br /&gt;&lt;/strong&gt;Click &lt;a href="http://msevents.microsoft.com/CUI/InviteOnly.aspx?EventID=74-21-6B-AF-11-28-B7-87-DB-CC-81-EA-88-C6-4F-AB&amp;amp;Culture=en-US" target="_blank"&gt;here&lt;/a&gt; to register with invitation code: 
&lt;strong&gt;A86389&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Portland, OR May 5, 2009&lt;br /&gt;&lt;/strong&gt;Click &lt;a href="http://msevents.microsoft.com/CUI/InviteOnly.aspx?EventID=74-21-6B-AF-11-28-B7-87-09-F0-F4-7A-CA-9E-8E-67&amp;amp;Culture=en-US" target="_blank"&gt;here&lt;/a&gt; to register with invitation code: &lt;strong&gt;2DC0A9&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Phoenix, AZ May 7, 2009&lt;br /&gt;&lt;/strong&gt;Click &lt;a href="http://msevents.microsoft.com/CUI/InviteOnly.aspx?EventID=74-21-6B-AF-11-28-B7-87-D5-BB-15-AE-35-CB-41-14&amp;amp;Culture=en-US" target="_blank"&gt;here&lt;/a&gt; to register with invitation code: &lt;strong&gt;90BC47&lt;/strong&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2009/03/visual-studio-team-system-big-event.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-1594356916815715423</guid><pubDate>Wed, 11 Mar 2009 20:40:00 +0000</pubDate><atom:updated>2009-03-11T14:40:48.333-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>SharePoint</category><category domain='http://www.blogger.com/atom/ns#'>Agile Development</category><category domain='http://www.blogger.com/atom/ns#'>Visual Studio</category><title>Visual Studio Team System (VSTS) Rangers Ship New SharePoint guidance!</title><description>&lt;p&gt;Before I get into the purpose of this post, it’s probably important to define just exactly what a “VSTS Ranger” is. 
The following definition was taken from &lt;a href="http://blogs.msdn.com/willy-peter_schaub" target="_blank"&gt;Willy-Peter Schaub’s&lt;/a&gt; blog:&lt;/p&gt;  &lt;p&gt;&lt;em&gt;“Rangers are responsible for the creation of reusable “out of band” solutions for missing functionality in the TFS and VSTS family of products, striving for active community readiness knowledge sharing and are influencing VSTS.vNext … the next generation of the tools.&lt;/em&gt;”&lt;/p&gt;  &lt;p&gt;There are Core Rangers and Extended Rangers. Members of the “Extended Rangers” team do not have to be Microsoft Employees. There are a number of non-Microsoft Extended Rangers. One of the cool opportunities that exist for MVPs in Team System is the ability to become part of the Extended Rangers Team. One of the cool benefits of being an Extended Ranger is participating on high-visibility projects that ultimately help make life easier for VSTS customers.&lt;/p&gt;  &lt;p&gt;So, having said all of that (From an email I just received)….&lt;/p&gt;  &lt;p&gt;In the last couple of days, Rangers shipped important guidance packages for MOSS TFS development. For maximum reach, we have simultaneously posted to the &lt;a href="http://msdn.microsoft.com/en-us/teamsystem/default.aspx"&gt;Team System Home&lt;/a&gt; and &lt;/p&gt;  &lt;p&gt;&lt;a href="http://msdn.microsoft.com/en-us/office/cc990283.aspx"&gt;Application Lifecycle Management Resource Center for SharePoint Server&lt;/a&gt;. 
&lt;/p&gt;  &lt;p&gt;The two whitepapers are:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://go.microsoft.com/fwlink/?LinkId=141577"&gt;VSTS Rangers - SharePoint Server Custom Application Development: Document Workflow Management Project&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Read about the real-world design, construction, and deployment of a custom SharePoint Server 2007 application to a mid-market enterprise customer using Team Foundation Server as an ALM platform.&lt;/p&gt;  &lt;p&gt;and&lt;/p&gt;  &lt;p&gt;&lt;u&gt;&lt;a href="http://msdn.microsoft.com/en-us/library/cc948982.aspx"&gt;VSTS Rangers - Using Team Foundation Server to Develop Custom SharePoint Products and Technologies Applications&lt;/a&gt;&lt;/u&gt;&lt;/p&gt;  &lt;p&gt;Learn how to use TFS to support your SharePoint application development, and provide an integrated development environment and single source code repository for process activities, integrated progress reporting, and team roles.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;The first article was created during a real world customer engagement and answers dozens of frequently asked questions and how-tos in a real world context vs. theoretical discussions. The article addresses very common questions around setting up and using TFS features for a MOSS development project.&lt;/p&gt;  &lt;p&gt;Combined with the following guidance from P&amp;amp;P posted &lt;a href="http://msdn.microsoft.com/en-us/office/cc990283.aspx"&gt;here&lt;/a&gt;, we have a good and almost complete story for our customers and partners. 
The two teams worked together to align these stories.&lt;/p&gt;  &lt;p&gt;&lt;u&gt;&lt;a href="http://msdn.microsoft.com/library/dd203468.aspx"&gt;patterns &amp;amp; practices: SharePoint Guidance&lt;/a&gt;&lt;/u&gt;&lt;/p&gt;  &lt;p&gt;The SharePoint Guidance contains a sample implementation of an intranet application based on SharePoint Server 2007 that demonstrates solutions to many ALM challenges.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2009/03/visual-studio-team-system-vsts-rangers.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>1</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-3124192946915692460.post-4508685565891466054</guid><pubDate>Sun, 08 Mar 2009 02:47:00 +0000</pubDate><atom:updated>2009-03-07T19:47:56.932-07:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>General Ramblings</category><category domain='http://www.blogger.com/atom/ns#'>Agile Development</category><title>The MVP Summit and Agile Arguments</title><description>&lt;p&gt;I spent the earlier part of this week in Seattle and Redmond for the 2009 MVP Summit (I am actually somewhere in the crowd for &lt;a href="http://www.microsoft.com/video/en/us/default.aspx" target="_blank"&gt;this video&lt;/a&gt;, please don’t hold it against me…) and came away with a ton of stuff that I can’t talk about, but more importantly I came away with an understanding that the economy may be tanking, but there is a TON of opportunity out there for people in our space. I talked with several of my peers, and to a person, they are all extremely excited about the opportunities that they see. 
Hopefully the optimism will continue through the coming months, and I suspect it will.&lt;/p&gt;  &lt;p&gt;About a month ago I sent &lt;a href="http://visualstudiomagazine.com/columns/article.aspx?editorialsid=3020" target="_blank"&gt;some comments&lt;/a&gt; to &lt;a href="http://visualstudiomagazine.com" target="_blank"&gt;Visual Studio Magazine&lt;/a&gt; in reference to an article they printed on the &lt;a href="http://visualstudiomagazine.com/columns/article.aspx?editorialsid=2952" target="_blank"&gt;longevity of Agile&lt;/a&gt;. They saw fit to publish an edited version of my comments here: &lt;a title="http://visualstudiomagazine.com/columns/article.aspx?editorialsid=3020" href="http://visualstudiomagazine.com/columns/article.aspx?editorialsid=3020"&gt;http://visualstudiomagazine.com/columns/article.aspx?editorialsid=3020&lt;/a&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;(c) 2011 Ted Malone, all rights reserved&lt;/div&gt;</description><link>http://blog.sqltrainer.com/2009/03/mvp-summit-and-agile-arguments.html</link><author>noreply@blogger.com (Ted Malone)</author><thr:total>0</thr:total></item></channel></rss>