Apache Storm on Windows

In a February release, the Apache Storm community added Windows platform support in Storm 0.9.1.

I, for one, have been very excited to see this.  The Hortonworks distribution of Hadoop (HDP) is the only one that runs on both Windows and Linux, which gives traditional enterprise clients a lot more choice.  I’ve been working with HDP for about a year and a half now and really like the experience – both on Linux and Windows. 

Storm is a very exciting development in real-time data processing on a Hadoop cluster.  It is useful for running models that you’ve created with more traditional batch processing and MapReduce within Hadoop.  Storm uses a simple topology of spouts and bolts to process tuples of information at scale and in real time.  More information can be found at the Storm site: http://storm.incubator.apache.org/
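To make the spout and bolt idea a little more concrete, here is a minimal single-process sketch in Python.  It is not the Storm API (real topologies are typically written in Java and run in parallel across a cluster), and every name in it (sentence_spout, split_bolt, count_bolt, run_topology) is hypothetical.  It only shows the shape of the data flow: a spout emits tuples, and each bolt consumes a stream of tuples and emits or aggregates results.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def sentence_spout(lines: Iterable[str]) -> Iterator[Tuple[str]]:
    """Spout: the source of the stream; emits one tuple per input line."""
    for line in lines:
        yield (line,)


def split_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word.lower(),)


def count_bolt(stream: Iterable[Tuple[str]]) -> Dict[str, int]:
    """Bolt: consumes word tuples and keeps a running count per word."""
    counts: Counter = Counter()
    for (word,) in stream:
        counts[word] += 1
    return dict(counts)


def run_topology(lines: Iterable[str]) -> Dict[str, int]:
    """Wire the pieces together: spout -> split bolt -> count bolt."""
    return count_bolt(split_bolt(sentence_spout(lines)))


if __name__ == "__main__":
    print(run_topology(["the cow jumped", "over the moon"]))
```

In a real cluster the stream never ends and each stage runs as many parallel tasks; the chained generators here stand in for that wiring only.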

I am now wondering if this technology, now running on Windows, will make it into the Windows Azure HDInsight service.  I certainly don’t have any inside information on this, but I’d be interested to see it. 

Upcoming Speaking Engagements

This is a busy month for me.  I will be at both the GPU Technology Conference and Hadoop Summit Europe.  Both events are in the same week, with my dates on March 19th and 21st respectively, which will make for fun travel.  Both promise to be amazing conferences with a lot of knowledge sharing, and I am honored to be a part of each. 

Being from the Microsoft camp as I am, both of my sessions will approach these technologies from a Microsoft standpoint.  In the case of GTC this will be using .NET to write CUDA (GPU) applications.  For Hadoop this will be using Hadoop within the Microsoft ecosystem (which, if you have not noticed, is a very large ecosystem). 

I’m very excited for both of these events and eagerly looking forward to them and the discussions and learning that accompany both. 

Video from Service Technology Symposium

In September I spoke at the Service Technology Symposium (http://www.servicetechsymposium.com/) in London, which was a terrific event.  I met some great people and got to sit in on a lot of really interesting sessions.  I cannot say enough about this conference because it really was awesome.  I’d definitely recommend it and will certainly attend this year (I’d love to be a presenter again if possible).  They did everything well: quality sessions, great venue, great schedule and tempo.  They even had good food (I miss having a tea break in the afternoon). 

They also had great AV and production, which is evident in the videos they’re posting from the conference.  My session has just been posted and is available at: http://www.infoq.com/presentations/HPC-Cloud

I hope you enjoy it, and I would appreciate any feedback.  Also, I tuned my GPU code a little bit and got even BETTER performance out of it.

HDInsight – Hive part 1

I decided I’d write a little about my first experiences with HDInsight over the last few weeks.  There is a great getting started guide located here (and samples that come with the software): http://gettingstarted.hadooponazure.com/gettingStarted.html

So I was excited to play with my Windows Hadoop cluster (HDInsight) and thought I’d break out some of the samples for it.  The good news is they all worked! 

After loading some data I decided to browse the example for running a Hive job.  I then decided to go through some of the steps to work with Hive without using the pre-packaged script.  Here’s where things got interesting.  The file create.hql shows the query for creating and populating the Hive table.  It’s shown below:

drop table w3c;
create table w3c(
 logdate string,
 logtime string,
 c_ip string,
 cs_username string,
 s_ip string,
 s_port string,
 cs_method string,
 cs_uri_stem string,
 cs_uri_query string,
 sc_status int,
 sc_bytes int,
 cs_bytes int,
 time_taken int,
 cs_agent string,
 cs_Referrer string)
row format delimited
fields terminated by ' ';

Run this command and you will receive the following:

The syntax of the command is incorrect.


I messed around with this a little bit at this point and then decided to give Bing a turn with it using the following search query: “HDInsight Hive The syntax of the command is incorrect”.  To Bing’s credit (and yes, it is my normal search engine – I like the pictures) the second result was http://gettingstarted.hadooponazure.com/releaseNotes.html which contained

• Hive Console

  • If a newline is included in the Hive command submitted, you will get a “syntax error.” Remove newlines and the query should execute as intended.

While writing this I tried with Google and it wasn’t even on the first page!  Take that!

So now I had my answer and could move forward.  I ran the commands one at a time, because I wanted to examine each step more closely, and I was able to stand up and use the queries so long as each one was on a single line.  Multi-line queries seem to be OK when they’re submitted as jobs. 
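If you would rather not flatten each query by hand, a small helper can do it.  The following is a minimal sketch in Python, assuming the script separates statements with semicolons and has no semicolons or line breaks inside quoted strings; the function name to_single_line_statements is my own and is not part of HDInsight or Hive.  It splits a script into statements and replaces each line break (plus the whitespace around it) with a single space, so a quoted delimiter such as ' ' is left untouched.

```python
import re
from typing import List


def to_single_line_statements(hql: str) -> List[str]:
    """Split an HQL script on ';' and put each statement on one line.

    Assumption: no ';' and no line breaks appear inside quoted strings.
    """
    statements = []
    for raw in hql.split(";"):
        # Replace each line break, and the whitespace around it, with one space.
        flat = re.sub(r"\s*\n\s*", " ", raw).strip()
        if flat:
            statements.append(flat + ";")
    return statements


if __name__ == "__main__":
    # Abridged version of the table definition, for illustration only.
    script = """drop table w3c;
create table w3c(
 logdate string,
 logtime string)
row format delimited
fields terminated by ' ';
"""
    for statement in to_single_line_statements(script):
        print(statement)
```

Each printed statement can then be pasted into the Hive console one at a time.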

Now I decided I would try my hand at doing some of my own work, which is the subject of the next post.