HDInsight – Hive part 1

I decided I’d write a little about my first experiences with HDInsight over the last few weeks.  There is a great getting started guide located here (and samples that come with the software): http://gettingstarted.hadooponazure.com/gettingStarted.html

So I was excited to play with my Windows Hadoop cluster (HDInsight) and thought I’d break out some of the samples for it.  The good news is they all worked! 

After loading some data I decided to browse the example for running a Hive job.  I then decided to go through some of the steps to work with Hive by hand, without using the pre-packaged script.  Here’s where things got interesting.  The sample’s create.hql holds the statements for creating and populating the Hive table.  It’s shown below:

drop table w3c;
CREATE TABLE w3c(
 logdate string,
 logtime string,
 c_ip string,
 cs_username string,
 s_ip string,
 s_port string,
 cs_method string,
 cs_uri_stem string,
 cs_uri_query string,
 sc_status int,
 sc_bytes int,
 cs_bytes int,
 time_taken int,
 cs_agent string,
 cs_Referrer string)
row format delimited
fields terminated by ' '
;
LOAD DATA INPATH '${hiveconf:input}' OVERWRITE INTO TABLE w3c

Paste this into the Hive console as written, newlines and all, and you will receive the following:

The syntax of the command is incorrect.

Doh!

I messed around with this a little bit at this point and then decided to give Bing a turn with it using the following search query: “HDInsight Hive The syntax of the command is incorrect”.  To Bing’s credit (and yes, it is my normal search engine – I like the pictures) the second result was http://gettingstarted.hadooponazure.com/releaseNotes.html which contained

•Hive Console

•If a newline is included in the Hive command submitted, you will get a “syntax error.” Remove newlines and the query should execute as intended.

While writing this I tried with Google and it wasn’t even on the first page!  Take that!

So now I had my answer and could move forward.  I ran the statements one at a time, because I wanted to examine each step more closely, and they worked as long as each one was on a single line.  When the same script is submitted as a job instead of being typed into the console, the newlines seem to be OK.
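If you’d rather not join the lines by hand, the workaround is easy to script.  Below is a rough sketch in Python, not something that ships with the samples: flatten_hql is a name I made up, and the sample script is a shortened version of create.hql.  It splits a script on semicolons and squeezes each statement onto one line, leaving quoted text such as the ' ' delimiter alone.  It assumes the script has no comments and no escaped quotes.

```python
def flatten_hql(script: str) -> list[str]:
    """Split an HQL script on ';' and return each statement on a single line.

    Whitespace (including newlines) outside quotes collapses to one space.
    Text inside single or double quotes is copied unchanged.
    Assumption: no comments and no escaped quotes in the script.
    """
    statements = []
    current = []   # characters of the statement being built
    quote = None   # the quote character we are inside, or None

    for ch in script:
        if quote:
            current.append(ch)          # keep quoted text verbatim
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            statements.append("".join(current))
            current = []
        elif ch.isspace():
            # Collapse runs of whitespace and drop leading whitespace.
            if current and current[-1] != " ":
                current.append(" ")
        else:
            current.append(ch)

    statements.append("".join(current))  # last statement may lack a ';'
    return [s.strip() + ";" for s in statements if s.strip()]


# Shortened, hypothetical stand-in for the sample's create.hql.
sample = """drop table w3c;
CREATE TABLE w3c(
 logdate string,
 sc_status int)
row format delimited
fields terminated by ' '
;
LOAD DATA INPATH '${hiveconf:input}' OVERWRITE INTO TABLE w3c
"""

statements = flatten_hql(sample)
for statement in statements:
    print(statement)
```

Each printed line can then be pasted into the Hive console one at a time.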

Now I decided I would try my hand at doing some of my own work, which is the subject of the next post. 

About danrosanova
I am a Principal Program Manager for Messaging at Microsoft and product owner for Azure Messaging: Service Bus, Relay, and Event Hubs. I have a long history in distributed computing on a variety of platforms and have focused on large scale messaging and middleware implementations from inception to implementation. I was a five time Microsoft MVP before joining Microsoft and author of the book Microsoft BizTalk Server 2010 Patterns.