Submitting HDInsight Jobs From An Azure Webjob or WorkerRole Using the C# Hadoop SDK

August 22 2014

All the samples for submitting jobs programmatically to HDInsight assume that you are doing so from a desktop workstation that has been set up with a management certificate. The code gets your cert out of the cert store and creates a JobSubmissionCertificateCredential, like so:

// Get the certificate object from certificate store using the friendly name to identify it
X509Store store = new X509Store();
store.Open(OpenFlags.ReadOnly);
X509Certificate2 cert = store.Certificates.Cast<X509Certificate2>().First(item => item.FriendlyName == certFriendlyName);
JobSubmissionCertificateCredential creds = new JobSubmissionCertificateCredential(new Guid(subscriptionID), cert, clusterName);
// Submit the Hive job
var jobClient = JobSubmissionClientFactory.Connect(creds);
JobCreationResults jobResults = jobClient.CreateHiveJob(hiveJobDefinition);

This is all well and good, but what if you need to submit jobs programmatically from, say, an Azure WebJob or a worker role?

I solved this by generating my own management cert with a private key and deploying it along with the .exe, placing the cert in the bin folder next to the .exe. Here's the code to generate the cert (tip of the hat to David Hardin's post):

makecert -r -pe -a sha1 -n "CN=Windows Azure Authentication Certificate" -ss my -len 2048 -sp "Microsoft Enhanced RSA and AES Cryptographic Provider" -sy 24 -sv ManagementApiCert.pvk ManagementApiCert.cer
pvk2pfx -pvk ManagementApiCert.pvk -spc ManagementApiCert.cer -pfx ManagementApiCert.pfx -po password

Then, after uploading the .cer to the Azure Management Certificate store (see here for doing that) and adding the .pfx to your project (be sure to set its Copy to Output Directory property so it ends up in the bin folder), you can use the following code to create a JobSubmissionCertificateCredential:

var cert = new X509Certificate2("ManagementApiCert.pfx", "your_password", X509KeyStorageFlags.MachineKeySet);
JobSubmissionCertificateCredential creds = new JobSubmissionCertificateCredential(new Guid(subscriptionID), cert, clusterName);

Tip of the hat to Tyler Doerksen, whose post led me to set the MachineKeySet flag.
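A variation worth considering, sketched here under a couple of assumptions: a relative path like "ManagementApiCert.pfx" is resolved against the process's current working directory, which is not necessarily the folder that holds the .exe, and a hardcoded password ends up in source control. This version builds the path from AppDomain.CurrentDomain.BaseDirectory and reads the password from an environment variable. The ManagementCertLoader class and the MGMT_CERT_PASSWORD variable name are hypothetical; for a WebJob, one option is an app setting, which Azure Websites surfaces to the process as an environment variable.

```csharp
using System;
using System.IO;
using System.Security.Cryptography.X509Certificates;

public static class ManagementCertLoader
{
    // Resolve the .pfx relative to the folder containing the running assembly,
    // rather than the current working directory.
    public static string ResolvePfxPath(string fileName) =>
        Path.Combine(AppDomain.CurrentDomain.BaseDirectory, fileName);

    // Load the cert into the machine key store (same flag as in the post).
    // MGMT_CERT_PASSWORD is a hypothetical variable name.
    public static X509Certificate2 Load(string fileName, string passwordVariable = "MGMT_CERT_PASSWORD")
    {
        string password = Environment.GetEnvironmentVariable(passwordVariable)
            ?? throw new InvalidOperationException($"Environment variable {passwordVariable} is not set.");
        return new X509Certificate2(ResolvePfxPath(fileName), password, X509KeyStorageFlags.MachineKeySet);
    }

    public static void Main()
    {
        // Show where the loader expects to find the certificate file.
        Console.WriteLine(ResolvePfxPath("ManagementApiCert.pfx"));
    }
}
```

The resulting X509Certificate2 can then be passed to the JobSubmissionCertificateCredential constructor exactly as in the snippet above.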

And there you go: the ability to submit Hadoop jobs programmatically from a WebJob or WorkerRole.

JAR Files Added To Hive Queries In HDInsight Can't Be At The Root Of The WASB Container

August 13 2014

Just discovered that if you want to add a JAR file to an HQL statement, the JAR file can’t be at the root of your container. It has to be in a virtual directory. So, for example, this code will not work:

ADD JAR wasb:///csv-serde-1.1.2-0.11.0-all.jar;

But, this code will:

ADD JAR wasb:///user/hdp/share/lib/hive/csv-serde-1.1.2-0.11.0-all.jar;

And, annoyingly, the blob storage browser in Visual Studio doesn't let you create directories, so you'll need to download something like ClumsyLeaf CloudXplorer.

HDInsight Hadoop Hive Job Decompresses CSV GZIP Files By Default

August 8 2014

I've been working with Hadoop (2.4.0) and Hive (0.13.0) on HDInsight (3.1), and Hive transparently decompresses GZIP-compressed CSV files by default. Nice! So, loading data with a Hive query in PowerShell:

$response = Invoke-Hive -Query @"
LOAD DATA INPATH 'wasb://$container@$storageAccountName.blob.core.windows.net/file.csv.gz'
INTO TABLE logs;
"@

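No additional work or arguments needed. I thought I had to do something like what's described in this post, setting io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec, but apparently not. As far as I can tell, that's because Hadoop picks the decompression codec from the file extension, and .gz maps to GzipCodec out of the box.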

UPDATE: Just found this link: https://cwiki.apache.org/confluence/display/Hive/CompressedStorage which covers keeping compressed data in Hive and recommends storing it as a SequenceFile, since a GZIP file can't be split across multiple mappers.

“Why We Need The Indie Web” by Tantek Celik

June 25 2014

On the open web…

Null Value When Calling CloudQueue ApproximateMessageCount With Windows Azure Storage Client

June 25 2014

I was recently perplexed about why the nullable ApproximateMessageCount property on CloudQueue was always null. Then I discovered that you have to call FetchAttributes() before accessing this property. Solved!
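A minimal sketch of the fix, assuming the WindowsAzure.Storage client library of that era, which still exposes synchronous methods such as FetchAttributes() and CreateIfNotExists(). The STORAGE_CONNECTION environment variable and the "sample-queue" name are placeholders; the "UseDevelopmentStorage=true" fallback points at the local storage emulator.

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

public static class QueueLength
{
    // ApproximateMessageCount stays null until FetchAttributes() has retrieved
    // the queue's metadata from the service; "?? 0" guards against null anyway.
    public static int GetApproximateCount(CloudQueue queue)
    {
        queue.FetchAttributes();
        return queue.ApproximateMessageCount ?? 0;
    }

    public static void Main()
    {
        // Placeholder connection string source; falls back to the storage emulator.
        string connection = Environment.GetEnvironmentVariable("STORAGE_CONNECTION")
            ?? "UseDevelopmentStorage=true";

        CloudQueue queue = CloudStorageAccount.Parse(connection)
            .CreateCloudQueueClient()
            .GetQueueReference("sample-queue"); // hypothetical queue name
        queue.CreateIfNotExists();

        Console.WriteLine("Approximately {0} message(s) in the queue", GetApproximateCount(queue));
    }
}
```

As the property name says, the value is approximate, so it's better suited to monitoring or scaling decisions than to exact bookkeeping.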

Microsoft Azure and Visual Studio Online Server Builds – Tips & Tricks

April 22 2014

I’m all about deploying web sites and cloud services via server builds. Say goodbye to deployments from a developer’s box that can’t be reproduced on another box. Say goodbye to deploying code that isn’t checked in. Say goodbye to deployments that aren’t fully tested.  It is indeed super cool.

So how to get it set up? There are some good tutorials out there, but it can get a little tricky. Here’s how I did it.

First, make sure that you’ve linked your Azure account to your Visual Studio Online repository as explained in Step 3 here: http://azure.microsoft.com/en-us/documentation/articles/cloud-services-continuous-delivery-use-vso/

Then, if you open VS from the Azure website, it will generate a build definition for you named after your Azure deployment with _CD appended. You'll have to tweak some things to get it working, though.

First, go to the Build section of the Team Explorer window.

Then, right-click the build definition and click Edit Build Definition.

There’s a bunch of things you will need to change.

On the General tab, make sure you enable the build definition; it is disabled by default.

On the Source Settings tab, make sure it points to the right repository.

On the Trigger tab, you may want to tweak when deployments happen.

On the Process tab, make sure you point the Project setting to the right .sln to build, and set the Configuration you want, e.g. Release | Any CPU.

And in the deployment settings, make sure that Path To Deployment Settings points to your .pubxml file and Windows Azure Deployment Environment points to the name of the Cloud Service in Windows Azure.

With that all set, you can now build and deploy using server builds!

The New Iteration

April 7 2014

Having just gotten back from Build 2014, I felt inspired by conversations and sessions that were all about XAML and developer/designer workflow, and I started thinking about a paper Jaime Rodriguez and I wrote six years ago called The New Iteration: How XAML Transforms The Collaboration Between Designers and Developers. I went to re-read it and, ack, I got a 404!

Well, that’s not okay, so here it is in all its glory. Still a lot of great content in there methinks.

Instagram Security Check Error

March 12 2014

If you're trying to verify your cell phone with an Instagram security code and it keeps erroring out, here's the fix: enter your mobile number in international format. So, if you are in the US, it would look like +1614-985-4045, for example. Once I entered it that way, everything worked!

Bruce Sterling's "Black Swan"

December 31 2013

What is the black swan? Check out the short story "Black Swan" by the venerable Bruce Sterling from his collection Gothic High Tech.

Yet the news never shouts out that history has black swans. The news never tells us that our universe is contingent, that our fate hinges on changes too huge for us to comprehend, or too small for us to see. We can never accept the black swan's arbitrary carelessness. So our news is never about how the news can make no sense to human beings. Our news is always about how well we understand.

Whenever our wits are shattered by the impossible, we swiftly knit the world back together again, so that our wits can return to us. We pretend that we've lost nothing, not one single illusion. Especially, certainly, we never lose our minds. No matter how strange the news is, we're always sane and sensible. That is what we tell each other.

..."You've got a look on your face right now like a drowned fish."

Adding A Custom Header When Posting JSON Using HttpClient

November 8 2013

UPDATE: See the comments for a better way to do this!


HttpClient comes with handy PostAsJsonAsync extension methods for the very common task of posting JSON to a web service. They serialize your object and craft the HTTP request for you, e.g.:

var gizmo = new Product() { Name = "Gizmo", Price = 100, Category = "Widget" };
Uri gizmoUri = null;

HttpResponseMessage response = client.PostAsJsonAsync("api/products", gizmo).Result;
if (response.IsSuccessStatusCode)
{
    // The service returns the new resource's URI in the Location header
    gizmoUri = response.Headers.Location;
}
else
{
    Console.WriteLine("{0} ({1})", (int)response.StatusCode, response.ReasonPhrase);
}

However, what if you need to add a custom header to just this request? (HttpClient.DefaultRequestHeaders would add it to every request the client sends.) For a single request, you need to craft an HttpRequestMessage, and the nifty PostAsJsonAsync won't take an HttpRequestMessage as a parameter; you have to use the SendAsync method. No sweat; you have to write a little more code, but it's no big deal. Here's what it looks like:

var gizmo = new Product() { Name = "Gizmo", Price = 100, Category = "Widget" };

// Serialize the object to JSON, as PostAsJsonAsync would
MediaTypeFormatter jsonFormatter = new JsonMediaTypeFormatter();
HttpContent content = new ObjectContent<Product>(gizmo, jsonFormatter);

// Build the request by hand so the custom header can be attached
var request = new HttpRequestMessage()
{
    // Relative URI, resolved against _httpClient.BaseAddress (new Uri("api/products") alone would throw)
    RequestUri = new Uri("api/products", UriKind.Relative),
    Method = HttpMethod.Post,
    Content = content
};
request.Headers.Add("My-Special-Header", "xx-oo-xx-oo");

var response = _httpClient.SendAsync(request).Result;
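If you'd rather not depend on the System.Net.Http.Formatting package (home of JsonMediaTypeFormatter and PostAsJsonAsync), here's a hedged sketch of the same idea on .NET Core 3.0 or later using StringContent and System.Text.Json. It also plugs in a small test handler, CapturingHandler (hypothetical, not part of any library), so you can confirm the header is actually sent without calling a real service. The Product shape and header value come from the example above; the localhost base address is a placeholder.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

public class Product
{
    public string Name { get; set; }
    public decimal Price { get; set; }
    public string Category { get; set; }
}

// Test double: records the outgoing request and replies 201 Created instead of using the network.
public class CapturingHandler : HttpMessageHandler
{
    public HttpRequestMessage LastRequest { get; private set; }
    public string LastBody { get; private set; }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        LastRequest = request;
        LastBody = await request.Content.ReadAsStringAsync();
        return new HttpResponseMessage(HttpStatusCode.Created);
    }
}

public static class CustomHeaderPost
{
    // Builds a JSON POST carrying an extra header on this request only.
    public static HttpRequestMessage BuildRequest(Product product)
    {
        string json = JsonSerializer.Serialize(product);
        var request = new HttpRequestMessage(HttpMethod.Post, "api/products")
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
        request.Headers.Add("My-Special-Header", "xx-oo-xx-oo");
        return request;
    }

    public static async Task Main()
    {
        var handler = new CapturingHandler();
        using var client = new HttpClient(handler) { BaseAddress = new Uri("http://localhost/") };

        var gizmo = new Product { Name = "Gizmo", Price = 100, Category = "Widget" };
        HttpResponseMessage response = await client.SendAsync(BuildRequest(gizmo));

        // HttpClient resolves the relative URI against BaseAddress before the handler sees it.
        Console.WriteLine($"{(int)response.StatusCode} {handler.LastRequest.RequestUri}");
        Console.WriteLine(handler.LastBody);
    }
}
```

In production you would pass a normal HttpClient (no custom handler); the handler here only exists to make the request inspectable.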