Serverless Big Data

It’s been a long time since I’ve blogged here, but I figured it was overdue. I’ve been busy in the Azure world and am excited to see how messaging has started to shape the modern cloud landscape. Creating and shipping Azure Event Grid was perhaps the highlight of my career thus far, certainly at Microsoft. I’ll write a long overdue blog about that shortly.

Seeing Azure Event Hubs grow to 2 trillion requests per day in the time I’ve been with it has also been a great experience. In this time not only has messaging become core to the cloud in general and to Serverless in particular, but new patterns are starting to emerge that are really exciting. One of the most exciting this month has been the traction gained by our quietly released Apache Kafka endpoint for Event Hubs and the new directions it is driving. The Kafka endpoint feature is available in some regions of Azure today, and you can give it a try by following this quick start. It allows you to use Kafka producers and consumers to write to and read from an Event Hub.
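For anyone trying the Kafka endpoint, the main client-side change is connection configuration. The sketch below is a minimal, hypothetical helper (`kafka_config_for_event_hubs` is not part of any SDK). It turns an Event Hubs namespace connection string into librdkafka-style settings: the bootstrap server on port 9093, SASL_SSL with the PLAIN mechanism, the literal username `$ConnectionString`, and the full connection string as the password. It assumes this is still how the endpoint authenticates, so check the current quick start before relying on it. The namespace and key are placeholders.

```python
from urllib.parse import urlparse


def kafka_config_for_event_hubs(connection_string: str) -> dict:
    """Build librdkafka-style settings for an Event Hubs Kafka endpoint.

    Hypothetical helper: assumes port 9093, SASL_SSL + PLAIN, and the
    "$ConnectionString" username convention used by Event Hubs.
    """
    # Split "Key=Value;Key=Value" pairs on the first '=' only,
    # because base64 shared access keys can end in '='.
    fields = dict(
        part.split("=", 1) for part in connection_string.split(";") if part
    )
    # "sb://mynamespace.servicebus.windows.net/" -> "mynamespace.servicebus.windows.net"
    host = urlparse(fields["Endpoint"]).hostname
    return {
        "bootstrap.servers": f"{host}:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # Literal username; the whole connection string is the password.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


if __name__ == "__main__":
    conn = (
        "Endpoint=sb://mynamespace.servicebus.windows.net/;"
        "SharedAccessKeyName=RootManageSharedAccessKey;"
        "SharedAccessKey=placeholderkey="
    )
    print(kafka_config_for_event_hubs(conn)["bootstrap.servers"])
```

You could then pass the resulting dict to a Kafka client that accepts librdkafka configuration, for example `confluent_kafka.Producer(config)`. The Event Hub name plays the role of the Kafka topic name.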

By combining this with the Azure Functions Event Hubs binding, you can create a Serverless Kafka processor in just a few minutes. The blog Processing 100,000 Events Per Second on Azure Functions shows how easily you can scale this Serverless processing to a pretty high scale. This means you can start to create truly Serverless Big Data solutions on Azure today. This is an exciting time, and we will see more development in this space as users drive innovation on the platforms that we and other cloud providers are starting to offer.

Give this stuff a try and tell me what you think!


Welcome to the Messaging Party, Google

This week Google announced the public release of Cloud Pub/Sub, and we on the Azure Service Bus team would like to welcome them to the cloud messaging space. This is a big and growing market with a diverse set of competitors, technologies, and strategies. We feel that Google’s decision to enter this space validates our investments in Service Bus Messaging (Queues and Topics). It also reflects the growing realization in the industry that messaging is a critical component of scalable applications and a vital part of any cloud architecture and any cloud platform.

While it may at first appear that Google Cloud Pub/Sub and Service Bus Messaging compete directly with each other, the services are quite different and each has its own strengths. More importantly, the real competition for both our services, and for the other players in the space, is not each other but direct application integration.

There has always been a tendency to wire applications directly to each other in a piecemeal, organic fashion that results in brittle, tightly coupled software. This tendency predates cloud computing and even network computing. Experienced architects know the problems that arise from this design and know to avoid it. The cloud amplifies these problems. Hopefully Google will now impress on another group of engineers and architects the importance of the well-established architectural principles of loose coupling and separation of concerns. These principles have always been at the center of messaging architecture and have been guiding points for Azure Service Bus.

Azure – The Operating System for the 21st Century

Now that I live cloud computing every day as part of the Microsoft Azure product team, I thought I’d share a few reflections on how cloud computing has evolved over the past few years. I also want to explain why I think we’ve crossed a real threshold with the technology.

I recently commented that the cloud really is the operating system of the 21st century, and I genuinely mean that. Here’s why. Think back to your Concepts of Operating Systems class, if you took one. An operating system is a piece of software that manages the hardware of a machine. Its job is to let us use the machine to do our bidding. That ranges from basic features such as facilitating I/O, storage, and computation to more complex tasks such as networking, multitasking, and job scheduling.

Over time, operating systems evolved into the very rich environments we know today. Looking through the current Azure feature set, it quickly becomes apparent that Azure has matured into a true Cloud OS: the Operating System of the 21st century. Storage and compute are some of the oldest services, and they mirror the early evolution of operating systems. Think way back to when the von Neumann architecture was a cutting-edge concept, or perhaps to the OS/360 timeframe. Personal computers followed a similar path. My Apple IIe was really just storage, compute, and I/O, while current operating systems are truly rich experiences. The cloud is on the same path. In a very short time, Azure has progressed from the cloud equivalent of DOS to a rich computing experience like nothing the world has ever known before. It includes many concepts we would recall from operating systems: a job scheduler, compute, storage, I/O, and a powerful communications bus (yes, Service Bus). The most striking part is that this really isn’t a Windows OS. It is an OS unto itself, based very much on open protocols, and any client (or even server) OS can use it.

It was a big risk for Microsoft to invest so heavily in the cloud. I appreciate that more now that I’m here and can see how all-in the company is. At first I wasn’t really sure about this, but viewed in the context of the cloud as the Operating System of the future, it makes perfect sense.

The Time Value of Data

I am doing more work than ever with the Internet of Things these days, and I’ve wanted to write on this topic for some time. A longer article is in the works for publication, but I’ll give the high-level view here. Over the last few years, my work with Smart Grid in particular and Big Data in general has made me acutely aware of a concept I have started calling the Time Value of Data. The idea comes from my interest in economics. It draws on the Time Value of Money, a concept that dates back nearly 500 years to a city in Spain that I have always enjoyed visiting.

The theory behind the time value of money is quite straightforward: money available today has a future value that differs from its current value. That is, the value of capital changes over time. In a “normal” environment, this means a given amount of money today is worth that amount plus something more in the future. This is actually a rather complex topic, but plenty has been written about it.
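In its simplest compound-interest form, the idea is FV = PV × (1 + r)^n, where r is the rate per period and n is the number of periods. Here is a minimal Python sketch of that formula and its inverse (discounting). The numbers are illustrative only.

```python
def future_value(present_value: float, rate: float, periods: int) -> float:
    """Compound a present amount forward: FV = PV * (1 + r) ** n."""
    return present_value * (1 + rate) ** periods


def present_value(future_amount: float, rate: float, periods: int) -> float:
    """Discount a future amount back to today: PV = FV / (1 + r) ** n."""
    return future_amount / (1 + rate) ** periods


if __name__ == "__main__":
    print(round(future_value(100, 0.05, 2), 2))
    print(round(present_value(110.25, 0.05, 2), 2))
```

For example, 100 invested at 5% for two periods grows to 110.25. Discounting 110.25 back over the same two periods returns 100.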

What I want to focus on here is the value of data over time. Data generally has a value curve that differs from that of most other commodities. And yes, data is a commodity, or at least is becoming one. When we think about the Internet of Things in particular (devices, appliances, sensors, and telemetry), it becomes quite apparent that some of this data will have high immediate value. A fire alarm is a great example. Knowing about a fire as it starts is extremely valuable, because it may allow for safe evacuation or containment. As time passes, the value of this information drops. Do I really care that my building had a fire several hours or days ago? Many of the sensors in use today are focused on this area of immediate value.

There is also a secondary data story: historical or collective data. This is where you save data in raw form long enough to gain value from it. Good examples are climate data, defect rates, and energy usage. As more of this data is collected over longer periods of time, its value increases dramatically. The individual data points may not be as valuable, but collectively the data set becomes even more valuable. This is depicted in the chart below (I said this was a rough draft).


As I mentioned, this is an idea I am still formalizing and will have an article about soon, so I invite any comments or contributions. Perhaps this is more of a U-shaped than a V-shaped curve, or maybe the right side doesn’t rise as high. Either way, the concept holds up fairly well when examined against real use cases.
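To make the shape concrete, here is a purely illustrative toy model, not a formalization of the idea. It assumes that the immediate value of a reading decays exponentially while the collective value of the accumulated data set grows logarithmically. Their sum traces the high, then low, then rising curve described above. The function name, both functional forms, and every parameter (`immediate`, `decay_hours`, `collective`, `scale_hours`) are hypothetical choices, in arbitrary units.

```python
import math


def value_of_data(
    hours: float,
    immediate: float = 100.0,
    decay_hours: float = 1.0,
    collective: float = 10.0,
    scale_hours: float = 24.0,
) -> float:
    """Toy 'time value of data' curve (hypothetical model, arbitrary units).

    immediate * exp(-t / decay_hours): value of acting on a fresh reading,
        such as a fire alarm, which fades quickly.
    collective * log1p(t / scale_hours): value of the growing historical
        data set, which keeps rising but with diminishing returns.
    """
    fresh = immediate * math.exp(-hours / decay_hours)
    historical = collective * math.log1p(hours / scale_hours)
    return fresh + historical


if __name__ == "__main__":
    for t in (0, 1, 5, 24, 24 * 30, 24 * 365):
        print(f"{t:>5} h: {value_of_data(t):6.1f}")
```

Tweaking the parameters changes how deep the trough is and how high the right side rises. That is one way to play with the U-versus-V question.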

More details on this and the implications will follow.

4 Reasons the “Smart” Grid is Dumb

Disclaimer – These are my opinions and mine alone. They do not represent the views of my employer or any organization I am a part of.

I work heavily with “Smart” technologies: in the energy and utilities sector, in the manufacturing sector, and in telemetry covering retail and a few other areas. Over the last few years, my work in Smart Grid has been fairly extensive. If you don’t know already, “Smart Grid” basically means advanced telemetry built into every segment of the energy grid. Strictly speaking, smart meter and smart grid are different things, though to most people they are the same. In my time implementing and consulting in this area, I’ve come to see that there are a few really dumb things about the smart grid.

1) Lack of true standards at almost every level – Technology standards are what make the world interoperable. Ever sent a text message, used the Internet, or made a call from your mobile? That’s possible because standards allow devices and equipment from many vendors to work together. In two of those examples, the standards come from the GSM Association. This interoperability is what provides long-term viability for the overall market: for the vendors, for the providers, and for the users. There are very few standards in the Smart Grid arena and almost no equipment interoperability. The really bad part is that Smart Meters aren’t that different from the rest of the Internet of Things (IoT), so they should be sharing standards with other parts of the IoT ecosystem.

2) No cloud-first implementation strategy – None of the major vendors in the area are pursuing a cloud-first strategy. From a technology standpoint, most of this twenty-first-century infrastructure is being built with late-twentieth-century architecture. There is a lot of expensive on-premises technology that would feel right at home in the late 1990s. Cloud matters for valid reasons at both ends of the utility spectrum: small and large. Small utilities need a cost-effective way to implement this technology and realize the benefits. They cannot afford expensive, highly available platforms, and their small load factors don’t require them. Yet the industry at large only offers them expensive on-premises solutions that are overkill for most. Large utilities face a different problem that a cloud-first strategy would solve: scale. A large utility will have millions of meters, each providing telemetry at intervals as short as 15 minutes. This will create a lot of data. Let’s look at an example:
5 million meters × 96 readings per day (i.e. 4 × 24) = 480,000,000 readings per day

This is just meters! Telemetry on the distribution side could be even larger, because those readings are likely to be more frequent. The result: some seriously Big Data (another blog on that shortly). The load from the meters alone works out to about 5,556 readings per second on average, 24 hours a day, 7 days a week. That number is not especially big, but these events are likely to come in huge bursts. The software and platforms being selected to handle this load are not up to the task on either the messaging (delivery) side or the data (processing / storage) side of this challenge. Many vendors believe their relational / legacy data platforms will scale just fine if you throw more hardware at them, which also lets them sell more licenses and hardware. Unfortunately, it just won’t work.
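The arithmetic above is easy to check, and it helps to separate the average rate from the peak. In the sketch below, the burst scenario is a hypothetical assumption: every meter reports within the same 60-second window after each 15-minute mark. It is not a measured figure. It only shows how far a burst can sit above the daily average.

```python
METERS = 5_000_000
READINGS_PER_DAY = 4 * 24          # one reading every 15 minutes
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400

daily_readings = METERS * READINGS_PER_DAY
average_per_second = daily_readings / SECONDS_PER_DAY

# Hypothetical burst: all meters report within the same 60-second window.
BURST_WINDOW_SECONDS = 60
burst_per_second = METERS / BURST_WINDOW_SECONDS

if __name__ == "__main__":
    print(f"{daily_readings:,} readings per day")
    print(f"{average_per_second:,.0f} readings per second on average")
    print(f"{burst_per_second:,.0f} readings per second during the assumed burst")
```

Under that assumption, the peak is roughly fifteen times the average. That gap is why burst handling, not the average rate, drives the messaging design.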

3) Lack of publish subscribe architectures – Building on issue 2, there is the very serious technical question of architecture. To be sure, we’re early in this Smart Grid game. Still, most of the solutions so far use web services at best, and sometimes just batch processing, to handle this data. This is a true travesty that I think may be the result of some insular groupthink. Even when web services are used, they often don’t incorporate WS-* standards, and they almost always rely on polling, which also doesn’t scale. The result is an archipelago of services and data, none of which build broad-scale extensibility into their design. Most of these architectures cause load and scale problems, so vendors and users fall back on batch processing. This greatly diminishes the value of Smart data. The data arrives with a long delay, which prevents it from being used in real-time processing scenarios, and those scenarios promise the greatest innovation in the arena. Ultimately, Smart systems need true publish subscribe capabilities built into their core to provide scale and extensibility. This is the only way to develop and add new components and capabilities without reengineering an expensive and possibly brittle implementation. But what sort of features and capabilities would require this architecture? Glad you asked! Perhaps things like real-time analytics for failure prediction, demand shifts, and weather patterns. As with the Internet, what will make Smart systems so successful is not so much what we have already thought of, but what we will think of once a solid platform is in place. Publish subscribe is the key to extending these platforms and unlocking their true value in the future, ideally with standardized protocols that create an open ecosystem.
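The extensibility argument is easier to see in code. Below is a minimal in-memory sketch of publish subscribe. It is not AMQP or any real broker’s API, and the `Broker` class, the `meter.readings` topic, and the 5 kWh alert threshold are all hypothetical. The point is that you add a new consumer simply by subscribing. Here, a simple alerting rule stands in for real-time analytics, and the code that publishes meter readings does not change.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]
Handler = Callable[[Message], None]


class Broker:
    """Toy in-memory publish/subscribe broker (illustration only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to receive every message published to topic."""
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, message: Message) -> None:
        # The publisher does not know who, or how many, subscribers exist.
        for handler in self._subscribers.get(topic, []):
            handler(message)


if __name__ == "__main__":
    broker = Broker()
    billing: List[Message] = []
    alerts: List[Message] = []

    # Existing consumer: billing records every reading.
    broker.subscribe("meter.readings", billing.append)

    # New capability added later: flag unusually high usage.
    def high_usage_alert(message: Message) -> None:
        if float(message["kwh"]) > 5.0:
            alerts.append(message)

    broker.subscribe("meter.readings", high_usage_alert)

    broker.publish("meter.readings", {"meter": "m-001", "kwh": 1.2})
    broker.publish("meter.readings", {"meter": "m-002", "kwh": 7.5})
    print(len(billing), alerts)
```

In a production system, the broker would be a durable, networked service speaking an open protocol such as AMQP, but the decoupling works the same way.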

4) Heavy vendor lock-in – This last point is really a culmination of all the others. Vendors build their own parts of this Smart ecosystem with little thought for the larger environment and with a short-term focus on protecting revenue. This shows up as single-vendor meter networks, closed platforms, and limited extensibility. I know we’re all in business to make money. But if the ecosystem isn’t healthy and doesn’t provide choice and competition, that money will be short-lived for “Smart”: much of the value will be difficult to realize, and innovation will slow. The technology is still early, so I think this will change as the industry matures and vendors realize that they can all have slices of a bigger pie if they embrace interoperability.

The good news is that there is hope. We are very early in the creation of the Smart ecosystem, and some participants are starting to take notice, much as mobile operators did in the past. Standards like AMQP provide wire-level interoperability for a publish subscribe architecture that is vendor agnostic and free to use. Some utilities are starting to demand support for robust open protocols. I have seen this particularly in European utilities, where I believe there may be more historic precedent for interoperability. Some members of this community are also looking beyond the utilities sector for inspiration and advice. They are turning to industries that have faced these exact challenges in the past, like telecommunications, financial services, and banking. All of these things bode well and, if embraced, will stop the Smart Grid from being so dumb. It will be interesting to see.

Apache Storm on Windows

In a February release, the Apache Storm community added Windows platform support in Storm 0.9.1.

I for one have been very excited to see this. The Hortonworks distribution of Hadoop (HDP) is the only one that runs on both Windows and Linux, and this gives traditional enterprise clients a lot more choice. I’ve been working with HDP for about a year and a half now and really like the experience, both on Linux and on Windows.

Storm is a very exciting development in real-time data processing using a Hadoop cluster. It is useful for running models that you’ve created through more traditional batch processing and MapReduce within Hadoop. Storm uses a simple topology of spouts and bolts to process tuples of information at scale and in real time. More information can be found at the Storm site:
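To illustrate the spout and bolt idea, here is a single-process Python sketch of the classic word-count topology, written with plain generators rather than Storm’s actual API. A spout emits sentence tuples, one bolt splits them into word tuples, and a second bolt counts words. Real Storm topologies run each spout and bolt as many parallel tasks across a cluster, which this sketch deliberately ignores. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def sentence_spout(sentences: Iterable[str]) -> Iterator[Tuple[str]]:
    """Spout: the stream source, emitting one tuple per sentence."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt: transforms each sentence tuple into one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.lower().split():
            yield (word,)


def count_bolt(stream: Iterable[Tuple[str]]) -> Counter:
    """Bolt: aggregates word tuples into counts."""
    counts: Counter = Counter()
    for (word,) in stream:
        counts[word] += 1
    return counts


if __name__ == "__main__":
    # Wire the topology: spout -> split bolt -> count bolt.
    topology = count_bolt(
        split_bolt(sentence_spout(["the cow jumped over the moon", "An apple a day"]))
    )
    print(topology.most_common(3))
```

Because each stage consumes and yields tuples lazily, data flows through the chain one tuple at a time rather than in a batch. That is the streaming property Storm provides at cluster scale.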

I am now wondering if this technology, now running on Windows, will make it into the Windows Azure HDInsight service.  I certainly don’t have any inside information on this, but I’d be interested to see it. 

Wayfinding, Simplicity, and Design

Looking back on the last few years and the amount of travel I’ve done, I’ve realized that the art and science of Wayfinding is an excellent tool for user experience testing, specifically for testing devices or apps. According to Wikipedia: “Wayfinding encompasses all of the ways in which people and animals orient themselves in physical space and navigate from place to place”.

I’ve begun testing this theory after long-haul flights. I have found that this is a peculiar time in human consciousness, when your normal powers of reason and logic are deeply impaired. When flying long haul, everyone experiences a certain amount of discomfort, even when travelling in style. It could be the dry recycled air, the small and heavily used lavatories, the lack of space in the back of the plane, or even the abundance of libations in the front. After an epic journey (especially a transpacific one), everyone is out of sorts. Yet we all find our way through customs and to the train or taxi we’re looking for. I recently pulled a 28-hour, 11-time-zone journey that involved four airports, three flights, and two sets of immigration. At the end I found my rental car shuttle (yes, I am an American, I rent cars), found my car, and then found my way to the hotel. Believe me, none of this is due to any special talent I have for navigation or even common sense. It is entirely due to the wayfinding design principles used throughout the world to show us where to go. This idea first came to me after reading one of Garr Reynolds’ books. I thought his presentation of this was brilliant. This is design that must work for a large number and variety of people.

This is what has led me to test my new apps and devices in this state of mind. Case in point: on this particular journey, I learned that my generic (non-model-specific) mobile phone windshield mount has a terrible design flaw with my Nokia Lumia 1020, or for that matter any Windows Phone. The camera button sits in the area where the side clamps hold the phone in place. Result: I’m looking at a live (and small) image of the nighttime road ahead of me instead of my Nokia Drive app. Fortunately, getting back to an app on Windows Phone is easy, even after a 28-hour trip (there’s some good design).

Now whenever I build an app, or my team does, I always try to bring that same level of detachment to the review. I’ve even begun to extend this to mock-ups, concepts, and presentations. Sometimes I learn where a user flow is confusing or where the next step is unclear. Since I started writing this, I have crossed the Atlantic twice. After the first flight, I learned that my presentation on Real World Business Activity Monitoring for BizTalk Summit 2014 had a rather strange sequence in it that didn’t flow well in this reduced-functionality state. I rearranged some content and dropped some that didn’t fit as well, and then it seemed strong. Thankfully, the crowd seems to have agreed!

I suppose this last part of Wayfinding is the key to it all: remove anything that is not completely necessary to convey the message or information. Anything else is waste or distraction. Next time you travel anywhere, check out the signage and notice how relatively easy it is to navigate. This is a good source of inspiration. When searching for simplicity, use that long day or that sleepless night to your advantage. Review something you’ve been thinking about too much, and you will gain a different perspective on the topic.