The Time Value of Data

I am doing more work than ever with the Internet of Things these days and I’ve wanted to write on this topic for some time. A larger article is in the works for publication, but I’ll give the high-level view here. Over the last few years my work with Smart Grid in particular and Big Data in general has made me acutely aware of a concept I have started calling the Time Value of Data. The idea grew out of my interest in economics and draws on the Time Value of Money, a concept that dates back nearly 500 years to a city in Spain I have always enjoyed visiting.

The theory behind the time value of money is quite straightforward: money today has a future value that is different from its current value. That is, capital has a value that changes over time; in a “normal” environment this means some amount of money today is worth that amount plus some more in the future. This is actually a rather complex topic, but plenty has been written about it.
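For the unfamiliar, the usual formulation is FV = PV × (1 + r)^n: a present value PV invested at a rate r for n periods grows to a future value FV. As a quick worked example, $100 invested at 5% is worth $105 one year from now, and roughly $128 after five years.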

What I want to focus on here is the value of data over time. Data generally has a unique value curve that is different from most other commodities – and yes, data is a commodity (or at least is becoming one). When we think about the Internet of Things in particular – devices, appliances, sensors, and telemetry – it becomes quite apparent that some of this data is going to have high immediate value. A fire alarm is a great example. Knowing about a fire is extremely valuable as it starts, allowing for safe evacuation or even containment. As time passes, the value of this information drops. Do I really care that my building had a fire several hours or days ago? Many of the sensors in use today are focused on this immediate-value area.

There is also a secondary data story: historical or collective data. This is where you save data in a raw form long enough to gain value from it. Good examples are climate data, defect rates, and energy usage. As more of this data is collected over longer periods, its value increases dramatically: although any individual data point may not be worth much on its own, the data set as a whole becomes ever more valuable. This is depicted in the chart below (I said this was a rough draft).

[Chart: the Time Value of Data curve]

As I mentioned, this is an idea I am still formalizing and will have an article about soon – so I invite any comments or contributions on it. Perhaps this is more of a U-shaped than a V-shaped curve, or maybe the right side doesn’t rise as high, but the concept is fairly robust when examined against real use cases.

More details on this and the implications will follow.


Apache Storm on Windows

In a February release, the Apache Storm community added Windows platform support with Storm 0.9.1.

I for one have been very excited to see this. The Hortonworks distribution of Hadoop (HDP) is the only one that runs on both Windows and Linux, which gives traditional enterprise clients a lot more choice. I’ve been working with HDP for about a year and a half now and really like the experience – both on Linux and Windows.

Storm is a very exciting development in real-time data processing on a Hadoop cluster. It is useful for running models that you’ve created through more traditional batch processing and MapReduce within Hadoop. Storm uses a simple spout-and-bolt topology for processing tuples of information at scale and in real time, as sketched below. More information can be found at the Storm site: http://storm.incubator.apache.org/
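To make the spout-and-bolt model concrete, here is a minimal sketch of a topology written against the 0.9.x Java API. The SensorSpout and AlertBolt classes are placeholders I made up for illustration – a real topology would pull tuples from an actual source such as a message queue or sensor feed.

```java
// Minimal Storm topology sketch (Storm 0.9.x, backtype.storm API).
// SensorSpout and AlertBolt are illustrative placeholders, not real components.
import java.util.Map;
import java.util.Random;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class SensorTopology {

    // Spout: the data source; here it simply emits random "temperature" readings.
    public static class SensorSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final Random random = new Random();

        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        public void nextTuple() {
            collector.emit(new Values(random.nextInt(150)));
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("temperature"));
        }
    }

    // Bolt: processes each tuple as it arrives and flags high readings.
    public static class AlertBolt extends BaseBasicBolt {
        public void execute(Tuple input, BasicOutputCollector collector) {
            int temperature = input.getIntegerByField("temperature");
            if (temperature > 120) {
                System.out.println("ALERT: temperature " + temperature);
            }
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Nothing downstream of this bolt, so no fields to declare.
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sensor-spout", new SensorSpout(), 1);
        builder.setBolt("alert-bolt", new AlertBolt(), 2)
               .shuffleGrouping("sensor-spout");

        Config conf = new Config();

        // Run in-process for development, then shut down after a short while.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("sensor-topology", conf, builder.createTopology());
        Thread.sleep(10000);
        cluster.shutdown();
    }
}
```

Running in a LocalCluster keeps the sketch self-contained for development; a production topology would be packaged and submitted to the cluster with StormSubmitter instead.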

I am now wondering whether this technology, running on Windows, will make it into the Windows Azure HDInsight service. I certainly don’t have any inside information on this, but I’d be interested to see it.