Why the Hadoop distribution vendor bubble may just be set to burst
The Hadoop big data bubble may be set to expand in 2015 - but prospects are limited for Hadoop distributors, says SysMech's Andy Stubley
Last month's Hortonworks IPO was very successful, and on the back of a very solid 2014, MapR looks similarly positioned for an IPO, either this year or next. However, we're going to see limitations with these Hadoop distribution vendors. Hortonworks and MapR aren't Twitter or Facebook - there's no application.
Open source ultimately has to have some facet of a commercial offering in its roadmap. It's the difference between free and a penny: you need a business case if you're going to charge a penny.
When I suggested in last month's Computing feature that Hadoop distribution vendors will find themselves limited by their offerings, because there is no future in solutions that can't provide real value, Herb Cunitz, president of Hortonworks, commented that I should look at large users of Hadoop such as eBay, Twitter, Facebook and Yahoo.
Firstly, the article was about Hadoop distributors and their scale-out, not large users of Hadoop. And secondly, while I do understand these companies have built their infrastructures using Hadoop to store and process much of their customers' data, they've also been the organisations that actually developed much of the Hadoop technology themselves, for their own specific needs - i.e. they're the Web 2.0 monsters with millions of online customers and petabytes of web data.
Hadoop is synonymous with big data - everyone can use it, for free - but a major technical delivery project is required to apply it to any real technical innovation. For a start, where are the real-time analytics? And what's big data without analytics and timely information? Most organisations require current information and quick responses.
Hadoop is certainly useful big data storage, but it isn't a big data application, and it can't address low-latency needs. There's a limited future for something with a small range of extreme use cases, something that can't provide real value to the wider community. What I'd like to see are examples of commercial customers and, specifically, applications running on Hadoop that represent the use cases of typical companies and systems.
There are many successful implementations of Hadoop; of this there is no question. But market analysis will tell you that both Facebook and Twitter use HP/Vertica; they don't run in a Hadoop-only world. Successful implementations with real-time data requirements, or timely report and query responses, only live in co-existence with high-volume, high-speed commercial applications. These types of systems capture data in real time; they process tens of billions of counters per day and tens of terabytes of data, and provide complex analytic capabilities to hundreds of users. That's real-time complex analytics in action, a big data application, and a real use case that Hadoop distribution players can't reflect.
Today, we're generating ever-increasing volumes of data - that's clear. For example, in the telco world, the more customer-experience-centric we become, the more touch-points we need - data from devices, every network element, customer data and social networks - to create a holistic view. An end-to-end operational platform needs to support the network, the service and the customer, so realistically you need a hybrid approach to data management in order to combine all those sources. Sure, a commercial data management application may not operate at petabyte volumes yet - it wouldn't be cost-effective currently - but then it doesn't need to! Horses for courses, then, or a hybrid solution when you need both; we're a long way from Hadoop being the panacea.
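To make the hybrid point concrete, here is a minimal, purely illustrative sketch - not any vendor's product, and the store and file names are placeholders I've assumed for the example. Incoming network counters are aggregated on a hot path for timely operational queries, while the raw events are appended to a batch archive, the role bulk Hadoop storage plays well, for later large-scale analysis.

```python
import json
import time
from collections import defaultdict

# Hot path: in-memory aggregates standing in for a low-latency analytics store.
fast_store = defaultdict(lambda: {"count": 0, "total": 0.0})

# Cold path: an append-only file standing in for an HDFS/Hadoop archive (hypothetical path).
ARCHIVE_PATH = "raw_counters.jsonl"

def ingest(event):
    """Route one counter event down both paths of a hybrid platform."""
    # 1. Real time: update aggregates so operations staff can query them now.
    key = (event["element"], event["counter"])
    agg = fast_store[key]
    agg["count"] += 1
    agg["total"] += event["value"]

    # 2. Batch: archive the raw event for later, large-scale reprocessing.
    with open(ARCHIVE_PATH, "a") as archive:
        archive.write(json.dumps(event) + "\n")

if __name__ == "__main__":
    # A couple of sample network counter events (illustrative values only).
    ingest({"element": "cell-042", "counter": "dropped_calls",
            "value": 3, "ts": time.time()})
    ingest({"element": "cell-042", "counter": "dropped_calls",
            "value": 1, "ts": time.time()})

    # A timely answer comes from the fast store; the archive feeds batch jobs.
    print("dropped_calls on cell-042:", fast_store[("cell-042", "dropped_calls")])
```

The point of the split is that neither path alone does the whole job: the fast store answers "what is happening now", while the archive keeps everything for the heavy, after-the-fact analysis that Hadoop handles cheaply.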
Ultimately, any system is only useful if it addresses a business problem and runs a business process. There are two ways of getting there: you either develop your own application or you buy one. 'Make or buy' is hardly new, but in the Hadoop world applications are thin on the ground and application vendors aren't beating a path to Hadoop distributors. Remember, it was applications and ISVs running on its database that made Oracle the company it is, not the fact that it had a great database.
One closing thought: structured or unstructured? Can you make a decision on unstructured data? For me, there are plenty of reasons why Hadoop distribution vendors may not have the future they seem to make out.
Andy Stubley is head of strategy at big data application provider SysMech