Why the Hadoop big data bubble will continue to expand in 2015

After Hortonworks' successful IPO, MapR and Cloudera are out to prove that they too can flourish on the public stage

If 2014 was the year in which large enterprises started taking Hadoop seriously enough to adopt it, then 2015 is the year in which the three main distributors - Hortonworks, Cloudera and MapR - will have to meet end user expectations and live up to the hype. And if $50bn of the big data market in 2020 will be driven by Hadoop (according to IDC), there's an awful lot of hype to live up to.

Although all three vendors are vying for the same customers, Herb Cunitz, president of Hortonworks, claimed last year that Intel's $740m investment in competitor Cloudera was positive and a "great validation of the space".

Perhaps Cunitz knew the deal would help whet investors' appetites for Hortonworks' IPO - which debuted in December 2014. The firm raised a staggering $100m and its shares rose 65 per cent to $26.38 on its first full day of trading. At the time of writing, prices had dropped to $22.95, but that's still well up on the initial $16.00.

However, while its IPO looks to have been successful, Matt Aslett, research director at 451 Research, suggested that some of the information that Hortonworks had to make public has highlighted the challenges faced by vendors in this early stage of Hadoop adoption.

"It revealed the extent to which the company is losing money on delivering the professional services to support early adopters. The company's stated goal is to drive greater revenue from subscription services and we expect to see this balance change over time," he said - although Cunitz himself suggested that this shift of balance wasn't yet on the company's radar.

Hortonworks' competitors MapR and Cloudera are also looking to announce IPOs in the near future.

"We expect to be ready to go public in late 2015," MapR CEO John Schroeder told Computing.

In comparison to Hortonworks, MapR's strategy is more reliant on proprietary software and less dependent on professional services. Schroeder will be keen to showcase this different approach when his company goes public, but he emphasised that an IPO is not an end goal for the company but rather a platform for growth.

And finally, Mike Olson, co-founder and former CEO of Cloudera, said that his firm "absolutely intends to have an IPO", but claimed that there is no date earmarked for that to happen.

"We want to be able to assure our potential public investors that we have good visibility into the future pipeline so that we can accurately forecast for a number of quarters. It's a challenge for most young companies to do that," he told Computing.

But he claimed that Cloudera is not under the sort of pressures that force many companies to go public, because of the money injected by Intel.

"We had [the fundraising] last year through our strategic partnership [with Intel], and we had a much larger fundraise than you see in most other IPOs - Hortonworks included," he stated.

For Olson, capital and the insulation of the private market are huge pluses.

"We've got the ability to invest in areas that are strategically important for the long term and that's a precious advantage in the market. But the key difference is who delivers the best platform for enterprise use cases and we're investing in the capabilities that large enterprises require," he said.

Likewise, MapR's Schroeder said that capital is not the be-all and end-all, and not a factor that is moving his firm towards an IPO.

"MapR has always been, and is, well capitalised," he said.

He said that the benefits of an IPO include liquidity for investors and employees, but added that lock-up periods and trading volumes generally limit that liquidity.

According to Cunitz, MapR and Cloudera should be thankful to Hortonworks, because its IPO has benefited the whole market by adding transparency.

"Our growth rate, our financials are out there, so any questions or fear that any other companies put out there is laid out to rest," he said.

"Perhaps there is value in going first - I would like any competitor to go side by side with us and compare them," he added.

The transparency forced on Hortonworks was certainly appreciated by Cloudera, with Olson saying his firm now has a better understanding of the market. Hortonworks' entrance into the market reinforced the fact that public market investors are keen on big data properties, he said, claiming (as all vendors do) that his company remains number one.

So why, then, does the "number one company" have fewer customers than one of its competitors?

A recent Fortune article stated that Cloudera had 300 paying customers, while Schroeder suggested that MapR has more than 700.

"To our knowledge we have the largest deployments in both financial services and retail and we have a good footprint in public cloud as well," Schroeder said.

Hortonworks claims to have had 233 customers at the end of Q3, and Cunitz said the figure would be updated on February 24 when the firm's Q4 reports are released.

But Cloudera's Olson said the Cloudera figures quoted should not be taken at face value - although he declined to disclose the firm's customer count.

"All vendors describe their customers differently; we mean currently active, subscription customers to our software offering. We don't count our services or trading customers, and as we do business with larger enterprises, we may have deals in place with different regional offices or different departments for a large multinational and we generally count those as a single customer.

"I don't think you can compare apples to apples when you see some of our competitors' figures," he stated, going on to claim that Intel's global presence has helped Cloudera attract a larger number of new customers.

Bursting the bubble

Andy Stubley, VP of marketing at big data analytics firm SysMech, argued that Hadoop distribution vendors will find themselves limited by their offerings, saying that there was no future for a tool that "can't provide real value".

But Cunitz suggested that Stubley should look at the large users of Hadoop such as eBay, Twitter, Facebook and Yahoo.

"They've built their entire infrastructure around Hadoop to store and process all of their data, while other early adopters in financial services and retail are now going from tens of nodes to thousands of nodes," he said.

Indeed, MapR, Hortonworks and Cloudera are all seeing conversations with their customers becoming more mature. Customers know more about Hadoop, and are now asking the vendors to meet their own specific requirements.

451 Research's Aslett said that users have moved from pilot projects to larger-scale deployments driven by strategic imperatives to expand data storage, processing and analytics. In turn, this is forcing Hadoop vendors to jump through more hoops as they seek to attain strategic supplier status.

Hortonworks' Cunitz concurred: "Customers are asking for advances in security, making it easier to manage the platform, the governance around that, and the ease of use around deployment and scale," he said.

The big data Nirvana

According to Quocirca analyst Clive Longbottom, the IPOs are "of little significance to the end user" at the moment. However, he believes they will provide MapR and Hortonworks with the investment money required to drive toward a "Nirvanic goal of the single data repository". Cloudera already has the funds required, he said.

However, all three firms deny that a single data repository is their main goal.

"Our solution is not going to replace everything ... some workloads will eventually go into Hadoop but it's not going to replace everything that is out there. It's about unifying and interacting in many different ways," Cunitz said.

Despite their claims to the contrary, Longbottom believes they wish to eradicate the need for other data processing and storage solutions - such as data warehouses - but he said that there are several key challenges that stand in their way. These include technical challenges such as whether performance can be optimised to a point outside of high-performance data sets, and whether Hadoop-based storage can be "good enough" for all other data sets.

They will also face challenges from the incumbents. The likes of Oracle, IBM and Microsoft will not be happy to see their lucrative database markets dependent on an open-source solution.

"It has happened to a greater extent with Linux, but Hadoop is a different beast," Longbottom suggested.

Intriguingly, Cunitz put forward the idea that companies on the sidelines of the Hadoop phenomenon could have the biggest effect on the trajectory of the market in 2015.

"What we'll see this year is a number of the larger players who have been on the sidelines [get involved]. Last year you saw Intel put their money in Cloudera, and you saw others who came with Hortonworks on Apache Hadoop.

"This year you'll see a lot of the larger players put their vote in and determine how we should build this market," he added.

However, while these peripheral players could make things interesting, it is public investors that the three firms are attempting to entice first.

MapR's Schroeder alluded to ServiceNow's increasing market capitalisation since its IPO two-and-a-half years ago and said that "great companies dramatically increase in value in the decade following an IPO". The question remains as to which one of the three companies - if any - will be considered great.

Whatever happens, the Hadoop bubble has some growing to do yet before there's any chance of it bursting.
