Big Data Summit 2014: Intel to focus Hadoop efforts on Cloudera after buying major share stake

Intel's big data initiatives to remain open source while the company focuses on Cloudera's distribution of Hadoop

Chip giant Intel has taken a major share stake in Hadoop distributor Cloudera for an undisclosed sum, and taken a seat on the company's board too, as big data continues to spread in popularity.

At the same time, Intel will focus its big data initiatives on Cloudera's distribution of Hadoop in preference to its own, and will help optimise Cloudera's distribution for Intel's chip architectures.

While the size of the investment was undisclosed, Intel did reveal that it will become the company's largest shareholder.

The "strategic partnership" announcement was made at 9am Pacific time today - 4pm in London.

"Going forward we will be promoting Cloudera as our primary distribution of Hadoop. They will be optimising it for our Intel architecture and our engineers will stay and still contribute to the open source project - that's very important for us," Richard Pilling, EMEA chief architect of big data and analytics at Intel, told Computing at the third Computing Big Data Summit in London today.

"We had our own distribution, but we wanted to gain marketing traction quicker than we were able to organically," added Pilling. "Cloudera is the market leader with one of the best distributions out there. And we can work with them very easily. The two businesses align very well in terms of what we are trying to do: enterprise-ready adoption on a mass scale."

The differences between Hadoop distributions are typically small - most are based on the Apache open source core (the trunk) - and each company uses slightly different versions of the packages, adding its own enhancements. Hence, the differences between Hortonworks and Cloudera, for example, aren't great.

"We want to accelerate the adoption of these technologies so working with the market leader was the best way to achieve that," said Pilling.

What Intel brings to the market is the Xeon Phi microprocessor, a many-core system that runs in a similar manner to a graphics processing unit (GPU).

"They are slower than a Xeon core but there are many more of them. So if you have highly parallel code, for example HPC [high-performance computing] applications, you can use them.

"The big advantage of using Xeon Phi is that you can run standard x86 instructions on them so you don't have to re-program all of your code and the compilers will automatically place the code on the correct part - single-threaded stuff on the Xeon and multi-threaded stuff on the Xeon Phi," said Pilling.

Intel has led a number of open source initiatives around big data, which will continue. These include the acceleration of encryption and decryption, and compression and decompression, on the hardware, which are in the Intel distribution and will be pushed through into the Cloudera distribution.

Intel has also led the open source development of a security framework. "So all of the enhancements that we have brought to our platform we will bring to Cloudera," said Pilling.

"But the enhancements we've made in open source will remain in open source, that's a very important commitment from Intel as it helps to drive adoption.

"When we develop them and push them into the open source 'trunk' they get picked up by the main Apache trunk. The majority of our projects [in big data] are open source, including Rhino, our security framework; Ladon is our disaster recovery for HDFS [Hadoop Distributed File System]; and Griffin, which is our SQL accelerator on top of Hadoop," said Pilling.

Intel is active in the big data market as it sees big data as a major driver of growth for its microprocessors, solid-state disk systems, and fast-networking products.