Swedish online music streaming service Spotify is to work with Hadoop distributor Hortonworks as it looks to further personalise its services and improve its analytics.
An early adopter of the Apache Hadoop data storage and processing platform, Spotify began five years ago with a 30-node cluster. However, the ability to scale up rapidly was always part of the firm's plans, since its business model is based on the ability to provide tens of millions of users with music streams and record companies with advanced analytics about who is listening to what.
Now the owner of one of Europe's largest commercial Hadoop clusters, at 690 nodes, Spotify has reached the stage where it requires a partnership to progress further.
"We were looking for a true partner relationship and the team at Hortonworks are committed to enabling the overall ecosystem - including the vendors we rely on - to leverage Hadoop," said Wouter de Bie, team lead for data infrastructure at Spotify, in a statement.
"Their true open-source approach and the work they have done to improve the Apache Hive data warehouse system also aligns well with our needs, as we use Hive extensively for ad-hoc queries and for the analysis of large data sets."
Hortonworks president Herb Cunitz explained to Computing how the company has been working with others in the open-source community to improve Hive and other elements of the Hadoop ecosystem.
"With Hive 11 we have improved the performance by 30 to 45 times," he said. "Then we have ORC compression and Tez, which takes the serial implementation of MapReduce and allows you to parallelise it. Combined, we're expecting an improvement of 100 times, so you're getting close to realtime."
At Spotify, the Hortonworks Data Platform (HDP), which is 100 per cent open source, has now replaced the rival Cloudera Distribution of Hadoop (CDH), which has proprietary elements.
"Spotify wanted to find a partner who could take their enterprise requirements into the open-source community," Cunitz said. "The strategy of what we're doing with Hadoop 2.0 tied in very closely with where they wanted to take it. They wanted to do this with the core of Hadoop rather than with add-ons to Hadoop."
The partnership will see Hortonworks' 88 open-source committers working on issues that Spotify throws at them, speeding up the development process and feeding the improvements back into Hadoop. Spotify will also be able to rely on the firm's ability to support huge clusters, drawing on its experience of operating Yahoo's 40,000 nodes and similar-sized clusters at eBay.
"It's a deep engineering relationship," said Cunitz.
The Spotify deal is part of a move into Europe that US-based Hortonworks began this year. European firms now make up 10 per cent of its customer base. Other recent partnerships include Xing, a LinkedIn equivalent for German speakers, and SAP, which will now be able to resell HDP, as well as Intel's Hadoop distribution, alongside its HANA in-memory database.
HDP 2.0, due to reach general availability later this quarter, is based on the latest developments in the Hadoop ecosystem, including YARN, which improves the flexibility of MapReduce. Cunitz said that things are moving fast.
"We often joke that a Hadoop year is a dog year," he said.