SAN DIEGO: Darren Bruntz, senior director of e-commerce at eBay, has said that the online marketplace might make greater use of Hadoop as a data analytics tool if the open-source community behind it had more "focus".
Bruntz was speaking at the Teradata conference in San Diego this week.
Hadoop is an open-source framework that uses a parallel programming model called MapReduce to analyse petabytes of unstructured data that were previously too large to work with.
Parallel programming allows an analytics job to be split up and run on hundreds of servers, each with many disk drives, all at the same time.
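As a rough illustration of the idea, the Python sketch below counts words by splitting the input into partitions, processing each partition separately (the "map" step) and then merging the partial results (the "reduce" step). It is not Hadoop code and does not use any Hadoop API: the function names are hypothetical, and threads on a single machine stand in for the separate servers a real cluster would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_partition(lines):
    """Map step: count the words in one partition of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-partition counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, workers=4):
    """Split the input into partitions, map them concurrently, then reduce."""
    # Assumption: a simple strided split; a real cluster splits by stored block.
    partitions = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce_counts(partials)


if __name__ == "__main__":
    sample = ["click view click", "view buy", "click buy buy"]
    print(word_count(sample, workers=2))
```

Because each partition is counted independently, the map step can in principle be spread over as many machines as there are partitions; only the final merge needs to see all of the partial results.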
Hadoop stores this data in HDFS (Hadoop Distributed File System), which splits files into blocks and spreads them across multiple disk drives and servers.
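The way a file can be spread across servers can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how HDFS is implemented: the function names, the tiny block size and the round-robin placement are all hypothetical choices for illustration, whereas HDFS itself uses much larger blocks and a more sophisticated placement policy.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replicas=2):
    """Assign each block to `replicas` distinct nodes, round-robin (assumed policy)."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replicas)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    print(place_blocks(len(blocks), ["node1", "node2", "node3"]))
```

Keeping more than one copy of each block on different nodes means the data can still be read if a single disk or server fails.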
eBay currently has three platforms for its data warehousing and analytics, one of which is Hadoop.
"We have a Teradata enterprise-class, data-warehousing system, which is actually two systems run virtually as one, that manages high-concurrency workloads. It has between five and 10 petabytes of storage in it," said Bruntz.
"Then we have a separate Teradata system that is really about deep storage. That is where we put all our behavioural data, our high-volume data, clickstream and event data. We also keep all our traditional warehousing data in there," he added.
"The third system is built on Hadoop. We target that for high CPU workload, but it is not where we would do relational or structured work. However, it is great for image processing and modelling, which is something that obviously requires a lot of CPU."
Despite its benefits for CPU-intensive workloads, Bruntz described Hadoop as a "programmatic" system that requires a lot of coding to build the tools a user needs.
Hadoop is widely regarded in the industry as complex to master and demanding of developer skills. The open-source offering also lacks a mature ecosystem of tools and standards.
"I think we will stay on our setup of the three platforms for a few more years, but Hadoop could be a more compelling offering if the open source community and its contributors got some more focus and energy, as you would have a whole community of people working on new tools and features," said Bruntz.
"We are not really biased towards a particular technology – we look at the value we are getting from that technology. We look at all the different dimensions of service: are we working with a partner that can meet our needs in an aggressive way?" he added.
"So, it's not just the technology, but it's the ecosystem that goes around that. In future we could perhaps move to a single platform, but I don't see there being a single compelling technology for several years."