How eBay takes a tiered approach to big data - pulling in thousands of users
Teradata's Duncan Ross lifts the lid on eBay's big data secrets at Computing's Big Data & Analytics Summit 2015
Global online auction and e-commerce company eBay takes a tiered approach to big data, giving thousands of staff access to business information that they can interrogate themselves.
Speaking at Computing's Big Data & Analytics Summit 2015 today, Duncan Ross, director of data science at Teradata International, said that eBay had built an analytics architecture to enable big data tools to be used by thousands of staff across the organisation.
"Ebay is a huge organisation with a vast amount of data coming in from their own website and other websites on a scale that even NASA would be quite scared about. Their approach is to think carefully about how they handle those massive volumes of data in terms of what systems they put it on and who gets access at which point.
"Essentially, they have a very large big-data environment where they put the 'rare' data. It has a small number of users - typically five to 10 at any one time. They are doing really deep analytics. When they discover things that are showing value, they move it onto a discovery platform. So the data is now validated," said Ross.
He added: "One of the big challenges in big data is that you end up with data sets that have a high volume, but a low value density. So there's some value in there, but the density of that value is low."
Ross argued that out of a huge volume of data only a tiny amount would be of genuine interest - much like Twitter on any given day.
"You have to have filters there because there's a lot of extraneous noise. What eBay does next is evaluate the data that has more value. That gets loaded into a different platform which handles discovery. There you have a number of simultaneous users. People who are able to do deep analytics and slightly more standard analytics.
"Again, it's a filtering process. They establish the stuff that has higher value; stuff that is more consistent goes into their data warehouse where they then have thousands of users who have access on an every-day basis to answer business questions. They don't know what those questions are going to be in advance, but they give access to as many people as possible to that data to do things.
"If they get into situations where they feel that the data doesn't go far enough, then they ask further up the chain to find whether there's a data set that will answer their questions," said Ross.