Ocado: Why we picked the Google cloud

Hi-tech retailer needed to democratise analytics, explains data scientist Marcin Druzkowski

As an online-only grocery retailer with no physical stores, Ocado is a heavily tech-focused business. The company retains four years' worth of customer orders on its systems, and it uses this data to train machine learning models that power recommenders, optimise the operations of its delivery fleet, and feed back into its famous robot-controlled warehouses and the Ocado Smart Platform ecommerce system, which it offers for sale to other retailers.

With the firm's success dependent on fine margins, optimising processes is an essential task, and algorithms need constant refinement and tweaking. Developing new technologies to stay ahead of the game is another vital activity (it should be noted that Amazon recently entered this space, so this is no time to be resting on laurels). It is no surprise, therefore, that data science plays a big part in the company's strategy.

At a recent DataIQ event, Marcin Druzkowski, senior data scientist at Ocado Technology, told Computing that one of the things he has been working on is machine learning (ML) modelling to support recommenders (systems that prompt customers to repurchase an item and select alternatives they might like to try). The fact that most of the goods Ocado sells are perishable makes the business of designing recommenders much more challenging than equivalents for clothes, books or music.

"On Amazon if someone has already bought a book they probably will not buy the same book again," said Druzkowski. "In case of milk, when you have finished it you need to buy more. It's a totally different space."
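To make the contrast concrete, here is a minimal sketch of a repurchase prompt driven by how often a customer buys an item. It is an illustration only, not Ocado's method: the function names, the median-interval rule and the sample order history are all assumptions made for this example.

```python
from datetime import date
from statistics import median


def typical_interval_days(purchase_dates):
    """Median gap in days between consecutive purchases; None if fewer than two."""
    ordered = sorted(purchase_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps) if gaps else None


def due_for_repurchase(history, today):
    """Return items whose last purchase is at least one typical interval ago."""
    due = []
    for item, dates in history.items():
        interval = typical_interval_days(dates)
        if interval is None:
            continue  # bought only once: no evidence of a repeat cycle
        if (today - max(dates)).days >= interval:
            due.append(item)
    return sorted(due)


# Hypothetical order history for a single customer (assumed data).
history = {
    "milk": [date(2017, 6, 1), date(2017, 6, 8), date(2017, 6, 15)],
    "raspberries": [date(2017, 6, 1), date(2017, 6, 15)],
    "cookbook": [date(2017, 6, 1)],
}

print(due_for_repurchase(history, date(2017, 6, 22)))
```

In this toy data the weekly milk purchase comes due again after seven days, while the one-off book never does, which is the difference Druzkowski describes. A production recommender would have far more to account for, such as shelf life and substitutions.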

The difficulties also extend to the supply chain. A product like raspberries has a saleable life of just a few days, and substitutions may need to be considered: in the absence of raspberries would the customer like strawberries instead? Plus, there are extreme seasonal peaks, such as the demand for turkeys and Christmas puddings in December.

"All of this stuff is pretty challenging and it forces us to focus a lot on data and basic analytics as well as ML and AI, and that's why we have a pretty big data science team," Druzkowski said.

An additional challenge is presented by the EU GDPR. Mixed in with the multiple terabytes of data stored by Ocado there is a lot of personal information that needs to be treated as a special case under the new rules. The data scientists have been attending workshops in the discovery and categorisation of personally identifiable information, how it should be stored and anonymised, and who should be allowed access to it. Cleaning and tagging data is something else occupying their time.

On the innovation side, Druzkowski and his colleagues are working with Google's TensorFlow AI library to see what new insights they can derive.

"It's about how the data scientists can get value from the data. That's the foundation for all our work. On top of our data platform we're using TensorFlow to build neural networks on more complex machine learning models to do that. If the data is just sitting there, it's worth nothing."

Into the cloud

Ocado has gone the classic big data route: discovering that its Oracle enterprise SQL databases could not scale sufficiently, it went looking for alternatives. Where it has ended up is more unusual, though: it currently runs on a mixture of AWS and Google clouds.

"We started with Hadoop four years ago but we very quickly learned that it's not for us so we switched to Spark," Druzkowski said. "But then we realised we'd need something even more simple because to write a Spark job you need some knowledge of Scala, Java or Python and you need to think about the data locality. It's not an easy thing to do so we moved to the Google solution BigQuery and we switched the whole platform to cloud."

The company stores more than half a petabyte of data on Google's cloud and makes use of its data flow tools and the Dataproc managed Spark service in order to democratise access to data for analytics, he went on.

"These tools can handle any data format and they're pretty easy to use. Business analysts can write simple SQL queries even on terabytes of data with a history of three or four years and get an answer back in three or four seconds."
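As a rough illustration of the kind of "simple SQL query" an analyst might write, the sketch below totals December turkey sales by year. Everything in it is assumed for the example: the `orders` table, its columns and the sample rows are hypothetical, and Python's built-in SQLite stands in for BigQuery so the snippet runs locally. BigQuery's SQL dialect and date functions differ from SQLite's.

```python
import sqlite3

# In-memory SQLite database as a local stand-in for a cloud data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, product TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        # Hypothetical rows, invented for illustration.
        ("2014-12-20", "turkey", 3),
        ("2015-12-21", "turkey", 5),
        ("2016-12-19", "turkey", 4),
        ("2016-07-02", "raspberries", 9),
        ("2016-12-19", "christmas pudding", 6),
    ],
)

# Units of one product sold each December, across several years of history.
QUERY = """
    SELECT strftime('%Y', order_date) AS year, SUM(quantity) AS units
    FROM orders
    WHERE product = ? AND strftime('%m', order_date) = '12'
    GROUP BY year
    ORDER BY year
"""

december_turkeys = conn.execute(QUERY, ("turkey",)).fetchall()
print(december_turkeys)
```

The analyst only states what is wanted (a filter, a grouping and a sum) and does not have to think about how the work is distributed across machines.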

Previously, Ocado spent a lot of time fine-tuning its databases and supporting infrastructure, but this is no longer considered a productive use of its time, Druzkowski said.

"That's not a core part of our business. We need to think about the end product which helps the customer experience rather than improving the databases because other companies do this much better than us."

Enter Amazon

Not everything is with Google though. Ocado's web shops and the web services part of the infrastructure sit in AWS, with aggregated events and data fed into the data platform on Google. Druzkowski did not know of any plans to change this given Amazon's entry into the space, and he had a slightly pat-sounding answer when asked about future competition from Amazon, mentioning that you can already order from Ocado via Amazon Alexa.

"We are very happy that they're coming into the market because that only shows that there is a demand."

He added that Ocado has "years of experience in delivering food," and knows all about the intricacies of temperature regimes and the rest.

"So we're not afraid, it only makes our whole market bigger, but of course we need to keep an eye on what they're doing, what kind of machine learning techniques they are using. In the future they could be competition but right now the whole market is growing."