We have no problem recruiting data scientists because we have really big data, says Royal Mail

With 160,000 employees and billions of letters and parcels to deliver, Royal Mail has plenty of data for those who like a challenge, says Thomas Lee-Warren

Thomas Lee-Warren set up the Technology Data Group at Royal Mail when he joined from Network Rail two and a half years ago. Before that he was involved in nurturing start-ups.

"I'm interested in bringing the best of small company culture to large corporations and vice versa," he said.

As a company with 160,000 employees, Royal Mail sits firmly in the large-corporation camp, which is an advantage when it comes to recruiting data scientists, developers and visualisation experts.

"With billions of letters and parcels to deliver, they see that we really do have big data, and there are also the sort of opportunities we have across the business. A lot come from start-ups or from the banking industry where they are given the same problems to solve again and again, just optimising what they've done before, whereas here we give them a diverse range of problems to solve ... we're really pulling in the talent."

Lee-Warren's group is working towards providing analytics as a service to the various entities within Royal Mail, using data - and more particularly data science - to solve business problems. So what does he look for when hiring a data scientist?

"It's a complex world, so there is more than one flavour of data scientist," he said. "There are a lot of specialisations, but when we were looking at growing this data science centre I liked the notion of their being able to talk to stakeholders and suggest things they could do with their business."

Maths is at the heart of most of this problem solving, he explained, so STEM qualifications are generally required. He added that Scala and Spark skills are much in demand. Visualisation experts are also needed because "the more data you have, the harder it is to create a picture that businesspeople can understand - that's a skill."

While they might have different skills, Lee-Warren said he seeks certain attributes in potential employees. He looks for what he calls "super-programmers", people who can understand business problems and quickly model them in code.

"The ability to understand and solve problems quickly is very important. The guys and girls can take a problem, cut code, make models and knock up some visualisations and return and say 'this is what we've done'. They've managed to squash that process down to days or a couple of weeks, and that's revolutionary for a business that is used to tasking IT with something and when they come back they've got grey hair."

Key to this speed is being able to pull in data quickly from multiple sources. Royal Mail uses the Hortonworks Data Platform, a Hadoop distribution, as its central data store and deploys Apache NiFi (now adopted by Hortonworks as Hortonworks DataFlow) to ingest data rapidly into a central data lake, where data scientists can work with the pooled data.

"The combined data from two sources has much more value than two sources on their own," Lee-Warren said.
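
As a rough illustration of that point, the sketch below joins two hypothetical sources on a shared key. The record layouts (parcel tracking events and customer accounts), the field names and the values are all made up for this example; they are not Royal Mail data and do not describe its actual pipeline. The sketch only shows how a question that neither source can answer alone becomes answerable once the two are pooled.

```python
from collections import Counter

# Hypothetical source 1: parcel tracking events (made-up records).
tracking_events = [
    {"parcel_id": "P1", "customer_id": "C1", "status": "delivered"},
    {"parcel_id": "P2", "customer_id": "C2", "status": "delayed"},
    {"parcel_id": "P3", "customer_id": "C1", "status": "delayed"},
    {"parcel_id": "P4", "customer_id": "C3", "status": "delivered"},
]

# Hypothetical source 2: customer accounts keyed by customer_id (made-up records).
customer_accounts = {
    "C1": {"segment": "business"},
    "C2": {"segment": "consumer"},
    "C3": {"segment": "business"},
}


def delayed_parcels_by_segment(events, accounts):
    """Join events to accounts on customer_id and count delayed parcels per segment."""
    counts = Counter()
    for event in events:
        account = accounts.get(event["customer_id"])
        # Skip events with no matching account (an inner join).
        if account is not None and event["status"] == "delayed":
            counts[account["segment"]] += 1
    return dict(counts)


result = delayed_parcels_by_segment(tracking_events, customer_accounts)
print(result)
```

The tracking events alone say which parcels were delayed, and the accounts alone say which segment a customer belongs to; only the joined view says how delays are distributed across segments.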

Having the data located centrally makes compliance simpler too, he went on, which is important for an international business.

"Hortonworks provides an opportunity because you can better understand the data; you've got it all together and you've got the lineage. You know where the data came from and how many copies you have, whereas before it was siloed. You can introduce much more disciplined processes around it when it's centralised."

Hortonworks has recently integrated the Apache Atlas data governance solution with the Apache Ranger security software in response to requests from large customers including Royal Mail, which wanted a more coherent answer to compliance issues.

So far the board has been very enthusiastic about innovations around data, although it is still early days, Lee-Warren said. The Technology Data Group is mostly focused on improving the customer experience, after this was identified by the board as a priority. Projects undertaken include building churn models and predicting the volumes of letters and parcels that will reach particular sorting offices before they arrive. This is one example of the predictive analytics that will become more important: "How do we balance our workforce with our volumes in an increasingly competitive environment?"
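
To make the workforce-balancing question concrete, here is a minimal sketch assuming a deliberately naive method: forecast tomorrow's volume at one sorting office as the average of the last few days, then round up to the number of staff needed to cover it. The function names, the daily volumes and the figure of 2,000 items per worker per shift are hypothetical, and a moving average is a stand-in chosen for brevity, not the model Royal Mail's team uses.

```python
import math


def forecast_next_day(daily_volumes, window=3):
    """Naive forecast: mean of the most recent `window` daily volumes."""
    if window <= 0 or len(daily_volumes) < window:
        raise ValueError("need a positive window and at least `window` days of history")
    recent = daily_volumes[-window:]
    return sum(recent) / window


def staff_required(expected_items, items_per_worker_shift):
    """Round up so the forecast volume is fully covered."""
    return math.ceil(expected_items / items_per_worker_shift)


# Made-up daily item counts for one hypothetical sorting office.
history = [12000, 15000, 13500, 14000, 16000]

expected = forecast_next_day(history, window=3)
workers = staff_required(expected, items_per_worker_shift=2000)
print(expected, workers)
```

A production model would account for seasonality, day-of-week effects and upstream signals, but the shape of the problem is the same: a volume estimate per office feeds a staffing decision before the mail arrives.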

While some members of his team are charged with assessing new technologies in the Hadoop ecosystem, data scientists need stability and discipline.

"A lot of people are excited to try out the next great thing, but you need to make sure you have a stable platform for other people to develop on - that's important when things are changing fast," he concluded.