Big Data & IoT Summit 2017: Lloyds dredges the 'data lake'

Machine learning has great potential when combined with big data, says Lloyds's Nicholas Williams - but watch out for the GDPR

Lloyds is perhaps the UK's largest financial services group, with 25 million customers and five million businesses. Twelve million of its customers are online, and seven million use mobile banking. Dr Nicholas Williams is the company's Big Data and Machine Learning Senior SME, and he handles a high volume of highly sensitive personal data every day.

In the past, banks used very simple formulae to make decisions; when assessing someone for a loan, for example, the rule would be some variant of "If X, then Y". Now, the machines making those decisions are exposed to a vast amount of information, from economic reports to weather forecasts. This is just one example of how the market has shifted.
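To make the contrast concrete, here is a minimal sketch, not Lloyds' actual method, of an old-style "If X, then Y" loan rule next to a model-style decision that blends many signals into one score. The function names, feature names, thresholds and weights are all hypothetical and chosen for illustration; in a real system the weights would be learned from historical data rather than typed in by hand.

```python
import math


def rule_based_decision(income: float, debt: float) -> bool:
    # Traditional "If X, then Y" rule: approve only if income clears a fixed
    # threshold and existing debt stays under a fixed share of income.
    # Both thresholds are hypothetical.
    return income >= 30_000 and debt <= 0.3 * income


def model_based_decision(features: dict[str, float],
                         weights: dict[str, float],
                         bias: float,
                         threshold: float = 0.5) -> tuple[bool, float]:
    # Model-style decision: many inputs, including external signals such as
    # an economic indicator, are combined into a single weighted score and
    # mapped to a probability with the logistic function.
    z = bias + sum(weights[name] * value for name, value in features.items())
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability >= threshold, probability


if __name__ == "__main__":
    # Hypothetical applicant: income and debt in thousands of pounds,
    # regional unemployment as a percentage.
    features = {"income_k": 42.0, "debt_k": 8.0, "unemployment_rate": 5.0}
    # Illustrative weights only; not fitted to any real data.
    weights = {"income_k": 0.08, "debt_k": -0.15, "unemployment_rate": -0.3}

    print(rule_based_decision(42_000, 8_000))
    approved, p = model_based_decision(features, weights, bias=-1.0)
    print(approved, round(p, 3))
```

With these made-up numbers the fixed rule approves the applicant, while the weighted model, which also takes the unemployment signal into account, scores them at roughly 0.416 and declines. The point is only that adding outside information can change the outcome, not that either set of parameters is realistic.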

It is possible simply to dump information into a 'data lake' built on a platform such as Hadoop, but there is more value in extracting insights and patterns from that data by combining big data with machine learning. The eventual aim is to reach a stage at which big data is enabled by data science, using, for example, machine learning, descriptive analytics and business intelligence. "Lloyds is on this journey now," said Williams, "but we haven't finished it yet."

Everyone involved with big data will talk about the challenges, but Lloyds has identified two main hurdles: a shortage of staff (mainly data scientists) and integration.

Williams suggested three solutions to get around the personnel shortage:

  1. Utilise partners: each has tools and specialties that you can apply to your own problems. Lloyds is working with Pindrop, an American startup whose machine learning software can determine whether a call is fraudulent.
  2. Use machine learning as a service: off-the-shelf solutions.
  3. Leverage your existing capabilities; specifically, work with your staff. Some may be able to retrain as "at least 'citizen' data scientists."

When it comes to integration challenges, data science comes down to three main elements: data, tooling and environment. Businesses need a data strategy that moves data around the organisation safely and efficiently; a toolkit suited to their specific needs (every tool has its own limitations); and an environment in which to store the data. That environment also determines the route by which ideas and data go live.

"It's not all about machine learning" - Dr Nicholas Williams, Lloyds

At the end of his presentation, Williams stressed the importance of education. It is important, he said, to ensure that your data strategy supports data science. Customers should be involved from the beginning of the life cycle, and you, as a data scientist, should not try to do it all yourself. He added that he has in the past been asked to design tools around unnecessary software: "It's not all about machine learning," he said. If machine learning is not the most obvious solution, don't try to force it.

An audience member asked: "How does your team manage the expectations of internal clients?" Williams replied that it is important to set expectations at the start of the process. Lloyds, he believes, may be the only UK bank with technical expertise at board level, but communication between data scientists and executives is still necessary.

Another question raised the GDPR, and how it will affect machine learning tools when it takes effect in May 2018. Under the regulation, automated, machine-driven decisions need to be explainable. This could be a real challenge, because it is sometimes impossible to unpick the logic behind why a machine made a particular decision.
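Why some models are easier to explain than others can be shown with a minimal sketch, assuming a simple linear scoring model. This is not a claim about what the GDPR requires or how Lloyds handles it. In a linear model, each input's contribution to the decision is just its weight multiplied by its value, and those contributions, plus a fixed bias, add up exactly to the final score. The function name, feature names and weights below are hypothetical. Complex models such as deep neural networks do not break down this neatly, and that is the difficulty Williams describes.

```python
def explain_linear_decision(features: dict[str, float],
                            weights: dict[str, float],
                            bias: float) -> tuple[float, list[tuple[str, float]]]:
    # For a linear score, each feature contributes weight * value, and the
    # contributions plus the bias sum exactly to the score. That additivity
    # is what makes the decision easy to explain.
    contributions = {name: weights[name] * value for name, value in features.items()}
    score = bias + sum(contributions.values())
    # Sort by absolute impact so the biggest drivers of the decision come first.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked


if __name__ == "__main__":
    # Hypothetical applicant and illustrative (not fitted) weights.
    features = {"income_k": 42.0, "debt_k": 8.0, "unemployment_rate": 5.0}
    weights = {"income_k": 0.08, "debt_k": -0.15, "unemployment_rate": -0.3}

    score, ranked = explain_linear_decision(features, weights, bias=-1.0)
    print(f"score = {score:.2f}")
    for name, contribution in ranked:
        print(f"  {name}: {contribution:+.2f}")
```

Here the output would show income pushing the score up by about 3.36, with unemployment (about -1.50) and debt (about -1.20) pulling it down, for a total of about -0.34. The trade-off is that simple, additive models are easy to explain but may be less accurate than the opaque models that are harder to justify.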