Big data is becoming a commodity, says ING data architect

Stop going on about petabytes of data and get on with creating data-driven applications, urges senior architect Natalino Busa

Big data platforms such as Hadoop are becoming a commodity, and the focus is shifting to the next phase: creating products from data analytics, according to Natalino Busa, senior data architect at banking and financial services firm ING.

"Let's face it, big data as most people understand it, is now 11 years old," he said. "In the first part everyone was talking about the big numbers - petabytes of data and hundreds of nodes - and I still see those infographics sometimes, but really, that's over now. We should stop talking about that."

Busa continued: "The operating system, the platform, that's not in the spotlight any more. It's just an enabler. It's becoming a commodity. Now it's all about building compelling analytics-driven products: APIs, machine learning, recommenders, anomaly detection engines. These things can have a tremendous effect on the business."
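To make one of those product categories concrete, here is a minimal, hypothetical sketch of an item co-occurrence recommender of the kind Busa alludes to. The data, product names and function are invented purely for illustration and do not describe ING's systems.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories: customer id -> set of products held.
histories = {
    "c1": {"savings", "credit_card", "travel_insurance"},
    "c2": {"savings", "mortgage"},
    "c3": {"credit_card", "travel_insurance"},
}

# Count how often pairs of products appear together across customers.
co_counts = defaultdict(lambda: defaultdict(int))
for products in histories.values():
    for a, b in combinations(sorted(products), 2):
        co_counts[a][b] += 1
        co_counts[b][a] += 1

def recommend(owned, top_n=3):
    """Rank products the customer does not yet hold by co-occurrence score."""
    scores = defaultdict(int)
    for product in owned:
        for other, count in co_counts[product].items():
            if other not in owned:
                scores[other] += count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend({"savings"}))  # e.g. ['credit_card', 'travel_insurance', 'mortgage']
```

Production recommenders are of course far more sophisticated, but the shape is the same: mine behavioural data offline, then serve the resulting scores through an API.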

Describing ING as a "data company first and a financial services company second," Busa said the developments happening on top of the big data platforms are allowing the bank to develop more customer-centric services.

"Data is central. It empowers the customer to have a better understanding of the services they use. It also allows ING to provide better experiences around both marketing and security," he said.

"We are entering an era with in-memory analytical tools like Spark and with event streaming we can actually have a much more direct interaction of customers in terms of services, but also trust and security of the whole system."

ING has a "small but growing team of Spark users in a number of different projects", he said. The team is focused on personalisation of services, operational IT excellence behind the scenes, creating APIs for analytics, and "some initial experiments to create add-ons to existing fraud and cybersecurity components", although he stressed that these are additions to "a very solid platform" of security systems rather than replacements for them.
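As a flavour of what pairing Spark with event streaming can look like in code, the sketch below uses PySpark Structured Streaming to flag unusually large transaction events as they arrive. It is purely illustrative: the socket source, field names and fixed threshold are assumptions for the example, not a description of ING's fraud or security components.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Purely illustrative: the source, schema and threshold are assumptions.
spark = SparkSession.builder.appName("streaming-anomaly-sketch").getOrCreate()

# Read newline-delimited events from a local socket (feed it with `nc -lk 9999`),
# each line formatted as "customer_id,amount".
raw = (spark.readStream
       .format("socket")
       .option("host", "localhost")
       .option("port", 9999)
       .load())

# Split the raw line into typed columns.
events = raw.select(
    F.split(F.col("value"), ",").getItem(0).alias("customer_id"),
    F.split(F.col("value"), ",").getItem(1).cast("double").alias("amount"),
)

# Flag transactions above a crude static threshold; a real anomaly engine
# would score events against per-customer behaviour models instead.
flagged = events.filter(F.col("amount") > 10000)

# Print flagged events to the console as they stream in.
query = (flagged.writeStream
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()
```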

"We see this trend where specialised tools can derive from a product that is more generic in nature but which can provide that extra edge, or that extra analysis, coming from new ideas and new algorithms," he said.

This trend strengthens the links between data scientists and machine learning specialists on one side and existing engineers and infrastructure on the other, he added, which is making data science itself more of a commodity.

"It allows us to grow in the direction of making data science more generic, more of a platform within the company."

However, despite these advances, Busa does not think big data platform consolidation is over by any means, in part because it is still cheapest to store data on spinning disks.

"We're not done with convergence yet. I think all these technologies - Hadoop, NoSQL, enterprise SQL - will have a second round of convergence," he said, going on to mention that Spark might be displaced by advanced streaming projects based on a completely new paradigm like Flisk and Apex "five or six years from now".

"My attitude is pragmatic. I prefer to see what works now and keep a roadmap to see what is coming in a year or two year's time," Busa said.
