Dutch bank ING has adopted Hadoop to build a new data platform that it hopes will enable it to deliver better services to its customers.
In an interview with Computing at the Hadoop Summit 2014 in Amsterdam, Anurag Shrivastava, head of the solution delivery centre at ING, explained that the firm was looking to move away from a traditional, batch-oriented data warehouse setup to a modern data warehouse that could help it derive benefits from big data.
Shrivastava explained that the bank had been using IBM Unica as its CRM platform for inbound and outbound email campaigns, and that its data warehouse was built a decade ago.
"We engineered the entire stack in-house with EMC, IBM machines, and Oracle on top of that. It is pretty reliable and has disaster recovery also set up, but although it was state-of-the-art in 2003, it is probably outdated now," he said.
To shift away from some of the older tools, ING's marketing team suggested that the bank explore what big data really was.
"[My team] said yes we'll do that, and I think in a large bank you get approached by a lot of vendors and they all say that they have a ready-made solution that you plug in from which you can gain value from big data. IBM has a significant presence in ING and they were also trying to push their solution on us. However, we wanted to really explore and play around with it before making a decision," he said.
So in March 2013, ING's engineering team set up a "play area" for big data, using spare hardware in an isolated area that was disconnected from the rest of the bank's architecture.
"We set up a Hadoop cluster there, loaded some data on it and started playing with it, with [big data tools] RHadoop and Hive, and that is how the journey started," Shrivastava said.
But he knew that the environment was just a base to start from, and that once people bought into the value of it, the bank would have to build something better and bigger.
The play area was based on Cloudera's Hadoop distribution, but Shrivastava said that the bank ultimately steered clear of Cloudera because it felt that the vendor may "come up with something that is not open source", which could be a "trap for ING in the long term".
In July 2013 the bank decided it would build a complete solution, and by October it had created a new project that involved buying new hardware and building what Shrivastava deemed a "serious Hadoop cluster".
His team talked to various other companies and decided that HortonWorks was the best match. He suggested that this was purely because of HortonWorks' approach to technology.
"We aren't a technology company, we're a bank. But a lot of inspiration in the bank has come from technology companies; we talk more about Facebook and Google at ING than we do about the Bank of America. So in HortonWorks we saw a good fit as they're based on open source, have a big community and have a clean approach to open source," he said.
"The great thing about HortonWorks is that they have a very active open source community, so if you have any problems you can talk to them and they will find the answers," he added.
The bank opted for HortonWorks Data Platform 2.0 (HDP 2.0) and decided to build a predictive analytics lab.
Despite being impressed by HortonWorks, Shrivastava believes that Cloudera's and HortonWorks' approaches are similar, and that either solution could have worked for the bank.
"Hadoop is just Hadoop, whether you take it from Cloudera or HortonWorks - it doesn't always matter what their strategies are; we have to use our brains too," he said.
The bank didn't have any issues with the initial load, but has encountered problems with the second system it is setting up.
"We're having problems but they aren't to do with Hadoop - they are all inside problems; how a department works and how new technology is brought inside the data centre, for example," he said.