IT departments sidelined as finance and marketing harness big data
Enterprise vendors say interest in their big data systems is increasingly coming from finance and marketing chiefs
The pressing need to analyse huge data sets containing high volumes of unstructured information is forcing some corporates to re-evaluate how they manage their data assets. Individual business departments such as marketing and finance are bypassing the IT department to take the job on themselves, or outsourcing it to cloud providers.
Oracle launched a specialised big data appliance this week: a dedicated piece of hardware running a mixture of its own proprietary and open source software, designed to speed up the process of filtering, sorting and indexing structured and unstructured information from diverse sources before loading it into a data warehouse. Big data can include social media content, email, HTML pages, instant messaging (IM) logs, blogs, digital images, video files, surveillance footage, e-commerce transactions, call records and medical records, as well as datasets created by large-scale academic, scientific and research projects.
The companies analysing these huge volumes generally do so either for risk protection or business intelligence purposes, and Oracle is just the latest in a long line of IT vendors – including IBM, Dell, EMC and Microsoft – looking at ways to tap into enterprise fears of being left behind by competitors who are able to mine and interpret market data faster and more accurately than they can.
"People are either asking what is all this stuff and what is the risk to us, or what market opportunities are we missing because we can't see what people are saying about us on Twitter or whatever and we can't figure out what it [all this data] is. But the thing that has really changed is the variety in the kind of data available, and that is where IT managers run up against absolutely hard and fast limitations on what they can do," Debra Logan, vice president and distinguished analyst at research firm Gartner told Computing.
"So you need massively parallel computing systems to break that down – more power, speed and memory – to break up unstructured data and index big data sets, which is what people have always struggled to do."
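The break-up-and-index approach Logan describes — splitting unstructured text across parallel workers, then merging the results into a searchable index — can be sketched as a toy map-reduce job. This is a hypothetical illustration of the general technique, not Oracle's implementation; the corpus and function names are invented, and threads stand in for the distributed workers of a real MPP system:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Toy corpus of unstructured "documents" (e.g. tweets, log lines).
DOCS = {
    "doc1": "customers praise the new product",
    "doc2": "product recall announced by the vendor",
    "doc3": "vendor shares fall after recall",
}

def map_tokens(item):
    """Map step: emit (token, doc_id) pairs for one document."""
    doc_id, text = item
    return [(tok, doc_id) for tok in text.lower().split()]

def build_index(docs):
    """Reduce step: merge mapped pairs into an inverted index,
    so any term can be looked up across the whole corpus."""
    index = defaultdict(set)
    with ThreadPoolExecutor() as pool:  # parallel workers
        for pairs in pool.map(map_tokens, docs.items()):
            for tok, doc_id in pairs:
                index[tok].add(doc_id)
    return index

if __name__ == "__main__":
    index = build_index(DOCS)
    print(sorted(index["recall"]))  # documents mentioning "recall"
```

The same map/reduce split is what lets platforms like Hadoop scale the job across many machines: each node indexes its own shard of the data, and only the compact per-term results are merged.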
Oracle's big data appliance may represent more of a product refresh than a new direction: the company has been selling an Exadata database appliance for a number of years. The use of the open-source Apache Hadoop distributed computing platform is an innovation, though, and it will link to Oracle's proprietary Database 11g structured datasets via the Oracle Data Integrator (ODI). The capture and management of unstructured data will be handled by another open-source application, Oracle NoSQL, a general-purpose database supporting simple queries and easy administration.
By offering pre-packaged and pre-configured dedicated appliances with a focus on ease of use, vendors like Oracle are tapping into an emerging trend within corporate management structures, whereby business intelligence tasks like data analytics and reporting are performed by individual departments or members of staff who no longer rely on the IT department to do it for them.
"The empirical picture is going to change drastically with more specialisation around data – CIOs are getting replaced with chief marketing officers and chief finance officers who do not necessarily understand the technology side, but they have bigger budgets," said Logan.
Logan's comments echo the opinions of Teradata CEO Mike Koehler, who told a user conference that more companies are treating data as a corporate asset rather than an IT resource, which forces change in operational budgets and control.
Discounting its Exadata appliance, which the company insists is more about storage than analytics, Oracle is relatively late to market with a big data push, even though rival vendors still disagree on the precise definition of the problem.
Storage giant EMC already offers a similar device built on the Greenplum database 4.0, delivering data loading performance of up to 10TB an hour, with Netezza and Teradata offering similar massively parallel processing (MPP) appliances and Dell having hooked up with Aster Data's nCluster MPP platform, optimising the software to run on Dell PowerEdge C-Series servers for large-scale data warehousing and advanced analytics.
Cloud service providers like Amazon, Google and Microsoft have long had the infrastructure to handle both the volume of data being created and the huge processing requirements needed to analyse it properly, while newer entrants such as LexisNexis spin-off HPCC Systems are also reported to be considering cloud-based access to Hadoop-style analytics systems. What dedicated big data appliances may do is give corporates the hardware they need to do the same thing in-house.