Big data in 2013

By John Leonard
02 Jan 2013

While MongoDB planted its flag here a year or two ago, concentrating on ease of development and user education, with its latest release Couchbase has moved into the same open source space, claiming improvements in scalability and performance over its rival.

"Enterprises are adopting NoSQL technologies very rapidly now. We're seeing more and more organisations experimenting with applications that use NoSQL technology. In 2013 you're going to see increased competition among NoSQL players. As these players develop their solutions, there's going to be more overlap and more head-to-head competition. I think that's great," Couchbase CEO Bob Wiederhold told Computing.

Computing verdict: Not a chance

Will the big data bubble burst in 2013?

Some see the proliferation of new vendors, the rapid growth of some of these vendors and the enthusiastic large-scale investment in the big data field as the symptoms of a classic bubble. Is investment being driven by hype rather than real long-term value, and is that money being spent disproportionately on marketing to obtain market share? In some cases probably yes, but while 2013 will undoubtedly see a degree of shake-out in the market, and perhaps a few failures, the underlying long-term drivers for big data remain strong.

First, the annual increase in data volumes is starting to outstrip Moore's Law, suggesting that for increasing numbers of organisations, traditional methods for collecting, storing and processing data will no longer be sufficient.

Meanwhile, as Western economies continue to limp along, retaining the goodwill and engagement of existing customers (or citizens) and communicating with them more intelligently will remain a top priority for all sectors, as will pursuing the efficiencies that can be gained from automating manual processes. Compliance is a growing issue in many sectors too, with more data being stored for longer, often with a requirement that it can be retrieved quickly.

Diversity is another reason for continued expansion. Big data is a broad church. Among the different types of NoSQL databases, for example, are key-value stores, column-oriented databases, big tables, data grids and others, each with its own use case. Businesses and public sector organisations will continue to need different tools for different jobs, and while work is going on to improve the flexibility of big data offerings, it is highly unlikely that any sort of catch-all solution will emerge in the next 12 months.

The wheels might come off the bandwagon, though, if big data analytics demonstrably fails to deliver the competitive advantages and novel insights promised; in other words, if the hype gets too far ahead of the reality. To be successful, big data needs to be part of an overall data management and business strategy, with skilled technicians, data scientists and managers on hand to see it through. In short, for many, obtaining real business value from big data might turn out to be much more difficult than they first imagined.

Computing verdict: Probably not, although there will certainly be some market consolidation

Will real-time querying become easier?

Because big data systems are inherently complex, much effort is currently going into simplifying them, allowing real-time querying in standard SQL and easier integration with data warehousing and standard BI systems.

In 2013 Microsoft will release its PolyBase system, which will integrate its SQL Server Parallel Data Warehouse (PDW) with Hadoop. Initially PolyBase will pull data directly from the Hadoop Distributed File System (HDFS) rather than going through MapReduce, but later releases will allow both approaches, and this functionality may eventually be built into Microsoft's standard SQL Server database.

Late last year Cloudera released Impala, a SQL query engine that runs on Hadoop and allows users to interrogate in real time both HDFS data and data held in HBase databases. Oracle too has just released a new version of its NoSQL database, offering tighter integration with both Oracle Database and Hadoop environments.

With giants like Microsoft and Oracle moving in this direction (other examples include Rainstor, Teradata Aster and ParAccel) it is safe to say that we will see more of the same during 2013.

Computing verdict: It already is. Expect much more this year

Will Hadoop lose its crown?

Most distributed big data systems are currently built around the vast data-crunching capabilities of Apache Hadoop and its supporting ecosystem (Hive, Pig, ZooKeeper and other add-ons), which is designed to make it more business-friendly. But is this ecosystem now in decline? After all, many of the advances listed above come despite Hadoop rather than because of it, or are workarounds designed to overcome its shortcomings for real-time ad hoc querying.

Google now uses its proprietary Dremel query system in place of Hadoop. Dremel allows ad hoc querying of petabytes of data in seconds using a language similar to SQL. Apache Drill is an open source project modelled on Dremel and, when mature, could well prompt a move away from Hadoop towards this faster, easier-to-use alternative, which may be much more suitable for non-technical users.
