“By far our most important competitor is Oracle. After that it’s Oracle, Oracle and Oracle”, says Max Schireson, president of 10gen, creator of the open-source NoSQL database system MongoDB.
“I see other NoSQL players such as DataStax [distributor of Apache’s Cassandra] and CouchDB as comrades in arms in the battle to persuade people that the answer does not have to be Oracle.”
So who are these pretenders to Oracle’s analytics throne, and does Larry Ellison need to start worrying?
What is NoSQL?
The confusingly named NoSQL (generally considered to stand for “not only SQL”) describes a family of database systems that store data in structures other than relational tables, such as the documents used by MongoDB. Often mentioned in the same breath as “big data”, NoSQL enables large data sets to be spread across racks of cheap commodity servers, so that capacity and processing power can grow simply by adding machines, while the lack of a fixed schema makes it well suited to applications where a wide variety of data types must be brought together into a coherent whole.
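To make the “documents rather than tables” idea concrete, here is a minimal sketch in plain Python. It does not use MongoDB or any real database API; the records, field names and the `find` helper are all hypothetical, and serve only to show how records of different shapes can sit in one collection and still be queried together.

```python
# Illustrative only: plain dictionaries stand in for documents.
# All field names and values below are hypothetical.
documents = [
    {"_id": 1, "type": "order", "customer": "acme", "total": 120.0},
    {"_id": 2, "type": "tweet", "user": "acme", "text": "Great service", "tags": ["praise"]},
    {"_id": 3, "type": "order", "customer": "globex", "total": 75.5, "coupon": "SPRING"},
]

_MISSING = object()  # sentinel so that an absent field never matches a criterion


def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc
        for doc in collection
        if all(doc.get(field, _MISSING) == value for field, value in criteria.items())
    ]


# Orders and tweets share a collection even though their fields differ.
orders = find(documents, type="order")
print([doc["_id"] for doc in orders])
```

No schema had to be declared before the third record introduced a new `coupon` field, which is the property the paragraph above describes.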
Combined with MapReduce technology such as Hadoop, NoSQL can perform complex operations such as statistical aggregation, filtering, grouping and sorting on large data sets, leaving traditional data warehouses and BI (business intelligence) appliances far behind in terms of performance and price, especially at extremes of data volume and variety.
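The operations listed above (filtering, grouping, aggregation and sorting) can be sketched in the MapReduce style with a few lines of plain Python. This is a single-process illustration of the pattern, not Hadoop code; the sales records, the threshold of 10 and the function names are assumptions made for the example. In a real cluster the map and reduce steps would run in parallel across many servers.

```python
from collections import defaultdict

# Hypothetical input records, for illustration only.
records = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": 80.0},
    {"region": "north", "amount": 30.0},
    {"region": "east", "amount": 5.0},
    {"region": "south", "amount": 20.0},
]


def map_phase(record):
    """Filter out small amounts and emit (key, value) pairs."""
    if record["amount"] >= 10:
        yield record["region"], record["amount"]


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregate all values for one key into summary statistics."""
    total = sum(values)
    return key, {"count": len(values), "total": total, "mean": total / len(values)}


pairs = (pair for record in records for pair in map_phase(record))
results = sorted(
    (reduce_phase(key, values) for key, values in shuffle(pairs).items()),
    key=lambda item: item[1]["total"],
    reverse=True,  # largest total first
)
for region, stats in results:
    print(region, stats)
```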
Horses for courses
If your data set is reasonably small and structured, and you wish to run scheduled and ad hoc queries, then an open-source relational database like MySQL running on a commodity server and connected to some sort of data warehouse may be all you need for affordable analytics and reporting.
But MySQL does not scale up well. For faster and more specialised analytics an enterprise will likely be looking at a proprietary specialised RDBMS (relational database management system) and data warehouse, such as an Oracle solution, running on a high-powered server – most likely with a SAN (storage area network) to take care of storage duties.
At the high end of the analytics spectrum things start to get seriously expensive.
“In the relational world, when you need real processing power you might go out and buy a big [Oracle] Exadata box for $10m,” Schireson told Computing.
“But in our world the way to get more power is just to buy more cheap commodity servers. One $10m server will typically have less processing power than a rack full of 50 cheap commodity servers that cost $5,000 each or $250,000 in total.”
Storage will generally cost less too, as the distributed system can make use of cheap direct-attached storage (DAS) rather than an expensive SAN, and Schireson insists that support costs come in at just 20 per cent of those for an Oracle solution.
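The hardware figures Schireson quotes can be checked with simple arithmetic. The sketch below uses only the numbers from his example ($10m for the appliance, 50 servers at $5,000 each); the variable names are illustrative, and the calculation deliberately ignores storage, support, power and staffing, for which the article gives no absolute figures.

```python
# Figures taken from the quoted example; variable names are illustrative.
appliance_cost = 10_000_000  # one high-end appliance, in dollars
server_cost = 5_000          # one commodity server, in dollars
server_count = 50            # servers in the rack described

cluster_cost = server_cost * server_count
cost_ratio = appliance_cost / cluster_cost
servers_for_appliance_price = appliance_cost // server_cost

print(f"Cluster cost: ${cluster_cost:,}")
print(f"Appliance costs {cost_ratio:.0f}x the cluster")
print(f"Servers purchasable for the appliance price: {servers_for_appliance_price:,}")
```

On these numbers alone the appliance costs 40 times as much as the rack, which is the gap the quote is pointing at.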
Big data = big opportunity for NoSQL
For processing and querying data sets characterised by huge volumes and enormous variability (a.k.a. big data), running NoSQL on clusters of cheap servers becomes an increasingly attractive alternative to a large SQL appliance for high-powered analytics, especially when it is used in conjunction with MapReduce technology to crunch the data. NoSQL databases are generally distributed either pre-integrated with Hadoop or with built-in support for it.
“If you’re storing data in a relational database and you want to run it through Hadoop, you need to take the data out of the database, put it into HDFS [Hadoop Distributed File System], do the analytics in Hadoop, take the result of that and put it back into the database. With Mongo you can do those operations in real time while it’s still in the operational database. You can also mix and match database-style queries with Hadoop-style MapReduce analytics,” said Schireson.