Hadoop has been much in the news of late with the multi-million dollar investments in Hadoop distributors Cloudera and Hortonworks. Taken together with analyst predictions that the market for the open source big data platform will be worth tens of billions in a few years, these deals add to the mounting evidence that it is now being taken very seriously indeed.
Often mentioned in the same breath as Cloudera and Hortonworks, MapR has nevertheless been the subject of fewer headlines recently. The company's low profile is remarked upon by analyst Forrester in its report The Forrester Wave: Big Data Hadoop Solutions, Q1 2014. While scoring MapR highly for innovation and its current offering, Forrester said the firm lags behind the other independent vendors in terms of recognition.
"We've been in stealth mode," said CMO Jack Norris half-jokingly. "The objective is not to see who raises the most money, it's to see who drives the most value to customers."
While MapR may have no fresh tranches of venture funding to announce (the last reported round was $30m in March 2013), the company does have other results to trumpet. The HP Vertica analytics platform is now integrated into the top-of-the-range MapR M7 distribution and the firm has expanded into 10 countries.
Bookings also tripled in the year to Q1 2014, with 90 per cent of sales coming from subscription-based product licences.
"Enterprises are using us for substantial applications. There now are seven or eight industry sectors in which we have at least one customer spending a million dollars," Norris told Computing.
To many people Hadoop conjures a picture of something that is big and powerful but hard to master - rather like an elephant in fact - but Norris is keen to put a different image in people's minds. The real power of Hadoop, he says, is simplification: the potential to reduce the number of processes and operations needed to obtain the sort of results that can really make a difference.
"This is the biggest paradigm shift in enterprise computing in my lifetime," he said. "We take for granted that there's a separate storage network from compute; we take for granted that you have an analytic environment that's separate from the production environment, but it wasn't always so."
The historical separation of functions came about, Norris says, because analytical processes were putting a drag on the production systems. They were therefore hived off into a different workflow, with a whole new set of processes required to store, extract and move the data between data warehouses and analytics tools. But Hadoop is changing all that, allowing data to be efficiently stored, processed and analysed on the same platform and at scale.
"Hadoop doesn't require you to separate. We've focussed on features that recognise that. This is the alternative to single-purpose analytical silos. And really it's about a cluster that can support the needs of a business flexibly. We've got features like volumes [logical units that allow you to apply policies to a set of files, directories or sub-volumes] so you can logically separate out data."
Simplifying the Hadoop ecosystem is the goal, he said, hiding the nuts and bolts to produce an easy-to-use multipurpose platform.
"We don't want people to think about the technology, we want them to think about what they can use it for and the benefits they can achieve," Norris said, pitching MapR firmly into the enterprise camp and differentiating it from other distributions that might be more attractive to "tinkerers".
"In a big data environment you need the ability to scale and work independently and not require a centralised administrator or IT person to set and define things ahead of time otherwise you've got delays and dependencies that can impact on the result," he said.
MapR has drawn a certain amount of criticism from open source advocates for using its own proprietary direct access NFS file system in place of the standard Hadoop Distributed File System (HDFS). In this respect it is similar to Cloudera, which has a proprietary management system built into its distribution, but not to Hortonworks, which is 100 per cent open source. Norris says the criticism is unfair, that the company is involved in many open source projects (including Apache Drill, which he expects to be in production within 12 months), and that there is no danger of lock-in for customers.
"Our founder and CTO [M.C. Srivas] worked on Google Big Table and before that he was with Spinnaker [a NAS vendor]" he said. "We focused on the underlying data platform. Yes we did some changes there but we exposed the open APIs to make sure that data flows freely, and the application layer on top is all open source," he said, explaining that NFS speeds up the data access and transfer processes allowing streaming writes.
"The issue is data is growing so quickly that it's taking longer to move data across the network than to process it, and we're seeing some really streamlined approaches with the platform that delivers better results faster, cheaper and more flexibly than anything that's gone before," Norris concluded.