In a speech to the Royal Society on 9 November, Chancellor George Osborne insisted that big data analysis will be one of the cornerstones of the UK economy in the years to come.
Claiming that the UK is the world leader in the collection, tabulation and provision of data, maintaining extensive datasets in areas such as healthcare, demographics, environmental change and food, the Chancellor said: "The next generation of scientific discovery will be data-driven discovery, as previously unrecognised patterns are discovered by analysing massive data sets."
"Business will invest more as they see [the government] invest more in computational infrastructure to capture and analyse data flows released by the open data revolution."
One company that will have noted these words with interest is Californian software firm Cloudera, which has been in discussions recently with the UK government over how best to process these massive data sets.
Cloudera's CDH, now on version 4, is the firm's open-source Apache Hadoop distribution, which includes components that make Hadoop more user- and business-friendly, such as the column-oriented distributed database HBase to allow SQL-type queries, and other elements that provide workflow, security and integration with other systems.
As well as being free to download from the web, CDH is bundled into Oracle's high-end Big Data Appliance (BDA), demonstrating Cloudera's aim to cover all the bases when it comes to potential customers. With the same aim, the firm is putting considerable effort into its training and certification programme.
While some might view Hadoop as experimental and thus something of a risk, COO Kirk Dunn insists that the barriers to entry are much lower than many assume, and that anyone with SQL and basic Java skills could – and should – make a start with CDH.
"We're a next generation data management platform." says Dunn. "CDH is very lightweight and inexpensive to start, yet the return is customer intimacy."
Dunn believes that if technical people start thinking in terms of data rather than IT, the business case for giving Hadoop a try becomes much easier to make.
"There's not a company on this Earth that wouldn't want the sort of relationship with their customers that Facebook has. Where are you going to learn more about your customers? From how you think about IT or how you think about data? The answer is obvious," he says.
"If you're unwilling to try new things, then innovators in your field are going to run past you."
Mix and match
The promise held out by MapReduce technologies like Hadoop is that when all data is treated as organic and tipped wholesale into a big mixing pot, relationships will emerge that would never surface were the data artificially forced into a fixed schema.
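To make the mixing-pot idea concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps in plain Python. It is an illustration only, not Hadoop or CDH code: the sample records, the field names in ID_FIELDS and the function names are all hypothetical. Records from three unrelated sources are processed without first being forced into one schema, and the reduce step shows which sources mention each person.

```python
from collections import defaultdict

# Hypothetical records from three unrelated sources; no shared schema is imposed.
RECORDS = [
    {"source": "weblog", "user": "alice", "page": "/returns"},
    {"source": "orders", "customer": "alice", "item": "kettle"},
    {"source": "support", "from": "bob", "text": "late delivery"},
    {"source": "orders", "customer": "bob", "item": "toaster"},
    {"source": "weblog", "user": "alice", "page": "/kettles"},
]

# Fields that may identify a person in a given record (an assumption for this sketch).
ID_FIELDS = ("user", "customer", "from")


def map_record(record):
    """Map step: emit (person, source) for each identifying field present."""
    for field in ID_FIELDS:
        if field in record:
            yield record[field], record["source"]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_sources(person, sources):
    """Reduce step: count how many records from each source mention the person."""
    counts = defaultdict(int)
    for source in sources:
        counts[source] += 1
    return person, dict(counts)


def run(records):
    """Run map, shuffle and reduce over the records in a single process."""
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_sources(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(run(RECORDS))
```

In a real cluster the map and reduce functions would run in parallel across many machines, with the framework handling the shuffle. The sketch shows only the shape of the computation: each record is read as it comes, and the link between a customer's browsing, orders and support messages appears at reduce time.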