There are various definitions of what big data represents. The 3Vs (velocity, variety and volume) definition is the best known, but while it describes the nature of the data, it says little about why you should go to the trouble of processing it.
There is another more practical definition, which is very much tied in with the impending internet of things (IoT): turning operational technology into information technology.
Operational technology (OT) is defined by Gartner as "hardware and software that detects or causes a change through the direct monitoring and/or control of physical devices, processes and events". Examples include sensors, machine-to-machine buses, logging devices - and thermostats, such as the Nest smart thermostat, whose maker was recently acquired by Google.
The number of such devices and sensors is increasing exponentially as everything becomes more connected, and increasingly data from OT (and other sources) is being mined in real time for insight.
"The most prevalent example today is that data from sensors in manufacturing equipment is becoming part of information technology," said Yves de Montcheuil, VP of marketing at data management company Talend.
"Historically, this data was never viewed as 'information' in the sense that IT means it. It was used by the operational engineers running the manufacturing chain to check the equipment was running properly, but it was analysed in a separate world from IT."
Organisations that generate a lot of data from OT are starting to see the benefits of pooling this data and analysing it in a wider context, he said.
"Let's say you're a just-in-time manufacturer. Being able to do capacity planning based not just on historical statistics but based on the instant availability of information about the health of the manufacturing chain is extremely valuable."
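The idea can be made concrete with a minimal sketch. All names and numbers below are hypothetical, not drawn from any real planning system: it simply scales a historical throughput rate by live equipment-health scores instead of relying on the long-run average alone.

```python
# Illustrative sketch (hypothetical names and figures): capacity planning
# that blends historical statistics with live equipment-health readings.

HISTORICAL_UNITS_PER_HOUR = 120  # long-run average rate for one production line

def effective_capacity(sensor_healths, planned_hours):
    """Estimate output by scaling the historical rate by live machine health.

    sensor_healths: 0.0-1.0 health scores, one per machine in the line.
    The line runs at the pace of its weakest machine.
    """
    if not sensor_healths:
        return 0.0
    bottleneck = min(sensor_healths)  # the weakest machine limits throughput
    return HISTORICAL_UNITS_PER_HOUR * bottleneck * planned_hours

# A degraded machine (health 0.6) cuts the 8-hour plan well below
# what the historical average alone would have promised.
print(effective_capacity([0.95, 0.6, 0.99], planned_hours=8))  # 576.0
```

A purely historical plan would have promised 960 units for the same shift; the live health data is what makes the lower, more realistic figure available in time to act on it.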
Another example in which OT can usefully be turned into IT is the airline industry, where the data used by maintenance engineers is typically not made available to the rest of the business.
"The agents at the gate have no clue what's happening with the aircraft. They know maintenance is going on but they have no other information to give to all those pissed off customers who are screaming 'when is my flight going to leave?'," said de Montcheuil.
But these are early days. While a few organisations have bridged the OT/IT gap, in the majority operational functions and IT are still separate worlds, exchanging figures and reports weekly or monthly rather than in real time.
Breaking down the walls between OT and IT is, at heart, an integration problem. It is an issue as old as IT itself, but addressing it with big data technologies represents a significant break with the past, de Montcheuil argues.
"Companies have been doing integration for decades. In the two industries I've mentioned [airlines and manufacturing] they might have been using something like Informatica or IBM Datastage, and in many cases that's been serving them well. But on the big data journey they shouldn't be using the tools and techniques of the past."
The landscape has changed. First, there are a lot more OT data sources coming online as the IoT takes hold, and a lot more public sources as well; second is the rise of open source integration technologies, particularly Hadoop; and third is the cloud.
"Big data is going to enable a lot of new bridges between systems," he said.
"With version 2.2 Hadoop is technologically mature enough to deliver actionable intelligence. Cloud is an enabler, it's quick and easy to deploy Hadoop clusters in the cloud, and it's also another source of data. You've got open data from governments, social data - which is more or less public - and private data from Bloomberg, weather stations and the like. These are all new types of data that can be very relevant to big data projects."
Perceptions of what big data technologies can do are also changing. In most organisations, early efforts focus simply on crunching more data for better analytics. Later in the journey, the possibilities of deploying these technologies to drive organisational change become of interest: feeding insights into day-to-day operations and automating improvements in processes.
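The shift described above, from batch analytics to insight feeding directly into operations, can be sketched in a few lines. The machine name, threshold and action below are hypothetical, purely for illustration:

```python
# Illustrative sketch (hypothetical threshold and action): a sensor reading
# that once surfaced only in a weekly report instead drives an immediate
# operational decision.

def review_reading(machine_id, temperature_c, limit_c=85.0):
    """Return an operational action for a single live sensor reading."""
    if temperature_c > limit_c:
        # Automated process adjustment, rather than a line in next week's report.
        return f"throttle {machine_id}"
    return "ok"

print(review_reading("press-7", 91.2))  # throttle press-7
print(review_reading("press-7", 70.0))  # ok
```

The analytics are trivial here; the point is where the output goes. Routing it into the operational loop, rather than into a monthly report, is what de Montcheuil means by driving change rather than merely measuring it.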
As open-source integration platforms like Hadoop become mainstream and computing resources become still cheaper, de Montcheuil believes, expensive proprietary enterprise tools will be left behind.
"They require that you use a proprietary integration engine, or they require you to deploy runtime on every node of the Hadoop cluster. Security is not integrated with Hadoop's security; they have their own monitoring deployment consoles that don't integrate with the Hadoop consoles. You could probably get it to work, but it's 'kludgey', it will be difficult to manage and extremely expensive."
Ultimately, he said, the flexibility and interoperability required to turn data into insights and insights into automation and organisational change could mean the end of the line for proprietary enterprise platforms.