Up and down the country, and probably across the world, new big data projects are starting up. Many focus on personal data that has been freely given, combined with information that has been bought in. Using big data analytic techniques, they are finding out more about their customers than was ever dreamed possible.
Off the record (perhaps with the aid of some alcoholic lubrication), the data scientists working on such projects will tell you much more. They will describe the many ways they are mining this data, the new insights into clients, potential customers and business operations that these projects are uncovering, and how those insights are being used.
However, the Information Commissioner's Office (ICO) has already sounded a warning: at the end of July, it released a 50-page report entitled "Big data and data protection", in which it warned companies that they must adhere to the Data Protection Act when conducting customer-focused big-data projects.
Big data and data protection
The ICO's concern is that, as customer-focused big-data projects become more widespread and pervasive, organisations will forget the rules governing data protection in their rush to learn more and more about their customers. Meanwhile, the maximum fines that the ICO can impose for non-compliance (if and when it finds out) are small compared with the potential business benefits.
"[Big data] is characterised by volume, variety and velocity of data, and by the use of algorithms, using 'all' the data and re-purposing data. The ICO is interested in big data as it can involve processing personal data," says the ICO.
It continues: "Many instances of big data analytics do not involve personal data at all. Using climate and weather data, for example, could enable new discoveries and improved services without using personal data.
"However, there are many examples of big data analytics that do involve processing personal data, from sources such as social media, loyalty cards and sensors in clinical trials. Where personal data is being used, organisations must ensure they are complying with their obligations under the Data Protection Act," warns the report.
The Data Protection Act, whose principles are broadly mirrored in data protection law across Europe, applies in a number of ways, says the ICO.
One key data protection requirement is to ensure that the processing of personal data is "fair". This is especially important where that processing will be used to make decisions that could affect individuals.
"Fairness is partly about how personal data is obtained," explains the report. "Organisations need to be transparent when they collect data, and explaining how it will be used is an important element in complying with data protection principles. The complexity of big data analytics is not an excuse for failing to obtain consent where it is required."
This creates an emerging grey area over the use of multiple data sets for big data analytics where personal data is involved. For example, an organisation might have legacy customer data in a customer relationship management (CRM) system, which it is now combining with data gleaned from its e-commerce website, as well as data, both online and offline, acquired from third parties.
Yet, as far as the ICO is concerned, this may not be compliant with the Data Protection Act. Indeed, the report makes it quite clear that, in many cases, it probably isn't:
"Big data analytics can involve re-purposing personal data. If an organisation has collected personal data for one purpose and then decides to start analysing it for completely different purposes (or to make it available for others to do so) then it needs to make its users aware of this."
Theory and practice
In theory, big data techniques ought to enable organisations to better serve their customers. In practice, though, it's a little different, says data scientist Andrew Maclaren, managing director of consultancy Brilliant Data.
"Organisations are massaging customer data to try and create customer wants and needs," says Maclaren, citing a contract with an agency that involved taking web logs of customer activity.
"It did demographic research on its customers to find out what they were searching for, how long they were taking, who was searching and why," says Maclaren.
That research found correlations between various factors. For example, between the hours of 1pm and 3pm, women aged roughly 28 to 38 frequently looked at new cars. Drilling down, it found that in four-fifths of cases those women also had children and were therefore most interested in family cars with a good reputation for safety.
In response, the agency pushed that information to its car-making clients, advising them how they could identify users on their websites and push offers appropriate to the needs that Maclaren's research had identified. "That kind of analysis worked across industry, although it was mostly a B2C thing," says Maclaren.
The example demonstrates how big data analysis can be used to determine likely customer needs and serve them appropriate offers. However, many users might also be surprised at the ease with which they can be identified on websites with the aid of data-aggregating third parties.
In the US, ethical questions have increasingly been raised over the activities of "data brokers", commercial organisations that keep ever-expanding databases about people's buying, web browsing and other habits for sale to all-comers.
Acxiom, for example, claims to have files on 10 per cent of the world's population, with about 1,500 pieces of information per consumer - at the moment.
Data brokers generally sell their files on individuals in the form of lists. According to testimony before the US Congress, these have included lists of rape victims, pensioners with dementia, "financially vulnerable" people, people with specific medical conditions and the medication they are taking, and even police officers, complete with their home addresses.
"The advertising community has been woefully unforthcoming about how much data they're collecting and what they're doing with it. And it's going to backfire on them, just as the Snowden revelations backfired on the National Security Agency [NSA]," high-profile angel investor Esther Dyson told Adobe's CMO.com website. She continues: "Ethics don't change. Circumstances change, but the same standards apply."
In response to such concerns, some organisations have started appointing chief privacy officers or chief data officers, although it is unclear how much real power they actually wield.
Martin Houghton, managing partner at HP - and a former chief data officer at rival services firm CSC - believes that the criticisms of big data have been largely over-done.
"Consent is the key to getting it right," says Houghton. Banks today, for example, will request permission from customers to be able to use publicly available data about them. "That consent is absolutely vital," he says. With people increasingly sensitive to privacy issues, "companies have got to weigh up how they are perceived from a marketing perspective by their customers", he adds.
Big data investments
Whether or not organisations are becoming more ethical in their use of data, one of the key investors in big data technology is the CIA, the US intelligence-gathering agency, via its "not for profit venture capital firm", In-Q-Tel.
It unashamedly describes its aim as helping to fund the development of technology "with the sole purpose of delivering these cutting-edge technologies to intelligence-community end users quickly and efficiently".
In recent years, it has invested in RedOwl Analytics, "a company that applies statistics to the ever-growing corporate digital trail to examine organisational dynamics in support of governance, risk, and compliance"; Paxata, a developer of a data preparation platform "that lets business analysts rapidly connect, explore, transform and combine data"; and Narrative Science, a company that claims to be "a leader in automated business analytics and natural language communication technology".
While the CIA is a separate entity from the National Security Agency (NSA), the leaks by former NSA contractor Edward Snowden have demonstrated the extent to which such organisations are prepared to use technology to spy on friends, enemies and citizens alike. The emergence of big data technologies perhaps explains their apparent desire to monitor communications globally.
Furthermore, such agencies are typically among the first customers of new, privacy-busting technology, regardless of how ethical the companies producing it endeavour to be. "Let's face it," says one data scientist, speaking off the record, "how ethical can you be when your first clients are the US Army and then Google?"
What is big data?
Definitions of big data vary. However, such organisations as McKinsey and Gartner and even the ICO have devised definitions that summarise the technology and how it ought to work:
"Big data and data protection" - Information Commissioner's Office