Computing research: The power of data science

Stand aside BI Man. Extracting meaning and value from today's unstructured business information is a job for the Data Scientist

In the beginning there was BI

A big beast in the IT jungle, business intelligence (BI) evolved to fill a niche: meeting the decision-making needs of large and complex organisations, bringing together disparate silos of data, processing them and pushing relevant information out to where it was needed.

For many years BI specialists had no major competitors in a corporate environment that valued their expertise. Commanding as they did a fearsome array of tools from IBM, Microsoft and Oracle, among others, they enjoyed a symbiotic relationship with the board, feeding strategists the information they needed for planning and being granted a favoured status in return.

Then things started to change. Data got big, new technologies arrived and the board's demands for timely analytics grew. Organisations of all types began to realise that extracting value from the torrents of data streaming in from their websites and other sources every second of every day could give them a real competitive advantage.

BI now found its position challenged. It was forced to adapt to the changed conditions, in particular the need to make sense of very large and rapidly changing sets of data not easily marshalled into the familiar tabular formats that BI systems require. As a result of this adaptation a new species emerged: the data scientist.

Hey, can I get your number?

Suddenly, data scientists are hot property. This rapid increase in sexiness might come as a surprise to even the data scientists themselves, although as analytical types with a broad overview, a talent for spotting trends and a firm grasp of numbers, there is no logical reason that it should.

In short, data scientists are in demand because in organisations of every size and sector data is now a fundamental resource. At the same time, data volumes are increasing exponentially, and with organisations increasingly fighting over the same patch of ground, strategists need someone who can read the digital runes and provide them with evidence-based direction to help them get ahead.

A data scientist, then, is someone with the confidence and ability to speak truth to power in a language that power understands, while retaining a firm foothold in the IT bedrock, manipulating, processing and analysing data from multiple sources to come up with a coherent picture of the organisation and the environment in which it operates.

Lost in translation

Many BI implementations have historically failed to live up to their billing, partly because IT and the rest of the business have differing ideas of the requirements. Tools such as report design, ad hoc query and OLAP tools are regularly rolled out to business users who, as non-technicians, have neither the time nor the inclination to learn them. This is a sure-fire way of ensuring that business users will download data as a CSV file and manipulate it manually using a spreadsheet, thus recreating the very data silos, duplication and error-prone processing that the BI implementation was designed to eliminate.

Other problems occur when a BI system is built on a data warehouse that does not extend across the entire business, or where silos of data have been left out on a limb – perhaps owing to departmental politics – or where the business has bought an unsuitable BI solution without consulting IT properly.

These common shortcomings have one underlying cause: mutual misunderstanding between business and IT. It is in this gulf, which many might find uncomfortable, that the data scientist finds his or her niche, as an intermediary between the two parties.

Equal but different

So how does a data scientist differ from a BI specialist and will one species displace the other?

While BI specialists understand data and analytics, their focus tends to be on the technical aspects such as implementing the software and controlling how data is stored within the system. Their expertise may also revolve around a particular vendor. A data scientist is generally less concerned with the technical nuts and bolts and more interested in the analytical side, revealing the messages that lie hidden in the data and uncovering insights that can deliver immediate competitive advantage, for example through better targeted marketing campaigns, more intelligent customer service or decreased waste.

Communication is a core skill of the data scientist, bringing together the technical data crunchers and coders in IT with the business strategists in the boardroom – the majority of whom are likely to be non-technical. The data scientist is a key ally of the CIO, providing the information needed to make decisions based on evidence rather than instinct.

This, then, is a multi-faceted role. As well as BI skills such as SQL, data mining and OLAP, the data scientist needs to be comfortable with statistical techniques such as regression analysis, and – perhaps most importantly – presentation skills, such as public speaking and data visualisation. An understanding of the business is also crucial. BI specialists, then, have nothing to fear, and potentially much to gain: data science expands the role of BI rather than making it extinct.
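For readers who have not met regression analysis, the idea can be sketched in a few lines of Python. This is an illustration only: the function name `fit_line` and the spend and order figures are invented for the example, and a practising data scientist would more likely reach for a statistics package such as R.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Spread of x around its mean; zero means no line can be fitted.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    # How x and y move together around their means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical figures: weekly marketing spend against orders received.
spend = [1, 2, 3, 4, 5]
orders = [3, 5, 7, 9, 11]
slope, intercept = fit_line(spend, orders)
print(f"orders = {slope:.2f} * spend + {intercept:.2f}")
```

The slope is the figure the board is likely to care about: the estimated change in orders for each extra unit of spend.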

Big data driving the science

As well as the need for improved communication between the data crunchers and the business, big data analysis is the main driver for data science. Much has been written of late about big data; suffice it to say that it demands a skillset over and above traditional analytical skills using relational tools. The technologies involved include Hadoop and MapReduce, and NoSQL (for “not only SQL”) databases.
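The MapReduce pattern mentioned above can be sketched on a single machine in plain Python. This is a toy illustration of the map, shuffle and reduce steps, not Hadoop's own API; the function names and the sample weblog records are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (page, 1) pair for one weblog record."""
    page = record.split()[0]
    yield page, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a total."""
    return key, sum(values)


def run_job(records):
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical weblog records in the form "<page> <visitor>".
hits = run_job(["/home user1", "/basket user2", "/home user3"])
print(hits)
```

In a real cluster the map and reduce steps run in parallel across many machines, each working on its own slice of the data, which is what lets the approach cope with very large volumes.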

With the advent of big data, it is no surprise that some CIOs are looking at whether Hadoop and advanced analytics should replace BI and data warehousing, forcing the removal or retraining of their BI specialists in the process. However, despite its alluring simplicity, this would likely cause more problems than it solved.

In truth, most “big data”-type analyses will continue to be performed using an RDBMS, and a standard BI dashboard is a perfectly adequate window on the proceedings. It is only at the upper end of the volume and variety scales, where complex and processor-intensive “what if” queries are to be performed and where near-real-time results are essential, that Hadoop and similar systems really become necessary.

Make me a scientist!

“If you have the right people already, train them up”, said Head of the Computer Science Division at the University of Dundee, Professor Mark Whitehorn, at Computing’s recent Big Data Summit. “They have the existing [BI] skillset. Do they have the curiosity? Do they have the mathematics?”

The University of Dundee is the first in the UK to offer a master's degree in Data Science. The course dovetails with the university’s existing course in business intelligence, with modules covering subjects such as algorithm development, machine learning, technologies for handling big data and the analytical language R. It is available on a part-time basis and starts in 2013.

Professor Whitehorn recently told Computing that the “science” bit of data science is something of a misnomer. While mathematical knowledge and the use of statistical techniques are important in the stage of formally processing and analysing the data, simple curiosity is an essential part of the mix.

“Do you enjoy pushing buttons just to see what happens? If you don’t, you’re not going to be a good data scientist,” he said.

A good data scientist, then, is likely to be more of a generalist (albeit one with good mathematical skills) than a specialist, with a wide range of interests, enabling them to communicate across the typical departmental silos that exist in most organisations. Naturally inquisitive, they enjoy finding hidden trends and patterns that others might miss, joining the dots to add to the big picture. As natural polymaths they might feel equally at home in marketing or finance as in IT.