Research: Skilling up for big data

By John Leonard
16 Apr 2014

In 10 years’ time will big data analytics have replaced traditional relational business intelligence systems?

That was one of the questions that Computing asked UK IT decision-makers in the quantitative survey that formed part of our recent big data research programme. The answer to that question was a pretty decisive “No”, with 68 per cent of respondents saying there will always be room for both technologies, while four per cent dismissed big data altogether, saying it’s a flash in the pan.

Only eight per cent agreed that big data will eventually replace traditional relational business intelligence systems, with the remaining 20 per cent arguing that big data technologies will always be too complex for most organisations. 

Big data technologies are rarely seen as rip-and-replace solutions. Rather, they work alongside existing systems, finding niches where their ability to process voluminous or diverse data sets makes them a better fit than more traditional enterprise software.

But while big data technologies might help to bring a wider range of data under the analytics umbrella, this is of little use unless the skills and the will are there to make use of the information.

“BI is still hugely underutilised in its more traditional form,” commented one respondent, and this undoubtedly is true.

Many enterprises struggle as it is to get their traditional BI systems to report on the fabled “single version of the truth”, hampered by incomplete coverage in their underlying data warehouse systems, data sitting unanalysed in silos, and different premises or departments operating their own discrete systems.

It is also the case that many people don’t know what to do with the information revealed by their BI systems, with reports languishing unread in desk drawers, or ignored when the truth becomes inconvenient or impolitic. Adding new data sources where such a culture exists is unlikely to produce much growth in insight.

In short, unless an organisation is committed to being “data-led”, with people, skills and roadmap in place, there’s not much that new technology can do to help it.

The skills issue

Many organisations are very interested in becoming more “data-led”, however, and traditional BI skills will be very useful in the brave new big data world.

As we saw in part one, creating a multi-disciplinary team with responsibility for big data initiatives across the organisation will often yield the best results.

A pivotal role in this team is the data scientist, who devises the questions to ask of the data, analyses the results for meaning and provides a focus for the endeavour.

Data science is a relatively new discipline requiring a mixture of statistics, programming, business and presentation skills, and data scientists are notoriously hard to recruit; vacancies will often be filled internally by training and promoting existing data analysts. But there are other skills gaps too (figure 1).

[Figure 1: big data skills gaps]

More advanced users (those using big data technologies operationally or in a trial environment) are likely to be using or trialling in-memory databases, yet this is precisely where in-house skills are most lacking.

There is less of a skills gap with Hadoop among this advanced group. Hadoop is emerging as a de facto standard platform for storing and processing large and varied data sets and no doubt many will have been focusing on strengthening their skills in this area; likewise with NoSQL databases, most of which can work in conjunction with Hadoop, or are distributed as part of a Hadoop-based package.
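
For those encountering the platform for the first time, the canonical introductory Hadoop job is a word count run through Hadoop Streaming, which lets the mapper and reducer be written in any language that reads stdin and writes stdout. A minimal sketch in Python follows; the file names are illustrative and a working Hadoop installation is assumed:

    # mapper.py -- emit (word, 1) for every word read from stdin
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print("%s\t1" % word.lower())

    # reducer.py -- sum the counts for each word. Hadoop sorts the
    # mapper output by key, so identical words arrive consecutively.
    import sys

    current_word, count = None, 0
    for line in sys.stdin:
        word, n = line.rstrip("\n").split("\t", 1)
        if word != current_word:
            if current_word is not None:
                print("%s\t%d" % (current_word, count))
            current_word, count = word, 0
        count += int(n)
    if current_word is not None:
        print("%s\t%d" % (current_word, count))

The job would then be launched via the streaming jar shipped with the distribution, pointing its -mapper and -reducer options at the two scripts; the exact invocation varies by distribution and version.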

Among less advanced users (those who have not yet performed serious trials), Hadoop skills were much thinner on the ground: only four per cent said they could count on these skills among their in-house staff. In other areas too, skills lagged some distance behind ambitions. Twenty-four per cent would use NoSQL, but only 14 per cent had the skills.

The ability to integrate diverse data formats and to ensure data quality is fundamental to all big data endeavours if the scenario of “rubbish in at scale, rubbish out at scale” is to be avoided. Here at least it seems that most organisations are confident in the ability of their staff to deliver.
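
As a simple illustration of what such a quality gate might look like, the sketch below screens incoming records before they are loaded for analysis. The field names and rules are hypothetical, not taken from the research:

    # Hypothetical quality gate: field names and rules are illustrative.
    def validate(record):
        """Return a list of problems found in a single record."""
        problems = []
        if not record.get("customer_id"):
            problems.append("missing customer_id")
        try:
            if float(record.get("amount", "")) < 0:
                problems.append("negative amount")
        except ValueError:
            problems.append("amount not numeric")
        return problems

    records = [
        {"customer_id": "C001", "amount": "19.99"},
        {"customer_id": "", "amount": "oops"},
    ]
    clean = [r for r in records if not validate(r)]
    rejected = [r for r in records if validate(r)]
    print("%d clean, %d rejected" % (len(clean), len(rejected)))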

For those lacking in-house skills or resources, cloud-based big-data-as-a-service offerings are an obvious option. Twenty-nine per cent of more advanced users said they have adopted cloud-based services somewhere in their set-up. Hadoop clusters, NoSQL databases and analytics tools are all available on a per-use basis, which can simplify operations and provide a cost-effective route to proof-of-concept work.

“One of the traditional barriers to a reasonably small NGO like us would be cost,” a CIO at a charity told Computing during a focus group discussion.

“Petabytes’ worth of storage just wasn’t going to happen, whereas now on Amazon you spin up tonnes and tonnes of storage for as many datasets as you want for peanuts. Plus it’s that ability to do it: it’s using the classic agile methodology, just doing something really quickly, see if it works, if it doesn’t you’ve not lost anything.”
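
The throwaway experiment the CIO describes can indeed be reduced to a few lines. A sketch using Amazon’s boto3 library; the bucket and file names are hypothetical, and valid AWS credentials are assumed:

    # Hypothetical sketch: spin up scratch storage, load a dataset,
    # and tear it all down once the experiment is over.
    # Assumes AWS credentials are configured and boto3 is installed.
    import boto3

    s3 = boto3.client("s3")
    bucket = "example-bigdata-scratch"  # illustrative; names must be globally unique

    s3.create_bucket(Bucket=bucket)  # outside us-east-1, a LocationConstraint is also required
    s3.upload_file("dataset.csv", bucket, "dataset.csv")

    # ... run the experiment against the data ...

    s3.delete_object(Bucket=bucket, Key="dataset.csv")
    s3.delete_bucket(Bucket=bucket)  # if it didn't work, you've not lost anything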

However, for some workloads cloud is unlikely ever to be a serious option.

“We are very limited in terms of using cloud services. We need very secure and identified locations,” the CIO at a legal firm said.

Opening up to open source

Like cloud, open source was seen as a valuable tool for lowering one of the main barriers to entry – cost (figure 2). Many big data start-ups are built on offering open-source technology, and while it is certainly not cost-free over the long term, many felt that the fact that such software is free to download and run was a significant advantage, especially for smaller organisations.

[Figure 2: barriers to entry]

“There is no way that we could go out and buy a big IBM or big Oracle solution, that’s just out of our range,” said the head of IT at an online business services company.

It was generally felt by companies of all sizes that the open-source model has matured significantly over the past few years, building on the success of Linux, the Apache web server and other widely adopted software, making it much more acceptable to budget holders. Developers, too, are increasingly likely to have cut their teeth on open-source tools.

“There is much more acceptance among analysts to look at open-source tools, also our skill sets are coming from university graduates and they are more comfortable using open-source tools. This is important from an innovation perspective as you get them [running] very quickly to test and learn. You can then migrate them to more robust enterprise technologies if needed,” a big data lead at a marketing firm explained.

Open source or proprietary, big data technologies are seen as increasing the size of the whole cake rather than gobbling up individual slices such as traditional relational BI.

@_JohnLeonard
