Bridging the data science skills gap
How serious is the data science skills gap now, and what can CIOs do to overcome it?
It's no secret that the data science skills gap is still huge.
The 2011 McKinsey report, which forecast a shortage of 140,000 to 190,000 data scientists in the US alone by 2018, still holds. Meanwhile, Computing Research shows that in the UK there is still a chasm between the big data technologies companies are using or planning to use and the skills they actually have on board to harness them.
While 26 per cent of companies are using or planning to use cloud-based big data services, only 13 per cent feel they have the skills. Similarly, 25 per cent of companies are using, or want to use, NoSQL databases, but only 15 per cent feel they have the necessary expertise to get the best out of them.
In-memory databases and next-generation data warehouses or analytics databases are also at risk of being under-exploited, with only 11 per cent of companies believing they have the talent for the former, and 13 per cent for the latter.
Non-Hadoop big data platforms, meanwhile, are being used, or are under consideration, by 12 per cent of companies, yet only six per cent feel they have the skills to use them.
Despite these findings, vendors seem to disagree with the notion that there is a skills gap at all. Rackspace, for example, recently told Computing that it believed the tools are easier to use than companies might think.
"The demand for 10,000 data scientists [a figure reflecting the current 2014 shortage] may be a little aspirational for some companies who have goals to do this great, big big-data strategy," said Rackspace's head of technical product strategy, Toby Owen.
"But I'd agree that the basic skillsets are there, the tooling is getting better to use, and the toolsets being made are clearer, such as BI tools."
Owen said that big data platforms were becoming easier for non-specialist end users to operate, giving MongoDB as an example. Since 2007, it's certainly leapt in use and popularity, and now rivals some of the big guns.
"But how many people have become experts in it in that time?" he asked.
Chris Harris, technical director at Hadoop developer Hortonworks, even cheekily suggested that companies could sidestep the skills gap by taking something like Microsoft Excel and "stick[ing] it on top of Hadoop".
"The majority of people out there know how to use Excel, right?" he asked.
But who better than the venerable initiator of the Tesco Clubcard to disagree? Dunnhumby was formed in 1989 and was doing big data and data science before the terms existed.
And that company's CTO, Yael Cosset, told Computing back in October 2014 that the data science skills gap is so serious that dunnhumby is training two different types of data scientist to take it on and beat it.
"I'd say there are two key areas our science teams focus on. One is around the pure, academic - which I think is perhaps the wrong term - but how to structure models and algorithms," said Cosset.
"The scientists are looking at leveraging new science techniques to find answers we haven't answered in the past."
But also, said Cosset, "once you create the best algorithm" there needs to be a place to "execute and apply that science".
While data science skills weren't mentioned explicitly - the focus being more on cyber security - Computing was pleased to discover that the House of Lords Digital Skills Committee is at least now aware of more specific skills deficits across the UK.
"Clearly we have a gap at a very high level - the top geek level, if you like," committee chair Baroness Morgan told Computing earlier this week, giving the impression the Lords' digital skills report will recommend a greater focus on vocational courses.
"The system at the moment isn't set up to deliver the nimbleness, or range of courses," she said.
At the end of the day, perhaps companies shouldn't be afraid to jump into the science and analytics space, as opposed to holding back and worrying about skills. While matters can certainly never be as simple as "sticking Excel on top" of anything, getting your hands dirty can reap benefits.