Machine learning can replace data scientists, says CTO

Struggling to recruit a data scientist? Don't bother, just get yourself a machine learning platform, says Pearson CTO and COO

Data scientists are no longer needed to mine data for business value, and organisations should instead look to machine learning platforms.

That's the view of Albert Hitchcock, CTO and COO of publishing and education group Pearson.

"The days of writing to a Hadoop database and having a data scientist write algorithms are rapidly disappearing," said Hitchcock, speaking to Computing this week. "We just put the data into a machine learning platform, and it spots the patterns."

This could be good news for the many firms that are reportedly struggling to recruit data scientists, with many candidates being snapped up by giants such as Google and Amazon.

Pearson is in the middle of a large digital transformation project, which includes discarding the majority of the group's on-premises hardware and moving wholesale into the cloud. Part of the project is an ambitious drive to produce adaptive content on the web, so that users consume individually tailored information.

This relies on machine learning to search for and display the right content for each user.

"We're starting to metatag all the content, so it can be chopped up into component parts, and then be reassembled on the fly [for each user]," explained Hitchcock.

He likened it to the techniques used by firms such as Google in selecting which ad to display online, based on its understanding of each user.

"But our use case is much richer. We're changing the nature of the content, so we need deeper analytics, and we use Pearson's unique IP [intellectual property] to modify that learning content on the fly," he said.
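
As a rough illustration of the metatag-and-reassemble idea Hitchcock describes, the Python sketch below tags content fragments and picks the best-matching ones for a given user. It is not Pearson's implementation: the `Fragment` class, the `assemble` function, the sample tags and the simple tag-overlap score are all assumptions made for illustration, standing in for whatever machine learning model would do the ranking in practice.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Fragment:
    """One hypothetical chunk of content plus its metatags."""
    text: str
    tags: FrozenSet[str]


def assemble(fragments: List[Fragment], user_tags: FrozenSet[str], limit: int = 2) -> str:
    """Build a page from the fragments whose tags best match the user.

    Assumption: relevance is just the number of shared tags. A real
    system would presumably use a learned model here instead.
    """
    scored = [(len(f.tags & user_tags), i, f) for i, f in enumerate(fragments)]
    relevant = [item for item in scored if item[0] > 0]
    # Highest overlap first; original order breaks ties.
    relevant.sort(key=lambda item: (-item[0], item[1]))
    return "\n".join(f.text for _, _, f in relevant[:limit])


# Hypothetical content library and user profile.
library = [
    Fragment("Intro to fractions.", frozenset({"maths", "beginner"})),
    Fragment("Fractions worked example.", frozenset({"maths", "beginner", "example"})),
    Fragment("Advanced calculus proof.", frozenset({"maths", "advanced"})),
    Fragment("History of Rome.", frozenset({"history"})),
]
page = assemble(library, frozenset({"maths", "beginner", "example"}))
print(page)
```

Here the two fragments sharing the most tags with the user are stitched into one page, while unrelated material such as the history fragment is left out.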

"I think we're seeing some really interesting things in machine learning. The platforms are teaching themselves how to come up with suggestions. It's exciting and frightening."

When asked which machine learning platform the group would select, he said it would likely choose several.

"We will probably in the future end up plugged into multiple machine learning environments, and we'll just choose whichever result is the best [in each case]. So we'll consume various machine learning services, and choose the most accurate answer based on real-time feedback," he said.
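
Hitchcock does not say how the group would choose between services, but one simple way to picture "choose the most accurate answer based on real-time feedback" is to keep a running accuracy score per service and route each request to the current leader. The Python sketch below does that. The `ServiceSelector` class, the service names and the lambda functions standing in for remote machine learning services are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class ServiceStats:
    """Running feedback tally for one hypothetical ML service."""
    successes: int = 0
    trials: int = 0

    @property
    def accuracy(self) -> float:
        # Untried services score an optimistic 1.0 so each gets sampled.
        return self.successes / self.trials if self.trials else 1.0


class ServiceSelector:
    """Routes queries to whichever service has the best observed accuracy."""

    def __init__(self, services: Dict[str, Callable[[str], str]]):
        self.services = services
        self.stats = {name: ServiceStats() for name in services}

    def best_service(self) -> str:
        # Ties go to the service registered first.
        return max(self.stats, key=lambda name: self.stats[name].accuracy)

    def predict(self, query: str) -> Tuple[str, str]:
        name = self.best_service()
        return name, self.services[name](query)

    def record_feedback(self, name: str, correct: bool) -> None:
        stats = self.stats[name]
        stats.trials += 1
        stats.successes += int(correct)


# Stand-ins for remote services (assumed for illustration only).
selector = ServiceSelector({
    "service_a": lambda q: f"A:{q}",
    "service_b": lambda q: f"B:{q}",
})
first_name, first_answer = selector.predict("algebra")
selector.record_feedback(first_name, correct=False)  # user feedback arrives
second_name, second_answer = selector.predict("algebra")
```

After the first answer is marked wrong, the next request goes to the other service. This version is purely greedy; a production system would likely add some deliberate exploration so that a service is not written off after a few bad results.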