Splunk looks to users to make and share machine learning apps

Developing machine learning tools in Splunk? Share and share alike, urges CEO Doug Merritt

Splunk is aiming to fill its Splunkbase app store with machine learning tools and algorithms developed by users after launching Splunk Enterprise 6.5, which incorporates base machine learning technology from its acquisitions of Caspida and Metafor.

While the updated versions of Splunk Enterprise and IT Service Intelligence (ITSI) will offer a number of basic machine learning capabilities, the company expects its users to develop many more - and to share them with other users - using the Machine Learning Toolkit included with its latest software.

"We can be operated against with many different machine learning formats," Splunk CEO Doug Merritt told Computing.

"To make it easier for our customers, we are including a machine learning toolkit, a framework with Splunk Enterprise 6.5 that allows customers to program algorithms that can operate natively against the Splunk data in 6.5 framework so that they can leverage machine learning for the things they want to do: statistical analysis, outlier detection, predictive analytics, forecasting, the classic machine learning use-cases.

"Our idea with the toolkit is that it is a framework so that the world can provide algorithms and donate algorithms. We stage it with some algorithms as well, but it's much more of an open source [type of] approach. While an algorithm is interesting, it really needs a specificity of the company, the vertical, the function and the use case to bring it to life.

"We can only imagine within Splunk probably one per cent of all the different algorithms that it could be possible to run against big data, so we provide a mechanism for the world to do that," he said.

Merritt went on to compare Splunk's approach to machine learning to the approach taken by cloud CRM company Salesforce.com with Einstein, which is based on a number of disparate acquisitions that Salesforce has made in a bid to get into machine learning.

These acquisitions include PredictionIO, an open source machine learning platform, which Salesforce donated to the Apache Software Foundation.

"Salesforce has taken a different approach. It's got this huge body of existing users and applications and they are trying to use machine learning and AI in order to 'surface' the insights that are already trapped within their applications," said Merritt.

"As a tight partner of Salesforce and an active user, I cannot wait to see the benefits because there's so much gold: How many of my reps are using the system? What makes a rep successful that's using the system in a certain way? Are there customers in that body of data that we should pay more attention to, that cost all of us time and attention to surface today?

"Given that Salesforce has all of that data as a software-as-a-service company they should be doing that for us," he said.

Splunk's own acquisitions - of Caspida and Metafor - only fill a couple of particular niche needs in machine learning, Merritt admitted.

"We bought a company called Metafor about a year ago. That was one of the leaders in using machine learning and algorithmic processing around the IT operations field, which is part of the magic that we are driving with ITSI right now.

"We also bought an awesome company called Caspida. That created what we call our user behaviour analysis (UBA) product set... It's a very different framework from the rest of Splunk. It is relying much more on data at rest, HDFS [Hadoop Distributed File System] clusters, long-running algorithms.

"[This is] because you've got 30-, 60-, 120-day data and you're actually traversing that data trying to find anomalies, which is different from a classic Splunk use-case, where you have got a sea of data coming in and you're dipping in with rapid queries that are more time oriented...

"So we will continue to mix-and-match different architectures and approaches, just like Salesforce and every other vendor is going to need to do," he said.

However, it's simplistic to talk purely of machine learning algorithms as if they could easily be ported from one company to another without any thought for the context in which they will be used, added chief technology officer Snehal Antani.

"When you think about machine learning, you've got two spectrums: One is a bunch of statistical packages that you can run against time-series data. So, things like threshold detection and outlier detection.

"For example, if you have five wind turbines and they're in a cluster, and one of them starts to behave differently, it's fallen out of cohesion, which is a way to understand that there's probably a maintenance problem.

"But that's just the application of statistical techniques. There's nothing domain specific to wind farms to be able to catch that problem. So the machine learning toolkit gives you a set of universally applicable statistical algorithms that you can apply to any set of problems. We expect to see customers use those packages to solve IT problems, IoT problems and so on.
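As a rough illustration of the kind of domain-agnostic statistical check Antani describes, the sketch below flags a turbine whose reading has drifted away from the rest of its cluster. It is not taken from Splunk's Machine Learning Toolkit: the function name, the turbine labels, the readings and the 3.5 cut-off are all assumptions made for the example, and the method (a modified z-score based on the median absolute deviation) is just one common choice for outlier detection.

```python
from statistics import median


def find_outlier_turbines(readings, threshold=3.5):
    """Return names of turbines whose reading sits far from the cluster median.

    `readings` maps a (hypothetical) turbine name to one numeric measurement,
    for example power output. `threshold` is an assumed cut-off on the
    modified z-score.
    """
    values = list(readings.values())
    center = median(values)
    # Median absolute deviation: a spread estimate that one bad turbine
    # cannot inflate the way it would inflate a standard deviation.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # Most turbines agree exactly; anything different stands out.
        return [name for name, v in readings.items() if v != center]
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # for normally distributed data.
    return [
        name
        for name, v in readings.items()
        if 0.6745 * abs(v - center) / mad > threshold
    ]


# Five turbines in one cluster; the figures are invented for illustration.
readings = {"T1": 14.9, "T2": 15.1, "T3": 15.0, "T4": 15.2, "T5": 9.8}
outliers = find_outlier_turbines(readings)
print(outliers)
```

Nothing in the function knows what a wind turbine is, which is the point being made: the same check could be run over server response times or sensor temperatures without modification.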

"But if you start to bring in domain expertise - when we acquired Caspida and their team they brought in a tremendous amount of security domain expertise - you can now apply that domain knowledge with the statistical analysis to catch deeper problems," said Antani.

"For instance, if two machines start speaking to one another that have never spoken to each other before, that's probably a compromised credential.... No analyst can look at dashboards and catch that problem because there's too much 'noise' and not enough 'signal'. But if you've got the domain expertise, you'll know that this is the leading indicator of a compromised credential," he said.
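The signal Antani describes, two machines talking that have never talked before, can be sketched as a comparison between a baseline of known host pairs and incoming connection events. This is a minimal illustration under stated assumptions, not how Splunk's UBA product works: the function name, the host names and the data format (simple source and destination pairs) are hypothetical.

```python
def first_time_pairs(baseline, events):
    """Return connections between hosts with no prior history of talking.

    `baseline` and `events` are iterables of (source, destination) host
    names. Direction is ignored, so A->B and B->A count as the same pair.
    """
    seen = {frozenset(pair) for pair in baseline}
    alerts = []
    for src, dst in events:
        pair = frozenset((src, dst))
        if pair not in seen:
            alerts.append((src, dst))
            # Remember the pair so repeat traffic does not alert again.
            seen.add(pair)
    return alerts


# Hypothetical history and new traffic, invented for illustration.
baseline = [("web01", "db01"), ("web02", "db01")]
events = [
    ("db01", "web01"),       # known pair, reversed direction
    ("web02", "hr-laptop"),  # never seen before
    ("web02", "hr-laptop"),  # repeat of the new pair
]
alerts = first_time_pairs(baseline, events)
print(alerts)
```

The code itself is trivial; the domain expertise lies in knowing that a first-time pairing is worth treating as a leading indicator, and in deciding how long a baseline window is needed before a pair counts as familiar.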

Interest in machine learning software, services and tools is expected to balloon over the next 10 years, with analyst firm Tractica predicting growth in the global market from $643.7m in 2016 to $38.8bn by 2025.