The world of big data is typified by youthful start-ups. Guavus, by comparison, is a positive greybeard.
Launched in 2006, the company focused until recently on the telecoms industry. Its streaming analytics technology crunches large volumes of machine-generated data to flag faults or anomalies in the network, then combines this root cause analysis with customer call records. This allows the telecoms provider to identify customers who might be affected by a fault and advise them of the issue – ideally before they have noticed that there is a problem.
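The idea of matching faults to customers can be illustrated with a minimal Python sketch. It is not Guavus's implementation: the field names `element_id` and `customer_id`, and both function names, are hypothetical, and it assumes each call record names the network element that served the customer.

```python
from collections import defaultdict


def index_customers(call_records):
    """Map each network element to the set of customers seen using it."""
    by_element = defaultdict(set)
    for record in call_records:
        by_element[record["element_id"]].add(record["customer_id"])
    return by_element


def affected_customers(fault_events, call_records):
    """Return the sorted customers served by any element that reported a fault."""
    by_element = index_customers(call_records)
    affected = set()
    for fault in fault_events:
        # Elements with no known customers contribute nothing.
        affected |= by_element.get(fault["element_id"], set())
    return sorted(affected)


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    records = [
        {"customer_id": "alice", "element_id": "cell-1"},
        {"customer_id": "bob", "element_id": "cell-2"},
        {"customer_id": "carol", "element_id": "cell-1"},
    ]
    faults = [{"element_id": "cell-1", "kind": "packet-loss"}]
    print(affected_customers(faults, records))  # ['alice', 'carol']
```

In a production system the lookup would run continuously against live streams rather than over in-memory lists, but the join between fault and customer is the same.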
Like many specialist big data software vendors, Guavus is a supporter of the open source model. The vast majority of its Reflex Operational Intelligence platform is now open source. However, as chief technologist Ben Parker told Computing, this was not always the case.
"Guavus has been around so long that there weren't a lot of open source resources around at the time we started, so in the early distributions we had a lot of wrappers and proprietary integration points and extensions," says Parker.
While some aspects of the company's software remain proprietary, Parker said that putting the bulk of the development out to the community reduces cost and complexity, and gives Guavus more time and resources to focus on the applications that sit on top of the platform.
"Open source really gives us the ability to focus on the core side of the business – the data science and solving real business problems. It makes things a lot easier to deal with because as you integrate more complex systems into the platform, it tends to get 'brittle', which makes putting applications on top of it more complex, requiring more testing, and so on.
"As the capabilities of the open source community evolve, it tends to reduce the degree of brittleness so that reduces the cost of testing and the time it takes to develop applications," he says.
Recently the company decided to throw its weight behind AMPLab, a collaborative research effort at the University of California, Berkeley, where open source big-data technologies such as Spark (an in-memory alternative to MapReduce) and Shark (a data warehouse that sits on Spark) are being developed.
Spark and Shark are making real-time processing of big data streams a more practical reality, bringing analytics to a whole new set of use cases. It is the ability to process big data streams on the fly that Parker says differentiates Guavus from others operating in a similar arena, such as Splunk.
"Splunk takes machine data and indexes it, then you can apply search strings to trawl through the data. What we do is apply algorithms in the streams in near-real time," Parker said, explaining how those algorithms enable the data streams to be pre-analysed without ever needing to be stored.
"We are able to compute first. Our collectors are quite intelligent, they discard the data that's not important so the interesting data can be sent up to the core for analysis. That means these massive data streams can be economically collected and analysed so we don't have to create this huge multitude of storage.
"We tend to focus on the high-value data, analyse that and store the output; then the business can make a decision on how long they want to store the raw components of that data, the un-analysed version."
Bringing together real-time analytics on streaming data and MapReduce-type batch operations on data at rest opens the door to many potential applications, Parker said.
"Real time is becoming easier to achieve, but being able to do real time and needing to do it are two very different things. What's interesting is that you can bring these things together and effectively analyse the same data in different ways at different times," Parker explained.
"You get that kind of integrated end-to-end tool set then you can build really interesting applications on top of it. That's where the real meat and potatoes are."
Guavus is now expanding from its telecoms-sector base to focus on anomaly analysis in complex data-centre operations, such as those of cloud service providers, as well as on use cases in the banking sector.
"We're now moving to the banking industry – you know, fraud, gaining a 360° view of the customer. The bank can identify the next best offer whenever a customer takes a product or service. This is an interesting extension of some of the things we've done in the telecoms space but applying it to a different business," Parker said.