Observability and applied intelligence are the key to better software, says New Relic

CTO Gregory Ouillon on speeding up incident detection and resolution

In DevOps discussions, observability has become a key theme, and for good reason, says Gregory Ouillon, CTO EMEA at New Relic. After all, you can't manage what you can't measure, and you can't measure what you can't see.

Moreover, the complexity of modern applications, the frequency of updates, the ephemeral and dispersed nature of the architecture on which they run and the broadening of the remit of DevOps engineers conspire to make incident management a much more demanding task.

"Keeping systems up and running at high performance is much more difficult, and the old monitoring techniques are becoming irrelevant," said Ouillon during a presentation at Computing's Deskflix: DevOps event last week.

Enter observability.

"Observability is a new set of technologies and platforms that allow you to understand much more holistically how your application and infrastructure behave," he explained. "It's much more proactive and predictive in its nature of telling you an incident might occur, and you have an opportunity to solve it before it actually impacts your customers."

Rather than teams and individuals each using their own tools to monitor selected metrics from the application and its environment, which inevitably leads to gaps and errors, the idea is to pull all the telemetry into one stream that is visible to all. But that doesn't solve the problem of information overload. Events and alerts still need to be triaged, and this is where automation comes in.

"The idea is to bring all that telemetry in real time, so that you can correlate all the data. You can curate the data in a way that will surface the most relevant issues."
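As a rough illustration of what correlating telemetry in one stream can mean in practice, the sketch below merges events from separate tools into a single time-ordered stream, groups events for the same service that occur close together, and surfaces only the groups confirmed by more than one telemetry source. The `Event` fields, the 30-second window and the two-source rule are assumptions made for this example, not a description of New Relic's platform.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds; any monotonic time unit works
    source: str       # illustrative labels such as "logs" or "metrics"
    service: str
    message: str


def merge_streams(*streams):
    """Merge per-tool event lists (each already time-sorted) into one stream."""
    return list(heapq.merge(*streams, key=lambda e: e.timestamp))


def correlate(events, window=30.0):
    """Group events for the same service that arrive within `window`
    seconds of the previous event for that service."""
    groups = []
    open_groups = {}  # service name -> most recent group for that service
    for event in events:
        group = open_groups.get(event.service)
        if group and event.timestamp - group[-1].timestamp <= window:
            group.append(event)
        else:
            group = [event]
            open_groups[event.service] = group
            groups.append(group)
    return groups


def surface(groups, min_sources=2):
    """Keep only groups seen by at least `min_sources` distinct sources."""
    return [g for g in groups if len({e.source for e in g}) >= min_sources]


logs = [
    Event(0.0, "logs", "checkout", "upstream timeout"),
    Event(100.0, "logs", "search", "slow query"),
]
metrics = [Event(5.0, "metrics", "checkout", "p99 latency high")]

merged = merge_streams(logs, metrics)
incidents = surface(correlate(merged))
```

Here the two checkout events, five seconds apart and reported by different tools, collapse into one surfaced group, while the isolated search event is held back as lower priority.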

AIOps stands for Artificial Intelligence for IT Operations, although Ouillon prefers the term Assisted Intelligence for DevOps, since human intervention can still be a significant part of the process. It is the use of machine learning to analyse all the telemetry data generated across the ecosystem in order to predict possible problems, determine root causes and then drive automation to fix them. The difference from the old way of monitoring is that it doesn't require predefined metrics or thresholds; instead, it adapts its response according to feedback. It can direct alerts to the most appropriate team and also respond proactively if required.
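To make the contrast with fixed, predefined thresholds concrete, here is a minimal sketch of a detector that learns its own baseline from the data and adjusts its sensitivity from operator feedback. It uses an exponentially weighted moving average as a simple stand-in for the machine learning the article refers to; the class name, parameters and feedback rule are all assumptions for illustration, not any vendor's actual method.

```python
import math


class AdaptiveDetector:
    """Flags values that deviate strongly from a learned running baseline."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # how quickly the baseline adapts
        self.threshold = threshold  # allowed deviation, in standard deviations
        self.warmup = warmup        # observations to see before flagging
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def observe(self, value):
        """Return True if `value` looks anomalous against the baseline."""
        self.count += 1
        if self.count == 1:
            self.mean = value
            return False
        deviation = value - self.mean
        std = math.sqrt(self.var)
        anomalous = (
            self.count > self.warmup
            and std > 0
            and abs(deviation) > self.threshold * std
        )
        if not anomalous:
            # Only normal points update the baseline, so an outage
            # does not get absorbed into what counts as "normal".
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return anomalous

    def feedback(self, was_real_incident):
        """Tighten after a confirmed incident, loosen after a false alarm."""
        self.threshold *= 0.9 if was_real_incident else 1.1


detector = AdaptiveDetector()
latencies_ms = [100, 101, 99, 100, 101, 99, 100, 101, 99, 100]
baseline_flags = [detector.observe(v) for v in latencies_ms]
spike_flag = detector.observe(150)
```

No threshold such as "alert above 120 ms" is configured anywhere: the steady readings establish the baseline, and the jump to 150 is flagged because it falls far outside the variation seen so far. Calling `detector.feedback(False)` after a false alarm widens the tolerance for subsequent readings.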

Summing up, Ouillon said: "We are dealing with architectures and systems which are more complex, more fragmented and volatile.

"So, the combination of real-time observability and applied intelligence really brings to teams what they need to deliver superior uptime and great performance, taking into account the end user experience as well. They can also apply the same technique to deploy software much faster."