DevOps: Engage engineers with tangible - and achievable - targets

IT leaders talk about what's been holding DevOps back, and what's been driving it, at Computing's latest Dining Club

Give engineers tangible but achievable targets to aim for, in order to help drive adoption of, and support for, DevOps. That was just one piece of advice that emerged from Computing's latest Dining Club, Human-Centric DevOps Transformation, hosted at the Shangri-La on the 34th floor of the Shard in London and sponsored by Automation Logic.

"You've got to keep your engineers engaged," one IT leader at a big-name dot-com advised, not just with targets, but "with live metrics so they can see the impact of what they do. Engineers want to be solving problems and the information will demonstrate either that they are working, or that they've got problems."

Hence, his company has plenty of dashboards offering engineers a real-time picture of how well their systems are doing - how quickly web pages are being served, for example, and other metrics that directly reflect the user experience. If any of these diverge from an acceptable range, then engineers will be diverted to find out what the problem is before the service gets any worse.
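The kind of check described here, where a user-facing metric is watched against an acceptable range, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the company's actual tooling: tracking the 95th-percentile page-serve time, the rolling window of 100 requests and the 0 to 250 ms range are all hypothetical choices.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptableRange:
    """Hypothetical bounds for one user-facing metric (illustrative values only)."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def nearest_rank_percentile(samples, percentile: int) -> float:
    """Nearest-rank percentile: smallest sample with at least `percentile`% of values at or below it."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    rank = math.ceil(percentile * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


class PageLatencyMonitor:
    """Keeps a rolling window of page-serve times (ms) and reports whether the p95 is in range."""

    def __init__(self, window_size: int, acceptable: AcceptableRange):
        self.samples = deque(maxlen=window_size)  # oldest samples drop off automatically
        self.acceptable = acceptable

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def p95(self) -> float:
        return nearest_rank_percentile(self.samples, 95)

    def status(self) -> str:
        # "investigate" is the signal to divert an engineer before the service degrades further
        return "ok" if self.acceptable.contains(self.p95()) else "investigate"


monitor = PageLatencyMonitor(window_size=100, acceptable=AcceptableRange(0, 250))
for ms in range(100, 200):
    monitor.record(ms)
print(monitor.p95(), monitor.status())  # 194 ok

for _ in range(100):
    monitor.record(400)  # a sustained slowdown fills the window
print(monitor.p95(), monitor.status())  # 400 investigate
```

A percentile is used rather than an average because a small share of very slow page loads can hide inside a healthy-looking mean, while still being exactly what some users experience.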

Kris Saxton, partner and co-founder of Automation Logic, pointed out that most organisations barely measure anything.

"Those that do are generally measuring what is easy to measure and not what is actually related to the outcome they want. In most cases, there are a very small number of metrics that really matter and most people aren't measuring those at all."

On top of that, nothing motivates engineers to stop problems cropping up quite like having to get up at 3am to reboot a server when they do. "You build it and you own it, even if you have to get out of bed at three in the morning to do it," he said.

"Trust your engineers, but make them responsible for the outcome," added the CISO at a major industrial company.

Or to put it another way, suggested Saxton, "trust but validate".

He continued: "DevOps, like Agile, is often accused of being devoid of governance. There's a huge responsibility on leadership to ensure that teams are moving towards the goal, to support and facilitate where that's not happening (rather than 'assure' and whip), and also to make decisions to 'kill' or 'pivot' when the data suggests your hypothesis is wrong."

While digital transformation and the need for organisations to provide services to customers online was driving most DevOps shifts, for some it was simply down to the decrepitude of existing systems and the urgent need to update them.

The IT leader at a major media company described a key management application written in COBOL, running for decades on an ICL mainframe. However, while the software reliably ran and ran, engineers were finding it increasingly challenging to source spare parts for the mainframe when components failed.

The company was finally persuaded to invest in a wholesale shift when engineers couldn't even find spare parts second-hand on eBay. The new system, though, offers greater flexibility, better integration with new systems, and faster, more accurate reports.

Very often, companies think they're doing DevOps, but find that they're missing the essence of it when something goes catastrophically wrong. That's according to the CIO at one company that adopted DevOps to support a rapid shift towards an 80 per cent online-ordering business model.

The company only found out about the 'cracks' between different teams following what is known internally as the 'St Valentine's Day massacre', when systems went down for two hours on one of its busiest days of the year. That prompted a re-examination of how the organisation 'did' DevOps, in order to learn from the failure.

Indeed, learning from failure was another issue that caused some disagreement around the table. On the one hand, logs of 'lessons learned' from failure are considered an essential part of the process. But the chief technology officer of a well-known financial services brand noted one case where an internal team had suffered the same failure over and over, with the lessons-learned report cut and pasted more than once from previous failures.

Clearly, lessons weren't being learnt.

The chief architect at another financial institution noted: "You've got to have people willing to fail and to fail often - but they've also got to learn from those failures."

In the case of the St Valentine's Day massacre, the post-mortem identified that the various parts of the IT organisation needed to understand how inter-dependent they had become and, therefore, to work together more closely.

And teams, and how they work together, are vitally important for DevOps.

"As we move to more loosely-coupled architectures, teams need to be aware of their inter-dependencies with other services and to be able to continue to function in the event that those services or not available or degraded," said Saxton.

The IT leader of the media company said that it had taken him the best part of a year to put together his team of around 20 engineers, but that he is now reaping the benefits. "Front loading that effort meant that the team now works well together and therefore sticks together," he said, adding that staff turnover was, consequently, exceptionally low.

He even faced questions internally about the slow pace of team-building in that first year, when he rejected people who looked like 'solid' candidates, despite strong recommendations from HR.

Saxton added: "All of the problems we solve are bigger than any one person, especially in enterprise settings. It's more important that an engineer can collaborate than try and put out the world's fires on their own."

And that, hopefully, means no alerts going off at 3am.