Clouds won't stop these trains - quite the opposite, says Trainline CTO Mark Holt

Moving to AWS has almost doubled developer resources at Trainline

Trainline describes itself as Europe's leading rail ticket vendor. Spun off from Virgin Group in 1997, the website began selling rail tickets online two years later, and by most accounts has gone from strength to strength in the time since - but it wasn't all smooth sailing. The company had to deal with 16 years' worth of legacy tech (think Windows Server 2003 and BizTalk) in its move to the Amazon public cloud in 2015 - but CTO Mark Holt, the man behind the move, doesn't regret a thing.

Holt is known as an AWS evangelist, often speaking about the speed, agility and robustness that Trainline gained in its transformation. As well as cost savings, he tells us that developer time spent on infrastructure issues has fallen from between 30 and 50 per cent to almost zero, freeing up a huge amount of resources. With less time spent on "just keeping the lights on," there is more time to spend on innovative projects like BusyBot and Trainline's voice app:

"When we were looking at cloud migration, we had three things that we wanted: one was that we would save £1.2 million a year capex and our opex would remain flat... The second was, in the old data centre environment, our developers would spend thirty to fifty per cent of their time on infrastructure issues… Can you think of many initiatives that would buy you thirty per cent additional productivity from a team? That's huge."

As part of the move to the cloud, Trainline pushed its services out of physical datacentres, and Holt couldn't be happier about that: "We used to have something like eighty or ninety different server roles in production, so different ways of configuring each box; we now have two: one called Linux and one called Windows, and everything's the same. But the data stuff, the ability to take on Lambdas and stuff like that, has been properly awesome."

He's referring to Trainline's third objective in the transition: taking advantage of Amazon's innovation curve. Holt freely acknowledges that Trainline, despite employing hundreds of staff, cannot match the web giant's budget: "They just innovate better and faster than anybody else," he says. "I think the greatest example of that is [former Netflix cloud architect] Adrian Cockcroft. Someone once said to him, 'Netflix and Amazon are competitors; why is it you're using AWS?' and he said, 'Why would I deny myself access to the best cloud platform out there, just because it's run by a competitor?'"

Trainline makes heavy use of Lambda, Athena, Kinesis and S3, freeing up even more time and resources. "[We use] a whole bunch of Amazon technology that, in the olden days, would have taken us twelve months to just get working from an infrastructure perspective," says Holt. "[Now] we click three buttons and up it comes and it's magic. That has been a really big win."

Agile isn't fragile

The cloud has often been hailed (or blamed) as the enabler of DevOps, and Trainline is no exception, adopting an A/B testing method to massively increase its software delivery speed. When Holt joined in April 2014, the company was pushing out monolithic releases at six-week intervals: eight releases each year. Development cycles overlapped, so the actual time to market for each release was three months. "Now, in our best week, we've done over two hundred and twenty releases in a single week," says Holt.

A/B testing means that changes are rolled out in a staggered fashion, and developers can revert them quickly. "It used to be that you dropped a release and everyone went, 'Oh my god, I hope it works!' and we'd all cross our fingers. There'd always be some bits that didn't quite work and you'd have two or three days of trying to figure it out. Now you have [a lot more] functionality."

Largely thanks to this testing method, Trainline has found that its systems have actually become more robust by becoming more agile, avoiding common concerns about increased fragility. Unscheduled downtime has fallen by about 60 per cent, even as the number of software releases has risen roughly 1,000-fold.

Trainline's software developers have found themselves with a lot more time on their hands since the cloud move, but they're keeping busy. In an upcoming article, we'll look at the rollout of Trainline's BusyBot tool and its growing use of big data. As Holt says, "We're really good at doing change - which is a much better place to be than being afraid of it."