DevOps at Channel 4: staying ahead of the on-demand curve

It's all go all the time for All 4's tech team, explains senior solutions architect Dan Jackson

All 4 is UK broadcaster Channel 4's video-on-demand service. It is in constant flux as the team moves to meet new expectations. It is currently built on a mostly microservices-style architecture, with back-end applications running almost entirely on AWS.

The service has to accommodate a number of different devices and formats. There are native Android and iOS apps, the All 4 website and 'big screen' platforms including tvOS, Samsung, Amazon Fire, Freeview, Roku, Xbox and PlayStation. The back-end is complex too, taking in content delivery networks (CDNs), message queues and serverless functionality. Between that complexity and the diversity of client software, the IT team has its hands full ensuring that All 4 delivers a consistent and reliable service.


"All of these applications are owned and operated by Channel 4 and our partners. Each one of those is a product that has its own dedicated team managing development," explained senior solutions architect Dan Jackson.

"Working at this scale, and with such a large number of moving parts, requires significant operational oversight," he went on. "Our team has limited resources, so we are always keen to adopt tools and processes that will help us be more efficient and give us greater visibility and insight into the health of our systems."

All about the UX

Media companies live or die on the quality of the user experience (UX), so it's vital to spot issues quickly and nip them in the bud. The piecemeal nature of the tech stack made this arduous: different services logged to different repositories, some to local disk and others to a home-built log aggregation system. This added management overhead and made it awkward to maintain consistent oversight of all the client applications.

"It was difficult to build proactive processes to alert us around issues that would affect the experience viewers were having, and then act on those production issues," Jackson said.

Channel 4's tech team launched a major initiative to improve the reliability and quality of the All 4 playback experience and bring the service in line with business and technical KPIs. This included replacing the disparate logging systems with a centralised solution, so that operational data from all the different platforms could be analysed in one place. The team chose Sumo Logic, citing the company's focus on customer experience and the fact that its "compliance and security ticked the right boxes with regards to our technical and compliance requirements around data security".

With the data ingested into the new system, the DevOps team has a clearer view of the types and volumes of errors, and of their impact on individual viewers' experience and on the business more broadly.

Asked where he looks for trustworthy information among the forest of opinion and marketing when selecting potential new tools and techniques, Jackson said he has some favourite Twitter accounts.

"There are a few individuals out there doing an amazing job of sharing valuable insights, and from whom I've learned a great deal," he said. "Charity Majors (@mipsytipsy) is a must-follow account if you're into the theory and practice of monitoring and observability, and Yan Cui (@theburningmonk) is another who is a fantastic resource for advice on how to deploy and monitor serverless components."

Operational ownership

The new monitoring system was not just a technical project. The wider initiative to improve quality also required changes in the team structure to "instil a greater sense of operational ownership within our development teams".

Over the next 18 months, the team will be containerising some of its remaining monolithic applications and breaking them into more manageable services. It is also exploring new data models, ways to rationalise how content metadata is delivered from systems of record through to viewers, and increased personalisation. Jackson says the latter "will come with its own set of new challenges" for the back-end crew, particularly around caching.


Asked whether relying on AWS was a problem (Amazon is no slouch at pushing streaming media services of its own, after all), Jackson said that so far the benefits of the platform had outweighed the competitive risks.

"As an organisation, Channel 4 takes a multi-cloud approach where we select what we feel to be the right provider for the requirements at hand. As yet we haven't found a compelling reason to migrate All 4 to a multi-cloud architecture, but we never say never!"