The implications of containerisation for bet365

Having been an early adopter of Kubernetes, bet365 is now exploring the implications of expanding the scope and scale of the platform, writes James Nightingale

When considering the aims of bet365's Platform Team, it really boils down to finding the answer to one simple question: How do we ensure that the teams behind bet365's products and services can perform their roles unhindered?

The answer, of course, is not quite so simple and has become less so as the company has grown in scale, scope and complexity. Bet365's platform and the applications that sit on it are a cornucopia of integrated technologies, both traditional and open source.

At the heart of the issue is the challenge of reducing friction between the demand for product hosting environments and the speed of their provision. As the arbiters of the infrastructure, we are very aware of the impact that every change can have on the integrity of the platform, so understandably we're protective of it.

However, as the product sets have become more complex, the challenge is that many teams now have a role in configuring these environments. This has blurred what was once a clear dividing line of responsibility. The result is that different teams can make changes without fully understanding their impact.

It's for this very reason that Kubernetes found its way onto our radar. It was its promise of immutability and the potential for a more predictable platform that made it attractive. If our hunch was correct, we could dial down the impact of change by, essentially, removing change from the equation.

We have come a long way since that first gut feeling. Having achieved buy-in from our development teams and the business, we've established our first production cluster and laid down the basic operational requirements that support it.

Having now lived with it for some time, we are exploring the implications and underlying principles of expanding the scale and scope of the platform.

Keeping it simple

The challenge with Kubernetes is that there's so much you can do with the technology, and we have explored a big chunk of it. However, having got a little lost in the art of the possible at the beginning of our journey, we've since moved to a mantra of 'keep it simple', an approach that is now paying dividends.

A key concept we've developed following this mantra is that of availability zones. These zones are limited to the key components needed to get a cluster up and running. There are no superfluous components, and no persistent storage.

While made up of individual elements, each zone is treated as a collective. Were we to manage each component independently, we'd end up with an entire matrix of differences between one part of the platform and another. This would defeat the object of our move to Kubernetes, re-introducing the same management complexities we were dealing with previously.

Instead, we have clusters that have everything they need to run and are subject to very little change. They are immutable.

Delegating control to development

In turn, this immutability delivers perhaps the most important benefit. By separating the platform from the applications that sit on it, we ensure our product teams can deploy their product on demand, while maintaining the security and stability of the platform.

We can now work together to set certain limits, such as the amount of compute given to each workload, but product-specific configuration is essentially held in the deployment. To perform an update, the container is simply replaced and the underlying platform remains the same.
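
To make that concrete, here is a minimal sketch of the general Kubernetes pattern described above, using the official Python client: agreed compute limits sit alongside the product-specific configuration in the deployment, and an update simply swaps the container image. The namespace, names and image tags are illustrative rather than our actual configuration.

```python
# A minimal sketch of the pattern described above, using the official Python client.
# The namespace, names and image tags are hypothetical examples.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside a cluster
apps = client.AppsV1Api()

labels = {"app": "sports-frontend"}  # hypothetical product label
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="sports-frontend", namespace="sports"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(containers=[
                client.V1Container(
                    name="frontend",
                    image="registry.example.com/sports-frontend:1.0.0",
                    # Limits agreed with the platform team; everything else is
                    # product-specific configuration held in the deployment.
                    resources=client.V1ResourceRequirements(
                        requests={"cpu": "250m", "memory": "256Mi"},
                        limits={"cpu": "500m", "memory": "512Mi"},
                    ),
                )
            ]),
        ),
    ),
)
apps.create_namespaced_deployment(namespace="sports", body=deployment)

# An update simply replaces the container image; the underlying platform is untouched.
apps.patch_namespaced_deployment(
    name="sports-frontend",
    namespace="sports",
    body={"spec": {"template": {"spec": {"containers": [
        {"name": "frontend", "image": "registry.example.com/sports-frontend:1.1.0"}
    ]}}}},
)
```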

Ultimately, we are leveraging a common language that both teams can understand. It enables a more transparent approach to product deployment, which means all necessary teams have a clear understanding of the potential impact of every change and the capacity needed.

In addition, we are writing our own version of Ingress to ensure we have visibility and control of traffic flows. This will give our product teams greater control of their release schedule and enable important actions, like scheduling releases for different markets.
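
For illustration only, the sketch below shows a standard networking.k8s.io/v1 Ingress created with the Python client, the kind of host-and-path-to-service routing that Ingress controls and that our own implementation builds on. The hostnames and service names are hypothetical.

```python
# Illustrative only: a standard Ingress resource mapping a host and path to a
# backing service. Hostnames, namespaces and service names are hypothetical.
from kubernetes import client, config

config.load_kube_config()
networking = client.NetworkingV1Api()

ingress = client.V1Ingress(
    metadata=client.V1ObjectMeta(name="sports-frontend", namespace="sports"),
    spec=client.V1IngressSpec(rules=[
        client.V1IngressRule(
            host="sports.example.com",
            http=client.V1HTTPIngressRuleValue(paths=[
                client.V1HTTPIngressPath(
                    path="/",
                    path_type="Prefix",
                    backend=client.V1IngressBackend(
                        service=client.V1IngressServiceBackend(
                            name="sports-frontend",
                            port=client.V1ServiceBackendPort(number=80),
                        )
                    ),
                )
            ]),
        )
    ]),
)
networking.create_namespaced_ingress(namespace="sports", body=ingress)
```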

Broadening use cases

Now that we believe we have the fundamentals sorted, we have started to look at further scaling the technology across other parts of our system.

We are currently working on the idea of using namespaces to help implement service groups of related applications, organised by how they are managed, deployed and routed.
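
Here is a minimal sketch of that idea, assuming the grouping is expressed as labels on the namespace itself; the group name and labels below are hypothetical.

```python
# A minimal sketch of grouping services with a labelled namespace.
# The namespace name and labels are hypothetical examples.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

namespace = client.V1Namespace(
    metadata=client.V1ObjectMeta(
        name="sports-trading",
        labels={
            # Hypothetical labels describing how this service group is
            # managed, deployed and routed.
            "team": "sports-trading",
            "deployment-cadence": "daily",
            "traffic-tier": "customer-facing",
        },
    )
)
core.create_namespace(body=namespace)

# Workloads deployed into this namespace then pick up the group's management,
# deployment and routing policies.
```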

Building on these core Kubernetes constructs, we're exploring how to use Ingress to map traffic to a particular set of applications. This will ensure the platform is flexible enough to manage constant change, while tempering the overall complexity.

Historically, we've managed platforms based on the infrastructure, which has its own terminology to describe it. The problem with maintaining this approach is that it's infrastructure-centric and can't easily be interpreted by product teams. Whatever approach we choose must be product-orientated.

Monitoring

At first, we implemented Prometheus for monitoring. While it worked well for the Infrastructure team, it was too opinionated for the product teams. It meant they had to write their applications in a specific manner to leverage the monitoring capability, which created unnecessary effort.
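
To illustrate what that 'specific manner' looks like in practice, here is a small, hypothetical example using the prometheus_client library: each application has to declare its own metrics and expose an endpoint for Prometheus to scrape, which is exactly the per-application effort the product teams wanted to avoid. The metric names and port are illustrative.

```python
# Illustrative only: the instrumentation Prometheus typically asks of an application.
# Metric names and the port are hypothetical.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("bets_placed_total", "Total number of bets placed")
LATENCY = Histogram("bet_placement_seconds", "Time spent placing a bet")

def place_bet():
    with LATENCY.time():   # the application code itself records timings
        time.sleep(random.random() / 10)
    REQUESTS.inc()         # and increments its own counters

if __name__ == "__main__":
    start_http_server(8000)  # expose /metrics for Prometheus to scrape
    while True:
        place_bet()
```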

We realised that just because we could do something, that didn't mean we should. By utilising our existing monitoring and logging systems, we can make sure we have a single view of the data that Infrastructure and Development can share and collaborate on.

We have taken the same approach to storage. Initially, we looked at the option of engineering storage facilities into the platform solution, but ultimately decided not to. We identified very early on that there are other, better places in the network to offer storage.

We are now taking a close look at the use cases that Kubernetes is appropriate for. When you put immutability and simplicity at the heart of your strategy, the range of use cases isn't as broad as with other, more traditional architectures.

Ultimately, like all technology, Kubernetes is not a panacea for every challenge that we face. However, it has the potential to become a new platform for a range of use cases, one that creates the necessary alignment between Infrastructure and Product Development.

As a company, we are always innovating, which means we must stay agile and ensure that Infrastructure plays its role in facilitating the smooth adoption of that change. That is where Kubernetes really shines.

James Nightingale is principal infrastructure architect at Hillside Technology, bet365's technology arm.
