Software development 2.0: Achieving sustainability in an environment of constant change

bet365's Sports Development team faced a number of severe hurdles as they looked to support and expand a rapidly growing website

In part 1 of a two-part series, Alan Reed, head of Sports Development at Hillside Technology, bet365's technology business, outlines the considerable challenges his team faced in supporting systems that have grown in scale and complexity

The human body is a masterpiece of engineering. Written into the machinery of every person are the specific instructions needed to construct us from embryo to adult. From the outset, the blueprint is set and the code is present to complete it.

In many ways, software development is similar. It starts with an embryonic system that grows organically. Over time it evolves, taking on a new form as new products and functionality are added.

Where software differs from biology, however, is that in computer science we rarely have the complete blueprint from the get-go. We don't know where we'll end up. So, our processes and the technology we use must adapt with the system. We must keep innovating in what we do and how we do it to keep up with demand.

But what happens when you hit a dead end? When the scale and complexity of the system are such that development and deployment come to a grinding halt? A couple of years ago, this was the prospect we faced. Development and deployment were in danger of coming to a standstill and we had to find a solution.

Early attempts to find a solution were hampered by complexity. What was needed was a new lens through which to view our world, and one that we couldn't construct in isolation.

What resulted was a new model for collaboration that would touch every aspect of the software development lifecycle (SDLC) and lead to a sustainable approach that would bring the entire development function along with us.

The genesis of the problem

In the beginning the website was the product and it was available to anyone who could access bet365.com.

The driving mandate for the Sports Development unit was to make the product as feature-rich as we possibly could, as performant as we possibly could and maintain it to the highest standard we possibly could. And do it quickly.

The code didn't have to be perfect, but it did have to work perfectly. For us, the focus was the number of opportunities we could offer the end user and the timeliness of the information we could put at their disposal. We then looked to enrich the product so that the experience remained new and fresh.


Over time the system has grown in scale and complexity, each new product or change triggering a ripple effect, which must be closely managed to ensure everything continues to perform as it should.

Putting the right checks and balances in place is essential. However, it also takes time, and time is always of the essence. As the business has evolved, so have our processes and procedures, with each change challenging us to find ways to increase the speed of delivery while maintaining product quality.

The impact of geography

Now add the complexity of operating in different regulatory environments, where every change you make must be catalogued, observed and understood before it goes live. There's little consistency as regulation changes from one region to the next. So, your product starts to fragment to suit each country's stakeholders.

This requires a distribution model that ensures each partner only has the information they need to assess the changes taking place in their region and that region alone. The result is multiple instances of your product in different regions, each subject to a different set of changes, where the rollout of new services is rarely synchronised.

Now, it's not just your codebase that's distributed; your delivery mechanism needs to become distributed as well. All the while, you've still got all the existing complexity of performance, functionality, new features and speed of delivery.

Existing methodologies for writing code for a centrally based system are no longer sustainable. We do hundreds of releases a week and each must conform to the global landscape.


The result was that our development speedometer started to drop as more considerations were thrown into the mix, and we saw code awaiting release backing up as it failed to make it through test.

We could see the gridlock coming and it was clear that our development stance had to change.

Battling with complexity

We started by looking at the problem from the perspective of Sports Development and identified three options we could pursue.

Option 1 was to do nothing and continue as normal. Had we taken this stance, we would now have a single codebase split by function. Each function would have a tag so that the developer would know which function was relevant for each region. Despite the attraction of carrying on regardless, this tagging approach was quickly rejected because of the complexity of updating the system. A team working on the functionality for each country would need to make changes in multiple places across the monolith. It introduced too much risk. The margin for error was too high.
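One way to picture the tagging approach in Option 1 is a registry in which every regional variant of a feature lives in the same codebase and carries a tag listing the regions it serves. The Python sketch here is a hypothetical illustration, not bet365's code: the `tagged` decorator, the `resolve` lookup, the `live_scores` feature and the region codes are all invented for the example. It shows where the risk comes from: changing behaviour for one country means hunting down every tagged implementation of that feature, and a missing or overlapping tag only surfaces when the lookup runs.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical registry: feature name -> list of (regions, implementation).
Implementation = Callable[[], str]
REGISTRY: Dict[str, List[Tuple[FrozenSet[str], Implementation]]] = {}


def tagged(feature: str, *regions: str) -> Callable[[Implementation], Implementation]:
    """Decorator recording which regions an implementation applies to."""
    def register(fn: Implementation) -> Implementation:
        REGISTRY.setdefault(feature, []).append((frozenset(regions), fn))
        return fn
    return register


@tagged("live_scores", "UK", "IE")
def live_scores_default() -> str:
    return "default live scores"


@tagged("live_scores", "ES")
def live_scores_es() -> str:
    return "live scores with ES-specific rules"


def resolve(feature: str, region: str) -> Implementation:
    """Return the single implementation tagged for `region`, or fail loudly."""
    matches = [fn for regions, fn in REGISTRY.get(feature, []) if region in regions]
    if len(matches) != 1:
        # Zero matches means a region was missed; more than one means tags overlap.
        raise LookupError(f"{len(matches)} implementations of {feature!r} for {region}")
    return matches[0]


print(resolve("live_scores", "ES")())  # live scores with ES-specific rules
```

Adding a new region, or altering one region's rules, touches the tags scattered across every feature it uses, which is the "changes in multiple places across the monolith" problem in miniature.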

From what we could see, two further options were open to us. The problem was that both introduced complexity that was going to be almost impossible to maintain over the long term.

Option 2 was a branching codebase that had core features plus region-specific features. The difficulty here is knowing where the boundary lies between the core and the regional codebase.

It's not an exact science. The code is fluid. There's no obvious cut line. Each region consumed parts of the core and shared some of the same discrete code that was found in the regional variations. Follow this path and not only do you risk duplicating effort, you also introduce complexity to something that is already extremely complex. It looks good on paper, but in practice you must be extremely disciplined, which is difficult when you have a large team involved. The amount of paperwork and working knowledge needed to maintain the code was prohibitive. Bringing anyone new onto the team would have been a struggle; onboarding them would take months.

Option 3 was to write the distribution into the code itself. The idea was to write discrete functions for each region. Theoretically, this is the most straightforward solution, but like Option 1, it was difficult to make work in practice. We would have been in a world of different implementations and versions of the same code around the world. The cost and complexity of maintaining so many instances were mind-boggling. At some point we would run out of one resource or another, whether it was money, hardware or talent.

Back to the drawing board

We realised that we would need to start from square one. We would need to begin with the big picture. For the first time, we had to ask: why can't we release code? We knew there wasn't a single answer, because it wasn't just us in the frame. We'd have to go wider and look at the problem from the perspective of everyone involved in the process, from Development to Release.

If we moved to a centralised model, there was always the threat that we'd fall into the usual traps of an ageing system: a large codebase, uncertainty around what all of it did, and dormant code that could get in the way.


In analysing the issues, we realised that our distribution problem was similar to our speed-of-delivery problem. The common denominator was the people at keyboards writing the code and the system knowledge they needed to do their job.

We needed a framework of considerations that was transparent to the developer, so they could make the changes needed easily. But it was critical that they didn't become so preoccupied with those considerations that the considerations got in the way of the task at hand.

The answer wasn't to be found in technology, although it would certainly be an enabler. It would be found in culture.

Part 2 of this article will be published next week.