Case Study: The Google SDN WAN

By Jim Wanderer
11 Jan 2013

SDN in action

Members of our operations team were the ones who appreciated the next set of benefits. One we had not fully anticipated was how much easier SDN made it to implement reliable failover. Using SDN together with proven distributed-systems software, we had reliable failover working quickly and with far fewer problems than we would normally expect when setting it up.
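The article does not name the distributed-systems software involved. As a much-simplified, hypothetical sketch of the failover decision itself, here is a heartbeat-timeout rule for choosing which controller replica acts as master. The replica names and the one-second timeout are assumptions; a production system would rely on a proper consensus or lock service so that all replicas agree on the outcome.

```python
def elect_master(heartbeats, current, now, timeout=1.0):
    """Pick the master controller replica (illustrative rule, not Google's).

    heartbeats: dict mapping replica name -> time (seconds) of its last heartbeat
    current:    name of the current master, or None
    A replica counts as alive if its last heartbeat is less than `timeout`
    seconds old. The current master is kept while alive (avoiding needless
    flapping); otherwise the alphabetically first live replica takes over.
    Returns None if no replica is alive.
    """
    alive = sorted(name for name, t in heartbeats.items() if now - t < timeout)
    if current in alive:
        return current
    return alive[0] if alive else None


# Usage: ctrl-a leads until its heartbeats stop, then ctrl-b takes over.
master = elect_master({"ctrl-a": 9.9, "ctrl-b": 9.8}, None, now=10.0)
print(master)  # ctrl-a
master = elect_master({"ctrl-a": 9.9, "ctrl-b": 11.8}, master, now=12.0)
print(master)  # ctrl-b
```

Keeping a live master in place, rather than always picking the "best" replica, is a deliberate choice: it avoids switching controllers back and forth when an old master recovers.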

As well as preserving availability, SDN made upgrades especially tidy. A conventional router upgrade uses two control cards: the backup card is upgraded first, and the router fails over to it while the main card is upgraded. With SDN, however, you can do the equivalent by failing over to a second controller that is already running the upgraded software, without even touching the router itself. What's more, the OpenFlow protocol made it easy to reconstruct the application state accurately.
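In a real OpenFlow deployment a controller can read back a switch's installed flow entries with a flow-statistics request. The sketch below models that with a hypothetical `dump_flows` method on a fake switch, and shows one way a newly promoted, upgraded controller could rebuild its view from the network and compute only the changes it needs, rather than trusting state left behind by its predecessor. The switch IDs, match strings and port names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSwitch:
    """Stand-in for an OpenFlow switch. `flows` maps a match string to an
    action string; both formats are made up for illustration."""
    dpid: str
    flows: dict = field(default_factory=dict)

    def dump_flows(self):
        # Models reading the installed flow table back from the device
        return dict(self.flows)


def rebuild_state(switches):
    """Rebuild a freshly started controller's view from the switches themselves."""
    return {sw.dpid: sw.dump_flows() for sw in switches}


def reconcile(observed, intended):
    """Per switch, return (flows to add or modify, matches to delete)
    needed to move from the observed state to the intended state."""
    plan = {}
    for dpid in sorted(observed.keys() | intended.keys()):
        have = observed.get(dpid, {})
        want = intended.get(dpid, {})
        to_add = {m: a for m, a in want.items() if have.get(m) != a}
        to_delete = sorted(m for m in have if m not in want)
        if to_add or to_delete:
            plan[dpid] = (to_add, to_delete)
    return plan


# Usage: the upgraded controller takes over, reads back what is installed,
# and finds that only s2 differs from what it intends.
s1 = FakeSwitch("s1", {"dst=10.1.0.0/16": "port1"})
s2 = FakeSwitch("s2", {"dst=10.2.0.0/16": "port2", "dst=10.9.0.0/16": "port4"})
intended = {"s1": {"dst=10.1.0.0/16": "port1"}, "s2": {"dst=10.2.0.0/16": "port3"}}
print(reconcile(rebuild_state([s1, s2]), intended))
# {'s2': ({'dst=10.2.0.0/16': 'port3'}, ['dst=10.9.0.0/16'])}
```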

Just as in development and testing, the operations team also wants to trial new software under real conditions without compromising the working network. Because the control plane is separate from the network hardware, the new version can run on a separate controller that is exposed to the actual event stream but runs in a "no operation" mode, in which no actual changes take place. You can see how it behaves under real conditions without risking the actual data flow.
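A minimal sketch of that "no operation" mode, assuming a controller whose policy maps each event to a list of changes. The `Controller` class, the two policies and the event strings are hypothetical. Both controllers see the same event stream, but only the production one touches the (simulated) network; the shadow controller just records what it would have done, so the two can be compared.

```python
class Controller:
    """Minimal controller: `policy` maps an event to a list of changes."""

    def __init__(self, policy, dry_run=False):
        self.policy = policy
        self.dry_run = dry_run
        self.applied = []    # changes actually pushed to the network
        self.proposed = []   # changes a dry-run controller would have made

    def handle(self, event, network):
        changes = self.policy(event)
        if self.dry_run:
            self.proposed.extend(changes)   # observe only; touch nothing
        else:
            network.extend(changes)         # stand-in for programming switches
            self.applied.extend(changes)


def old_policy(event):
    return [f"reroute around {event}"]


def new_policy(event):
    return [f"reroute around {event}", f"rebalance after {event}"]


network = []  # stand-in for the live network's state
prod = Controller(old_policy)
shadow = Controller(new_policy, dry_run=True)
for event in ["link A-B down", "link C-D down"]:  # the same live event stream
    prod.handle(event, network)
    shadow.handle(event, network)

# Only the production controller changed the network; the difference between
# the two tells operators what the new version would have done.
print([c for c in shadow.proposed if c not in prod.applied])
# ['rebalance after link A-B down', 'rebalance after link C-D down']
```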

Something else that would hardly be practicable in a traditional network is to carve it into two parts running different versions of an application, to see what difference an upgrade makes: for example, trialling the new software on a quarter of the network while the remaining three quarters keep the old version (Fig 4).

[Fig 4: Google deployment]
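One way to pick "a quarter of the network" repeatably is sketched below, assuming the split is made per site (the article does not say how Google chose the partition, and the site names are hypothetical). Sites are ranked by a stable SHA-256 hash rather than Python's built-in `hash`, which is randomised per process for strings.

```python
import hashlib
import math


def split_rollout(sites, fraction_new):
    """Partition sites into (new, old) sets with floor(len * fraction_new)
    sites on the new software. Sites are ranked by a stable SHA-256 hash,
    so the split is repeatable and, as the fraction grows, every site already
    on the new version stays there."""
    ranked = sorted(sites, key=lambda s: hashlib.sha256(s.encode()).hexdigest())
    k = math.floor(len(ranked) * fraction_new)
    return set(ranked[:k]), set(ranked[k:])


sites = [f"site-{i}" for i in range(12)]   # hypothetical site names
new, old = split_rollout(sites, 0.25)      # a quarter on the new version
print(len(new), len(old))                   # 3 9
```

Because the ranking never changes, widening the trial from 25% to 50% only adds sites to the new version; none of the original trial sites flips back.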

Network configuration on this scale is a major challenge, and it would be wrong to sell today's SDN as a magic potion that cures every configuration headache, but it does offer significant advantages. For a start, it means configuring networks and fabrics rather than hundreds of individual devices. Rather than pushing out a complex configuration device by device over the CLI, never a simple task, we build and run a management application that configures the whole network. And, as already explained, SDN made it much easier to test configurations before rollout.
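To make "configuring networks rather than devices" concrete, here is a hypothetical sketch of the idea: one network-wide intent is expanded into per-device configuration, and the whole result is checked before anything is pushed. The intent fields (`mtu`, `ntp_servers`, `site_prefixes`), device names and MTU bounds are all illustrative assumptions, not Google's schema.

```python
def compile_configs(intent, devices):
    """Expand one network-wide intent into a config for every device."""
    return {
        dev["name"]: {
            "mtu": intent["mtu"],
            "ntp": list(intent["ntp_servers"]),
            "announce": intent["site_prefixes"].get(dev["site"]),  # None if missing
        }
        for dev in devices
    }


def validate(configs):
    """Pre-rollout checks on the whole generated config; an empty list means OK.
    The MTU bounds are illustrative, not a recommendation."""
    problems = []
    for name, cfg in sorted(configs.items()):
        if not 1280 <= cfg["mtu"] <= 9216:
            problems.append(f"{name}: MTU {cfg['mtu']} out of range")
        if not cfg["ntp"]:
            problems.append(f"{name}: no NTP servers")
        if cfg["announce"] is None:
            problems.append(f"{name}: no prefix defined for its site")
    return problems


intent = {
    "mtu": 9000,
    "ntp_servers": ["10.0.0.1"],
    "site_prefixes": {"lon": "10.1.0.0/16", "nyc": "10.2.0.0/16"},
}
devices = [
    {"name": "lon-r1", "site": "lon"},
    {"name": "lon-r2", "site": "lon"},
    {"name": "nyc-r1", "site": "nyc"},
    {"name": "sfo-r1", "site": "sfo"},   # site missing from the intent
]
print(validate(compile_configs(intent, devices)))
# ['sfo-r1: no prefix defined for its site']
```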

So far I've run through the benefits for the operations team running the network, but what is a networking operation really about? Ultimately it is about delivering the best possible WAN service to users, and that would be the real test of our SDN solution.

Even in the early stages of development it was clear that centralised, near-optimal traffic engineering was meeting the system's high performance demands while keeping costs down. High availability was assured by the rapid response of local controllers, which exploit the greater compute power of fast servers, with centralised traffic engineering then optimising overall performance.
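The article does not describe the traffic engineering algorithm itself. As a stand-in that illustrates what a centralised allocator can compute from a global view, here is textbook max-min fair allocation by progressive filling. It assumes each flow has a single fixed path and ignores path selection, priorities and demand caps, so it is far simpler than a production system.

```python
def max_min_fair(capacity, flows):
    """Max-min fair rates by progressive filling.

    capacity: dict link -> capacity (e.g. Gbps)
    flows:    dict flow -> list of links the flow traverses
              (flows with no links are left at rate 0)
    All unfrozen flows are raised together until some link saturates;
    flows crossing a saturated link are frozen, and the process repeats.
    """
    rate = {f: 0.0 for f in flows}
    remaining = dict(capacity)
    active = {f for f, links in flows.items() if links}
    while active:
        # How many still-growing flows share each link
        users = {}
        for f in active:
            for link in flows[f]:
                users[link] = users.get(link, 0) + 1
        # Largest equal increment that does not overload any link
        step = min(remaining[link] / n for link, n in users.items())
        for f in active:
            rate[f] += step
        for link, n in users.items():
            remaining[link] -= step * n
        saturated = {link for link in users if remaining[link] <= 1e-9}
        active = {f for f in active if not saturated.intersection(flows[f])}
    return rate


# Link A (10 Gbps) and link B (4 Gbps); f2 crosses both.
rates = max_min_fair({"A": 10, "B": 4}, {"f1": ["A"], "f2": ["A", "B"], "f3": ["B"]})
print(rates)  # {'f1': 8.0, 'f2': 2.0, 'f3': 2.0}
```

Flows f2 and f3 are limited to 2 Gbps each by link B, and f1 then takes the 8 Gbps left on link A: no flow can gain without taking bandwidth from a flow that already has less.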

Google's application teams and services have high expectations of the internal technical infrastructure, and the internal WAN is now helping us meet them. Controlling ten or so devices as a single unit, so that the traffic engineering system no longer needs to be aware of each separate device, also gives us far better scalability. As demand changes, the system can recalculate and push several new traffic solutions per hour; in the past it could take hours to develop a single new solution.
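A sketch of why controlling devices as a unit helps scalability: the traffic engineering system can work on a much smaller site-level graph. The device names, sites and capacities below are hypothetical, and summing the capacity of parallel links assumes traffic can be spread evenly across them, which is a simplification.

```python
from collections import defaultdict


def abstract_topology(device_links, site_of):
    """Collapse a device-level topology into a site-level one.

    device_links: list of (device_a, device_b, capacity_gbps)
    site_of:      dict device -> site
    Links inside a site disappear; parallel links between two sites are
    merged into one edge whose capacity is their sum.
    """
    site_links = defaultdict(float)
    for a, b, cap in device_links:
        sa, sb = site_of[a], site_of[b]
        if sa != sb:
            site_links[tuple(sorted((sa, sb)))] += cap
    return dict(site_links)


site_of = {"lon-1": "lon", "lon-2": "lon", "nyc-1": "nyc", "nyc-2": "nyc"}
links = [
    ("lon-1", "lon-2", 400),   # intra-site: hidden from traffic engineering
    ("lon-1", "nyc-1", 100),
    ("lon-2", "nyc-2", 100),
    ("nyc-1", "nyc-2", 400),   # intra-site
]
print(abstract_topology(links, site_of))  # {('lon', 'nyc'): 200.0}
```

Here four devices and four links become two sites and one edge; with ten or so devices per site, the reduction the optimiser sees is correspondingly larger.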
