There's intense curiosity about Google's inter-datacenter SDN [Software Defined Networking] deployment using OpenFlow. That's hardly surprising given that OpenFlow is so recent and has, on the one hand, been unfairly dismissed by some as an academic exercise yet to bear fruit, while on the other we have here an example of not only a large-scale deployment by a giant among Cloud service providers, but also one that has already clocked up two years of operational experience.
I can sum up that experience by saying that OpenFlow SDN has worked really well for us, and that Google couldn't have achieved the results it has without SDN. We could have used other approaches, but there's no way that the results would have been as effective. Finally, as a result of this success, Google is now committed to further SDN deployments.
So, to satisfy some of that curiosity, and to answer those who might think "as a member of the ONF [Open Networking Foundation] board, Jim would say that", this article gives some more detail, and spells out the challenges and the benefits that have convinced us that SDN really is the future of networking.
Google is such a universally familiar service offering - with Google+, Gmail, Google Maps, YouTube as well as the best known web search engine - that you might not realize that it has not one but two main backbones - the Internet facing network for all that user traffic, but also an internal WAN linking several large datacenters in Europe, North America and the APAC region - see Fig 1.
The two backbones have very different requirements and traffic characteristics. In particular the internal WAN was not constrained by the severe SLAs of a public-facing service, and that allowed our development team more freedom to explore innovative approaches. With a dozen key datacenters linked across three continents it presented an exciting opportunity to put the then-nascent open SDN concept to the test in early 2011.
The problems we wanted to address were all too familiar to those managing large networks:
• Big networks don't behave predictably enough;
• Failure response and performance are suboptimal;
• Large networks are difficult to configure and operate;
• Dependency on manual, error-prone operations;
• Not starting from scratch - need to connect to existing networks.
When the project began, OpenFlow-enabled hardware scalable to the network's traffic levels was not yet available, so Google built its own network switch from merchant silicon and open source routing stacks with OpenFlow support.
Our aim was to optimize the WAN routing for high performance and network utilization, while being able to monitor and control network behavior from a central point. A vendor-agnostic solution was needed because we already had a massive installation of equipment from nearly every major vendor. At each site we had multiple switch chassis, allowing scalability to multi-terabit bandwidth as well as providing fault tolerance.