John Leonard interviews two enterprise subscription customers of the NoSQL database Couchbase to find out how it has fitted in with their businesses.
Imagine that you are a travel agent. A customer comes in asking about flights from London Heathrow to Rio de Janeiro on 6 June. Solving this problem, you will probably agree, is not too tricky: simply obtain a list of airlines flying from Heathrow to Rio on that day, check departure and arrival times, prices and availability, look at stopovers and other details and advise the customer accordingly. Job done.
Now imagine that a second customer walks in. This guy's plans are much less precise. He is interested in travelling from London to South America sometime in June. Suddenly the task of producing a list of options has become almost infinitely more complex. Increasingly, however, travel agencies are being asked to do just that.
"This is a nasty query," said Dietmar Fauser, VP architecture, quality and governance at Amadeus, a provider of technology solutions for the global travel industry.
"We call this 'inspirational shopping'. It's massively parallelised. From a single query you get thousands of different combinations. There are 100 different airports in South America and you can travel with 25 airlines over various routes and can connect to Miami, Bogota, Madrid..."
And that's just the start of it. Each of these possible connections will have a minimum transfer time, for example, and once the schedules have been sorted out, the price of each available seat on each aircraft must be ascertained and fed back to the customer.
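To give a feel for why the open-ended query multiplies so quickly, here is a minimal Python sketch that enumerates direct and one-stop itineraries from London to a set of South American airports during June, discarding connections shorter than a minimum transfer time. The flight numbers, schedules, the 90-minute minimum connection and the function name `itineraries` are all hypothetical; the article does not describe how Amadeus actually implements its search.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Flight:
    number: str
    origin: str
    dest: str
    departs: datetime
    arrives: datetime


# Hypothetical minimum transfer time; real values vary by airport and airline.
MIN_CONNECTION = timedelta(minutes=90)

# Hypothetical South American destination airports (IATA codes).
SOUTH_AMERICA = {"GIG", "GRU", "EZE", "SCL"}

# Hypothetical schedule data, for illustration only.
FLIGHTS = [
    Flight("XX100", "LHR", "MIA", datetime(2015, 6, 6, 9, 0), datetime(2015, 6, 6, 14, 0)),
    Flight("XX200", "MIA", "GIG", datetime(2015, 6, 6, 16, 0), datetime(2015, 6, 7, 2, 0)),
    Flight("XX210", "MIA", "EZE", datetime(2015, 6, 6, 14, 30), datetime(2015, 6, 7, 1, 0)),
    Flight("XX300", "LHR", "MAD", datetime(2015, 6, 10, 7, 0), datetime(2015, 6, 10, 10, 0)),
    Flight("XX400", "MAD", "GIG", datetime(2015, 6, 10, 12, 0), datetime(2015, 6, 10, 19, 0)),
    Flight("XX500", "LHR", "GRU", datetime(2015, 6, 20, 22, 0), datetime(2015, 6, 21, 6, 0)),
    Flight("XX600", "LHR", "SCL", datetime(2015, 7, 2, 21, 0), datetime(2015, 7, 3, 8, 0)),
]


def itineraries(flights, origins, destinations, start, end, min_connection=MIN_CONNECTION):
    """Enumerate direct and one-stop itineraries from any origin airport to any
    destination airport, with the first leg departing between start and end."""
    results = []
    first_legs = [f for f in flights if f.origin in origins and start <= f.departs <= end]
    for first in first_legs:
        if first.dest in destinations:
            results.append((first,))  # direct flight, no connection needed
            continue
        # Each first leg is checked against every flight in the schedule, so the
        # work grows with (number of first legs) x (number of flights).
        for second in flights:
            if (second.origin == first.dest
                    and second.dest in destinations
                    and second.departs - first.arrives >= min_connection):
                results.append((first, second))
    return results


june = itineraries(FLIGHTS, {"LHR", "LGW"}, SOUTH_AMERICA,
                   datetime(2015, 6, 1), datetime(2015, 6, 30, 23, 59))
for trip in june:
    print(" -> ".join(f"{f.number} {f.origin}-{f.dest}" for f in trip))
```

The sketch stops at one connection and ignores fares and seat availability. Allowing a second stop, or pricing every candidate itinerary, multiplies the work again, which hints at the kind of load Fauser describes.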
Such speculative enquiries are a classic big data problem. Start and end points are indistinct and a huge volume of data - internal and external - must be processed to come up with a range of possible solutions in near real time.
Amadeus' flight inventory systems were historically based on Oracle relational database technology, with other related systems running on mainframes. Such systems are ill-equipped to handle inspirational shopping queries.
"The queries coming in now are orders of magnitude above what this technology can really offer," Fauser said. "You don't want to hit five million reads per second on an Oracle database. So we started building caches in front of this using MySQL clusters and SSD cards so the cache got persistence. But the growth patterns are really dramatic: 150 to 200 per cent annually. We expect to go up to 30 million hits per second."
To cope with these dramatic growth rates Fauser's team added a further MemcacheD in-memory cache on top of the MySQL cluster, but managing and administering this multi-layered system was becoming increasingly complex and it was clear that an alternative was needed.
Amadeus initially chose the NoSQL database Couchbase because of the easy migration path from MemcacheD, its dynamic load rebalancing and its persistence. The company is now a Premium Plus subscriber and a contributor to the source code.
"We get in-memory-type performance with full persistence for backup. You don't have problems of cold starts when you lose an in-memory node," Fauser said.
The new setup has allowed significant simplification of the flight inventory system, giving Amadeus ample headroom to expand its capacity for handling complex queries like the one described above for its airline, hotel, hire-car and travel agency customers.
"[Couchbase SVP of products] Nahim Yaseen has a theory: innovations must come with a significant level of simplification otherwise there is no need for them. He calls this 'delayering'," Fauser explained.
"We collapsed the MySQL and MemcacheD layers into a single layer of Couchbase. We use Couchbase servers that have ultrafast Fusion IO cards so we get the speed of in-memory and the persistency of very fast SSD cards."
Additional capacity can now be added without manually rebalancing the data across the new machines, a task that previously took around two weeks. Amadeus also plans to explore Couchbase's cross-data-centre replication capabilities to bring active clusters as close as possible to the major US booking sources.
"We are more US based now and you lose 150 milliseconds [in transmission from Europe]. For a perfect user experience you want to move these very large clusters as close as physically possible to the booking sources such as Google and Kayak," Fauser told Computing.