Interview: WANdisco CEO David Richards

Richards tells Computing how firms get sucked into the Amazon cloud, and how he plans to replicate entire data centres over the WAN

"Amazon Elastic Map Reduce runs like a dog. It's pathetically slow, we've done benchmarks. It's because it's metered; they charge by the minute."

So says David Richards, CEO of distributed computing company WANdisco, which is active in the development of Apache Hadoop. Having dismissed the web giant's cloud-based Hadoop offering, Richards went on to explain to Computing how, in his opinion, organisations get sucked into the Amazon cloud, tempted by low initial pricing and rapid deployment. At a time when virtually all large organisations are looking at what big data technologies can do for them (500 out of the Fortune 500, according to Richards), the pressure is on to get results.

"Here's what really happens," says Richards. "I was speaking to a guy in a bank. The guy organising the trial says 'We're going to do some trials with Hadoop and I need 100 servers with four CPUs in each', and the IT department goes: 'No way, the procurement time for that is at least six months.' So the guy goes to Amazon with his credit card and in five seconds, he can procure a 100-server implementation. Then what happens is he starts building his app. Then, before you know it, the app that was just a trial becomes a production implementation. Suddenly your mission-critical application is running in this completely unsecured environment with no persistence - if the power goes off you lose your data. That's a disaster for companies."

One of WANdisco's products is S3-enabled HDFS (HDFS is the Hadoop Distributed File System), which provides what Richards terms an "off ramp" from Amazon's public cloud, allowing organisations to shift data from the Amazon cloud to their own private in-house clouds.

"S3-HDFS allows users to connect their private Hadoop implementations with Amazon web services," says Richards. "It's proven to be a very popular product. It's a get-out-of-jail card."

Richards claims that companies are not using Amazon - and other multi-tenancy public cloud services - correctly.

"We use Amazon for QA, that's really what it is: a QA environment," he says. "It's not reliable enough for us to even host our web site there. It's just not a mission-critical environment. If you want that, you have to host it yourself. Look at what happened to Netflix [which went down on Christmas Eve thanks to an Amazon data centre failure]. By the way, the biggest cost to Netflix is their Amazon bill. Amazon is a ticking timebomb."

WANdisco's move into the big data arena began in 2010 when it became apparent to Richards that Hadoop was about to become the basis of cloud computing in the same way that Windows is synonymous with the PC and Linux with the server. Building on the firm's background in high-availability distributed computing and software configuration management, WANdisco acquired US big data firm AltoStor for £3.2m in 2012, bringing on board two of the key players in the original development of Hadoop: Dr Konstantin Shvachko and Jagane Sundar.

"In 2010 and 2011 we could see Hadoop was going to win," says Richards. "We started to do a lot of investigation about application technology in this space, and by 2012 it had happened. Fortunately for us we were able to make a clever acquisition to drive us into that space very quickly. We can guarantee 100 per cent uptime with Hadoop and that's what the market likes. We've been working on Apache Subversion since 2005 and it's a similar market."

WANdisco, which is based in Sheffield and Silicon Valley and floated on London's AIM stock market last year, yesterday reported 96 per cent year-on-year growth in subscription bookings for the most recent quarter. It recently released its own Hadoop distribution, and its high-availability Hadoop technology, Non-Stop NameNode, is also licensed to other Hadoop players. The next challenge, says Richards, is replicating entire data centres.

"That's never been done before," says Richards. "We've started at the NameNode – which guarantees the availability of any Hadoop deployment – and we're working our way down the stack. You can do YARN and all the different components of Hadoop, finally down to the data blocks. For real high availability you would need to replicate the whole thing – in case of electrical storms or floods or whatever. It has never been done over a WAN."

Note: the original version of this article stated that Cloudera was licensing technology from WANdisco, which was incorrect. The article has been amended accordingly.