Every firm that needs to store data, which in essence means just about every company on the planet, has the same problem. That problem is defined by the CAP theorem.
CAP theorem states that it's impossible for a distributed computing system to provide all three of the following guarantees, instead it must pick two and hope for the best. These are:
Consistency: There's one version of the truth. Wherever you are in the world, the same query will always provide the same answer.
Availability: Every query will always get a response.
Partition tolerance: The network can operate despite a connectivity failing between parts of a system.
Sam Schillace, SVP engineering at cloud document storage and collaboration firm Box, explains that this latter term can also be defined as latency.
"Partition tolerance is a tricky term, it's really about temporary partition tolerance, which really means latency. So under CAP theorem you pick two out of three: you can have data that's always consistent, always available, or fast.
"And you really have to have partition tolerance, you can't take forever to come to the right answer."
Having said that, Schillace admits that Amazon has done just that with its Glacier offering. This is a storage product that offers consistency and availability at the expense of a time guarantee. You'll get your data, but you'll have to wait for it.
By one definition, Glacier means "cold storage". By another less complimentary reading of the term, it describes the pace of data retrieval.
"With Glacier, Amazon are optimising in the direction of consistency. But it isn't fully subject to the CAP theorem because it's not truly distributed - it's a single node making the choice of low cost over latency. If you don't need your data quickly, just eventually, then this is a good solution."
He explains that the challenge for Box is that it can't choose eventual consistency - being in the world of real-time collaboration, it has to be fast.
"Feature by feature we choose between consistency and availability, we can't choose to ignore speed. When a database write command comes in for a feature - if you favour consistency then the write has to complete synchronously."
This means that any new document, or any form of change to an existing document needs to be replicated across the entire network, and every data centre across the globe that houses that information.
The problem is, as CAP theorem dictates, that activity of writing to databases across the globe takes time, so that document won't be available everywhere.
"That's expensive if you're far away, and the more you write the more expensive it gets. If you take the other choice - availability - then it's an asynchronous write. You write locally, maybe just to a nearby data centre, but it doesn't immediately propagate throughout the system. You have to wait for eventual consistency, but the user gets speed and availability."
[Turn to next page]