Photobox Group is among the world's largest image-hosting platforms - and may well be the largest of those based in Europe. With about a petabyte (PB) of storage being added every year, the need for speed, agility and scalability has recently led the company to drop the data centre and embark on an ambitious, large-scale move to the AWS cloud.
Until this January, all of Photobox's data was stored on racks in co-located data centres in Amsterdam and Paris - and the company's no-delete promise meant that storage was becoming unwieldy.
"If you think about the size and scale of the data we were dealing with - nine petabytes - we were adding about a petabyte a year: anything up to about 5-6 million photos a day," said Group CTO Richard Orme. "On peak days that really goes up, we could be adding a million photos an hour to our storage - and we don't delete anything.
"Part of our proposition is if that if you upload an image to us and build a product, you can reorder and reorder and reorder that product over time."
More and more time was needed just to maintain the storage layer, which limited the effort that could be devoted to anything beyond 'keeping the lights on'. Innovation began to suffer, until the company decided that enough was enough.
"We hit a really crucial decision point about two years ago, where we decided that actually it was probably no longer in our interest to keep running those data centres.
"Fundamentally, three things have changed [from when we first set up our data centres]: we're good at data centre provision, but AWS is definitely better - as are Microsoft and Google; [secondly] pricing, and in particular bulk storage pricing across all the cloud providers, has got to a place where economically it became very viable for us to do it; and thirdly...we'd slowed down a bit too much on the digital and physical product innovation side. It was important for us to reinvest the spend that we were putting into the data centres into kickstarting the innovation back in those areas."
In the end, the Group chose to go with AWS, partly due to the existing relationship between the firms: all of the Photobox brands' websites run on AWS, which helps them scale up at peak times. With as much as 60 per cent of annual revenue being made in November and December, scalability and reliability are critical.
Moving nine petabytes anywhere is no small task. As well as mapping the project out from end to end with a team brought together for exactly that purpose, Orme had to consider physical logistics: even a fibre optic pipe would have struggled to work with so much information, and the physical disks were their own challenge.
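To give a feel for why a network link alone struggles at this scale, here is a rough back-of-the-envelope sketch. It is not Photobox's own calculation: the link speeds are illustrative assumptions (the article does not state the company's actual bandwidth), the function name is hypothetical, and the estimate ignores protocol overhead, disk read limits and the new photos arriving during the transfer.

```python
# Rough estimate of how long a bulk transfer takes over a network link.
# Assumptions (not from the article): decimal units, a fully dedicated link,
# and hypothetical link speeds of 1, 10 and 40 Gbit/s.

PB_IN_BYTES = 10**15
SECONDS_PER_DAY = 86_400


def transfer_days(total_pb: float, link_gbit_per_s: float, utilisation: float = 1.0) -> float:
    """Days needed to move total_pb petabytes over a link of the given speed."""
    bits = total_pb * PB_IN_BYTES * 8
    effective_bits_per_s = link_gbit_per_s * 10**9 * utilisation
    return bits / effective_bits_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    for gbps in (1, 10, 40):
        print(f"{gbps:>2} Gbit/s: {transfer_days(9, gbps):.1f} days")
```

Under these assumptions, nine petabytes over a fully saturated 10 Gbit/s link would take roughly 83 days, and considerably longer at any realistic utilisation.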
"We evaluated with our network providers whether or not we would be able to move data out of the data centre at a fast enough rate; we evaluated with our hardware and then our storage providers how fast we could read data off the disks. There were some interesting conversations: we were pretty sure that if we were able to match the speeds that AWS felt they could ingress at, we'd actually end up melting a lot of the disks in the data centres."
Understandably, Orme and the team started to look at other solutions. Back in 2016, panellists at a Computing event agreed that nothing beats a van or truck for transporting petabytes of information - and nothing has changed in the intervening time. Enter AWS's Snowball Edge: physical storage devices that can each hold 100TB of data.
"[That] sounds like a lot, but we actually ended up using 90 of them over time!" said Orme. "We were working on two separate data centres and at each of the two data centres we had three sets of four Snowballs. We were either filling the Snowballs, or they were in transit or they were with AWS and they were egressing data. In the end we had 24 Snowballs on rotation."
The company also considered the AWS Snowmobile, which can carry up to 100PB of data at once, but the waiting time - plus the challenge of having to move information from two locations, not just one - meant that the Snowballs worked out to be the most efficient route.
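The Snowball figures quoted above hang together arithmetically, as this minimal sketch shows. It uses only the numbers in the article (nine petabytes, 100TB per device, two data centres with three sets of four devices each) and assumes decimal units and completely full devices; the function names are hypothetical.

```python
import math

TB_PER_PB = 1000  # decimal units, an assumption


def device_fills_needed(total_pb: float, device_tb: float) -> int:
    """Number of full device loads needed to carry total_pb petabytes."""
    return math.ceil(total_pb * TB_PER_PB / device_tb)


def devices_in_rotation(sites: int, sets_per_site: int, devices_per_set: int) -> int:
    """Devices circulating at once: filling, in transit, or unloading at AWS."""
    return sites * sets_per_site * devices_per_set


fills = device_fills_needed(total_pb=9, device_tb=100)
rotation = devices_in_rotation(sites=2, sets_per_site=3, devices_per_set=4)
cycles = fills / rotation  # average number of times each slot in the rotation is reused

print(f"Device loads needed: {fills}")
print(f"Devices in rotation: {rotation}")
print(f"Average loads per rotation slot: {cycles:.2f}")
```

On these figures, nine petabytes comes to 90 device loads, which matches the 90 Snowballs Orme mentions, and the 24 devices in rotation would each be cycled a little under four times on average.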