NASA forgot about the egress costs for its 247-petabyte data store on AWS

The space agency is set to increase its storage by 215 petabytes, but failed to factor in the cost of retrieving that data from the cloud

Last year NASA picked AWS to run the Earthdata Cloud: a repository for information from the Earth Science Data and Information System (ESDIS), which collates information from its missions. With storage requirements expected to jump from the current 32 petabytes to almost 250 by 2025, that extra capacity is clearly needed - but the Agency, it appears, forgot about the cost of retrieving the data it feeds into AWS.

Egress charges are what organisations pay to move data out of the cloud to another location - say, a local workstation where a scientist performs analysis. They are typically billed by volume, on top of the regular monthly cloud bill, meaning that the more data you retrieve, the more you pay.
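To make the billing model concrete, here is a minimal sketch of how a volume-based egress charge scales with the amount downloaded. The flat per-gigabyte rate is a hypothetical placeholder chosen for illustration; it is not AWS's actual price or anything from NASA's contract, and real cloud pricing is usually tiered and varies by region.

```python
# Hypothetical flat rate in US dollars per gigabyte transferred out of the cloud.
# This is an illustrative assumption, not a quoted AWS price.
ASSUMED_RATE_USD_PER_GB = 0.05

GB_PER_TB = 1_000  # decimal units


def egress_cost_usd(terabytes_downloaded: float,
                    rate_usd_per_gb: float = ASSUMED_RATE_USD_PER_GB) -> float:
    """Return the egress charge for a given download volume at a flat rate."""
    if terabytes_downloaded < 0:
        raise ValueError("download volume cannot be negative")
    return terabytes_downloaded * GB_PER_TB * rate_usd_per_gb


if __name__ == "__main__":
    # The charge grows linearly with the volume users pull out of the cloud.
    for tb in (1, 100, 1_000):
        print(f"{tb:>5} TB downloaded -> ${egress_cost_usd(tb):,.2f}")
```

The point of the sketch is the shape of the bill rather than the figures: storage is a fairly predictable cost, while egress depends entirely on how much end users choose to download.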

NASA currently stores its data on-prem, in 12 Distributed Active Archive Centers (DAACs), and expects to transfer all of its data to the cloud in the coming years; the first transfer was planned for Q1 this year.

But where are all 215 petabytes of new data expected to come from? The answer lies in the Agency's 15 upcoming missions, which are expected to produce more than 100 terabytes of information every day. They include the NASA-ISRO Synthetic Aperture Radar (NISAR) and the Surface Water and Ocean Topography (SWOT) satellites, which will be the first to upload their data directly to the Earthdata Cloud.
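As a rough check on how those figures fit together, the arithmetic can be sketched as follows. It assumes decimal units (1 petabyte = 1,000 terabytes) and treats the "more than 100 terabytes" a day as a flat 100, so the number of days it produces is an upper bound on how long the new data would take to accumulate, not a forecast.

```python
# Figures taken from the article; the unit conversion is an assumption.
CURRENT_PB = 32    # data held today
ADDED_PB = 215     # expected increase
DAILY_TB = 100     # article says "more than 100", so this is a floor
TB_PER_PB = 1_000  # assumed decimal units

total_pb = CURRENT_PB + ADDED_PB                # the headline figure
days_to_add = ADDED_PB * TB_PER_PB / DAILY_TB   # days at the floor rate

print(f"Total store: {total_pb} PB")
print(f"Days to add {ADDED_PB} PB at {DAILY_TB} TB/day: {days_to_add:,.0f}")
```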

Having all that data in the cloud, rather than geographically dispersed as it is now, will be a huge boon for NASA's researchers - if the Agency can afford it.

An audit report from NASA's Office of Inspector General, dated early March this year, concluded that the Earth Observing System Data and Information System (EOSDIS), which makes the information from ESDIS available, hadn't properly modelled the effect of egress charges:

‘Specifically, the Agency faces the possibility of substantial cost increases for data egress (i.e., when end users download data from a network to an external location) from the cloud. Currently, when end users access and egress data through a DAAC there is no additional cost to NASA other than maintaining the current infrastructure. However, when end users download data from Earthdata Cloud, the Agency, not the user, will be charged every time data is egressed. Ultimately, ESDIS will be responsible for both cloud costs, including egress charges, and the costs to operate the 12 DAACs.’

The report damningly continued, ‘In addition, ESDIS has not yet determined which data sets will transition to Earthdata Cloud [emphasis ours] nor has it developed cost models based on operational experience and metrics for usage and egress. As a result, current cost projections may be lower than what will actually be necessary to cover future expenses and cloud adoption may become more expensive and difficult to manage.

‘Collectively, this presents potential risks that scientific data may become less available to end users if NASA imposes limitations on the amount of data egress for cost control reasons.’

The report also found that the Evolution, Enhancement, and Efficiency (E&E) panel selected to review the DAACs didn't attempt to identify potential cost savings (after the Mission Support Council removed the need to do so); didn't follow the National Institute of Standards and Technology's (NIST's) data integrity standards; and lacked independence, with six of the 12 panel members working on ESDIS.

The report makes three recommendations:

  1. Once NISAR and SWOT are operational and providing sufficient data, complete an independent analysis to determine the long-term financial sustainability of supporting the cloud migration and operation while also maintaining the current DAAC footprint;
  2. Incorporate in appropriate Agency guidance language specifying coordination with ESDIS and OCIO early in a mission's life cycle during data management plan development; and
  3. Ensure all applicable information types are considered during DAAC categorisation, that appropriate premises are used when determining impact levels, and that the appropriate categorisation procedures are standardised.

How an organisation that can send people into space missed the fact that egress costs exist is baffling, until you remember how large projects like this actually operate. Just talk to anyone who's worked on Crossrail.