Archival data continues to shape modern astronomy, and we need new ways to process it

Automation and increasing compute power are needed to handle the growing pool of astronomical data

Eileen Meyer, an assistant professor of physics at the University of Maryland, believes that big data is transforming our knowledge of space.

Writing for Smithsonian, Meyer describes how thousands of black holes near the centre of our galaxy were recently discovered by digging through old, archived data with modern techniques. She expects findings like this to become more common as the amount of data collected continues to increase.

In the middle of the 20th century, many astronomers worked alone or in small teams; knowledge sharing was far from common. On top of that, the equipment they used mainly measured the visible spectrum.

At the time, data was mostly stored on photographic plates or published catalogues, not the digital repositories used today. In the modern age there are observatories that cover the entire electromagnetic spectrum, and share their findings with many institutions all over the world.

Observatory data is made publicly available shortly after being recorded (remember Nasa's anticipated problems with releasing photos of astronaut Mark Watney's body in The Martian?), a practice that is democratising astronomy, Meyer argues.

The space industry generates so much data that the term ‘big’ doesn’t really do it justice; ‘astronomical’ would be a more appropriate adjective. Each generation of observatories is ‘at least’ 10 times as sensitive as the one before. The Hubble Space Telescope transmits about 20GB of raw data per week, while the more modern Atacama Large Millimeter Array in Chile could potentially generate 2TB of data every day.

Then there is the Square Kilometre Array, due to be completed in 2020. This massive array will be the most sensitive telescope in the world, capable of generating more data than the entire internet in its first year of operation.

New ways of handling these massive data sets will be required, likely combining automated image processing with facilities that can handle hundreds of terabytes each day. But it's not only the systems; scientists will need to be capable of working with, understanding and processing these data sets to gain insights and make decisions based on what they learn.

Meyer ends by suggesting that in the future, research conducted using archived data, possibly recorded before the scientist was even born, will become the norm.