Don't play Russian roulette with test data, warns CA Technologies
A DevOps-driven, "requirement-based" approach is the way forward, argues continuous delivery evangelist
An uncoordinated approach to data analytics between development and operations can turn into a game of Russian roulette with data, resulting in missed opportunities and even problems with privacy legislation, CA Technologies has warned.
Speaking at Computing's DevOps Summit today, CA's continuous delivery evangelist Simon Poulton championed what he called a "requirement-based" approach to data analytics that makes use of DevOps.
"In most organisations I've worked with, the approach to test data is just going off to production, and copying production," said Poulton.
"If [the dataset] is over 40TB people think maybe not even copy it at all, and all this to me is just playing Russian roulette with test data."
Poulton explained how, when analysing data for a UK high street bank, CA found vast swathes of data that did not need to be brought into pre-production environments to be integrated into new projects.
"Ninety-six per cent of investment data covered trade in the same currency. Imagine they had 100,000 records - 96,000 of those records tested the same thing," said Poulton.
"Why bring those over to a pre-production environment?"
Compliance-wise, Poulton reminded delegates that "all of that production data has PII [personally identifiable information] stuff" - meaning GDPR compliance could be an issue.
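Both concerns, redundant records and PII in copied production data, can be illustrated with a short Python sketch. It is not CA's tooling: the record layout, the use of `currency` as the coverage key and the `PII_FIELDS` set are all assumptions made for illustration. The sketch keeps one record per distinct currency and drops the assumed PII fields; dropping fields in this way does not by itself establish GDPR compliance.

```python
# Hypothetical record layout; field names are assumptions for illustration.
PII_FIELDS = {"customer_name", "email"}

def coverage_key(record):
    # Assumption: trades in the same currency exercise the same test path.
    return record["currency"]

def reduce_and_strip(records, per_class=1):
    """Keep at most `per_class` records per coverage class, without PII fields."""
    counts = {}
    kept = []
    for record in records:
        key = coverage_key(record)
        if counts.get(key, 0) < per_class:
            counts[key] = counts.get(key, 0) + 1
            kept.append({k: v for k, v in record.items() if k not in PII_FIELDS})
    return kept

def make_record(i, currency):
    # Builds a synthetic record; no real customer data is involved.
    return {"id": i, "currency": currency,
            "customer_name": f"Customer {i}", "email": f"c{i}@example.com"}

# Synthetic data mirroring the quoted split: 96,000 of 100,000 in one currency.
records = [make_record(i, "GBP") for i in range(96_000)]
records += [make_record(96_000 + i, "USD") for i in range(2_000)]
records += [make_record(98_000 + i, "EUR") for i in range(2_000)]

subset = reduce_and_strip(records)
print(len(records), len(subset))  # 100000 3
print(subset[0])                  # {'id': 0, 'currency': 'GBP'}
```

Under these assumptions, three records cover the same currency cases as the full 100,000, and none of them carries the assumed PII fields.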
Finally, new functionality in new projects tries to draw on existing data, but if no usable data exists in production, the migration is again a waste of time and resources, he argued.
"Testers also tend to find in data the data they want to use specifically, and use it again and again, but in terms of depth, that's not a good result," he added.
The alternative, said Poulton, is to use discoverability tools to define logical relationships in the data and to adopt a requirement-based approach:
"Identify the data specifically needed for this user's story, and use the data only that matches that requirement," he said.
"So what we're really doing is ensuring through and through quality in that process".