Last month's IT Leaders Forum, entitled Learning from the Leaders in Big Data, centred on an expert panel discussion that touched on a range of topics including how best to tackle the proof of concept (PoC) stage of a big data project.
Dave Brown, head of informatics infrastructure at Genomics England, kicked off the session by outlining some of the challenges around metadata.
"Our project is to create 100,000 raw human genomes of NHS patients suffering from cancer," Brown said.
"The aim of that is to pave the way for progress in the National Health Service for genetic medicine, but also to look at how we can start to build a technology platform that can allow the sequencing at high volume.
"The challenge with building a metadata engine that allows you to search with a single metadata query over both file and object store is going to be key in keeping all the data up to date," he said.
"One thing we see is you have the data increasing in size, but it's also about the speed of the access. So one of the goals of our project is asking how you give clinically actionable results back to a physician when you sequence any living thing," continued Brown.
He said that with cancer cells you only have a five-to-10-day window in which to sample and extract the DNA required, send it to the DNA sequencing facility, and then compare it with the other sequences Genomics England has on file.
"And that's the huge challenge - there's going to be a constant need to improve the speed of processing across that storage," said Brown.
In Brown's field, a six-month PoC period is not unheard of when planning this kind of endeavour. This was backed up by Janis Landry-Lane, software-defined life sciences industry lead at IBM, which has been working with Genomics England on several of its cancer-based projects.
"If a PoC is run for more than six months, you're really out of time - you have to define something you can accomplish within a calendar year," she said.
"If we're approaching a PoC, we actually interview the stakeholders to find out what they want to get from the PoC, what defines success for them," she continued.
"So I think that's key, and you have to define who the stakeholders are, their criteria for success. Some people will say, 'Okay, let's just build it and prove that we can move on, and those are relatively short [periods], but from my experience, for example in our recent work with Alberta Children's Hospital Research Institute researching the treatment of autistic children, building that transformational database - and allowing them to have access to that transformational data, and improving patient outcomes - defined the success of the PoC."
In other words, it's not just about time.
But for Alpesh Doshi, managing partner at investment banking and principal investment firm Redcliffe Capital, such a long, albeit thorough, PoC is far from ideal.
"A PoC of six months, for me, that's way too long," said Doshi.
"The shorter the better," he insisted.
"You won't get to a good outcome with a result to show a value. You'd get to that point in ideally three to four weeks. The time to value is one of the biggest things for people, as I see it."
In Doshi's experience, "the stakeholders and the sponsors - and the board - could get frustrated and say, 'This isn't working - you've spent six months doing this'.
"And what's happened is, there's this illusion that people want to see the value, but don't know what the value is, and it takes too long."
"We had a very large Fortune 500 company, and we proved they could do R&D very fast, for things they'd not been able to do in years, they did in six weeks. So [they realised] if they did this 100 times, they'd save $10m."
For Doshi, being agile is the key to "the whole model of how to get value from data".
"It's got to be an agile process, and producing real outcomes very quickly. Some things take time - such as medical research - but there's always got to be a very simple question, and if the answer takes six months, it's too big a question - you've got to break it down."
But while "breaking things down" might be a relatively simple process in the financial sector, in the field of medical research things are perhaps less susceptible to simplification. To illustrate the scale of the challenges big data projects face, Landry-Lane outlined the work IBM has done developing the Cancer Genome Atlas (TCGA).
TCGA is a joint programme of the US National Cancer Institute and the National Human Genome Research Institute, and has so far mapped genomic changes in 33 types of cancer.
But right now, even with this reference data to draw on, six months is still a realistic timescale.
"We [still] have 200 cancer patients for whom the standard of care has not worked, so taking the Cancer Genome Atlas, and doing all that comparison, all that reprocessing, in order to make a monumental change is taking us six months.
But we think we're on the right track," she said.
Jonathan Gill, IT director of Watchfinder, an online retailer of second-hand luxury watches based in Maidstone, said that in stark contrast to IBM and Genomics England, whose work is all about "saving lives", all he does is "purely bottom-line, saving money". He said for him big data is all about "asking a question".
"Now, sometimes that question is completely wrong," he said, so the most important thing is to go back and get that question right - as quickly as possible - as "everyone seems to want everything now".
Doshi agreed, again stressing the need for agility.
"Companies are agile - Uber, for example. Those guys don't have a six-month process to find a business case - they just do it. Now I'm not saying everybody has to do this, but I'm saying those organisations move so fast, they make those changes quickly."
All the panellists stressed the importance of getting to the board and key stakeholders early, to explain your business case. And if you're trying to cure cancer, you may find them more willing to wait months for results to filter through.
If the stakes are somewhat lower, perhaps Doshi's advice wouldn't go amiss. The so-called 'Uberisation' of business - while bandied around like marketing fluff all too often - can indeed be a potent method for shaking up an analytics process with an agile approach.
If data - and data collection - are left to stagnate for too long, more time and money can be wasted on big data projects that only become visibly ineffective when it is already too late to make rapid changes. New projects may then need to be started from scratch, rather than running new angles on existing, functional ideas.