Can big data analytics increase cancer survival rates?

Four ways in which big data analytics can help more people to survive the scourge of cancer

Big data and data science have the potential to help increase cancer survival rates. The technologies, processes and principles can be applied across both medical research and clinical practice, and can act as a platform to accelerate the adoption of data-driven healthcare that improves clinical decisions and patient outcomes and alleviates the economic burden of healthcare.

Over the past 20 years the volume of health data has grown exponentially, yet most of it remains underutilised, locked into siloed systems as isolated clinical records, both structured and unstructured. More recently, new forms of data have appeared that add insight, such as complex imaging and genotype data, or data streams from medical devices, wearables and even social media.

If we are to enable big data analytics to increase cancer survival rates it is critical that medicine, computing and analytics work together to deploy modern data science techniques that will solve this integrated care data challenge. Big data analytics can support the goal of increasing cancer survival rates in four ways:

1. Accelerating genomic annotation, allowing more cancers to be sequenced across a broader range of patients, producing better knowledge on clinically actionable variants.
2. Raising the quality of interpretation of variant data by ensuring clinical data are used in genomic pipelines, which offers the potential to accelerate the drug discovery process.
3. Facilitating informed, multi-agency decision making by creating an integrated and patient-specific view of the disease and treatment programme, as opposed to an organisational view.
4. Delivering and monitoring the performance of modelling algorithms that predict the length of time a patient is likely to survive based on the population's longitudinal data.

The principles of advanced analytics and data-driven services that underpin the above examples are:

• The adoption of a "data lake" approach to the management of data, to support agile linkage and analysis aligned to specific questions.
• The ability to deploy run time data modelling.
• Embracing open source for intellectual property including software and analytical tools to avoid vendor lock-in and excessive cost.
• Cloud consumption via a platform-as-a-service (PaaS) model, which not only reduces cost but also improves interoperability.
• Using a "community & collaboration" or "knowledge commons" approach through active sharing of data set definitions, open APIs and algorithms.

These principles directly address the vision of integrated care, precision medicine and value-based outcomes, where data needs to follow the patient and where that data is linked to provide personalised services. They also address some of the key challenges to this vision, including:

• fragmentation of clinical data across organisations;
• incompatibility of data standards and lack of system interoperability;
• deceleration caused by weak processes for diffusion of innovation.
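
To make the fragmentation challenge concrete, here is a minimal Python sketch of linking records held by separate organisations into one patient-centred view. It is an illustration only, not a description of any real system: the source names, field names and pseudonymised identifiers are hypothetical, and it assumes every source already shares a reliable `patient_id`, which real-world linkage often cannot take for granted.

```python
from collections import defaultdict


def link_records(sources):
    """Merge per-source records into one combined view per patient.

    sources: mapping of source name -> list of dict records. Each record is
    assumed to carry a shared 'patient_id' key (a simplifying assumption).
    """
    linked = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            pid = record["patient_id"]
            for field, value in record.items():
                if field != "patient_id":
                    # Prefix each field with its source so provenance is kept
                    linked[pid][f"{source_name}.{field}"] = value
    return dict(linked)


# Hypothetical silos with invented, pseudonymised records
silos = {
    "pathology": [{"patient_id": "P001", "tumour_grade": 2}],
    "imaging": [
        {"patient_id": "P001", "scan_date": "2015-03-01"},
        {"patient_id": "P002", "scan_date": "2015-04-12"},
    ],
}

patient_view = link_records(silos)
print(patient_view["P001"])
```

Prefixing each field with its source is one simple way to keep track of where a value came from once the silos are combined; a production service would also need data-standard mapping and governance controls that this sketch leaves out.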

Privacy and information governance, which are regularly seen as immovable obstacles hindering the earliest stages of linking and analysing medical data for both clinical and research use, must be foremost in technology design.

At Aridhia we have been working with one of the Academic Health Science Centres in NHS England to establish a data lake for renal cancer. This data is aggregated from various silos, linked and transformed into a standard dataset model, to support a multi-disciplinary team review of the patient population and the cancer pathway.

This is a form of descriptive analytics, where the insights describe what is happening in the system. As this service and the quality of its data mature, it will be possible to introduce more advanced analytics, such as survival modelling.

In its simplest form, such a model could use a small set of features including tumour grade, age at diagnosis, gender, family history and other clinical markers. In a more advanced form, it could also draw on genomic and treatment information, resulting in more accurate survival predictions and treatment recommendations. This would enable the clinician to make more personalised treatment decisions that maximise the patient's probability of survival.
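
As a rough illustration of what the starting point for survival modelling can look like, the Python sketch below computes a Kaplan-Meier survival curve. This is a deliberately simpler, non-parametric technique than the feature-based model described above, and it is not a description of Aridhia's method. The function name and the toy cohort are invented for the example; no real patient data is used.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time a death occurred.

    durations: follow-up time per patient (e.g. months since diagnosis)
    events: True if death was observed at that time, False if censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # patients leave the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients surviving time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical toy cohort, invented for illustration only
months = [6, 6, 12, 18, 24, 24, 30]
died = [True, False, True, True, False, True, False]

curve = kaplan_meier(months, died)
for t, s in curve:
    print(f"month {t}: estimated survival {s:.2f}")
```

Running the same function separately for subgroups, for example by tumour grade, gives a first descriptive comparison; bringing in several features at once would call for a regression approach such as a proportional hazards model, which is beyond this sketch.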

Big data and data science technologies and techniques are critical if we are to extract valuable insights from healthcare data. Those insights can be translated into data products, such as apps, or integrated with existing systems for use by the clinical team, service providers and potentially patients. Existing data assets offer huge potential to enable service providers, funders and regulators to improve efficiency and bridge the gap between constrained resources and escalating demand for services, while improving the quality of patient care through integrated and personalised services.

And yes, I believe big data analytics will increase cancer survival rates.

Andrew Judson is director of data science at analytics firm Aridhia Informatics