When researchers in the 1950s were seeking clues that might confirm their suspicions that smoking was a major cause of ill-health, they turned to medical records and found a link that was startling back then, but which will come as no surprise to anyone today: people who smoked had a high risk of contracting lung cancer – even Camel-smoking doctors.
To some, therefore, people’s medical records represent a golden seam of data that ought to be mined for every insight that can be gleaned from them. Such analyses might not just uncover hitherto unknown medical correlations, such as the one between lung cancer and smoking, but may also bring to attention side-effects from drugs more quickly or connections between behaviours in one stage of life and conditions in another.
This is the rationale behind “care.data”, a medical database intended to provide NHS patient record data to independent researchers, third parties and even companies. Of course, says the Health and Social Care Information Centre (HSCIC), the quango responsible for overseeing care.data, any patient information that will be passed on will be anonymised and, in any case, it says, it isn’t Hoovering up every last detail of every summary care record held by general practitioners.
Regardless of the potential benefits, however, others hold that medical records are private and personal, and should be kept strictly between individuals and their doctors – and no one else.
Furthermore, say campaigners who argue that care.data is a dangerous development, the ability of big data techniques to uncover identities from disparate information in many cases renders the concept of anonymisation redundant.
Nothing to worry about
Eve Roodhouse, care.data programme director at the HSCIC, is keen to reassure people that there is nothing to worry about. First of all, she says, it is not a politically led initiative: the decision to approve the programme was made by clinical professionals, who have also been involved in shaping how it will work.
“A group called the Independent Advisory Group considers applications for extracted primary care data and it considered this request for extracting data... This request was also considered by representatives of the British Medical Association and the Royal College of General Practitioners,” says Roodhouse.
“The way that this extraction will work is that there are an agreed set of ‘Read codes’ [a coded thesaurus of clinical terms] and the data will be extracted according to that technical specification,” she adds.
The HSCIC is also keen to assert that it isn’t recording each and every summary care record of everyone in the UK. Rather, it will take four-monthly extracts of specific types of data. These extracts will include NHS numbers, dates of birth and postcodes, which would be anonymised according to a particular formula if passed on to third parties. Postcodes, says Roodhouse, would be truncated so that only the first three or four characters were included, not the whole postcode.
“The data will be collated and checked to ensure that it’s in a valid form and that there’s no data corruption. Then, when it lands at the processing stage, the potential identifiers will be replaced with pseudonyms. We don’t delete the identifiers, but they will be separated out from the clinical data and held in a separate, secure, locked data table.
“That’s what happens in the processing stage. Before we release data to customers we will add a [further] pseudonym per customer and per purpose... So that pseudonym will be further pseudonymised before it goes out to the customer,” says Roodhouse.
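The two-stage process Roodhouse describes can be illustrated with a minimal sketch. It is not the HSCIC's actual implementation, whose formula is not given here. The function names, the field names, the use of a keyed hash (HMAC-SHA256) to generate pseudonyms, the decision to drop the date of birth entirely and the reading of "first three or four characters" as the outward part of the postcode are all assumptions made for illustration.

```python
import hashlib
import hmac


def truncate_postcode(postcode: str) -> str:
    """Keep only the outward part of a UK postcode (assumed reading of the truncation)."""
    compact = postcode.replace(" ", "").upper()
    # The inward part of a UK postcode is always its last three characters.
    return compact[:-3]


def pseudonym(value: str, key: bytes) -> str:
    """Keyed hash: the same value and key always yield the same pseudonym."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def process(record: dict, processing_key: bytes, identifier_table: dict) -> dict:
    """Processing stage: replace identifiers with a pseudonym.

    The identifiers are not deleted; they are moved into a separate table
    (here a plain dict standing in for the 'locked data table').
    """
    pid = pseudonym(record["nhs_number"], processing_key)
    identifier_table[pid] = {
        "nhs_number": record["nhs_number"],
        "date_of_birth": record["date_of_birth"],
        "postcode": record["postcode"],
    }
    return {
        "pid": pid,
        "postcode_area": truncate_postcode(record["postcode"]),
        "read_codes": list(record["read_codes"]),
    }


def release(processed: dict, customer: str, purpose: str, release_key: bytes) -> dict:
    """Release stage: derive a further pseudonym per customer and per purpose."""
    released = dict(processed)
    released["pid"] = pseudonym(f"{customer}|{purpose}|{processed['pid']}", release_key)
    return released


if __name__ == "__main__":
    # Entirely made-up example record.
    record = {
        "nhs_number": "9999999999",
        "date_of_birth": "1970-01-01",
        "postcode": "SW1A 2AA",
        "read_codes": ["H33.."],
    }
    table = {}
    processed = process(record, b"processing-secret", table)
    print(release(processed, "customer-a", "commissioning", b"release-secret"))
```

In this sketch two customers receiving the same record see different pseudonyms, so their datasets cannot be joined on that field alone. The clinical codes and the postcode area travel with the record unchanged.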
The typical customers for the data would be pharmaceutical companies, research organisations, other parts of the public sector (including the NHS) but also, eventually perhaps, insurance companies. That prospect has aroused many people’s suspicions, but the first “customers” will be from the public sector.
“In the first instance, NHS England, Public Health England, clinical commissioning groups and public health teams in local authorities can apply for access to pseudonymised, linked datasets and only specifically for commissioning purposes where it’s going to benefit patient care,” says Roodhouse.
They will also be required to sign a legal contract and conform to “strict” rules that HSCIC will apply to the use of that data.
But there are still fears that people’s identities will be inadequately covered by the various processes that the HSCIC says will protect them.
Indeed, pseudonymising, even twice over, is not the same as anonymising, says Phil Booth, the coordinator at MedConfidential, a campaign set up specifically to fight for confidentiality and consent in health and social care.
“It is essentially ‘de-identifying’ data that is sensitive and quite re-identifiable in certain circumstances,” says Booth, a former IT professional and 1990s dot-com entrepreneur.
“Anonymising data is when one aggregates all of the data and treats it in a number of ways to remove small numbers, or ‘low counts’ and then to statistically ‘perturb’ data by putting random noise into it so as to make it impossible to identify any single individual within that data set. What care.data proposes to do is nothing of the sort,” says Booth.
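By way of contrast, the steps Booth lists (aggregate, remove low counts, add random noise) can be sketched roughly as follows. This is an illustration only: the threshold of five, the choice of Laplace-distributed noise and every function name are assumptions, not anything Booth or the HSCIC specifies, and a sketch this simple does not by itself guarantee that no individual can be identified.

```python
import random
from collections import Counter


def aggregate(records, key):
    """Collapse individual records into counts per category."""
    return Counter(r[key] for r in records)


def suppress_low_counts(counts, threshold=5):
    """Drop any category with fewer than `threshold` people (assumed cut-off)."""
    return {k: v for k, v in counts.items() if v >= threshold}


def perturb(counts, scale=2.0, rng=None):
    """Add random noise to each count so published figures are not exact.

    The difference of two exponential draws with the same rate follows a
    Laplace distribution centred on zero; this choice is an assumption.
    """
    rng = rng or random.Random()
    noisy = {}
    for k, v in counts.items():
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        noisy[k] = max(0, round(v + noise))
    return noisy


if __name__ == "__main__":
    # Made-up records: seven people in one postcode area, two in another.
    records = [{"area": "SW1A"}] * 7 + [{"area": "M1"}] * 2
    counts = suppress_low_counts(aggregate(records, "area"))
    print(perturb(counts))
```

The output here is a table of approximate counts rather than a set of per-person records, which is the distinction Booth draws between anonymised and pseudonymised data.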