Of Raspberry Pi and CSI: how portable DNA analysis tools are helping police forensics, agriculture and medicine

What used to require a specialist lab can now be done in a box, researchers tell Computing

Genetic sequencing. The very words conjure up images of supercomputers, cleanrooms filled with expensive laboratory equipment operated by specialists with PhDs, and the 10 years and £2bn it took to sequence the three billion base pairs of the human genome. In many people's minds it is also associated with big data, both because of the computing power needed to sequence the genome, and because we're starting to hear a lot about individualised medical treatment, which is very much a big data narrative.

But there's another part of the genomics story too, one that is in many ways the exact opposite of big machines and big data. That is the sampling and analysis of genetic material in the field. Here small is beautiful. Instead of supercomputing clusters, processing is accomplished by a cheap low-powered device such as the Raspberry Pi. And rather than sifting through vast reams of data searching for patterns, the software algorithms on board are tuned to look for very specific matches. By stripping out extraneous functionality and bringing in innovative sampling and sequencing technologies, analysing complex metagenomic samples (genetic material recovered directly from the environment) can now be done in near real-time using instruments that can be tucked under one arm, and before long, perhaps, carried in a pocket.

Sequencing the air

"Our ultimate vision is a genetic sequencer with a simple collection and sample preparation system, plus low-power computing for analysis in a self-contained box that can be taken out to the field," says Dr Richard Leggett, project leader in the data infrastructure and algorithms group at research institute The Genome Analysis Centre (TGAC).

Leggett is one of a number of researchers at TGAC who are working on "sequencing the air" - continuously checking for the presence of certain airborne micro-organisms by analysing their DNA.

By distributing the boxes he describes across the fields of East Anglia, for example, the airborne presence of dangerous concentrations of wheat pathogens could be detected and farmers warned to take evasive action. It may be possible to identify the source of the outbreak, too.

"If you have a high enough density of these things, and if you also know the wind direction you can identify the source," explains Dr Matt Clark, plant and microbial genomics group leader at TGAC.

Aside from agriculture, there are other potential uses for field-based genetic sequencing too, including as an early warning system against a bio-terrorism attack - the system can be set up to detect the anthrax bacterium - or to check air conditioning systems for Legionnaires' disease.

The key to all these use cases is being able to sequence the DNA, perform the analysis and communicate results quickly, eliminating the need to transport samples back to the lab, where results can typically take hours or days to come in.

So what's in the box?

A key component is a novel USB-powered DNA sequencer called the MinION (pictured), which was developed by Oxford Nanopore and is capable of sequencing individual molecules. This takes in samples from the air (or liquids) and breaks down long strands of DNA into short sections and sequences them to create "reads". Then comes the job of distinguishing between the organisms in the sample.

"If you're sequencing the biological content of the air there are potentially thousands of different organisms in there," Clark says, illustrating why analysing metagenomic samples is no simple task.

How the system separates out genetic material of interest is beyond the scope of this article, but the MinION allows reads to be produced in real-time, which is key to its usefulness as a monitoring device.

"The nanopore sequencer offers benefits over the other previous sequencing technologies in that it has a streaming ability - you get a sequence as it sequences - whereas with other technologies you'd start them running and then you'd wait a day or maybe 10 days," Clark says.

Reads from the sequencer are streamed to an on-board computer, where they are divided by software into small sequences called k-mers. The scientists call on the Raspberry Pi for this duty as it is compact and inexpensive, with low power demands. Leggett, who has been experimenting with the Pi for bioinformatics for some time, says that it runs a tweaked version of TGAC's Kontaminant software used to screen, filter and analyse the streamed output and can do so "very quickly, a lot quicker than the sequencer can generate reads", although more RAM would be nice, he adds.
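
To make the k-mer step concrete, here is a minimal Python sketch of splitting a read into overlapping k-mers. It is an illustration only, not TGAC's Kontaminant code: the function name kmers, the example read and the choice of k = 3 are all assumptions made for the example.

```python
def kmers(read: str, k: int) -> list:
    """Return every overlapping substring of length k found in the read."""
    read = read.upper()
    # A read shorter than k yields an empty list, because the range is empty.
    return [read[i:i + k] for i in range(len(read) - k + 1)]


print(kmers("ACGTAC", 3))  # ['ACG', 'CGT', 'GTA', 'TAC']
```

A read of length n produces n - k + 1 k-mers, so the work grows only linearly with the amount of sequence, which is one reason a task like this is plausible on a low-powered board.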

The Pi holds k-mer libraries containing unique identifiers for organisms of interest, be they anthrax, MRSA or wheat pathogens. Kontaminant matches the results of processed sequencer reads with the contents of these libraries so that pathogens can be identified without having to refer to an external database.
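
The library lookup described above can be sketched roughly as a set intersection. This is a toy illustration in Python under stated assumptions, not how Kontaminant is actually implemented: the organism names, the library contents, the classify function and the min_hits threshold are all hypothetical.

```python
from typing import Dict, Optional, Set

K = 4  # assumed k-mer length for this toy example

# Hypothetical libraries: organism name -> k-mers treated as unique to it.
LIBRARIES: Dict[str, Set[str]] = {
    "organism_A": {"ACGT", "CGTA", "GTAC"},
    "organism_B": {"TTGA", "TGAC", "GACC"},
}


def classify(read: str, libraries: Dict[str, Set[str]], k: int,
             min_hits: int = 2) -> Optional[str]:
    """Return the organism sharing the most k-mers with the read,
    or None if no library reaches min_hits shared k-mers."""
    read_kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
    hits = {name: len(read_kmers & lib) for name, lib in libraries.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] >= min_hits else None


print(classify("ACGTAC", LIBRARIES, K))    # organism_A
print(classify("TTTTTTTT", LIBRARIES, K))  # None
```

Because the libraries sit in memory on the device, each lookup is a local set operation rather than a query to a remote database.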

An alternative approach would be to do the processing in the cloud, but this would involve transmitting large volumes of raw data over potentially unreliable wireless networks. Because the analysis is done on board the device, only the results need to be transmitted, Leggett explains, and mobile GSM networks are best suited for this purpose.

"[The boxes] might be in the African bush or in a field in Norfolk, but even if you are guaranteed 4G that's still a lot of data you'd be sending into the cloud so it's better to analyse it locally," he says.

The scientists are also working on an alarm system whereby individuals are sent a text message as soon as an organism has been identified.
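
As a rough illustration of how little needs to be sent, the Python sketch below packs a detection into a short string that fits in a single text message. It is not the researchers' alarm system: the organism codes, the box identifier, the message layout and the build_alert function are all invented for the example. The 160-character figure is the standard limit for a single GSM text message in the default 7-bit alphabet.

```python
# Hypothetical short codes for organisms of interest; not a real coding scheme.
ORGANISM_CODES = {"anthrax": "BA1", "wheat_pathogen": "WP1"}

SMS_LIMIT = 160  # characters in one GSM text message (default 7-bit alphabet)


def build_alert(box_id: str, organism: str, matched_reads: int) -> str:
    """Pack a detection into a short text such as 'BOX07:BA1:42'."""
    message = f"{box_id}:{ORGANISM_CODES[organism]}:{matched_reads}"
    if len(message) > SMS_LIMIT:
        raise ValueError("alert does not fit in a single text message")
    return message


print(build_alert("BOX07", "anthrax", 42))  # BOX07:BA1:42
```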

"That doesn't require bandwidth, it just sends a code of what the organism is. That's a kind of thing that you can do even with really poor reception," says Clark.

So far the system has proved itself in the lab environment and field trials are now under way.

The CSI effect

"I was talking to the writer PD James when this technology was first arriving and she said, 'Well, if it only takes 10 minutes for DNA results to arrive you're going to make crime novels very short'," says Dr Paul Debenham, director of innovation and development at forensics service provider LGC.

Debenham was instrumental in the development of a portable DNA testing kit for use by police at crime scenes to reduce the time it takes to get results. Called ParaDNA, it is currently being tested by a number of police forces.

DNA testing has been used by the police to identify suspects and victims of crime for more than 20 years and the technology, methodology and legal framework surrounding it are well established. However, the police face a number of difficulties in collecting usable samples at the scene of a crime and inefficiencies arise due to the high probability that the material is contaminated beyond use.

ParaDNA (pictured) enables relatively untrained officers to test whether a swabbed sample contains human DNA or not and also perform some preliminary analyses, such as determining the person's gender and possibly other factors too.

"For a long time it has been very much laboratory-based technology, requiring very sophisticated methods to extract the genetic material, very sophisticated high-power instruments and quite complex analytical methods to get the DNA code out," Debenham says, after which the results have to be sent to the national DNA database for matching, which can take several days.

There has long been a frustration on the part of the police about this waiting time - the "CSI effect" Debenham calls it - and the ParaDNA analytics-in-a-box device is designed to go some way to addressing that, providing rapid results in the field. Rather than miniaturising the hardware, as with the MinION, innovations in the chemistry have allowed DNA analysis to be performed in a machine the size of a briefcase.

"We decided to keep with the conventional process but bypass half the steps. We found a new way of directly sampling the cells from a blood spot or saliva or semen straight into the test tube and then stick it into a mini-machine to run and give the result," he said. "We can now do the chemistry side of DNA analysis very rapidly so there is no need to build on our instrumentation."

Once in the test tubes, the samples are treated with fluorescent dyes that bond to certain genes that are characteristic of human beings. The mixture is then heated and cooled within the machine and the dyes, if they are indeed bound to the DNA, will fluoresce in a characteristic pattern at different temperatures. The fluorescence pattern is then picked up and analysed by on-board software. Within 75 minutes it is possible to determine whether human DNA is present, the gender of the person it came from and the likelihood of generating a usable DNA profile from that sample for more in-depth testing in the lab.
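
One common way of reading a fluorescence-versus-temperature curve is to look for the temperature at which the signal falls fastest. The Python sketch below illustrates that general idea only. It is not LGC's on-board software, and the function name, the sample readings and the "steepest drop" rule are assumptions made for the sake of the example.

```python
def steepest_drop_temperature(temps, fluorescence):
    """Return the midpoint of the temperature interval in which
    fluorescence falls fastest, or None if it never falls."""
    best_rate, best_temp = 0.0, None
    for i in range(len(temps) - 1):
        # Rate of fluorescence loss per degree across this interval.
        rate = -(fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
        if rate > best_rate:
            best_rate = rate
            best_temp = (temps[i] + temps[i + 1]) / 2
    return best_temp


# Made-up readings: the signal collapses between 60 and 65 degrees.
temps = [50, 55, 60, 65, 70]
signal = [100, 98, 95, 40, 38]
print(steepest_drop_temperature(temps, signal))  # 62.5
```

Comparing where such drops occur against the temperatures expected for a given target is one way software could turn a raw curve into a yes-or-no answer.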

Since a crime scene sample might be a swab of a "smudged finger print or a few skin cells scratched off a door knob", a lot of police time can potentially be saved by screening the samples on site as well as being able to eliminate potential suspects or victims from enquiries.

However, the "old technology" of full testing in a lab is so well established that, while innovations like ParaDNA might help speed up the process, the crime novel is probably safe for now.

Horses for courses

Both groups of scientists are also researching other areas in which portable genetic testing may be applied, many of which are in medicine.

"The obvious application is human diagnostics," says Debenham. "If you give a sample you have to get it sent off to a lab and you come back two weeks later to get the result. It doesn't need to be that way. Just as a policeman does not have to be skilled [to use ParaDNA], equally nurses can be quickly trained to take a sample from you, put it in a little box and get a result in less than an hour."

Diagnosis of fast-acting and dangerous diseases such as meningitis would be one such use case. Another is the prescreening of volunteers for testing medicines for side effects.

In a hospital, the early diagnosis of infection by antibiotic-resistant bacteria, such as MRSA, could mean that effective treatment begins earlier.

"A doctor may prescribe you with one of their backup antibiotics, one they hardly ever use, right from the very beginning, not just two days later when you're getting much worse. They will know exactly which one to give you straight away," says Clark, adding that micropore sequencers are already used in the production of antibiotics.

Then there is the use of genetic analysis for dose control. Warfarin is a widely prescribed blood anticoagulant, but as well as factors such as the patient's age and weight, genetics come into play in how the drug is metabolised.

"It's a very powerful drug," says Debenham. "Too much you can bleed to death, not enough then the person is at risk of clotting. It's a fine therapeutic range. Three genetic variations influence the level of dose you should have."

Away from medicine, LGC was contracted to test supermarket meals in the recent horsemeat scandal ("stick in a probe and you can quickly tell if there is beef, or pork or horse present") as well as looking for traces of peanuts in food production lines. In fact, the technology could be applied anywhere that quick investigations into specific genes are needed.

"Within the constraints of efficacy and cost-benefit there are any number of possible uses," Debenham says.