MIT's machine learning model can make cancer treatments less toxic

The system can determine the smallest doses that are still effective at treating tumours

Researchers at MIT have developed a machine learning technique that can reduce the toxicity of the treatment regimens given to patients with glioblastoma, the most aggressive form of brain cancer.

Patients with a glioblastoma tumour, which grows in the brain or spinal cord, are usually given a prognosis of no more than five years, and must be treated with a combination of chemotherapy and radiotherapy.

Doctors administer the treatments in the maximum safe doses, but even these can have harmful side effects like hair loss, nausea and fatigue.

In a new paper, to be presented next week at the 2018 Machine Learning for Healthcare conference at Stanford University, the researchers describe a model that could determine a way to administer drugs more safely.

The MIT model examines the treatment regimens already in use and adjusts the dosage until it finds ‘an optimal treatment plan, with the lowest possible potency and frequency of doses that should still reduce tumour sizes to a degree comparable to that of traditional regimens’.

In a simulated trial of 50 patients, the model maintained the tumour-shrinking potential of the treatment while lowering the potency of nearly all the doses to a quarter or half of the original. In some cases it reduced the frequency of treatments from once a month to twice a year.

The researchers used reinforcement learning, a method in which the model learns to favour behaviour that leads to a desired outcome through a system of rewards and penalties.
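The idea of learning from rewards and penalties can be illustrated with a very small sketch. It is far simpler than the researchers' model and is not taken from their paper: the two actions, their fixed rewards and the `train` function are all hypothetical, and the learner is a basic epsilon-greedy one that keeps a running average of the reward each action has earned.

```python
import random

# Hypothetical toy environment: one action is penalised, the other rewarded.
REWARDS = {"harmful": -1.0, "helpful": 1.0}


def train(steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy learner that tracks the average reward of each action."""
    rng = random.Random(seed)
    values = {action: 0.0 for action in REWARDS}
    counts = {action: 0 for action in REWARDS}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(list(REWARDS))    # explore occasionally
        else:
            action = max(values, key=values.get)  # otherwise pick the best so far
        counts[action] += 1
        # Incremental mean: move the estimate towards the observed reward.
        values[action] += (REWARDS[action] - values[action]) / counts[action]
    return values


values = train()
print(max(values, key=values.get))
```

After being penalised for the harmful action, the learner's estimate for it drops below that of the helpful action, so it settles on the behaviour that is rewarded.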

In this case, the reward system assigned each outcome a positive or negative mark, with its size weighted by factors such as the chance of success. If the model chose to ‘cheat’ by simply giving patients the maximum number and potency of doses, it was marked down, forcing it to choose fewer, smaller treatments.
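A toy sketch can show how such a penalty changes which action scores best. Everything in it is an assumption made for illustration: the dose levels, the response curve, the penalty weight and the function names are hypothetical and do not come from the MIT paper. It is also only a scoring function, not a learning agent.

```python
import math

# Hypothetical dose levels, as fractions of the maximum safe dose.
DOSES = [0.0, 0.25, 0.5, 1.0]


def tumour_shrinkage(dose):
    """Toy response curve with diminishing returns (an assumption, not clinical data)."""
    return 1.0 - math.exp(-4.0 * dose)


def reward(dose, penalty_weight):
    """Score the shrinkage, then mark the action down in proportion to the dose."""
    return tumour_shrinkage(dose) - penalty_weight * dose


def best_dose(penalty_weight):
    """Return the dose level with the highest reward for a given penalty weight."""
    return max(DOSES, key=lambda d: reward(d, penalty_weight))


print(best_dose(0.0))  # no penalty: the maximum dose scores best
print(best_dose(0.5))  # with a penalty: a smaller dose scores best
```

With no penalty, the full dose always wins because it shrinks the tumour most. Once each dose carries a cost, the small extra shrinkage from the full dose no longer justifies it, and the half dose comes out on top in this toy setting.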

"If all we want to do is reduce the mean tumor diameter, and let it take whatever actions it wants, it will administer drugs irresponsibly," said Pratik Shah, a principal investigator. "Instead, we said, ‘We need to reduce the harmful actions it takes to get to that outcome.'"

While traditional reinforcement learning (RL) models work towards only a single outcome (such as keeping an autonomous car on the road or winning a game), the MIT approach produces a model that weighs the potential negative outcomes against the positive results.