Deep learning is inferior to scripting for teaching robots

But deep learning pulls ahead for complex tasks

An artificial intelligence start-up called Kindred has published a study of machine learning for robotics in real-world settings, rather than the simulated environments that companies have typically relied on.

Presented at the Conference on Robot Learning in Zürich, ‘Benchmarking Reinforcement Learning Algorithms on Real-World Robots' states:

‘Model-free reinforcement learning has emerged as a promising approach to solving continuous control robotic tasks… To carry forward these successes to real-world applications, it is crucial to withhold utilising the unique advantages of simulations that do not transfer to the real world, and experiment directly with physical robots.'

Reinforcement learning (RL) is a technique for training AI models by ‘rewarding' them (assigning a one instead of a zero, for example) when they take the correct action. Researchers have used it to teach systems to drive cars and play games, and have even applied it in medicine.
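
As a rough illustration of that reward signal, here is a minimal sketch in which an agent receives a one for the correct action and a zero otherwise, and gradually shifts towards the action that pays off. The two-action task and all the names are illustrative, not drawn from the Kindred study.

    import random

    CORRECT_ACTION = 1     # hypothetical: action 1 is the 'right' one
    values = [0.0, 0.0]    # the agent's estimate of each action's worth
    alpha = 0.1            # learning rate
    epsilon = 0.2          # exploration probability

    for step in range(1000):
        # Explore occasionally; otherwise pick the action believed best.
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = max(range(2), key=lambda a: values[a])

        # The 'reward': a one for the correct action, a zero otherwise.
        reward = 1.0 if action == CORRECT_ACTION else 0.0

        # Nudge the chosen action's estimate towards the observed reward.
        values[action] += alpha * (reward - values[action])

    print(values)  # the correct action's estimate approaches 1.0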

However, the use of RL with real, physical robots has been held back by a lack of benchmark tasks and supporting source code. The report authors (A. Rupam Mahmood, Dmytro Korenkevych, Gautham Vasan, William Ma and James Bergstra) aimed to change that by taking three commercially available robots and using off-the-shelf implementations of four RL algorithms to teach them to move.

The authors used a robotic arm (Universal Robots' UR5); an actuator used to control various robots (Robotis' Dynamixel MX-64AT); and a hobbyist robot based on the Roomba vacuum cleaner (iRobot Create 2). The algorithms were TRPO, PPO, DDPG and Soft-Q.
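
To make the setup concrete, the sketch below shows the standard agent-environment loop through which an off-the-shelf algorithm such as PPO would drive a benchmark task. The stub environment and random-action agent are placeholders of our own, not the study's code.

    import random

    class StubArmEnv:
        """Toy stand-in for a real-robot task such as reaching with a UR5."""

        def reset(self):
            self.steps = 0
            return [0.0]                      # e.g. joint angles and velocities

        def step(self, action):
            self.steps += 1
            reward = -abs(action)             # e.g. a distance-to-target penalty
            done = self.steps >= 50           # fixed-length episode
            return [action], reward, done, {}

    class RandomAgent:
        """Placeholder for an off-the-shelf learner such as PPO or DDPG."""

        def act(self, obs):
            return random.uniform(-1.0, 1.0)  # e.g. a joint velocity command

        def observe(self, obs, action, reward, next_obs, done):
            pass                              # a real agent would update itself here

    env, agent = StubArmEnv(), RandomAgent()
    obs, total, done = env.reset(), 0.0, False
    while not done:
        action = agent.act(obs)               # a physical robot executes this in real time
        next_obs, reward, done, info = env.step(action)
        agent.observe(obs, action, reward, next_obs, done)
        total += reward
        obs = next_obs
    print(total)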

They conclude that deep learning lags behind the traditional way of training robots with scripts ‘by a large margin in some tasks, where such solutions were well established or easy to script'. However, RL was ‘more competitive' in complex tasks (for instance, docking to a charging station).

The report also shows that the RL algorithms need their hyper-parameters carefully tuned before they are useful for any task, but once tuned, the same configuration was applicable across a range of tasks.

‘Although hyper-parameter optimisation is likely necessary for best performance on a new task, a good configuration [of hyper-parameters] based on one task can still provide a good baseline performance for another,' the authors wrote.
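
In practice, that reuse might look something like the following sketch, in which a configuration tuned on one task becomes the starting baseline for another. The task names and values here are illustrative, not taken from the paper.

    tuned_on_reaching = {
        'learning_rate': 3e-4,
        'batch_size': 2048,
        'discount': 0.99,
    }

    def configure(task_name, overrides=None):
        """Start each new task from the known-good configuration."""
        config = dict(tuned_on_reaching)  # baseline carried over from the first task
        config.update(overrides or {})    # per-task optimisation still helps
        print(task_name, config)
        return config

    configure('docking')                           # reuse the baseline as-is
    configure('docking', {'learning_rate': 1e-4})  # or refine for best performance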

The study was based on more than 450 experiments performed over 950 hours of robot usage. The report concludes:

‘This study strongly indicates the viability of reinforcement learning research extensively based on real-world experiments, which is essential to understand the difficulties of learning with physical robots and mitigate them to achieve fast and reliable learning performance in dynamic environments. The benchmark tasks and the supporting source code enable the necessary steps for such understanding and easy adoption of physical robots in reinforcement learning research.'