Bots using MIT's neural network can teach themselves how to manipulate objects they've never seen

The system learns just from seeing the objects

Robotic vision is key to taking process automation out of factories and into homes. The technology is already used on assembly lines, where it lets robots ‘see' the items they're working with and perform a specific, pre-programmed movement. Outside those tightly controlled settings, however, its capabilities are far more limited. That's where new work by MIT comes in.

Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have presented a paper describing a new computer vision system called Dense Object Nets (DON), which lets robots visually understand the objects they have to interact with.

DON treats an object as a collection of visual data points, each assigned a coordinate, or descriptor. The system stitches these descriptors together into a larger coordinate map, which the robot uses to ‘see' the object's shape and how it relates to the surrounding environment.
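To make the idea of per-pixel coordinates concrete, here is a minimal sketch of a network that assigns every pixel of an RGB image a small descriptor vector. It assumes a PyTorch-style fully-convolutional model; the class name, layer sizes and descriptor dimension are illustrative placeholders, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class DenseDescriptorNet(nn.Module):
    """Illustrative sketch: map every pixel of an RGB image to a
    D-dimensional descriptor vector (a 'coordinate' in descriptor space).
    The real system builds on a larger pretrained backbone; this tiny
    conv stack only shows the input/output shapes involved."""

    def __init__(self, descriptor_dim: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, descriptor_dim, kernel_size=1),  # one descriptor per pixel
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (batch, 3, H, W) -> descriptors: (batch, D, H, W)
        return self.features(image)

# Usage: every pixel now carries a descriptor the robot can refer back to.
net = DenseDescriptorNet(descriptor_dim=3)
descriptors = net(torch.rand(1, 3, 480, 640))
print(descriptors.shape)  # torch.Size([1, 3, 480, 640])
```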

According to the researchers, an object's descriptors take about 20 minutes to learn and are task-agnostic, meaning they aren't tied to any single manipulation task. They also work for both rigid and non-rigid objects, from shoes to soft toys.

PhD student Lucas Manuelli, a lead author on the paper, wrote in a blog post: "Many approaches to manipulation can't identify specific parts of an object across the many orientations that object may encounter. For example, existing algorithms would be unable to grasp a mug by its handle, especially if the mug could be in multiple orientations, like upright, or on its side."

As Manuelli notes, the descriptors remain consistent even when the object's orientation changes, and even when the object itself is swapped for another: a shoe of a different size, shape, texture or colour would still be recognised as a shoe.
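This consistency is what makes the mug-handle example work: a point chosen once (say, on the handle) can be found again in a new image by looking for the pixel whose descriptor is closest to the reference. The sketch below shows that nearest-neighbour lookup with NumPy; the function name and the stand-in arrays are hypothetical.

```python
from typing import Tuple
import numpy as np

def find_best_match(reference_descriptor: np.ndarray,
                    descriptor_image: np.ndarray) -> Tuple[int, int]:
    """Find the pixel in a new view whose descriptor is closest to a
    reference descriptor (e.g. one previously marked on a mug handle).

    reference_descriptor: (D,) vector for the target point
    descriptor_image:     (H, W, D) dense descriptors for the new image
    Returns the (row, col) of the best-matching pixel.
    """
    # Euclidean distance from the reference to every pixel's descriptor.
    distances = np.linalg.norm(descriptor_image - reference_descriptor, axis=-1)
    row, col = np.unravel_index(np.argmin(distances), distances.shape)
    return int(row), int(col)

# Hypothetical usage: the same handle point is located again even though
# the mug is now lying on its side in the new image.
new_view = np.random.rand(480, 640, 3)         # stand-in for the network's output
handle_descriptor = np.array([0.2, 0.7, 0.1])  # stand-in reference descriptor
print(find_best_match(handle_descriptor, new_view))
```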

"In factories, robots often need complex part feeders to work reliably," Manuelli said. "But a system like this that can understand objects' orientations could just take a picture and be able to grasp and adjust the object accordingly."

The system uses an RGB-D sensor, which captures both colour and depth, and can train itself. Rather than feeding the network thousands of labelled images of hats, you can simply place a robot running the system in a room with a hat for a while. It will take photos from different angles and generate the coordinate points it needs to recognise other hats in the future.
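Self-supervision of this kind is typically driven by a contrastive objective: pixels known (from depth and camera pose) to show the same 3D point should get similar descriptors, while unrelated pixels are pushed apart. The sketch below illustrates that general idea, assuming PyTorch; the function name, tensor shapes and margin value are illustrative rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def pixelwise_contrastive_loss(desc_a: torch.Tensor,
                               desc_b: torch.Tensor,
                               matches_a: torch.Tensor,
                               matches_b: torch.Tensor,
                               non_matches_a: torch.Tensor,
                               non_matches_b: torch.Tensor,
                               margin: float = 0.5) -> torch.Tensor:
    """Sketch of a pixelwise contrastive loss for self-supervised training.

    desc_a, desc_b: (N, D) descriptors sampled from two views of the same
                    scene (flattened from the dense descriptor images).
    matches_*:      index tensors for pixel pairs known to be the same
                    3D point (derived from depth images and camera poses).
    non_matches_*:  index tensors for pixel pairs known to differ.
    """
    # Pull descriptors of corresponding pixels together...
    match_dist = (desc_a[matches_a] - desc_b[matches_b]).norm(dim=1)
    match_loss = (match_dist ** 2).mean()

    # ...and push descriptors of non-corresponding pixels at least
    # `margin` apart (hinge: no penalty once they are far enough).
    non_match_dist = (desc_a[non_matches_a] - desc_b[non_matches_b]).norm(dim=1)
    non_match_loss = (F.relu(margin - non_match_dist) ** 2).mean()

    return match_loss + non_match_loss
```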

Although the system is promising, the technology is still in its early stages. The team will present their findings at the Conference on Robot Learning in Zürich next month.