The experiment took place entirely in simulation, as so much robot training does these days. In a digital environment, a robot undergoes a supercharged form of trial and error called reinforcement learning. The environment simulates variables like friction, and a robotic arm tries to grasp an object over and over using different grips. If it stumbles on a good grip, the system tallies that as a victory; if it does something stupid, the system counts that as a defeat. Over many attempts, the robot learns what constitutes a robust grasp.
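For a rough feel of that win-and-loss tallying, here is a minimal Python sketch. It is a toy, not the researchers' system: the function names, the per-grip success probabilities, and the "mostly repeat what has worked, occasionally try something random" rule are all assumptions made for illustration, and real grasp learning uses far richer simulators and models.

```python
import random


def simulate_grasp(grip_quality, rng):
    """Hypothetical simulator: the grasp succeeds with probability grip_quality."""
    return rng.random() < grip_quality


def learn_grasp(grip_qualities, attempts=3000, explore=0.1, seed=0):
    """Try grips over and over, tallying victories and defeats for each one.

    grip_qualities holds made-up success probabilities that the learner
    never sees directly; it only observes whether each attempt worked.
    """
    rng = random.Random(seed)
    n = len(grip_qualities)
    wins = [0] * n
    tries = [0] * n

    def win_rate(g):
        return wins[g] / tries[g] if tries[g] else 0.0

    for _ in range(attempts):
        if rng.random() < explore:
            grip = rng.randrange(n)            # occasionally try a random grip
        else:
            grip = max(range(n), key=win_rate)  # otherwise reuse the best so far
        tries[grip] += 1
        if simulate_grasp(grip_qualities[grip], rng):
            wins[grip] += 1                     # tally a victory

    rates = [win_rate(g) for g in range(n)]
    best = max(range(n), key=lambda g: rates[g])
    return best, rates


if __name__ == "__main__":
    best, rates = learn_grasp([0.05, 0.9, 0.2])
    print("best grip index:", best)
    print("observed success rates:", [round(r, 2) for r in rates])
```

The learner is never told which grip is good. It only sees the running tally, which is the essence of the trial-and-error loop described above.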
But while tech giants like Google and Amazon and Facebook have pushed major advances in the development of AI in purely digital contexts—getting computers to recognize objects in images, for example, by having humans label those objects first—robots have remained fairly dumb as researchers have focused on getting the things to move without falling on their faces.
But in comes a so-called adversarial human actor, a sort of additional signal. If the robot finds a good grasp, the human uses a graphical interface to click on the object it’s gripping and apply a force in a certain direction. That disturbance basically tests how good the grasp really is, and helps the robot rule out the less effective ones.
“The robot learned to grasp objects much more robustly using this additional signal that the human was providing, but also learned to generalize to new objects much better,” says USC roboticist Stefanos Nikolaidis, coauthor on a new paper describing the work. To put a number on it, when a human was giving the robot tough love, the machine had a 52 percent success rate at grasping, compared to 26.5 percent without the tough love.
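The idea of a disturbance weeding out weak grasps can be sketched in a few lines of Python. This is an illustration under loud assumptions, not the method from the paper: the `Grasp` class, the four push directions, the made-up resistance numbers, and the rule that the adversary always pushes where the grasp is weakest are all hypothetical stand-ins for a human clicking in a graphical interface.

```python
from dataclasses import dataclass

# Hypothetical set of directions the adversary may push in.
DIRECTIONS = ("up", "down", "left", "right")


@dataclass
class Grasp:
    name: str
    lifts_object: bool   # did the grasp work before any disturbance?
    resistance: dict     # direction -> force it withstands (made-up units)


def adversary_pick(grasp):
    """Stand-in for the human: push in the direction the grasp resists least."""
    return min(DIRECTIONS, key=lambda d: grasp.resistance[d])


def score(grasp, push_force=1.0):
    """Return 1.0 only if the grasp lifts the object and survives the push."""
    if not grasp.lifts_object:
        return 0.0
    direction = adversary_pick(grasp)
    survived = grasp.resistance[direction] >= push_force
    return 1.0 if survived else 0.0


def robust_grasps(grasps, push_force=1.0):
    """Keep only the grasps that still count as wins after the disturbance."""
    return [g.name for g in grasps if score(g, push_force) == 1.0]


if __name__ == "__main__":
    candidates = [
        Grasp("pinch", True, {"up": 2.0, "down": 2.0, "left": 0.2, "right": 2.0}),
        Grasp("wrap", True, {"up": 1.5, "down": 1.5, "left": 1.5, "right": 1.5}),
        Grasp("miss", False, {"up": 0.0, "down": 0.0, "left": 0.0, "right": 0.0}),
    ]
    print(robust_grasps(candidates))  # → ['wrap']
```

Without the push, both "pinch" and "wrap" would be tallied as wins. With it, only the grasp that holds up in every direction survives, which is the kind of extra signal the adversary supplies.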
Now, some critical caveats here. First, a simulation is a necessarily imperfect model of the real world: there’s no way to fully replicate all the physics and uncertainty of meatspace (or metalspace, in this case). So porting what a robot learns in simulation into a physical robotic arm is still very difficult, a challenge known as the reality gap. Second, this wasn’t willy-nilly tough love, as the human participants were working with certain rules and constraints.