It’s easy to watch a baby finally learn to walk after hours upon hours of trial and error and think, OK, good work, but do you want a medal or something? Well, maybe only a childless person like me would think that, so credit where credit is due: It’s supremely difficult for animals like ourselves to manage something as everyday as putting one foot in front of the other.
It’s even more difficult to get robots to do the same. It used to be that to make a machine walk, you either had to hard-code every command or build the robot a simulated world in which to learn. But lately, researchers have been experimenting with a novel way to go about things: Make robots teach themselves how to walk through trial and error, like babies, navigating the real world.
Researchers at UC Berkeley and Google Brain just took a big step (sorry) toward that future with a quadrupedal robot that taught itself to walk in a mere two hours. It was a bit ungainly at first, but it essentially invented walking on its own. Not only that, the researchers could then introduce the machine to new environments, like inclines and obstacles, and it adapted with ease. The results are as awkward as they are magical, but they could lead to machines that explore the world without us having to coddle them.
The secret ingredient here is a technique called maximum-entropy reinforcement learning. Entropy in this context means randomness—lots of it. The researchers give the robot a digital reward for doing something random that ends up working well. So in this case, the robot is rewarded for achieving forward velocity, meaning it’s trying new things and inching forward bit by bit. (A motion-capture system in the lab calculated the robot’s progress.)
Problem, though: “The best way to maximize this reward initially is just to dive forward,” says UC Berkeley computer scientist Tuomas Haarnoja, lead author on a new preprint paper detailing the system. “So we need to penalize for that kind of behavior, because it would make the robot immediately fall.”
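To make the idea concrete, here is a minimal Python sketch of such a reward. It is an illustration under stated assumptions, not the researchers' actual formula: the function names, the pitch threshold, the penalty size, and the temperature value are all hypothetical. Only three things come from the description above: forward velocity is rewarded, diving forward is penalized, and the maximum-entropy approach adds a bonus for randomness.

```python
import math

def shaped_reward(forward_velocity, pitch_rad, pitch_limit=0.5, fall_penalty=1.0):
    """Reward forward speed, minus a penalty when the body tips too far.

    pitch_limit and fall_penalty are made-up illustrative values.
    """
    tipping = abs(pitch_rad) > pitch_limit
    return forward_velocity - (fall_penalty if tipping else 0.0)

def gaussian_entropy(std):
    """Differential entropy of a one-dimensional Gaussian action distribution."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std ** 2)

def max_entropy_objective(reward, action_std, temperature=0.2):
    """Per-step objective: task reward plus a temperature-weighted entropy bonus."""
    return reward + temperature * gaussian_entropy(action_std)
```

With these made-up numbers, a robot moving at 0.3 meters per second while upright scores 0.3, while the same speed achieved by pitching over scores below zero. For the same task reward, a policy with more random actions (a larger `action_std`) gets a higher objective, which is what pushes the robot to keep trying new things.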
Another problem: When researchers want a robot to learn, they typically run this reinforcement learning process in simulation first. The digital environment approximates the physics and materials of the real world, allowing a robot’s software to rapidly conduct numerous trials using powerful computers.
Researchers use “hyperparameters,” settings chosen before training begins, to get the algorithm to work with a particular kind of simulated environment. “We just need to try different variations of these hyperparameters and then pick the one that actually works,” says Haarnoja. “But now that we are dealing with the real-world system, we cannot afford testing too many different settings for these hyperparameters.” The advance here is that Haarnoja and his colleagues have developed a way to automatically tune hyperparameters. “That makes experimenting in the real world much more feasible.”
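The article does not say which hyperparameter gets tuned. In maximum-entropy methods such as soft actor-critic, a common candidate is the entropy "temperature," the weight that sets how much randomness is worth relative to reward. The Python sketch below assumes that setup and shows one gradient step that nudges the temperature so the policy's entropy stays near a target. The function name, learning rate, and target value are hypothetical, and this is not presented as the team's implementation.

```python
import math

def update_log_temperature(log_alpha, log_probs, target_entropy, lr=0.01):
    """One gradient-descent step on J(alpha) = mean(-alpha * (log_prob + target_entropy)).

    log_probs are log-probabilities of actions the policy recently sampled.
    The temperature is stored as log_alpha so that alpha stays positive.
    """
    alpha = math.exp(log_alpha)
    # Average of (log pi(a|s) + target entropy) over the sampled actions.
    gap = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    # dJ/d(log_alpha) = -alpha * gap, so a descent step adds lr * alpha * gap.
    return log_alpha + lr * alpha * gap
```

When the policy has become too predictable (its entropy is below the target), the temperature rises and randomness is rewarded more. When the policy is more random than the target, the temperature falls. That removes one setting a person would otherwise have to find by rerunning the experiment.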
Learning in the real world instead of in a software simulation is much slower: every time the robot fell, Haarnoja had to physically pick it up and reset it, perhaps 300 times over the course of the two-hour training session. Annoying, yes, but not as annoying as trying to take what you’ve learned in a simulation (which is an imperfect approximation of the real world) and get it to work nicely in a physical robot.
Also, when researchers train the robot in simulation first, they’re explicit about what that digital environment looks like. The physical world, on the other hand, is much less predictable. So by training the robot in the real, if controlled, setting of a lab, Haarnoja and his colleagues made the machine more robust to variations in the environment.
Plus, this robot had to deal with small perturbations during its training. “We have a cable connected to the batteries, and sometimes the cable goes under the legs, and sometimes when I manually reset the robot I don't do it properly,” says Haarnoja. “So it learns from those perturbations as well.” Even though training in simulation comes with great speed, it can’t match the randomness of the real world. And if we want our robots to adapt to our homes and streets on their own, they’ll have to be flexible.
“I like this work because it convincingly shows that deep reinforcement learning approaches can be employed on a real robot,” says OpenAI engineer Matthias Plappert, who has designed a robotic hand to teach itself to manipulate objects. “It's also impressive that their method generalizes so well to previously unseen terrains, even though it was only trained on flat terrain.”
“That being said,” he adds, “learning on the physical robot still comes with many challenges. For more complex problems, two hours of training will likely not be enough.” Another hurdle is that training robots in the real world means they can hurt themselves, so researchers have to proceed cautiously.
Still, training in the real world is a powerful way to get robots to adapt to uncertainty. This is a radical departure from something like a factory robot, a brute that follows a set of commands and works in isolation so as not to fling its human coworkers across the room. Out in the diverse and unpredictable environments beyond the factory, though, the machines will have to find their own way.
“If you want to send a robot to Mars, what will it face?” asks University of Oslo roboticist Tønnes Nygaard, whose own quadrupedal robot learned to walk by “evolving.” “We know some of it, but you can't really know everything. And even if you did, you don't want to sit down and hard-code every way to act in response to each.”
So, baby steps … into space!