What AlphaGo Can Teach Us About How People Learn

David Silver is responsible for several eye-catching demonstrations of artificial intelligence in recent years, working on advances that helped revive interest in the field after the last great AI Winter. At DeepMind, a subsidiary of Alphabet, Silver has led the development of techniques that let computers learn for themselves how to solve problems that once seemed intractable. Most famously, this includes AlphaGo, a program revealed in 2016 that taught itself to play the ancient board game Go to a grandmaster level. Go is too subtle and instinctive to be tamed using conventional programming, but AlphaGo learned to play through practice and positive reward—an AI technique known as “reinforcement learning.”
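The core loop of “practice and positive reward” can be seen in miniature in tabular Q-learning, one of the simplest reinforcement-learning algorithms. The sketch below is an illustration of that general idea only, not AlphaGo’s actual method (which combines deep neural networks with tree search); the toy “walk” environment and all names in it are invented for this example.

```python
import random

# Toy 1-D "walk" environment: states 0..4, start in the middle,
# reaching state 4 pays +1; states 0 and 4 end the episode.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # step left or step right

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == GOAL else 0.0
    done = nxt in (0, GOAL)
    return nxt, reward, done

# Tabular Q-learning: learn action values purely from reward feedback.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.3

random.seed(0)
for _ in range(2000):  # episodes of practice
    s, done = 2, False
    while not done:
        # epsilon-greedy: mostly exploit what was learned, sometimes explore
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        nxt, r, done = step(s, a)
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = nxt

# After practice, the greedy policy walks right, toward the reward.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(1, GOAL)}
print(policy)  # -> {1: 1, 2: 1, 3: 1}
```

Nothing here encodes “go right is good”; that preference emerges entirely from the reward signal, which is the sense in which such systems teach themselves.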
In 2018, Silver and colleagues developed a more general version of the program, called AlphaZero, capable of learning to play chess and shogi at an expert level as well as Go. Then, in November 2019, DeepMind released details of MuZero, a version that learns to play these and other games—but crucially, without needing to know the rules beforehand. Silver met with senior writer Will Knight over Zoom from London to discuss MuZero, reinforcement learning, and the secret to making further progress in AI. This transcript has been edited for length and clarity.
WIRED: Your MuZero work is published in the journal Nature today. For the uninitiated, tell us why it is important.

David Silver: The big step forward with MuZero is that we don't tell it the dynamics of the environment; it has to figure that out for itself in a way that still lets it plan ahead and figure out what's going to be the most effective strategy. We want to have algorithms that work in the real world, and the real world is complicated and messy and unknown. So you can't just look ahead, like in a game of chess. You have to learn how the world works.
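The distinction Silver draws can be sketched in a few lines: instead of planning with the true rules, the agent builds a model from transitions it has experienced, then plans inside that learned model. This is a drastically simplified, hypothetical illustration of the idea only—real MuZero learns a neural-network model over hidden states and plans with Monte Carlo tree search—and the toy chain environment here is invented for the example.

```python
import itertools

# True environment (hidden from the planner): a short chain where
# action 1 advances toward a reward at state 3.
def true_step(state, action):
    nxt = min(state + action, 3)
    return nxt, (1.0 if nxt == 3 else 0.0)

# 1) Learn a model from experience: here, just a lookup table of
#    observed (state, action) -> (next_state, reward) transitions.
model = {}
for s in range(4):
    for a in (0, 1):
        model[(s, a)] = true_step(s, a)  # one "experienced" transition

# 2) Plan using only the learned model: simulate action sequences and
#    pick the best, never consulting the true rules again.
def plan(state, horizon=3):
    best_seq, best_return = None, float("-inf")
    for seq in itertools.product((0, 1), repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s, r = model[(s, a)]  # rollout inside the learned model
            total += r
        if total > best_return:
            best_seq, best_return = seq, total
    return best_seq

print(plan(0))  # -> (1, 1, 1): advance three times to reach the reward
```

The planner never sees `true_step` at planning time; everything it knows about how the world works comes from its own recorded experience, which is the property MuZero scales up to games and beyond.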

Some observers point out that MuZero, AlphaGo, and AlphaZero don’t really start from scratch. They use algorithms crafted by clever humans to learn how to perform a particular task. Does this miss the point?

I think it does, actually. You never truly have a blank slate. There's even a theorem in machine learning—the no-free-lunch theorem—that says you have to start with something or you don't get anywhere. But in this case, the slate is as blank as it gets. We're providing it with a neural network, and the neural network has to figure out for itself, just from the feedback of the wins and losses in games or the score, how to understand the world.

One thing people picked up on is that we tell MuZero the legal moves in each situation. But if you take reinforcement learning, which is all about trying to solve problems in situations where the world is unknown, it's normally assumed that you're told what you can do. You have to tell the agent what choices it has available, and then it takes one of them.
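The convention Silver describes—the environment reports which actions are legal, and the agent merely chooses among them—is the standard reinforcement-learning interface. Below is a minimal, hypothetical sketch of that split (all class and method names are invented for illustration; this is not DeepMind code).

```python
import random

# The environment owns the rules: it knows which moves are legal.
class TicTacToeLikeEnv:
    def __init__(self):
        self.board = [None] * 9

    def legal_actions(self):
        # The agent is *told* its available choices...
        return [i for i, cell in enumerate(self.board) if cell is None]

    def step(self, action):
        assert action in self.legal_actions()
        self.board[action] = "X"

# The agent only chooses among the offered actions; learning which
# choice is *good* is the hard part reinforcement learning addresses.
class RandomAgent:
    def act(self, legal):
        return random.choice(legal)

env, agent = TicTacToeLikeEnv(), RandomAgent()
move = agent.act(env.legal_actions())
env.step(move)
print(move, env.legal_actions())
```

Telling the agent its legal moves, in other words, is part of the standard problem setup rather than a shortcut specific to MuZero.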

You might critique what we've done with it so far. The real world is massively complex, and we haven't built something that is like a human brain that can adapt to all these things. So that's a fair critique. But I think MuZero really is discovering for itself how to build a model and understand the world just from first principles.

DeepMind recently announced that it had used the technology behind AlphaZero to solve an important practical problem—predicting the shape that a protein will fold into. Where do you think MuZero will have its first big impact?