But you’ll never stop The Simpsons. After a few months, Shearer relented and signed a new deal. The show often jokes about the replaceability of voice actors in animation, but as it pushes through its fourth decade, it’s the iconic voices behind the laughter that could pose the biggest threat to its continued existence. The actors who play Springfield’s residents are approaching retirement age—they’re mostly in their sixties or seventies; Shearer is 77—and they might soon decide they don’t want to do it anymore. They certainly don’t need the money—between fees for new episodes and residuals from repeats of old ones, they’re sitting on tens of millions of dollars.

But maybe the producers of the show don’t actually need voice actors anymore. In a recent episode, Edna Krabappel—Bart’s long-suffering teacher, whose character was retired from the show after the death of voice actor Marcia Wallace in 2013—was brought back for a final farewell using recordings made for previous episodes.
Advances in computing power mean that you could extend that principle to any character. Deepfake technology can produce convincing soundalikes from a limited amount of training data, and the producers of the show have 30 years’ worth of audio to work from. So could The Simpsons replace its voice cast with an AI?

“You could certainly come up with an episode of The Simpsons that is voiced by the characters in a believable way,” says Tim McSmythurs, a Canada-based AI researcher and media producer who has built a speech model that can be trained to mimic anyone’s voice. “Whether that would be as entertaining is another question.”
On his YouTube channel, Speaking of AI, McSmythurs recasts an iconic scene from Notting Hill with Homer playing the Julia Roberts role; Donald Trump stands in for Ralph Wiggum, and Joe Biden ties an onion to his belt, which was the style at the time.

McSmythurs built a generic AI model that can turn any text into audio speech in English. When he wants to make a new voice, he tunes the model further with two or three hours of audio of that particular person speaking, along with a text transcript. “It focuses in on what makes a Homer voice a Homer voice, and the different frequencies,” he says.

After that, it’s a matter of asking the model to generate multiple takes—each one will vary slightly—and choosing the best one for your purposes. The outputs are recognizably Homer, but they sound a little emotionally flat, as if he’s reading out something whose meaning he doesn’t really understand. “It does depend on the training data,” McSmythurs says. “If the model hasn’t been exposed to those quite wide ranges of emotion, it can’t create it from scratch. So it doesn’t sound as energetic as Homer might.”
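The generate-several-takes-and-pick-the-best step McSmythurs describes can be sketched in miniature. The snippet below is a toy illustration, not his actual pipeline: `generate_take`, `energy`, and `best_take` are invented stand-ins, and pseudo-random numbers stand in for audio. It captures only the shape of the workflow—produce several slightly different takes of the same line, score each one, and keep the best.

```python
import random


def generate_take(text, seed):
    """Stand-in for a fine-tuned text-to-speech model.

    A real model would return audio; this toy version returns one
    pseudo-random "prosody" value per word. Different seeds give
    different outputs, the way successive takes from a sampling-based
    model vary slightly from one another."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, 1.0) for _ in text.split()]


def energy(take):
    """Toy scoring function: mean prosody value as a proxy for how
    'energetic' a take sounds. A real pipeline might score takes by
    ear or with a learned quality metric."""
    return sum(take) / len(take)


def best_take(text, n_takes=5):
    """Generate several takes of the same line and keep the liveliest."""
    takes = [generate_take(text, seed) for seed in range(n_takes)]
    return max(takes, key=energy)


line = "Why you little..."
chosen = best_take(line)
```

Because each take is seeded deterministically here, the selection is reproducible; in a real sampling-based model the variation comes from the sampling itself, and a human typically makes the final pick.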