Millions of people routinely say “hey” to voice assistants like Siri and Alexa, even though the experience can be frustratingly glitchy. On Tuesday, Google previewed new technology that makes speech recognition strikingly more responsive, suggesting voice control could soon be seamless enough to be irresistible.
Tom Simonite covers artificial intelligence for WIRED.
At its annual developer conference in Mountain View, Google boasted of shrinking its speech recognition software to 1/25th of its prior size. CEO Sundar Pichai called that a “milestone” because it means software that traditionally lives in Google’s cloud servers can be installed in Pixel smartphones Google will launch later this year, allowing the devices to respond to a person’s voice much more quickly.
In a series of demos Tuesday, Google showed phones that could recognize words the moment they’re uttered, instead of shipping audio to a distant server. The quicker response has the potential to change how people relate to their gadgets. In one scenario shown, Google’s virtual assistant appeared strikingly more powerful and private than competitors such as Siri.
In that demo, Google employee Meggie Hollinger showed how she could get things done on her phone by blitzing through a series of voice commands with nary a pause. Each was fulfilled more quickly than Siri and other assistants have trained users to expect—and she did not have to say the wake word “Ok Google” between requests. Virtual assistants typically use that to help them identify the audio to ship to the cloud.
To message a friend a snapshot from a recent vacation, Hollinger quickly uttered three phrases, and made just two taps. “Show me my photos from Yellowstone; the ones with animals; send it to Justin,” she said. Within seconds, Google’s Photos app had searched through her collection and a portrait of a bison was on its way. Watching her achieve the same result with taps and swipes alone would have felt interminable.
Google’s AI chief Jeff Dean tweeted that the segment showed how wholly on-device speech recognition would “change the way you interact with your phones.” Google previously used on-device and cloud recognition algorithms together, but Tuesday said its devices were ready to discard their cloud crutch in many situations.
Improved speed and a lower glitch rate can be crucial for a consumer technology to flip from encouraging to essential. Consider how faster PCs and broadband made video chat and multiplayer gaming practical. Google has not yet released its on-device speech recognition and it will initially work only on high-end devices, but Tuesday’s examples suggested the technology could transform the experience of talking to your phone.
Modern voice assistants are outgrowths of AI research from 2012, which showed that a technique called deep learning could make speech recognition much more accurate. Google has said the technology immediately reduced its error rate by 25 percent. At the time, though, the technology had to run on servers; sending audio to the cloud imposed speed constraints and introduced glitches due to network errors.
Google has spent years researching how to shrink deep learning software, and in 2019 got its speech recognition models below a crucial threshold. A recent version of Google’s cloud server speech recognition package was about 2 gigabytes in size, the company said Tuesday, making it too large and demanding to run on a smartphone. The on-device version emerged after AI researchers redesigned the technology to provide similar performance from code weighing in at just 80 megabytes, 1/25th as large.
Speech recognition that is noticeably snappier than the first generation of cloud-based deep learning technology could prompt users to use voice commands and queries more often. It might also give Google’s devices and services a boost over rivals such as Apple and Amazon, at least until those companies develop similar technology of their own. Apple in 2017 applied for a patent for an “offline personal assistant.”
Werner Goertz, a research director at Gartner, calls the shift to on-device speech recognition “game changing” and potentially a significant challenge to Apple’s and Amazon’s more conventional speech systems. “Latency has always been an issue,” he said, and most users have experienced the problem.
Google also used its on-device technology to create a new feature for its future phones called Live Caption. Once activated, captions appear on screen for any speech playing on the phone, such as a video from a friend, or a podcast. Because the processing takes place on the phone, it works even in airplane mode.
The company also showed a research project called Euphonia that aims to adapt speech recognition to people with speech problems, for example due to a stroke or disease. Google appealed for volunteers to contribute samples of their voices. Product manager Julie Cattiau said on-device voice recognition could help the project one day become a widely available product, because the recognition software on a person’s phone could be tuned to their individual voice. “It opens opportunity for personalization,” she said.
Processing speech on-device instead of transmitting it to the cloud can also be more private than the conventional model—although in some cases the transcribed text will be sent to Google anyway. Pichai and other executives made privacy a theme of Tuesday’s event, gently trying to neutralize Google’s reputation for data-slurping. The company showcased redesigned privacy settings, and a new “incognito mode” for Google Maps that pauses the service’s default tracking of a device’s movements.