Computers can now drive cars , beat world champions at board games like chess and Go , and even write prose . The revolution in artificial intelligence stems in large part from the power of one particular kind of artificial neural network, whose design is inspired by the connected layers of neurons in the mammalian visual cortex. These “convolutional neural networks” (CNNs) have proved surprisingly adept at learning patterns in two-dimensional data—especially in computer vision tasks like recognizing handwritten words and objects in digital images.
Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research developments and trends in mathematics and the physical and life sciences.
But when applied to data sets without a built-in planar geometry—say, models of irregular shapes used in 3D computer animation, or the point clouds generated by self-driving cars to map their surroundings—this powerful machine learning architecture doesn’t work well. Around 2016, a new discipline called geometric deep learning emerged with the goal of lifting CNNs out of flatland.
Now, researchers have delivered, with a new theoretical framework for building neural networks that can learn patterns on any kind of geometric surface. These “gauge-equivariant convolutional neural networks,” or gauge CNNs, developed at the University of Amsterdam and Qualcomm AI Research by Taco Cohen, Maurice Weiler, Berkay Kicanaoglu and Max Welling, can detect patterns not only in 2D arrays of pixels, but also on spheres and asymmetrically curved objects. “This framework is a fairly definitive answer to this problem of deep learning on curved surfaces,” Welling said.
Already, gauge CNNs have greatly outperformed their predecessors in learning patterns in simulated global climate data, which is naturally mapped onto a sphere. The algorithms may also prove useful for improving the vision of drones and autonomous vehicles that see objects in 3D, and for detecting patterns in data gathered from the irregularly curved surfaces of hearts, brains or other organs.
The researchers’ solution to getting deep learning to work beyond flatland also has deep connections to physics. Physical theories that describe the world, like Albert Einstein’s general theory of relativity and the Standard Model of particle physics, exhibit a property called “gauge equivariance.” This means that quantities in the world and their relationships don’t depend on arbitrary frames of reference (or “gauges”); they remain consistent whether an observer is moving or standing still, and no matter how far apart the numbers are on a ruler. Measurements made in those different gauges must be convertible into each other in a way that preserves the underlying relationships between things.
For example, imagine measuring the length of a football field in yards, then measuring it again in meters. The numbers will change, but in a predictable way. Similarly, two photographers taking a picture of an object from two different vantage points will produce different images, but those images can be related to each other. Gauge equivariance ensures that physicists’ models of reality stay consistent, regardless of their perspective or units of measurement. And gauge CNNs make the same assumption about data.
“The same idea [from physics] that there’s no special orientation—they wanted to get that into neural networks,” said Kyle Cranmer, a physicist at New York University who applies machine learning to particle physics data. “And they figured out how to do it.”
Escaping FlatlandMichael Bronstein, a computer scientist at Imperial College London, coined the term “geometric deep learning” in 2015 to describe nascent efforts to get off flatland and design neural networks that could learn patterns in nonplanar data. The term—and the research effort—soon caught on.Bronstein and his collaborators knew that going beyond the Euclidean plane would require them to reimagine one of the basic computational procedures that made neural networks so effective at 2D image recognition in the first place. This procedure, called “convolution,” lets a layer of the neural network perform a mathematical operation on small patches of the input data and then pass the results to the next layer in the network.
“You can think of convolution, roughly speaking, as a sliding window,” Bronstein explained. A convolutional neural network slides many of these “windows” over the data like filters, with each one designed to detect a certain kind of pattern in the data. In the case of a cat photo, a trained CNN may use filters that detect low-level features in the raw input pixels, such as edges. These features are passed up to other layers in the network, which perform additional convolutions and extract higher-level features, like eyes, tails or triangular ears. A CNN trained to recognize cats will ultimately use the results of these layered convolutions to assign a label—say, “cat” or “not cat”—to the whole image.