Despite multiple conferences dedicated to explicating Mochizuki’s proof, number theorists have struggled to come to grips with its underlying ideas. His series of papers, which total more than 500 pages, are written in an impenetrable style, and refer back to a further 500 pages or so of previous work by Mochizuki, creating what one mathematician, Brian Conrad of Stanford University, has called “a sense of infinite regress.”

Between 12 and 18 mathematicians who have studied the proof in depth believe it is correct, wrote Ivan Fesenko of the University of Nottingham in an email. But only mathematicians in “Mochizuki’s orbit” have vouched for the proof’s correctness, Conrad commented in a blog discussion last December. “There is nobody else out there who has been willing to say even off the record that they are confident the proof is complete.”

Nevertheless, wrote Frank Calegari of the University of Chicago in a December blog post, “mathematicians are very loath to claim that there is a problem with Mochizuki’s argument because they can’t point to any definitive error.”

That has now changed. In their report, Scholze and Stix argue that a line of reasoning near the end of the proof of “Corollary 3.12” in Mochizuki’s third of four papers is fundamentally flawed. The corollary is central to Mochizuki’s proposed *abc* proof.

“I think the *abc* conjecture is still open,” Scholze said. “Anybody has a chance of proving it.”

Scholze and Stix’s conclusions are based not only on their own study of the papers but also on a weeklong visit they paid to Mochizuki and his colleague Yuichiro Hoshi in March at Kyoto University to discuss the proof. That visit helped enormously, Scholze said, in distilling his and Stix’s objections down to their essence. The pair “came to the conclusion that there is no proof,” they wrote in their report.

But the meeting led to an oddly unsatisfying conclusion: Mochizuki couldn’t convince Scholze and Stix that his argument was sound, but they couldn’t convince him that it was unsound. Mochizuki has now posted Scholze’s and Stix’s report on his website, along with several reports of his own in rebuttal. (Mochizuki and Hoshi did not respond to requests for comments for this article.)

In his rebuttal, Mochizuki attributes Scholze and Stix’s criticism to “certain fundamental misunderstandings” about his work. Their “negative position,” he wrote, “does not imply the existence of any flaws whatsoever” in his theory.

Just as Mochizuki’s high reputation made mathematicians view his work as a serious attempt on the *abc* conjecture, Scholze and Stix’s stature guarantees that mathematicians will pay attention to what they have to say. Though only 30, Scholze has risen quickly to the top of his field. He was awarded the Fields Medal, mathematics’ highest honor, in August. Stix, meanwhile, is an expert in Mochizuki’s particular area of research, a field known as anabelian geometry.

“Peter and Jakob are extremely careful and thoughtful mathematicians,” Conrad said. “Any concerns that they have … definitely merit being cleared up.”

The *abc* conjecture, which Conrad has called “one of the outstanding conjectures in number theory,” starts with one of the simplest equations imaginable: *a* + *b* = *c*. The three numbers *a*, *b* and *c* are supposed to be positive integers, and they are not allowed to share any common prime factors — so, for example, we could consider the equation 8 + 9 = 17, or 5 + 16 = 21, but not 6 + 9 = 15, since 6, 9 and 15 are all divisible by 3.

Given such an equation, we can look at all the primes that divide any of the three numbers — so, for instance, for the equation 5 + 16 = 21, our primes are 5, 2, 3 and 7. Multiplying these together produces 210, a much larger number than any of the numbers in the original equation. By contrast, for the equation 5 + 27 = 32, whose primes are 5, 3 and 2, the prime product is 30 — a smaller number than the 32 in the original equation. The product comes out so small because 27 and 32 have only small prime factors (3 and 2, respectively) that get repeated many times to make them.

If you start playing around with other *abc* triples, you’ll find that this second scenario is extremely rare. For example, among the 3,044 different triples you can make in which *a* and *b* are between 1 and 100, there are only seven in which the product of primes is smaller than *c*. The *abc* conjecture, which was first formulated in the 1980s, codifies the intuition that this kind of triple hardly ever happens.

More specifically, coming back to the 5 + 27 = 32 example, 32 is larger than 30, but only by a little. It’s smaller than 30^2, or 30^1.5, or even 30^1.02, which is about 32.11. The *abc* conjecture says that if you pick any exponent bigger than 1, then there are only finitely many *abc* triples in which *c* is larger than the product of the prime factors raised to your chosen exponent.
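The arithmetic above is easy to check directly. The product of the distinct primes dividing *a*, *b* and *c* is conventionally called the radical, rad(*abc*); a short, purely illustrative Python sketch reproduces the examples:

```python
from math import log, prod

def prime_factors(n):
    """Return the set of distinct primes dividing n (trial division)."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def radical(a, b, c):
    """Product of every distinct prime dividing a, b or c."""
    return prod(prime_factors(a) | prime_factors(b) | prime_factors(c))

print(radical(5, 16, 21))   # 210: much larger than c = 21
print(radical(5, 27, 32))   # 30: smaller than c = 32, the rare case
print(30 ** 1.02)           # about 32.11, just above 32
print(log(32) / log(30))    # about 1.019, the exponent at which 30^q = 32
```

The last line shows why even an exponent as modest as 1.02 already puts 30 raised to that power above 32: triples whose radical stays this close to *c* are the exceptional ones the conjecture says must eventually run out.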

“The *abc* conjecture is a very elementary statement about multiplication and addition,” said Minhyong Kim of the University of Oxford. It’s the kind of statement, he said, where “you feel like you’re revealing some kind of very fundamental structure about number systems in general that you hadn’t seen before.”

And the simplicity of the *a* + *b* = *c* equation means that a wide range of other problems fall under the conjecture’s sway. For instance, Fermat’s Last Theorem is about equations of the form *x^n* + *y^n* = *z^n*.

The conjecture “always seems to lie on the boundary of what is known and what is unknown,” Dorian Goldfeld of Columbia University has written.

The wealth of consequences that would spring from a proof of the *abc* conjecture had convinced number theorists that proving the conjecture was likely to be very hard. So when word spread in 2012 that Mochizuki had presented a proof, many number theorists dived enthusiastically into his work — only to be stymied by the unfamiliar language and unusual presentation. Definitions went on for pages, followed by theorems whose statements were similarly long, but whose proofs only said, essentially, “this follows immediately from the definitions.”

“Each time I hear of an analysis of Mochizuki’s papers by an expert (off the record) the report is disturbingly familiar: vast fields of trivialities followed by an enormous cliff of unjustified conclusions,” Calegari wrote in his December blog post.

Scholze was one of the papers’ early readers. Known for his ability to absorb mathematics quickly and deeply, he got further than many number theorists, completing what he called a “rough reading” of the four main papers shortly after they came out. Scholze was bemused by the long theorems with their short proofs, which struck him as valid but insubstantial. In the two middle papers, he later wrote, “very little seems to happen.”

Then Scholze got to Corollary 3.12 in the third paper. Mathematicians usually use the word “corollary” to denote a theorem that is a secondary consequence of a previous, more important theorem. But in the case of Mochizuki’s Corollary 3.12, mathematicians agree that it is at the core of the proof of *abc*. Without it, “there is no proof at all,” Calegari wrote. “It is a critical step.”

This corollary is the only theorem in the two middle papers whose proof is longer than a few lines — it fills nine pages. As Scholze read through them, he reached a point where he couldn’t follow the logic at all.

Scholze, who was only 24 at the time, believed the proof was flawed. But he mostly stayed out of discussions about the papers, except when asked directly for his thoughts. After all, he thought, perhaps other mathematicians would find significant ideas in the paper that he had missed. Or, perhaps, they would eventually come to the same conclusion as he had. One way or the other, he thought, the mathematics community would surely be able to sort things out.

Meanwhile, other mathematicians were grappling with the densely written papers. Many had high hopes for a meeting dedicated to Mochizuki’s work in late 2015 at the University of Oxford. But as several of Mochizuki’s close associates tried to describe the key ideas of the proof, a “cloud of fog” seemed to descend over the listeners, Conrad wrote in a report shortly after the meeting. “Those who understand the work need to be more successful at communicating to arithmetic geometers what makes it tick,” he wrote.

Within days of Conrad’s post, he received unsolicited emails from three different mathematicians (one of them Scholze), all with the same story: They had been able to read and understand the papers until they hit a particular part. “For each of these people, the proof that had stumped them was for 3.12,” Conrad later wrote.

Kim heard similar concerns about Corollary 3.12 from another mathematician, Teruhisa Koshikawa, currently at Kyoto University. And Stix, too, got perplexed in the same spot. Gradually, various number theorists became aware that this corollary was a sticking point, but it wasn’t clear whether the argument had a hole or Mochizuki simply needed to explain his reasoning better.

Then in late 2017 a rumor spread, to the consternation of many number theorists, that Mochizuki’s papers had been accepted for publication. Mochizuki himself was the editor-in-chief of the journal in question, *Publications of the Research Institute for Mathematical Sciences*, an arrangement that Calegari called “poor optics” (though editors generally recuse themselves in such situations). But much more concerning to many number theorists was the fact that the papers were still, as far as they were concerned, unreadable.

“No expert who claims to understand the arguments has succeeded in explaining them to any of the (very many) experts who remain mystified,” Matthew Emerton of the University of Chicago wrote.

Calegari wrote a blog post decrying the situation as “a complete disaster,” to a chorus of amens from prominent number theorists. “We do now have the ridiculous situation where ABC is a theorem in Kyoto but a conjecture everywhere else,” Calegari wrote.

PRIMS soon responded to press inquiries with a statement that the papers had not, in fact, been accepted. Before the journal had done so, however, Scholze resolved to state publicly what he had been saying privately to number theorists for some time. The whole discussion surrounding the proof had gotten “too sociological,” he decided. “Everybody was talking just about how this feels like it isn’t a proof, but nobody was actually saying, ‘Actually there is this point where nobody understands the proof.’”

So in the comments section below Calegari’s blog post, Scholze wrote that he was “entirely unable to follow the logic after Figure 3.8 in the proof of Corollary 3.12.” He added that mathematicians “who do claim to understand the proof are unwilling to acknowledge that more must be said there.”

Shigefumi Mori, Mochizuki’s colleague at Kyoto University and a winner of the Fields Medal, wrote to Scholze offering to facilitate a meeting between him and Mochizuki. Scholze in turn reached out to Stix, and in March the pair traveled to Kyoto to discuss the sticky proof with Mochizuki and Hoshi.

Mochizuki’s approach to the *abc* conjecture translates the problem into a question about elliptic curves, a special type of cubic equation in two variables, *x* and *y*. The translation, which was well known before Mochizuki’s work, is simple — you associate each *abc* equation with the elliptic curve whose graph crosses the *x*-axis at *a*, *b* and the origin — but it allows mathematicians to exploit the rich structure of elliptic curves, which connect number theory to geometry, calculus and other subjects. (This same translation is at the heart of Andrew Wiles’ 1994 proof of Fermat’s Last Theorem.)

The *abc* conjecture then boils down to proving a certain inequality between two quantities associated with the elliptic curve. Mochizuki’s work translates this inequality into yet another form, which, Stix said, can be thought of as comparing the volumes of two sets. Corollary 3.12 is where Mochizuki presents his proof of this new inequality, which, if true, would prove the *abc* conjecture.

The proof, as Scholze and Stix describe it, involves viewing the volumes of the two sets as living inside two different copies of the real numbers, which are then represented as part of a circle of six different copies of the real numbers, together with mappings that explain how each copy relates to its neighbors along the circle. To keep track of how the volumes of sets relate to one another, it’s necessary to understand how volume measurements in one copy relate to measurements in the other copies, Stix said.

“If you have an inequality of two things but the measuring stick is sort of shrunk by a factor which you don’t control, then you lose control over what the inequality actually means,” Stix said.

It is at this crucial spot in the argument that things go wrong, Scholze and Stix believe. In Mochizuki’s mappings, the measuring sticks are locally compatible with one another. But when you go around the circle, Stix said, you end up with a measuring stick that looks different from if you had gone around the other way. The situation, he said, is akin to Escher’s famous winding staircase, which climbs and climbs only to somehow end up below where it started.
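A crude numerical cartoon, which is emphatically not Mochizuki’s construction, can convey the flavor of the objection: imagine six copies of the real line arranged in a circle, each related to its neighbor by a hypothetical rescaling factor. Every adjacent pair is compatible on its own, yet composing the rescalings around the full loop need not return the measuring stick to its original length:

```python
# Toy cartoon of the "measuring stick" issue Stix describes (the
# scaling factors below are invented for illustration). Each factor
# re-expresses lengths in the next copy's units as you move around
# the circle of six copies of the real numbers.
scalings = [2.0, 0.5, 3.0, 0.5, 2.0, 0.5]

unit = 1.0  # length of the measuring stick in copy 0
for s in scalings:
    unit *= s  # convert the stick into the next copy's units

print(unit)  # 1.5, not 1.0: the loop fails to close up
```

In this toy picture, a “volume” measured after going around the circle one way differs by an uncontrolled factor from the same volume measured the other way, which is the sense in which, per Scholze and Stix, an inequality between the two sides stops carrying information.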

This incompatibility in the volume measurements means that the resulting inequality is between the wrong quantities, Scholze and Stix assert. And if you adjust things so the volume measurements are globally compatible, then the inequality becomes meaningless, they say.

Scholze and Stix have “identified a way that the argument can’t possibly work,” said Kiran Kedlaya, a mathematician at the University of California, San Diego, who has studied Mochizuki’s papers in depth. “So if the argument is to be correct, it has to do something different, and something a lot more subtle” than what Scholze and Stix describe.

Something more subtle is exactly what the proof does, Mochizuki contends. Scholze and Stix err, he wrote, in making arbitrary identifications between mathematical objects that should be regarded as distinct. When he told colleagues the nature of Scholze and Stix’s objections, he wrote, his descriptions “were met with a remarkably unanimous response of utter astonishment and even disbelief (at times accompanied by bouts of laughter!) that such manifestly erroneous misunderstandings could have occurred.”

Mathematicians will now have to absorb Scholze and Stix’s argument and Mochizuki’s response. But Scholze hopes that, in contrast with the situation for Mochizuki’s original series of papers, this should not be a protracted process, since the gist of his and Stix’s objection is not highly technical. Other number theorists “would have totally been able to follow the discussions that we had had this week with Mochizuki,” he said.

Mochizuki sees things very differently. In his view, Scholze and Stix’s criticism stems from a “lack of sufficient time to reflect deeply on the mathematics under discussion,” perhaps coupled with “a deep sense of discomfort, or unfamiliarity, with new ways of thinking about familiar mathematical objects.”

Mathematicians who are already skeptical of Mochizuki’s *abc* proof may well consider Scholze and Stix’s report the end of the story, said Kim. Others will want to study the new reports for themselves, an activity that Kim himself has commenced. “I don’t think I can completely avoid the need to check more carefully for myself before making up my mind,” he wrote in an email.

In the past couple of years, many number theorists have given up on trying to understand Mochizuki’s papers. But if Mochizuki or his followers can provide a thorough and coherent explanation for why Scholze and Stix’s picture is too simplistic (assuming that it is), “this might go a long way towards relieving some of the fatigue and maybe giving people more willingness to look into this thing again,” Kedlaya said.

In the meantime, Scholze said, “I think this should not be considered a proof until Mochizuki does some very substantial revisions and explains this key step much better.” Personally, he said, “I didn’t really see a key idea that would get us closer to the proof of the *abc* conjecture.”

Regardless of the eventual outcome of this discussion, the pinpointing of such a specific part of Mochizuki’s argument should lead to greater clarity, Kim said. “What Jakob and Peter have done is an important service to the community,” he said. “Whatever happens, I’m pretty confident that the reports will be progress of a definite sort.”

“It’s a clever and important study that reminds us that ‘deep learning’ isn’t really that deep,” said Gary Marcus, a neuroscientist at New York University who was not affiliated with the work.

The result takes place in the field of computer vision, where artificial intelligence systems attempt to detect and categorize objects. They might try to find all the pedestrians in a street scene, or just distinguish a bird from a bicycle (which is a notoriously difficult task). The stakes are high: As computers take over critical tasks like automated surveillance and autonomous driving, we’ll want their visual processing to be at least as good as the human eyes they’re replacing.

It won’t be easy. The new work accentuates the sophistication of human vision — and the challenge of building systems that mimic it. In the study, the researchers presented a computer vision system with a living room scene. The system processed it well. It correctly identified a chair, a person, books on a shelf. Then the researchers introduced an anomalous object into the scene — an image of an elephant. The elephant’s mere presence caused the system to forget itself: Suddenly it started calling a chair a couch and the elephant a chair, while turning completely blind to other objects it had previously seen.

“There are all sorts of weird things happening that show how brittle current object detection systems are,” said Amir Rosenfeld, a researcher at York University in Toronto and co-author of the study along with his York colleague John Tsotsos and Richard Zemel of the University of Toronto.

Researchers are still trying to understand exactly why computer vision systems get tripped up so easily, but they have a good guess. It has to do with an ability humans have that AI lacks: the ability to understand when a scene is confusing and thus go back for a second glance.

Eyes wide open, we take in staggering amounts of visual information. The human brain processes it in stride. “We open our eyes and everything happens,” said Tsotsos.

Artificial intelligence, by contrast, creates visual impressions laboriously, as if it were reading a description in Braille. It runs its algorithmic fingertips over pixels, which it shapes into increasingly complex representations. The specific type of AI system that performs this process is called a neural network. It sends an image through a series of “layers.” At each layer, the details of the image — the colors and brightnesses of individual pixels — give way to increasingly abstracted descriptions of what the image portrays. At the end of the process, the neural network produces a best-guess prediction about what it’s looking at.

“It’s all moving from one layer to the next by taking the output of the previous layer, processing it and passing it along to the next layer, like a pipeline,” said Tsotsos.
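The pipeline Tsotsos describes can be sketched schematically. The three “layers” below are hypothetical stand-ins (real vision networks use millions of learned parameters), meant only to show the strictly one-way flow from raw pixels toward a final guess:

```python
# Illustrative feed-forward "pipeline": each layer takes the previous
# layer's output, processes it, and passes it along. These toy layers
# stand in for the learned convolutions of a real network.

def layer_pixels_to_edges(pixels):
    # crude "edge" signal: differences between neighboring pixels
    return [abs(b - a) for a, b in zip(pixels, pixels[1:])]

def layer_edges_to_feature(edges):
    # collapse the edge map into one abstract summary number
    return sum(edges) / len(edges)

def layer_feature_to_guess(feature):
    # final best-guess label from the abstract feature
    return "textured" if feature > 0.5 else "flat"

image = [0.0, 1.0, 0.0, 1.0, 0.0]   # a tiny one-row "image"
output = image
for layer in (layer_pixels_to_edges,
              layer_edges_to_feature,
              layer_feature_to_guess):
    output = layer(output)           # data moves in one direction only

print(output)  # "textured"
```

Note that nothing ever flows backward: once the pixel detail is summarized away at an early layer, later layers cannot revisit it, which is exactly the property the elephant experiment exploits.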

Neural networks are adept at specific visual chores. They can outperform humans in narrow tasks like sorting objects into best-fit categories — labeling dogs with their breed, for example. These successes have raised expectations that computer vision systems might soon be good enough to steer a car through crowded city streets.

They’ve also provoked researchers to probe their vulnerabilities. In recent years there has been a slew of attempts, known as “adversarial attacks,” in which researchers contrive scenes to make neural networks fail. In one experiment, computer scientists tricked a neural network into mistaking a turtle for a rifle. In another, researchers waylaid a neural network by placing an image of a psychedelically colored toaster alongside ordinary objects like a banana.

This new study has the same spirit. The three researchers fed a neural network a living room scene: A man seated on the edge of a shabby chair leans forward as he plays a video game. After chewing on this scene, a neural network correctly detected a number of objects with high confidence: a person, a couch, a television, a chair, some books.

Then the researchers introduced something incongruous into the scene: an image of an elephant in semiprofile. The neural network started getting its pixels crossed. In some trials, the elephant led the neural network to misidentify the chair as a couch. In others, the system overlooked objects, like a row of books, that it had correctly detected in earlier trials. These errors occurred even when the elephant was far from the mistaken objects.

Snafus like those extrapolate in unsettling ways to autonomous driving. A computer can’t drive a car if it might go blind to a pedestrian just because a second earlier it passed a turkey on the side of the road.

And as for the elephant itself, the neural network was all over the place: Sometimes the system identified it correctly, sometimes it called the elephant a sheep, and sometimes it overlooked the elephant completely.

“If there is actually an elephant in the room, you as a human would likely notice it,” said Rosenfeld. “The system didn’t even detect its presence.”

When human beings see something unexpected, we do a double take. It’s a common phrase with real cognitive implications — and it explains why neural networks fail when scenes get weird.

Today’s best neural networks for object detection work in a “feed forward” manner. This means that information flows through them in only one direction. They start with an input of fine-grained pixels, then move to curves, shapes, and scenes, with the network making its best guess about what it’s seeing at each step along the way. As a consequence, errant observations early in the process end up contaminating the end of the process, when the neural network pools together everything it thinks it knows in order to make a guess about what it’s looking at.

“By the top of the neural network you have everything connected to everything, so you have the potential to have every feature in every location interfering with every possible output,” said Tsotsos.

The human way is better. Imagine you’re given a very brief glimpse of an image containing a circle and a square, with one of them colored blue and the other red. Afterward you’re asked to name the color of the square. With only a single glance to go on, you’re likely to confuse the colors of the two shapes. But you’re also likely to recognize that you’re confused and to ask for another look. And, critically, when you take that second look, you know to focus your attention on just the color of the square.

“The human visual system says, ‘I don’t have the right answer yet, so I have to go backwards to see where I might have made an error,’” explained Tsotsos, who has been developing a theory called selective tuning that explains this feature of visual cognition.

Most neural networks lack this ability to go backward. It’s a hard trait to engineer. One advantage of feed-forward networks is that they’re relatively straightforward to train — process an image through the layers and get an answer. But if neural networks are to have license to do a double take, they’ll need a sophisticated understanding of when to draw on this new capacity (when to look twice) and when to plow ahead in a feed-forward way. Human brains switch between these different processes seamlessly; neural networks will need a new theoretical framework before they can do the same.

Leading researchers in the world are working on it, though, and they’re calling for backup. Earlier this month, Google AI announced a contest to crowdsource image classifiers that can see their way through adversarial attacks. The winning entry will need to unambiguously distinguish between an image of a bird and an image of a bicycle. It would be a modest first step — but also a necessary one.

Scientists knew back in the 1980s that they could observe only a fraction of the atomic matter — or baryons — in the universe. (Today we know that all baryons taken together are thought to make up about 5 percent of the universe — the rest is dark energy and dark matter.) They knew that if they counted up all the stuff they could see in the universe — stars and galaxies, for the most part — the bulk of the baryons would be missing.

But exactly how much missing matter there was, and where it might be hiding, were questions that started to sharpen in the 1990s. Around that time, astronomer David Tytler of the University of California, San Diego, came up with a way to measure the amount of deuterium in the light of distant quasars — the bright cores of galaxies with active black holes at their center — using the new spectrograph at the Keck telescope in Hawaii. Tytler’s data helped researchers understand just how many baryons were missing in today’s universe once all the visible stars and gas were accounted for: a whopping 90 percent.

These results set off a firestorm of controversy, fanned in part by Tytler’s personality. “He was right in spite of, at the time, a lot of seemingly contradictory evidence, and basically said everyone else was a bunch of idiots who didn’t know what they were doing,” said Romeel Dave, an astronomer at the University of Edinburgh. “Turns out, of course, he was right.”

Then in 1998, Jeremiah Ostriker and Renyue Cen, Princeton University astrophysicists, released a seminal cosmological model that tracked the history of the universe from its beginnings. The model suggested that the missing baryons were likely wafting about in the form of diffuse (and at the time undetectable) gas between galaxies.

As it happens, Dave could have been the first to tell the world where the baryons were, beating Ostriker and Cen. Months before their paper came out, Dave had finished his own set of cosmological simulations, which were part of his Ph.D. work at the University of California, Santa Cruz. His thesis on the distribution of baryons suggested that they might be lurking in the warm plasma between galaxies. “I didn’t really appreciate the result for what it was,” said Dave. “Oh well, win some, lose some.”

Dave continued to work on the problem in the years to follow. He envisioned the missing matter as hiding in ghostly threads of extremely hot and very diffuse gas that connect galaxy pairs. In astro-speak, this became the “warm-hot intergalactic medium,” or WHIM, a term that Dave coined.

Many astronomers continued to suspect that there might be some very faint stars in the outskirts of galaxies that could account for a significant chunk of the missing matter. But after many decades of searching, the number of baryons in stars, even the faintest ones that could be seen, amounted to no more than 20 percent.

More and more sophisticated instruments came online. In 2003, the Wilkinson Microwave Anisotropy Probe measured the universe’s baryon density as it stood some 380,000 years after the Big Bang. It turned out to be the same density as indicated by the cosmological models. A decade later, the Planck satellite confirmed the number.

With the eventual failure to find hidden stars and galaxies that might be holding the missing matter, “attention turned toward gas in between the galaxies — the intergalactic medium distributed over billions of light years of low-density intergalactic space,” said Michael Shull, an astrophysicist at the University of Colorado, Boulder. He and his team began searching for the WHIM by studying its effects on the light from distant quasars. Atoms of hydrogen, helium and heavier elements such as oxygen absorb the ultraviolet and X-ray radiation from these quasar lighthouses. The gas “steals a portion of light from the beam,” said Shull, leaving a deficit of light — an absorption line. Find the lines, and you’ll find the gas.
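The idea of the method can be caricatured in a few lines of Python. The wavelengths and flux values below are invented for illustration; a real analysis fits physical line profiles to calibrated spectra:

```python
# Cartoon of the absorption-line method: intervening gas removes light
# at a characteristic wavelength, leaving a dip in an otherwise flat
# (normalized) quasar spectrum.
wavelengths = list(range(100, 131))            # arbitrary units
continuum = 1.0                                 # normalized quasar flux
spectrum = {w: continuum for w in wavelengths}
spectrum[115] = 0.4                             # gas "steals" light here

# Find absorption lines: wavelengths where the flux dips well below
# the continuum level.
lines = [w for w, flux in spectrum.items() if flux < 0.8 * continuum]
print(lines)  # [115] -- find the line, find the gas
```

The depth and width of a real line encode how much absorbing gas lies along the sightline, which is what lets astronomers turn detected lines into a baryon census.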

The most prominent absorption lines of hydrogen and ionized oxygen are at very short wavelengths, in the ultraviolet and X-ray portions of the spectrum. Unfortunately for astronomers (but fortunately for the rest of life on Earth), our atmosphere blocks these rays. In part to solve the missing matter problem, astronomers launched X-ray satellites to map this light. With the absorption line method, Shull said, scientists eventually “accounted for most, if not all, of the predicted baryons that were cooked up in the hot Big Bang.”

Other teams took different approaches, looking for the missing baryons indirectly. As my story from last week shows, three teams, including Shull’s, are now saying that all the baryons are accounted for.

But the WHIM is so faint, and the matter so diffuse, that it’s hard to definitively close the case. “Over the years, there have been many exchanges among researchers arguing for or against possible detections of the warm-hot intergalactic medium,” said Kenneth Sembach, director of the Space Telescope Science Institute in Baltimore. “I suspect there will be many more. The recent papers appear to be another piece in this complex and interesting cosmic puzzle. I’m sure there will be more pieces to come, and associated debates about how best to fit these pieces together.”

To overcome those limitations, some research groups are turning back to the brain for fresh ideas. But a handful of them are choosing what may at first seem like an unlikely starting point: the sense of smell, or olfaction. Scientists trying to gain a better understanding of how organisms process chemical information have uncovered coding strategies that seem especially relevant to problems in AI. Moreover, olfactory circuits bear striking similarities to more complex brain regions that have been of interest in the quest to build better machines.

Computer scientists are now beginning to probe those findings in machine learning contexts.

State-of-the-art machine learning techniques used today were built at least in part to mimic the structure of the visual system, which is based on the hierarchical extraction of information. When the visual cortex receives sensory data, it first picks out small, well-defined features: edges, textures and colors, a process that involves spatial mapping. The neuroscientists David Hubel and Torsten Wiesel discovered in the 1950s and ’60s that specific neurons in the visual system correspond to the equivalent of specific pixel locations in the retina, a finding for which they won a Nobel Prize.

As visual information gets passed along through layers of cortical neurons, details about edges and textures and colors come together to form increasingly abstract representations of the input: that the object is a human face, and that the identity of the face is Jane, for example. Every layer in the network helps the organism achieve that goal.

Deep neural networks were built to work in a similarly hierarchical way, leading to a revolution in machine learning and AI research. To teach these nets to recognize objects like faces, they are fed thousands of sample images. The system strengthens or weakens the connections between its artificial neurons to more accurately determine that a given collection of pixels forms the more abstract pattern of a face. With enough samples, it can recognize faces in new images and in contexts it hasn’t seen before.
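The “strengthen or weaken the connections” step can be made concrete with the smallest possible example: a single artificial neuron trained with the classic perceptron rule on two hypothetical three-pixel patterns. This is a sketch of the principle only, not of any modern deep network:

```python
# A single neuron learns to separate two tiny, made-up pixel patterns.
# Deep nets apply the same idea across millions of connections with
# more sophisticated update rules; here we use the classic perceptron
# rule so every step is visible.
samples = [
    ([1.0, 1.0, 0.0], 1),   # "face-like" pattern  -> label 1
    ([0.0, 0.0, 1.0], 0),   # "non-face" pattern   -> label 0
    ([1.0, 0.9, 0.1], 1),
    ([0.1, 0.0, 0.9], 0),
]
weights = [0.0, 0.0, 0.0]
bias = 0.0

for _ in range(20):                       # a few passes over the samples
    for pixels, label in samples:
        activation = sum(w * p for w, p in zip(weights, pixels)) + bias
        guess = 1 if activation > 0 else 0
        error = label - guess             # -1, 0 or +1
        # strengthen or weaken each connection in proportion to its input
        weights = [w + 0.1 * error * p for w, p in zip(weights, pixels)]
        bias += 0.1 * error

for pixels, label in samples:
    activation = sum(w * p for w, p in zip(weights, pixels)) + bias
    print(1 if activation > 0 else 0, label)   # guesses match the labels
```

After a handful of passes the connection weights settle into values that classify every training pattern correctly, which is the miniature version of what happens when a deep network is fed thousands of sample faces.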

Researchers have had great success with these networks, not just in image classification but also in speech recognition, language translation and other machine learning applications. Still, “I like to think of deep nets as freight trains,” said Charles Delahunt, a researcher at the Computational Neuroscience Center at the University of Washington. “They’re very powerful, so long as you’ve got reasonably flat ground, where you can lay down tracks and have a huge infrastructure. But we know biological systems don’t need all that — that they can handle difficult problems that deep nets can’t right now.”

Take a hot topic in AI: self-driving cars. As a car navigates a new environment in real time — an environment that’s constantly changing, that’s full of noise and ambiguity — deep learning techniques inspired by the visual system might fall short. Perhaps methods based loosely on vision, then, aren’t the right way to go. That vision became such a dominant source of insight in the first place was partly incidental, “a historical fluke,” said Adam Marblestone, a biophysicist at the Massachusetts Institute of Technology. It was simply the system that scientists understood best, with clear applications to image-based machine learning tasks.

But “every type of stimulus doesn’t get processed in the same way,” said Saket Navlakha, a computer scientist at the Salk Institute for Biological Studies in California. “Vision and olfaction are very different types of signals, for example. … So there may be different strategies to deal with different types of data. I think there could be a lot more lessons beyond studying how the visual system works.”

He and others are beginning to show that the olfactory circuits of insects may hold some of those lessons. Olfaction research didn’t take off until the 1990s, when the biologists Linda Buck and Richard Axel, both at Columbia University at the time, discovered the genes for odor receptors. Since then, however, the olfactory system has become particularly well characterized, and it’s something that can be studied easily in flies and other insects. It’s tractable in a way that visual systems are not for studying general computational challenges, some scientists argue.

“We work on olfaction because it’s a finite system that you can characterize relatively completely,” Delahunt said. “You’ve got a fighting chance.”

“People can already do such fantastic stuff with vision,” added Michael Schmuker, a computational neuroscientist at the University of Hertfordshire in England. “Maybe we can do fantastic stuff with olfaction, too.”

Olfaction differs from vision on many fronts. Smells are unstructured. They don’t have edges; they’re not objects that can be grouped in space. They’re mixtures of varying compositions and concentrations, and they’re difficult to categorize as similar to or different from one another. It’s therefore not always clear which features should get attention.

These odors are analyzed by a shallow, three-layer network that’s considerably less complex than the visual cortex. Neurons in olfactory areas randomly sample the entire receptor space, not specific regions in a hierarchy. They employ what Charles Stevens, a neurobiologist at the Salk Institute, calls an “antimap.” In a mapped system like the visual cortex, the position of a neuron reveals something about the type of information it carries. But in the antimap of the olfactory cortex, that’s not the case. Instead, information is distributed throughout the system, and reading that data involves sampling from some minimum number of neurons. An antimap is achieved through what’s known as a sparse representation of information in a higher-dimensional space.

Take the olfactory circuit of the fruit fly: 50 projection neurons receive input from receptors that are each sensitive to different molecules. A single odor will excite many different neurons, and each neuron represents a variety of odors. It’s a mess of information, of overlapped representations, that is at this point represented in a 50-dimensional space. The information is then randomly projected to 2,000 so-called Kenyon cells, which encode particular scents. (In mammals, cells in what’s known as the piriform cortex handle this.) That constitutes a 40-fold expansion in dimension, which makes it easier to distinguish odors by the patterns of neural responses.

“Let’s say you have 1,000 people and you stuff them into a room and try to organize them by hobby,” Navlakha said. “Sure, in this crowded space, you might be able to find some way to structure these people into their groups. But now, say you spread them out on a football field. You have all this extra space to play around with and structure your data.”

Once the fly’s olfactory circuit has done that, it needs a way to identify distinct odors with non-overlapping sets of neurons. It does this by “sparsifying” the data. Only around 100 of the 2,000 Kenyon cells — 5 percent — are highly active in response to a given smell (the less active cells are silenced), providing each smell with a unique tag.
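The expand-and-sparsify scheme described above can be sketched in a few lines of NumPy. The layer sizes (50 projection neurons, 2,000 Kenyon cells, top 5 percent kept active) come from the article; the sparse binary random connectivity and the exact winner-take-all rule are simplifying assumptions, not the fly's actual wiring.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PN = 50     # projection neurons in the fly's olfactory circuit
N_KC = 2000   # Kenyon cells: a 40-fold expansion in dimension
TOP_K = 100   # the ~5 percent of Kenyon cells left active per smell

# Fixed, random, untrained connectivity from projection neurons to
# Kenyon cells. Sparse binary weights are a simplifying assumption.
W = (rng.random((N_KC, N_PN)) < 0.1).astype(float)

def sparse_tag(odor):
    """Expand a 50-dim odor vector into 2,000 dims, then silence all
    but the 100 most active cells to get a sparse binary tag."""
    activity = W @ odor
    tag = np.zeros(N_KC, dtype=bool)
    tag[np.argsort(activity)[-TOP_K:]] = True
    return tag

odor = rng.random(N_PN)   # a hypothetical 50-dim odor
tag = sparse_tag(odor)
print(int(tag.sum()))     # 100: exactly 5 percent of the cells are active
```

Note that `W` is fixed once and never trained, matching the observation below that the projection-to-Kenyon-cell connections do not appear to learn.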

In short, while traditional deep networks (again taking their cues from the visual system) constantly change the strength of their connections as they “learn,” the olfactory system generally does not seem to train itself by adjusting the connections between its projection neurons and Kenyon cells.

As researchers studied olfaction in the early 2000s, they developed algorithms to determine how random embedding and sparsity in higher dimensions helped computational efficiency. One pair of scientists, Thomas Nowotny of the University of Sussex in England and Ramón Huerta of the University of California, San Diego, even drew connections to another type of machine learning model, called a support vector machine. They argued that the ways both the natural and artificial systems processed information, using random organization and dimensionality expansion to represent complex data efficiently, were formally equivalent. AI and evolution had converged, independently, on the same solution.
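One way to see why a fixed random expansion plus a linear readout can match a kernel method like a support vector machine is a toy experiment: an XOR-style labeling that no linear readout can fit in the raw two-dimensional space becomes exactly fittable after random expansion. The dimensions, the ReLU nonlinearity and the least-squares readout below are illustrative choices, not details from Nowotny and Huerta's analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# XOR-style labels: no linear readout of the raw 2-D points can fit them.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0, 1, 1, 0])

# Fixed random expansion into 200 dimensions, loosely analogous to the
# untrained projection onto Kenyon cells (ReLU is an illustrative choice).
D = 200
W = rng.normal(size=(D, 2))
b = rng.uniform(-1.0, 1.0, size=D)
features = np.maximum(X @ W.T + b, 0.0)   # shape (4, 200)

# A purely linear readout on the expanded features, fit by least squares.
w, *_ = np.linalg.lstsq(features, 2 * y - 1, rcond=None)
preds = (features @ w > 0).astype(int)
print(preds.tolist())   # [0, 1, 1, 0]: separable after random expansion
```

The "learning" here happens only in the linear readout `w`; the expansion itself stays random and fixed, which is the structural parallel the researchers drew.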

Intrigued by that connection, Nowotny and his colleagues continue to explore the interface between olfaction and machine learning, looking for a deeper link between the two. In 2009, they showed that an olfactory model based on insects, initially created to recognize odors, could also recognize handwritten digits. Moreover, removing the majority of its neurons — to mimic how brain cells die and aren’t replaced — did not affect its performance too much. “Parts of the system might go down, but the system as a whole would keep working,” Nowotny said. He foresees implementing that type of hardware in something like a Mars rover, which has to operate under harsh conditions.

But for a while, not much work was done to follow up on those findings — that is, until very recently, when some scientists began revisiting the biological structure of olfaction for insights into how to tackle more specific machine learning problems.

Delahunt and his colleagues have repeated the same kind of experiment Nowotny conducted, using the moth olfactory system as a foundation and comparing it to traditional machine learning models. Given fewer than 20 samples, the moth-based model recognized handwritten digits better, but when provided with more training data, the other models proved much stronger and more accurate. “Machine learning methods are good at giving very precise classifiers, given tons of data, whereas the insect model is very good at doing a rough classification very rapidly,” Delahunt said.

Olfaction seems to work better when it comes to speed of learning because, in that case, “learning” is no longer about seeking out features and representations that are optimal for the particular task at hand. Instead, it’s reduced to recognizing which of a slew of random features are useful and which are not. “If you can train with just one click, that would be much more beautiful, right?” said Fei Peng, a biologist at Southern Medical University in China.

In effect, the olfaction strategy is almost like baking some basic, primitive concepts into the model, much like a general understanding of the world is seemingly hard-wired into our brains. The structure itself is then capable of some simple, innate tasks without instruction.

One of the most striking examples of this came out of Navlakha’s lab last year. He, along with Stevens and Sanjoy Dasgupta, a computer scientist at the University of California, San Diego, wanted to find an olfaction-inspired way to perform searches on the basis of similarity. Just as YouTube can generate a sidebar list of videos for users based on what they’re currently watching, organisms must be able to make quick, accurate comparisons when identifying odors. A fly might learn early on that it should approach the smell of a ripe banana and avoid the smell of vinegar, but its environment is complex and full of noise — it’s never going to experience the exact same odor again. When it detects a new smell, then, the fly needs to figure out which previously experienced odors the scent most resembles, so that it can recall the appropriate behavioral response to apply.

Navlakha created an olfactory-based similarity search algorithm and applied it to data sets of images. He and his team found that their algorithm performed better than, and sometimes two to three times as well as, traditional nonbiological methods involving dimensionality reduction alone. (In these more standard techniques, objects were compared by focusing on a few basic features, or dimensions.) The fly-based approach also “used about an order of magnitude less computation to get similar levels of accuracy,” Navlakha said. “So it either won in cost or in performance.”

Nowotny, Navlakha and Delahunt showed that an essentially untrained network could already be useful for classification computations and similar tasks. Building in such an encoding scheme leaves the system poised to make subsequent learning easier. It could be used in tasks that involve navigation or memory, for instance — situations in which changing conditions (say, obstructed paths) might not leave the system with much time to learn or many examples to learn from.

Peng and his colleagues have started research on just that, creating an ant olfactory model to make decisions about how to navigate a familiar route from a series of overlapped images.

In work currently under review, Navlakha has applied a similar olfaction-based method for novelty detection, the recognition of something as new even after having been exposed to thousands of similar objects in the past.

And Nowotny is examining how the olfactory system processes mixtures. He’s already seeing possibilities for applications to other machine learning challenges. For instance, organisms perceive some odors as a single scent and others as a mix: A person might take in dozens of chemicals and know she’s smelled a rose, or she might sense the same number of chemicals from a nearby bakery and differentiate between coffee and croissants. Nowotny and his team have found that separable odors aren’t perceived at the same time; rather, the coffee and croissant odors are processed very rapidly in alternation.

That insight could be useful for artificial intelligence, too. The cocktail party problem, for example, refers to how difficult it is to separate numerous conversations in a noisy setting. Given several speakers in a room, an AI might solve this problem by cutting the sound signals into very small time windows. If the system recognized sound coming from one speaker, it could try to suppress inputs from the others. By alternating like that, the network could disentangle the conversations.

In a paper posted last month on the scientific preprint site arxiv.org, Delahunt and his University of Washington colleague J. Nathan Kutz took this kind of research one step further by creating what they call “insect cyborgs.” They used the outputs of their moth-based model as the inputs of a machine learning algorithm, and saw improvements in the system’s ability to classify images. “It gives the machine learning algorithm much stronger material to work with,” Delahunt said. “Some different kind of structure is being pulled out by the moth brain, and having that different kind of structure helps the machine learning algorithm.”

Some researchers now hope to also use studies in olfaction to figure out how multiple forms of learning can be coordinated in deeper networks. “But right now, we’ve covered only a little bit of that,” Peng said. “I’m not quite sure how to improve deep learning systems at the moment.”

One place to start could lie not only in implementing olfaction-based architecture but also in figuring out how to define the system’s inputs. In a paper just published in *Science Advances*, a team led by Tatyana Sharpee of the Salk Institute sought a way to describe smells. Images are more or less similar depending on the distances between their pixels in a kind of “visual space.” But that kind of distance doesn’t apply to olfaction. Nor can structural correlations provide a reliable bearing: Odors with similar chemical structures can be perceived as very different, and odors with very different chemical structures can be perceived as similar.

Sharpee and her colleagues instead defined odor molecules in terms of how often they’re found together in nature (for the purposes of their study, they examined how frequently molecules co-occurred in samples of various fruits and other substances). They then created a map by placing odor molecules closer together if they tended to co-occur, and farther apart if they did so more rarely. They found that just as cities map onto a sphere (the Earth), the odor molecules map onto a hyperbolic space, a negatively curved surface that looks like a saddle.

Sharpee speculated that feeding inputs with hyperbolic structure into machine learning algorithms could help with the classification of less-structured objects. “There’s a starting assumption in deep learning that the inputs should be done in a Euclidean metric,” she said. “I would argue that one could try changing that metric to a hyperbolic one.” Perhaps such a structure could further optimize deep learning systems.
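For concreteness, here is what a hyperbolic metric looks like in the Poincaré disk model of the hyperbolic plane, where the same Euclidean step costs far more distance near the boundary than near the center. This is a generic example of a negatively curved metric, not the particular embedding constructed in the Sharpee paper.

```python
import numpy as np

def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit
    disk, using the standard Poincare disk formula."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    d2 = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * d2 / denom))

# Near the center, hyperbolic distance roughly matches Euclidean distance...
print(poincare_distance([0.0, 0.0], [0.1, 0.0]))    # ~0.20
# ...but the same 0.1 Euclidean step near the boundary is much longer.
print(poincare_distance([0.85, 0.0], [0.95, 0.0]))  # ~1.15
```

Distances that stretch toward the boundary are what let hyperbolic space pack in tree-like, hierarchical structure that a flat Euclidean metric cannot.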

Right now, much of this remains theoretical. The work by Navlakha and Delahunt needs to be scaled up to much more difficult machine learning problems to determine whether olfaction-inspired models stand to make a difference. “This is all still emerging, I think,” Nowotny said. “We’ll see how far it will go.”

What gives researchers hope is the striking resemblance the olfactory system’s structure bears to other regions of the brain across many species, particularly the hippocampus, which is implicated in memory and navigation, and the cerebellum, which is responsible for motor control. Olfaction is an ancient system dating back to chemosensation in bacteria, and is used in some form by all organisms to explore their environments.

“It seems to be closer to the evolutionary origin point of all the things we’d call cortex in general,” Marblestone said. Olfaction might provide a common denominator for learning. “The system gives us a really conserved architecture, one that’s used for a variety of things across a variety of organisms,” said Ashok Litwin-Kumar, a neuroscientist at Columbia. “There must be something fundamental there that’s good for learning.”

The olfactory circuit could act as a gateway to understanding the more complicated learning algorithms and computations used by the hippocampus and cerebellum — and to figuring out how to apply such insights to AI. Researchers have already begun turning to cognitive processes like attention and various forms of memory, in hopes that they might offer ways to improve current machine learning architectures and mechanisms. But olfaction might offer a simpler way to start forging those connections. “It’s an interesting nexus point,” Marblestone said. “An entry point into thinking about next-generation neural nets.”

“Don’t you mean a needle?” I almost interjected. Then he said it again.

In mathematics, it turns out, conventional modes of thought sometimes get turned on their head. The mathematician I was speaking with, Dave Jensen of the University of Kentucky, really did mean “hay in a haystack.” By it, he was expressing a strange fact about mathematical research: Sometimes the most common things are the hardest to find.

“In many areas of mathematics you’re looking for examples of something, and examples are really abundant, but somehow any time you try to write down an example, you get it wrong,” said Jensen.

The hay-in-a-haystack phenomenon is at work in one of the first objects that kids encounter in mathematics: the number line. Points on the number line include the positive and negative integers (such as 2 and –29), rational numbers (ratios of integers like $\frac{3}{2}$ and $\frac{1}{137}$) and all irrational numbers — those numbers, like pi or $\sqrt{2}$, that can’t be expressed as a ratio.

Irrational numbers occupy the vast, vast majority of space on a number line — so vast, in fact, that if you were to pick a number on the number line at random, there is literally a 100 percent chance that it will be irrational.*
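That “100 percent chance” is the measure-theoretic statement that the rational numbers take up zero total length on the number line. The standard one-line argument: the rationals are countable, so for any $\varepsilon > 0$ they can be covered by intervals whose combined length is at most $\varepsilon$.

```latex
% List the rationals as q_1, q_2, q_3, \ldots and cover each q_n with an
% interval of length \varepsilon / 2^n. The total length of the cover is
\sum_{n=1}^{\infty} \frac{\varepsilon}{2^n} \;=\; \varepsilon ,
% which can be made as small as we like. So the rationals have measure zero,
% and a uniformly random point on an interval is irrational with probability 1.
```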

Yet despite their overwhelming presence, we almost never encounter irrational numbers in our daily lives. Instead we count with whole numbers and follow recipes with fractions. The numbers we know best are the extremely rare numbers, the special numbers — the needles in the haystack.

The hay is hard to find precisely because it’s so unexceptional. Rational numbers have the distinctive property that it’s possible to write them down. This calls them to our attention. Irrational numbers have infinite, non-repeating decimal expansions: You couldn’t write one down even with an endless amount of time. That these numbers lack the exceptional property of “write-down-able-ness” is what makes them nearly invisible to our way of seeing.

“We’re looking with a magnet, and you’re not going to find hay with a magnet; you’re only going to find needles,” said Dhruv Ranganathan, a mathematician who is in the midst of a move from the Massachusetts Institute of Technology to the University of Cambridge.

The search for hay in a haystack characterizes many different areas of math, including the subject of my most recent *Quanta* article, “Tinkertoy Models Produce New Geometric Insights.” There I wrote about mathematicians who are investigating the relationship between geometric shapes and the equations used to describe them. In rare cases, objects can be expressed by simple equations. These are the needles, the shapes we know best: lines, parabolas, circles, spheres.

The overwhelming preponderance of shapes resists such elegant formulation. These shapes may be everywhere, but because you can’t write down the equations that describe them, it’s hard to establish that even a single one of them exists.

In my article, I explained how techniques from a field called “tropical geometry” serve as an especially sly way of deducing the existence of these ubiquitous geometric objects — the ones that, like the irrational numbers, are everywhere, even if you can’t write them down.

In mathematics it often happens that either something doesn’t exist, or it exists in abundance. The nature of those abundant objects might make them hard to detect, but if you’re a mathematician and you believe they’re there, and you believe they constitute almost all of everything, your task is straightforward: Find just one.

It’s as if you were convinced the oceans were filled with water, said Ranganathan, but every time you took a sample, you came up with something else — a shell, a rock, a plant. Yet to start to believe your hypothesis was correct, you’d hardly need to empty the sea.

“All you have to do is find any water,” he said. “One droplet of water will do.”
