This article was written by Ingrid Daubechies and published in Quanta Magazine. Ingrid is the James B. Duke Professor of Mathematics and Electrical and Computer Engineering at Duke University. She served as president of the International Mathematical Union from 2011 to 2014.
Machine learning works spectacularly well, but mathematicians aren’t quite sure why.
At a dinner I attended some years ago, the distinguished differential geometer Eugenio Calabi volunteered to me his tongue-in-cheek distinction between pure and applied mathematicians. A pure mathematician, when stuck on the problem under study, often decides to narrow the problem further and so avoid the obstruction. An applied mathematician interprets being stuck as an indication that it is time to learn more mathematics and find better tools.
I have always loved this point of view; it explains how applied mathematicians will always need to make use of the new concepts and structures that are constantly being developed in more foundational mathematics. This is particularly evident today in the ongoing effort to understand "big data" — data sets that are too large or complex to be understood using traditional data-processing techniques.
Our current mathematical understanding of many techniques that are central to the ongoing big-data revolution is inadequate, at best. Consider the simplest case, that of supervised learning, which has been used by companies such as Google, Facebook and Apple to create voice- or image-recognition technologies with a near-human level of accuracy. These systems start with a massive corpus of training samples — millions or billions of images or voice recordings — which are used to train a deep neural network to spot statistical regularities. As in other areas of machine learning, the hope is that computers can churn through enough data to "learn" the task: Instead of being programmed with the detailed steps necessary for the decision process, the computers follow algorithms that gradually lead them to focus on the relevant patterns.
In mathematical terms, these supervised-learning systems are given a large set of inputs and the corresponding outputs; the goal is for a computer to learn the function that will reliably transform a new input into the correct output. To do this, the computer models the mystery function as a number of layers built from sigmoid functions whose parameters are initially unknown. These S-shaped functions look like a street-to-curb transition: a smoothed step from one level to another, where the starting level, the height of the step and the width of the transition region are not determined ahead of time but are adjusted during training.
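To make the street-to-curb picture concrete, here is a minimal Python sketch of one such S-shaped function. The function name `sigmoid_step` and the parameters `start`, `height`, `center` and `width` are illustrative choices, not notation from the article; `center` in particular is an added assumption that fixes where the step sits along the input axis.

```python
import math


def sigmoid_step(x, start=0.0, height=1.0, center=0.0, width=1.0):
    """Smoothed step rising from `start` to `start + height`.

    `center` is where the step is halfway up, and `width` scales the
    transition region (smaller width means a sharper step).
    All parameter names are hypothetical, chosen for illustration.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    z = (x - center) / width
    # Use the form of the logistic function that avoids overflow in exp().
    if z >= 0:
        s = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        s = e / (1.0 + e)
    return start + height * s


# Far to the left the value sits at the starting level (the street),
# far to the right it sits at start + height (the curb).
street = sigmoid_step(-1000.0, start=2.0, height=4.0)
curb = sigmoid_step(1000.0, start=2.0, height=4.0)
halfway = sigmoid_step(0.0, start=2.0, height=4.0)
```

In a learning system these parameters would not be set by hand as they are here; they are the quantities that training is meant to discover.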
Inputs enter the first layer of sigmoid functions, which spits out results that can be combined before being fed into a second layer of sigmoid functions, and so on. This web of resulting functions constitutes the “network” in a neural network. A “deep” one has many layers.
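The layering described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of any production system: it assumes that results are "combined" by taking weighted sums plus an offset, which is the common convention, and the names `layer` and `forward` as well as all the numbers in `network` are made up for the example.

```python
import math


def sigmoid(z):
    """Standard logistic function, written to avoid overflow in exp()."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def layer(inputs, weights, biases):
    """One layer: each unit combines the inputs as a weighted sum plus a
    bias (an assumed convention), then applies a sigmoid to the result."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the inputs through each layer in turn; the output of one
    layer becomes the input of the next."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations


# Hypothetical network: 2 inputs -> 3 sigmoid units -> 1 sigmoid unit.
# The weights and biases are arbitrary; training would normally set them.
network = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.2]),
]
output = forward([1.0, 2.0], network)
```

A "deep" network in this sketch is simply a longer `network` list: each extra entry adds another layer of sigmoid units between input and output.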