On the foundations of computational mathematics, Smale’s 18th problem and the potential limits of AI
There is a profound optimism on the impact of deep learning (DL) and AI in the sciences with Geoffrey Hinton concluding that 'They should stop educating radiologists now'. However, DL has an Achilles heel: it is universaly unstable so that small changes in the initial data can lead to large errors in the final result. This has been documented in a wide variety of applications. Paradoxically, the existence of stable neural networks for these applications is guaranteed by the celebrated Universal Approximation Theorem, however, the stable neural networks are not computed by the current training approaches. We will address this problem and the potential limitations of AI from a foundations point of view. Indeed, the current situation in AI is comparable to the situation in mathematics in the early 20th century, when David Hilbert’s optimism (typically reflected in his 10th problem) suggested no limitations to what mathematics could prove and no restrictions on what computers could compute. Hilbert’s optimism was turned upside down by Goedel and Turing, who established limitations on what mathematics can prove and which problems computers can solve (however, without limiting the impact of mathematics and computer science).
We predict a similar outcome for modern AI and DL, where the limitations of AI (the main topic of Smale’s 18th problem) will be established through the foundations of computational mathematics. We sketch the beginning of such a program by demonstrating how there exist neural networks approximating classical mappings in scientific computing, however, no algorithm (even randomised) can compute such a network to even 1-digit accuracy with probability better than 1/2. We will also show how instability is inherit in the methodology of DL demonstrating that there is no easy remedy, given the current methodology. Finally, we will demonstrate basic examples in inverse problems where there exists (untrained) neural networks that can easily compute a solution to the problem, however, the current DL techniques will need 10^80 data points in the training set to get even 1% success rate.
Wed July 22
Thematic Day on the Mean Field Training of Deep Neural Networks
Tradeoffs between Robustness and Accuracy
Standard machine learning produces models that are highly accurate on average but that degrade dramatically when the test distribution deviates from the training distribution. While one can train robust models, this often comes at the expense of standard accuracy (on the training distribution). We study this tradeoff in two settings, adversarial examples and minority groups, creating simple examples which highlight generalization issues as a major source of this tradeoff. For adversarial examples, we show that even augmenting with correctly annotated data to promote robustness can produce less accurate models, but we develop a simple method, robust self training, that mitigates this tradeoff using unlabeled data. For minority groups, we show that overparametrization of models can hurt accuracy on the minority groups, though it improves standard accuracy. These results suggest that the "more data" and "bigger models" strategy that works well for the standard setting where train and test distributions are close, need not work on out-of-domain settings.
This is based on joint work with Sang Michael Xie, Shiori Sagawa, Pang Wei Koh, Fanny Yang, John Duchi and Percy Liang.
Wed Aug 05
A Few Thoughts on Deep Network Approximation
Deep network approximation is a powerful tool of function approximation via composition. We will present a few new thoughts on deep network approximation from the point of view of scientific computing in practice: given an arbitrary width and depth of neural networks, what is the optimal approximation rate of various function classes? Does the curse of dimensionality exist for generic functions? Can we obtain exponential convergence for generic functions?