Thematic day on the mean field training of multi-layer networks
The thematic day is scheduled for Saturday, July 25th 2020.
A mean-field theory for certain deep neural networks
A natural approach to understanding overparameterized deep neural networks is to ask whether there is some kind of natural limiting behavior when the number of neurons diverges. We present a rigorous limit result of this kind for networks with complete connections and "random-feature-style" first and last layers. Specifically, we show that network weights are approximated by certain "ideal particles" whose distribution and dependencies are described by a McKean-Vlasov mean-field model. We will present the intuition behind our approach, sketch some of the key technical challenges along the way, and connect our results to some of the recent literature on the topic.
A recording of the talk can be found here.
Mean field limits of neural networks: typical behavior and fluctuations
Machine learning, and in particular neural network models, has revolutionized fields such as image, text, and speech recognition. Today, many important real-world applications in these areas are driven by neural networks. There are also growing applications in finance, engineering, robotics, and medicine. Despite their immense success in practice, there is limited mathematical understanding of neural networks. Our work shows how neural networks can be studied via stochastic analysis, and develops approaches for addressing some of the technical challenges which arise. We analyze multi-layer neural networks in the asymptotic regime of simultaneously (A) large network sizes and (B) large numbers of stochastic gradient descent training iterations. We rigorously establish the limiting behavior of the neural network and we show that, under suitable assumptions on the activation functions and the behavior for large times, the limit neural network recovers a global minimum (with zero loss for the objective function). In addition, we rigorously prove a central limit theorem, which describes the neural network's fluctuations around its mean-field limit. The fluctuations have a Gaussian distribution and satisfy a stochastic partial differential equation. We illustrate the theoretical results by studying the evolution of parameters on the well-known MNIST and CIFAR10 data sets.
A recording of the talk can be found here.
A general framework for the mean field limit of multilayer neural networks
Recent progress has witnessed the discovery of a new operating regime for neural networks. In this regime, the characteristics of the network under gradient-based training tend, as the width increases, to a meaningful and nonlinear limit, known as the mean field limit. This has led to an exciting avenue of research that has been met with new ideas and sophisticated mathematical tools. Nevertheless, understanding of the mean field limit has so far been mostly restricted to the shallow case.
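For orientation, the shallow (two-layer) case mentioned above is often written as follows in the literature. This is only an illustrative sketch: the symbols $N$, $a_i$, $w_i$, $\sigma$ and $\rho$ are assumed notation and are not taken from the talk.

```latex
% Two-layer network of width N with mean-field (1/N) scaling
f_N(x) = \frac{1}{N} \sum_{i=1}^{N} a_i \, \sigma(\langle w_i, x \rangle)

% Formal limit as N \to \infty, where \rho is the limiting
% distribution of the parameters (a, w)
f(x; \rho) = \int a \, \sigma(\langle w, x \rangle) \, \rho(\mathrm{d}a, \mathrm{d}w)
```

In this picture the network output depends on the parameters only through their empirical distribution, which is why a limit in terms of a probability measure $\rho$ is natural.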
We will describe a novel framework to formulate the mean field limit for general multilayer neural networks based on the idea of a neuronal embedding. Our formulation goes beyond i.i.d. initializations and applies to a wide class of initialization distributions. When restricted to the i.i.d. case, it gives a way to easily recover degeneracy properties of the training dynamics in that setting. Using our formulation, we derive the first known guarantees on global convergence of three-layer neural networks with i.i.d. initialization under stochastic gradient descent in the mean field regime. Guarantees on global convergence of neural networks of arbitrary depth are shown for a class of initialization distributions that avoid the degeneracy of i.i.d. initializations. In both cases, global convergence guarantees hold even in the absence of convexity-based assumptions.
On the Banach spaces for multi-layer networks and connections to mean field training
The function spaces which have been successful in classical, often low-dimensional, problems in PDEs and the calculus of variations seem ill-equipped for the challenges of machine learning. This shortcoming has been partially addressed by the introduction of new function spaces such as reproducing kernel Hilbert spaces for random feature models and Barron space for two-layer neural networks.
In this talk, we extend the analysis to multi-layer neural networks with the path-norm. We bound the Rademacher complexity of the class of infinitely wide networks with small path-norm and prove direct and inverse approximation theorems by finite networks. Several representation formulas are discussed.
For suitable parameter initialization, we show that the path-norm increases at most polynomially under natural training dynamics.
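As a rough illustration of the quantity discussed in this abstract, the sketch below computes one common form of the path-norm for a finite, bias-free feed-forward network: the sum, over all input-to-output paths, of the product of the absolute values of the weights along the path. The function name `path_norm` and the nested-list weight layout are hypothetical choices made for this example, and the definition used in the talk may differ in its details.

```python
def path_norm(weights):
    """Path-norm of a bias-free feed-forward network (illustrative sketch).

    `weights` is a list of matrices given as lists of rows, where
    weights[l][i][j] connects unit j of layer l to unit i of layer l + 1.
    The result equals 1^T |W_L| ... |W_1| 1, i.e. the sum over all paths
    of the product of absolute weights along the path.
    """
    n_inputs = len(weights[0][0])
    # Start with one unit of "mass" on every input coordinate.
    v = [1.0] * n_inputs
    for W in weights:
        # Propagate through the layer using absolute values of the weights.
        v = [sum(abs(w) * x for w, x in zip(row, v)) for row in W]
    # Sum over all output units.
    return sum(v)


# Example: a network with 2 inputs, 2 hidden units and 1 output.
W1 = [[1.0, -2.0], [3.0, 4.0]]
W2 = [[0.5, -1.0]]
example_value = path_norm([W1, W2])
```

Computing the norm layer by layer avoids enumerating paths explicitly, so the cost is that of one forward pass rather than growing with the number of paths.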
For any further questions, please reach out to one of the organizers: Chao, Song, or Stephan.