Upcoming Events


Wed Mar 3
12 noon ET

Structure preservation and convergence in scientific machine learning

Physics-informed techniques have emerged as a means of incorporating prior knowledge into machine learning. These techniques generally function by minimizing a hybrid loss, regularizing a traditional $\ell_2$ error with a PDE residual. While remarkably effective, these approaches suffer from two major shortcomings. Firstly, such neural network (NN) solutions of PDEs generally fail to converge with increasing architecture size. Despite recent work establishing that NNs may approximate at least as well as hp-finite element spaces, in practice, when training with gradient methods, O(1) optimization errors prevent realizing consistency. Secondly, the regularized losses introduce physics via a penalized residual, and it is well known from classical numerical analysis that the approximation space must be designed in tandem with the residual to ensure convergence to a given PDE.
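For readers unfamiliar with the hybrid loss mentioned above, one common way to write it is sketched here. The notation is an assumption for illustration and does not come from the abstract: $u_\theta$ is the network, $(x_i, u_i)$ are data samples, $\tilde{x}_j$ are collocation points, $\mathcal{N}$ is the PDE operator written so that $\mathcal{N}[u] = 0$ for an exact solution, and $\lambda > 0$ is a penalty weight.

$$
\mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N} \left| u_\theta(x_i) - u_i \right|^2 + \lambda \, \frac{1}{M}\sum_{j=1}^{M} \left| \mathcal{N}[u_\theta](\tilde{x}_j) \right|^2
$$

The first term is the $\ell_2$ data error and the second is the penalized PDE residual.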

We conjecture that the same tools used to design convergent and structure-preserving properties in forward simulation may be used to design scientific ML architectures with similar guarantees. In this talk, we present two current works, each addressing one of these issues. First, we introduce partition of unity networks (POUnets) to develop convergent approximation with deep networks. It has been shown that traditional feedforward networks may approximate by emulating partitions of unity (POU), and then emulating monomials on each partition, ultimately yielding a localized polynomial approximation and associated hp-convergence. Rather than emulating these components, POUnets function by directly incorporating both the POU and polynomials into the architecture. The resulting approximation breaks the curse of dimensionality and admits a fast least-squares optimization strategy. Predictions are competitive with high-order finite element spaces, and provide superior approximation for problems with reduced regularity.
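The idea of combining a partition of unity with local polynomials and a least-squares solve can be illustrated with a minimal 1D sketch. This is not the POUnet architecture from the talk: here the partition is a fixed set of normalized Gaussian bumps (an assumption made for simplicity), whereas the abstract describes the partition as part of a trainable network. All function names and parameters below are hypothetical.

```python
import numpy as np

def pou_weights(x, centers, width):
    """Normalized Gaussian bumps: nonnegative and summing to 1 at every x."""
    logits = -((x[:, None] - centers[None, :]) / width) ** 2
    logits -= logits.max(axis=1, keepdims=True)  # shift for numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def design_matrix(x, centers, width, degree):
    """Columns are phi_i(x) * (x - c_i)^k for each partition i and power k."""
    phi = pou_weights(x, centers, width)             # shape (n, P)
    shifted = x[:, None] - centers[None, :]          # shape (n, P)
    powers = np.stack([shifted ** k for k in range(degree + 1)], axis=2)
    return (phi[:, :, None] * powers).reshape(len(x), -1)

def fit_pou_poly(x, y, centers, width, degree):
    """With the partition held fixed, the polynomial coefficients
    solve a linear least-squares problem."""
    A = design_matrix(x, centers, width, degree)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

def predict(x, coeffs, centers, width, degree):
    return design_matrix(x, centers, width, degree) @ coeffs

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 200)
    y = np.abs(x - 0.5)                    # target with reduced regularity
    centers = np.linspace(0.0, 1.0, 8)     # fixed partition centers (assumption)
    coeffs = fit_pou_poly(x, y, centers, width=0.1, degree=2)
    err = np.max(np.abs(predict(x, coeffs, centers, 0.1, 2) - y))
    print(f"max abs error: {err:.3e}")
```

Because the weights sum to one, any global polynomial up to the chosen degree is reproduced exactly. The least-squares step is linear only because the partition is frozen in this sketch.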

Second, we introduce a data-driven exterior calculus (DDEC) which may be used to endow scientific ML architectures with the structure-preserving properties of mimetic PDE discretization. Traditional mimetic methods function by exploiting the exterior calculus structures offered by a mesh to construct discrete operators that exactly mimic the topological properties of continuum operators. We show how graphs may be used as a surrogate for the topology offered by a mesh, and present new network architectures which allow "physics-informed" machine learning that exactly preserves conservation, guarantees extraction of well-posed problems, and allows handling of the non-trivial null-spaces occurring in fields such as electromagnetics.

If time permits, we will additionally share some current results applying these tools in challenging data-driven modeling efforts at Sandia, related to data-driven shock hydrodynamics in metals and discovery of surrogates for semiconductors in radiation environments.

Wed Mar 10
12 noon ET

Finite Width, Large Depth Neural Networks as Perturbatively Solvable Models

Deep neural networks are often considered to be complicated "black boxes," for which a systematic analysis is not only out of reach but potentially impossible. In this talk, which is based on ongoing joint work with Dan Roberts and Sho Yaida, I will make the opposite claim, namely that deep neural networks at initialization are perturbatively solvable models. The perturbative parameter is the width n of the network and we can obtain corrections to all orders in n. Our approach applies to networks at finite width n and large depth L. A key point is an emergent tension between depth and width. Large values of n make neural networks more like Gaussian processes, which are well behaved but incapable of feature learning due to a frozen NTK (at least with standard initialization schemes). Large values of L, in contrast, amplify higher cumulants and changes in the NTK, both of which scale with the network aspect ratio L/n.

Wed Mar 17
12 noon ET

TBA

TBA

Wed Mar 24
12 noon ET

TBA

TBA

Wed Mar 31
12 noon ET

TBA

TBA

TBA

Wed Apr 4
12 noon ET

An Integer Programming Approach to Deep Neural Networks with Binary Activation Functions

We study deep neural networks with binary activation functions (BDNN), i.e., networks whose activation function has only two states. We show that the BDNN can be reformulated as a mixed-integer linear program which can be solved to global optimality by classical integer programming solvers. Additionally, we present a heuristic solution algorithm and study the model under data uncertainty, applying a two-stage robust optimization approach. We apply our methods to random and real datasets and show that the heuristic version of the BDNN outperforms classical deep neural networks on the Breast Cancer Wisconsin dataset while performing worse on random data.
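As a rough illustration of why binary activations lend themselves to integer programming, a single unit with fixed weights can be encoded with a standard big-M construction. This is a textbook encoding and not necessarily the formulation used in the talk. The symbols are assumptions: $a$ is the pre-activation, $z$ is the binary output, $M$ is a bound with $|a| \le M$, and $\varepsilon > 0$ is a small tolerance.

$$
a = w^\top x + b, \qquad z \in \{0,1\}, \qquad -M(1 - z) \;\le\; a \;\le\; M z - \varepsilon (1 - z)
$$

Setting $z = 1$ forces $0 \le a \le M$, and setting $z = 0$ forces $-M \le a \le -\varepsilon$, so $z$ acts as a step activation through linear constraints alone. When the weights are themselves decision variables, as in training, the product $w^\top x$ would need additional linearization, which this sketch does not attempt.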