One World Seminar Series on the Mathematics of Machine Learning

The One World Seminar Series on the Mathematics of Machine Learning is an online platform for research seminars, workshops, and seasonal schools in theoretical machine learning. The series focuses on theoretical advances in machine learning and deep learning, complementing the One World seminars on probability, on Information, Signals and Data (MINDS), on methods for arbitrary data sources (MADS), and on imaging and inverse problems (IMAGINE).

The series was started during the Covid-19 pandemic in 2020 to bring together researchers from all over the world for presentations and discussions in a virtual environment. It follows in the footsteps of other community projects under the One World umbrella, which originated around the same time.

We welcome suggestions for speakers on new and exciting developments, and we are committed to providing a platform for junior researchers as well. We recognize the flexibility that online seminars provide, and we are experimenting with different formats. Any feedback on our events is welcome.

Next Event

Wed Oct. 4

12 noon ET

Bridging Discrete Variables with Back-Propagation and Beyond

Backpropagation, the cornerstone of deep learning, is limited to computing gradients for continuous variables. This limitation poses challenges for problems involving discrete latent variables and sparse computations. To address the challenge posed by discrete latent variables, we first assess the Straight-Through (ST) heuristic, formally establishing that it works as a first-order approximation of the gradient. Guided by our findings, we propose ReinMax, which achieves second-order accuracy by integrating Heun's method, a second-order numerical method for solving ODEs. ReinMax does not require the Hessian or other second-order derivatives and thus has negligible computational overhead. Extensive experimental results on benchmark datasets demonstrate the superiority of ReinMax over the state of the art.
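
For readers unfamiliar with the Straight-Through heuristic analyzed in the talk, the sketch below shows the standard ST trick for a categorical variable in PyTorch. It is not the speaker's ReinMax implementation; it is only a minimal illustration of the baseline estimator, assuming a routine softmax-plus-sampling setup.

    import torch
    import torch.nn.functional as F

    def straight_through_sample(logits):
        """Standard Straight-Through (ST) estimator for a categorical sample.

        Forward pass: a hard one-hot sample.
        Backward pass: gradients flow through the softmax probabilities,
        i.e. the first-order approximation discussed in the abstract.
        """
        probs = F.softmax(logits, dim=-1)                # differentiable path
        index = torch.multinomial(probs, num_samples=1)  # discrete sample
        one_hot = F.one_hot(index.squeeze(-1), num_classes=logits.shape[-1]).float()
        # Hard value in the forward pass, soft gradient in the backward pass.
        return one_hot + probs - probs.detach()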


In addition, the inherently sparse computation of Mixture-of-Experts (MoE) models poses unique challenges for gradient estimation. To reconcile dense backpropagation with sparse expert routing, we present SparseMixer, a gradient estimator designed specifically for MoE. Unlike typical MoE training, which strategically neglects certain gradient terms for the sake of sparse computation and scalability, SparseMixer provides scalable approximations of these terms, enabling reliable gradient estimation in MoE training. Also grounded in a numerical ODE framework, SparseMixer harnesses the mid-point method, a different second-order ODE solver, to deliver precise gradient approximations with negligible computational overhead. Applied to Switch Transformer on both pretraining and machine translation tasks, SparseMixer shows considerable performance gains, accelerating training convergence by up to a factor of two.
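
For reference, the two second-order ODE solvers named in the abstract are standard; a single step of each is sketched below in plain Python. How ReinMax and SparseMixer turn these update rules into gradient estimators is developed in the talk and the underlying papers, not reproduced here.

    def heun_step(f, t, y, h):
        """One step of Heun's method (explicit trapezoidal rule); second-order accurate."""
        k1 = f(t, y)                    # slope at the current point
        k2 = f(t + h, y + h * k1)       # slope at the Euler-predicted endpoint
        return y + 0.5 * h * (k1 + k2)  # average the two slopes

    def midpoint_step(f, t, y, h):
        """One step of the explicit midpoint method; also second-order accurate."""
        k1 = f(t, y)                    # slope at the current point
        return y + h * f(t + 0.5 * h, y + 0.5 * h * k1)  # slope at the midpoint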

Mailing List and Google Calendar

Sign up here to join our mailing list and receive announcements. If your browser automatically signs you into a Google account, it may be easiest to join with a university account through an incognito window. For any other concerns, please reach out to one of the organizers.

Sign up here for our Google calendar listing all seminars.

Format

Seminars are held online on Zoom. The presentations are recorded, and the videos are made available on our YouTube channel. A list of past seminars can be found here. All seminars, unless otherwise stated, are held on Wednesdays at 12 noon ET. The invitation is shared on this site before the talk and distributed via email.

Board

Wuyang Chen (UC Berkeley)

Bin Dong (Peking University)

Boumediene Hamzi (Caltech)

Franca Hoffmann (Caltech)

Issa Karambal (Quantum Leap Africa)

Qianxiao Li (National University of Singapore)

Matthew Thorpe (University of Warwick)

Tiffany Vlaar (Mila/McGill University)

Stephan Wojtowytsch (University of Pittsburgh)

Former Board Members

Simon Shaolei Du (University of Washington)

Surbhi Goel (Microsoft Research NY)

Chao Ma (Stanford University)

Song Mei (UC Berkeley)

Philipp Petersen (University of Vienna)