Upcoming Events

Wed June 26

12 noon ET

How Over-Parameterization Slows Down Gradient Descent

We investigate how over-parameterization impacts the convergence behaviors of gradient descent through two examples. In the context of learning a single ReLU neuron, we prove that the convergence rate shifts from $exp(−T)$ in the exact-parameterization scenario to an exponentially slower $1/T^3$ rate in the over-parameterized setting. In the canonical matrix sensing problem, specifically for symmetric matrix sensing with symmetric parametrization, the convergence rate transitions from $exp(−T)$ in the exact-parameterization case to $1/T^2$ in the over-parameterized case. Interestingly, employing an asymmetric parameterization restores the $exp(−T)$ rate, though this rate also depends on the initialization scaling. Lastly, we demonstrate that incorporating an additional step within a single gradient descent iteration can achieve a convergence rate independent of the initialization scaling.

Summer Break