Linear regression with overparameterized linear neural networks: Tight upper and lower bounds for implicit ℓ^1-regularization

Wednesday, November 13, 2024 - 15:30

427 Thackeray Hall

Speaker Information
Hannes Matt
Catholic University of Eichstätt-Ingolstadt

Abstract or Additional Information

Modern machine learning models are often trained in a setting where the number of parameters exceeds the number of training samples. To understand the implicit bias of gradient descent in such overparameterized models, a number of works have studied diagonal linear neural networks in the regression setting. They observed that gradient descent with small initialization exhibits an implicit regularization effect towards the interpolating solution with the smallest $\ell^1$-norm. We investigate implicit regularization in diagonal linear neural networks of depth $D \ge 2$ for overparameterized linear regression problems. Specifically, we analyze the approximation error between the limit point of gradient flow trajectories and the solution to the $\ell^1$-minimization problem. By deriving tight upper and lower bounds on the approximation error, we precisely characterize its dependence on the scale of initialization $\alpha$. Our results reveal that for network depths $D \ge 3$, the error decreases linearly with $\alpha$, whereas for $D = 2$, it decreases at rate $\alpha^{1-\rho}$, where the parameter $\rho \in [0,1)$ can be characterized explicitly. Surprisingly, this parameter is closely connected to so-called null space property constants studied in the sparse recovery literature. We demonstrate the asymptotic tightness of our bounds through explicit examples. Numerical experiments confirm our theoretical findings, showing that deeper networks, i.e., $D \ge 3$, exhibit stronger implicit regularization toward sparsity, particularly in noisy settings. Additionally, we extend our analysis to cases where the $\ell^1$-minimization problem does not have a unique solution.

This is joint work with Dominik Stöger.
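
For readers unfamiliar with the setting, the following is a minimal numerical sketch (not code from the talk) of the phenomenon described in the abstract: gradient descent on a depth-$D$ diagonal linear network $w = u^{\odot D} - v^{\odot D}$ with small initialization scale $\alpha$, applied to an overparameterized noiseless regression problem. All dimensions, step sizes, and iteration counts below are illustrative choices; with these settings the limit point is expected to lie close to the minimum-$\ell^1$-norm interpolant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized noiseless regression: n samples, d > n features,
# sparse ground truth, so the minimum-l1 interpolant typically equals w_star.
n, d = 20, 50
X = rng.standard_normal((n, d)) / np.sqrt(n)
w_star = np.zeros(d)
w_star[:3] = [1.0, -2.0, 0.5]
y = X @ w_star

# Depth-D diagonal linear network: w = u**D - v**D, initialized at scale alpha.
D, alpha = 3, 1e-2
u = alpha * np.ones(d)
v = alpha * np.ones(d)

lr, steps = 1e-2, 50_000
for _ in range(steps):
    w = u ** D - v ** D
    g = X.T @ (X @ w - y)            # gradient of 0.5 * ||X w - y||^2 w.r.t. w
    u -= lr * g * D * u ** (D - 1)   # chain rule through w = u**D
    v -= lr * -g * D * v ** (D - 1)  # chain rule through w = -v**D

w_gd = u ** D - v ** D
print("l1 norm of gradient descent limit:", np.abs(w_gd).sum())
print("l1 norm of sparse w_star:         ", np.abs(w_star).sum())
print("max coordinate error:             ", np.abs(w_gd - w_star).max())
```

Rerunning the sketch with $D = 2$ or a larger $\alpha$ should produce a noticeably larger gap to the minimum-$\ell^1$ solution, in line with the $\alpha^{1-\rho}$ versus $\alpha$ rates discussed in the abstract.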