427 Thackeray Hall
Abstract
The neural tangent kernel (NTK) perspective establishes a fundamental connection between neural networks (NNs) trained by gradient descent (GD) and kernel methods. A key question arising from this insight is whether GD-based training of NNs can achieve statistical rates comparable to those attained by kernel methods.
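For readers unfamiliar with the object, the NTK of a network $f(x;\theta)$ is the kernel induced by the parameter gradients at random initialization (a standard definition, stated here with generic placeholders $f$, $\theta_0$ rather than the talk's exact notation):
$$
K_{\mathrm{NTK}}(x, x') \;=\; \mathbb{E}_{\theta_0}\!\left[\big\langle \nabla_\theta f(x;\theta_0),\; \nabla_\theta f(x';\theta_0) \big\rangle\right].
$$
In the infinite-width limit, GD training stays close to the kernel regression/classification predictor associated with this kernel, which is what makes the statistical comparison in the question above meaningful.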
In this talk, I will present our recent work addressing this question, offering new insights into the theoretical guarantees of gradient-based deep learning models. For binary classification, we show that GD applied to two-layer ReLU neural networks trained with the logistic loss can achieve the optimal margin bound, provided the data is NTK-separable. Our analysis highlights the intricate interplay between optimization and generalization, leveraging a reference model and a refined estimation of Rademacher complexity. For least-squares regression, we demonstrate that both GD and SGD with two-layer ReLU NNs can attain minimax-optimal rates under polynomial overparameterization. Finally, I will discuss how these results extend to deep ReLU networks in the regression setting.
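To make the classification setting concrete, below is a minimal NumPy sketch of the training procedure the abstract refers to: full-batch GD on a two-layer ReLU network with the logistic loss, in an overparameterized (NTK-style) regime. The width, step size, synthetic data, and the convention of fixing the output weights at random signs are illustrative assumptions, not the paper's exact setting.

```python
import numpy as np

# Sketch only: two-layer ReLU network f(x) = (1/sqrt(m)) * sum_r a_r * relu(<w_r, x>),
# with output signs a_r fixed and inner weights W trained by full-batch GD
# on the logistic loss. Hyperparameters and data are illustrative.
rng = np.random.default_rng(0)

n, d, m = 200, 10, 1024          # samples, input dimension, hidden width
eta, steps = 0.5, 2000           # step size, number of GD iterations

# Synthetic binary classification data with unit-norm inputs, labels in {-1, +1}.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sign(X[:, 0] + 0.1 * rng.standard_normal(n))

W = rng.standard_normal((m, d))          # trained inner weights
a = rng.choice([-1.0, 1.0], size=m)      # fixed random output signs

def forward(W, X):
    pre = X @ W.T                         # (n, m) pre-activations
    return (np.maximum(pre, 0.0) @ a) / np.sqrt(m), pre

for t in range(steps):
    f, pre = forward(W, X)
    margins = y * f
    # Logistic loss: (1/n) * sum_i log(1 + exp(-y_i f(x_i))).
    loss = np.mean(np.log1p(np.exp(-margins)))
    # dL/df_i = -y_i / (n * (1 + exp(y_i f(x_i))))
    g = -y / (1.0 + np.exp(margins)) / n
    # Backprop through the ReLU: df_i/dW_r = (a_r / sqrt(m)) * 1[pre_ir > 0] * x_i.
    act = (pre > 0).astype(float)
    grad_W = ((act * g[:, None]).T @ X) * (a[:, None] / np.sqrt(m))
    W -= eta * grad_W
    if t % 500 == 0:
        print(f"step {t:4d}  loss {loss:.4f}  train acc {(margins > 0).mean():.3f}")
```

The margin quantity `y * f` tracked in the loop is the object the optimal margin bound is stated in terms of; NTK-separability roughly asks that the data admit a positive margin under the NTK feature map, which this sketch does not verify.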