Training Machine Learning (ML) models is like finding the quickest path down a winding mountain—too slow, and you never reach the bottom; too fast, and you might veer off course. One way to speed up learning without losing control is momentum, a technique that helps the training algorithm adjust the update direction intelligently. Momentum-based methods, such as Nesterov acceleration, are widely used in ML training, but they are traditionally studied under ideal conditions—when the learning landscape is convex and the gradients are reliable. In reality, training ML models often involves noisy updates and bumpy terrain. In this talk, I will introduce two advances in momentum-based optimization algorithms:
Learning with Noisy Gradients: We show that momentum still works even when the training algorithm receives highly uncertain gradient feedback.
Learning in Non-Convex Landscapes: We show that even when the learning landscape is somewhat non-convex, momentum can still provide acceleration.
These results help bridge the gap between theory and practice, explaining why momentum-based methods are so effective in modern deep learning. The talk does not assume any background in machine learning or optimization.
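For readers curious what a momentum update looks like in practice, below is a minimal sketch of a Nesterov-style momentum step applied to a noisy quadratic objective. The quadratic, step size, momentum coefficient, and noise level are illustrative assumptions for this sketch, not the setting analyzed in the talk.

```python
import numpy as np

# Minimal sketch: Nesterov-style momentum on a simple quadratic f(x) = 0.5 * x^T A x,
# with Gaussian noise added to the gradient to mimic uncertain gradient feedback.
# All constants below are illustrative assumptions.

rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0])            # ill-conditioned quadratic (assumed objective)
x = np.array([5.0, 5.0])            # starting point
v = np.zeros_like(x)                # velocity (momentum buffer)
lr, beta, noise_std = 0.05, 0.9, 0.1  # step size, momentum coefficient, noise level

for t in range(200):
    lookahead = x + beta * v                                   # Nesterov "look-ahead" point
    grad = A @ lookahead + noise_std * rng.standard_normal(2)  # noisy gradient estimate
    v = beta * v - lr * grad                                   # accumulate momentum
    x = x + v                                                  # take the step

print("final iterate:", x)  # should approach the minimizer at the origin
```

Compared with plain gradient descent, the velocity term smooths out the noisy gradients and carries the iterate through the slowly curving directions faster, which is the intuition behind the acceleration results discussed in the talk.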
Thackeray Hall Room 703