Александр Владимирович Гасников

Алгоритмическая стохастическая оптимизация (Algorithmic Stochastic Programming)

Спецкурс-спецсеминар для студентов МФТИ и НМУ,
под руководством: А.Н.Безносиков, А.В.Гасников (ответственный за курс), Э.А.Горбунов, П.Е.Двуреченский, Д.А.Ковалев и др.

Курс будет проходить по вторникам с 19.00 по мск.
Ссылка для подключения
Онлайн-трансляция на ютьюб-канале (здесь же можно будет посмотреть прошедшие выступления).

Программа курса

состоит из 6 частей. Каждая часть - это 2 (изредка 3) лекции.

Part 1. Stochastic optimization and Data Science

This part aims to motivate stochastic optimization problems by statistical learning theory, where the goal is to minimize population risk, possibly based on the minimization of empirical risk.

1.1 Motivation by statistics: stochastic optimization viewpoint, Fisher's theorem, maximum likelihood estimation.

1.2 Motivation by machine learning: stochastic optimization viewpoint, elements of statistical learning theory, empirical risk minimization, regularization.

1.3 Basics of stochastic optimization: online (Stochastic Approximation) and offline (Sample Average Approximation/Monte-Carlo) approaches, setup of parallel and distributed computations.

Part 2. Convex stochastic optimization. Non-smooth problems

This part introduces the main algorithmic schemes for stochastic convex optimization with non-smooth objective functions.

2.1 Stochastic SubGradient Descent (SSGD). Convergence rate, lower bounds for convergence rate in non-smooth setting, optimality of SSGD.

2.2 Stochastic Mirror Descent. Convergence rate, flexibility w.r.t. the geometry of the feasible set, lower bounds for convergence rate in non-smooth setting, optimality of MD.

2.3 Stochastic SubGradient Descent and Mirror Descent in Online optimization, connection to stochastic optimization, the flexibility of SSGD and MD.

2.4 Convergence rates with high probability. Stochastic subgradients with heavy-tailed distribution, subgradient clipping technique, convergence rates with high probability.

Part 3. Convex stochastic optimization. Smooth case

This part introduces the main algorithmic schemes for stochastic convex optimization with smooth objective functions, in particular accelerated methods for smooth stochastic optimization.

3.1 Stochastic gradient descent: convergence rate analysis, convergence to the vicinity of the solution with constant step size, convergence rate under interpolation condition, incorporating mini-batch stochastic gradients.

3.2 Catalyst - a universal framework for acceleration of randomized optimization methods. Application to stochastic optimization algorithms.

3.3 Direct acceleration of SGD under weak and strong growth conditions, convergence rate analysis. Over-parameterized learning models and analysis of accelerated methods in this setting.

3.4 Batch-parallel computations and Federated Learning. Lower bounds and optimal algorithms.

3.5 Connection between deterministic methods with inexact oracle for smooth problems and algorithms for stochastic optimization: mini-batch stochastic gradient as an inexact oracle, convergence rate analysis.

3.6 Convergence rates with high probability. Stochastic gradients with heavy-tailed distribution, gradient clipping technique, convergence rates with high probability.

Part 4. Convex-concave stochastic saddle-point problems and monotone stochastic variational inequalities

In this part, we consider a broader perspective that includes saddle-point problems and, more generally, variational inequalities.

4.1 Motivation for convex-concave stochastic saddle-point problems and monotone stochastic variational inequalities: Equilibrium problems, game theory, robust optimization, robust machine learning problems, e.g. training Generative Adversarial Networks (GAN)

4.2 Variational inequalities and saddle-point problems with non-smooth operators or objectives. Stochastic Mirror Descent and its convergence rate.

4.3 Variational inequalities and saddle-point problems with smooth operators or objectives. Stochastic Mirror Prox and its convergence rate. Stochastic Mirror Prox with batch parallelization, applications to federated learning and distributed computations setting.

Part 5. Adaptive methods for convex stochastic optimization problems and monotone stochastic variational inequalities

The aim of this cpart is to present alternative ways of choosing a step size based on local information on the properties of the objective or operator in variational inequality.

5.1 AdaGrad-type algorithms for non-smooth optimization problems. Convergence rate analysis.

5.2 Adam-type methods for smooth problems and non-smooth optimization problems.

5.3 Universal stochastic Mirror Prox for smooth and non-smooth variational inequalities. Convergence rate analysis

Part 6. Randomized methods

This part presents the idea of randomization that allows to reduce the cost of one iteration of an algorithm and often does not increase the number of iterations.

6.1 Random gradient-free methods for non-smooth zeroth-order optimization: one-point and two-point feedback, convergence rate analysis.

6.2 Random gradient-free methods for smooth zeroth-order optimization: one-point and two-point feedback. Random (block) coordinate descent methods for smooth optimization problems. A unifying framework for accelerated randomized methods.

6.3 (Accelerated) variance reduced methods for finite-sum convex optimization and saddle-point problems.

6.4 Centralized distributed methods for finite-sum problems that use the statistical similarity of the data.

Основная литература

Shapiro A., Dentcheva D., Ruszczyński A. Lectures on stochastic programming: modeling and theory. – Society for Industrial and Applied Mathematics, 2014.
Shalev-Shwartz S., Ben-David S. Understanding machine learning: From theory to algorithms. – Cambridge university press, 2014.
Bubeck S. Convex optimization: Algorithms and complexity // Foundations and Trends® in Machine Learning Volume 8 Issue 3-411 pp 231–357. – 2015.
Goodfellow I., Bengio Y., Courville A. Deep learning. – MIT press, 2016.
Duchi J. C. Introductory lectures on stochastic optimization // The mathematics of data. – 2018. – V. 25. – P. 99-185.
Hazan E. Lecture notes: Optimization for machine learning // arXiv preprint arXiv:1909.03550. – 2019.
Lin Z., Li H., Fang C. Accelerated optimization for machine learning. – Springer, 2020.
Lan G. First-order and Stochastic Optimization Methods for Machine Learning. – Springer Nature, 2020.

По вопросам сдачи курса можно писать Александру Гасникову gasnikov@yandex.ru

Более подробная информация об авторах курса:

Aleksandr Beznosikov is a M.Sc student in Applied Mathematics and Physics at Moscow Institute of Physics and Technology. His research interests include convex and non-convex optimization, stochastic optimization, machine, and federated learning. He is a researcher in MIPT, HSE, and Yandex.Research and teaching assistant in MIPT and MADE (Mail.ru group academy). He received and receives various awards and personal scholarships from different organizations.

Pavel Dvurechensky is a research associate at Weierstrass Institute for Applied Analysis and Stochastics in Berlin. He received a Ph.D. in Mathematics in 2014 from the Faculty of Control and Applied Mathematics of Moscow Institute of Physics and Technology and received a B.Sc and M.Sc in Mathematics in 2008 and 2010 from the same university. In 2021 he received (jointly with Alexander Gasnikov) the Award for Young Scientists from the Moscow Government. His main area of research is optimization algorithms. He regularly publishes in top-tier machine learning conferences such as NeurIPS, ICML and has papers in Q1 journals such as EJOR, OMS, and JOTA.

Alexander Gasnikov is a professor at Moscow Institute of Physics and Technology. He received a Doctor degree (Habilitation) in Mathematics in 2016 from the Faculty of Control and Applied Mathematics of Moscow Institute of Physics and Technology and received a B.Sc, M.Sc, Ph.D. in Mathematics in 2004, 2006, and 2007 from the same university. In 2019 he received Yahoo Faculty Research Engagement Program (jointly with Cesar Uribe). In 2020 he received the Yandex Award (Ilya Segalovich Award). In 2021 he received the Award for Young Scientists from the Moscow Government (jointly with Pavel Dvurechensky). His main area of research is optimization algorithms. He regularly publishes in top-tier machine learning conferences such as NeurIPS, ICML and has papers in Q1 journals such as EJOR, OMS, and JOTA.

Eduard Gorbunov is Ph.D. in Computer Science at Moscow Institute of Physics and Technology. He received a B.Sc, a M.Sc and PhD in Applied Mathematics and Physics from Moscow Institute of Physics and Technology. His research interests include stochastic optimization, distributed optimization, federated learning, and derivative-free optimization. Eduard Gorbunov received several awards and scholarships for his research including The Ilya Segalovich Award 2019, Andrei Raigorodskii personal scholarship (main prize), Huawei scholarship for bachelor and master students at MIPT. He regularly publishes in top-tier machine learning conferences such as NeurIPS, ICML, AISTATS, ICLR, receives awards as the best reviewer there, and has papers in Q1 journals such as SIOPT, EJOR.

Dmitry Kovalev is a Ph.D. student at the King Abdullah University of Science and Technology. His research interests include stochastic optimization, distributed optimization, federated learning, and saddle-points problems. Dmitry received several awards and scholarships for his research including The Ilya Segalovich Award 2020, 2021, and best student paper award at ICML 2021 workshop. He regularly publishes in top-tier machine learning conferences such as NeurIPS, ICML, AISTATS, ICLR, and has papers in Q1 journals.