Continuous optimization methods have played a major role in the development of fast algorithms for problems arising in areas such as Theoretical Computer Science, Discrete Optimization, Data Science, Statistics, and Machine Learning. The approach is to first formulate the problem as a continuous optimization problem, even when its domain is discrete, and then adapt or develop continuous methods to solve it.
The goal of these lecture notes is to present key methods in continuous optimization and show how to apply them to derive fast algorithms for various discrete and continuous optimization problems through convex optimization formulations.
The methods covered in these lectures include:
- Gradient Descent
- Mirror Descent
- Multiplicative Weight Update Method
- Accelerated Gradient Descent
- Newton’s Method
- Interior Point Methods
- Cutting Plane and Ellipsoid Methods
The intended audience includes advanced undergraduate students, graduate students, and researchers in Computer Science, Discrete Optimization, and Machine Learning.
These lecture notes have been developed over several courses that I have taught in the last four years. The notes posted here are from Spring 2018 and would not be possible without the hard work of my recent Ph.D. student Damian Straszak.
The notes are still rough; over the next few months I will be working on them to improve both the content and the presentation. It would be great to have your comments, high-level or low-level, to help me in this process. Please feel free to comment on the webpage or send me an email.
Preliminaries, Convexity, Duality
In this lecture, we develop the basic mathematical preliminaries and tools to study convex optimization. These include some standard facts from multivariate calculus and linear algebra, convex sets and functions, and the important notion of Lagrangian and Fenchel duality.
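As a small concrete instance of the duality developed in this lecture, the Fenchel conjugate of a function $f$ is defined by

```latex
f^{*}(y) \;=\; \sup_{x \in \mathbb{R}^{n}} \big( \langle y, x \rangle - f(x) \big),
\qquad \text{e.g.} \quad
f(x) = \tfrac{1}{2}\|x\|_2^2 \;\implies\; f^{*}(y) = \tfrac{1}{2}\|y\|_2^2 ,
```

so the squared Euclidean norm is its own conjugate.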
Convex Programming and Efficiency
In this lecture, we formalize convex programming problems, discuss what it means to solve them efficiently, and present various ways in which a convex set or a function can be specified.
Gradient Descent
This lecture introduces gradient descent, a meta-algorithm for unconstrained minimization. Assuming the objective is convex and its gradient is Lipschitz continuous, we prove a bound on its convergence time. Subsequently, we show how to use this continuous optimization method to obtain a fast algorithm for a discrete optimization problem: computing maximum flows in a graph.
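A minimal sketch of the basic method, assuming an objective with an $L$-Lipschitz gradient (function and parameter names here are illustrative, not from the notes):

```python
import numpy as np

def gradient_descent(grad, x0, step, iters):
    """Gradient descent: x_{t+1} = x_t - step * grad(x_t).

    For a convex objective whose gradient is L-Lipschitz, the choice
    step = 1/L yields a suboptimality bound of O(L ||x0 - x*||^2 / T)
    after T iterations.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# Example: minimize f(x) = ||x - b||^2 / 2; its gradient x - b is
# 1-Lipschitz, so any step in (0, 1] converges.
b = np.array([1.0, -2.0, 3.0])
x = gradient_descent(lambda x: x - b, np.zeros(3), step=0.5, iters=100)
```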
Mirror Descent and the Multiplicative Weight Update Method
In this lecture, we derive a new optimization algorithm, called Mirror Descent, via a different local optimization principle. First, the Mirror Descent algorithm is developed for optimizing convex functions over the probability simplex. Subsequently, we show how to generalize it and, importantly, derive the Multiplicative Weights Update (MWU) algorithm from it. This latter algorithm is then used to obtain a fast approximation algorithm for the bipartite matching problem on graphs.
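A sketch of the MWU rule in the standard experts setting, with the learning rate suggested by the usual regret analysis (names and the random-loss example are illustrative):

```python
import numpy as np

def mwu(losses, eta):
    """Multiplicative Weights Update over n experts.

    losses: a T x n array; row t is the loss vector (entries in [0, 1])
    revealed in round t.  Each round we play the distribution p_t
    proportional to the weights, then multiply each weight w_i by
    exp(-eta * loss_i).
    """
    T, n = losses.shape
    w = np.ones(n)
    alg_loss = 0.0
    for loss in losses:
        p = w / w.sum()
        alg_loss += p @ loss            # expected loss of the algorithm
        w *= np.exp(-eta * loss)        # multiplicative update
    best_loss = losses.sum(axis=0).min()  # loss of the best fixed expert
    return alg_loss, best_loss

rng = np.random.default_rng(0)
T, n = 500, 10
losses = rng.random((T, n))
alg, best = mwu(losses, eta=np.sqrt(np.log(n) / T))
```

With this choice of eta, the regret `alg - best` is O(sqrt(T log n)), i.e., the average per-round regret vanishes as T grows.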
Nesterov’s Accelerated Gradient Method
In this lecture, we derive the Accelerated Gradient Descent algorithm, whose convergence rate is quadratically better than that of standard gradient descent: the error after T iterations decays as O(1/T^2) rather than O(1/T). The method and the analysis presented here work for any pair of dual norms.
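A sketch of one standard Euclidean variant of the method (the dual-norm version in the lecture specializes to this; the momentum recursion below is the classical one, and all names are illustrative):

```python
import numpy as np

def accelerated_gd(grad, x0, L, iters):
    """Nesterov's accelerated gradient method for an L-smooth convex f.

    Guarantees f(x_T) - f(x*) <= 2 L ||x0 - x*||^2 / (T + 1)^2, versus
    the O(L ||x0 - x*||^2 / T) rate of plain gradient descent.
    """
    x_prev = np.asarray(x0, dtype=float)
    y = x_prev.copy()
    lam = 1.0
    for _ in range(iters):
        x = y - grad(y) / L                            # gradient step at y
        lam_next = (1 + np.sqrt(1 + 4 * lam**2)) / 2
        y = x + ((lam - 1) / lam_next) * (x - x_prev)  # momentum step
        x_prev, lam = x, lam_next
    return x_prev

# Ill-conditioned quadratic f(x) = (x1^2 + 100 x2^2) / 2, with L = 100.
a = np.array([1.0, 100.0])
x = accelerated_gd(lambda x: a * x, np.ones(2), L=100.0, iters=200)
f_val = 0.5 * (a * x * x).sum()
```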
Newton's Method
In this lecture, we derive and analyze Newton's method, an example of a second-order optimization method whose origins lie in finding roots of functions. We argue that Newton's method can be seen as gradient descent on a Hessian manifold, which then motivates an affinely invariant analysis of its convergence.
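A sketch of the pure Newton iteration for minimization (the example objective is illustrative):

```python
import numpy as np

def newton(grad, hess, x0, iters):
    """Newton's method: x_{t+1} = x_t - Hess f(x_t)^{-1} grad f(x_t).

    Unlike gradient descent, the update is affinely invariant:
    substituting x = A z for any invertible A leaves the iterates
    (mapped back to x-space) unchanged.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - np.linalg.solve(hess(x), grad(x))
    return x

# Minimize f(x) = sum_i (exp(x_i) - x_i); the unique minimizer is x = 0.
grad = lambda x: np.exp(x) - 1
hess = lambda x: np.diag(np.exp(x))
x = newton(grad, hess, np.full(3, 0.5), iters=10)
```

Close to the minimizer the error is squared at every step, so ten iterations already drive the iterate to machine precision here.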
Primal Path Following Interior Point Methods
In this lecture, we build on Newton’s method and its convergence presented in the previous lecture to derive a polynomial time algorithm for Linear Programming. Key to the reduction from constrained to unconstrained optimization is the notion of barrier functions and the corresponding central path.
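A sketch of the log-barrier construction on a toy LP, with a crude step-halving rule standing in for the careful step-size control in the actual analysis (all names and parameter choices are illustrative):

```python
import numpy as np

def barrier_minimize(c, A, b, x, t, steps=50):
    """Minimize f_t(x) = t * c^T x - sum_i log(b_i - a_i^T x) by damped
    Newton steps, starting from a strictly feasible x.

    The minimizers x*(t), as t grows, trace the central path toward an
    optimum of the LP: minimize c^T x subject to A x <= b.
    """
    for _ in range(steps):
        s = b - A @ x                        # slacks, must stay positive
        g = t * c + A.T @ (1.0 / s)          # gradient of f_t
        H = A.T @ np.diag(1.0 / s**2) @ A    # Hessian of f_t
        dx = np.linalg.solve(H, g)
        step = 1.0                           # halve until strictly feasible
        while np.any(b - A @ (x - step * dx) <= 0):
            step /= 2
        x = x - step * dx
    return x

# Toy LP: minimize x1 + x2 over the box [0, 1]^2; the optimum is the origin.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.array([1.0, 1.0, 0.0, 0.0])
c = np.array([1.0, 1.0])
x = np.array([0.5, 0.5])
for t in [1.0, 10.0, 100.0, 1000.0]:         # follow the path as t grows
    x = barrier_minimize(c, A, b, x, t)
```

The iterate stays strictly feasible throughout and approaches the optimal vertex as t increases; the lecture's analysis replaces the aggressive tenfold increases of t above with a carefully chosen multiplicative schedule.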
Variants of the Path Following Interior Point Method and Self-Concordance
In this lecture, we present various generalizations and extensions of the path following IPM which we have introduced in the previous lecture for the case of Linear Programming. As an application, we derive a fast algorithm for the minimum cost flow problem. Subsequently, we introduce the notion of Self-Concordance and present various barrier functions such as Vaidya’s barrier function, Lee-Sidford’s barrier function, and the Canonical barrier function.
Cutting Plane and Ellipsoid Methods for Linear Programming
In this lecture, we introduce a class of cutting plane methods for convex optimization and present an analysis of a special case: the ellipsoid method. We then show how to use the ellipsoid method to optimize linear programs with exponentially many constraints when only a separation oracle is provided. Examples include the matching polytope for general graphs and matroid polytopes.
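A sketch of the feasibility version of the ellipsoid method driven by a separation oracle (the toy set and oracle are illustrative):

```python
import numpy as np

def ellipsoid(oracle, R, n, iters):
    """Find a point in a convex set K contained in the ball B(0, R).

    oracle(x) returns None if x is in K, and otherwise a vector a with
    a^T x >= a^T y for all y in K (a separating hyperplane).  Each
    update keeps K inside the ellipsoid while shrinking its volume by a
    constant factor, so a feasible center is found in O(n^2 log(R/r))
    iterations when K contains a ball of radius r.
    """
    x = np.zeros(n)
    P = R**2 * np.eye(n)  # ellipsoid {y : (y - x)^T P^{-1} (y - x) <= 1}
    for _ in range(iters):
        a = oracle(x)
        if a is None:
            return x
        Pa = P @ a
        x = x - Pa / ((n + 1) * np.sqrt(a @ Pa))
        P = (n**2 / (n**2 - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(Pa, Pa) / (a @ Pa))
    return None

# Toy set K: a ball of radius 0.05 around (0.8, 0.8), inside B(0, 2).
center, r = np.array([0.8, 0.8]), 0.05
def oracle(x):
    d = x - center
    return None if np.linalg.norm(d) <= r else d  # d separates x from K
x = ellipsoid(oracle, R=2.0, n=2, iters=200)
```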
Convex Programming using the Ellipsoid Method
In this lecture, we show how to adapt the Ellipsoid method to solve general convex programs beyond linear programs. This allows us to give a polynomial time algorithm for submodular function minimization and apply it to the problem of computing maximum entropy distributions. Submodular minimization allows us, in turn, to obtain separation oracles for matroid polytopes. We end with faster variants of the Ellipsoid method due to Vaidya and Lee-Sidford-Wong.
Beyond Convexity: Geodesic Convexity and Optimization
Sometimes, functions that are non-convex in Euclidean space turn out to be convex if one introduces a suitable metric on the space and redefines convexity with respect to the shortest paths (“geodesics”) induced by that metric. Such a function is called “geodesically convex”. When a function has this property, and how to optimize it when it does, is not well understood. This lecture introduces geodesic convexity and presents applications to certain non-convex matrix optimization problems.