Materials for ORIE 7191: Topics in Optimization for Machine Learning

ORIE 7191: Topics in Optimization for Machine Learning

This reading course will explore modern challenges at the interface of continuous optimization and machine learning. Our inquiry will be guided by two motivating questions:

  1. Can we use classical ideas in optimization to better understand (and improve) algorithms for challenging problems in machine learning?
  2. How can modern insights in machine learning guide the design of new and improved methods for optimization?

Topics may include low rank optimization, generalization in deep learning, regularization (implicit and explicit) for deep learning, connections between control theory and modern reinforcement learning, and optimization for trustworthy machine learning (including fair, causal, or interpretable models). Topics may change according to student interest.

The format will consist of student-guided discussion on papers. The course will culminate in a final research project which will constitute a majority of the grade. Students will deliver short presentations on their projects in class in addition to written reports.


  • When: 11:40 - 12:55 TTh
  • Where: Hollister Hall 372 or via Zoom
  • Quiz (same link for all quizzes)
  • CMT for paper reviews and peer reviews
  • Slack (use your address) for general questions and comments.

Office hours will be chosen via an in-class poll. You can also talk with Prof. Udell after class or contact her by email to set up an appointment.

Course format and requirements

Students are required to attend class, with at most two absences. (Up to two classes may also be attended remotely via Zoom by prior arrangement with the instructor. Students at Cornell Tech may attend all classes remotely.)

Course grades have four components (weights in parentheses are for the 3-credit option):

  • Reviews (.3): Students will write reviews of some (about half) of the papers we read for class and upload them to CMT.
  • Quiz (.1): Most classes will begin with a two-question quiz. The questions will be easy to answer if you have read the paper. Grading on this portion of the course will be nonlinear (e.g., 0 points if too few quiz questions are answered correctly).
  • Present a paper (.3): Students will each lead the discussion twice (or so), possibly in teams depending on course enrollment. The student leading the discussion will prepare a 30 minute presentation using slides, two true/false or multiple choice questions on the paper, and a class activity or questions for discussion. The presentations will be graded by peer review.
  • Research project (.3): The final research project should be on a topic with connections to both continuous optimization and machine learning. Projects can be in teams of up to two students except by special advance permission from the instructor. Students will prepare an initial project proposal, midterm report, and final report on the project, and will make an in-class presentation. Research projects can (and should!) advance your PhD research.

Students taking the class for 1 credit will be required to write fewer reviews, present only one paper, and will not be required to complete the research project. The quiz requirement will be the same.


This google doc serves as our schedule. Sign up for presentation slots on the google doc by adding your names and a link to the paper you'll present. (Make sure not to choose a paper someone else has already picked!) We may spend more or less time on a topic depending on student interest.


The course readings will be selected from the following, or other papers suggested by students in the class.

Low Rank Optimization

Our tour begins with low rank optimization. Many of the ideas developed (rigorously) for low rank optimization carry over (heuristically) to deep learning. We will see two ways to formulate the problem: as a biconvex problem, or as a convex problem with a low rank constraint (or, sometimes, side information). In this setting, we know how to provably find the solution to the problem (using convex methods); we can sometimes prove that the nonconvex method finds the same solution, and we can always evaluate the quality of any putative solution relative to the true optimal solution.
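
As a rough illustration of the two formulations, one common way to write them is sketched below. The notation is ours, not taken from the readings: f is assumed to be a convex loss, r a target rank, and tau a bound on the nuclear norm, which is one frequently used convex surrogate for rank. Individual papers may use different formulations.

```latex
% Factored formulation: biconvex when f is convex
% (convex in U for fixed V, and convex in V for fixed U)
\min_{U \in \mathbb{R}^{n_1 \times r},\; V \in \mathbb{R}^{n_2 \times r}} \; f(U V^\top)

% Convex formulation: the nuclear norm ball stands in for the rank constraint
\min_{X \in \mathbb{R}^{n_1 \times n_2}} \; f(X)
\quad \text{subject to} \quad \|X\|_* \le \tau
```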

Smooth nonconvex low rank optimization

  • Barvinok 1995, Pataki 1998 show that the rank of exact solutions to an SDP with m linear constraints grows as O(sqrt(m))
  • Barvinok 2002 p. 140: rank of epsilon approximation to a PSD matrix satisfying m linear constraints grows as log(m / epsilon)
  • So, Ye, and Zhang 2008: similar to the low rank approximation idea above, but more algorithmic. Still requires solving the SDP first.
  • Hazan 2008 suggests using Frank-Wolfe for semidefinite programming
  • Garber and Hazan 2011 integrates the above idea into a method to produce approximate solutions of SDP
  • Jaggi 2013: solving matrix completion with Frank-Wolfe ensures the rank is bounded by the number of iterations
  • Yurtsever et al. 2017 The first storage-optimal convex method for matrix completion
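
To see why the rank of a Frank-Wolfe iterate is bounded by the iteration count, here is a minimal sketch of Frank-Wolfe for matrix completion over a nuclear norm ball. It is not the code from any of the papers above: the function name `frank_wolfe_completion`, its parameters, the squared loss on observed entries, and the step size 2/(k+2) are our assumptions for illustration. Each iteration mixes in one rank-one matrix, so starting from zero the iterate after k steps has rank at most k.

```python
import numpy as np


def frank_wolfe_completion(M, mask, tau, n_iters):
    """Hypothetical sketch of Frank-Wolfe for
        minimize 0.5 * ||mask * (X - M)||_F^2  subject to  ||X||_* <= tau.

    M    : matrix of observations (entries outside the mask are ignored)
    mask : array of 0/1 with the same shape as M, 1 where an entry is observed
    tau  : radius of the nuclear norm ball
    """
    X = np.zeros_like(M, dtype=float)  # rank-zero starting point
    for k in range(n_iters):
        # Gradient of the squared loss on the observed entries.
        G = mask * (X - M)
        # Linear minimization over the nuclear norm ball: the minimizer of
        # <G, S> is -tau * u1 v1^T, with (u1, v1) the top singular pair of G.
        # A dense SVD keeps the sketch short; a top singular pair solver
        # would be the natural choice for large problems.
        U, _, Vt = np.linalg.svd(G, full_matrices=False)
        S = -tau * np.outer(U[:, 0], Vt[0])
        # Standard open-loop step size; the first step (gamma = 1) replaces X.
        gamma = 2.0 / (k + 2.0)
        X = (1.0 - gamma) * X + gamma * S
    return X
```

Because every iterate is a convex combination of rank-one matrices with nuclear norm tau, it stays feasible, and calling the function with `n_iters=5` returns a matrix of rank at most 5.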

Nonsmooth convex low rank optimization

Optimization for Deep Learning

What properties of deep nets make them easy or hard to optimize?

Is optimization enough? Is SGD necessary to provide implicit regularization?

Can we improve deep learning using more sophisticated ideas from optimization?

Learn better with fewer parameters

Other interesting ideas connecting optimization and deep learning

(Deep) Learning Regularizers

Continuous Control and Reinforcement Learning

Optimization for Trustworthy Machine Learning

Trustworthy machine learning is an umbrella term that includes

  • Privacy-preserving Statistics and Machine Learning
  • Fairness
  • Interpretability
  • Robust Statistics and Machine Learning
  • Causality
  • Adversarial Examples

Some nice recent papers on the topic that are fair game for this class: