EECS 395, 495: Optimization Techniques for Machine Learning and Deep Learning

Quarter Offered

Fall: MW 9:30-11; Borhani, Watt


PREREQUISITES: A thorough understanding of linear algebra and vector calculus, and strong familiarity with the Python programming language (e.g., basic data-manipulation libraries, how to construct functions and classes). Python will be used for all coding assignments; no other language may be used to complete programming assignments.


Fundamental machine learning algorithms are now baked into almost every moment of our digital lives. Machine learning is there when we surf the web using Google, when we upload photos to Facebook and our friends are automatically tagged, and when Amazon recommends products to us. Machine learning is also the linchpin of many technologies that lie just over the horizon, from self-driving cars, to chatbots and fully formed digital assistants, to fully automated factories.

This course offers a holistic and hands-on introduction to the fundamentals of mathematical optimization for machine learning and deep learning. Using a range of real datasets and basic Python libraries for data manipulation, vector/matrix algebra, and automatic differentiation, students will code up, from scratch, fundamental optimization algorithms for popular machine learning and deep learning models, including: least squares linear and nonlinear regression, logistic regression, support vector machines, feedforward neural networks, convolutional networks, gradient boosting, K-Means, Principal Component Analysis, and Recommender Systems.

  • This course fulfills the Technical Elective requirement.

INSTRUCTORS: Dr. Reza Borhani and Dr. Jeremy Watt


DETAILED COURSE TOPICS:

1. Machine learning / deep learning overview in the context of mathematical optimization

  • a. Supervised learning and unconstrained optimization: deep nets, trees, and kernels
  • b. The tools in the war chest: first- and second-order methods
  • c. Deep learning and engineering-focused first order methods
  • d. Dimension reduction, clustering, and constrained optimization
  • e. Dealing with discontinuities via heuristic methods

2. Overview of fundamental mathematical and computational tools

  • a. Review of Linear Algebra, probability and statistics
  • b. Calculus of multivariate functions, Taylor series and local function approximation
  • c. Optimality conditions for unconstrained / constrained optimization
  • d. Numerical methods for computing derivatives, automatic differentiation
  • e. Numerical computation of eigenvalues / singular values
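One of the computational tools above, numerical differentiation, can be sketched in a few lines of Python. The central-difference formula and the particular test function below are illustrative assumptions, not course materials; this kind of approximation is commonly used to sanity-check analytic or automatically computed gradients.

```python
# Central-difference approximation of a derivative (illustrative sketch):
# f'(x) is estimated as (f(x + h) - f(x - h)) / (2h), accurate to O(h^2).

def central_difference(f, x, h=1e-5):
    """Numerically approximate the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Example: f(x) = x^3 has analytic derivative 3x^2, i.e. 12 at x = 2.
f = lambda x: x ** 3
approx = central_difference(f, 2.0)
```

A gradient computed by automatic differentiation can be checked against this estimate coordinate by coordinate.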

3. Unconstrained optimization via local search

  • a. Local search as a general optimization paradigm
  • b. Random local search and the curse of dimensionality
  • c. Steplength rules and intuition
  • d. Formal principles of random local search
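The random local search paradigm described in this unit can be sketched as follows. The test function, step length `alpha`, and sample count below are illustrative assumptions rather than course-specified choices; the idea is simply to sample random unit directions around the current point and move only when a sampled step decreases the objective.

```python
import math
import random

def random_local_search(g, w, alpha=0.1, max_its=100, num_samples=20, seed=0):
    """Minimize a 2-D function g by random local search (sketch):
    at each iteration, try num_samples random steps of length alpha
    and keep the best candidate only if it lowers g."""
    rng = random.Random(seed)
    for _ in range(max_its):
        best_w, best_val = w, g(w)
        for _ in range(num_samples):
            theta = rng.uniform(0.0, 2.0 * math.pi)
            cand = (w[0] + alpha * math.cos(theta),
                    w[1] + alpha * math.sin(theta))
            if g(cand) < best_val:
                best_w, best_val = cand, g(cand)
        w = best_w
    return w

# Simple convex test function with minimum at the origin.
g = lambda w: w[0] ** 2 + w[1] ** 2
w_final = random_local_search(g, (3.0, 2.0))
```

In high dimensions the chance that a random direction is a descent direction shrinks rapidly, which is the "curse of dimensionality" issue this unit examines.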

4. First order methods: gradient descent and extensions

  • a. Gradient descent - normalized and unnormalized versions
  • b. Step size selection and convergence
  • c. Conservative theoretical guarantees and linesearch
  • d. Formal principles of gradient descent
  • e. Steepest descent - norm generalization
  • f. Batch / Stochastic gradient descent
  • g. Variations on normalized stochastic gradient descent (e.g., RMSprop, Adagrad)
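The basic unnormalized gradient descent step studied in this unit can be sketched in a few lines. The quadratic test function and fixed step size below are illustrative assumptions; step-size selection and normalized/stochastic variants are treated separately in the topics above.

```python
def gradient_descent(grad, w, alpha=0.1, max_its=100):
    """Unnormalized gradient descent (sketch): repeatedly take the
    step w <- w - alpha * grad(w)."""
    for _ in range(max_its):
        w = w - alpha * grad(w)
    return w

# Minimize g(w) = w^2, whose gradient is 2w, starting from w = 5.0.
w_star = gradient_descent(lambda w: 2.0 * w, 5.0)
```

A normalized variant divides the gradient by its magnitude before stepping, trading faster progress on flat regions for the need to shrink `alpha` near the minimum.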

5. Second order methods: Newton and quasi-Newton methods

  • a. Newton’s method basics - normalized and unnormalized versions
  • b. Newton’s method, stepsize, and backtracking linesearch
  • c. Honest adjustments for non-convex functions
  • d. Scaling issues with Newton’s method
  • e. Newton and secant method as zero-finding algorithms
  • f. Quasi-Newton Methods
  • g. BFGS and L-BFGS algorithms
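The pure Newton step for one-dimensional minimization, the starting point of this unit, can be sketched as below. The test function is an illustrative assumption; safeguards such as stepsize control, backtracking linesearch, and non-convex adjustments, listed above, are omitted here.

```python
def newtons_method(dg, ddg, w, max_its=10):
    """Pure Newton's method for 1-D minimization (sketch):
    w <- w - g'(w) / g''(w), i.e. jump to the minimum of the local
    quadratic (second-order Taylor) approximation at each step."""
    for _ in range(max_its):
        w = w - dg(w) / ddg(w)
    return w

# Minimize g(w) = w^4 + w^2, with derivative 4w^3 + 2w and second
# derivative 12w^2 + 2; the unique minimum is at w = 0.
w_star = newtons_method(lambda w: 4.0 * w ** 3 + 2.0 * w,
                        lambda w: 12.0 * w ** 2 + 2.0,
                        1.0)
```

Quasi-Newton methods such as BFGS replace the exact second derivative (Hessian, in higher dimensions) with a cheap approximation built from successive gradients.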

6. Methods of constrained optimization

  • a. Projected / proximal gradient algorithms
  • b. Barrier and penalty methods
  • c. Interior point methods
  • d. Primal-dual methods
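The projected gradient idea from this unit can be sketched with a one-dimensional example. The objective, constraint, and step size below are illustrative assumptions: take an ordinary gradient step, then project the result back onto the feasible set.

```python
def projected_gradient(grad, project, w, alpha=0.1, max_its=100):
    """Projected gradient descent (sketch): gradient step followed by
    projection onto the feasible set."""
    for _ in range(max_its):
        w = project(w - alpha * grad(w))
    return w

# Minimize g(w) = (w - 3)^2 subject to w <= 1.  The unconstrained
# minimum (w = 3) is infeasible, so the iterates settle at the
# boundary w = 1; projection here is a simple clip.
project = lambda w: min(w, 1.0)
w_star = projected_gradient(lambda w: 2.0 * (w - 3.0), project, 0.0)
```

For more complex constraint sets the projection itself becomes an optimization problem, which motivates the barrier, penalty, and interior point methods listed above.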

COURSE HAND-OUTS: This course has no required textbook. Handouts authored by the instructors will serve as course notes and will be made freely available to students. In addition, a small number of relevant sections from the instructors' textbook Machine Learning Refined (Cambridge University Press) will also be made freely available to students.

PROBLEM SETS: 6-7 problem sets will be assigned and graded.

COURSE PROJECT: There will be one course project worth 25% of the final grade.

COURSE GRADE: Final grades for the course are based on homework (75%) and the course project (25%).