IEMS 455: Machine Learning

Quarter Offered

Spring: MW, 4-5:20 PM; Nocedal


Strong foundations in statistics and probability (at the undergraduate level), multivariate calculus, and linear algebra, plus good knowledge of a programming language (C/C++, Python, etc.). Students should be interested in working with large data sets, experimenting with machine learning models, and learning and implementing modern optimization methods. The course is recommended for graduate students (or seniors with strong math and computing backgrounds).


The course provides a survey of large-scale machine learning with emphasis on neural networks and kernel methods. It covers model formulation, large-scale applications, and training (optimization). Case studies include text classification, image and speech recognition, and recommender systems. On the practical side, students will be asked to construct deep neural networks (using Theano) and apply them to large data sets. On the theoretical side, the course will provide an understanding of the optimization process as well as a brief introduction to learning theory.

  • This is not a required class. The course is recommended for graduate students (or seniors with strong math and computing backgrounds).


  • Students will learn optimization methods used in machine learning
  • Students will be able to apply regression, classification, recommender systems, and deep learning
  • Students will learn how to construct neural networks in Theano
  • Students will learn basic learning theory
  • Students will work on a project of their choice and explore that domain in greater depth


  • Introduction to Machine Learning
    • Case Studies of Supervised Learning
    • Text Classification
    • Deep Neural Networks in Speech and Image Recognition
    • Risk Minimization, training-validation-testing, structure
  • Brief review of linear regression
    • Normal equations vs. the stochastic gradient method
  • Logistic Regression
    • Model and Loss function
    • Decision boundary
    • Stochastic and Batch optimization
    • Multi-class classification
    • Practicalities
  • Neural Network – Basics
    • Motivation
    • Feed forward Neural networks
    • Examples
    • Back-propagation (algorithmic differentiation)
    • Stochastic gradient method
    • Multi-class classification
  • Deep Neural Networks
    • Modular construction
    • Convolutional networks
    • Recurrent networks
    • Practicalities
    • Theano tutorial
  • Optimization methods
    • Batch vs. Stochastic
    • Understanding the stochastic gradient method
    • Mini-batches, iterate averaging, momentum
    • Noise reducing methods
    • Gauss-Newton, Newton, quasi-Newton
  • Generative Learning Algorithms
    • Gaussian discriminant analysis
    • Relationship to logistic regression
    • Naïve Bayes
  • Support Vector Machines
    • Maximum margin classifier
    • Primal and Dual formulations
    • Kernel methods
    • Constructing Kernels
    • Text classification: practical issues
  • Sparsity Inducing Models
    • L1 regularization
    • Optimization methods for L1 regularized models
  • Recommender Systems
    • Content based and collaborative filtering
    • Low rank matrix factorization
    • Optimization: coordinate descent
  • Model and Feature Selection
  • Learning Theory
    • Bias-variance trade-off
    • VC dimension
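
To give a flavor of the topic "Normal equations vs. the stochastic gradient method" listed in the linear regression review, the following is a minimal illustrative sketch in plain Python. It is not course material: the synthetic one-dimensional data, the helper names (`make_data`, `normal_equations`, `sgd`), the step size, and the epoch count are all assumptions chosen for the example. The normal equations give the least-squares fit in closed form, while the stochastic gradient method approaches the same fit by updating on one sample at a time.

```python
import random


def make_data(n, slope=2.0, intercept=-1.0, noise=0.1, seed=0):
    """Synthetic 1-D data (an assumption): y = slope * x + intercept + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def normal_equations(xs, ys):
    """Solve the 2x2 system (X^T X) w = X^T y directly, with X = [x, 1]."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / det
    intercept = (sxx * sy - sx * sxy) / det
    return slope, intercept


def sgd(xs, ys, epochs=200, step=0.05, seed=1):
    """Stochastic gradient method on the loss 0.5 * (prediction - y)^2."""
    rng = random.Random(seed)
    slope, intercept = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order
        for i in order:
            residual = slope * xs[i] + intercept - ys[i]
            slope -= step * residual * xs[i]   # gradient w.r.t. slope
            intercept -= step * residual       # gradient w.r.t. intercept
    return slope, intercept


xs, ys = make_data(200)
exact = normal_equations(xs, ys)
approx = sgd(xs, ys)
print("normal equations:   ", exact)
print("stochastic gradient:", approx)
```

With a constant step size the stochastic gradient iterates do not settle exactly on the least-squares solution but hover in a small neighborhood of it; the closed-form solve is exact but becomes expensive as the number of features grows, which is one reason large-scale training tends to rely on stochastic methods.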


  • Required: (none)
  • Recommended:
    • Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006, ISBN: 978-0387310732
    • Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning (available as an online book), ISBN: 978-0262035613