IEMS 304: Statistical Learning for Data Analysis

Quarter Offered

Fall: MWF 1:00-1:50 (Lab: TH 11:00, 12:00); Apley
Winter: MWF 1:00-1:50 (Lab: W 3:00, 4:00); Nelson
Spring: TTH 12:30-1:50 (Lab: F 12:00, 1:00); Malthouse


Prerequisites

IEMS 303 and EECS 111, or equivalents


Description

Predictive modeling of data using modern regression and classification methods. Topics include multiple linear regression; logistic regression; pitfalls and diagnostics; and nonparametric and nonlinear regression and classification methods such as trees, nearest neighbors, neural networks, and ensemble methods.

  • This course counts as an IE/OR elective for Industrial Engineering.


Learning Objectives

  • Understand common data structures in modern predictive and explanatory modeling problems in business, engineering, and the sciences, and learn how to formulate appropriate solutions
  • Learn the basics of the R statistical software and how to use it for regression and classification problems
  • Develop the ability to fit appropriate linear and logistic regression models, including model selection and diagnostics
  • Develop the ability to interpret fitted linear and logistic regression models for explanatory and predictive purposes
  • Learn fundamental concepts in nonlinear regression and classification, including maximum likelihood estimation, cross-validation, and ridge and lasso shrinkage
  • Learn how to fit and interpret popular supervised learning models including trees, smoothers, nearest neighbors, random forests, and boosted trees


Topics

  • Multiple linear regression basics: model fitting, statistical inference, prediction
  • Multiple linear regression: influence, residual diagnostics, multicollinearity, interactions, categorical predictors, variable selection, model evaluation criteria, ridge and lasso regression
  • Logistic regression: model fitting and interpretation, statistical inference, diagnostics
  • Nonlinear regression basics: maximum likelihood estimation, nonlinear least squares, cross-validation, bootstrapping
  • Classification and regression trees
  • Nearest neighbors for classification and regression
  • Boosted trees and random forests
  • R statistical software, used throughout the course


Materials

Required: An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, Springer. ISBN 978-1-4614-7138-7. A free electronic version is available.