IEMS 462-2: Predictive Analytics II



Prerequisites

IEMS 401 and IEMS 462-1 or equivalent

Description

  • This course covers nonparametric modeling of complex, nonlinear predictive relationships between variables. Supervised learning methods covered include neural networks, trees, nearest neighbors, local kernel weighting, boosted trees, random forests, support vector machines, and naive Bayes. The emphasis is on practical implementation of predictive modeling, along with the theoretical concepts needed for a deeper understanding of the methods.

Learning Objectives

  • This course is the second part of a two-part sequence and builds on material covered in the first part, IEMS 462-1. The broad objective of the two-part sequence is to cover concepts and tools that enable students to skillfully build, interpret, and use predictive models with medium to large data sets.
  • Whereas IEMS 462-1 focuses on classical parametric models (primarily linear and logistic regression and some generalized linear models), this course teaches nonparametric models that can capture complex, nonlinear predictive relationships between variables.
  • The course is applied, with an emphasis on how to implement predictive modeling effectively, but students will also learn the theoretical concepts that help build an understanding of the methods.

Topics

  • Brief review of fundamental concepts in supervised learning, including maximum likelihood estimation (MLE), nonlinear least squares, bootstrapping, Fisher information, model selection and evaluation, shrinkage/regularization and the bias-variance tradeoff, ridge regression, LASSO, stepwise regression, the ideal Bayes classifier and predictor, and imbalanced response classes
  • Neural networks
  • Visualizing black-box predictive models
  • Classification and regression trees
  • Nearest neighbors
  • Local methods and kernel smoothing
  • Generalized additive models, projection pursuit regression, and basis functions
  • Ensemble methods: bagging, stacking, boosted trees, and random forests
  • Support vector machines
  • Naive Bayes
  • Unsupervised learning concepts in predictive modeling, including principal component analysis (PCA), nonlinear dimensionality reduction, and kernel density estimation

Materials

  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., by Trevor Hastie, Robert Tibshirani, and Jerome Friedman, Springer, 2009.
  • An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, Springer Texts in Statistics. ISBN 978-1-4614-7138-7.
  • Class Notes

Prerequisites

  • IEMS 401 and IEMS 462-1 or equivalent: an in-depth understanding of linear and logistic regression, including model fitting, maximum likelihood estimation, variable and model selection, multicollinearity, outliers, and ridge and LASSO regression. Familiarity with applied linear algebra and basic Bayesian statistics is also assumed, as is proficiency in programming with data analysis software (R or Python).