# IEMS 304: Statistical Learning for Data Analysis

### Quarter Offered

Fall: MWF 1:00-1:50 PM (Lab: F 11:00 AM, 12:00 PM); Apley
Winter: MWF 1:00-1:50 PM (Lab: F 11:00 AM, 12:00 PM); Z. Wang
Spring: MWF 12:00-12:50 PM (Lab: F 10:00 AM, 11:00 AM); Z. Wang

### Prerequisites

IEMS 303 or equivalent; CS 150 or equivalent

### Description

Predictive modeling of data using modern regression and classification methods. Multiple linear regression; logistic regression; pitfalls and diagnostics; nonparametric and nonlinear regression and classification such as trees, nearest neighbors, neural networks, and ensemble methods.

• This course counts as an IE/OR elective for Industrial Engineering.

LEARNING OBJECTIVES

• Understand common data structures in modern predictive and explanatory modeling problems in business, engineering, and the sciences, and how to formulate the most appropriate solutions
• Learn R statistical software basics and how to use it for regression and classification problems
• Develop the ability to fit appropriate linear and logistic regression models, including model selection and diagnostics
• Develop the ability to interpret fitted linear and logistic regression models for explanatory and predictive purposes
• Learn fundamental concepts in nonlinear regression and classification, including maximum likelihood estimation, cross-validation, and ridge and lasso shrinkage
• Learn how to fit and interpret popular supervised learning models including trees, smoothers, nearest neighbors, random forests, and boosted trees

TOPICS

• Multiple linear regression basics: model fitting, statistical inference, prediction
• Multiple linear regression: influence, residual diagnostics, multicollinearity, interactions, categorical predictors, variable selection, model evaluation criteria, ridge and lasso regression
• Logistic regression: model fitting and interpretation, statistical inference, diagnostics
• Nonlinear regression basics: maximum likelihood estimation, nonlinear least squares, cross-validation, bootstrapping
• Classification and regression trees
• Nearest neighbors for classification and regression
• Boosted trees and random forests
• R statistical software throughout the course

MATERIALS

Required: An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, Springer. ISBN 978-1-4614-7138-7. An electronic version is available for free.