MSIT 423: Data Mining and Business Intelligence

Quarter Offered

Spring: Saturday, 2:00pm-5:00pm; Edward Malthouse, Ph.D.


In a rapidly changing business environment marked by global competition and maturing markets, competitive advantage is critical. Businesses can exploit the wealth of data collected through operational processes as well as from external sources. This course introduces data mining techniques and their use in business applications to enable business intelligence. It combines hands-on experience with state-of-the-art data mining tools, used to model business problems and discover patterns for decision support, and several cases that examine strategies, outcomes, and organizational impact when data mining is used.

REQUIRED TEXT: James, Witten, Hastie and Tibshirani, An Introduction to Statistical Learning, Springer, 2013.


COURSE GOALS: Students will understand and manage the entire process of using data to make better business decisions: extraction, cleaning, understanding, modeling, and presenting. Students will also understand the limitations of data. 


Week 1: Introduction to Predictive Analytics

  • Course introduction
  • Simple linear regression
  • Multiple linear regression, interpretation, and basic inference
  • Readings: JWHT, sections 3.1, 3.2, 3.6.1-3.6.3
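The course works in R, but the least-squares arithmetic behind simple linear regression is easy to sketch in any language. A minimal NumPy illustration with invented data (all numbers hypothetical):

```python
import numpy as np

# Hypothetical data: x could be ad spend, y sales (values are invented).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares estimates for the model y = b0 + b1*x + error
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print(b0, b1)  # intercept and slope
```

The same fit in the course's R environment would be `lm(y ~ x)`.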

Week 2: Model Accounting and Multicollinearity

  • Extra and partial sums of squares, R-squared
  • Newfood and Quality Control cases
  • Multicollinearity
  • Residual, QQ and influence plots
  • Readings: JWHT
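R-squared and the variance inflation factor, two quantities tied to this week's topics, fall directly out of residual sums of squares. A NumPy sketch on simulated data (the coefficients, seed, and near-collinear construction are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)     # nearly collinear with x1
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

# Multiple regression fit; R-squared = 1 - SSE/SST
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
sse = np.sum((y - X @ beta) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = 1 - sse / sst

# Variance inflation factor for x1: regress x1 on the other predictor
Z = np.column_stack([np.ones(n), x2])
g, *_ = np.linalg.lstsq(Z, x1, rcond=None)
r2_x1 = 1 - np.sum((x1 - Z @ g) ** 2) / np.sum((x1 - x1.mean()) ** 2)
vif_x1 = 1 / (1 - r2_x1)   # large values flag multicollinearity
```

Because x2 is built almost entirely from x1, the VIF comes out large, which is exactly the symptom multicollinearity diagnostics look for.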

Week 3: Diagnostics and Transformations

  • Transformations, the multiplicative model, polynomials
  • Business failure and purifier cases
  • Readings: JWHT
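The multiplicative model y = a * x^b becomes linear after taking logs: log y = log a + b log x. A NumPy sketch with noise-free invented data, so the log-scale fit recovers a and b exactly:

```python
import numpy as np

# Multiplicative model y = a * x^b; here a = 3 and b = 0.5 (invented).
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = 3.0 * x ** 0.5

# Ordinary least squares on the log-log scale
lx, ly = np.log(x), np.log(y)
b = np.sum((lx - lx.mean()) * (ly - ly.mean())) / np.sum((lx - lx.mean()) ** 2)
a = np.exp(ly.mean() - b * lx.mean())
print(a, b)  # recovers 3.0 and 0.5
```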

Week 4: Categorical Predictor Variables, Interactions and Logistic Regression

  • Dummy variables
  • Interactions
  • Logistic regression
  • Readings: JWHT sections 3.6.4, 3.6.6, 4.1-4.3 (skip discriminant analysis)
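Logistic regression has no closed-form solution; in practice R's `glm` maximizes the likelihood iteratively. A minimal gradient-ascent sketch on invented data (learning rate and iteration count are arbitrary choices for this toy example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Invented binary outcomes; the classes overlap so the MLE is finite
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])

# Gradient ascent on the log-likelihood; the gradient is X'(y - p)
beta = np.zeros(2)
for _ in range(5000):
    p = sigmoid(X @ beta)
    beta += 0.1 * X.T @ (y - p)

p_hat = sigmoid(X @ beta)   # fitted probabilities
```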

Week 5: Model Evaluation, Selection and Regularization

  • Confusion tables, ROC curves, AUC
  • Penalized measures of fit
  • Test sets and k-fold cross validation
  • Variable subset selection
  • Ridge regression and the lasso
  • Readings: JWHT 5.1, 5.3.1, 5.3.3; 6.1, 6.2, 6.5, 6.6
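Ridge regression has a closed form, b = (X'X + lam*I)^(-1) X'y, and k-fold cross-validation simply rotates which fold is held out. A compact sketch combining the two on simulated data (the penalty grid, seed, and true coefficients are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 60, 5
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=n)

def ridge(X, y, lam):
    """Closed-form ridge solution (X'X + lam*I)^(-1) X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_mse(lam, k=5):
    """Mean held-out squared error over k folds."""
    folds = np.array_split(np.arange(n), k)
    errs = []
    for f in folds:
        train = np.setdiff1d(np.arange(n), f)
        b = ridge(X[train], y[train], lam)
        errs.append(np.mean((y[f] - X[f] @ b) ** 2))
    return np.mean(errs)

lams = [0.01, 0.1, 1.0, 10.0, 100.0]
best = min(lams, key=cv_mse)   # penalty with the lowest CV error
```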

Week 6: Midterm and Smoothing

  • In-class midterm, 80 minutes, covers chapters 3 and 4 (not 5 and 6)
  • Bin smoothers, k-nearest neighbors
  • Step functions, piecewise linear models and cubic splines
  • Readings: JWHT sections 7.1-7.6
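A k-nearest-neighbors smoother predicts at a target point by averaging the k responses whose x-values lie closest. A toy one-dimensional sketch (data invented):

```python
import numpy as np

def knn_predict(x0, x, y, k=3):
    """Average the y-values of the k points nearest to x0."""
    idx = np.argsort(np.abs(x - x0))[:k]
    return y[idx].mean()

# Roughly linear invented data
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 1.2, 1.8, 3.1, 4.0, 4.9])

yhat = knn_predict(2.1, x, y, k=3)   # averages the 3 nearest responses
```

A bin smoother works the same way except that the averaging neighborhoods are fixed intervals rather than the k nearest points.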

Week 7: GAMs and Trees

  • Generalized additive models
  • CART
  • Readings: JWHT sections 7.7, 8.1, 8.3.1, 8.3.2
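CART grows a regression tree by repeatedly choosing the split that most reduces the residual sum of squares; a single such split is a stump. A brute-force sketch on invented data:

```python
import numpy as np

def best_split(x, y):
    """Find the split point minimizing total within-region squared error."""
    best_s, best_sse = None, np.inf
    for s in np.unique(x)[:-1]:          # candidate split points
        left, right = y[x <= s], y[x > s]
        sse = (np.sum((left - left.mean()) ** 2)
               + np.sum((right - right.mean()) ** 2))
        if sse < best_sse:
            best_s, best_sse = s, sse
    return best_s, best_sse

# Two invented groups with an obvious break between x = 3 and x = 10
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.8])
split, sse = best_split(x, y)
```

A full CART implementation would apply this search recursively to each resulting region and then prune.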

Week 8: Bagging, Random Forests, Principal Components

  • Bagging and random forests
  • Stumps, shrubs, and boosted trees as time permits
  • Principal component analysis
  • Readings: JWHT sections 8.2, 8.3.3, 10.1-10.2, 10.4
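Principal components are the right singular vectors of the centered data matrix. A NumPy sketch with two strongly correlated simulated variables (the seed and construction are invented), showing the variance-explained breakdown:

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(size=100)
# Two variables driven by the same underlying factor, so one PC dominates
X = np.column_stack([z, 2 * z + 0.1 * rng.normal(size=100)])

Xc = X - X.mean(axis=0)                    # center each column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                         # principal component scores
var_explained = s ** 2 / np.sum(s ** 2)    # proportion of variance per PC
```

Because the two columns are nearly redundant, the first component captures almost all of the variance.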

Week 9: Clustering and Recommendation Systems

  • K-means and hierarchical clustering
  • Distance metrics
  • Overview of recommendation systems: popularity, user-based, item-based, SVD as time permits
  • Readings: JWHT sections 10.3, 10.5; Ekstrand chapter on Canvas
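K-means alternates two steps: assign each point to its nearest centroid (a distance-metric choice), then move each centroid to the mean of its assigned points. A sketch on two simulated, well-separated clusters, with a deterministic initialization chosen so this toy example converges cleanly:

```python
import numpy as np

def kmeans(X, centers, iters=20):
    """Lloyd's algorithm: assign points to nearest center, then recenter."""
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0)
                            for j in range(len(centers))])
    return labels, centers

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.2, (20, 2)),    # cluster near (0, 0)
               rng.normal(5.0, 0.2, (20, 2))])   # cluster near (5, 5)

# Initialize with one point from each region (invented choice for the demo)
labels, centers = kmeans(X, centers=X[[0, -1]].copy())
```

Hierarchical clustering instead builds a dendrogram by repeatedly merging the closest pair of clusters, with no fixed k chosen up front.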

Week 10: Project Presentations

Week 11: Final Exam Due

HOMEWORK ASSIGNMENTS: There will be weekly recommended homework problems (with answers). You will have an in-class midterm and a take-home final that must be completed individually.


GRADING:

  • Homework: 20%
  • Midterm: 25%
  • Project: 20%
  • Final: 35% 

COURSE OBJECTIVES: As a result of this course, students will be able to:

1. Identify data-collection biases;
2. Design effective graphics presentations of data;
3. Estimate and interpret classical and data mining models using the R software package;
4. Draw conclusions about causal relationships and recommend actions that should be taken based on an analysis.

Faculty Profile

Edward Malthouse, Ph.D.