MSiA 421: Data Mining

Quarter Offered

Winter; Edward C. Malthouse


This course is part of a three-course sequence on statistical learning models, which also includes MSiA 401 (Statistical Methods for Data Mining) and MSiA 420 (Predictive Analytics). This course will define “data mining” and discuss its relationship with “probabilistic/statistical models.” Both approaches comprise two types of models: supervised learning models, where the objective is to predict a dependent variable, and unsupervised learning models, where the objective is to uncover and model structure in the joint density of multiple observed variables. The focus of this course will be on understanding and using unsupervised learning methods and applying them to large, real-world datasets from a company. The class usually applies the methods to a project involving segmenting customers, personalizing contact points, and quantifying the long-term effects of these actions.

Topics include: clustering (k-means, partitioning, mixture models), dimension reduction (principal components/factor analysis), recommender systems (association rules, content-based and collaborative filtering, matrix decomposition methods), and customer lifetime value.