COMP_SCI 396: Introduction to the Data Science Pipeline

Quarter Offered

Winter: 9:30-10:50 TuTh; Hu
Summer: 10:00-11:40 TuTh, eight-week format; Hu


CS 110 or CS 150 or CS 211, or graduate standing, or instructor consent


This course covers tools for each stage of the data science process: obtaining, cleaning, visualizing, modeling, and interpreting data. Most of the tools introduced are Python-based, although the underlying ideas apply to similar tools in other programming languages. By the end of the course, students should be able to work independently on large-scale, real-world datasets and draw insights from them.

PREREQUISITES: CS 110 or CS 150, or graduate standing, or instructor consent



Related Materials

    1. “Python Data Science Handbook: Essential Tools for Working with Data” by Jake VanderPlas
    2. “Learning Data Mining with Python” by Robert Layton


Grades will be assigned according to the distribution below. Letter grades follow the default percentage-to-letter-grade mapping on Canvas. In the course project, students work through the whole data science pipeline using real data.

  • Homework assignments (30%)
  • Midterm exam (20%)
  • Course Project (50%)

Course Outline

  • Introduction to the Data Science Pipeline (1 lecture)
  • Obtaining Data (1 lecture)
  • Data management (2 lectures)
    • Relational databases
    • Scrubbing/Cleaning data
  • Exploratory Data Analysis (5 lectures)
    • Overview
    • Dimensionality reduction
    • Statistical and hypothesis testing
    • Data visualization
    • In-class demonstration
  • Midterm Exam (1 lecture)
  • Modeling Data with Machine Learning (5 lectures)
    • Overview
    • Basic concepts of applied text mining
    • Applied text mining using NLTK
    • Basic concepts of network analysis
    • Large-scale network analysis using NetworkX
  • Interpreting Data and Storytelling (2 lectures)
    • Data visualization
    • Data Storytelling
  • Project Presentation (1 lecture)
  • Course Review (1 lecture)
  • Final Exam