COMP_SCI 217 : Data Management and Information Processing

Quarter Offered

Fall : 9:40-11 TuTh ; Hu
Spring : 9:30-10:50 TuTh ; Hu


PREREQUISITES: COMP_SCI 110 or COMP_SCI 111 or COMP_SCI 211 or COMP_SCI 230 or other programming experience


This course teaches students how to organize and analyze real-world data sets using the tools most commonly used in the business world. In particular, students will learn the SQL language for analyzing data in relational databases. Students will also learn the details of common data encodings (integer, floating point, fixed point, text, date and time), how such data are structured in data files (CSV, JSON, XML), and how to model complex data sets as a series of SQL tables. In other words, students will learn how to organize large data sets and how to answer questions using that data. The SQL skills taught in COMP_SCI 217 are essential for “data science” practitioners, especially when working with business data. COMP_SCI 217 is all about data, but not really about statistics, visualization, or programming (except SQL, which will be taught).

COMP_SCI 217 differs from the COMP_SCI 339 and ELEC_ENG/COMP_ENG 495 “Introduction to Databases” courses offered to computer science students in that it does not teach the details of how database management systems are built. In other words, students in this class will learn how to use a system like MySQL, not how to build MySQL from scratch.

Homework assignments will use the SQLite and MySQL database management systems.
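To give a flavor of the kind of work described above, here is a minimal sketch that uses Python's built-in sqlite3 module to model a small data set as two related tables and answer a question with a LEFT JOIN. The table names (customer, purchase), the sample rows, and the function names are hypothetical illustrations, not actual course material.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite database with two related tables.

    All table names and rows here are made up for illustration.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        -- Many purchases can point at one customer (many-to-one).
        CREATE TABLE purchase (
            purchase_id  INTEGER PRIMARY KEY,
            customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
            amount_cents INTEGER NOT NULL
        );
        INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO purchase VALUES (1, 1, 1250), (2, 1, 800), (3, 2, 4000);
        """
    )
    return conn


def total_spent_per_customer(conn):
    """Return (name, total_cents) rows, including customers with no purchases."""
    return conn.execute(
        """
        SELECT c.name, COALESCE(SUM(p.amount_cents), 0) AS total_cents
        FROM customer AS c
        LEFT JOIN purchase AS p ON p.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        ORDER BY total_cents DESC, c.name
        """
    ).fetchall()


for name, total_cents in total_spent_per_customer(build_demo_db()):
    print(name, total_cents)
```

The LEFT JOIN keeps the customer with no purchases in the result; an INNER JOIN would drop that row.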

  • Formerly COMP_SCI 317
  • NOTE: This course does not count for credit for CS and CE majors (they are expected to take COMP_SCI 339) – however, it counts for other majors.

COURSE INSTRUCTOR: Prof. Hu (Fall & Spring)



  • Bits and bytes: how data is represented in computers
    • Integers, two’s complement, fixed point
    • Floating-point numbers
    • Date, time, and text encodings
  • Data modeling for relational databases
    • Primary and foreign keys
    • Table relationships (many-to-one, many-to-many, and subsets)
    • Functional dependencies and basic normalization ideas
  • Structured Query Language (SQL)
    • SELECT statements
    • INNER and LEFT JOINs
    • Subqueries and combining selects
    • Views
  • Advanced Topics
    • Indexes
    • Regular Expressions
    • Semi-structured data in JSON and XML files
    • Cleaning messy data
    • NoSQL & Big Data
    • Recursive queries on networks
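The “Bits and bytes” topics at the top of this list can be previewed with a few lines of Python. The sketch below is illustrative only and assumes nothing about how the course presents the material; the helper names (twos_complement, float32_bits) are hypothetical.

```python
import struct
from datetime import datetime, timezone


def twos_complement(value, bits=8):
    """Return the two's complement bit pattern of a signed integer."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} signed bits")
    # Masking maps negative values onto their unsigned bit pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def float32_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return format(raw, "032b")


# Integers: -5 in 8-bit two's complement.
print(twos_complement(-5))

# Floating point: binary fractions cannot represent 0.1 exactly.
print(0.1 + 0.2 == 0.3)

# Fixed point: store money as whole cents to keep arithmetic exact.
print(10 + 20 == 30)

# Text: one character may take more than one byte in UTF-8.
print("é".encode("utf-8"))

# Date and time: Unix timestamps count seconds since 1970-01-01 UTC.
print(datetime.fromtimestamp(0, tz=timezone.utc).isoformat())
```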


  • Homework assignments (6 × 6.67% ≈ 40%)
  • Midterm exam (25%)
  • Final exam (35%)

COURSE OUTCOMES: After completing this course, a student should be able to:

  • Draw a data model diagram to represent a complex data set.
  • Choose appropriate data types to store various data.
  • Define data integrity constraints using primary, foreign, and unique keys.
  • Define indexes to optimize the performance of particular queries on a database.
  • Implement a data model with “CREATE TABLE” commands in the SQL language.
  • Load data into the database tables from CSV and other data file formats.
  • Write complex SQL SELECT queries to answer various questions using the database.
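Several of these outcomes can be sketched together in a few lines. The example below, again using Python's built-in sqlite3 and csv modules, defines a table with key and integrity constraints, adds an index, loads rows from CSV text, and runs a SELECT. The product table, its columns, the sample CSV, and the function names are assumptions made for illustration and are not taken from the course.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; a real assignment would read a file instead.
CSV_TEXT = """sku,name,price_cents
A100,Widget,250
B200,Gadget,1200
C300,Gizmo,725
"""


def load_products(conn, csv_text):
    """Create the product table, index it, and load rows from CSV text."""
    conn.execute(
        """
        CREATE TABLE product (
            product_id  INTEGER PRIMARY KEY,
            sku         TEXT NOT NULL UNIQUE,
            name        TEXT NOT NULL,
            price_cents INTEGER NOT NULL CHECK (price_cents >= 0)
        )
        """
    )
    # An index on price can speed up queries that filter or sort by price.
    conn.execute("CREATE INDEX idx_product_price ON product(price_cents)")
    reader = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO product (sku, name, price_cents) VALUES (?, ?, ?)",
        # CSV fields arrive as text, so convert the price explicitly.
        [(row["sku"], row["name"], int(row["price_cents"])) for row in reader],
    )
    conn.commit()


def products_over(conn, min_cents):
    """Return (name, price_cents) for products priced above min_cents."""
    return conn.execute(
        "SELECT name, price_cents FROM product "
        "WHERE price_cents > ? ORDER BY price_cents",
        (min_cents,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_products(conn, CSV_TEXT)
print(products_over(conn, 500))
```

Because sku is declared UNIQUE, attempting to insert a second row with sku 'A100' raises sqlite3.IntegrityError, which is how the database enforces the integrity constraint.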