COMP_SCI 217 : Data Management and Information Processing



PREREQUISITES: COMP_SCI 110, COMP_SCI 111, COMP_SCI 150, or COMP_SCI 211, or other programming experience


This course teaches students how to organize and analyze real-world data sets using the tools most commonly used in the business world. In particular, students will learn the SQL language for analyzing data in relational databases. Students will also learn the details of common data encodings (integer, floating point, fixed point, text, date, and time), how such data are structured in data files, and how to model complex data sets as a set of SQL tables. In other words, students will learn how to organize large data sets and how to answer questions using that data. The SQL skills taught in COMP_SCI 217 are essential for “data science” practitioners, especially when working with business data. COMP_SCI 217 is about data, not about statistics, visualization, or programming (other than SQL, which is taught in the course). Homework assignments use SQL database management systems, and some homework also requires basic Python programming.
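The data encodings listed above can be explored directly in Python, which some homework uses. The following is a minimal illustrative sketch, not taken from course materials: the specific values (the integer -2, the string "café", the date 2024-03-01) are arbitrary, and counting days from 1970-01-01 is only one common way to encode dates.

```python
import struct
from datetime import date
from decimal import Decimal

# Integer: a 32-bit signed integer packed into 4 bytes (two's complement, little-endian).
int_bytes = struct.pack("<i", -2)

# Floating point: binary floats cannot represent 0.1 exactly, so sums can be slightly off.
float_sum_exact = (0.1 + 0.2 == 0.3)

# Fixed point: store money as whole cents, or use Decimal for exact base-10 arithmetic.
cents_total = 10 + 20                              # $0.10 + $0.20 as integer cents
decimal_total = Decimal("0.10") + Decimal("0.20")

# Text: UTF-8 uses one byte per ASCII character and more for characters such as "é".
text_bytes = "café".encode("utf-8")

# Date: one common encoding is a count of days since an epoch (here 1970-01-01).
days_since_epoch = (date(2024, 3, 1) - date(1970, 1, 1)).days

print(int_bytes.hex(), float_sum_exact, cents_total, decimal_total,
      len(text_bytes), days_since_epoch)
# prints: feffffff False 30 0.30 5 19783
```

The floating-point comparison is the usual reason fixed-point or integer-cent columns are preferred for currency.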
COMP_SCI 217 differs from the COMP_SCI 339 and ELEC_ENG/COMP_ENG 495 “Introduction to Databases” courses that we offer to computer science students: COMP_SCI 217 does not teach the details of how database management systems are built. In other words, students in this class will learn how to use a database system, not how to build one from scratch.

  • Formerly COMP_SCI 317
  • NOTE: This course does not count for credit for CS and CE majors, who are expected to take COMP_SCI 339; it does count for other majors.

COURSE INSTRUCTOR: Prof. Joe Hummel (Spring) and Prof. Huiling Hu (Fall & Winter)

COURSE COORDINATOR: Prof. Hu and Prof. Hummel

COURSE TOPICS:

  • Structured Query Language (SQL)
    • Basic SQL statements
    • JOINs and aggregates
    • Subqueries and combining selects
    • Advanced queries
  • Data modeling for relational databases
    • Primary and foreign keys
    • Table relationships (many-to-one, many-to-many, and subsets)
    • Designing a database model
    • Representation of data: integers, floating-point numbers, text, etc.
  • Beyond SQL databases
    • Databases in a distributed setting
    • NoSQL
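Several of the topics above (primary and foreign keys, a many-to-many relationship, JOINs with aggregates, and subqueries) can be seen together in one small example. The sketch below is illustrative and assumes SQLite through Python's built-in sqlite3 module; the course's actual database systems may differ, and the students/courses/enrollments schema and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE courses (
    course_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
-- Junction table: each row links one student to one course (many-to-many).
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    course_id  INTEGER NOT NULL REFERENCES courses(course_id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO students VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
INSERT INTO courses VALUES (10, 'Databases'), (20, 'Statistics');
INSERT INTO enrollments VALUES (1, 10), (2, 10), (2, 20), (3, 10);
""")

# JOIN plus aggregate: number of students enrolled in each course.
enrollment_counts = conn.execute("""
SELECT c.title, COUNT(*) AS n
FROM courses AS c
JOIN enrollments AS e ON e.course_id = c.course_id
GROUP BY c.course_id, c.title
ORDER BY n DESC
""").fetchall()

# Subquery: students enrolled in more than one course.
multi_course_students = conn.execute("""
SELECT name FROM students
WHERE student_id IN (
    SELECT student_id FROM enrollments
    GROUP BY student_id
    HAVING COUNT(*) > 1
)
ORDER BY name
""").fetchall()

# Foreign key: enrolling a nonexistent student is rejected by the database.
fk_error = None
try:
    conn.execute("INSERT INTO enrollments VALUES (99, 10)")
except sqlite3.IntegrityError as err:
    fk_error = str(err)

print(enrollment_counts, multi_course_students, fk_error)
```

The composite primary key on the junction table also prevents the same student from being enrolled in the same course twice.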

GRADING:

  • Homework assignments (60%, 6 assignments)
  • Exam I (20%)
  • Exam II (20%)

COURSE OUTCOMES: After completing this course, a student should be able to:

  • Draw a data model diagram to represent a complex data set.
  • Choose appropriate data types to store various data.
  • Define data integrity constraints using primary, foreign, and unique keys.
  • Define indexes to optimize the performance of particular queries on a database.
  • Implement a data model with “CREATE TABLE” commands in the SQL language.
  • Load data into the database tables from CSV and other data file formats.
  • Write complex SQL SELECT queries to answer various questions using the database.
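Several of these outcomes (CREATE TABLE, choosing data types, defining an index, loading CSV data, and writing a SELECT query) fit in one short workflow. The following is a minimal sketch that assumes SQLite via Python's standard sqlite3 and csv modules; the orders table, the index name idx_orders_customer, and the CSV rows are hypothetical, and the CSV is held in a string so the example is self-contained.

```python
import csv
import io
import sqlite3

# Hypothetical CSV contents; in practice this would be read from a file on disk.
CSV_TEXT = """order_id,customer,amount_cents,order_date
1001,alice@example.com,1250,2024-03-01
1002,bob@example.com,399,2024-03-01
1003,alice@example.com,2000,2024-03-02
"""

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE orders (
    order_id     INTEGER PRIMARY KEY,
    customer     TEXT    NOT NULL,
    amount_cents INTEGER NOT NULL,  -- fixed point: money stored as whole cents
    order_date   TEXT    NOT NULL   -- ISO 8601 'YYYY-MM-DD' text
)
""")
# Index intended to speed up queries that filter or group by customer.
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer)")

# Load the CSV, converting numeric columns explicitly before inserting.
reader = csv.DictReader(io.StringIO(CSV_TEXT))
rows = [
    (int(r["order_id"]), r["customer"], int(r["amount_cents"]), r["order_date"])
    for r in reader
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Answer a question: how many orders, and how much spending, per customer?
totals = conn.execute("""
SELECT customer, COUNT(*) AS n_orders, SUM(amount_cents) AS total_cents
FROM orders
GROUP BY customer
ORDER BY total_cents DESC
""").fetchall()

print(totals)
```

Keeping amounts in integer cents avoids floating-point rounding in the SUM; converting to dollars can be left to the final presentation step.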