MSiA 413: Introduction to Databases & Information Retrieval

Quarter Offered

Fall; Nikos Hardavellas


MSiA-413 teaches data engineering skills that are essential for “data science” practitioners, in particular how to model, organize, store, and analyze data in modern relational database management systems (e.g., MySQL, PostgreSQL) using SQL. Over the course of the quarter, students access and analyze real-world datasets (e.g., Yelp, Stack Overflow) with complex data modalities (e.g., GPS coordinates, UTF-8 text, integers, floating-point numbers). Where appropriate, we may delve into the inner workings of database systems and discuss how they are built and the algorithms they use, but this is not the main focus. MSiA-413 is all about data, not statistics, visualization, or programming (except SQL, which is taught at length). Familiarity with programming is, however, expected.

More specifically, the course objectives are to:

  1. Understand the representation details and operations of standard data formats (e.g., integer, floating point, fixed point, UTF-8, time, JSON, XML)
  2. Model complex data sets and their relationships to create effective relational databases
  3. Create and populate databases with real-world data using a series of SQL tables
  4. Access and analyze complex data in relational databases using SQL (selects, joins, sets, quantifiers, predication, views, recursive queries on networks)
  5. Devise and enforce data integrity rules in SQL (cascading, null values, triggers, exceptions, conflict resolution)
  6. Access data using on-line transactions
  7. Understand and effectively use modern system optimizations (indexing, partitioning, memory hierarchy)
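
As a rough illustration of what objectives 3 through 7 can look like in practice, the sketch below builds a tiny database, queries it recursively, and exercises an integrity rule inside a transaction. It is only a sketch under stated assumptions: the `users` and `follows` tables and their rows are hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for MySQL or PostgreSQL so that the example is self-contained.

```python
import sqlite3

# In-memory SQLite database; a stand-in for MySQL/PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: users and a "follows" network between them.
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    followee_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    PRIMARY KEY (follower_id, followee_id)
);
-- Index to speed up lookups by followee (the primary key covers follower).
CREATE INDEX idx_follows_followee ON follows(followee_id);
""")

# Populate the tables inside one transaction (committed on success).
with conn:
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "Ana"), (2, "Ben"), (3, "Chloe"), (4, "Dev")],
    )
    conn.executemany(
        "INSERT INTO follows (follower_id, followee_id) VALUES (?, ?)",
        [(1, 2), (2, 3), (3, 4)],
    )

REACH_SQL = """
WITH RECURSIVE reach(id) AS (
    SELECT followee_id FROM follows WHERE follower_id = ?
    UNION
    SELECT f.followee_id
    FROM follows AS f JOIN reach AS r ON f.follower_id = r.id
)
SELECT u.name FROM reach JOIN users AS u ON u.id = reach.id ORDER BY u.id
"""

def reachable_from(user_id):
    """Names of all users reachable from user_id by following edges."""
    return [row[0] for row in conn.execute(REACH_SQL, (user_id,))]

names_before = reachable_from(1)

# Integrity rule: a follow edge must reference existing users.
try:
    with conn:  # rolled back automatically if the statement fails
        conn.execute("INSERT INTO follows VALUES (?, ?)", (1, 99))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Cascading delete: removing a user also removes the edges that touch them.
with conn:
    conn.execute("DELETE FROM users WHERE id = ?", (3,))

remaining_edges = conn.execute("SELECT COUNT(*) FROM follows").fetchone()[0]
names_after = reachable_from(1)
```

The recursive common table expression walks the network one hop at a time, and `UNION` (rather than `UNION ALL`) discards duplicates so that cycles in the data would not loop forever. The same SQL should carry over to other relational systems with minor dialect changes, although how foreign keys and transactions are configured differs between them.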