EECS 317: Data Management and Information Processing

Quarter Offered

Fall: 1-1:50 MWF; Tarzia
Spring: 2-3:20 TuTh; Hardavellas

Prerequisites

Enrollment Requirements: Students must have completed EECS 211, EECS 230, or EECS 111 to enroll in this course. Graduate students who would like to enroll should email the professor for a permission number.

Description

This course will teach students how to organize and analyze real-world data sets using the tools most commonly used in the business world. In particular, students will learn the SQL language for analyzing data in relational databases. Students will also learn the details of common data encodings (integer, floating point, fixed point, text, date and time), how such data are structured in data files (CSV, JSON, XML), and how to model complex data sets as a series of SQL tables. In other words, students will learn how to organize large data sets and how to answer questions using that data. The SQL skills taught in EECS 317 are essential for “data science” practitioners, especially when working with business data. EECS 317 is all about data, but not really about statistics, visualization, or programming (except SQL, which will be taught).

EECS 317 is different from the EECS 339 and EECS 495 “Introduction to Databases” courses that we offer to computer science students in that EECS 317 does not teach the details of how database management systems are built. In other words, the students in this class will learn how to use a system like MySQL, not how to build MySQL from scratch.

Homework assignments will use the SQLite and MySQL database management systems.

  • NOTE: This course does not count for credit for CS and CE majors, who are expected to take EECS 339. It does count for other majors.

COURSE INSTRUCTOR: Dr. Stephen Tarzia (Fall), Prof. Nikos Hardavellas (Spring)

COURSE OUTLINE:

  • Bits and bytes: how data is represented in computers
    • Integers, two’s complement, fixed point
    • Floating-point numbers
    • Date, time, and text encodings
  • Data modeling for relational databases
    • Primary and foreign keys
    • Table relationships (many-to-one, many-to-many, and subsets)
    • Functional dependencies and basic normalization ideas
  • Structured Query Language (SQL)
    • SELECT statements
    • INNER and LEFT JOINs
    • Subqueries and combining selects
    • Views
  • Advanced Topics
    • Indexes
    • Regular Expressions
    • Semi-structured data in JSON and XML files
    • Cleaning messy data
    • NoSQL & Big Data
    • Recursive queries on networks
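
As an informal illustration of the "Bits and bytes" unit in the outline above, the following Python sketch prints the bit patterns behind a few common encodings. It is not course material: the helper names `twos_complement_bits` and `float32_bits` are made up for this example, and only the Python standard library is assumed.

```python
import struct
from datetime import datetime, timezone


def twos_complement_bits(value: int, width: int = 8) -> str:
    """Return the two's-complement bit pattern of `value` in `width` bits."""
    lowest, highest = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lowest <= value <= highest:
        raise ValueError(f"{value} does not fit in {width} bits")
    # Masking with 2**width - 1 maps negative values onto their unsigned pattern.
    return format(value & ((1 << width) - 1), f"0{width}b")


def float32_bits(value: float) -> str:
    """Return the IEEE 754 single-precision bit pattern of `value`."""
    (as_int,) = struct.unpack(">I", struct.pack(">f", value))
    return format(as_int, "032b")


print(twos_complement_bits(5))    # 00000101
print(twos_complement_bits(-5))   # 11111011
print(float32_bits(1.0))          # 00111111100000000000000000000000

# Text encoding: one character may occupy more than one byte in UTF-8.
print("é".encode("utf-8"))        # b'\xc3\xa9'

# Date/time encoding: a Unix timestamp counts seconds since 1970-01-01 UTC.
print(datetime(1970, 1, 2, tzinfo=timezone.utc).timestamp())  # 86400.0

# Binary floating point cannot represent 0.1 exactly, which is one reason
# money is often stored as fixed-point integers (for example, cents).
print(0.1 + 0.2 == 0.3)           # False
print(10 + 20 == 30)              # True (amounts held as integer cents)
```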

REQUIRED BOOKS:

GRADING:

  • Homework assignments (6 × 6.67% ≈ 40%)
  • Midterm exam (25%)
  • Final exam (35%)

COURSE OUTCOMES: After completing this course, a student should be able to:

  • Draw a data model diagram to represent a complex data set.
  • Choose appropriate data types to store various data.
  • Define data integrity constraints using primary, foreign, and unique keys.
  • Define indexes to optimize the performance of particular queries on a database.
  • Implement a data model with “CREATE TABLE” commands in the SQL language.
  • Load data into the database tables from CSV and other data file formats.
  • Write complex SQL SELECT queries to answer various questions using the database.
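
To make the outcomes above concrete, here is a minimal sketch of that workflow using SQLite through Python's standard `sqlite3` module. The `department` and `employee` tables, the index name, and the inline CSV text are all hypothetical and chosen only for illustration; actual assignments may be structured quite differently.

```python
import csv
import io
import sqlite3

# Hypothetical CSV data; in practice this would be read from files on disk.
DEPARTMENTS_CSV = "id,name\n1,Sales\n2,Engineering\n3,Legal\n"
EMPLOYEES_CSV = "id,name,department_id\n1,Ada,2\n2,Grace,2\n3,Linus,1\n"

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Implement the data model: primary, unique, and foreign keys, plus an index.
conn.executescript("""
CREATE TABLE department (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INTEGER NOT NULL REFERENCES department(id)
);
CREATE INDEX employee_department_idx ON employee(department_id);
""")

# Load the CSV rows, converting numeric fields from text to integers.
for row in csv.DictReader(io.StringIO(DEPARTMENTS_CSV)):
    conn.execute(
        "INSERT INTO department (id, name) VALUES (?, ?)",
        (int(row["id"]), row["name"]),
    )
for row in csv.DictReader(io.StringIO(EMPLOYEES_CSV)):
    conn.execute(
        "INSERT INTO employee (id, name, department_id) VALUES (?, ?, ?)",
        (int(row["id"]), row["name"], int(row["department_id"])),
    )
conn.commit()

# Answer a question: how many employees does each department have?
# The LEFT JOIN keeps departments that have no employees at all.
result = conn.execute("""
    SELECT d.name, COUNT(e.id) AS headcount
    FROM department AS d
    LEFT JOIN employee AS e ON e.department_id = d.id
    GROUP BY d.id, d.name
    ORDER BY d.name
""").fetchall()

print(result)  # [('Engineering', 2), ('Legal', 0), ('Sales', 1)]
```

Swapping the LEFT JOIN for an INNER JOIN would drop the Legal row, since no employee references that department.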