MSIA 430: Introduction to Data Management for Business Intelligence

Quarter Offered

Spring; Goce Trajcevski


This course is a continuation of MSiA 490 (Introduction to Data Management) and focuses on managing a subset of the five V’s[1] of big data, in the context of improving decision-making capabilities in business intelligence. Specifically, the course will emphasize Volume and Velocity (and, in an implicit manner, the other three V’s).

The course will consist of “2.5” major units:

The first part will focus on the notion of Data Warehousing (DW). Here, the main “V” will be Volume – we will see how large amounts of “raw” (i.e., transactional) data can be used to generate exponentially larger aggregated data sets that do have meaning/value in decision making and planning. After the introductory part and the definition of the different “views”, the course will first address the conceptual and logical design of data warehouses (e.g., star schemata) and will introduce the basic operators (e.g., ROLLUP, DRILLDOWN, etc.). The next stage will address querying DWs – both with extensions to SQL and via the DW-centric query language MDX (MultiDimensional eXpressions). This part of the course will be wrapped up with a discussion of topics such as ETL (Extract-Transform-Load); different data types in DWs (e.g., spatial, spatio-temporal); KPIs (Key Performance Indicators); etc. The expectation is to have 2-3 projects in this part. Implicitly, we will also deal with the Variety and Value aspects of big data.
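To give a concrete flavor of the roll-up idea, here is a minimal sketch in Python, assuming a tiny hypothetical fact table (the regions, quarters, and sales figures are illustrative only, not course data); it mimics what SQL’s GROUP BY ROLLUP would compute over a (region, quarter) hierarchy:

```python
from collections import defaultdict

# Hypothetical fact table: (region, quarter, sales) rows.
facts = [
    ("East", "Q1", 100), ("East", "Q2", 150),
    ("West", "Q1", 200), ("West", "Q2", 250),
]

def rollup(rows):
    """Aggregate at successively coarser levels, as SQL's ROLLUP would:
    (region, quarter) -> (region) -> grand total."""
    by_region_quarter = defaultdict(int)
    by_region = defaultdict(int)
    total = 0
    for region, quarter, sales in rows:
        by_region_quarter[(region, quarter)] += sales
        by_region[region] += sales
        total += sales
    return by_region_quarter, by_region, total

rq, r, total = rollup(facts)
print(r["East"])   # 250 -- rolled up over quarters
print(total)       # 700 -- grand total
```

Note how four base rows already yield seven aggregate facts; over many dimensions this blow-up is the “exponentially larger” data the cube materializes.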

The second part of the course will focus on Streaming Data Management (SDM) and Complex Event Processing (CEP). Here, we will consider settings in which data arrives so fast that it cannot first be stored in memory and then processed. In the SDM portion, we will consider issues related to probabilistic guarantees on the quality of answers to different categories of queries, and the trade-offs between the accuracy of approximate answers and the degree of data compression. Naturally, this will place the strongest emphasis on the Velocity aspect, with strong ties to Veracity. Complementary to this, the CEP portion will deal with the specification and processing of “significant events” from heterogeneous sources in real time (Velocity, with ties to Variety/Value). This part is expected to have 1-2 projects.
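As a taste of the approximation/compression trade-off, the following is a minimal Python sketch of the Count-Min sketch, a classic streaming summary that answers frequency queries in fixed memory with a probabilistic overcount bound; the class interface and the width/depth parameters here are illustrative choices, not prescribed course material:

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counts for a stream in fixed memory.
    Estimates never undercount; overcounting from hash collisions is
    bounded probabilistically by the width/depth parameters."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _indexes(self, item):
        # One independent-ish hash per row, derived from md5.
        for row in range(self.depth):
            h = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
            yield row, int(h, 16) % self.width

    def add(self, item, count=1):
        for row, col in self._indexes(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Taking the minimum over rows limits the collision overcount.
        return min(self.table[row][col] for row, col in self._indexes(item))

cms = CountMinSketch()
for word in ["click", "click", "view", "click"]:
    cms.add(word)
print(cms.estimate("click"))  # at least 3 (exactly 3 unless collisions)
```

The memory footprint is width x depth counters regardless of stream length, which is precisely the kind of guaranteed-but-approximate answer the SDM portion will study.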

Lastly, the “.5” part of the course will expose students to different currently-hot research issues related to the topics addressed in this course. A few examples: DW vs. Hadoop; Crowdsourcing and Data Compression; Sentiment Analysis and Linguistics; Health-care Analytics; etc. Here, the main source will be “fresh” research articles from conferences, journals, and workshops, and students will be expected to read and analyze the papers.

[1] Volume, Velocity, Variety, Veracity, Value.