395/495 - Computational Auditory Scene Analysis

Quarter Offered

None


CATALOG DESCRIPTION: Computational auditory scene analysis (CASA) is the study of how a computational system can organize sound into perceptually meaningful elements. Problems in this field include source separation (splitting audio mixtures into individual sounds), source identification (labeling a source sound), and streaming (finding which sounds belong to a single explanation/event). This is an advanced graduate course covering current research in the field.

REQUIRED TEXTBOOK: Advanced research papers in the field.

REFERENCE TEXTBOOKS: (not required purchases) Excerpts from the following texts may be provided, but the focus will be on research papers published in the field.

DeLiang Wang and Guy J. Brown, Computational Auditory Scene Analysis: Principles, Algorithms, and Applications
Albert S. Bregman, Auditory Scene Analysis: The Perceptual Organization of Sound


COURSE GOALS: The goal of this course is to familiarize graduate students (and advanced undergraduates) with the current state of the art in machine perception of audio. Students will read recently published papers in the field and become well informed on at least one sub-field within machine perception of audio. The class will also explore the basics of audio perception, including the relationship between pitch and frequency and the difficulties inherent in auditory scene analysis by humans and machines. Basic classification and sequence alignment techniques will also be introduced.

PREREQUISITES: An understanding of signal processing, including topics such as Fourier transforms and filter design, is required. Knowledge of machine learning techniques (Markov models, support vector machines, etc.) is helpful but not required.


What follows is an example syllabus. As topics of current interest in the field shift, course content will vary to reflect research trends.

Week 1: Perception of periodic complex sounds, auditory filters, critical bands

Week 2: Representations of audio: spectrograms, cepstrograms, atomic representations

Week 3: Audio fingerprinting

Week 4: Pitch tracking

Week 5: Melody matching

Week 6: Source identification

Week 7: Source identification

Week 8: Source separation

Week 9: Source separation

Week 10: Streaming


GRADING:

Presentation on topic (30%)

Research paper synopses (30%)

Report on research area (30%)

Class participation (10%)

COURSE OBJECTIVES: When a student completes this course, s/he should:

  • have a general understanding of the current state of the art in machine perception of audio.

  • be able to distill large amounts of research into coherent summaries.

  • be able to think critically about work in the field.