MSAI 352: Machine Perception of Music & Audio

Quarter Offered

None : TBD ; TBD


Machine extraction of musical structure in audio, MIDI and score files, covering areas such as source separation and perceptual mapping of audio to machine-quantifiable measures.

  • Approved for the Breadth Interfaces & project requirement in the CS curriculum.

REQUIRED TEXTBOOK: Signals, Sound, and Sensation by William M. Hartmann.

REFERENCE TEXTBOOKS (not required purchases): Excerpts from the following texts may be provided, along with current research papers in the field of music information retrieval (MIR).

Bregman, Albert S., Auditory Scene Analysis: The Perceptual Organization of Sound
Moore, F. Richard, Elements of Computer Music
Rabiner, L.R. and Schafer, R.W., Digital Processing of Speech Signals
Yost, William A., Fundamentals of Hearing: An Introduction


COURSE GOALS: How do you tell the sound of a clarinet from the sound of a kazoo? Is this song a waltz or a tango? If your friend likes Yo La Tengo, would she prefer a CD by the Flaming Lips or Bon Jovi? Can a computer answer these questions?

Researchers in computational music perception apply signal processing, psychology, music theory, machine learning, and natural language processing techniques to auditory user interfaces for human-computer interaction. Current application areas include vocal interfaces and search engines for music databases, machine accompaniment of human musicians, automated music recommendation systems, and tools for music production.

Machine Perception of Music will introduce students to the field of computational music perception through a combination of lectures, readings, and lab work in MATLAB. Students will learn basics of how sound and music are recorded and encoded by computers as .wav and MIDI files. The class will also explore basics of audio perception, including the relationship between pitch and frequency and the difficulties inherent in auditory scene analysis by humans and machines. Basic classification and sequence alignment techniques will also be introduced.
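As a small illustration of the pitch-frequency relationship mentioned above, the sketch below converts between MIDI note numbers and frequencies in Hz. It assumes twelve-tone equal temperament with A4 (MIDI note 69) tuned to 440 Hz, which is a common convention but not something this course description specifies. The function names are hypothetical, and the example is written in Python for illustration even though course labs use MATLAB.

```python
import math

# Assumed reference tuning: A4 = 440 Hz, which is MIDI note number 69.
A4_FREQ_HZ = 440.0
A4_MIDI_NOTE = 69


def midi_to_hz(note):
    """Frequency in Hz of a MIDI note number under equal temperament."""
    # Each semitone multiplies frequency by 2^(1/12); 12 semitones double it.
    return A4_FREQ_HZ * 2.0 ** ((note - A4_MIDI_NOTE) / 12.0)


def hz_to_midi(freq_hz):
    """Fractional MIDI note number for a frequency in Hz."""
    # Pitch is logarithmic in frequency: one octave per doubling.
    return A4_MIDI_NOTE + 12.0 * math.log2(freq_hz / A4_FREQ_HZ)


print(midi_to_hz(69))             # A4
print(midi_to_hz(81))             # A5, one octave above A4
print(round(hz_to_midi(261.63)))  # nearest MIDI note to about 261.63 Hz
```

Under these assumptions, equal steps in pitch correspond to equal ratios in frequency, so moving up one octave (12 MIDI notes) doubles the frequency.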

PREREQUISITES: Prior programming experience sufficient to complete laboratory assignments in MATLAB is required.

An ability and willingness to:

1) … learn about things outside your own area of expertise.
2) … come up with something cool and MAKE it in software.
3) … think scientifically about the arts. (Yes, it IS possible.)

DETAILED COURSE TOPICS: What follows is an example syllabus. As topics of current interest in machine perception of music shift, course content will vary to reflect research trends.

Week 1: Pure tones, power, intensity and loudness
Week 2: Perception of periodic complex sounds, auditory filters, critical bands
Week 3: Musical measures of frequency, pitch perception, MP3 encoding
Week 4: Representations of audio: spectrograms, cepstrograms
Week 5: Music theory (chords, scales, time signatures) and MIDI
Week 6: Beat tracking, finding rhythms
Week 7: Pitch tracking, melody transcription
Week 8: Source separation and scene analysis
Week 9: Audio fingerprinting, query by humming, melody matching
Week 10: Music recommendation systems, current research in MIR

GRADING:

Three written assignments (30%)
Three laboratory projects (30%)
Midterm (20%)
Final project (20%)

COURSE OBJECTIVES: When a student completes this course, s/he should:

  • have a basic understanding of how audio is encoded by computers
  • understand the basics of human perception of sound
  • be able to create tools to find salient structures in music audio
  • be able to understand current research in the music information retrieval community
  • be able to think critically about arts and technology