
Centering Sound Artists in Generative Music

Scholars of computer science, history, and law consider human-centered approaches to AI and the sound arts

Commercial digital music catalogs — from Apple iTunes to Amazon Music to Spotify to Bandcamp — have long used recommendation algorithms to steer consumers toward artists and genres similar to their browsing and purchase history.

Rapid developments in generative artificial intelligence (AI) are now impacting music production, with reverberations for composers, the music industry, and copyright law.

During a symposium on January 26, researchers and sound artists discussed the implications of generative models in music, outlined the critical and cultural response to generative music, examined the human role in the co-creativity process, and highlighted the design challenges for a human-centered approach to AI design that seeks to support – rather than supplant – the creative process.

The event was cosponsored by the Center for Human-Computer Interaction + Design (HCI+D) — a collaboration between Northwestern Engineering and Northwestern’s School of Communication — the Department of Radio, Television and Film in Northwestern's School of Communication, the Great Lakes Association for Sound Studies, Northwestern’s Bienen School of Music, and Northwestern’s Master of Arts in Sound Arts and Industries (SAI).

Jacob Smith, professor and associate chair of radio, television, and film in the School of Communication and director of the MA in SAI graduate program, welcomed the guests.

“I'm delighted that we've been able to convene such an impressive roster of thinkers and makers to consider the possibilities of human-centered approaches to AI,” Smith said. “Today, we’re going to get a truly multidisciplinary perspective from scholars, software developers, lawyers, historians, and cultural critics.”

Centering the musician

Interactive Audio Lab lead Bryan Pardo and doctoral students Julia Barnett and Hugo Flores García presented their perspectives around putting the musician at the center of generative music making.

Pardo is codirector of HCI+D and professor of computer science in the McCormick School of Engineering and of radio, television, and film in Northwestern’s School of Communication. He studies fundamental problems in computer audition, content-based audio search, and generative modeling in audio. He also develops inclusive interfaces for audio production.

Pardo compared the intentional process of a sample artist or pastiche artist, who knowingly imitates the style of another artifact, artist, or period, with music production via a generative model, which draws its outputs from a learned probability distribution.

“When Andy Warhol used a publicity shot or tabloid photograph, he knew where it came from and built off of it,” Pardo said. “With a generative model, I put in the prompt, and I get output, but I don't know if I'm borrowing from a particular individual's work. I might be stealing verbatim. I might be copying another musician’s style whole-cloth.”

Pardo and Barnett, a third-year Technology and Social Behavior PhD student, are building an interface that identifies data attribution in generative music models, using similarity measures to flag generated music in real time when it closely resembles a clip in the model's training data.

“Then people can examine the clips that are similar to their own piece of music and understand how novel they may or may not be. It might be an exact replica, it might be completely new,” Barnett said. “Then it's up to the user to decide: Is this too similar? Do I just want to learn about what these songs were? Should I rework this? Should I avoid monetizing this? Am I actually stealing from someone?”
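The flagging step Barnett describes can be sketched as an embedding-similarity search over the training set. The sketch below is a minimal illustration under stated assumptions: the embeddings, the threshold value, and the function names are all hypothetical, not the lab's actual implementation.

```python
# Minimal sketch of similarity-based attribution flagging. Assumes each
# audio clip has already been mapped to a fixed-length embedding vector
# (e.g., by a pretrained audio encoder); all names here are illustrative.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_similar_clips(generated_emb, training_embs, threshold=0.9):
    """Return (index, score) pairs for training clips whose similarity to
    the generated clip exceeds the threshold, most similar first."""
    scores = [(i, cosine_similarity(generated_emb, emb))
              for i, emb in enumerate(training_embs)]
    flagged = [(i, s) for i, s in scores if s >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# Toy usage: one training clip is nearly identical, one is unrelated.
gen = np.array([1.0, 0.0, 0.2])
train = [np.array([0.98, 0.05, 0.21]), np.array([0.0, 1.0, 0.0])]
print(flag_similar_clips(gen, train))  # flags only the near-duplicate clip
```

A user-facing tool would then surface the flagged clips so the musician can make the judgment calls Barnett lists: rework, avoid monetizing, or accept the similarity.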

Barnett’s research interests lie in algorithmic ethics and transparency, reducing the socio-technical harms of algorithmic systems, and deep generative applications in social contexts.

She discussed the unintended impacts and potential harms of generative music systems, including issues related to copyright and trademark infringement, cultural appropriation, the loss of agency and authorship, and creativity stifling.

Musician in the loop

Inspired by the anonymous maxim “writing about music is like dancing about architecture,” Pardo also illustrated the limitations of text-to-music AI applications.

“If I prompt ‘a really cool chill vibe with some keyboards,’ that generates output, but there are a million songs that might fit that description, and if you are an artist that works in the sound domain, the English language doesn't cover it — you can't specify it well enough to explain what you mean,” Pardo said. “This kind of text-based prompting is also not all that conducive to my feeling like I'm really part of the artistic process. I don't feel like I'm a musician in the loop — I feel like I'm more of an observer of something happening.”

A computer musician, composer, and audio improviser, Flores García is quickly frustrated by text-based generators like Google’s MusicFX, which can’t effectively capture abstractions like rhythms and pitch contours.

Flores García is a PhD student in computer science at Northwestern Engineering. He conducts research at the intersection of applied machine learning, music, and human-computer interaction. His research focuses on designing artist-centric, deep learning interfaces for the sound arts.

Flores García presented unloop, an open-source, co-creative musical interface that generates non-repeating variations of a recorded loop. Unloop is powered by VampNet, a masked acoustic token modeling approach to audio generation built by Flores García, Pardo, and two collaborators currently at Adobe Research — Prem Seetharaman (PhD ’19) and Rithesh Kumar.

Inspired by the live tape-looping and complex-layering techniques that artists like Brian Eno and Robert Fripp originated in the 1960s, Flores García built a neural looper.

“I take a sound from the real world and encode it into this token medium, then corrupt and destroy a part of these tokens, and regenerate the missing tokens with a neural network,” Flores García said. “I've found the resulting auditory process fascinating.”
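The corrupt-and-regenerate cycle Flores García describes can be illustrated with a toy token stream. In the sketch below, the "generator" merely samples random codebook entries as a stand-in for the trained masked acoustic token model VampNet actually uses; the codebook size, mask ratio, and function names are assumptions for illustration only.

```python
# Toy sketch of masked-token looping: encode a loop as discrete tokens,
# mask a random subset, and regenerate the masked positions. A stand-in
# sampler replaces the neural network used in the real system.
import random

CODEBOOK_SIZE = 1024  # illustrative codebook size
MASK = -1             # sentinel marking a corrupted token

def corrupt(tokens, mask_ratio=0.3, rng=random):
    """Replace a random fraction of tokens with the MASK sentinel."""
    out = list(tokens)
    for i in rng.sample(range(len(out)), int(len(out) * mask_ratio)):
        out[i] = MASK
    return out

def regenerate(tokens, rng=random):
    """Stand-in for the neural network: fill each masked position.
    (The real model would predict tokens conditioned on the context.)"""
    return [rng.randrange(CODEBOOK_SIZE) if t == MASK else t
            for t in tokens]

def unloop_step(tokens, mask_ratio=0.3, rng=random):
    """One pass of the loop: corrupt part of the token stream, then
    regenerate the missing tokens, yielding a non-repeating variation."""
    return regenerate(corrupt(tokens, mask_ratio, rng), rng)

# Each pass keeps most of the loop intact while varying the masked part.
loop = [random.randrange(CODEBOOK_SIZE) for _ in range(16)]
variation = unloop_step(loop)
```

Repeating `unloop_step` on each cycle is what keeps the loop from ever repeating exactly, echoing the tape-loop layering the interface is named for.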

Flores García experiments with the model using his own recorded music, adding reference sounds to create new audio amalgamations.

“I like to think of neural tape loops as having a kind of memory of the data that you use to fine-tune it,” Flores García said. “I decided to fine-tune on music that evokes memories of my own.”

Appropriating open access music archives

Elena Razlogova, associate professor of history at Concordia University and author of The Listener's Voice: Early Radio and the American Public (University of Pennsylvania Press, 2011), presented background research from her forthcoming book on the rise of online music and WFMU, a community radio station in Jersey City, New Jersey.

During her session, titled “Freeform to Muzak: The Aesthetic Aspects of Appropriating Open Access Music Archives for Text-to-Music AI,” Razlogova used the case of the now-defunct Free Music Archive and Google’s text-to-music MusicLM app to historicize the aesthetic dimensions of a longstanding pattern of appropriation: developers of early twenty-first-century music apps like Spotify and Shazam trained their recommendation and recognition engines on pirate music databases and music collections.

During the afternoon session, Peter C. DiCola, professor of law at Northwestern Pritzker School of Law, continued the discussion around the copyright and intellectual property implications of generative AI.

To conclude the event, a panel of multidisciplinary scholars provided cultural and social perspectives on human-centered AI and human-AI interaction.

The panelists included:

  • Patrick Feaster, sound media historian
  • Benjamin Lindquist, Mellon Postdoctoral Fellow in the Department of History’s Science in Human Culture program at Northwestern’s Weinberg College of Arts and Sciences
  • Duri Long, assistant professor of communication studies at Northwestern’s School of Communication and (by courtesy) of computer science at Northwestern Engineering