Building Tools to Enhance Creative Expression

Hugo Flores García and the Interactive Audio Lab team won the best paper award at the 2021 International Society for Music Information Retrieval Conference.

Jazz guitarist and audiovisual artist Hugo Flores García is a PhD student in computer science and member of the Interactive Audio Lab in Northwestern Engineering.

Hugo Flores García (Photo by: Camilla Forte)

Flores García works at the intersection of machine learning, signal processing, and human-computer interaction. His research interests include sound event detection, audio source separation, and designing interfaces for inclusive music creation.

ISMIR 2021 Best Paper Award

The Interactive Audio Lab team won the Best Paper Award at the International Society for Music Information Retrieval (ISMIR) Conference, held virtually November 7-12, 2021.

Flores García is first author of the publication “Leveraging Hierarchical Structures for Few-Shot Musical Instrument Recognition,” coauthored by Aldo Aguilar, research assistant and undergraduate student in computer science; Ethan Manilow, PhD candidate in computer science at the McCormick School of Engineering; and Bryan Pardo, head of the Interactive Audio Lab, co-director of the Center for Human-Computer Interaction + Design, and professor of computer science in Northwestern Engineering and of radio/television/film in the School of Communication.

Musical instrument recognition is a machine learning task that aims to recognize and locate musical instruments in an audio recording. Flores García presented the research, one of more than 100 accepted papers, at ISMIR 2021. The paper addresses a limitation of current recognition systems and introduces a method based on hierarchical prototypical networks to identify a more diverse range of musical instruments.

Current recognition systems are typically trained on a subset of approximately 10 common musical instruments with abundant public datasets, including flute, guitar, saxophone, trumpet, and violin. The immense volume of training data required to incorporate additional instruments is prohibitive.

To overcome this problem, Flores García and the research team applied musical instrument hierarchy classifications – such as strings (plucked vs. bowed), reeds, and winds – to a prototypical network that requires only a few support examples to predict similar instruments.
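The core idea of a prototypical network is to average the embeddings of a few labeled "support" examples into one prototype per class, then classify a query by its nearest prototype; the hierarchical variant does this at each level of an instrument taxonomy. The following minimal sketch (illustrative only; the embeddings, label names, and toy data are assumptions, not the paper's actual model) shows that coarse-to-fine classification scheme with random vectors standing in for learned embeddings:

```python
import numpy as np

def prototypes(support_embeddings, labels):
    """Mean embedding per class -- the class 'prototype'."""
    return {c: support_embeddings[labels == c].mean(axis=0)
            for c in np.unique(labels)}

def classify(query, protos):
    """Assign the query to the nearest prototype (Euclidean distance)."""
    classes = list(protos)
    dists = [np.linalg.norm(query - protos[c]) for c in classes]
    return classes[int(np.argmin(dists))]

# Toy support set: 4 examples each of two fine-grained classes, with
# clusters separated so nearest-prototype classification is unambiguous.
rng = np.random.default_rng(0)
embed_dim = 8
fine_labels = np.array(["guitar"] * 4 + ["violin"] * 4)
coarse_labels = np.array(["plucked"] * 4 + ["bowed"] * 4)  # taxonomy level above
support = np.concatenate([rng.normal(0, 1, (4, embed_dim)),
                          rng.normal(3, 1, (4, embed_dim))])

# Classify one query at each level of the (toy) hierarchy.
query = rng.normal(3, 1, embed_dim)  # drawn near the 'violin' cluster
coarse_pred = classify(query, prototypes(support, coarse_labels))
fine_pred = classify(query, prototypes(support, fine_labels))
print(coarse_pred, fine_pred)
```

Because prototypes are just means over support examples, a new instrument class can be added from a handful of recordings, and the coarse level (e.g. "plucked string") still gives a sensible answer even when the fine-grained class was never seen in training.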

“Even though I've never heard a Venezuelan Cuatro before, a rare instrument in ML datasets, I know that it is a plucked string instrument, so I know its sound is going to resemble that of a classical guitar, a common instrument in ML datasets,” Flores García said.

The team found its approach performs significantly better than the non-hierarchical baseline classification method.

“The rise in deep learning and data-driven models led the community to focus on solving problems for which we have abundant data, which is often dominated by musical instruments and styles from the primarily white Western tradition,” Flores García said. “We hope our work is a step forward in the direction toward more inclusive music interfaces, empowering content creators who have diverse cultural backgrounds with the ability to use diverse collections of musical instruments in their AI-assisted creation environments.”

Inclusive Interfaces

Flores García is also researching AI-driven, inclusive audio editing interfaces for sound engineers, musicians, and podcasters who are blind or visually impaired.

“Producing audio content has become an increasingly visual task, even though the act of making music or recording a podcast isn't very visual at all,” Flores García said. “For example, inaccessible audio production software makes finding the silent parts in a long recording of speech a tedious task, as users may have to listen to the entire recording and manually place time markers if they are unable to visually locate the spots where the audio waveform is flat.”

To solve this problem, Flores García is using Voice Activity Detection (VAD) technology to automatically place markers in all the silent regions in a recording, alleviating the time-consuming process of scrubbing through long segments of audio.
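A learned VAD model decides, frame by frame, whether a region contains speech; the marker-placement step then reduces to merging consecutive non-speech frames into time spans. As a rough sketch of that second step (using a simple RMS-energy threshold as a stand-in for a trained VAD; the function name and parameters are illustrative, not Flores García's implementation):

```python
import numpy as np

def find_silent_regions(audio, sr, frame_ms=25, threshold_db=-40.0):
    """Return (start_sec, end_sec) spans where frame RMS energy is below threshold.

    Energy thresholding is a crude stand-in for a learned VAD model,
    but the frame-merging logic is the same either way.
    """
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(audio) // frame_len
    frames = audio[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    db = 20 * np.log10(np.maximum(rms, 1e-10))  # avoid log(0) on pure silence
    silent = db < threshold_db

    # Merge runs of consecutive silent frames into (start, end) spans.
    regions, start = [], None
    for i, is_silent in enumerate(silent):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            regions.append((start * frame_len / sr, i * frame_len / sr))
            start = None
    if start is not None:
        regions.append((start * frame_len / sr, n_frames * frame_len / sr))
    return regions

# Toy example: 1 s of 440 Hz tone, 1 s of silence, 1 s of tone at 16 kHz.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
audio = np.concatenate([tone, np.zeros(sr), tone])
markers = find_silent_regions(audio, sr)
print(markers)
```

Each returned span can be surfaced directly as a pair of time markers in the editor, which is exactly what lets a screen-reader user jump between silent regions instead of scrubbing through the waveform by ear.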


Flores García is a board member of Latin@CS, a new student organization for current and prospective students in computing fields who identify as Latin American, Latinx, or Hispanic.

The goal of Latin@CS is to provide support, mentorship, and community for its members. The group hosted its first in-person social event this fall and is planning quarterly networking activities.
