Oriol Nieto: GenAI for sound design

At the RISE Learning Machines Seminar on March 20, 2025, we have the pleasure of listening to Oriol Nieto, Adobe, give his talk: GenAI for sound design.

Seminar Details:

When: March 20, 2025, 15:00 CET   
Where: Online via Zoom.

Register here

Abstract

This presentation explores the forefront of generative AI research for sound design at Adobe Research. I will provide an overview of Latent Diffusion Models, which form the foundation of our work, and introduce several recent advancements focused on controllability and multimodality.

 I will begin with SILA [1], a technique designed to enhance the control of sound effects generated through text prompts. Following this, I will present Sketch2Sound [2], a model that generates sound effects conditioned on both audio recordings and text. Lastly, I will examine MultiFoley [3], a model capable of generating sound effects from both silent videos and text. 

Throughout the talk, I will showcase a series of examples and demos to illustrate the practical applications and potential of these models, making the case that we are only beginning to unveil a completely new paradigm in how to approach sound design.

[1] Sonal Kumar, Prem Seetharaman, Justin Salamon, Dinesh Manocha, Oriol Nieto, "SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation", in review for IEEE SPL

[2] Hugo Flores García, Oriol Nieto, Justin Salamon, Bryan Pardo, Prem Seetharaman, "Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations", ICASSP 2025

[3] Ziyang Chen, Prem Seetharaman, Bryan Russell, Oriol Nieto, David Bourgin, Andrew Owens, Justin Salamon, "Video-Guided Foley Sound Generation with Multimodal Controls", in review for CVPR 2025

About the speaker

Oriol is a Senior Research Engineer at Adobe Research, where he focuses on human-centered AI for audio creativity, encompassing everything from music to audiobooks, video editing, and sound design. He holds a PhD in Music Technology from MARL, NYU, a Master's in Music, Science, and Technology from Stanford University, and a Master's in Information Technologies from Pompeu Fabra University. 

Highly involved with the Music Information Retrieval community, he was one of the three General Chairs for ISMIR 2024 in San Francisco this past November. Oriol has helped develop relevant open-source MIR packages such as librosa, mir_eval, and MSAF, and has contributed to PyTorch. In his spare time, he plays guitar, violin, and cajón, and sings (and screams).

Contact person

Olof Mogren

Senior Researcher

+46 73 023 56 09
