Sony AI releases Woosh foundation model for sound effect generation | exclusive interview

Sony AI has released Woosh, a foundation AI model built specifically for sound effect generation. Most generative audio models have overlooked this area in favor of music or general audio generation.

Sony AI, a research arm of Sony Corp., described Woosh in a research paper, and I interviewed two of its authors for this article: Mark Ferras and Hakim Missoum. The other authors are Gaetan Hadjeres, Khaled Koutini, Benno Weck, Alexandre Bittar, Thomas Hummel, Zineb Lahrici, Joan Serra, and Yuki Mitsufuji.

Woosh is built for workflows used in gaming, film, and interactive media, and it supports two modes:

- Text-to-audio: generating a sound effect from a written description.
- Video-to-audio: generating sound directly from a video sequence, with an optional text prompt to guide the output.

The project was built around a core insight: professional sound design requires fundamentally different data and controls than those used by general audio AI systems.

One of the clearest findings was the significant gap between public and private training data. Sony AI created two versions of the model:

- A private model trained on licensed professional sound effect libraries, including Pro Sound Effects and BOOM, optimized for studio-grade output.
- A public model that uses the same architecture as the private model but is trained on publicly available datasets.

The private model, trained on commercial libraries, significantly outperforms public alternatives on professional sound effect data. The public model outperforms comparable open-source models on public benchmarks.

The public model is now available for the research community to access and experiment with. The private model is also available for those…

Work & Theory on June 17, 2026 Uncategorized