Marc Aurel Kastner

Ph.D. (Informatics)

Transformer-Based Audio Generation Conditioned by 2D Latent Maps: A Demonstration

Authors: Christian Limberg, Zhe Zhang, Marc A. Kastner

Abstract:

This paper presents a demonstration of an improved framework for audio sample generation using interactive 2D latent maps. Building upon the foundational work "Mapping the Audio Landscape for Innovative Music Sample Generation", we enhance the framework by introducing visualization techniques for exploring the 2D audio landscape through different audio features such as energy and bandwidth. Additionally, we train a t-SNE embedding over these features to create a more abstract visualization of the audio samples on the map. This demo also significantly improves usability and user interactivity, allowing for a more intuitive and efficient exploration of the generated audio samples. The demo, remotely accessible via https://limchr.github.io/gesam_demo/, showcases these improvements in real time, providing users with an enhanced, novel interface for generating high-quality audio samples.
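
As a rough illustration of the kind of per-sample audio features the abstract mentions (energy and bandwidth), the following is a minimal sketch and not the authors' implementation. It assumes RMS energy and a magnitude-weighted, order-2 spectral bandwidth around the spectral centroid; the function names, the toy sine signals, and the 800 Hz sample rate are all hypothetical choices made for this example. The t-SNE step is not shown.

```python
import cmath
import math


def rms_energy(signal):
    """Root-mean-square energy of a mono signal given as a list of floats."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def magnitude_spectrum(signal):
    """Naive O(n^2) DFT magnitudes for bins 0..n//2 (only suitable for short clips)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def spectral_bandwidth(signal, sample_rate):
    """Magnitude-weighted standard deviation of frequency around the centroid, in Hz.

    Assumption: this order-2 definition is one common choice; the paper may use another.
    """
    mags = magnitude_spectrum(signal)
    n = len(signal)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        return 0.0  # silent input has no meaningful bandwidth
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    return math.sqrt(sum(m * (f - centroid) ** 2 for f, m in zip(freqs, mags)) / total)


def sine(freq, sample_rate=800, n=80):
    """Hypothetical test tone: n samples of a unit-amplitude sine at freq Hz."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]


# Two toy "audio samples": a pure 100 Hz tone and a 100 Hz + 300 Hz mixture.
pure = sine(100)
mix = [a + b for a, b in zip(sine(100), sine(300))]

# Each sample maps to an (energy, bandwidth) pair.
features = {
    "pure": (rms_energy(pure), spectral_bandwidth(pure, 800)),
    "mix": (rms_energy(mix), spectral_bandwidth(mix, 800)),
}
```

In a setup like the one described, such feature values could be used to color the points of a 2D map, so that regions with louder or spectrally broader samples stand out; how the demo actually does this is not stated here.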

Type: 31st Intl. Conf. on MultiMedia Modeling (MMM2025)

Date: To be published in Jan 2025

External links: [ demo ]

