Dr. Marc A. Kastner

Investigating conceptual blending of a diffusion model for improving nonword-to-image generation

Authors: Chihaya Matsuhira, Marc A. Kastner, Takahiro Komamizu, Takatsugu Hirayama, Ichiro Ide

Abstract:

Text-to-image diffusion models sometimes depict blended concepts in generated images. One promising use case of this effect would be the nonword-to-image generation task, which attempts to generate images intuitively imaginable from a non-existing word (nonword). To realize nonword-to-image generation, an existing study focused on associating nonwords with similar-sounding words. Since each nonword can have multiple similar-sounding words, generating images containing their blended concepts would increase intuitiveness, facilitating creative activities and promoting computational psycholinguistics. Nevertheless, no existing study has quantitatively evaluated this effect in either diffusion models or the nonword-to-image generation paradigm. Therefore, this paper first analyzes the conceptual blending in one of the pretrained diffusion models called Stable Diffusion. The analysis reveals that a high percentage of generated images depict blended concepts when inputting an embedding interpolating between the text embeddings of two text prompts referring to different concepts. Next, this paper explores the best text embedding space conversion method of an existing nonword-to-image generation framework to ensure both the occurrence of conceptual blending and image generation quality. We compare the conventional direct prediction approach with the proposed method that combines k-nearest neighbor search and linear regression. Evaluation reveals that the enhanced accuracy of the embedding space conversion by the proposed method improves the image generation quality, while the emergence of conceptual blending could be attributed mainly to the specific dimensions of the high-dimensional text embedding space.
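The two embedding operations mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. `lerp` shows plain linear interpolation between two text embeddings, (1 - alpha) * e_a + alpha * e_b. `knn_linear_convert` shows one plausible reading of "k-nearest neighbor search plus linear regression": find the k training pairs whose source embeddings are closest to the query, fit a small ridge-regularized linear map (with bias) on only those pairs, and apply it to the query to predict an embedding in the target space. The function names, the Euclidean distance, the ridge term, and the toy list-of-floats embeddings are all assumptions for illustration; the paper may combine the two steps differently.

```python
import math


def lerp(e_a, e_b, alpha):
    """Linearly interpolate two embeddings: (1 - alpha) * e_a + alpha * e_b."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(e_a, e_b)]


def _dist(u, v):
    """Euclidean distance (an assumed choice of metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def _solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented copy, A untouched
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def knn_linear_convert(query, src, tgt, k, ridge=1e-6):
    """Hypothetical converter: fit a local linear map on the k nearest pairs.

    src[i] is a source-space embedding and tgt[i] its paired target-space
    embedding. The small ridge term keeps the normal equations solvable
    even when k is smaller than the source dimension + 1.
    """
    nbrs = sorted(range(len(src)), key=lambda i: _dist(query, src[i]))[:k]
    X = [[1.0] + list(src[i]) for i in nbrs]  # prepend a bias feature
    Y = [list(tgt[i]) for i in nbrs]
    p = len(X[0])
    # Normal-equation matrix X^T X + ridge * I, shared by every output dim.
    A = [[sum(row[r] * row[c] for row in X) + (ridge if r == c else 0.0)
          for c in range(p)] for r in range(p)]
    x_q = [1.0] + list(query)
    out = []
    for j in range(len(Y[0])):
        b = [sum(X[i][r] * Y[i][j] for i in range(len(X))) for r in range(p)]
        w = _solve(A, b)
        out.append(sum(a * c for a, c in zip(x_q, w)))
    return out


if __name__ == "__main__":
    # Toy data: the target is an exact linear function of the source.
    src = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [5, 5]]
    tgt = [[2 * x + y + 1, x - 3 * y] for x, y in src]
    print(knn_linear_convert([0.5, 0.5], src, tgt, k=4))  # approximately [2.5, -1.0]
    print(lerp([0.0, 0.0], [2.0, 4.0], 0.25))  # [0.5, 1.0]
```

Fitting the map only on nearby pairs lets the conversion adapt locally, whereas a single global regression (or a directly trained predictor) must fit the whole space at once; whether that trade-off matches the paper's design is an assumption here.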

Type: Oral presentation at ACM Multimedia (ACMMM) 2024 (nominated as Best Paper candidate)

Publication date: October 2024

DOI: 10.1145/3664647.3681202

Links: [ arXiv ]

Files

presentation

poster

If you have any questions or comments about this research, please leave a comment below or send me an email. I will reply promptly.