Dr. Marc A. Kastner

Toward Visual Storytelling using Scene-Graph Contexts

Authors: Itthisak Phueaksri, Marc A. Kastner, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide

Abstract:

Visual Storytelling (VIST) is the task of transforming a sequence of images into a narrative text story. Successfully generating a narrative story requires an understanding of the contexts and relationships among images. Our study introduces a story generation framework based on an attention mechanism over a Long Short-Term Memory (LSTM) network. In the generation process, both local and global contexts of the image sequence are considered. First, the local context is based on individual image content and uses the image features and scene graph of each image. This context focuses on generating captions for each image and providing image details. Second, the global context refers to comprehensive information on the overall image sequence, which is constructed by aggregating all individual image content. The global context ensures that each caption fits cohesively within the overall story, maintaining continuity and coherence. Both the local and global contexts are used to generate a cohesive and engaging narrative. The VIST dataset is used to train and evaluate the proposed framework. Preliminary results highlight the importance of understanding image sequence contexts in generating coherent and engaging stories.
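
To make the distinction between local and global context concrete, here is a minimal sketch in plain Python. It is not the implementation from the poster: the abstract does not specify how the per-image content is aggregated or how attention is computed, so mean pooling and dot-product attention are assumptions chosen for illustration, and all function names (`local_context`, `global_context`, `attend`) and the toy feature values are hypothetical.

```python
import math

def local_context(image_feat, graph_feat):
    """Local context of one image: image features joined with scene-graph features."""
    return list(image_feat) + list(graph_feat)

def global_context(local_contexts):
    """Global context: element-wise mean over all local contexts (assumed aggregation)."""
    n = len(local_contexts)
    return [sum(column) / n for column in zip(*local_contexts)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys):
    """Dot-product attention (assumed form): returns weights and the weighted sum of keys."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return weights, context

# Toy sequence of three images with made-up 2-d image and 1-d scene-graph features.
image_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
graph_feats = [[0.5], [0.2], [0.9]]

local_ctxs = [local_context(i, g) for i, g in zip(image_feats, graph_feats)]
global_ctx = global_context(local_ctxs)

# For each image, combine its local context, the shared global context,
# and an attention summary over the whole sequence.
decoder_inputs = []
attention_weights = []
for loc in local_ctxs:
    weights, attended = attend(loc, local_ctxs)
    attention_weights.append(weights)
    decoder_inputs.append(loc + global_ctx + attended)
```

In a full model, each entry of `decoder_inputs` would be fed to an LSTM decoder that generates the sentence for the corresponding image; that decoder is omitted here to keep the sketch dependency-free.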

Type: Poster at MIRU Symposium (画像の認識・理解シンポジウム)

Publication date: August 2024

If you have questions or comments about this research, please leave a comment below or send me an email. I will respond quickly.