Dr. Marc A. Kastner


Towards Visual Storytelling by Understanding Narrative Context through Scene-Graphs


Authors: Itthisak Phueaksri, Marc A. Kastner, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide

Abstract:

VIsual STorytelling (VIST) is a task that transforms a sequence of images into a narrative text story. A narrative story requires an understanding of the contexts of and relationships among the images. Our study introduces a story generation process that emphasizes coherence by constructing both image contexts and a narrative context that guide generation. First, the image contexts are generated from the content of individual images, using image features and scene-graphs that describe the elements of each image. Second, the narrative context is generated by focusing on the image sequence as a whole. Ensuring that each caption fits coherently within the overall story maintains continuity. We also introduce a narrative concept summary, which is external knowledge represented as a knowledge-graph. This summary encapsulates the narrative concept of an image sequence to improve the understanding of its overall content. Both the image and narrative contexts are then used to generate a coherent and engaging narrative. The framework is based on a Long Short-Term Memory (LSTM) network with an attention mechanism. We evaluate the proposed method on the VIST dataset, and the results highlight the importance of understanding the contexts of an image sequence for generating coherent and engaging stories. In particular, the study demonstrates that involving the narrative context in the generation process helps ensure the coherence of the generated narrative.

Type: 31st Intl. Conf. on MultiMedia Modeling (MMM2025)

Publication date: To be published in Jan. 2025

If you have any questions or comments about this research, please leave a comment below or send me an email. I will reply promptly.