Dr. Marc A. Kastner


Toward Visual Storytelling using Scene-Graph Contexts


Authors: Itthisak Phueaksri, Marc A. Kastner, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide

Abstract:

VIsual STorytelling (VIST) is the task of transforming a sequence of images into a narrative text story. Generating a successful narrative requires understanding the contexts of the images and the relationships among them. Our study introduces a story generation framework based on an attention mechanism applied to Long Short-Term Memory (LSTM). The generation process considers both the local and the global context of the image sequence. First, the local context is based on the content of each individual image and uses that image's features and scene graph. This context focuses on generating a caption for each image and providing image details. Second, the global context is comprehensive information about the overall image sequence, constructed by aggregating the content of all individual images. The global context ensures that each caption fits cohesively within the overall story, maintaining continuity and coherence. Together, the local and global contexts are used to generate a cohesive and engaging narrative. The VIST dataset is used to train and evaluate the proposed framework. Preliminary results highlight the importance of understanding image sequence contexts for generating coherent and engaging stories.
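
The split between local and global context described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the function names, the concatenation of image and scene-graph features, the mean pooling used for aggregation, and the dot-product attention with the global context standing in for a decoder query are all assumptions made for illustration, and the feature values are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def local_context(image_feat, graph_feat):
    """Hypothetical local context: image features concatenated with
    a scene-graph embedding of the same image (an assumption)."""
    return image_feat + graph_feat


def global_context(local_contexts):
    """Hypothetical global context: mean pooling over all local
    contexts in the sequence (an assumed aggregation)."""
    n = len(local_contexts)
    return [sum(col) / n for col in zip(*local_contexts)]


def attend(query, keys):
    """Dot-product attention: weight each key by its similarity to the
    query and return the weights plus the weighted sum of the keys."""
    weights = softmax([dot(query, k) for k in keys])
    context = [
        sum(w * k[i] for w, k in zip(weights, keys))
        for i in range(len(query))
    ]
    return weights, context


# Toy sequence of three images with made-up 2-d features.
image_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
graph_feats = [[0.5, 0.5], [0.0, 1.0], [1.0, 0.0]]

local = [local_context(i, g) for i, g in zip(image_feats, graph_feats)]
global_ctx = global_context(local)

# In an LSTM decoder the query would be the hidden state at each step;
# here the global context stands in for it (an assumption).
weights, attended = attend(global_ctx, local)
print(weights, attended)
```

In a full model, the attended vector would be fed to an LSTM decoder at each step so that every caption is conditioned on both its own image and the sequence as a whole.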

Type: Poster at MIRU Symposium (画像の認識・理解シンポジウム)

Publication date: August 2024

If you have questions or comments about this research, feel free to leave a comment or send me an email. I will get back to you promptly.