Marc Aurel Kastner

Ph.D. (Informatics)

Toward Visual Storytelling using Scene-Graph Contexts

Authors: Itthisak Phueaksri, Marc A. Kastner, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide

Abstract:

VIsual STorytelling (VIST) is the task of transforming a sequence of images into a narrative text story. Generating a successful narrative requires understanding the contexts of the images and the relationships among them. Our study introduces a story generation framework based on an attention mechanism on a Long Short-Term Memory (LSTM) network. The generation process considers both the local and the global context of the image sequence. First, the local context is based on the content of an individual image and uses the image features and scene graph of that image. This context focuses on generating a caption for each image and supplying image details. Second, the global context is comprehensive information about the overall image sequence, constructed by aggregating the content of all individual images. The global context ensures that each caption fits within the overall story, maintaining continuity and coherence. Together, the local and global contexts are used to generate a cohesive and engaging narrative. The VIST dataset is used to train and evaluate the proposed framework. Preliminary results highlight the importance of understanding image sequence contexts for generating coherent and engaging stories.
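
The local and global contexts described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how image features and scene graphs are combined or how the aggregation is done, so concatenation, mean pooling, and dot-product attention are assumptions here, and the names `build_contexts` and `attend` are hypothetical. The `query` vector stands in for an LSTM decoder state, which is omitted.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_contexts(image_feats, graph_feats):
    """Return (local contexts, global context) for an image sequence.

    Assumption: each local context is the image feature vector
    concatenated with a scene-graph embedding of the same image, and
    the global context is the mean of all local contexts.
    """
    local = [img + graph for img, graph in zip(image_feats, graph_feats)]
    return local, mean_pool(local)


def attend(query, keys):
    """Dot-product attention of a query over a list of key vectors.

    Returns the attention-weighted sum of the keys and the weights.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    context = [
        sum(w * key[i] for w, key in zip(weights, keys))
        for i in range(len(query))
    ]
    return context, weights


# Toy sequence of two images: 2-d image features, 1-d scene-graph embeddings.
local_ctx, global_ctx = build_contexts(
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0], [4.0]],
)
# A decoder step could attend over the local contexts while also
# conditioning on the sequence-level global context.
attended, weights = attend(global_ctx, local_ctx)
```

In this sketch the local contexts carry per-image detail, while the single pooled vector summarizes the whole sequence, mirroring the division of roles the abstract describes.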

Type: Poster at MIRU Symposium (画像の認識・理解シンポジウム)

Date: August 2024
