Abstract:
Image captioning is a popular vision-and-language task that aims to generate appropriate textual descriptions of images. Recently, some works have used objects to ease image-text alignment and learn better cross-modal representations, achieving strong performance on this task. In this paper, we argue that relations are also important for learning semantics, and we use relations between objects to explore whether relations as a prior can further improve performance. First, we take the annotated relations between objects and use them as tags in an image captioning model to align the image and text. Moreover, we also aim to integrate textual relationships into the image features. To this end, we focus on the masking strategy and change it from random masking to relation masking, further studying how the training strategy can enhance the semantic alignment of object relations. In our experiments, we found that considering object relations improved captioning performance on common metrics. However, when we changed the masking strategy to focus on a specific part of the caption during training, the model captured more object relations in an image, but because the randomness of masking was destroyed, overall performance decreased and the generated relations were often not compatible with the image contents.
Type: Poster at MIRU Symposium (画像の認識・理解シンポジウム)
Publication date: July 2023
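
The abstract does not give implementation details of the relation-masking strategy it contrasts with random masking. The sketch below is a minimal, hypothetical reading of that idea, assuming caption tokens and a set of relation words taken from annotated relation triplets; the names (`relation_masking`, `MASK_TOKEN`, the masking budget) are illustrative assumptions, not the authors' actual code.

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder token, as in BERT-style masked language modeling

def random_masking(tokens, mask_prob=0.15, rng=None):
    """Baseline strategy: each token is masked independently at random."""
    rng = rng or random.Random()
    return [MASK_TOKEN if rng.random() < mask_prob else t for t in tokens]

def relation_masking(tokens, relation_words, mask_prob=0.15, rng=None):
    """Hypothetical relation masking: tokens expressing object relations
    (e.g. verbs/prepositions from annotated triplets) are masked first;
    any remaining budget falls back to randomly chosen tokens."""
    rng = rng or random.Random()
    budget = max(1, round(mask_prob * len(tokens)))
    rel_idx = [i for i, t in enumerate(tokens) if t in relation_words]
    rel_set = set(rel_idx)
    other_idx = [i for i in range(len(tokens)) if i not in rel_set]
    rng.shuffle(rel_idx)
    rng.shuffle(other_idx)
    chosen = set((rel_idx + other_idx)[:budget])  # prioritize relation tokens
    return [MASK_TOKEN if i in chosen else t for i, t in enumerate(tokens)]

if __name__ == "__main__":
    caption = "a dog sitting on a bench next to a tree".split()
    relations = {"sitting", "on", "next"}  # assumed relation vocabulary
    print(relation_masking(caption, relations, mask_prob=0.3,
                           rng=random.Random(0)))
```

Under this reading, the trade-off the abstract reports follows naturally: always masking the relation tokens forces the model to predict relations from the image, but it also removes the stochastic coverage of the rest of the caption that random masking provides.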