
Descriptive Image Captioning with Salient Retrieval Priors

Published on Jun 08, 2021


Captions are often expected to carry detailed, essential information about images, but current image captioning models tend to play it safe and generate generic captions that are less informative. Cross-modal retrieval is a promising solution, as texts with more details perform better in retrieval. In this work, we first explore two types of salient n-grams in captions, i.e., Support N-grams (SN) and Deletion N-grams (DN), which significantly affect the performance of typical cross-modal retrieval models. We then exploit these n-grams to enhance the original learning objectives so that the model generates descriptive captions with more details. Experiments on two benchmark datasets show that our proposed model significantly outperforms baselines across a wide range of metrics.

Article ID: 2021S23

Month: May

Year: 2021

Address: Online

Venue: Canadian Conference on Artificial Intelligence

Publisher: Canadian Artificial Intelligence Association

