Image Questions: Beyond One-Size-Fits-All
Monday, December 23, 2024
Our new model focuses on generating diverse questions whose sources are clear. Here's how it works: it first converts an image into a scene graph, a structured breakdown of the image's parts, using an unbiased method. Because every question is built from a specific part of this graph, its origin can be traced. To mix things up, the model picks different parts of the graph to generate questions from. It learns how humans choose these parts, which makes the questions varied and meaningful.
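To make the selection idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not our actual model: the `Triple` class, the toy scene graph, the uniform random sampler, and the question template are all hypothetical stand-ins. In particular, the real model learns which parts of the graph to pick from human choices, while this sketch simply picks them at random.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge of a toy scene graph: subject --relation--> obj (hypothetical structure)."""
    subject: str
    relation: str
    obj: str


def sample_subgraph(graph, k, rng):
    """Pick up to k distinct triples uniformly at random.

    Assumption: this uniform sampler stands in for the learned selector,
    which would instead imitate how humans choose parts of the graph.
    """
    k = min(k, len(graph))
    return rng.sample(graph, k)


def question_from_triple(triple):
    """Turn a triple into a question with a fixed template (illustrative only)."""
    return f"What is the {triple.subject} {triple.relation}?"


def generate_questions(graph, k, seed):
    """Return (question, source_triple) pairs so each question's origin is traceable."""
    rng = random.Random(seed)
    return [(question_from_triple(t), t) for t in sample_subgraph(graph, k, rng)]


# A hand-written toy scene graph; a real one would come from an image parser.
scene_graph = [
    Triple("man", "riding", "horse"),
    Triple("horse", "standing on", "grass"),
    Triple("dog", "next to", "man"),
]

if __name__ == "__main__":
    for question, source in generate_questions(scene_graph, k=2, seed=0):
        print(question, "<-", source)
```

Each question is returned together with the triple it came from, which is what gives it a clear origin. Changing the seed changes which parts of the graph are picked, and so changes the questions.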
We tested this model on two large datasets, VQA v2.0 and COCO-QA, and it did better than the usual methods. The questions it created were diverse and easy to understand.