Image Questions: Beyond One-Size-Fits-All

Let's talk about image questions: the questions we can ask about pictures. Methods for generating good questions from images automatically have come a long way. But most of them overlook two important things: how varied the questions are, and how clearly each question ties back to what is actually in the image. Variety matters because a single question rarely covers everything. When you plan a trip, for example, you need to ask several different questions about your destination to get all the details right.

Our new model focuses on creating diverse questions from clear sources. Here's how it works: it first converts an image into a scene graph, a structured description of the objects in the image and how they relate to each other, using an unbiased method. This way, every question has a clear origin in the image. To add variety, the model picks different parts of this graph to generate questions from. It learns how humans choose these parts, which makes the questions varied and meaningful. We tested the model on two large datasets, VQA v2.0 and COCO-QA, and it did better than existing methods, producing diverse questions that were easy to understand.
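
To make the idea of "picking different parts of the graph" concrete, here is a minimal toy sketch in Python. It is not the model itself. The scene graph, the function names (sample_subgraphs, question_from), and the question template are all hypothetical, and uniform random sampling stands in for the learned, human-like selection described above.

```python
import random
from itertools import combinations

# Hypothetical toy scene graph: (subject, relation, object) triples.
SCENE_GRAPH = [
    ("man", "riding", "horse"),
    ("horse", "on", "beach"),
    ("man", "wearing", "hat"),
    ("dog", "next to", "horse"),
]


def sample_subgraphs(graph, k, size=1, seed=0):
    """Pick k distinct groups of `size` triples from the graph.

    Assumption: uniform random choice replaces the learned selection
    that the real model uses.
    """
    rng = random.Random(seed)
    candidates = list(combinations(graph, size))
    rng.shuffle(candidates)
    return [list(group) for group in candidates[:k]]


def question_from(subgraph):
    """Template stand-in for a learned question decoder.

    Asks about the object of the first triple, so the question can be
    traced back to one specific part of the scene graph.
    """
    subject, relation, _answer = subgraph[0]
    return f"What is the {subject} {relation}?"


if __name__ == "__main__":
    for part in sample_subgraphs(SCENE_GRAPH, k=3):
        print(part, "->", question_from(part))
```

Because each question is built from a different part of the graph, the questions differ from one another, and each one can be traced back to the exact objects and relationship it came from.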
