Visual Reasoning in the Real World

A New Dataset for Visual Question Answering

Drew Hudson & Christopher Manning

The GQA Dataset

Question Answering on Image Scene Graphs

Semantic Representations

Each image comes with a scene graph of objects and relations. Each question comes with a structured representation of its semantics.
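
As a rough illustration of how a scene graph and a structured question representation can work together, the sketch below builds a toy graph and runs a small step-by-step program over it. Everything here is an assumption made for illustration: the `SceneObject` class, the `select`/`relate`/`query` operation names, and the toy scene are hypothetical and do not reflect the dataset's actual file format.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """One node of a toy scene graph (hypothetical schema, not the GQA format)."""
    name: str
    attributes: dict = field(default_factory=dict)
    # Outgoing edges as (predicate, target object id) pairs.
    relations: list = field(default_factory=list)


# Toy scene: an apple and a cup, both on a table.
scene = {
    "o1": SceneObject("table", {"material": "wood"}),
    "o2": SceneObject("apple", {"color": "red"}, [("on", "o1")]),
    "o3": SceneObject("cup", {"color": "white"}, [("on", "o1")]),
}

# Structured form of "What color is the apple on the table?"
program = [
    ("select", "table"),        # find objects named "table"
    ("relate", "apple", "on"),  # find apples that are "on" the selected objects
    ("query", "color"),         # read the color attribute of the result
]


def run(program, scene):
    """Execute the steps in order, carrying a list of selected object ids."""
    current = []
    for step in program:
        op = step[0]
        if op == "select":
            current = [oid for oid, obj in scene.items() if obj.name == step[1]]
        elif op == "relate":
            _, name, predicate = step
            current = [
                oid for oid, obj in scene.items()
                if obj.name == name
                and any(p == predicate and t in current for p, t in obj.relations)
            ]
        elif op == "query":
            # Return the attribute of the first match, or None if nothing matched.
            return scene[current[0]].attributes.get(step[1]) if current else None
        else:
            raise ValueError(f"unknown operation: {op}")
    return current


answer = run(program, scene)
print(answer)
```

Each step narrows or transforms the set of selected objects, which is what makes the question multi-step: the final answer depends on every intermediate result being right.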

Compositional

22M multi-step questions, both binary and open, that require a diverse set of reasoning skills.

Balanced

Answer-distribution biases are reduced within each question type to mitigate language priors and discourage educated guesses.
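
To see why a skewed answer distribution rewards guessing, the sketch below scores a "blind" baseline that never looks at the image and always returns the most frequent answer for the question type. The sample data and the function names `prior_guesser` and `blind_accuracy` are hypothetical, and this is not the balancing procedure used to build the dataset; it only illustrates the effect that balancing is meant to reduce.

```python
from collections import Counter, defaultdict


def prior_guesser(samples):
    """Map each question type to its most frequent answer."""
    counts = defaultdict(Counter)
    for qtype, answer in samples:
        counts[qtype][answer] += 1
    return {qtype: c.most_common(1)[0][0] for qtype, c in counts.items()}


def blind_accuracy(samples, guess):
    """Accuracy of always answering with the per-type guess, ignoring the image."""
    hits = sum(guess.get(qtype) == answer for qtype, answer in samples)
    return hits / len(samples)


# Hypothetical (question type, answer) pairs.
skewed = [("color", "white")] * 3 + [("color", "red")]
balanced = [("color", "white"), ("color", "red"),
            ("color", "blue"), ("color", "green")]

skewed_score = blind_accuracy(skewed, prior_guesser(skewed))
balanced_score = blind_accuracy(balanced, prior_guesser(balanced))
print(skewed_score, balanced_score)
```

On the skewed sample the blind baseline is right three times out of four; on the balanced sample it is right once out of four, so a model has to use the image to do better.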

Strong Supervision

The structured representations allow for a stronger and more informative error signal during training.

New Metrics

A suite of new metrics to evaluate not only accuracy, but also the consistency, validity and plausibility of responses.
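
This page names the metrics without defining them, so the sketch below assumes one simple reading of two of them alongside plain accuracy: validity as the share of predictions that fall within a question's set of sensible answers (for example, a color for a color question), and consistency as how often a model that answers a question correctly also answers its related questions correctly. The function names, the `entailed` mapping, and the sample data are all hypothetical; consult the paper for the exact definitions.

```python
def accuracy(preds, gold):
    """Share of questions answered exactly right."""
    return sum(preds[q] == gold[q] for q in gold) / len(gold)


def validity(preds, valid_answers):
    """Share of predictions that fall within the question's answer scope."""
    hits = sum(preds[q] in valid_answers[q] for q in valid_answers)
    return hits / len(valid_answers)


def consistency(preds, gold, entailed):
    """For each correctly answered question, the share of its related
    questions that are also answered correctly, averaged over questions."""
    scores = []
    for q, related in entailed.items():
        if preds[q] != gold[q] or not related:
            continue
        scores.append(sum(preds[r] == gold[r] for r in related) / len(related))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical predictions and annotations, keyed by question id.
gold = {"q1": "red", "q2": "yes", "q3": "no", "q4": "left"}
preds = {"q1": "red", "q2": "yes", "q3": "yes", "q4": "blue"}
valid_answers = {
    "q1": {"red", "green", "blue"},
    "q2": {"yes", "no"},
    "q3": {"yes", "no"},
    "q4": {"left", "right"},
}
entailed = {"q1": ["q2", "q3"], "q4": ["q1"]}

print(accuracy(preds, gold))
print(validity(preds, valid_answers))
print(consistency(preds, gold, entailed))
```

Note how the answer "blue" to a left-or-right question counts against validity even before correctness is considered, which is the kind of failure accuracy alone does not distinguish.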

Thorough Diagnosis

Supports careful analysis based on question and answer type, length, number of reasoning steps and difficulty.
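
A breakdown of this kind can be sketched as a group-by over per-question results. The record fields (`steps`, `type`, `pred`, `gold`) and the helper `accuracy_by` are hypothetical names chosen for the example, not fields of the released data.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Accuracy per distinct value of `key` (e.g. number of reasoning steps)."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        correct[group] += int(record["pred"] == record["gold"])
    return {group: correct[group] / totals[group] for group in totals}


# Hypothetical per-question results.
records = [
    {"steps": 1, "type": "binary", "pred": "yes", "gold": "yes"},
    {"steps": 1, "type": "open", "pred": "red", "gold": "red"},
    {"steps": 3, "type": "open", "pred": "cup", "gold": "bowl"},
    {"steps": 3, "type": "binary", "pred": "no", "gold": "no"},
]

print(accuracy_by(records, "steps"))
print(accuracy_by(records, "type"))
```

Slicing the same results by different keys shows whether errors concentrate in longer reasoning chains, in open questions, or elsewhere.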

Join the 2020 GQA Challenge for Real-World Visual Reasoning

GQA images are from COCO and Flickr. The image scene graphs are based on a
new, cleaner version of Visual Genome. We thank the COCO, Flickr, and Visual
Genome teams for their great work!

Read the Paper!
@article{hudson2018gqa,
    title={GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering},
    author={Hudson, Drew A and Manning, Christopher D},
    journal={Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2019}
}