Meet HybridQA!

A large-scale multi-hop question answering dataset over heterogeneous information in both structured tabular and unstructured textual forms.

Why Hybrid Question Answering?


HIGH-QUALITY

Mechanical Turk;
Strict Quality Control


LARGE-SCALE

13K Wikipedia tables;
293K hyperlinked passages;
70K natural questions.


HYBRID

Semantic Understanding;
Symbolic Reasoning.


OPEN-DOMAIN

Reasoning over open-domain Wikitables.


Explore


We have designed an interface for viewing the data. Please click here to explore the dataset and have fun!

Example


In this task, you are given a Wikipedia table together with its hyperlinked passages. The goal is to answer a multi-hop question that involves information from both forms (structured and unstructured data):
[Image: an example table with hyperlinked passages and a multi-hop question]
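
To make the task setup concrete, here is a minimal Python sketch of one table-to-passage hop. It is an illustration only: the class names, field names, passage keys, and the toy table are all hypothetical and do not reflect the released data format or the authors' code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cell:
    """One table cell; `links` holds hypothetical keys of hyperlinked passages."""
    text: str
    links: List[str] = field(default_factory=list)


@dataclass
class HybridTable:
    """A toy stand-in for a table plus its linked passages (assumed layout)."""
    header: List[str]
    rows: List[List[Cell]]
    passages: Dict[str, str]  # passage key -> passage text


def table_to_passage_hop(table: HybridTable, where_col: str,
                         where_val: str, target_col: str) -> List[str]:
    """Hop 1: select a row by an exact match on a structured column.
    Hop 2: follow the hyperlinks of the target cell to its passages."""
    w = table.header.index(where_col)
    t = table.header.index(target_col)
    for row in table.rows:
        if row[w].text == where_val:
            return [table.passages[key] for key in row[t].links]
    return []  # no row matched the structured condition


# Invented toy data, not taken from the dataset.
toy = HybridTable(
    header=["Year", "Athlete"],
    rows=[
        [Cell("2001"), Cell("Ada Example", ["/wiki/Ada_Example"])],
        [Cell("2002"), Cell("Bo Sample", ["/wiki/Bo_Sample"])],
    ],
    passages={
        "/wiki/Ada_Example": "Ada Example is a fictional sprinter born in Springfield.",
        "/wiki/Bo_Sample": "Bo Sample is a fictional swimmer born in Shelbyville.",
    },
)

# Invented question: "Where was the athlete listed for 2002 born?"
evidence = table_to_passage_hop(toy, "Year", "2002", "Athlete")
print(evidence)
```

The table narrows the search to a single row, but the answer itself appears only in the linked passage, so neither form is sufficient on its own. A real system would still need to read the retrieved passage to extract the answer span.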

Download (Train/Test Data, Code)


All the code and data are available on GitHub. The leaderboard is hosted on CodaLab.

Reasoning Types


The questions require multiple hops between the two information forms. The most typical reasoning patterns are illustrated below:
[Image: typical reasoning patterns]

Paper


Please cite our paper as shown below if you use the HybridQA dataset.

@article{chen2020hybridqa,
  title={HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data},
  author={Chen, Wenhu and Zha, Hanwen and Chen, Zhiyu and Xiong, Wenhan and Wang, Hong and Wang, William},
  journal={Findings of EMNLP 2020},
  year={2020}
}
      

Contact


Have any questions or suggestions? Feel free to contact us!