This repository contains the code for reproducing the experiments in *How Much Knowledge Can You Pack Into the Parameters of a Language Model?*
To run this code, you first need to install the `t5` library (e.g., via `pip install t5`). General instructions for training, fine-tuning, evaluation, and exporting models for inference can be found in the `t5` repo.
To use the additional CBQA tasks provided in this library with the `t5_mesh_transformer` commands, run them from this directory and add the flag `--module_import="t5_cbqa.tasks"`. If you are using the `t5` API from an interactive shell or script, simply run `import t5_cbqa.tasks`, as in the sketch below.
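A minimal sketch of loading one of the registered tasks from Python, assuming a working `t5` installation and this repository on the Python path; the sequence lengths here are illustrative assumptions, not the values used in the paper:

```python
import t5
import t5_cbqa.tasks  # registers the CBQA tasks and mixtures with t5.data

# Look up a registered task by name and peek at a couple of examples.
# "natural_questions_open" is one of the fine-tuning tasks listed below.
task = t5.data.TaskRegistry.get("natural_questions_open")
ds = task.get_dataset(
    sequence_length={"inputs": 128, "targets": 32},  # illustrative lengths
    split="validation",
)
for example in ds.take(2):
    print(example)
```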
As an example, you can fine-tune on a mixture of all three CBQA tasks (Natural Questions, WebQuestions, and TriviaQA) with the T5.1.1-XXL + SSM model by running the command below from this directory. The tasks and mixtures for the remaining experiments are defined in `tasks.py`.
```sh
PROJECT=yourproject
ZONE=yourzone
BUCKET=gs://yourbucket
TPU=yourtpu
TPU_SIZE=v3-64
ctpu up --name=$TPU --project=$PROJECT --zone=$ZONE --tpu-size=$TPU_SIZE --tpu-only --noconf
TASK=closed_book_qa
PRETRAINED_DIR=gs://t5-data/pretrained_models/cbqa/t5.1.1.xxl_ssm
PRETRAINED_STEPS=1100000
FINETUNE_STEPS=10000
MODEL_DIR="${BUCKET}/${TASK}/xxl_ssm"
# Run fine-tuning
python -m t5.models.mesh_transformer_main \
--module_import="t5_cbqa.tasks" \
--tpu="${TPU}" \
--gcp_project="${PROJECT}" \
--tpu_zone="${ZONE}" \
--model_dir="${MODEL_DIR}" \
--gin_file="dataset.gin" \
--gin_file="${PRETRAINED_DIR}/operative_config.gin" \
--gin_param="utils.tpu_mesh_shape.tpu_topology = '${TPU_SIZE}'" \
--gin_param="MIXTURE_NAME = '${TASK}'" \
--gin_param="utils.run.save_checkpoints_steps=1000" \
--gin_param="utils.run.batch_size=('tokens_per_batch', 196608)" \
--gin_param="utils.run.train_steps=$((PRETRAINED_STEPS+FINETUNE_STEPS))" \
--gin_param="utils.run.init_checkpoint='${PRETRAINED_DIR}/model.ckpt-${PRETRAINED_STEPS}'" \
--gin_param="utils.run.learning_rate_schedule=@learning_rate_schedules.constant_learning_rate" \
--gin_param="constant_learning_rate.learning_rate=1e-3" \
--t5_tfds_data_dir="${BUCKET}/t5-tfds"
# Run eval
python -m t5.models.mesh_transformer_main \
--module_import="t5_cbqa.tasks" \
--tpu="${TPU}" \
--gcp_project="${PROJECT}" \
--tpu_zone="${ZONE}" \
--model_dir="${MODEL_DIR}" \
--gin_file="dataset.gin" \
--gin_file="${MODEL_DIR}/operative_config.gin" \
--gin_file="eval.gin" \
--gin_param="utils.tpu_mesh_shape.tpu_topology = '${TPU_SIZE}'" \
--gin_param="MIXTURE_NAME = '${TASK}'" \
--gin_param="utils.run.dataset_split = 'validation'" \
--gin_param="utils.run.batch_size = 128" \
--gin_param="utils.run.eval_checkpoint_step = 'all'" \
--t5_tfds_data_dir="${BUCKET}/t5-tfds"
```

To facilitate reproducibility and future work, we have released the model checkpoints for our largest (and best-performing) models, which are the most difficult to train.
Each was initialized with a pre-trained T5 checkpoint (available in the t5 repo) and pre-trained for an additional 100k steps with "salient span masking" (SSM) on the dataset of Wikipedia sentences from Guu et al., 2020.
The models fine-tuned on `*_open_test` tasks were fine-tuned for 10k steps on individual open-domain QA tasks using the full train splits (and also the validation split in the case of TriviaQA). The released checkpoint is from the final step of fine-tuning.
The models fine-tuned on `*_open` tasks were trained for 20k steps on ~90% of the train split. The released checkpoint is the one producing the best score on the held-out ~10% of the train split.
For more details on our training procedure, see our paper.
SSM models with no fine-tuning:
| Base Model | Path |
|---|---|
| T5-small | gs://t5-data/pretrained_models/cbqa/small_ssm |
| T5-large | gs://t5-data/pretrained_models/cbqa/large_ssm |
| T5-3B | gs://t5-data/pretrained_models/cbqa/3b_ssm |
| T5-11B | gs://t5-data/pretrained_models/cbqa/11b_ssm |
| T5.1.1-XXL | gs://t5-data/pretrained_models/cbqa/t5.1.1.xxl_ssm |
SSM models fine-tuned on Natural Questions:
| Base Model (+SSM) | Finetune Task | EM Score | Path |
|---|---|---|---|
| T5-small | natural_questions_open_test | 25.5 | gs://t5-data/pretrained_models/cbqa/small_ssm_nq |
| T5-large | natural_questions_open_test | 30.4 | gs://t5-data/pretrained_models/cbqa/large_ssm_nq |
| T5-3B | natural_questions_open_test | 33.2 | gs://t5-data/pretrained_models/cbqa/3b_ssm_nq |
| T5-11B | natural_questions_open_test | 36.6 | gs://t5-data/pretrained_models/cbqa/11b_ssm_nq |
| T5.1.1-XL | natural_questions_open_test | 35.6 | gs://t5-data/pretrained_models/cbqa/t5.1.1.xl_ssm_nq |
| T5.1.1-XXL | natural_questions_open_test | 37.9 | gs://t5-data/pretrained_models/cbqa/t5.1.1.xxl_ssm_nq |
| T5-large | natural_questions_open | 29.0 | gs://t5-data/pretrained_models/cbqa/large_ssm_nqo |
| T5-3B | natural_questions_open | 31.7 | gs://t5-data/pretrained_models/cbqa/3b_ssm_nqo |
| T5-11B | natural_questions_open | 34.8 | gs://t5-data/pretrained_models/cbqa/11b_ssm_nqo |
| T5.1.1-XXL | natural_questions_open | 35.2 | gs://t5-data/pretrained_models/cbqa/t5.1.1.xxl_ssm_nqo |
SSM models fine-tuned on WebQuestions:
| Base Model (+SSM) | Finetune Task | EM Score | Path |
|---|---|---|---|
| T5-11B | web_questions_open_test | 44.7 | gs://t5-data/pretrained_models/cbqa/11b_ssm_wq |
| T5.1.1-XXL | web_questions_open_test | 43.5 | gs://t5-data/pretrained_models/cbqa/t5.1.1.xxl_ssm_wq |
| T5-11B | web_questions_open | 40.8 | gs://t5-data/pretrained_models/cbqa/11b_ssm_wqo |
| T5.1.1-XXL | web_questions_open | 42.8 | gs://t5-data/pretrained_models/cbqa/t5.1.1.xxl_ssm_wqo |
SSM models fine-tuned on TriviaQA:
| Base Model (+SSM) | Finetune Task | EM Score | Path |
|---|---|---|---|
| T5-11B | trivia_qa_open_test | 60.5* | gs://t5-data/pretrained_models/cbqa/11b_ssm_tqa |
| T5.1.1-XXL | trivia_qa_open_test | 61.6* | gs://t5-data/pretrained_models/cbqa/t5.1.1.xxl_ssm_tqa |
| T5-11B | trivia_qa_open | 51.0 | gs://t5-data/pretrained_models/cbqa/11b_ssm_tqao |
| T5.1.1-XXL | trivia_qa_open | 51.9 | gs://t5-data/pretrained_models/cbqa/t5.1.1.xxl_ssm_tqao |
\* Score for the private TriviaQA Wikipedia domain test set.
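As a rough sketch of how one of these released checkpoints could be used for inference through the `t5` Python API (rather than the `t5_mesh_transformer` command shown above). The TPU name, topology, parallelism, batch size, sequence lengths, and file paths below are all illustrative assumptions; inputs should be one question per line, formatted the way the corresponding task expects:

```python
import t5
import t5_cbqa.tasks  # noqa: F401  # registers the CBQA tasks

# Illustrative configuration; adjust to your own TPU and paths.
model = t5.models.MtfModel(
    model_dir="gs://t5-data/pretrained_models/cbqa/small_ssm_nq",
    tpu="yourtpu",         # assumption: your TPU name or address
    tpu_topology="v3-8",   # assumption: a single v3-8 TPU
    model_parallelism=8,
    batch_size=8,
    sequence_length={"inputs": 128, "targets": 32},
)

# Writes one predicted answer per line of the input file.
model.predict(
    input_file="/tmp/questions.txt",  # hypothetical path, one question per line
    output_file="/tmp/answers.txt",   # hypothetical path
    checkpoint_steps=-1,              # use the latest checkpoint in model_dir
)
```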
If you extend or use this work, please cite the paper where it was introduced:
```bibtex
@inproceedings{2020t5cbqa,
  author = {Adam Roberts and Colin Raffel and Noam Shazeer},
  title = {How Much Knowledge Can You Pack Into the Parameters of a Language Model?},
  booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2020},
}
```