Official Repository for Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents (EMNLP 2024)

GELI: Global Explicit Local Implicit Reward Decomposition

Try the GELI-trained Llama model at HuggingFace!
HF Model Card

Paper · Pre-Trained Reward Functions · CANDOR Dataset

This is the official repository for Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents, presented at EMNLP 2024 (Oral).

The CANDOR dataset can be downloaded here after signing the required license forms. Our models and weights strictly follow the same terms and conditions as the CANDOR dataset: CANDOR Dataset Download

Our arXiv paper can be found here: Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents

Overview

This repo is divided into the following sections:

Setting up the Environment -- Installing the dependencies
CANDOR Dataset Download -- Downloading the CANDOR dataset, which our models were trained on
Reward Function Training -- Scripts to train the LI and GELI reward functions
Using GELI for Reinforcement Learning with Human Feedback (RLHF) -- Scripts to train Llama with GELI using RLHF
Accessing Pre-trained Reward Functions -- Downloading reward functions trained with LI and GELI (these are not required and can be trained from scratch, but we share the weights for convenience)

Setting up the Environment

Activate a Conda environment or virtualenv, then install all the dependencies from the requirements.txt file.

pip install -r requirements.txt

CANDOR Dataset Download

Our work relies on the CANDOR dataset because of its long conversations (31.3 minutes on average) and its large size (1,656 conversations, 7+ million words, 850 hours). The CANDOR dataset also includes video data. Download the dataset from HERE and read through the terms carefully. Then place the downloaded data into the ./candor folder and unzip as necessary. The data should be stored so that each pair has a unique id, for example ./candor/0a0cf5b9-84f6-4d8d-8001-ec7fd4b7437a
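
As a quick sanity check of the layout described above, the following minimal sketch lists the entries under ./candor whose names parse as a UUID. It assumes each conversation is stored as a directory named by its id; the function name list_conversation_dirs is hypothetical and not part of this repository.

```python
import uuid
from pathlib import Path


def list_conversation_dirs(root="./candor"):
    """Return sub-directories of `root` whose names parse as a UUID.

    Assumption: each CANDOR conversation lives in its own directory named
    by its unique id, e.g. ./candor/0a0cf5b9-84f6-4d8d-8001-ec7fd4b7437a.
    """
    found = []
    for entry in sorted(Path(root).iterdir()):
        if not entry.is_dir():
            continue  # skip zip archives and other loose files
        try:
            uuid.UUID(entry.name)
        except ValueError:
            continue  # directory name is not a UUID
        found.append(entry)
    return found


if __name__ == "__main__":
    dirs = list_conversation_dirs("./candor")
    print(f"Found {len(dirs)} conversation folders")
```

If the count is far below the 1,656 conversations mentioned above, some archives are probably still zipped or were extracted into a nested folder.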

GELI: Reward Function Training

We first train the reward function using only LI (local implicit). Here we use the facial affect classifier as a proxy reward.

python train_reward_LI_only.py --reward_class overall_affect --model convo --batch_size 32 --K 32 --train_size 500 --val --small_model_shrink

This will output a reward model that is saved to THIS folder, named after the wandb experiment. For example, our best wandb experiment name was devout-brook-114, so the reward model is saved as reward1_devout-brook-114. The language model is also saved, as lang_model_devout-brook-114.
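
The naming convention above can be sketched as a small helper that maps a wandb run name to the two saved artifacts. The helper name li_output_paths and the base_dir parameter are hypothetical; the actual output location is whatever the training script uses.

```python
from pathlib import Path


def li_output_paths(run_name, base_dir="."):
    """Build the artifact paths implied by the README's naming convention.

    Assumption: `base_dir` stands in for the output folder used by
    train_reward_LI_only.py, which is not spelled out here.
    """
    base = Path(base_dir)
    return {
        "reward_model": base / f"reward1_{run_name}",
        "language_model": base / f"lang_model_{run_name}",
    }


if __name__ == "__main__":
    for kind, path in li_output_paths("devout-brook-114").items():
        print(kind, "->", path)
```

The run name alone (devout-brook-114 here) is what gets passed to --affect_path in the next training stage.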

Then, we train the reward function using both LI and GE (global explicit). Here we use RRD as the global reward decomposition.

python train_reward_GELI.py --redist_type RRD --reward_class overall_affect --model convo --batch_size 1 --K 160 --train_size 100 --train --val --small_model_shrink --affect_path devout-brook-114

This will output a reward model that is saved to THIS folder, again named after the wandb experiment. For example, our best run is saved in the folder as reward1_eager-glade-5.
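
To give an intuition for the global decomposition step, here is a minimal sketch of a return-decomposition loss, assuming RRD refers to Randomized Return Decomposition: the conversation-level (global) reward is regressed onto a rescaled sum of per-turn rewards over a random subset of turns. The function rrd_loss, its arguments, and the toy reward function are illustrative assumptions, not the code in train_reward_GELI.py; whether --K plays the role of the subset size k is also an assumption.

```python
import random


def rrd_loss(turns, global_reward, reward_fn, k, rng):
    """Squared error between a global reward and a sampled estimate of it.

    turns:         sequence of per-turn inputs (e.g. utterances)
    global_reward: single scalar score for the whole conversation
    reward_fn:     callable mapping one turn to a scalar local reward
    k:             number of turns to sample (capped at len(turns))
    rng:           random.Random instance, for reproducible sampling
    """
    n = len(turns)
    if n == 0:
        raise ValueError("conversation has no turns")
    k = min(k, n)
    sampled = rng.sample(range(n), k)
    # Rescale the subset sum by n / k so it estimates the full sum.
    estimate = (n / k) * sum(reward_fn(turns[i]) for i in sampled)
    return (global_reward - estimate) ** 2


if __name__ == "__main__":
    toy_turns = ["hi", "how are you?", "great, thanks!"]
    loss = rrd_loss(toy_turns, 5.0, lambda t: len(t) / 10, k=2,
                    rng=random.Random(0))
    print(loss)
```

In practice reward_fn would be a neural network and the loss would be minimized by gradient descent; the sketch only shows how the subset sum is rescaled to match the global reward.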

Using GELI for RLHF

Finally, we can run RLHF with GELI using the following command.

python train_rl_GELI.py --model_name llama --exp_name GELI --reward redistributed_reward --rf_model GELI --train --val --batch_size 24

We use PEFT in our implementation, so this will save the adapter weights in ./rlmf_weights.
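
To confirm that training produced adapter weights, the sketch below searches ./rlmf_weights for directories containing adapter_config.json, the config file that PEFT writes when an adapter is saved. The function name find_adapter_dirs is hypothetical, and the exact sub-folder layout produced by train_rl_GELI.py is an assumption.

```python
from pathlib import Path


def find_adapter_dirs(root="./rlmf_weights"):
    """Return directories under `root` that contain an adapter_config.json.

    Assumption: adapters are saved with PEFT's save_pretrained, which
    writes adapter_config.json next to the adapter weight file.
    """
    root_path = Path(root)
    return sorted(cfg.parent for cfg in root_path.rglob("adapter_config.json"))


if __name__ == "__main__":
    for adapter_dir in find_adapter_dirs():
        print(adapter_dir)
```

Any directory printed here can typically be passed to PEFT's PeftModel.from_pretrained together with the matching base Llama model.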

Accessing Pre-trained Reward Functions

We share the pre-trained reward functions here.

LI-Only: Drive
GELI: Folder

If you would like to skip training them yourself, place them in the respective folders.

License

Unless noted otherwise, we strictly follow the terms released with the CANDOR Dataset.

All other contents of this repository are released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Contact

Dong Won Lee - X:@_dongwonlee - Email: dongwonl@mit.edu


Acknowledgments

We gratefully acknowledge the creators and researchers of the following works, who were huge enablers in bringing this project to fruition.
