This repository contains the implementation of B-cos LM, as introduced in the paper "B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability".
B-cos LM transforms pre-trained language models to improve explainability while maintaining task performance. Our implementation provides:
- B-cos versions of BERT, DistilBERT, RoBERTa, GPT-2 and Llama models
- Support for training B-cos and conventional models
- Evaluation of B-cos and various post-hoc explanation methods
The core implementations are in:
- `bcos_lm/models/` – Contains B-cos model architectures
- `bcos_lm/modules/` – Contains essential components for B-cos adaptation
B-cos adaptations in the code are marked with `## bcos` comments for clarity.
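For intuition, here is a minimal sketch of a B-cos linear layer following the published B-cos formulation; the actual modules in `bcos_lm/modules/` may differ in detail:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BcosLinear(nn.Module):
    """Minimal B-cos linear layer sketch (not the repository's exact code).

    Computes w_hat^T x rescaled by |cos(x, w_hat)|^(B-1), so outputs are
    down-weighted when the input is poorly aligned with the weight vector.
    """
    def __init__(self, in_features, out_features, b=2.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)
        self.b = b

    def forward(self, x):
        w_hat = F.normalize(self.weight, dim=-1)   ## bcos: unit-norm weight rows
        out = F.linear(x, w_hat)                   # w_hat^T x = ||x|| * cos(x, w_hat)
        cos = out / (x.norm(dim=-1, keepdim=True) + 1e-6)
        return out * cos.abs().pow(self.b - 1)     ## bcos: |cos|^(B-1) scaling
```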
Our code requires `transformers==4.45.2`.
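For example:

```bash
pip install transformers==4.45.2
```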
To train a B-cos LM, run:

```bash
bash train_bcos_models.sh
```

You can specify:
- Model (e.g., BERT, DistilBERT, RoBERTa)
- Dataset
- Hyperparameters
Modify `train_bcos_models.sh` to customize these settings.
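For instance, the script may expose settings along these lines (the variable names and values below are illustrative, not the script's actual contents):

```bash
# Hypothetical excerpt of train_bcos_models.sh; check the script itself
# for the actual variable names and flags.
MODEL=bert-base-uncased   # or distilbert-base-uncased, roberta-base, ...
DATASET=ag_news           # downstream classification dataset
LR=2e-5                   # learning rate
EPOCHS=3                  # number of training epochs
```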
To generate explanations using B-cos and other explanation methods, run:

```bash
bash generate_explanations.sh
```

You can specify which explanation methods to use in the script.
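Conceptually, a B-cos LM computes its output as an input-dependent linear transform of the token embeddings, so token contributions can be read off with an input-times-gradient summary. Below is a minimal sketch under that assumption (the helper name is hypothetical, and the repository's own explanation code may differ):

```python
import torch

def bcos_token_attributions(model, embeddings, target_class):
    """Sketch: per-token contributions to a target logit, computed as
    input * gradient summed over embedding dimensions. Assumes `model`
    accepts inputs_embeds like a Hugging Face classifier."""
    embeddings = embeddings.detach().requires_grad_(True)
    logits = model(inputs_embeds=embeddings).logits
    logits[:, target_class].sum().backward()
    return (embeddings.grad * embeddings).sum(-1)  # shape: (batch, seq_len)
```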
To evaluate explanations using perturbation-based methods, run:

```bash
bash run_perturbation_evaluation.sh
```

To evaluate using the sequence pointing game (SeqPG):

- Generate SeqPG examples using conventional models:

  ```bash
  bash create_pointing_game_examples.sh
  ```

- Evaluate using SeqPG:

  ```bash
  bash run_pointing_game_evaluation.sh
  ```
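For reference, perturbation-based evaluation typically masks the most highly attributed tokens and measures the drop in the predicted class probability. A minimal sketch of one such metric (illustrative only, not the repository's implementation; `mask_id` is assumed to be the tokenizer's mask token, e.g. 103 for BERT):

```python
import torch

def comprehensiveness(model, input_ids, attributions, k=0.2, mask_id=103):
    """Mask the top-k fraction of tokens by attribution score and report
    how much the predicted class probability drops; larger drops suggest
    more faithful explanations."""
    with torch.no_grad():
        probs = model(input_ids).logits.softmax(-1)
        pred = probs.argmax(-1, keepdim=True)
        n_mask = max(1, int(k * input_ids.size(1)))
        top = attributions.topk(n_mask, dim=-1).indices
        masked = input_ids.clone()
        masked.scatter_(1, top, mask_id)
        masked_probs = model(masked).logits.softmax(-1)
        return (probs - masked_probs).gather(-1, pred)
```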
Decoder-only model experiments (GPT-2 and Llama) can be run by executing `decoder_only_model_experiments.sh` in the `decoder_experiments` directory:
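```bash
cd decoder_experiments
bash decoder_only_model_experiments.sh
```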