B-cos LM

This repository contains the implementation of B-cos LM, as introduced in the paper "B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability".

Overview

B-cos LM transforms pre-trained language models into more interpretable ones while maintaining task performance. Our implementation provides:

  • B-cos versions of BERT, DistilBERT, RoBERTa, GPT-2 and Llama models
  • Support for training B-cos and conventional models
  • Evaluation of B-cos explanations and various post-hoc explanation methods

The core implementations are in:

  • bcos_lm/models/ – Contains B-cos model architectures
  • bcos_lm/modules/ – Contains essential components for B-cos adaptation

B-cos adaptations in the code are marked with ## bcos for clarity.
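
To give an intuition for the core idea, below is a minimal, hypothetical sketch of a B-cos linear transform in PyTorch: the output of a bias-free linear map is rescaled by the alignment (cosine similarity) between the input and the weights, raised to the power B−1. This follows the B-cos transform of Böhle et al.; the actual layers in bcos_lm/modules/ may differ in detail.

```python
import torch.nn as nn

class BcosLinear(nn.Module):
    """Minimal sketch of a B-cos linear transform (illustrative only;
    the real implementation in bcos_lm/modules/ may differ)."""

    def __init__(self, in_features, out_features, b=2.0):
        super().__init__()
        # B-cos layers are bias-free so the model stays dynamically linear in its input
        self.linear = nn.Linear(in_features, out_features, bias=False)
        self.b = b

    def forward(self, x):
        w = self.linear.weight                        # (out_features, in_features)
        out = self.linear(x)                          # w^T x
        # cosine similarity between the input and each weight row
        norms = x.norm(dim=-1, keepdim=True) * w.norm(dim=-1) + 1e-6
        cos = out / norms
        # scale the linear output by |cos|^(B-1); for B=2 this is |cos(x, w)| * (w^T x),
        # which increasingly rewards weight-input alignment as B grows
        return out * cos.abs().pow(self.b - 1)
```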

Environment

Our code requires transformers==4.45.2 (e.g., pip install transformers==4.45.2).

Getting Started

1. Training B-cos LM

To train a B-cos LM model, run:

bash train_bcos_models.sh

Modify train_bcos_models.sh to specify:

  • the model (e.g., BERT, DistilBERT, RoBERTa)
  • the dataset
  • training hyperparameters
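
For intuition, the sketch below shows one hypothetical way such a conversion could proceed, reusing the BcosLinear sketch from above: the pre-trained model's linear layers are swapped for bias-free B-cos layers initialized from the pre-trained weights, after which the model is fine-tuned as usual. The function and checkpoint names here are illustrative assumptions, not the repository's API.

```python
import torch.nn as nn
from transformers import AutoModelForSequenceClassification

def bcosify(module, b=2.0):
    """Recursively replace nn.Linear layers with BcosLinear layers
    (see the sketch above), reusing the pre-trained weights."""
    for name, child in list(module.named_children()):
        if isinstance(child, nn.Linear):
            bcos = BcosLinear(child.in_features, child.out_features, b=b)
            bcos.linear.weight.data.copy_(child.weight.data)  # bias terms are dropped
            setattr(module, name, bcos)
        else:
            bcosify(child, b=b)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
bcosify(model)  # then fine-tune on the downstream task as usual
```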

2. Generating Explanations

To generate explanations using B-cos and other explanation methods, run:

bash generate_explanations.sh

You can specify which explanation methods to use by modifying generate_explanations.sh.
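
Because a B-cos model is dynamically linear in its input, its explanation can be read off as input-times-gradient with respect to the input embeddings. The snippet below is a minimal sketch of that idea with a hypothetical function name; in practice the dynamic scaling factors are typically detached during the backward pass (an "explanation mode"), a detail this sketch omits.

```python
import torch

def bcos_explain(model, input_ids, target_class):
    """Sketch: per-token contributions to the target logit via input x gradient.
    For a dynamically linear B-cos model this recovers each embedding's
    contribution (modulo detaching the dynamic scaling, omitted here)."""
    embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)
    logits = model(inputs_embeds=embeds).logits
    logits[0, target_class].backward()
    # sum grad * input over the embedding dimension -> one score per token
    return (embeds.grad * embeds).sum(dim=-1)
```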

3. Perturbation-based Evaluation

To evaluate the faithfulness of explanations with perturbation-based metrics, run:

bash run_perturbation_evaluation.sh
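
As a rough illustration of what such an evaluation measures, the sketch below masks the k tokens ranked most important by an attribution and records the drop in the target-class probability; faithful explanations should produce steep drops. The exact protocol (replacement token, values of k, aggregation) is an assumption here and may differ from the repository's.

```python
import torch

def perturbation_curve(model, input_ids, attributions, target_class, mask_id,
                       ks=(1, 2, 5, 10)):
    """Sketch: mask the top-k most important tokens and track the
    target-class probability; faithful attributions drop it quickly."""
    order = attributions[0].argsort(descending=True)
    probs = []
    for k in ks:
        perturbed = input_ids.clone()
        perturbed[0, order[:k]] = mask_id  # e.g., the tokenizer's [MASK] id
        with torch.no_grad():
            p = model(perturbed).logits.softmax(dim=-1)[0, target_class]
        probs.append(p.item())
    return probs
```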

4. Sequence Pointing Game (SeqPG) Evaluation

  1. Generate SeqPG examples using conventional models:
    bash create_pointing_game_examples.sh
  2. Evaluate using SeqPG:
    bash run_pointing_game_evaluation.sh
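
For intuition, pointing-game-style metrics reward attributions that concentrate on the part of the input responsible for the prediction. The sketch below assumes a setup where sequences with different labels are concatenated and we measure the share of positive attribution that falls on the segment matching the target class; these details are assumptions and may differ from the repository's SeqPG implementation.

```python
import torch

def seqpg_score(attributions, segment_mask):
    """Sketch: fraction of positive attribution mass that lands on the
    target segment (segment_mask is 1 on that segment's tokens, else 0)."""
    pos = attributions.clamp(min=0)
    return ((pos * segment_mask).sum() / (pos.sum() + 1e-9)).item()
```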

Decoder-only Models

Decoder-only model experiments (GPT-2 and Llama) can be run by executing decoder_only_model_experiments.sh in the decoder_experiments directory.
