🩺 Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models

Overview

We propose a framework that decodes specific information from a representation within an LLM by “patching” it into the inference pass on a different prompt that has been designed to encourage the extraction of that information. A "Patchscope" is a configuration of our framework that can be viewed as an inspection tool geared towards a particular objective.
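The core operation can be sketched in a few lines. Below is a minimal, illustrative sketch using NumPy arrays as stand-in hidden states; the function name, array shapes, and the default identity mapping `f` are assumptions for illustration and are not the repository's actual API:

```python
import numpy as np

def patchscope_patch(target_hidden, source_hidden, i, i_star, f=lambda h: h):
    """Sketch of a Patchscope patching step.

    target_hidden: hidden states of the target prompt at some layer,
                   shape (target_len, d_model).
    source_hidden: hidden states of the source prompt at some layer,
                   shape (source_len, d_model).
    i:       source position whose representation we want to inspect.
    i_star:  target position whose representation is overwritten.
    f:       optional mapping applied to the source representation
             (identity in the simplest configuration).

    Returns a copy of target_hidden with position i_star replaced by
    f(source_hidden[i]); the target's forward pass then continues from
    the patched states.
    """
    patched = target_hidden.copy()
    patched[i_star] = f(source_hidden[i])
    return patched
```

In an actual run, the hidden states would come from a transformer's forward pass (e.g., via forward hooks), and inference on the target prompt would resume from the patched layer.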

For example, this figure shows a simple Patchscope for decoding what is encoded in the representation of "CEO" in the source prompt (left). The representation is patched into a target prompt (right) consisting of few-shot demonstrations of token repetition, which encourage decoding the token identity from a hidden representation.
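As a concrete illustration, a token-repetition target prompt of this kind can be built programmatically. The exact separator and placeholder below are illustrative assumptions, not the prompt format used in the repository:

```python
def make_identity_prompt(demos, placeholder="x"):
    """Build a few-shot token-repetition target prompt, e.g.
    'cat -> cat ; 1135 -> 1135 ; hello -> hello ; x'.

    The final placeholder position is where the source representation
    is patched in; the demonstrations encourage the model to simply
    repeat (i.e., decode) the patched token identity.
    """
    pairs = " ; ".join(f"{t} -> {t}" for t in demos)
    return f"{pairs} ; {placeholder}"
```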

[Paper] [Project Website]


💾 Download textual data

The download script is provided here. Run it with:

```shell
python3 download_the_pile_text_data.py
```

🦙 For using Vicuna-13B

Run the following command to apply the delta weights for the Vicuna-13B model (see also details here):

```shell
python3 apply_delta.py --base meta-llama/Llama-2-13b-hf --target ./stable-vicuna-13b --delta CarperAI/stable-vicuna-13b-delta
```

🧪 Experiments

(1) Next Token Prediction

The main code used appears here.

(2) Attribute Extraction

For this experiment, you should download the preprocessed_data directory. The main code used appears here.

(3) Entity Processing

The main code used appears here. The dataset is available for downloading here.

(4) Cross-model Patching

The main code used appears here.

(5) Self-Correction in Multi-Hop Reasoning

For this experiment, you should download the preprocessed_data directory. The main code used appears here. The code provided supports the Vicuna-13B model.

📙 BibTeX

```bibtex
@misc{ghandeharioun2024patchscopes,
      title={Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models},
      author={Ghandeharioun, Asma and Caciularu, Avi and Pearce, Adam and Dixon, Lucas and Geva, Mor},
      year={2024},
      eprint={2401.06102},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```