🩺 Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
We propose a framework that decodes specific information from a representation within an LLM by “patching” it into the inference pass on a different prompt that has been designed to encourage the extraction of that information. A “Patchscope” is a configuration of our framework that can be viewed as an inspection tool geared towards a particular objective.
For example, this figure shows a simple Patchscope for decoding what is encoded in the representation of "CEO" in the source prompt (left). We patch that representation into a target prompt (right) composed of few-shot demonstrations of token repetition, which encourages decoding the token identity from a hidden representation.
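The patching operation itself is simple: take the hidden state at a chosen (layer, position) in the source run, and substitute it at a chosen (layer, position) in the target run before continuing the forward pass. Below is a minimal sketch of that logic using a toy stand-in for a transformer; the actual implementation in this repo hooks into a real LLM's forward pass, and all names and values here are illustrative only.

```python
# Minimal sketch of the patching logic. Each "layer" here is a toy
# per-position transform standing in for a transformer block; in
# practice this is done with forward hooks on a real LLM.

def run_layers(layers, hidden, patch=None):
    """Run per-position hidden states through layers.

    patch: optional (layer_idx, pos, vector) -- before computing
    layer `layer_idx`, overwrite position `pos` with `vector`.
    Returns the list of hidden states at every layer boundary.
    """
    states = [list(hidden)]
    for i, layer in enumerate(layers):
        if patch is not None and patch[0] == i:
            _, pos, vec = patch
            hidden = list(hidden)
            hidden[pos] = vec  # the core Patchscope step
        hidden = [layer(h) for h in hidden]
        states.append(list(hidden))
    return states

# Toy two-layer "model" (scalar hidden states for readability).
layers = [lambda h: h * 2.0, lambda h: h + 1.0]

# Source run over a 3-token source prompt; grab the representation
# of the last token ("CEO" in the figure) after layer 0.
source_states = run_layers(layers, [0.5, 1.0, 3.0])
source_vec = source_states[1][2]

# Target run over a 2-token target prompt, patching that vector
# into position 1 at the input to layer 1.
target_states = run_layers(layers, [1.0, 2.0], patch=(1, 1, source_vec))
```

The same source layer/position and target layer/position are free parameters; choosing them, together with the target prompt, is exactly what defines a particular Patchscope configuration.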
The script is provided here. Use the following command to run it:

```
python3 download_the_pile_text_data.py
```

To use the Vicuna-13B model, run the following command (see also details here):

```
python3 apply_delta.py --base meta-llama/Llama-2-13b-hf --target ./stable-vicuna-13b --delta CarperAI/stable-vicuna-13b-delta
```

The main code used appears here.
For this experiment, you should download the preprocessed_data directory.
The main code used appears here.
The main code used appears here. The dataset is available for downloading here.
The main code used appears here. The code provided supports the Vicuna-13B model.
```bibtex
@misc{ghandeharioun2024patchscopes,
  title={Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models},
  author={Ghandeharioun, Asma and Caciularu, Avi and Pearce, Adam and Dixon, Lucas and Geva, Mor},
  year={2024},
  eprint={2401.06102},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```