
[EMNLP 2025] Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills


OPTML-Group/Unlearn-R2MU




How to run the code?

Install the conda environment

Install the required dependencies by following the installation instructions in SOUL.

Run the unlearning step

bash run.sh

The command in run.sh looks like this:

# Put your own lm-evaluation-harness path here
export PYTHONPATH=lm-evaluation-harness:$PYTHONPATH

ALPHA="1.4,1.4"
LR="7.5e-5"
DATA_NUM="500" # Number of unlearning batches (passed to --max_num_batches)
NAME="reasoning_assistant"
assist_loss="1"

MODEL_NAME="deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
OUTPUT_NAME="alpha${ALPHA//,/x}_lr${LR}_wmdp_${DATA_NUM}_${NAME}_assist_loss_${assist_loss}"
OUTPUT_DIR="models/${OUTPUT_NAME}"
LOG_FILE="${OUTPUT_NAME}.log"

# --generated_path: reasoning traces generated with your original model
# --raw_path: the WMDP bio dataset
# (Comments cannot follow a trailing backslash, so they are kept up here.)
CUDA_VISIBLE_DEVICES=0,1 python3 -m unlearn_wmdp \
  --model_name_or_path "${MODEL_NAME}" \
  --max_num_batches "${DATA_NUM}" \
  --batch_size 4 \
  --retain_corpora wikitext \
  --forget_corpora original \
  --steering_coeffs 6.5,6.5 \
  --alpha "${ALPHA}" \
  --lr "${LR}" \
  --assist_loss "${assist_loss}" \
  --seed 42 \
  --output_dir "${OUTPUT_DIR}" \
  --generated_path ./generated_all_wmdp.jsonl \
  --raw_path ./bio_remove_dataset.jsonl \
  --max_gen_tokens 100 \
  --verbose
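
To predict where a run will write its model and log, the naming logic in run.sh can be mirrored in a few lines of Python. This is only an illustrative sketch: the function output_name is a hypothetical helper, not part of the repository, and it simply reproduces the OUTPUT_NAME, OUTPUT_DIR, and LOG_FILE variables shown above.

```python
def output_name(alpha: str, lr: str, data_num: str, name: str, assist_loss: str) -> str:
    """Mirror the OUTPUT_NAME string that run.sh builds (hypothetical helper)."""
    # Bash's ${ALPHA//,/x} replaces every comma in ALPHA with "x".
    alpha_tag = alpha.replace(",", "x")
    return f"alpha{alpha_tag}_lr{lr}_wmdp_{data_num}_{name}_assist_loss_{assist_loss}"


# Values copied from the run.sh excerpt above.
name = output_name("1.4,1.4", "7.5e-5", "500", "reasoning_assistant", "1")
output_dir = f"models/{name}"   # same as OUTPUT_DIR
log_file = f"{name}.log"        # same as LOG_FILE

print(output_dir)
print(log_file)
```

With the default values, the model directory resolves to models/alpha1.4x1.4_lr7.5e-5_wmdp_500_reasoning_assistant_assist_loss_1.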

LLM API Evaluation

Once you have the unlearned model, generate its reasoning traces before scoring them.

First, register your model in utils.py by adding an entry like this:

    "RMU_unlearn_test_11_2_2025": {
        "model_name": "", # Add your own model path.
        "tokenizer_name": "", # Add your own model path.
        "special_token_id": 128014
    },
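
Because the template entry ships with empty paths, it can help to sanity-check an entry before launching a multi-GPU job. The sketch below is an assumption-laden illustration: entry_problems is a hypothetical helper (it does not exist in utils.py), the example path is made up, and the only thing taken from the repository is the three keys shown in the entry above.

```python
from __future__ import annotations

# Keys taken from the example entry above.
REQUIRED_KEYS = ("model_name", "tokenizer_name", "special_token_id")


def entry_problems(entry: dict) -> list[str]:
    """Return human-readable problems found in one model entry (hypothetical helper)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in entry:
            problems.append(f"missing key: {key}")
    # The template leaves both paths as empty strings; flag them if still empty.
    for key in ("model_name", "tokenizer_name"):
        if key in entry and not str(entry[key]).strip():
            problems.append(f"empty path: {key}")
    if "special_token_id" in entry:
        token_id = entry["special_token_id"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(token_id, bool) or not isinstance(token_id, int):
            problems.append("special_token_id must be an integer")
    return problems


template = {"model_name": "", "tokenizer_name": "", "special_token_id": 128014}
print(entry_problems(template))

# "models/my_unlearned_model" is a placeholder path, not a real checkpoint.
filled = dict(template, model_name="models/my_unlearned_model",
              tokenizer_name="models/my_unlearned_model")
print(entry_problems(filled))
```

The unfilled template reports both empty paths, while the filled entry reports no problems.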

Second, run generation:

bash ./evaluate/run.sh

Set --max_samples to 100000 to run the full WMDP evaluation, and set --model_choice to the model name you added in utils.py. The command in evaluate/run.sh looks like this:

CUDA_VISIBLE_DEVICES=0,1,2,3,4 torchrun --nproc_per_node=5 evaluate_claude_save.py --mode Reason_think --datasets wmdp --model_choice RMU_unlearn_test_11_2_2025 --wmdp_subject wmdp-bio --batch_size 4 --max_samples 10
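
The command above launches five processes on five visible GPUs. If you have a different number of GPUs, both CUDA_VISIBLE_DEVICES and --nproc_per_node need to change together. The following sketch assembles the same command from a GPU list so the two stay in sync; build_eval_command is a hypothetical helper, and it uses only the flags that appear in the command above.

```python
import shlex


def build_eval_command(gpu_ids, model_choice, max_samples=10, batch_size=4):
    """Assemble the generation command with one process per listed GPU (hypothetical helper)."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    visible = ",".join(str(gpu) for gpu in gpu_ids)
    args = [
        "torchrun",
        f"--nproc_per_node={len(gpu_ids)}",  # keep in step with the GPU list
        "evaluate_claude_save.py",
        "--mode", "Reason_think",
        "--datasets", "wmdp",
        "--model_choice", model_choice,
        "--wmdp_subject", "wmdp-bio",
        "--batch_size", str(batch_size),
        "--max_samples", str(max_samples),
    ]
    # shlex.quote leaves simple tokens untouched and quotes anything unsafe.
    return f"CUDA_VISIBLE_DEVICES={visible} " + " ".join(shlex.quote(a) for a in args)


print(build_eval_command([0, 1, 2, 3, 4], "RMU_unlearn_test_11_2_2025"))
print(build_eval_command([0, 1], "RMU_unlearn_test_11_2_2025", max_samples=100000))
```

The first call reproduces the command shown above; the second is the two-GPU, full-evaluation variant.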

Then set your API key and the input_path variable in api_check_reasoning_trace_score_4.py, and run:

python ./evaluate/api_check_reasoning_trace_score_4.py
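
If you would rather not hard-code the API key in the script, one common alternative is to read it from an environment variable. This is a suggestion rather than how the repository works today: load_api_key and the variable name LLM_API_KEY are both assumptions, and you would need to wire the returned value into the scoring script yourself.

```python
import os


def load_api_key(env_var: str = "LLM_API_KEY") -> str:
    """Read the API key from the environment (hypothetical helper and variable name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key to the API.
        raise RuntimeError(f"Set the {env_var} environment variable before running the scoring script")
    return key
```

With this in place, the key can be exported in the shell session before running the scoring command and never needs to be committed to the repository.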

Cite this work

@article{wang2025reasoning,
  title={Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills},
  author={Wang, Changsheng and Fan, Chongyu and Zhang, Yihua and Jia, Jinghan and Wei, Dennis and Ram, Parikshit and Baracaldo, Nathalie and Liu, Sijia},
  journal={arXiv preprint arXiv:2506.12963},
  year={2025}
}

For any problems with the code, please contact [email protected] directly.
