Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills

How to run the code?

Install the conda environment

Install the required dependencies by following the installation instructions in SOUL.

Run the unlearning step

bash run.sh
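Before launching, it may help to confirm that the two JSONL inputs that run.sh passes as --generated_path and --raw_path exist and parse cleanly, since a bad file would otherwise only fail after the model has loaded. The function below is a hypothetical preflight helper (check_jsonl_inputs is not part of the repository) and assumes python3 is on your PATH:

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper, not part of the repository.
# Verifies that each input file exists and that every non-empty line is valid JSON.
check_jsonl_inputs() {
  local path
  for path in "$@"; do
    if [ ! -f "$path" ]; then
      echo "missing input: $path" >&2
      return 1
    fi
    # python3 exits non-zero on the first line that fails to parse as JSON
    if ! python3 -c 'import json, sys; [json.loads(l) for l in open(sys.argv[1], encoding="utf-8") if l.strip()]' "$path" 2>/dev/null; then
      echo "malformed JSONL: $path" >&2
      return 1
    fi
  done
  echo "all inputs look OK"
}
```

For example, `check_jsonl_inputs ./generated_all_wmdp.jsonl ./bio_remove_dataset.jsonl` should print `all inputs look OK` when both files are in place.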

The command in run.sh looks like this:

# Put your own lm-evaluation-harness path here
export PYTHONPATH="lm-evaluation-harness:${PYTHONPATH}"

ALPHA="1.4,1.4"
LR="7.5e-5"
DATA_NUM="500" # Number of unlearning batches (passed to --max_num_batches)
NAME="reasoning_assistant"
assist_loss="1"

MODEL_NAME="deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
# ${ALPHA//,/x} replaces every comma with "x", e.g. "1.4,1.4" -> "1.4x1.4"
OUTPUT_NAME="alpha${ALPHA//,/x}_lr${LR}_wmdp_${DATA_NUM}_${NAME}_assist_loss_${assist_loss}"
OUTPUT_DIR="models/${OUTPUT_NAME}"
LOG_FILE="${OUTPUT_NAME}.log"

# Input files:
#   --generated_path  reasoning traces generated with your original model
#   --raw_path        the WMDP bio dataset
CUDA_VISIBLE_DEVICES=0,1 python3 -m unlearn_wmdp \
  --model_name_or_path "${MODEL_NAME}" \
  --max_num_batches "${DATA_NUM}" \
  --batch_size 4 \
  --retain_corpora wikitext \
  --forget_corpora original \
  --steering_coeffs 6.5,6.5 \
  --alpha "${ALPHA}" \
  --lr "${LR}" \
  --assist_loss "${assist_loss}" \
  --seed 42 \
  --output_dir "${OUTPUT_DIR}" \
  --generated_path ./generated_all_wmdp.jsonl \
  --raw_path ./bio_remove_dataset.jsonl \
  --max_gen_tokens 100 \
  --verbose
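To locate the model directory and log file for a given configuration, the naming rule from run.sh can be reproduced on its own. This is a small illustrative helper (the function name make_output_name is hypothetical, not part of the repository) that mirrors the OUTPUT_NAME line above:

```bash
#!/usr/bin/env bash
# Hypothetical helper that rebuilds OUTPUT_NAME exactly as run.sh does.
# Arguments: ALPHA LR DATA_NUM NAME assist_loss
make_output_name() {
  local alpha="$1" lr="$2" data_num="$3" name="$4" assist_loss="$5"
  # ${alpha//,/x} swaps every comma for "x": "1.4,1.4" -> "1.4x1.4"
  echo "alpha${alpha//,/x}_lr${lr}_wmdp_${data_num}_${name}_assist_loss_${assist_loss}"
}
```

With the default settings, `make_output_name 1.4,1.4 7.5e-5 500 reasoning_assistant 1` prints `alpha1.4x1.4_lr7.5e-5_wmdp_500_reasoning_assistant_assist_loss_1`, the same name as the log file shipped with the repository; OUTPUT_DIR is then `models/` followed by that name.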

LLM API Evaluation

Once you have the unlearned model, first generate its reasoning traces.

First, register your model in utils.py by adding an entry like this:

    "RMU_unlearn_test_11_2_2025": {
        "model_name": "", # Add your own model path.
        "tokenizer_name": "", # Add your own model path.
        "special_token_id": 128014
    },
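The key you add here (RMU_unlearn_test_11_2_2025 in this example) is the value you later pass as --model_choice, so a mismatch would likely only surface once evaluation starts. As a quick sanity check, a hypothetical helper like the one below (check_model_registered is not part of the repository) searches utils.py for the quoted key; it is a plain text search rather than a Python import, so it cannot validate the entry's fields:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if utils.py contains the quoted key
# followed by a colon, i.e. "MODEL_CHOICE": as written in the entry above.
check_model_registered() {
  local utils_file="$1" model_choice="$2"
  # -F: fixed-string match, so underscores and dots are taken literally
  if grep -qF "\"${model_choice}\":" "$utils_file"; then
    echo "found ${model_choice} in ${utils_file}"
  else
    echo "${model_choice} is not registered in ${utils_file}" >&2
    return 1
  fi
}
```

For example, `check_model_registered utils.py RMU_unlearn_test_11_2_2025` should report the key as found after you add the entry.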

Second, run the generation script:

bash ./evaluate/run.sh

The command in evaluate/run.sh looks like this:

Set --max_samples to 100000 to run the full WMDP evaluation, and set --model_choice to the key you added in utils.py.

CUDA_VISIBLE_DEVICES=0,1,2,3,4 torchrun --nproc_per_node=5 evaluate_claude_save.py --mode Reason_think --datasets wmdp --model_choice RMU_unlearn_test_11_2_2025 --wmdp_subject wmdp-bio --batch_size 4 --max_samples 10
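The two settings described above (--model_choice and --max_samples) can be parameterized instead of hand-editing the command each time. The sketch below is a hypothetical helper (build_eval_cmd is not part of the repository) that prints the same command; it assumes one torchrun process per visible GPU, as in the five-GPU example, and derives --nproc_per_node from the CUDA_VISIBLE_DEVICES list:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper that prints the evaluation command from evaluate/run.sh.
# Arguments: MODEL_CHOICE [MAX_SAMPLES, default 10] [GPU list, default 0,1,2,3,4]
# Assumption: one torchrun process per visible GPU.
build_eval_cmd() {
  local model_choice="$1" max_samples="${2:-10}" gpus="${3:-0,1,2,3,4}"
  local nproc
  # Count comma-separated device IDs, e.g. "0,1,2,3,4" -> 5
  nproc=$(awk -F',' '{print NF}' <<< "$gpus")
  echo "CUDA_VISIBLE_DEVICES=${gpus} torchrun --nproc_per_node=${nproc}" \
       "evaluate_claude_save.py --mode Reason_think --datasets wmdp" \
       "--model_choice ${model_choice} --wmdp_subject wmdp-bio" \
       "--batch_size 4 --max_samples ${max_samples}"
}
```

Given only a model key, it reproduces the example command above. For a full run on two GPUs, `build_eval_cmd RMU_unlearn_test_11_2_2025 100000 0,1` prints a command you can inspect and then pipe to `bash` from the same location the command in evaluate/run.sh runs from.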

Finally, set your API key in api_check_reasoning_trace_score_4.py, point the input_path variable in the same file to your generation output, and then run:

python ./evaluate/api_check_reasoning_trace_score_4.py

Cite this work

@article{wang2025reasoning,
  title={Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills},
  author={Wang, Changsheng and Fan, Chongyu and Zhang, Yihua and Jia, Jinghan and Wei, Dennis and Ram, Parikshit and Baracaldo, Nathalie and Liu, Sijia},
  journal={arXiv preprint arXiv:2506.12963},
  year={2025}
}

For any problems with the code, please contact [email protected] directly.

Repository: OPTML-Group/Unlearn-R2MU

Folders and files

Latest commit

History

Repository files navigation

Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills

How to run the code?

Install the conda enviroment

Run the Unlearn part

LLM API Evaluation

Cite this work

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages