Skip to content

fatemehpesaran310/weasel

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 

Repository files navigation

WEASEL

Official code for the ICML 2026 paper WEASEL: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection.

[Paper] [Project Page]

WEASEL selects compact, goal-relevant, and diverse web-agent trajectory steps to improve out-of-domain generalization while reducing training cost.

WEASEL overview

This repository currently contains the cleaned data-selection pipeline:

  1. Prune AXTree states.
  2. Compute goal-relevance and pairwise distance scores.
  3. Run the WEASEL greedy subset-selection objective.
  4. Build the final training subset, including length filtering and 10K subsampling.

We do not include the original training datasets in this repository. To download AgentTrek, please refer to the official xlang-ai/AgentTrek repository. In the commands below, replace path/to/train.json with the local path to the downloaded training file.

If you want to skip the preprocessing steps and directly use our WEASEL-selected training dataset, it will be available here:

0. AXTree Pruning

We use target-centered AXTree pruning before score computation, with a threshold-based fallback when the action does not reference a valid bid.

python -m weasel.prune_axtree \
  --input path/to/train.json \
  --output path/to/train_pruned.json \
  --window-size 60 \
  --fallback-threshold 120

1. Prepare Scores

Run score preprocessing on the downloaded training data:

python -m weasel.prepare_scores \
  --input path/to/train_pruned.json \
  --output path/to/goals_with_scores.json \
  --augmented-dataset-output path/to/train_with_phi_scores.json

2. Greedy Selection

Run greedy subset selection using the precomputed scores:

python -m weasel.select_greedy \
  --input path/to/goals_with_scores.json \
  --output path/to/full_selected_dataset_indices_T0_3.json

3. Postprocess Dataset

Build the final WEASEL training subset:

python -m weasel.postprocess_dataset \
  --dataset path/to/train_pruned.json \
  --selected-indices path/to/full_selected_dataset_indices_T0_3.json \
  --output path/to/weasel_train_10k.json \
  --max-user-chars 40000 \
  --max-examples 10000 \
  --seed 0

Training

For supervised fine-tuning, we used hiyouga/LLaMA-Factory. After building the WEASEL-selected training file, you can use it as the training dataset in a LLaMA-Factory SFT run.

If you want to directly use our trained model checkpoints, they are available in the WEASEL Hugging Face collection:

  • Qwen2.5-7B-Instruct WEASEL checkpoint
  • Gemma3-4B-IT WEASEL checkpoint
  • Qwen3-8B WEASEL checkpoint

Evaluation

For WebArena evaluation, please refer to web-arena-x/webarena.

For MiniWob evaluation, please refer to the MiniWob documentation and Farama-Foundation/miniwob-plusplus.

For WorkArena evaluation, please refer to ServiceNow/WorkArena.

Citation

@inproceedings{pesaranzadeh2026weasel,
  title     = {{WEASEL}: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection},
  author    = {Pesaran Zadeh, Fatemeh and Choi, Seyeon and L\`u, Xing Han and Reddy, Siva and Kim, Gunhee},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  year      = {2026}
}

About

Official repository for our ICML 2026 paper Weasel.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages