WEASEL

Official code for the ICML 2026 paper WEASEL: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection.

[Paper] [Project Page]

WEASEL selects compact, goal-relevant, and diverse web-agent trajectory steps to improve out-of-domain generalization while reducing training cost.

This repository currently contains the cleaned data-selection pipeline:

0. Prune AXTree states.
1. Compute goal-relevance and pairwise distance scores.
2. Run the WEASEL greedy subset-selection objective.
3. Build the final training subset, including length filtering and 10K subsampling.

We do not include the original training datasets in this repository. To download AgentTrek, please refer to the official xlang-ai/AgentTrek repository. In the commands below, replace path/to/train.json with the local path to the downloaded training file.

If you want to skip the preprocessing steps and directly use our WEASEL-selected training dataset, it will be available here:

WEASEL-selected AgentTrek training dataset: weasel_agenttrek_train_10k.json

0. AXTree Pruning

We apply target-centered AXTree pruning before score computation. When the action does not reference a valid element bid, pruning falls back to a threshold-based rule.

python -m weasel.prune_axtree \
  --input path/to/train.json \
  --output path/to/train_pruned.json \
  --window-size 60 \
  --fallback-threshold 120
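The pruning step can be sketched roughly as follows. This is a minimal illustration, not the code in weasel.prune_axtree. It assumes the AXTree is serialized one node per line with bids written as "[bid]", that actions are BrowserGym-style calls such as click('a51'), that --window-size counts the lines kept around the target node, and that --fallback-threshold counts the leading lines kept when no valid bid is found. The names extract_bid and prune_axtree are hypothetical.

```python
import re

# Assumption: actions look like click('a51') or fill('a12', 'text'),
# and the bid is the first quoted argument.
BID_PATTERN = re.compile(r"""^\s*\w+\(\s*['"]([^'"]+)['"]""")


def extract_bid(action):
    """Return the bid referenced by an action string, or None if there is none."""
    match = BID_PATTERN.match(action)
    return match.group(1) if match else None


def prune_axtree(axtree, action, window_size=60, fallback_threshold=120):
    """Keep a window of AXTree lines centered on the action's target node.

    Hypothetical interpretation of the CLI flags: window_size is the number of
    lines kept around the target; fallback_threshold is the number of leading
    lines kept when the action has no bid or the bid is not in the tree.
    """
    lines = axtree.splitlines()
    bid = extract_bid(action)
    target = None
    if bid is not None:
        marker = f"[{bid}]"
        for i, line in enumerate(lines):
            if marker in line:
                target = i
                break
    if target is None:
        # Fallback: no valid bid, so truncate to a fixed number of lines.
        return "\n".join(lines[:fallback_threshold])
    # Center the window on the target, shifting it inward at the tree's edges
    # so exactly window_size lines are kept whenever the tree is long enough.
    start = max(0, min(target - window_size // 2, len(lines) - window_size))
    return "\n".join(lines[start:start + window_size])
```

With this interpretation, every pruned state stays bounded in length, whether or not the action points at a known node.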

1. Prepare Scores

Run score preprocessing on the pruned training data produced in step 0:

python -m weasel.prepare_scores \
  --input path/to/train_pruned.json \
  --output path/to/goals_with_scores.json \
  --augmented-dataset-output path/to/train_with_phi_scores.json
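To illustrate what goal-relevance and pairwise distance scores can look like, here is a toy sketch. It is not the scoring model used by weasel.prepare_scores. As a stand-in, it embeds text as a bag of words, takes relevance as the cosine similarity between the goal and each step, and defines distance as one minus the cosine similarity between two steps. The functions bow, cosine, and prepare_scores are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Toy bag-of-words embedding (a stand-in for a real text encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_sq = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def prepare_scores(goal, steps):
    """Return (relevance, distance) for the steps collected under one goal.

    relevance[i]   = cosine(goal, step_i)
    distance[i][j] = 1 - cosine(step_i, step_j)
    """
    goal_vec = bow(goal)
    vecs = [bow(step) for step in steps]
    relevance = [cosine(goal_vec, v) for v in vecs]
    distance = [[1.0 - cosine(vi, vj) for vj in vecs] for vi in vecs]
    return relevance, distance
```

In this sketch, identical steps have distance 0 and steps that share no tokens have distance 1, which is the kind of signal a diversity term can use to penalize near-duplicates.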

2. Greedy Selection

Run greedy subset selection using the scores precomputed in step 1. The output is a JSON file of selected example indices:

python -m weasel.select_greedy \
  --input path/to/goals_with_scores.json \
  --output path/to/full_selected_dataset_indices_T0_3.json
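The general shape of a greedy importance-diversity selection can be sketched as follows. This uses a generic max-min rule: each candidate is scored by its relevance plus lam times its distance to the nearest already-selected item. It is a stand-in, and the exact WEASEL objective is the one defined in the paper and in weasel.select_greedy. The function greedy_select and the weight lam are hypothetical.

```python
def greedy_select(relevance, distance, k, lam=1.0):
    """Greedily pick up to k indices trading off relevance and diversity.

    Gain of candidate i given the selected set S:
        relevance[i] + lam * min_{j in S} distance[i][j]
    While S is empty, only relevance counts.
    """
    n = len(relevance)
    selected = []
    # Distance from each candidate to its nearest selected item
    # (None until the first item has been selected).
    nearest = [None] * n
    remaining = set(range(n))

    def gain(i):
        diversity = 0.0 if nearest[i] is None else nearest[i]
        return relevance[i] + lam * diversity

    while remaining and len(selected) < k:
        # Iterating in sorted order makes ties resolve to the lowest index.
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.discard(best)
        for i in remaining:
            d = distance[i][best]
            nearest[i] = d if nearest[i] is None else min(nearest[i], d)
    return selected
```

For example, with two near-duplicate high-relevance items and one distant, less relevant item, lam=1.0 takes the distant item second, while lam=0.0 reduces to picking by relevance alone.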

3. Postprocess Dataset

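Build the final WEASEL training subset from the pruned dataset and the selected indices. This step applies length filtering (--max-user-chars) and then subsamples at most 10K examples with a fixed seed: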

python -m weasel.postprocess_dataset \
  --dataset path/to/train_pruned.json \
  --selected-indices path/to/full_selected_dataset_indices_T0_3.json \
  --output path/to/weasel_train_10k.json \
  --max-user-chars 40000 \
  --max-examples 10000 \
  --seed 0
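The filtering and subsampling logic can be sketched roughly as below. This is a minimal illustration, not the code in weasel.postprocess_dataset. It assumes each record stores OpenAI-style turns under a "messages" key, that --max-user-chars bounds the total characters across user turns (inclusive), and that subsampling is uniform without replacement. The names user_chars and postprocess are hypothetical.

```python
import random


def user_chars(example):
    """Total characters across user turns (assumes {"messages": [{"role", "content"}, ...]})."""
    return sum(len(m["content"]) for m in example["messages"] if m["role"] == "user")


def postprocess(dataset, selected_indices, max_user_chars=40000, max_examples=10000, seed=0):
    """Gather selected examples, drop overly long ones, then subsample with a fixed seed."""
    subset = [dataset[i] for i in selected_indices]
    # Length filter: drop examples whose user content is too long.
    subset = [ex for ex in subset if user_chars(ex) <= max_user_chars]
    if len(subset) > max_examples:
        # A dedicated Random instance keeps the sample reproducible for a given seed.
        subset = random.Random(seed).sample(subset, max_examples)
    return subset
```

Using a local random.Random(seed) instead of the global generator keeps the subsample reproducible even if other code also draws random numbers.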

Training

For supervised fine-tuning, we used hiyouga/LLaMA-Factory. After building the WEASEL-selected training file, you can use it as the training dataset in a LLaMA-Factory SFT run.
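LLaMA-Factory reads custom datasets from entries in its data/dataset_info.json. The sketch below adds one such entry. It assumes the WEASEL output file uses OpenAI-style records, meaning a list of {"role", "content"} turns under a "messages" key. If the file uses a different layout, adjust "columns" and "tags" following LLaMA-Factory's data/README.md. The helper names weasel_entry and register_dataset and the dataset name are hypothetical.

```python
import json
from pathlib import Path


def weasel_entry(file_name):
    """dataset_info.json entry for an OpenAI-style (role/content) sharegpt file.

    Assumption: each record has a "messages" list of {"role", "content"} turns.
    """
    return {
        "file_name": file_name,
        "formatting": "sharegpt",
        "columns": {"messages": "messages"},
        "tags": {
            "role_tag": "role",
            "content_tag": "content",
            "user_tag": "user",
            "assistant_tag": "assistant",
            "system_tag": "system",
        },
    }


def register_dataset(dataset_info_path, name, file_name):
    """Add (or overwrite) one entry in dataset_info.json and return the full mapping."""
    path = Path(dataset_info_path)
    info = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    info[name] = weasel_entry(file_name)
    path.write_text(json.dumps(info, indent=2), encoding="utf-8")
    return info
```

For example, after copying the training file into LLaMA-Factory's data/ directory, register_dataset("data/dataset_info.json", "weasel_train_10k", "weasel_train_10k.json") registers it, and setting dataset: weasel_train_10k in the SFT YAML config selects it for training.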

If you want to directly use our trained model checkpoints, they are available in the WEASEL Hugging Face collection:

Qwen2.5-7B-Instruct WEASEL checkpoint
Gemma3-4B-IT WEASEL checkpoint
Qwen3-8B WEASEL checkpoint

Evaluation

For WebArena evaluation, please refer to web-arena-x/webarena.

For MiniWob evaluation, please refer to the MiniWob documentation and Farama-Foundation/miniwob-plusplus.

For WorkArena evaluation, please refer to ServiceNow/WorkArena.

Citation

@inproceedings{pesaranzadeh2026weasel,
  title     = {{WEASEL}: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection},
  author    = {Pesaran Zadeh, Fatemeh and Choi, Seyeon and L\`u, Xing Han and Reddy, Siva and Kim, Gunhee},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  year      = {2026}
}
