Asim Mohamed, Martin Gubri
African Institute for Mathematical Sciences (AIMS), Parameter Lab
Official implementation of "Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution".
In this work, we introduce STEAM (Simple Translation-Enhanced Approach for Multilingual watermarking), a novel defense mechanism designed to enhance the robustness of LLM watermarks against translation-based attacks.
The work was supported by Parameter Lab, which provided the compute resources and covered the API costs of large language models.
- Overview
- Installation
- Basic Workflow
- Code Structure
- Core Components
- Evaluation Workflow
- Configuration
- Cite
## Overview

Existing multilingual watermarking methods, such as X-SIR, claim cross-lingual robustness but have been tested almost exclusively on high-resource languages. When evaluated across a wider range of languages, these methods fail to maintain watermark strength under translation attacks, especially for medium- and low-resource languages like Tamil or Bengali.
This degradation arises because semantic clustering (grouping equivalent words like “house–maison–casa”) depends heavily on tokenizer coverage: languages with fewer full-word tokens lose semantic alignment, making watermarks fragile to translation.
These findings reveal that current multilingual watermarking is not truly multilingual, as robustness collapses when token coverage decreases or when text is translated into underrepresented languages.
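As a quick illustration of this tokenizer-coverage effect, one can inspect how a model's tokenizer splits equivalent words across languages. The sketch below uses the HuggingFace `transformers` API; the Tamil and Bengali words are our illustrative translations of "house", not examples taken from the paper:

```python
# Illustrative check of tokenizer coverage: words that fragment into many
# subword pieces lose the full-word alignment that semantic clustering needs.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
for word in ["house", "maison", "casa", "வீடு", "বাড়ি"]:  # "house" in en/fr/es/ta/bn
    print(word, "->", tok.tokenize(word))
```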
STEAM addresses this limitation with a simple, detection-time defense that uses multilingual back-translation to recover watermark signals lost during translation.
Given a suspect text, STEAM translates it back into multiple supported languages, evaluates each version using a standard watermark detector (e.g., KGW), applies language-specific z-score normalization, and takes the maximum normalized score as the decisive signal. This process effectively restores watermark strength regardless of language or tokenizer. Across 17 diverse languages, STEAM achieves up to +0.33 AUC and +64.6 percentage-point TPR@1% gains over prior methods, remaining robust even under translator mismatches and multi-step translation attacks.
In essence, STEAM provides a model-agnostic, non-invasive, and retroactively extensible defense that ensures fair watermark detection across high-, medium-, and low-resource languages.
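Conceptually, the detection-time procedure can be sketched as follows. This is a minimal illustration of the idea described above, not the repository's actual API; `translate`, `detector`, and `norm_stats` are hypothetical placeholders:

```python
# Conceptual sketch of STEAM detection: back-translate, score each version,
# normalize per language, and keep the maximum normalized score.
def steam_score(text, languages, translate, detector, norm_stats):
    normalized = []
    for lang in languages:
        back = translate(text, target_lang=lang)   # back-translate the suspect text
        z = detector.z_score(back)                 # standard detector, e.g. KGW
        mu, sigma = norm_stats[lang]               # per-language human-text statistics
        normalized.append((z - mu) / sigma)        # language-specific z-normalization
    return max(normalized)                         # decisive detection signal
```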
## Installation

This project requires Python 3.10.17.
```bash
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

## Basic Workflow

STEAM is designed as a plug-in defense layer that works with existing watermarking frameworks such as X-SIR, X-KGW, and KGW.
```bash
# 1. Prepare bilingual dictionaries
bash data/dictionary/download_dictionaries.sh new_supported
bash data/dictionary/build_dictionaries.sh new_supported

# 2. Generate semantic mappings
bash evaluation/scripts/generate_mapping.sh new_supported

# 3. Generate watermarked and human text
bash evaluation/scripts/generate_watermark.sh new_supported
bash evaluation/scripts/generate_human.sh new_supported

# 4. Evaluate detection performance
bash evaluation/scripts/evaluate_detection.sh new_supported
```

| Category | Description |
|---|---|
| `new_supported` | Run experiments with the new set of supported languages |
| `original_supported` | Use only the original supported languages (en, fr, de, zh, ja) |
| `unsupported` | Evaluate unsupported languages |
Languages for each category can be configured in `evaluation/common/languages.sh`.
## Code Structure

```
STEAM/
├── gen.py # Generate watermarked text
├── detect.py # Compute z-scores for detection
├── utils.py # Shared utility functions
│
├── data/
│ ├── dataset/ # MC4 prompts (en, fr, de, zh, etc.)
│ ├── dictionary/ # Bilingual dictionaries (MUSE-based)
│ ├── mapping/ # Semantic mappings (X-SIR / X-KGW)
│ └── model/ # Pretrained transform models
│
├── evaluation/
│ ├── scripts/ # Automated generation & evaluation scripts
│ ├── common/ # Shared configs (models, languages)
│ └── eval_detection.py # Computes AUC, TPR@FPR, F1
│
└── watermarks/
├── xsir/ # X-SIR implementation
├── xkgw/ # X-KGW implementation
    └── kgw/            # KGW implementation
```
## Core Components

### gen.py

Generates watermarked or baseline text from prompts.
```bash
python gen.py \
    --base_model meta-llama/Llama-3.2-1B \
    --input_file data/dataset/mc4.en.jsonl \
    --output_file evaluation/gen/llama-3.2-1B/new_supported/xsir_seed0/mc4.en.mod.jsonl \
    --watermark_method xsir \
    --watermark_type context \
    --mapping_file data/mapping/xsir/new_supported/mapping.json \
    --transform_model data/model/transform_model_x-sbert.pth
```

**Key Arguments**

- `--watermark_method`: `xsir`, `xkgw`, `kgw`, or `none`
- `--mapping_file`: Required for X-SIR and X-KGW methods
### detect.py

Computes z-scores for watermark detection.
```bash
python detect.py \
    --base_model meta-llama/Llama-3.2-1B \
    --detect_file evaluation/gen/llama-3.2-1B/new_supported/xsir_seed0/mc4.en.mod.jsonl \
    --output_file evaluation/gen/llama-3.2-1B/new_supported/xsir_seed0/mc4.en.mod.z_score.jsonl \
    --watermark_method xsir \
    --watermark_type context \
    --mapping_file data/mapping/xsir/new_supported/mapping.json \
    --transform_model data/model/transform_model_x-sbert.pth
```
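For reference, the z-score reported by KGW-style detectors follows the standard green-list statistic of Kirchenbauer et al. (2023); a minimal sketch of that formula, not this repository's code:

```python
import math

# Standard KGW green-list z-statistic: compares the observed count of
# green-list tokens with the expected count gamma * T under no watermark.
def kgw_z_score(num_green: int, num_tokens: int, gamma: float = 0.5) -> float:
    expected = gamma * num_tokens
    return (num_green - expected) / math.sqrt(num_tokens * gamma * (1 - gamma))
```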
### evaluation/eval_detection.py

Computes detection performance metrics including AUC, TPR@FPR, F1, and ROC curves.

```bash
python evaluation/eval_detection.py \
    --hm_zscore evaluation/gen/llama-3.2-1B/new_supported/xsir_seed0/mc4.en-fr.hum.z_score.jsonl \
    --wm_zscore evaluation/gen/llama-3.2-1B/new_supported/xsir_seed0/mc4.en-fr.mod.z_score.jsonl
```
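Illustratively, AUC and TPR@1%FPR can be computed from the two z-score lists along these lines (a sketch, not the script's actual implementation):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Sketch of the core metrics: human z-scores are negatives (label 0),
# watermarked z-scores are positives (label 1).
def detection_metrics(human_z, wm_z):
    scores = np.concatenate([human_z, wm_z])
    labels = np.concatenate([np.zeros(len(human_z)), np.ones(len(wm_z))])
    auc = roc_auc_score(labels, scores)
    fpr, tpr, _ = roc_curve(labels, scores)
    tpr_at_1_fpr = tpr[fpr <= 0.01].max()  # TPR at 1% false-positive rate
    return auc, tpr_at_1_fpr
```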
## Evaluation Workflow

Prepare the bilingual dictionaries:

```bash
bash data/dictionary/download_dictionaries.sh new_supported
bash data/dictionary/build_dictionaries.sh new_supported

# to build a holdout dictionary by excluding a specific language:
bash data/dictionary/build_dictionaries.sh holdout en
```

Generate semantic mappings, watermarked text, and human text:

```bash
bash evaluation/scripts/generate_mapping.sh new_supported
bash evaluation/scripts/generate_watermark.sh new_supported
bash evaluation/scripts/generate_human.sh new_supported
# for holdout settings, pass the excluded language as an argument
bash evaluation/scripts/generate_mapping.sh holdout en
bash evaluation/scripts/generate_watermark_holdout.sh en
bash evaluation/scripts/generate_human_holdout.sh en
```

Evaluate detection performance:

```bash
bash evaluation/scripts/evaluate_detection.sh new_supported

# for holdout settings:
bash evaluation/scripts/evaluate_detection_holdout.sh en
```

This will iterate over:
- Base models (defined in `evaluation/common/config.sh`)
- Seeds (default: 0, 42, 123)
- Watermark methods (`xsir`, `xkgw`, `kgw`)
- Languages (defined in `evaluation/common/languages.sh`)
Outputs are stored under:

```
evaluation/gen/<model>/<category>/<method>_seed<seed>/
```
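For example, all z-score files produced by a run can be gathered with a small script (a hypothetical convenience following the layout above; the glob pattern assumes the `.z_score.jsonl` naming used in the examples):

```python
from pathlib import Path

# Collect every z-score file under evaluation/gen/<model>/<category>/<method>_seed<seed>/
for path in sorted(Path("evaluation/gen").glob("*/*/*_seed*/*.z_score.jsonl")):
    print(path)
```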
## Configuration

To modify experiment settings, edit `evaluation/common/config.sh` and `evaluation/common/utils.sh` to change:
- Base models
- Seeds
- Watermark schemes
- Generation parameters
## Cite

If you find our work useful, please consider citing it:
```bibtex
@misc{mohamed2025multilingualllmwatermarkingtruly,
      title={Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution},
      author={Asim Mohamed and Martin Gubri},
      year={2025},
      eprint={2510.18019},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.18019},
}
```

