Ye Fang1,2, Tong Wu✉️3, Valentin Deschaintre1, Duygu Ceylan1, Iliyan Georgiev1,
Chun-Hao Paul Huang1, Yiwei Hu1, Xuelin Chen1, Tuanfeng Yang Wang✉️1
1Adobe Research 2Fudan University 3Stanford University
TLDR: V-RGBX enables physically grounded video editing by decomposing videos into intrinsic properties and propagating keyframe edits over time, producing photorealistic and precisely controlled results.
Paper | Project page | Video | Huggingface
Click for the full abstract of V-RGBX
Large-scale video generation models have shown remarkable potential in modeling photorealistic appearance and lighting interactions in real-world scenes. However, a closed-loop framework that jointly understands intrinsic scene properties (e.g., albedo, normal, material, and irradiance), leverages them for video synthesis, and supports editable intrinsic representations remains unexplored. We present V-RGBX, the first end-to-end framework for intrinsic-aware video editing. V-RGBX unifies three key capabilities: (1) video inverse rendering into intrinsic channels, (2) photorealistic video synthesis from these intrinsic representations, and (3) keyframe-based video editing conditioned on intrinsic channels. At the core of V-RGBX is an interleaved conditioning mechanism that enables intuitive, physically grounded video editing through user-selected keyframes, supporting flexible manipulation of any intrinsic modality. Extensive qualitative and quantitative results show that V-RGBX produces temporally consistent, photorealistic videos while propagating keyframe edits across sequences in a physically plausible manner. We demonstrate its effectiveness in diverse applications, including object appearance editing and scene-level relighting, surpassing the performance of prior methods.
*This work was partially done while Ye was an intern at Adobe Research.
- 🚀🚀 [Jan 15, 2026] We release the V-RGBX model weights and inference code, including inverse rendering, forward rendering, and intrinsic-aware video editing. [Model Weights] · [Inference]
- 🚀🚀 [Dec 15, 2025] The paper and project page are released!
- 🔥 The first end-to-end intrinsic-aware video editing framework, enabling physically grounded control over albedo, normal, material, and irradiance.
- 🔥 A unified RGB → X → RGB pipeline that supports keyframe-based edit propagation across time via inverse and forward rendering.
- 🔥 Interleaved intrinsic conditioning with temporal-aware type embeddings enables precise, disentangled, and temporally coherent edits across different intrinsic properties.
git clone https://github.com/Aleafy/V-RGBX.git
cd V-RGBX

conda create -n vrgbx python=3.10
conda activate vrgbx
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu124
pip install -e .

The model weights are available on Hugging Face. We provide two checkpoints:
| Checkpoints | Description |
|---|---|
| aleafy/vrgbx_inverse_renderer | Decomposes an input RGB video into intrinsic channels (albedo, normal, material, irradiance). |
| aleafy/vrgbx_forward_renderer | Renders a photorealistic RGB video from intrinsic channels and propagates keyframe edits over time. |
You can download V-RGBX model weights by running the following command:
python vrgbx/utils/download_weights.py --repo_id aleafy/V-RGBX

The pretrained backbone (built on Wan) can be downloaded with:
python vrgbx/utils/download_weights.py --repo_id Wan-AI/Wan2.1-T2V-1.3B

Expected project directory:
V-RGBX/                                      # Project root for the V-RGBX framework
├── assets/                                  # Media resources (logos, figures, etc.)
├── examples/                                # Example videos, intrinsics, and reference images
├── models/                                  # Model weights directory
│   ├── V-RGBX/                              # V-RGBX intrinsic rendering models
│   │   ├── vrgbx_forward_renderer.safetensors
│   │   └── vrgbx_inverse_renderer.safetensors
│   └── Wan-AI/                              # Pretrained backbone (Wan)
│       └── Wan2.1-T2V-1.3B/
└── vrgbx/                                   # Core V-RGBX codebase
python vrgbx_edit_inference.py \
--video_name Evermotion_CreativeLoft \
--task solid_color \
--edit_type albedo

This command automatically resolves all required inputs from video_name, applies the specified intrinsic edit, and re-renders the edited result to RGB.
Arguments
- video_name: Video sequence name. All required RGB videos and reference images are automatically inferred from the dataset structure.
- task: A short tag used for file naming and automatic path inference, e.g. texture, material, shadow, light_color, normal.
- edit_type: Intrinsic layer to edit, e.g. albedo, irradiance, material, or normal.
Use your own video
Put your files in the same structure:
examples/
├── input_videos/
│   └── {your_video_name}.mp4
└── edit_images/
    ├── {your_video_name}_{your_task}_edit_ref.png    # edited RGB reference
    └── {your_video_name}_{your_task}_edit_x.png      # edited intrinsic (for --edit_type)
Running command:
python vrgbx_edit_inference.py \
--video_name <your_video_name> \
--task <your_task> \
--edit_type <your_edit_type>

🪄 Click for more example bash commands of V-RGBX editing
python vrgbx_edit_inference.py \
--video_name AdobeStock_GradientShadow \
--task texture \
--edit_type albedo

python vrgbx_edit_inference.py \
--video_name Evermotion_Lounge \
--task texture \
--edit_type albedo

python vrgbx_edit_inference.py \
--video_name Captured_PoolTable \
--task texture \
--edit_type albedo

python vrgbx_edit_inference.py \
--video_name Evermotion_Kitchenette \
--task light_color \
--edit_type irradiance

python vrgbx_edit_inference.py \
--video_name Evermotion_Studio \
--task shadow \
--edit_type irradiance

python vrgbx_edit_inference.py \
--video_name Evermotion_CreativeLoft \
--task shadow \
--edit_type irradiance

python vrgbx_edit_inference.py \
--video_name Evermotion_SingleWallKitchen \
--task normal \
--edit_type normal \
--drop_type irradiance

python vrgbx_edit_inference.py \
--video_path examples/input_videos/Evermotion_Lounge.mp4 \
--ref_rgb_path examples/edit_images/Evermotion_Lounge_multiple_edit_ref.png \
--edit_type irradiance --drop_type albedo \
--edit_x_path examples/edit_images/Evermotion_Lounge_multiple_edit_irradiance.png \
--task multiple

python vrgbx_inverse_rendering.py \
--video_path examples/input_videos/Evermotion_CreativeLoft.mp4 \
--save_dir output/inverse_rendering \
--channels albedo normal material irradiance

This command decomposes the input video into intrinsic representations (albedo, normal, material, and irradiance) and saves them for later intrinsic-aware editing.
You can also try other cases in examples/input_videos/ or use your own videos (recommended: 49 frames at 832×480 for better results).
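If your own footage is longer or at a different resolution, one simple way to match the recommended setting is a quick ffmpeg pass like the sketch below (ffmpeg is assumed to be available; my_clip.mp4 is a placeholder name):

# Illustrative preprocessing, not part of V-RGBX: rescale to 832x480 and keep the first 49 frames.
ffmpeg -i my_clip.mp4 -vf scale=832:480 -frames:v 49 examples/input_videos/my_clip.mp4

Note that a plain scale filter stretches the frame; crop beforehand if you need to preserve the aspect ratio.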
The forward renderer reconstructs an RGB video from multiple intrinsic layers, including albedo, normal, material, and irradiance.
Without a reference image (pure intrinsic-driven rendering):
python vrgbx_forward_rendering.py \
--albedo_path examples/input_intrinsics/Evermotion_Banquet_Albedo.mp4 \
--normal_path examples/input_intrinsics/Evermotion_Banquet_Normal.mp4 \
--material_path examples/input_intrinsics/Evermotion_Banquet_Material.mp4 \
--irradiance_path examples/input_intrinsics/Evermotion_Banquet_Irradiance.mp4

With a reference RGB image (to anchor global appearance and color tone):
python vrgbx_forward_rendering.py \
--albedo_path examples/input_intrinsics/Evermotion_Banquet_Albedo.mp4 \
--normal_path examples/input_intrinsics/Evermotion_Banquet_Normal.mp4 \
--material_path examples/input_intrinsics/Evermotion_Banquet_Material.mp4 \
--irradiance_path examples/input_intrinsics/Evermotion_Banquet_Irradiance.mp4 \
--use_reference \
--ref_rgb_path examples/input_intrinsics/Evermotion_Banquet_Ref.png

Note:
- The first mode reconstructs RGB solely from intrinsic layers.
- The second mode additionally uses a reference RGB image to provide global color and appearance guidance, improving visual fidelity.
- Thanks to the intrinsic sampling mechanism, not all intrinsic channels need to be provided; partial inputs are supported (see the sketch below).
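For example, assuming the same flags and example assets as above, a partial-input run that supplies only albedo and normal could look like:

# Illustrative partial-input run: the remaining channels (material, irradiance) are simply omitted.
python vrgbx_forward_rendering.py \
--albedo_path examples/input_intrinsics/Evermotion_Banquet_Albedo.mp4 \
--normal_path examples/input_intrinsics/Evermotion_Banquet_Normal.mp4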
- Open-source V-RGBX models & weights
- Intrinsic-conditioned video editing inference
- Inverse rendering (RGB → X) inference
- Forward rendering (X → RGB) inference
- Inverse Renderer training code
- Forward Renderer training code
- DiffSynth-Studio: A modular diffusion framework for training and inference across mainstream diffusion models (e.g., FLUX and Wan), which provides the codebase used in our V-RGBX implementation.
- WAN-Video: A large-scale open video diffusion foundation model. We leverage its pretrained video generation capability as the base model for high-quality synthesis in our experiments.
- DiffusionRenderer: An influential line of work that bridges physically-based rendering and diffusion models, motivating our forward/inverse rendering formulation for intrinsic-aware video generation.
- RGB↔X: A seminal framework for intrinsic image decomposition and editing, laying the foundation for disentangled representations (e.g., albedo, normal, material, illumination).
If you find our work helpful for your research, please consider giving a star ⭐ and citation 📝
@misc{fang2025vrgbxvideoeditingaccurate,
title={V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties},
author={Ye Fang and Tong Wu and Valentin Deschaintre and Duygu Ceylan and Iliyan Georgiev and Chun-Hao Paul Huang and Yiwei Hu and Xuelin Chen and Tuanfeng Yang Wang},
year={2025},
eprint={2512.11799},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.11799},
}

Usage and License Notices: The source code of V-RGBX is released under the Apache License 2.0. The checkpoints and example data are released under the Creative Commons Attribution–NonCommercial 4.0 International (CC BY-NC 4.0) license and are intended for research and educational use only.

