Add Wan2.2-Animate: Unified Character Animation and Replacement with Holistic Replication #12442
Closed
tolgacangoz wants to merge 50 commits into huggingface:main from
Conversation
- Introduced `WanAnimateTransformer3DModel` and `WanAnimatePipeline`.
- Updated `get_transformer_config` to handle the new model type.
- Modified `convert_transformer` to instantiate the correct transformer based on model type.
- Adjusted main execution logic to accommodate the new Animate model type.
…prove error handling for undefined parameters
…work for character animation and replacement
- Added the Wan 2.2 Animate 14B model to the documentation.
- Introduced the Wan-Animate framework, detailing its capabilities for character animation and replacement.
- Included example usage for the `WanAnimatePipeline` with preprocessing steps and guidance on input requirements.
- Introduced `WanAnimateGGUFSingleFileTests` to validate functionality.
- Added dummy input generation for testing model behavior.
- Introduced `EncoderApp`, `Encoder`, `Direction`, `Synthesis`, and `Generator` classes for enhanced motion and appearance encoding.
- Added `FaceEncoder`, `FaceBlock`, and `FaceAdapter` classes to integrate facial motion processing.
- Updated `WanTimeTextImageMotionEmbedding` to utilize the new `Generator` for motion embedding.
- Enhanced `WanAnimateTransformer3DModel` with an additional face adapter and pose patch embedding.
- Introduced a `pad_video` method to handle padding of video frames to a target length.
- Updated video processing logic to use the new padding method for `pose_video` and `face_video`, and conditionally for `background_video` and `mask_video`.
- Ensured compatibility with existing preprocessing steps for video inputs.
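As a rough illustration of what such a padding helper might look like — the signature, the repeat-last-frame strategy, and the trim-when-longer behavior are all assumptions, not the PR's actual implementation:

```python
import torch

def pad_video(frames: torch.Tensor, target_length: int) -> torch.Tensor:
    """Pad a (frames, channels, height, width) video tensor to `target_length`
    frames by repeating the last frame; trim if it is already longer.
    Hypothetical sketch, not the code from the PR."""
    num_frames = frames.shape[0]
    if num_frames >= target_length:
        return frames[:target_length]
    # Repeat the final frame to fill the remaining slots.
    last = frames[-1:].expand(target_length - num_frames, -1, -1, -1)
    return torch.cat([frames, last], dim=0)

video = torch.randn(5, 3, 8, 8)
padded = pad_video(video, 9)
print(padded.shape)  # torch.Size([9, 3, 8, 8])
```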
…roved video processing
- Added optional parameters `conditioning_pixel_values`, `refer_pixel_values`, `refer_t_pixel_values`, `bg_pixel_values`, and `mask_pixel_values` to the `prepare_latents` method.
- Updated the logic in the denoising loop to accommodate the new parameters, improving the flexibility of the pipeline.
…eneration
- Updated the calculation of `num_latent_frames` and adjusted the shape of latent tensors to accommodate changes in frame processing.
- Enhanced the `get_i2v_mask` method for better mask generation, ensuring compatibility with the new tensor shapes.
- Improved handling of pixel values and device management for better performance and clarity in the video processing pipeline.
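For context on the `num_latent_frames` calculation: Wan's causal video VAE keeps the first frame and compresses the remaining frames temporally, with a temporal stride of 4 in the released models. A minimal sketch of that arithmetic (the helper name and default stride are assumptions):

```python
def num_latent_frames(num_frames: int, temporal_stride: int = 4) -> int:
    """Latent frame count for a causal VAE that keeps the first frame
    uncompressed and compresses the rest by `temporal_stride`."""
    return (num_frames - 1) // temporal_stride + 1

print(num_latent_frames(81))  # 21
print(num_latent_frames(77))  # 20
```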
…and mask generation
- Consolidated the handling of `pose_latents_no_ref` to improve clarity and efficiency in latent tensor calculations.
- Updated the `get_i2v_mask` method to accept a batch size and adjusted tensor shapes accordingly for better compatibility.
- Enhanced the logic for mask pixel values in the replacement mode, ensuring consistent processing across different scenarios.
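The idea behind an image-to-video conditioning mask like `get_i2v_mask` can be sketched as follows: mark which latent frames are known conditioning (e.g. the reference frame) versus frames the model must denoise. The exact signature and shapes here are illustrative assumptions, not the PR's code:

```python
import torch

def get_i2v_mask(batch_size, num_latent_frames, latent_height, latent_width,
                 num_cond_frames=1, device="cpu"):
    """Hypothetical sketch: 1 marks conditioning (known) latent frames,
    0 marks frames to be denoised."""
    mask = torch.zeros(batch_size, 1, num_latent_frames,
                       latent_height, latent_width, device=device)
    mask[:, :, :num_cond_frames] = 1.0
    return mask

m = get_i2v_mask(2, 21, 60, 104)
print(m.shape)  # torch.Size([2, 1, 21, 60, 104])
```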
…nced processing
- Introduced custom QR decomposition and fused leaky ReLU functions for improved tensor operations.
- Implemented upsampling and downsampling functions with native support for better performance.
- Added new classes `FusedLeakyReLU`, `Blur`, `ScaledLeakyReLU`, `EqualConv2d`, `EqualLinear`, and `RMSNorm` for advanced neural network layers.
- Refactored the `EncoderApp`, `Generator`, and `FaceBlock` classes to integrate the new functionality and improve modularity.
- Updated the attention mechanism to use `dispatch_attention_fn` for more flexible processing.
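The fused leaky ReLU here descends from StyleGAN2's fused bias + activation op: add a per-channel bias, apply leaky ReLU, then rescale by √2 to preserve signal magnitude. A plain-PyTorch sketch of that pattern (the exact defaults in the PR may differ):

```python
import math
import torch
import torch.nn.functional as F

def fused_leaky_relu(x, bias, negative_slope=0.2, scale=math.sqrt(2)):
    # Add a per-channel bias, apply leaky ReLU, then rescale — the shape of
    # StyleGAN2's fused op, written with native PyTorch calls.
    return F.leaky_relu(x + bias.view(1, -1, 1, 1), negative_slope) * scale

x = torch.randn(1, 8, 4, 4)
y = fused_leaky_relu(x, torch.zeros(8))
print(y.shape)  # torch.Size([1, 8, 4, 4])
```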
…annotations
- Removed over-abstracted functions such as `custom_qr`, `fused_leaky_relu`, and `make_kernel` to streamline the codebase.
- Updated class constructors and method signatures to include type hints for better clarity and type checking.
- Refactored the `FusedLeakyReLU`, `Blur`, `EqualConv2d`, and `EqualLinear` classes to enhance readability and maintainability.
- Simplified the `Generator` and `Encoder` classes by removing redundant parameters and improving initialization logic.
…e class structures
- Replaced several custom classes, including `EqualConv2d` and `EqualLinear`, with standard PyTorch layers for improved maintainability.
- Introduced `WanAnimateMotionerEncoderApp`, `WanAnimateMotionerEncoder`, and `WanAnimateMotionerSynthesis` classes to better encapsulate functionality.
- Updated the `ConvLayer` class to streamline downsampling and activation.
- Refactored the `FaceBlock` and `FaceAdapter` classes to adopt the new naming conventions and improve clarity.
- Removed unused functions and classes to simplify the codebase.
- Added new key mappings for the Animate model's transformer architecture.
- Implemented weight conversion functions from `EqualLinear` and `EqualConv2d` to standard layers.
- Updated `WanAnimatePipeline` to handle reference image encoding and conditioning properly.
- Refactored `WanAnimateTransformer3DModel` to include a new `motion_encoder_dim` parameter for improved flexibility.
…proved model integration
- Updated key mappings in `convert_wan_to_diffusers.py` for the Animate model's transformer architecture.
- Implemented weight scaling for `EqualLinear` and `EqualConv2d` layers.
- Refactored `WanAnimateMotionEmbedder` and `WanAnimateFaceBlock` for better parameter handling.
- Modified `WanAnimatePipeline` to support the new reference image encoding and conditioning logic.
- Switched the scheduler to `UniPCMultistepScheduler` for improved performance.
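The weight scaling for `EqualLinear`/`EqualConv2d` refers to StyleGAN-style equalized learning rate: those layers multiply their stored weight by `(1/sqrt(fan_in)) * lr_mul` on every forward pass, so converting to a plain `nn.Linear`/`nn.Conv2d` means baking that scale into the checkpoint once. A sketch of the linear case (the helper name and `lr_mul` handling are assumptions):

```python
import math
import torch

def fold_equal_linear_scale(weight: torch.Tensor, lr_mul: float = 1.0) -> torch.Tensor:
    """Bake equalized-lr scaling into a plain nn.Linear weight so a standard
    layer reproduces EqualLinear's runtime-scaled forward pass."""
    fan_in = weight.shape[1]
    scale = (1.0 / math.sqrt(fan_in)) * lr_mul
    return weight * scale

w = torch.ones(4, 16)
print(fold_equal_linear_scale(w)[0, 0].item())  # 0.25  (1/sqrt(16))
```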
… conditioning logic
- Added parameters `y_ref` and `calculate_noise_latents_only` to improve flexibility in processing.
- Streamlined the encoding of reference images and conditioning videos.
- Adjusted tensor concatenation and masking logic for better clarity.
- Updated return values to accommodate the new processing paths based on the `mask_reft_len` and `calculate_noise_latents_only` flags.
Collaborator
Hi @tolgacangoz, thanks for opening the PR! We will make it very clear that you are the author of the PR and make sure you get the credit for the contribution.
Contributor
Author
I think everything other than preprocessing should be ~95% OK. I can go on to complete the remaining parts today, and you can work on the preprocessing part?
- Added checks to skip unnecessary transformations for specific keys, including blur kernels and biases.
- Implemented renaming of sequential indices to named components for better clarity in weight handling.
- Introduced scaling for `EqualLinear` and `EqualConv2d` weights, ensuring compatibility with the Animate model's architecture.
- Added comments and TODOs for future verification and simplification of the conversion process.
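Renaming sequential indices to named components typically looks like the sketch below: state-dict keys produced by `nn.Sequential` (e.g. `block.0.weight`) are mapped to the named submodules of the diffusers module. The mapping table here is purely illustrative, not the one used in the PR:

```python
# Illustrative index-to-name mapping; the real conversion table differs.
RENAME_MAP = {
    ".0.": ".conv.",
    ".1.": ".norm.",
    ".2.": ".act.",
}

def rename_key(key: str) -> str:
    """Replace the first matching sequential index with its named component."""
    for old, new in RENAME_MAP.items():
        if old in key:
            return key.replace(old, new, 1)
    return key

print(rename_key("encoder.block.0.weight"))  # encoder.block.conv.weight
```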
…es for animation and replacement modes, and improving test coverage for various scenarios.
Updated contribution attribution for the Wan-Animate model.
tolgacangoz
commented
Oct 21, 2025
The project page: https://humanaigc.github.io/wan-animate
This model was mostly contributed by [M. Tolga Cangöz](https://github.com/tolgacangoz).
Contributor
Author
- Reverted the order of the `face_embedder` norms to their original configuration for improved clarity.
- Introduced a placeholder for `face_encoder.norm2` to maintain compatibility with the existing architecture.
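A common way to implement such a placeholder is an `nn.Identity` module, which keeps the module hierarchy and forward signature aligned without changing the computation. The class and dimensions below are an illustrative pattern, not the PR's actual `FaceEncoder`:

```python
import torch
import torch.nn as nn

class FaceEncoderStub(nn.Module):
    """Illustrative: norm2 is a placeholder so the module layout matches
    the original architecture even though no second norm is applied."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.Identity()  # placeholder for face_encoder.norm2

    def forward(self, x):
        return self.norm2(self.norm1(x))

m = FaceEncoderStub(8)
y = m(torch.randn(2, 8))
print(y.shape)  # torch.Size([2, 8])
```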
…nference_steps, and guidance_scale
… in WanAnimatePipeline
… simplify expected output validation
Contributor
Author
Sounds good, thanks @yiyixuxu 🙌.
yiyixuxu pushed a commit that referenced this pull request on Nov 13, 2025
…oz) (#12526)
Co-authored-by: Tolga Cangöz <mtcangoz@gmail.com>
Co-authored-by: Tolga Cangöz <46008593+tolgacangoz@users.noreply.github.com>
This PR fixes #12441.
Project Page: https://humanaigc.github.io/wan-animate/
THE LATEST STATUS: #12442 (comment)
Temporary HF repo: https://huggingface.co/tolgacangoz/Wan2.2-Animate-14B-Diffusers
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@yiyixuxu @sayakpaul @asomoza
@WanX-Video-1 @suruoxi
@J4BEZ @lopho @hitchhiker3010 @tin2tin @cjkindel @a-free-a