
Conversation

@kashif (Contributor) commented Nov 23, 2025

What does this PR do?

  • Removes the redundant txt_seq_lens plumbing from all QwenImage pipelines and modular steps; the transformer now infers text length from encoder inputs/masks and validates optional overrides.
  • Builds a lightweight broadcastable attention mask from encoder_hidden_states_mask inside the double-stream attention, avoiding full seq_len² masks while keeping padding tokens masked (see the sketch after this list).
  • Adjusts QwenImage Transformer/ControlNet RoPE to take a single text length and documents the fallback behavior.
  • Adds regression tests to ensure short txt_seq_lens values and encoder masks are handled safely.
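
As a rough illustration of the masking approach described above, here is a minimal sketch; the function name, token ordering, and shapes are assumptions for illustration, not the PR's exact implementation:

```python
import torch

def joint_attention_mask(encoder_hidden_states_mask: torch.Tensor, image_seq_len: int) -> torch.Tensor:
    # encoder_hidden_states_mask: (batch, txt_seq_len), truthy for real text tokens.
    # Image tokens are never padded, so their part of the mask is all ones.
    batch_size = encoder_hidden_states_mask.shape[0]
    image_mask = encoder_hidden_states_mask.new_ones(batch_size, image_seq_len, dtype=torch.bool)
    joint = torch.cat([encoder_hidden_states_mask.bool(), image_mask], dim=1)
    # Shape (batch, 1, 1, txt_seq_len + image_seq_len): broadcasts over heads and
    # query positions, so no dense (seq_len, seq_len) matrix is materialized.
    return joint[:, None, None, :]
```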

Fixes #12344


Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@kashif requested a review from @sayakpaul on November 23, 2025 at 18:03
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul requested a review from @yiyixuxu on November 24, 2025 at 01:57
@dxqb commented Nov 29, 2025

just a few comments, not a full review:

  • there is some overlap with Fix qwen encoder hidden states mask #12655
  • this code has the same issue mentioned in Fix qwen encoder hidden states mask #12655: it expects boolean semantics from a FloatTensor, but float attention masks are interpreted as additive biases (e.g. by SDPA), not as keep/drop flags (see the illustration after this list)
  • Could you clarify what the purpose of this PR is?
    If the purpose is to remove the txt_seq_lens parameters, and infer the sequence lengths from the attention mask: why is it still a parameter of the transformer model?
    If the purpose is towards passing sequence lengths to the attention dispatch (see Qwen Image: txt_seq_lens is redundant and not used #12344 (comment)), the sequence lengths for each batch sample must be inferred from the mask and passed to the transformer blocks, not only the max sequence length across all batch samples for RoPE
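
A minimal illustration of the boolean-vs-float mask point above, using PyTorch SDPA semantics (a boolean mask selects positions, while a float mask is added to the attention logits); this is illustrative code, not code from either PR:

```python
import torch
import torch.nn.functional as F

q = k = v = torch.randn(1, 1, 4, 8)

# Boolean mask: True = attend, False = masked out.
bool_mask = torch.tensor([True, True, False, False]).view(1, 1, 1, 4)

# Casting it to float (1.0 / 0.0) does NOT give an equivalent mask: SDPA adds a
# float mask to the logits, so 0.0 masks nothing and 1.0 only biases the scores.
wrong_float_mask = bool_mask.float()

# The correct float equivalent is an additive mask with -inf at masked positions.
additive_mask = torch.zeros(1, 1, 1, 4).masked_fill(~bool_mask, float("-inf"))

out_bool = F.scaled_dot_product_attention(q, k, v, attn_mask=bool_mask)
out_wrong = F.scaled_dot_product_attention(q, k, v, attn_mask=wrong_float_mask)
out_additive = F.scaled_dot_product_attention(q, k, v, attn_mask=additive_mask)
print(torch.allclose(out_bool, out_additive))  # True
print(torch.allclose(out_bool, out_wrong))     # False: padding tokens still attended
```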

```python
        raise ValueError(f"`txt_seq_lens` must have length {batch_size}, but got {len(txt_seq_lens)} instead.")
    text_seq_len = max(text_seq_len, max(txt_seq_lens))
elif encoder_hidden_states_mask is not None:
    text_seq_len = max(text_seq_len, int(encoder_hidden_states_mask.sum(dim=1).max().item()))
```

This only works if the attention mask is in the form of [True, True, True, ..., False, False, False]. While this is the case in the most common use case of text attention masks, it doesn't have to be the case.

If the mask is [True, False, True, False, True, False], self.pos_embed receives an incorrect sequence length
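
A small illustration of that edge case, using hypothetical masks rather than tensors from the pipeline: sum() counts valid tokens, which only matches the positional extent RoPE needs when padding sits contiguously at the end.

```python
import torch

contiguous  = torch.tensor([[True, True, True, False, False, False]])
interleaved = torch.tensor([[True, False, True, False, True, False]])

print(int(contiguous.sum(dim=1).max()))   # 3 -> last valid index is 2, so length 3 is correct
print(int(interleaved.sum(dim=1).max()))  # 3 -> but the last valid token sits at index 4,
                                          #      so positions up to length 5 are needed
```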

@kashif (Contributor, Author) commented Nov 29, 2025

Thanks @dxqb, the idea was to remove txt_seq_lens altogether and work with any mask pattern.

@yiyixuxu (Collaborator) left a comment:

thanks a ton for the PR! @kashif
I left one question, let me know!

- Remove seq_lens parameter from dispatch_attention_fn
- Update varlen backends to extract seqlens from masks
- Update QwenImage to pass 2D joint_attention_mask
- Fix native backend to handle 2D boolean masks
- Fix sage_varlen seqlens_q to match seqlens_k for self-attention

Note: sage_varlen still producing black images, needs further investigation
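
For context, a minimal sketch of how a varlen backend can derive per-sample and cumulative sequence lengths from a 2D boolean mask instead of a seq_lens argument; the helper name and shapes are assumptions, not the PR's exact code:

```python
import torch
import torch.nn.functional as F

def seqlens_from_mask(mask: torch.Tensor):
    # mask: (batch, seq_len) boolean, True for valid tokens.
    seqlens = mask.sum(dim=1, dtype=torch.int32)                       # per-sample lengths
    cu_seqlens = F.pad(seqlens.cumsum(0, dtype=torch.int32), (1, 0))   # prefix sums expected by varlen kernels
    return seqlens, cu_seqlens, int(seqlens.max())
```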
@sayakpaul (Member) commented:

#12655 provides some benchmarks on the speed, as well. Possible to provide them here too? @kashif

@kashif (Contributor, Author) commented Dec 8, 2025

Some benchmarks with various backends:

Code: benchmark_backends_qwen.py (attached script)

[Attached image: backend_benchmark_complete]
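
For readers without the attachment, a rough sketch of how such a backend comparison can be scripted; it assumes the set_attention_backend helper available on recent diffusers models and illustrative backend names, and it is not the attached benchmark script:

```python
import time
import torch
from diffusers import QwenImagePipeline

pipe = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16).to("cuda")

for backend in ["native", "flash", "sage"]:  # assumed backend names
    pipe.transformer.set_attention_backend(backend)
    torch.cuda.synchronize()
    start = time.perf_counter()
    _ = pipe("a cat", num_inference_steps=20).images[0]
    torch.cuda.synchronize()
    print(f"{backend}: {time.perf_counter() - start:.2f}s")
```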

@yiyixuxu (Collaborator) left a comment:

Thanks so much for helping us on this issue, @kashif.
The changes look good to me.

I asked the Qwen team to review it too, so we will wait for their feedback now :)

Enhances documentation with comprehensive performance insights for QwenImage pipeline:
@cdutr (Contributor) commented Dec 11, 2025

Hey @kashif! I've prepared a documentation update with a new Performance section covering:

  • Attention backend benchmarks (from your tests)
  • torch.compile speedup (~2.4x)
  • Variable-length prompt handling with CFG

I also mention the gist with the scripts I used.

Can you double-check?

Is there anything missing, or something else I can help with?

@naykun (Contributor) commented Dec 12, 2025

Thank you so much for this excellent PR! It’s clean, well-structured, and addresses several long-standing issues. I’ve left a few questions; we can discuss them further.

@sayakpaul (Member) left a comment:

Left some more comments. LMK if anything is unclear.

```python
image = pipe("a cat", num_inference_steps=50).images[0]
```

### Batched Inference with Variable-Length Prompts
Member:

Instead of specifying performance numbers on torch.compile and other attention backends, maybe we could highlight this point and include with and without torch.compile numbers? @cdutr WDYT?

Contributor:

Good point! I've simplified the Performance section to focus on torch.compile with the before/after numbers, and removed the attention backend tables, since the differences between backends are minimal compared to the torch.compile gains.
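
For reference, a minimal sketch of the with/without torch.compile comparison being discussed; the model id and compile settings are assumptions, and the speedup numbers in the docs come from the authors' own benchmarks, not from this snippet:

```python
import torch
from diffusers import QwenImagePipeline

pipe = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16).to("cuda")

# Baseline run, then compile the transformer and run again; the first compiled
# call pays the compilation cost, so time a later call for the actual speedup.
image = pipe("a cat", num_inference_steps=50).images[0]
pipe.transformer = torch.compile(pipe.transformer, fullgraph=True)
image = pipe("a cat", num_inference_steps=50).images[0]
```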

```python
    return out


_maybe_download_kernel_for_backend(_AttentionBackendRegistry._active_backend)
```
Member:

Why do we need this?

Member:

Cc: @kashif


```python
# Validate batch inference with variable-sized images
# In Layer3DRope, the outer list represents batch, inner list/tuple represents layers
if isinstance(video_fhw, list) and len(video_fhw) > 1:
```
Member:

Cc: @naykun good for you?

```python
    def prepare_dummy_input(self, height, width):
        return QwenImageTransformerTests().prepare_dummy_input(height=height, width=width)

    @pytest.mark.xfail(condition=True, reason="RoPE needs to be revisited.", strict=True)
```
Member:

Does it pass now? 👀

Contributor (Author):

yes it passes

Member:

Damn! Wow!

Member:

@kashif which versions are you using? I am getting:

```
FAILED tests/models/transformers/test_models_transformer_qwenimage.py::QwenImageTransformerCompileTests::test_compile_works_with_aot - torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not guard on data-dependent expression Eq(u0, 1) (unhinted: E...
FAILED tests/models/transformers/test_models_transformer_qwenimage.py::QwenImageTransformerCompileTests::test_torch_compile_recompilation_and_graph_break - torch._dynamo.exc.Unsupported: Dynamic slicing with Tensor arguments
FAILED tests/models/transformers/test_models_transformer_qwenimage.py::QwenImageTransformerCompileTests::test_torch_compile_repeated_blocks - torch._dynamo.exc.Unsupported: Data-dependent branching
```

With:

```
RUN_SLOW=1 RUN_COMPILE=yes pytest tests/models/transformers/test_models_transformer_qwenimage.py::QwenImageTransformerCompileTests
```

Would it be possible to fix the compile compatibility? With main, it runs as expected:

```
==================================== 3 passed, 1 skipped, 1 xfailed, 5 warnings in 109.93s (0:01:49) =====================================
```

Member:

Compilation issues still need to be resolved I believe.

Contributor (Author):

```
RUN_SLOW=1 RUN_COMPILE=yes pytest tests/models/transformers/test_models_transformer_qwenimage.py::QwenImageTransformerCompileTests
...
============================================================ 4 passed, 1 skipped, 12 warnings in 45.10s =============================================================
```

Member:

Yes passing for me as well. What was the major change?

Contributor (Author):

I had to fix device placements and data-dependent ifs.
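
As a hypothetical illustration of the kind of data-dependent branch that breaks torch.compile, and how it can be removed; this is not the PR's actual code:

```python
import torch

def masked_scores_branching(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Python-level branch on tensor *values*: Dynamo reports
    # "Data-dependent branching" when compiling through this.
    if bool((~mask).any()):
        scores = scores.masked_fill(~mask, float("-inf"))
    return scores

def masked_scores_compile_friendly(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Apply the mask unconditionally; an all-True mask is a no-op,
    # so no branch on tensor data is needed.
    return scores.masked_fill(~mask, float("-inf"))
```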

kashif and others added 10 commits December 17, 2025 15:01
Removes detailed attention backend benchmarks and simplifies torch.compile performance description

Focuses on key performance improvement with torch.compile, highlighting the specific speedup from 4.70s to 1.93s on an A100 GPU

Streamlines the documentation to provide more concise and actionable performance insights
Extends deprecation timeline for txt_seq_lens from version 0.37.0 to 0.39.0 across multiple Qwen image-related models

Adds a new unit test to verify the deprecation warning behavior for the txt_seq_lens parameter
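
A rough sketch of what the deprecation path could look like using diffusers' deprecate utility; the removal version comes from the commit note above, but the call site and message wording are assumptions:

```python
from diffusers.utils import deprecate

# Hypothetical call site; the real check lives inside the transformer's forward.
def _maybe_warn_txt_seq_lens(txt_seq_lens):
    if txt_seq_lens is not None:
        deprecate(
            "txt_seq_lens",
            "0.39.0",
            "Passing `txt_seq_lens` is deprecated; the text length is now inferred "
            "from `encoder_hidden_states` and `encoder_hidden_states_mask`.",
        )
```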
Comment on lines 206 to 212
"patch_size": 2,
"in_channels": 16,
"out_channels": 16,
"num_layers": 2,
"attention_head_dim": 128,
"num_attention_heads": 4,
"joint_attention_dim": 16,
@sayakpaul (Member) commented Dec 18, 2025:

Can we use smaller values please? @cdutr

I also think we should set use_additional_t_cond=True here?

Contributor:

@kashif already addressed it
