Qwen Image model Guide

Qwen Image is a new open-source text-to-image model developed by Alibaba’s Qwen team. It’s quickly gaining thumbs-up from AI creators. Unlike many closed models, Qwen Image is both flexible and accessible, making it a strong alternative to Stable Diffusion and Flux models. You can run it locally in ComfyUI with customization.

In this article, I will cover:

An overview of the Qwen Image model.
Sample images
Qwen Image on Google Colab
Qwen Image on ComfyUI (Standard)
Qwen Image on ComfyUI (Fast)

Table of Contents

The Qwen Image model
- Pros of Qwen Image
- Cons of Qwen Image
Software
Sample images
ComfyUI Colab Notebook
Qwen Image workflow (Fast)
Qwen Image workflow (Standard)

The Qwen Image model

So why should you consider using Qwen Image for your creative projects?

Qwen-Image has 20 billion parameters and uses a MMDiT (Multimodal Diffusion Transformer) architecture. The design goals are (1) Complex, multilangual text rendering, and (2) Strong alignment between the prompts and the generated images.

Pros of Qwen Image

Open-source: Freely available with transparent weights.
High image quality: Sharp, consistent, and diverse images.
Good at following prompts: Strong alignment between text input and images with multilingual support.

Cons of Qwen Image

Large model size – You’d better have a large hard drive and a good GPU card.
Smaller ecosystem – Unlike Stable Diffusion and Flux, many essential tools, like ControlNet, are not as mature.

Qwen Image also supports image editing and LoRA, but I will cover these features in the upcoming tutorials.

Software

We will use ComfyUI, a free AI image and video generator. You can use it on Windows, Mac, or Google Colab.

Think Diffusion provides an online ComfyUI service. They offer an extra 20% credit to our readers.

Read the ComfyUI beginner’s guide if you are new to ComfyUI. See the Quick Start Guide if you are new to AI images and videos.

Take the ComfyUI course to learn how to use ComfyUI step by step.

Sample images

Here are some images generated by the Qwen Image 8-step Lightning workflow. (15 secs on an RTX 4090). I used the same prompt in the Wan 2.2 model post so that you can compare.

A medieval knight in ornate silver armor, polished and gleaming under radiant sunlight, riding a gigantic shimmering koi fish flying through the sky. The koi’s scales glow iridescent orange and gold, casting reflections across drifting lotus ponds suspended in the clouds. Red and golden paper lanterns float gently, their light flickering against a backdrop of endless pastel skies. Epic cinematic composition, ultra-detailed textures, dreamlike atmosphere, fantasy realism, 8K, volumetric lighting, high dynamic range.

A lively street soccer scene in a colorful urban neighborhood with pale blue and yellow buildings in mediterranean background. A young Iranian boy wearing a bright yellow and green soccer jersey with navy shorts, is skillfully juggling a soccer ball mid-air, his face full of concentration and energy. He wears black sneakers and white socks. Behind him, pastel-colored buildings with balconies, murals, and some graffiti. Parked cars line the sides of the street, but the road is open and vibrant with play. The atmosphere is dynamic and joyful, capturing the essence of childhood street soccer in a Latin or Southern European city. The lighting is warm and natural, suggesting late afternoon or early evening, with a cinematic depth of field that emphasizes the main boy in action while softly blurring the background buildings.

Portrait photograph of a young woman lying on her stomach on a tropical beach, wearing a white crochet bikini, gold bracelets and rings, and a delicate necklace, long brown hair loose over her shoulders. She rests on her forearms with legs bent upward, eyes looking at the viewer smile. The sand is light and fine. Some sand on her stomach. Turquoise waves roll gently in the background under a bright blue sky with scattered clouds. Midday sunlight, soft shadows, warm tones, high detail, sharp focus, natural skin texture, vibrant colors, shallow depth of field, bokeh, professional beach photography, shot on a 50mm lens, cinematic composition. It’s windy. realistic skin texture.

A lone samurai in crimson lacquered armor battling a towering origami dragon inside an ancient library. The dragon’s folded paper wings crackle with energy, scattering thousands of glowing calligraphy symbols into the air. Books fly like birds from towering bookshelves, pages tearing into stormy paper clouds that swirl above. Dramatic chiaroscuro lighting, ultra-detailed Japanese fantasy, cinematic still, hyperrealistic textures, 8K, intricate atmosphere.

ComfyUI Colab Notebook

If you don’t have a powerful GPU card, you can still run the Qwen Image model on Google Colab, Google’s cloud computing platform, with my ComfyUI notebook.

You don’t need to download the model as instructed below. Select the Qwen_Image model before running the notebook.

Qwen Image workflow (Fast)

This workflow uses Qwen Image Lightning LoRA to speed up the workflow. The LoRA converts the model to the distilled version.

Step 0: Update ComfyUI

Before loading the workflow, make sure your ComfyUI is up-to-date. The easiest way to do this is to use ComfyUI Manager.

Click the Manager button on the top toolbar.

Select Update ComfyUI.

Restart ComfyUI.

Step 1: Load the workflow

Download the workflow below. Drop it into ComfyUI to load.

Download

Step 2: Install models

After loading the workflow JSON file, ComfyUI should prompt you to download the missing model files.

Here are the models you need to download:

Download qwen_image_fp8_e4m3fn.safetensors and put it in ComfyUI > models > diffusion_models.
Download qwen_2.5_vl_7b_fp8_scaled.safetensors and put it in ComfyUI > models > text_encoders.
Download qwen_image_vae.safetensors and put it in ComfyUI > models > vae.
Download Qwen-Image-Lightning-8steps-V2.0.safetensors and put it in ComfyUI > models > loras.

Step 3: Revise the prompt

Revise the prompt to match what you want to generate.

Step 4: Generate an image

Click the Run button to run the workflow.

Change the seed value of the KSampler node to generate a new image.

Qwen Image workflow (Standard)

This standard Qwen Image workflow does not use any speed-ups, so the image generation is a bit slower, but the image quality is slightly higher.

You can use the workflow the same way as described in the last section. Download the workflow below.

Download

7 comments

chicken says:

December 24, 2025 at 6:29 am

Thanks for another excellent article Andrew.
QI’s party pieces for me are:
1. its ability to correctly render more than one character, providing you relentlessly use names or labels rather than pronouns to refer to them.
2. its understanding of actions and human body with simple; “girlA threw the ball” actually appears as if she has indeed just thrown the ball rather then the subject and object not being aware of each other.
3. its speed on a 4060Ti 16GB. Distilled or Lighning Lora: ~90s, otherwise its ~240s.
Last note: If you sit back, think hard, then prompt QI like a similarly sized parameter model (gpt-oss:20B in my case) it can do some really amazing stuff.

Alexander says:

November 22, 2025 at 8:04 pm

Hi Andrew.
I have used your courses to great satisfaction and they are completely worth the membership. I am a visual artist who uses AI for poses and inspiration about colour palettes, much as I once did with photography but with less hassle. I am not tech savvy in coding matters, although I understand the general structure of generative AI models.
Now I find myself with a problem. I use ComfyUI with Stability Matrix on Mac Silicon, and have been happy with every model, including FluxDev (I have a lot of RAM/VRAM and a nice graphic card). But I am trying to test Qwen, and I hit a wall.
Every image I try to generate takes up to 30 minutes to finish and ends up black. There is no trace of visible denoising process like with all the other models.
I have searched the internet and found that black images are often a problem with ComfyUI and Qwen, but the possible fixes are beyond my understanding, as most refer to ComfyUI installed on its own and not through a manager, and the main environment is Windows.
Do you have suggestions?

1. Andrew says:
  
  November 24, 2025 at 9:44 am
  
  Not sure if Qwen on ComfyUI is well supported on Mac. 30 mins is definitely way too long. You can try the colab notebook if you are ok with using it on google.
  
David Rawlins says:

October 2, 2025 at 11:10 am

Thanks, Andrew. I agree Qwen is a great model and is particularly good at following the prompt and at text. There seems to be a big difference with or without the lightning lora (with your settings) – the style and level of detail changes a lot. I found the sampler and scheduler combinations also made a big difference with euler and res-multistep with sgm-uniform and beta working well. Some schedulers don’t work at all like ddim and kl-optimal, I guess because of the way Qwen works. So there’s a lot to experiment with. It’s quite fast on Colab L4 – the normal version takes 2.5 mins and the lightning version just over a minute. I used aspect ratios based on 1344×1344 as that seems to be recommended. My only criticism so far is that with a particular prompt, characters look very similar which is fine if you want consistency but it’s not easy to get variety.

1. Andrew says:
  
  October 4, 2025 at 11:45 am
  
  Thanks for sharing your experience!
  
  1. David Rawlins says:
    
    October 18, 2025 at 10:39 am
    
    An update to this. DDIM is in fact a very good sampler for Qwen but I hadn’t realised to pair it with the ddim-uniform scheduler – doh. I find this combination gives very good results with or without the lightning lora. Without the lora, I’ve found increasing the cfg to 4-5 gives better prompt adherence, though going above 5 causes increased contrast and/or artefacts.
    
    1. Andrew says:
      
      October 18, 2025 at 1:28 pm
      
      Thanks for sharing. DDIM is not that different from Euler/Euler a. I am curious if they work as well.