Introducing Ideogram 4.0: the best open image model in the world.
Think it. Make it. Own it.
Download the weights, fine-tune on your own data, and run it on your hardware. Live on every Ideogram plan and the API today.
Ideogram V4 Trainer is now live on fal!
Fine-tune Ideogram V4 on your own images with LoRA
Teach it a custom style or character while keeping Ideogram's clean text rendering
Train, then generate straight away with the matching LoRA inference endpoint
Our lab built the highest-quality quantization for running Ideogram 4 on consumer GPUs. Our Q4_K build outperforms the standard NF4 baseline in both image and text quality at the exact same 10.4 GB size, while our INT8 matches the uncompressed FP8 ceiling. 🧵👇 @ideogram_ai
New model. New brand. The start of a new era for Ideogram.
Ideogram 4.0 launched with open weights, and we felt it was the right time to rethink our brand as a whole. Thank you How&How for bringing our new vision to life.
What do you think of our new logo?
The best way to bring the composition from your head into an image → Ideogram V4 + drawing bounding boxes in Comfy.
The control here is unusual: the model takes structured JSON as input, so drawing bounding boxes to get exact placement works very well.
The model only needs 12
The result is the best open-weight image model available, closing the gap with closed-source foundation models.
This is a 9.3B-parameter model, far from scaling limits. We expect further gains as we scale.
Ideogram 4.0 is designed to take structured JSON as input rather than unstructured text.
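To give a feel for what a structured prompt could look like, here is a rough sketch in Python. The actual Ideogram 4.0 schema is not shown in this thread, so every field name here (`scene`, `elements`, `type`, `content`, `bbox`) and the `make_element` helper are hypothetical, as is the choice of normalized 0-to-1 coordinates for bounding boxes.

```python
import json

# NOTE: all field names below are hypothetical placeholders.
# The real Ideogram 4.0 JSON schema is not given in this text.

def make_element(kind, content, bbox):
    """Build one layout element; bbox is (x, y, width, height) in 0..1."""
    x, y, w, h = bbox
    inside = x >= 0 and y >= 0 and w > 0 and h > 0 and x + w <= 1 and y + h <= 1
    if not inside:
        raise ValueError(f"bbox {bbox} falls outside the unit canvas")
    return {"type": kind, "content": content, "bbox": [x, y, w, h]}

# A prompt described as data: one overall scene plus placed elements.
prompt = {
    "scene": "a coffee shop poster in flat illustration style",
    "elements": [
        make_element("text", "OPEN DAILY", (0.1, 0.05, 0.8, 0.2)),
        make_element("object", "a steaming cup of coffee", (0.3, 0.35, 0.4, 0.5)),
    ],
}

# Serialize to the JSON string that would be sent as the prompt.
payload = json.dumps(prompt, indent=2)
print(payload)
```

The point of the sketch is only that placement becomes explicit data instead of something the model has to infer from a sentence.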
To make that easy, we've built a prompt-enhancement model that turns natural language into our JSON format. It's free in our API, and we've open-sourced a system prompt so you can get the
As the text encoder we use Qwen3-VL-8B, a vision-language model — so the same encoder can also take images for editing, not just text.
Our DiT consumes hidden states from 13 intermediate layers concatenated along the feature dimension, instead of a single hidden state.
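A toy sketch of what "concatenated along the feature dimension" means, in plain Python. Only the layer count of 13 comes from the text. The sequence length, hidden width, and the `fake_hidden_state` stand-in are assumptions for illustration, and which layers are tapped or what happens after concatenation is not stated here.

```python
# Toy sizes for illustration only; not the real model's dimensions.
NUM_LAYERS = 13  # number of tapped encoder layers (from the text)
SEQ_LEN = 4      # prompt tokens (assumed)
HIDDEN = 3       # per-layer hidden width (assumed)

def fake_hidden_state(layer, seq_len, hidden):
    """Hypothetical stand-in for one layer's output: seq_len x hidden floats."""
    return [[float(layer * 100 + t * 10 + d) for d in range(hidden)]
            for t in range(seq_len)]

def concat_features(states):
    """Join per-layer states along the feature axis, token by token."""
    seq_len = len(states[0])
    if any(len(s) != seq_len for s in states):
        raise ValueError("all layers must share the same sequence length")
    # For each token, append that token's vector from every layer in order.
    return [[v for s in states for v in s[t]] for t in range(seq_len)]

states = [fake_hidden_state(layer, SEQ_LEN, HIDDEN) for layer in range(NUM_LAYERS)]
conditioning = concat_features(states)

# Sequence length is unchanged; feature width grows to NUM_LAYERS * HIDDEN.
print(len(conditioning), len(conditioning[0]))  # prints "4 39"
```

Each token keeps its position, but its vector now carries features from all 13 layers side by side instead of from a single layer.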