Dimitri von Rütte (@dvruette) / X

1,494 posts

Dimitri von Rütte

@dvruette

AI/ML research. prev. PhD @ETH_en, ML engineer @DeepJudgeAI

Joined January 2023

Pinned
Dimitri von Rütte
@dvruette
Feb 27
there, I said it. diffusion LLMs are the future! I'll be back in a couple of years to collect my "I told you so" award.
215K
Dimitri von Rütte
@dvruette
Mar 10, 2025
🚨 NEW PAPER DROP! Wouldn't it be nice if LLMs could spot and correct their own mistakes? And what if we could do so directly from pre-training, without any SFT or RL? We present a new class of discrete diffusion models, called GIDD, that are able to do just that: 🧵1/12
144K
Dimitri von Rütte
@dvruette
Aug 5, 2025
gpt-oss is probably the most standard MoE transformer that ever was. Couple of details worth noting:
- Uses attention sinks (a.k.a. registers)
- Sliding window attention in every second layer
- YaRN context window extension
- RMSNorm without biases
- No QK norm, no attn. softcap
65K
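For readers unfamiliar with the "RMSNorm without biases" item, here is a minimal plain-Python sketch of the operation on a single feature vector. The function name `rms_norm`, the `eps` default, and the example values are illustrative assumptions, not taken from the gpt-oss code.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """Scale x to unit root-mean-square, then apply a per-feature gain.

    There is no mean subtraction and no additive bias term.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [g * v * inv_rms for v, g in zip(x, gain)]

# Illustrative input; eps=0.0 only to make the unit-RMS property exact here.
y = rms_norm([3.0, 4.0], gain=[1.0, 1.0], eps=0.0)
```

With a gain of all ones, the output has a root-mean-square of 1 regardless of the input scale.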
Dimitri von Rütte
@dvruette
Jul 20, 2023
🚨📜 Announcing FABRIC, a training-free method for using iterative feedback to improve the results of any Stable Diffusion model. Instead of spending hours to find the right prompt, just click 👍/👎 to tell the model what exactly you want. 🤗 Demo: huggingface.co/spaces/dvruett…
123K
Dimitri von Rütte
@dvruette
Aug 8, 2025
I feel like this completely flew under the radar despite being a huge deal for discrete diffusion models: DreamOn is a 7B dLLM that can do variable length generation, solving something that has been a huge challenge! The idea is clever: Let's just randomly insert <|delete|>
31K
Dimitri von Rütte
@dvruette
Apr 15, 2023
🚨 OpenAssistant has just been released! Dataset and trained models with near-ChatGPT quality are available for download to everyone. You can even try out our biggest model (based on LLaMA-30B) through a chat interface in your browser right now! open-assistant.io/chat
99K
Dimitri von Rütte
@dvruette
Aug 5, 2025
don't do bf16 kids, it's not worth the pain
35K
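The post does not say which bf16 problem is meant. One well-known issue, shown here purely as an assumption about the kind of pain involved, is that bf16 keeps only 8 bits of significand precision, so small updates added to a larger value can round away entirely. The sketch below emulates bf16 rounding with the standard library; `to_bf16` and `accumulate` are hypothetical helper names, and NaN/overflow handling is omitted.

```python
import struct

def to_bf16(x):
    """Round a float to the nearest bfloat16 value (round-to-nearest-even).

    bfloat16 is the top 16 bits of an IEEE float32, so we round the
    float32 bit pattern to 16 bits and zero the rest.
    Assumption: x is finite and within float32 range.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def accumulate(start, update, steps, use_bf16):
    """Add `update` to `start` repeatedly, optionally rounding to bf16 each step."""
    w = to_bf16(start) if use_bf16 else start
    for _ in range(steps):
        w = w + update
        if use_bf16:
            w = to_bf16(w)
    return w

# Near 1.0 the bf16 spacing is 2**-7, so an update of 0.001 is rounded away.
w_bf16 = accumulate(1.0, 0.001, 1000, use_bf16=True)
w_fp64 = accumulate(1.0, 0.001, 1000, use_bf16=False)
```

In this toy setup the bf16 value never leaves 1.0, while the full-precision value ends up close to 2.0.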
Dimitri von Rütte
@dvruette
Jul 23, 2023
✨ FABRIC plugin for SD WebUI is now available in alpha for testing. Check it out and let us know what you think! github.com/dvruette/sd-we… Also make sure to share your creations! We're excited to see what you talented folks out there can create with it ❤️
18K
Dimitri von Rütte
@dvruette
Dec 27, 2023
i've always wanted to do this... BarrelRec trains 10x FASTER than conventional QKV attention!! 🤯🤯🚀
32K
Dimitri von Rütte
@dvruette
Feb 23, 2024
🚨📜 Announcing our latest work on LLM interpretability: We are able to control a model's humor, creativity, quality, truthfulness, and compliance by applying concept vectors to its hidden neural activations. 🧵 arxiv.org/abs/2402.14433
15K
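The idea of "applying concept vectors to hidden activations" can be sketched in a few lines. The example below uses a difference of mean activations as the concept vector, which is one simple choice and an assumption here; the post does not say how the paper derives its vectors. All names (`concept_vector`, `steer`, `alpha`) and the toy activations are hypothetical.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def concept_vector(pos_acts, neg_acts):
    """Difference of means between activations with and without the concept."""
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    return [p - q for p, q in zip(pos_mean, neg_mean)]

def steer(hidden, vec, alpha):
    """Shift a hidden activation along the concept direction by strength alpha."""
    return [h + alpha * v for h, v in zip(hidden, vec)]

# Toy 2-dimensional activations, purely illustrative.
pos = [[1.0, 0.0], [3.0, 0.0]]   # examples exhibiting the concept
neg = [[0.0, 1.0], [0.0, 3.0]]   # examples lacking it
vec = concept_vector(pos, neg)
steered = steer([1.0, 1.0], vec, alpha=0.5)
```

A positive `alpha` pushes the activation toward the concept and a negative one pushes it away.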
Dimitri von Rütte
@dvruette
Jan 15, 2025
Weight decay is truly evil. It looks worse for the first 400k steps and then suddenly overtakes on the home stretch. But I can't even be mad, we've been warned about exactly this
14K
Dimitri von Rütte
@dvruette
Aug 7, 2025
Now that gpt-oss has made attention sinks all the rage again, I can't help but wonder why nobody is using attention bias, seemingly a strictly superior solution? Minimal overhead, no awkward extra tokens, easy to implement.
7.9K
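The post does not spell out either mechanism. As a rough illustration of one common sink formulation, assumed here rather than taken from the post, a learned per-head logit joins the softmax normalization but contributes no value, so a head can assign less than full weight to the real tokens. The function name `softmax_with_sink` and the scores are hypothetical.

```python
import math

def softmax_with_sink(scores, sink_logit):
    """Softmax over `scores` with an extra sink logit in the denominator.

    The returned weights cover only the real tokens and sum to less than 1;
    the missing mass is what the sink absorbs.
    """
    m = max(max(scores), sink_logit)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps) + math.exp(sink_logit - m)
    return [e / denom for e in exps]

# Two equally scored tokens with a sink logit of the same size.
weights = softmax_with_sink([0.0, 0.0], sink_logit=0.0)

# A sink logit of -inf recovers the ordinary softmax.
plain = softmax_with_sink([0.0, 0.0], sink_logit=float("-inf"))
```

In the first call each token gets weight 1/3 and the sink absorbs the remaining third; in the second the weights are the usual 1/2 each.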
Dimitri von Rütte
@dvruette
Oct 17, 2024
Cold take: Diffusion models are just hierarchical VAEs with a fixed, pre-defined encoder.
15K
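The claim can be read off the standard variational bound for a model with latents x_1, ..., x_T. The notation below follows the usual diffusion setup and is an assumption, since the post gives no formulas: the "encoder" q is the fixed noising process with no learnable parameters, and only the "decoder" p_θ is trained.

```latex
\log p_\theta(x_0)
  \;\ge\;
  \mathbb{E}_{q(x_{1:T} \mid x_0)}
  \left[ \log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)} \right],
\qquad
q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}),
\qquad
p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t).
```

This is the same evidence lower bound used for a hierarchical VAE; the difference is that a VAE would also learn q.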
Dimitri von Rütte
@dvruette
Aug 6, 2025
TIL that computing the median on GPU is really fast but excruciatingly slow on TPU. the more you know!
5.9K