sway (@SwayStar123) / X

1,647 posts

sway

@SwayStar123

Vegan btw, working on diffusion models. Can speak Japanese, currently learning Chinese. Building languagespeedrun.com, working at @LumaLabsAI

Joined July 2015

Pinned
sway
@SwayStar123
Dec 16, 2025
Speedrunning ImageNet Diffusion - 360x faster training
There have been many new techniques demonstrating convergence speedups compared to DiT in the past few years; however, all of these have been studied in isolation, against increasingly outdated baselines. I present SR-DiT
82K
sway
@SwayStar123
Sep 9, 2025
You can implement this paper with 2 lines of code:
cfm_target = torch.roll(flow_target, shifts=1, dims=0)
cfm_loss = -((model_output - cfm_target) ** 2).mean() * λ
(Official impl is 60 lines btw)
Papers of the day
@ArxivToday
Jun 6, 2025
New paper: Contrastive Flow Matching - a simple trick that makes diffusion models generate better images with 5x fewer steps and train 9x faster. No extra compute needed, just smarter training. 🧵
98K
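A minimal sketch of how the two lines from the Sep 9 post might slot into a training loss. This is not the official implementation. It assumes the contrastive term is added to an ordinary flow matching MSE, that `model_output` and `flow_target` share a batch-first shape, and that `lam` stands in for λ. The function name and the default value 0.05 are illustrative placeholders.

```python
import torch


def contrastive_flow_matching_loss(model_output, flow_target, lam=0.05):
    """Flow matching MSE plus a contrastive term (sketch, not the official code).

    lam is a hypothetical name for the λ weight; 0.05 is a placeholder value.
    """
    # Standard flow matching term: regress each sample onto its own target.
    fm_loss = ((model_output - flow_target) ** 2).mean()

    # Negative targets: shift the batch by one along dim 0 so each sample
    # is paired with the target of a different sample in the batch.
    cfm_target = torch.roll(flow_target, shifts=1, dims=0)

    # Contrastive term: the minus sign rewards being far from the
    # mismatched target, scaled by lam.
    cfm_loss = -((model_output - cfm_target) ** 2).mean() * lam

    return fm_loss + cfm_loss


torch.manual_seed(0)
model_output = torch.randn(4, 3, 8, 8)  # stand-in for a network's velocity prediction
flow_target = torch.randn(4, 3, 8, 8)   # stand-in for the flow matching target
loss = contrastive_flow_matching_loss(model_output, flow_target)
```

With a batch size of 1, `torch.roll` pairs each sample with its own target, so the contrastive term only rescales the MSE. The rolled batch can also pair two samples that share a class label, which this sketch does not filter out.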
sway
@SwayStar123
Oct 22, 2025
Another one. 4 different papers coincidentally discovered the same thing at the same time
sway
@SwayStar123
Oct 20, 2025
Third paper to do this now lol: "LATENT DIFFUSION MODEL WITHOUT VARIATIONAL AUTOENCODER". Using dino features and a residual connection to make a stronger decoder, and diffuse in dino feature space
95K
sway
@SwayStar123
Jun 3, 2025
TLDR, you can speed up training by initializing some layers with dinov2 layers instead of random weights
32K
sway
@SwayStar123
May 16, 2025
Diffusion model trained for $300 of compute. Maybe SOTA with $10k is possible
32K
sway
@SwayStar123
May 2, 2025
Replying to @wmhuo168
is this like the quadrillion petabyte internet cables that come out every other year that we're never gonna get
14K
sway
@SwayStar123
Aug 1, 2025
Replying to @gabriel1
just started straightening my back whenever i noticed it, only took a few months for it to be the default
14K
sway
@SwayStar123
Aug 4, 2025
Paper by ByteDance, improves upon MeanFlow by removing the need for JVP calculation
15K
sway
@SwayStar123
Oct 11, 2025
SOTA FID on imagenet is now an AR model!
21K
sway
@SwayStar123
Nov 18, 2025
researchers really wasting no time these days
23K
sway
@SwayStar123
Oct 6, 2025
Near REPA performance without any external model alignment. Paper name: No alignment needed for generation: Learning linearly separable representations in diffusion models
7.6K
sway
@SwayStar123
Sep 8, 2025
This is missing A LOT of what I've done in the past, but I'm gonna start logging my diffusion experiments in a blog post now too
swaystar123.github.io
My Diffusion Experiments
Here I note down some experiments I tried with diffusion models in the past or that I plan on conducting (if i get the compute for it)
8.8K
sway
@SwayStar123
Apr 7, 2025
Replying to @rubyrangerr and @ID_AA_Carmack
Yes of course, we must make everything more inefficient for the sake of more jobs!! Stop using alarm clocks, you are taking away the job of knocker uppers!!!
2.4K