TimDarcet (@TimDarcet) / X

TimDarcet

1,439 posts


codegen @ FAIR, prev. DINO stuff @ INRIA & FAIR

Joined March 2021

Pinned
TimDarcet
@TimDarcet
Apr 21, 2023
1/ This week we released DINOv2: a series of general vision encoders pretrained without supervision. Good out-of-the-box performance on a variety of domains, matching or surpassing other publicly available encoders.
TimDarcet
@TimDarcet
Sep 29, 2023
Vision transformers need registers! Or at least, it seems they 𝘸𝘢𝘯𝘵 some…
ViTs have artifacts in attention maps. It’s due to the model using these patches as “registers”.
Just add new tokens (“[reg]”):
- no artifacts
- interpretable attention maps 🦖
- improved performance!
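As a rough illustration of "just add new tokens", here is a minimal sketch, assuming NumPy arrays stand in for learnable parameters and the transformer blocks themselves are omitted. Register tokens are concatenated into the input sequence and their outputs are simply dropped at the end. The token order ([CLS], registers, patches) and the helper names `add_registers` / `split_outputs` are assumptions for illustration, not the DINOv2 code.

```python
import numpy as np


def add_registers(patch_tokens, cls_token, registers):
    """Build the input sequence [CLS] + R register tokens + N patch tokens.

    patch_tokens: (N, D) patch embeddings
    cls_token:    (D,)   class token
    registers:    (R, D) extra tokens (learnable in a real model, plain arrays here)
    """
    return np.concatenate([cls_token[None, :], registers, patch_tokens], axis=0)


def split_outputs(tokens, num_registers):
    """Split the transformer output; register outputs are discarded."""
    cls_out = tokens[0]
    patch_out = tokens[1 + num_registers:]
    return cls_out, patch_out


rng = np.random.default_rng(0)
D, N, R = 8, 16, 4
patches = rng.normal(size=(N, D))
cls = rng.normal(size=D)
regs = rng.normal(size=(R, D))

seq = add_registers(patches, cls, regs)
print(seq.shape)  # (21, 8)
# ... transformer blocks would process `seq` here (identity in this sketch) ...
cls_out, patch_out = split_outputs(seq, R)
print(cls_out.shape, patch_out.shape)  # (8,) (16, 8)
```

The point of the design is that registers give the model scratch space for global computation, so it no longer repurposes low-information patch tokens for it; downstream heads only ever see the CLS and patch outputs.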
TimDarcet
@TimDarcet
Jan 7, 2025
Thanks python, very helpful
TimDarcet
@TimDarcet
Apr 2, 2025
"Massive activations in LLMs" is the paper you need and that everyone should read
Seunghyun Seo
@SeunghyunSEO7
Apr 2, 2025
What happens in the residual stream of Gemma 3? The L2 norm of the activations explodes at the end of every transformer block, after x = x + res. The key architectural difference between Gemma 2 and 3 is softcapping vs QK-norm. The 1B isn't even multimodal (figure shows gemma2-2b vs gemma3-1b). What's wrong?
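The symptom described above can be inspected with a few lines: compute the L2 norm of each token's residual-stream vector after every block and compare the maximum with the median. This is a hedged sketch on synthetic data. It assumes you have already captured the per-layer hidden states as arrays (hooking them out of Gemma is not shown), and the huge value injected at token 0 is a toy stand-in for a "massive activation", not real model output. `residual_norm_stats` is a hypothetical helper.

```python
import numpy as np


def residual_norm_stats(hidden_states):
    """Per-layer statistics of token L2 norms in the residual stream.

    hidden_states: list of (T, D) arrays, one per layer, e.g. the residual
    stream x captured after each `x = x + res` update.
    Returns a list of (median_norm, max_norm, argmax_token) tuples.
    """
    stats = []
    for h in hidden_states:
        norms = np.linalg.norm(h, axis=-1)  # (T,) L2 norm of each token
        stats.append((float(np.median(norms)), float(norms.max()), int(norms.argmax())))
    return stats


rng = np.random.default_rng(0)
T, D, L = 32, 64, 4
layers = [rng.normal(size=(T, D)) for _ in range(L)]
# Toy "massive activation": one huge feature on token 0 in the later layers
for h in layers[2:]:
    h[0, 5] = 1000.0

for i, (med, mx, tok) in enumerate(residual_norm_stats(layers)):
    print(f"layer {i}: median={med:.1f} max={mx:.1f} token={tok} ratio={mx / med:.1f}")
```

A max/median ratio in the hundreds, concentrated on one or two fixed tokens, is the pattern to look for; a uniform growth of all token norms across depth would point to something else.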
TimDarcet
@TimDarcet
Feb 14, 2025
Want strong SSL, but not the complexity of DINOv2? CAPI: Cluster and Predict Latent Patches for Improved Masked Image Modeling.
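The title names the general recipe: cluster the latent representations of patches, then train a predictor to output the cluster assignment of masked patches. Below is a minimal sketch of only that generic idea, assuming soft assignments by temperature-scaled cosine similarity to fixed centroids and a cross-entropy over masked positions. It is not CAPI's actual method: the teacher, clustering procedure, predictor and hyperparameters of the paper are not reproduced, and every name and value here (`cluster_targets`, `masked_cluster_loss`, `temperature=0.1`) is hypothetical.

```python
import numpy as np


def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cluster_targets(teacher_latents, centroids, temperature=0.1):
    """Soft assignment of each (N, D) teacher patch latent to K centroids (K, D)."""
    logits = l2_normalize(teacher_latents) @ l2_normalize(centroids).T / temperature
    return np.exp(log_softmax(logits))  # (N, K), each row sums to 1


def masked_cluster_loss(pred_logits, targets, mask):
    """Cross-entropy between predicted cluster logits (N, K) and targets (N, K),
    averaged over the masked patches only (mask: (N,) bool)."""
    ce = -(targets * log_softmax(pred_logits)).sum(axis=-1)
    return float(ce[mask].mean())


rng = np.random.default_rng(0)
N, D, K = 16, 8, 4
teacher = rng.normal(size=(N, D))  # stand-in for a teacher's patch latents
centroids = rng.normal(size=(K, D))  # stand-in for learned cluster centers
mask = np.zeros(N, dtype=bool)
mask[::2] = True  # pretend every other patch was hidden from the student
targets = cluster_targets(teacher, centroids)
pred_logits = rng.normal(size=(N, K))  # stand-in for the predictor's output
print(masked_cluster_loss(pred_logits, targets, mask))
```

Predicting a distribution over clusters, rather than regressing the raw latent, turns the target into a classification problem; the loss is minimized exactly when the predicted distribution matches the target assignment.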
TimDarcet
@TimDarcet
Oct 27, 2023
DINOv2+registers=♥️ We are releasing code and checkpoints for DINOv2 augmented with registers and a slightly better training recipe. No more of those pesky artifacts! Simple one-liner, try it out: dinov2_vitg14_reg = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitg14_reg')
Is there a good reason we use softmax losses in contrastive learning, instead of just doing MSE? I.e. $L = \lVert x_i - x_i' \rVert^2 - \lambda \sum_k \lVert x_i - x_k' \rVert^2$
I'd guess the optimization dynamics are maybe friendlier, but does anyone have a good pointer? Both for CLIP and SSL btw
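To make the comparison in the question concrete, here is a minimal sketch that computes both losses on toy embeddings: the MSE-style loss written above and a standard softmax (InfoNCE) contrastive loss. Assumptions: embeddings are L2-normalized, the negative sum runs over k ≠ i, and `lam=0.1` and `temperature=0.1` are arbitrary illustrative values; the function names are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def mse_contrastive_loss(x, x_prime, lam=0.1):
    """Mean over i of ||x_i - x_i'||^2 - lam * sum_{k != i} ||x_i - x_k'||^2."""
    # Pairwise squared distances d[i, k] = ||x_i - x_k'||^2, shape (B, B)
    d = ((x[:, None, :] - x_prime[None, :, :]) ** 2).sum(-1)
    pos = np.diag(d)
    neg = d.sum(axis=1) - pos  # sum over negatives k != i
    return float((pos - lam * neg).mean())


def info_nce_loss(x, x_prime, temperature=0.1):
    """Softmax (InfoNCE) loss: each x_i must pick its positive x_i' among all x_k'."""
    logits = x @ x_prime.T / temperature  # (B, B) cosine similarities / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
B, D = 8, 16
x = l2_normalize(rng.normal(size=(B, D)))
x_aligned = l2_normalize(x + 0.05 * rng.normal(size=(B, D)))  # near-copies of x
x_random = l2_normalize(rng.normal(size=(B, D)))  # unrelated "views"

print(info_nce_loss(x, x_aligned), info_nce_loss(x, x_random))
print(mse_contrastive_loss(x, x_aligned), mse_contrastive_loss(x, x_random))
```

One commonly cited difference: with unit-norm embeddings, ||a - b||^2 = 2 - 2 a·b, so the MSE-style loss is linear in the similarities and pushes every negative equally, while the gradient of the softmax loss on each negative logit equals that negative's softmax probability, so hard negatives dominate. Without normalization, the -λ term can also be driven arbitrarily low just by scaling embeddings up.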
TimDarcet
@TimDarcet
Feb 14, 2025
Also: yes, it's a JEPA. Yes, you hated on @ylecun, but he was right. Yes, as usual.
TimDarcet
@TimDarcet
Feb 14, 2025
Want strong SSL, but not the complexity of DINOv2? CAPI: Cluster and Predict Latent Patches for Improved Masked Image Modeling.
TimDarcet
@TimDarcet
Aug 22, 2025
Qq: has anyone ever seen the best AI researcher and the best Sion EUW in the same room? Because if not, guys, I've got a theory
Funniest bug of my PhD: the model loses 1 point if pretraining and eval use different conda envs.
The difference was libjpeg vs libjpeg-turbo. IIUC the JPEG algorithm is not entirely standardized (wtf?), and libjpeg != libjpeg-turbo.
Tiny differences in decoding artifacts caused a 1-point drop!
vik
@vikhyatk
Mar 17, 2025
if you train a model exclusively on JPEG images, will performance drop on other image file formats?
TimDarcet
@TimDarcet
Jul 15, 2024
Still not sure why the ML community adopted conda instead of plain old virtualenv
TimDarcet
@TimDarcet
Oct 19, 2024
Alright, actual serious post.
Lingua := super simple codebase + torch.compile for speed --> clean, hackable, but still efficient.
*It can train a 7B that beats Llama 2 in 24h*. Crazy.
If you've got the GPUs, not only can you train a good 7B, you can *iterate* on it. You can do *research*.
TimDarcet
@TimDarcet
Oct 18, 2024
🚨 RELEASE ALERT ‼️ github.com/facebookresear… THIS CHANGES EVERYTHING $META just dropped a game-changing codebase! Now everyone can do LLM research! 😱 🧵10 best things people are already building with lingua 🔥👇
TimDarcet
@TimDarcet
Apr 22, 2025
I did not realize people used frameworks for simple distributed training. Tip: for 80% of training runs you just need DDP, and it's trivial to set up. For the rest, go with FSDP (either PyTorch FSDP2 or the single-file FSDP in the CAPI repo).
Ben (no treats)
@andersonbcdefg
Apr 20, 2025
wait. distributed training with pure pytorch is not that bad. why did we all collectively get gaslit into using accelerate...
TimDarcet
@TimDarcet
Feb 26, 2024
Mistral's "Le Chat" logo is a design masterclass. The two dots make a smol cat.