TimDarcet (@TimDarcet)
1,439 posts
codegen @ FAIR, prev. DINO stuff @ INRIA & FAIR
Joined March 2021
804 Following · 4,531 Followers
  • Pinned
    TimDarcet
    @TimDarcet
    Apr 21, 2023
    1/ This week we released DINOv2: a series of general vision encoders pretrained without supervision. Good out-of-the-box performance on a variety of domains, matching or surpassing other publicly available encoders.
    Image
    125K
  • TimDarcet
    @TimDarcet
    Sep 29, 2023
    Vision transformers need registers! Or at least, it seems they 𝘸𝘢𝘯𝘵 some… ViTs have artifacts in attention maps. It’s due to the model using these patches as “registers”. Just add new tokens (“[reg]”):
    - no artifacts
    - interpretable attention maps 🦖
    - improved performance!
    Image
    467K
  • TimDarcet
    @TimDarcet
    Jan 7, 2025
    Thanks python, very helpful
    Image
    893K
  • TimDarcet
    @TimDarcet
    Apr 2, 2025
    "Massive activations in LLMs" is the paper you need and that everyone should read
    Image
    Image
    Image
    Seunghyun Seo
    @SeunghyunSEO7
    Apr 2, 2025
    what happens in the residual stream of gemma3? l2 norm of activation explodes at the end of every transformer block after x=x+res. key architectural difference between gemma2 and 3 is softcapping vs qknorm. 1b is not even multimodal (fig reps gemma2-2b vs 3-1b). what's wrong?
    68K
  • TimDarcet
    @TimDarcet
    Feb 14, 2025
    Want strong SSL, but not the complexity of DINOv2? CAPI: Cluster and Predict Latent Patches for Improved Masked Image Modeling.
    Image
    161K
  • TimDarcet
    @TimDarcet
    Oct 27, 2023
    DINOv2+registers=♥️ We are releasing code and checkpoints for DINOv2 augmented with registers and a slightly better training recipe. No more of those pesky artifacts! Simple one-liner, try it out:
    dinov2_vitg14_reg = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitg14_reg')
    Image
    72K
  • Is there a good reason we use softmax losses in contrastive learning, instead of just doing MSE?
    ie L = ||x_i - x_i'||² - λ Σ_k ||x_i - x_k'||²
    I'd guess the optimization dynamics are maybe friendlier, but does anyone have a good pointer? Both for CLIP and SSL btw
    110K
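For readers who want to poke at the question in the post above, here is a minimal sketch in plain Python comparing the MSE-style objective written there with a softmax (InfoNCE-style) loss. Several things are assumptions and not from the post: the function names, the default `lam` and `temperature`, excluding k = i from the negative sum, averaging over the batch, and using negative squared distance as the softmax logit (for unit-norm embeddings this equals 2·(cosine similarity) − 2, so it matches the usual similarity up to scale and shift).

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def mse_contrastive_loss(x, x_prime, lam=0.1):
    """MSE-style loss: ||x_i - x_i'||^2 - lam * sum_{k != i} ||x_i - x_k'||^2.

    Averaged over the batch. Excluding k == i is an assumption.
    """
    n = len(x)
    total = 0.0
    for i in range(n):
        pos = sq_dist(x[i], x_prime[i])  # pull the positive pair together
        neg = sum(sq_dist(x[i], x_prime[k]) for k in range(n) if k != i)
        total += pos - lam * neg  # push negatives apart, with no upper limit
    return total / n


def softmax_contrastive_loss(x, x_prime, temperature=1.0):
    """Softmax (InfoNCE-style) loss with logits -||x_i - x_k'||^2 / temperature.

    Cross-entropy where the correct "class" for x_i is its own positive x_i'.
    """
    n = len(x)
    total = 0.0
    for i in range(n):
        logits = [-sq_dist(x[i], x_prime[k]) / temperature for k in range(n)]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / n


if __name__ == "__main__":
    # Toy batch: two views that match exactly, at two different scales.
    near = [[0.0, 0.0], [1.0, 0.0]]
    far = [[0.0, 0.0], [10.0, 0.0]]
    for name, batch in (("near", near), ("far", far)):
        print(name,
              mse_contrastive_loss(batch, batch),
              softmax_contrastive_loss(batch, batch))
```

On a toy batch like this, spreading the embeddings apart keeps lowering the MSE-style loss without bound, because the negative term is unbounded, while the softmax loss stays at or above zero. That may be one part of why the softmax form behaves better in practice, though this sketch doesn't settle the question.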
  • TimDarcet
    @TimDarcet
    Feb 14, 2025
    Also: yes, it's a JEPA. Yes, you hated on @ylecun , but he was right. Yes, as usual
    TimDarcet
    @TimDarcet
    Feb 14, 2025
    Want strong SSL, but not the complexity of DINOv2? CAPI: Cluster and Predict Latent Patches for Improved Masked Image Modeling.
    Image
    87K
  • TimDarcet
    @TimDarcet
    Aug 22, 2025
    Qq has anyone ever seen the best AI researcher and the best sion euw in the same room because if not guys I've got a theory
    Image
    Image
    34K
  • Funniest bug of my phd: model loses 1 point if pretrain and eval use different conda env
    The difference was libjpeg vs libjpeg-turbo
    iiuc the jpeg algo is not entirely standardized (wtf?) and libjpeg != libjpeg-turbo
    Tiny differences in decoding artifacts caused a 1 point drop!
    vik
    @vikhyatk
    Mar 17, 2025
    if you train a model exclusively on JPEG images, will performance drop on other image file formats?
    25K
  • TimDarcet
    @TimDarcet
    Jul 15, 2024
    Still not sure why the ML community adopted conda instead of plain old virtualenv
    59K
  • TimDarcet
    @TimDarcet
    Oct 19, 2024
    Alright actual serious post. Lingua := super simple codebase + torch.compile for speed --> clean, hackable, but still efficient *It can train a 7B >llama2 in 24h*. Crazy. If you got the gpus, not only can you train a good 7B, you can *iterate* on it. You can do *research*
    Image
    TimDarcet
    @TimDarcet
    Oct 18, 2024
    🚨 RELEASE ALERT ‼️ github.com/facebookresear… THIS CHANGES EVERYTHING $META just dropped a game-changing codebase! Now everyone can do LLM research! 😱 🧵10 best things people are already building with lingua 🔥👇
    49K
  • TimDarcet
    @TimDarcet
    Apr 22, 2025
    I did not realize people used frameworks for simple distributed trainings.
    Tip: for 80% of trainings you just need DDP, and it's trivial to set up.
    For the rest go with fsdp (either pytorch fsdp2 or the single-file fsdp in the CAPI repo)
    Ben (no treats)
    @andersonbcdefg
    Apr 20, 2025
    wait. distributed training with pure pytorch is not that bad. why did we all collectively get gaslit into using accelerate...
    37K
  • TimDarcet
    @TimDarcet
    Feb 26, 2024
    Mistral's "Le Chat" logo is a design masterclass The two dots make a smol cat
    Image
    17K
