Xinlei Chen
125 posts
Xinlei Chen
@endernewton
Multimodal understanding & generation @xAI
CA, US
xinleic.xyz
Joined July 2011
879 Following
3,536 Followers
  • Xinlei Chen
    @endernewton
    Dec 4, 2024
    I am looking for an intern to do research together next summer. Possible topics: representation learning, network architecture, and in general understanding what's going on :P. Please apply (metacareers.com/jobs/532549086…) and email me ([email protected]) if interested.
    46K
  • Xinlei Chen
    @endernewton
    Jun 27, 2025
    4th of July vibe with you :P
    Elon Musk
    @elonmusk
    Jun 27, 2025
    Grinding on @Grok all night with the @xAI team. Good progress. Will be called Grok 4. Release just after July 4th. Needs one more big run for a specialized coding model.
    12K
  • Xinlei Chen
    @endernewton
    Aug 7, 2025
    We are actively hiring for image/video understanding/generation, join us!
    Guodong Zhang
    @Guodzh
    Aug 6, 2025
    Join us to build next gen video gen and world models!!
    60K
  • Xinlei Chen
    @endernewton
    Jan 26, 2024
    Our serious look into diffusion models for representation learning. And NO — “diffusion” is just the cherry on the top, “denoising” (the “latent” noise) is the cake to take!
    AK
    @_akhaliq
    Jan 26, 2024
    Meta presents Deconstructing Denoising Diffusion Models for Self-Supervised Learning paper page: huggingface.co/papers/2401.14… examine the representation learning abilities of Denoising Diffusion Models (DDM) that were originally purposed for image generation. Our philosophy is to
    21K
  • Xinlei Chen
    @endernewton
    Nov 19, 2024
    Thanks @abursuc for sharing our work! Yes we find attention maps are almost* all you need from pre-trained ViTs. * Except when the data distribution shifts -- perhaps
    Andrei Bursuc
    @abursuc
    Nov 18, 2024
    Interesting work by @endernewton et al. studying how & what pretraining knowledge is transferred downstream. It seems that representations are less important than attention patterns that can guide students to learn good features from scratch w/ good perfs arxiv.org/abs/2411.09702
    76K
  • Xinlei Chen
    @endernewton
    Jul 8, 2024
    Very happy to see the TTT-series reaching yet another milestone! This time it serves as an inspiration for next-generation architecture post-Transformer, and by connecting TTT to Transformer, it can explain why (autoregressive) Transformers are so good at in-context learning!
    Xiaolong Wang
    @xiaolonw
    Jul 8, 2024
    Cannot believe this finally happened! Over the last 1.5 years, we have been developing a new LLM architecture, with linear complexity and expressive hidden states, for long-context modeling. The following plots show our model trained from Books scale better (from 125M to 1.3B)
    26K
  • Xinlei Chen
    @endernewton
    Jun 14, 2024
    Great finding from my former intern Kien: The inductive bias of *locality* is actually not as fundamental as we previously thought. Transformers can work *better* in quality by just treating images as an ordered set of pixels.
    AK
    @_akhaliq
    Jun 14, 2024
    Meta announces An Image is Worth More Than 16x16 Patches Exploring Transformers on Individual Pixels This work does not introduce a new method. Instead, we present an interesting finding that questions the necessity of the inductive bias -- locality in modern computer vision
    21K
  • Xinlei Chen
    @endernewton
    Aug 24, 2025
    Open source contribution from xAI!
    Elon Musk
    @elonmusk
    Aug 23, 2025
    The @xAI Grok 2.5 model, which was our best model last year, is now open source. Grok 3 will be made open source in about 6 months. huggingface.co/xai-org/grok-2
    9.6K
  • Xinlei Chen
    @endernewton
    Jan 18, 2025
    I was involved in @tokenpilled65B's project mid-way due to a shared interest in visual tokenization. Didn't contribute hands-on, but this work shares some of the (negative) learnings I had when trying to scale tokenizers -- summarized for quick read.
    Philippe Hansen-Estruch
    @tokenpilled65B
    Jan 17, 2025
    Excited to share my work at Meta! We explore scaling tokenizers w/ ViT (ViTok) & found scaling tokenizers with DiT generation pipeline doesn’t boost performance for the current paradigm of auto-encoders! We develop SOTA tokenizers for images/videos. Thread for findings
    9.6K
  • Xinlei Chen
    @endernewton
    Feb 28, 2024
    Fascinating and insightful work from @_mingjiesun @liuzhuang1234, taking a much deeper look at the "massive activations" inside LLMs, proposing hypotheses and verifying them as "biases" for attention, and they can appear in ViTs too!
    Zhuang Liu
    @liuzhuang1234
    Feb 28, 2024
    LLMs are great, but their internals are less explored. I'm excited to share very interesting findings in paper “Massive Activations in Large Language Models” LLMs have very few internal activations with drastically outsized magnitudes, e.g., 100,000x larger than others. (1/n)
    6.4K
  • Xinlei Chen
    @endernewton
    Jun 7, 2016
    End of an Era.
    americanair
    @AmericanAir
    Jun 6, 2016
    We're announcing new changes to our #AAdvantage program today. Learn more here: bit.ly/AADVUpdate2016
  • Xinlei Chen
    @endernewton
    Feb 7, 2025
    Replying to @_alex_kirillov_
    sounds fun!
    825
  • Xinlei Chen
    @endernewton
    Sep 26, 2024
    Replying to @liuzhuang1234
    c'est la vie
    1.8K
  • Xinlei Chen
    @endernewton
    Jun 16, 2020
    Little push on 3D indoor object detection, to be presented at 4PM today (Seattle time)
    AI at Meta
    @AIatMeta
    Jun 16, 2020
    Today at #CVPR2020 4pm, we’re presenting ImVoteNet, a 2D-3D voting scheme for 3D object detection, that's specialized for RGB-D and pushes state of the art 3D object detection by 5.7 mean average precision. Read the paper here: research.fb.com/publications/i…
