Xinlei Chen (@endernewton) / X

Xinlei Chen

125 posts

@endernewton

Multimodal understanding & generation @xAI

CA, US

Joined July 2011

Xinlei Chen
@endernewton
Dec 4, 2024
I am looking for an intern to do research together next summer. Possible topics: representation learning, network architecture, and in general understanding what's going on :P. Please apply (metacareers.com/jobs/532549086…) and email me ([email protected]) if interested.
46K
Xinlei Chen
@endernewton
Jun 27, 2025
4th of July vibe with you :P
Elon Musk
@elonmusk
Jun 27, 2025
Grinding on @Grok all night with the @xAI team. Good progress. Will be called Grok 4. Release just after July 4th. Needs one more big run for a specialized coding model.
12K
Xinlei Chen
@endernewton
Aug 7, 2025
We are actively hiring for image/video understanding/generation, join us!
Guodong Zhang
@Guodzh
Aug 6, 2025
Join us to build next-gen video generation and world models!!
60K
Xinlei Chen
@endernewton
Jan 26, 2024
Our serious look into diffusion models for representation learning. And NO — “diffusion” is just the cherry on the top, “denoising” (the “latent” noise) is the cake to take!
AK
@_akhaliq
Jan 26, 2024
Meta presents "Deconstructing Denoising Diffusion Models for Self-Supervised Learning". Paper page: huggingface.co/papers/2401.14… examine the representation learning abilities of Denoising Diffusion Models (DDM) that were originally purposed for image generation. Our philosophy is to
21K
Xinlei Chen
@endernewton
Nov 19, 2024
Thanks @abursuc for sharing our work! Yes we find attention maps are almost* all you need from pre-trained ViTs. * Except when the data distribution shifts -- perhaps
Andrei Bursuc
@abursuc
Nov 18, 2024
Interesting work by @endernewton et al. studying how & what pretraining knowledge is transferred downstream. It seems that representations are less important than attention patterns that can guide students to learn good features from scratch w/ good perfs arxiv.org/abs/2411.09702
76K
Xinlei Chen
@endernewton
Jul 8, 2024
Very happy to see the TTT-series reaching yet another milestone! This time it serves as an inspiration for next-generation architecture post-Transformer, and by connecting TTT to Transformer, it can explain why (autoregressive) Transformers are so good at in-context learning!
Xiaolong Wang
@xiaolonw
Jul 8, 2024
Cannot believe this finally happened! Over the last 1.5 years, we have been developing a new LLM architecture, with linear complexity and expressive hidden states, for long-context modeling. The following plots show our model trained from Books scale better (from 125M to 1.3B)
26K
Xinlei Chen
@endernewton
Jun 14, 2024
Great finding from my former intern Kien: The inductive bias of *locality* is actually not as fundamental as we previously thought. Transformers can work *better* in quality by just treating images as an ordered set of pixels.
AK
@_akhaliq
Jun 14, 2024
Meta announces "An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels". This work does not introduce a new method. Instead, we present an interesting finding that questions the necessity of the inductive bias -- locality in modern computer vision
21K
Xinlei Chen
@endernewton
Aug 24, 2025
Open source contribution from xAI!
Elon Musk
@elonmusk
Aug 23, 2025
The @xAI Grok 2.5 model, which was our best model last year, is now open source. Grok 3 will be made open source in about 6 months. huggingface.co/xai-org/grok-2
9.6K
Xinlei Chen
@endernewton
Jan 18, 2025
I was involved in @tokenpilled65B's project mid-way due to a shared interest in visual tokenization. Didn't contribute hands-on, but this work shares some of the (negative) learnings I had when trying to scale tokenizers -- summarized for quick read.
Philippe Hansen-Estruch
@tokenpilled65B
Jan 17, 2025
Excited to share my work at Meta! We explore scaling tokenizers w/ ViT (ViTok) & found scaling tokenizers with DiT generation pipeline doesn’t boost performance for the current paradigm of auto-encoders! We develop SOTA tokenizers for images/videos. Thread for findings
9.6K
Xinlei Chen
@endernewton
Feb 28, 2024
Fascinating and insightful work from @_mingjiesun @liuzhuang1234, taking a much deeper look at the "massive activations" inside LLMs, proposing hypotheses and verifying them as "biases" for attention, and they can appear in ViTs too!
Zhuang Liu
@liuzhuang1234
Feb 28, 2024
LLMs are great, but their internals are less explored. I'm excited to share very interesting findings in paper “Massive Activations in Large Language Models” LLMs have very few internal activations with drastically outsized magnitudes, e.g., 100,000x larger than others. (1/n)
6.4K
Xinlei Chen
@endernewton
Jun 7, 2016
End of an Era.
americanair
@AmericanAir
Jun 6, 2016
We're announcing new changes to our #AAdvantage program today. Learn more here: bit.ly/AADVUpdate2016
Xinlei Chen
@endernewton
Feb 7, 2025
Replying to @_alex_kirillov_
sounds fun!
825
Xinlei Chen
@endernewton
Sep 26, 2024
Replying to @liuzhuang1234
c'est la vie
1.8K
Xinlei Chen
@endernewton
Jun 16, 2020
Little push on 3D indoor object detection, to be presented at 4PM today (Seattle time)
AI at Meta
@AIatMeta
Jun 16, 2020
Today at #CVPR2020 4pm, we’re presenting ImVoteNet, a 2D-3D voting scheme for 3D object detection, that's specialized for RGB-D and pushes state of the art 3D object detection by 5.7 mean average precision. Read the paper here: research.fb.com/publications/i…