Jason Weston (@jaseweston) / X

Jason Weston

584 posts

@jaseweston

Senior Director & RS @Meta + Visiting Prof NYU | OG in LLMs | Pretrain+Finetune in 2008+ | 151k+ citations | Current: Self-Improving & Co-Improving AI

NYC

Joined April 2008

Pinned
Jason Weston
@jaseweston
May 1
💎Autodata: an agentic data scientist to create high quality data✨ We introduce a method for building agents that create high-quality training & evaluation data. Key idea: agentic data creation provides a way to *convert increased inference compute into higher quality model
43K
Jason Weston
@jaseweston
May 30, 2024
🚨 Contextual Position Encoding (CoPE) 🚨 Context matters! CoPE is a new positional encoding method for transformers that takes into account *context*. - Can "count" distances per head dependent on need, e.g. i-th sentence or paragraph, words, verbs, etc. Not just tokens. -
1.5M
Jason Weston
@jaseweston
Nov 21, 2023
🚨 New paper! 🚨 We introduce System 2 Attention (S2A). - Soft attention in Transformers is susceptible to irrelevant/biased info - S2A uses LLM reasoning to generate what to attend to Improves factuality & objectivity, decreases sycophancy. arxiv.org/abs/2311.11829 🧵(1/5)
553K
Jason Weston
@jaseweston
Jan 19, 2024
🚨New paper!🚨 Self-Rewarding LMs - LM itself provides its own rewards on own generations via LLM-as-a-Judge during Iterative DPO - Reward modeling ability improves during training rather than staying fixed ...opens the door to superhuman feedback? arxiv.org/abs/2401.10020 🧵(1/5)
466K
Jason Weston
@jaseweston
Apr 2, 2025
🚨Multi-Token Attention🚨 📝: arxiv.org/abs/2504.00927 Attention is critical for LLMs, but its weights are computed by single query & key vectors, limiting capability. MTA combines query, key & head operations over multiple tokens, improving performance in terms of PPL, std
98K
Jason Weston
@jaseweston
Sep 9, 2025
🌀New Test-time scaling method 🌀 📝: arxiv.org/abs/2509.06870 - Use RL to train an LLM solution aggregator - Reasons, reviews, reconciles, and synthesizes a final solution -> Much better than existing techniques! - Simple new method. Strong results across 4 math benchmarks. 🧵1/5
70K
Jason Weston
@jaseweston
Oct 15, 2024
🚨New work: Thinking LLMs!🚨 - Introduces Thought Preference Optimization (TPO) - Trains LLMs to think & respond for *all* instruction following tasks, not just math - Gives gains on AlpacaEval (beats GPT-4 & Llama3-70b) & ArenaHard with an 8B model arxiv.org/abs/2410.10630 🧵1/4
108K
Jason Weston
@jaseweston
Aug 14, 2023
🚨New Paper 🚨 Self-Alignment with Instruction Backtranslation - New method auto-labels web text with instructions & curates high quality ones for FTing - Our model Humpback 🐋 outperforms LIMA, Claude, Guanaco, davinci-003 & Falcon-Inst arxiv.org/abs/2308.06259 (1/4)🧵
358K
Jason Weston
@jaseweston
Feb 6, 2024
Our team in FAIR labs (at Meta) is hiring researchers (RE, RS & PostDoc)! DM if interested. We work on the topics of Reasoning, Alignment and Memory/architectures (RAM). Recent work: Self-Rewarding LMs: arxiv.org/abs/2401.10020 Pairwise Cringe Loss: arxiv.org/abs/2312.16682
104K
Jason Weston
@jaseweston
Sep 21, 2023
🚨New Paper🚨 Chain-of-Verification Reduces Hallucination in LLMs - Reduces longform hallucinations via LLM double-checking its own work with shortform questions - Important not to reattend to the original hallucinations or they get copied arxiv.org/abs/2309.11495 (1/4)🧵
242K
Jason Weston
@jaseweston
Sep 27, 2024
Today we are releasing code, models & data from the Self-Taught Evaluator paper, a method to train LLM judges with synthetic preference data. Better model trained with DPO: huggingface.co/facebook/Self-… Synthetic preference data: huggingface.co/datasets/faceb… Code for training and
80K
Jason Weston
@jaseweston
Oct 24, 2023
🚨 New paper! 🚨 We introduce Branch-Solve-Merge (BSM) reasoning in LLMs for: - Improving LLM-as-Evaluator: makes Llama 70B chat+BSM close to GPT4. GPT4+BSM is better than GPT4. - Constrained Story Generation: improves coherence & constraints satisfied. arxiv.org/abs/2310.15123
122K
Jason Weston
@jaseweston
Nov 7, 2025
Scaling Agent Learning via Experience Synthesis 📝: arxiv.org/abs/2511.03773 Scaling training environments for RL by simulating them with reasoning LLMs! Environment models + Replay-buffer + New tasks = cheap RL for any environments! - Strong improvements over non-RL-ready
113K
Jason Weston
@jaseweston
Jun 3, 2025
🚨Self-Challenging Language Model Agents🚨 📝: arxiv.org/abs/2506.01716 A new paradigm to train LLM agents to use different tools with challenging self-generated data ONLY: Self-challenging agents (SCA) both propose new tasks and solve them, using self-generated verifiers to
83K