Jason Weston
584 posts
Jason Weston
@jaseweston
Senior Director & RS @Meta + Visiting Prof NYU | OG in LLMs | Pretrain+Finetune in 2008+ | 151k+ citations | Current: Self-Improving & Co-Improving AI
NYC
jaseweston.com
Joined April 2008
896 Following · 14.9K Followers
  • Pinned
    Jason Weston
    @jaseweston
    May 1
    💎Autodata: an agentic data scientist to create high quality data✨ We introduce a method for building agents that create high-quality training & evaluation data. Key idea: agentic data creation provides a way to *convert increased inference compute into higher quality model
    43K
  • Jason Weston
    @jaseweston
    May 30, 2024
    🚨 Contextual Position Encoding (CoPE) 🚨 Context matters! CoPE is a new positional encoding method for transformers that takes into account *context*. - Can "count" distances per head dependent on need, e.g. i-th sentence or paragraph, words, verbs, etc. Not just tokens. -
    1.5M
  • Jason Weston
    @jaseweston
    Nov 21, 2023
    🚨 New paper! 🚨 We introduce System 2 Attention (S2A). - Soft attention in Transformers is susceptible to irrelevant/biased info - S2A uses LLM reasoning to generate what to attend to. Improves factuality & objectivity, decreases sycophancy. arxiv.org/abs/2311.11829 🧵(1/5)
    553K
  • Jason Weston
    @jaseweston
    Jan 19, 2024
    🚨New paper!🚨 Self-Rewarding LMs - LM itself provides its own rewards on own generations via LLM-as-a-Judge during Iterative DPO - Reward modeling ability improves during training rather than staying fixed ...opens the door to superhuman feedback? arxiv.org/abs/2401.10020 🧵(1/5)
    466K
  • Jason Weston
    @jaseweston
    Apr 2, 2025
    🚨Multi-Token Attention🚨 📝: arxiv.org/abs/2504.00927 Attention is critical for LLMs, but its weights are computed by single query & key vectors, limiting capability. MTA combines query, key & head operations over multiple tokens, improving performance in terms of PPL, std
    98K
  • Jason Weston
    @jaseweston
    Sep 9, 2025
    🌀New Test-time scaling method 🌀 📝: arxiv.org/abs/2509.06870 - Use RL to train an LLM solution aggregator – Reasons, reviews, reconciles, and synthesizes a final solution -> Much better than existing techniques! - Simple new method. Strong results across 4 math benchmarks. 🧵1/5
    70K
  • Jason Weston
    @jaseweston
    Oct 15, 2024
    🚨New work: Thinking LLMs!🚨 - Introduces Thought Preference Optimization (TPO) - Trains LLMs to think & respond for *all* instruction following tasks, not just math - Gives gains on AlpacaEval (beats GPT-4 & Llama3-70b) & ArenaHard with an 8B model arxiv.org/abs/2410.10630 🧵1/4
    108K
  • Jason Weston
    @jaseweston
    Aug 14, 2023
    🚨New Paper 🚨 Self-Alignment with Instruction Backtranslation - New method auto-labels web text with instructions & curates high quality ones for FTing - Our model Humpback 🐋 outperforms LIMA, Claude, Guanaco, davinci-003 & Falcon-Inst arxiv.org/abs/2308.06259 (1/4)🧵
    358K
  • Jason Weston
    @jaseweston
    Feb 6, 2024
    Our team in FAIR labs (at Meta) is hiring researchers (RE, RS & PostDoc)! DM if interested. We work on the topics of Reasoning, Alignment and Memory/architectures (RAM). Recent work: Self-Rewarding LMs: arxiv.org/abs/2401.10020 Pairwise Cringe Loss: arxiv.org/abs/2312.16682
    104K
  • Jason Weston
    @jaseweston
    Sep 21, 2023
    🚨New Paper🚨 Chain-of-Verification Reduces Hallucination in LLMs - Reduces longform hallucinations via LLM double-checking its own work with shortform questions - Important not to reattend to the original hallucinations or they get copied arxiv.org/abs/2309.11495 (1/4)🧵
    242K
  • Jason Weston
    @jaseweston
    Sep 27, 2024
    Today we are releasing code, models & data from the Self-Taught Evaluator paper, a method to train LLM judges with synthetic preference data. Better model trained with DPO: huggingface.co/facebook/Self-… Synthetic preference data: huggingface.co/datasets/faceb… Code for training and
    80K
  • Jason Weston
    @jaseweston
    Oct 24, 2023
    🚨 New paper! 🚨 We introduce Branch-Solve-Merge (BSM) reasoning in LLMs for: - Improving LLM-as-Evaluator: makes Llama 70B chat+BSM close to GPT4. GPT4+BSM is better than GPT4. - Constrained Story Generation: improves coherence & constraints satisfied. arxiv.org/abs/2310.15123
    122K
  • Jason Weston
    @jaseweston
    Nov 7, 2025
    Scaling Agent Learning via Experience Synthesis 📝: arxiv.org/abs/2511.03773 Scaling training environments for RL by simulating them with reasoning LLMs! Environment models + Replay-buffer + New tasks = cheap RL for any environments! - Strong improvements over non-RL-ready
    113K
  • Jason Weston
    @jaseweston
    Jun 3, 2025
    🚨Self-Challenging Language Model Agents🚨 📝: arxiv.org/abs/2506.01716 A new paradigm to train LLM agents to use different tools with challenging self-generated data ONLY: Self-challenging agents (SCA) both propose new tasks and solve them, using self-generated verifiers to
    83K
