Alex Wettig
268 posts
Alex Wettig
@_awettig
composer-ing @cursor_ai
sf
cs.princeton.edu/~awettig/
Joined July 2022
705 Following
1,960 Followers
  • Alex Wettig
    @_awettig
    Feb 18, 2025
    🤔 Ever wondered how prevalent some type of web content is during LM pre-training? In our new paper, we propose WebOrganizer which *constructs domains* based on the topic and format of CommonCrawl web pages 🌐 Key takeaway: domains help us curate better pre-training data! 🧵/N
    50K
  • Alex Wettig
    @_awettig
    Oct 4, 2024
    How to train long-context LMs? (and beat Llama-3.1 🏆) Many takeaways from our new paper!
    - Focus on diverse & reliable evaluations (not just perplexity)
    - Find good sources of long data and high-quality short data
    - ...
    A 🧵 on how we produced ProLong, a SoTA 8B 512K model
    21K
  • Alex Wettig
    @_awettig
    Feb 16, 2024
    **QuRating**: We get 4 quality signals from GPT-3.5 for selecting LM training data
    We select 30B out of 260B tokens and train 1.3B LMs from scratch.
    We find that QuRating can improve perplexity and ICL performance ✅
    (w/ Aatmik Gupta, Sauma Malik, @danqi_chen)
    13K
  • Alex Wettig
    @_awettig
    Jul 25, 2024
    Stop by the QuRating *spotlight* poster this afternoon to chat about data quality for LMs ⏰: 1:30-3pm CET /📍: Hall C 4-9 #617
    6.7K
  • Alex Wettig
    @_awettig
    Jul 16, 2025
    Presenting two posters at ICML over the next two days:
    - Both at 11am - 1:30pm
    - Both about how to improve pre-training with domains
    - Both at stall # E-2600 in East Exhibition Hall A-B (!)
    Tomorrow: WebOrganizer w/ @soldni & @kylelostat
    Thursday: MeCo by @gaotianyu1350
    12K
  • Alex Wettig
    @_awettig
    Apr 2, 2024
    Stay tuned for our pre-print next week with lots of insights on how to build good SWE agents 🕵️‍♂️
    John Yang
    @jyangballin
    Apr 2, 2024
    SWE-agent is our new system for autonomously solving issues in GitHub repos. It gets similar accuracy to Devin on SWE-bench, takes 93 seconds on avg + it's open source! We designed a new agent-computer interface to make it easy for GPT-4 to edit+run code github.com/princeton-nlp/…
    4.4K
  • Alex Wettig
    @_awettig
    Jul 22, 2024
    Simple strategy:
    (1) keep pre-training with HQ mix of long & short documents
    (2) quick instruction-tuning with ONLY short UltraChat
    We find: avg. performance on our long-context evals keeps improving with increasing continual pre-training budgets
    Tianyu Gao
    @gaotianyu1350
    Jul 22, 2024
    Meet ProLong, a Llama-3 based long-context chat model! huggingface.co/princeton-nlp/… (64K here, 512K coming soon) ProLong uses a simple recipe (short/long pre-training data + short UltraChat, no synthetic instructions) and achieves top performance on a series of long-context tasks.
    4.9K
  • Alex Wettig
    @_awettig
    Sep 4, 2024
    Check out the paper for the most comprehensive guide to MoE pre-training! 🧭 It's been amazing to witness the push for open LMs by @Muennighoff and @allen_ai -- The transparency is off the charts: Every ablation figure has a link to the full results on wandb in the caption
    Niklas Muennighoff
    @Muennighoff
    Sep 4, 2024
    Releasing OLMoE - the first good Mixture-of-Experts LLM that's 100% open-source
    - 1B active, 7B total params for 5T tokens
    - Best small LLM & matches more costly ones like Gemma, Llama
    - Open Model/Data/Code/Logs + lots of analysis & experiments
    📜arxiv.org/abs/2409.02060 🧵1/9
    2K
  • Alex Wettig
    @_awettig
    Jun 5, 2023
    New paper where we train transformer-style models and then debug them as python programs!
    Dan Friedman
    @danfriedman0
    Jun 5, 2023
    Learning Transformer Programs We designed a modified Transformer that can be trained to solve a task and then automatically converted into a discrete, human-readable program. With @_awettig and @danqi_chen. Paper: arxiv.org/abs/2306.01128 Code: github.com/princeton-nlp/… [1/12]
    2.7K
  • Alex Wettig
    @_awettig
    Aug 13, 2024
    SWE-bench x OpenAI 👀
    OpenAI
    @OpenAI
    Aug 13, 2024
    We're releasing a new iteration of SWE-bench, in collaboration with the original authors, to more reliably evaluate AI models on their ability to solve real-world software issues. openai.com/index/introduc…
    1.8K
  • Alex Wettig
    @_awettig
    May 7, 2025
    Big arrow time! We can make huge progress on open-source SWE agents by scaling up the creation of virtual coding environments 🚀
    John Yang
    @jyangballin
    May 7, 2025
    40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified. We built it by synthesizing a ton of agentic training data from 100+ Python repos. Today we’re open-sourcing the toolkit that made it happen: SWE-smith.
    2.9K
  • Alex Wettig
    @_awettig
    Jun 23, 2025
    New paper cutting through the thicket of KV cache eviction methods!
    Adithya Bhaskar
    @AdithyaNLP
    Jun 23, 2025
    There are many KV cache-reduction methods, but a fair comparison is challenging. We propose a new unified metric called “critical KV footprint”. We compare existing methods and propose a new one - PruLong, which “prunes” certain attn heads to only look at local tokens. 1/7
    1.2K
  • Alex Wettig
    @_awettig
    Oct 15, 2024
    Replying to @kellerjordan0
    I would be a little bit more generous towards Tim's intentions and not over-index on this specific aspect of this response... clearly the bigger challenge is: how can we better extrapolate what works at scale from smaller-scale experiments
    1K
  • Alex Wettig
    @_awettig
    Feb 16, 2024
    Replying to @_awettig
    📜Check out the paper for extensive analysis of the quality ratings, including a discussion of social biases and the wider implications of data selection:
    arxiv.org
    QuRating: Selecting High-Quality Data for Training Language Models
    Selecting high-quality pre-training data is important for creating capable language models, but existing methods rely on simple heuristics. We introduce QuRating, a method for selecting...
    425
