Ryan Marten
594 posts
Ryan Marten
@ryanmart3n
Building @harborframework and @terminalbench with @alexgshaw
San Francisco
ryanmarten.com
Joined June 2022
1,987 Following
2,130 Followers
  • Ryan Marten
    @ryanmart3n
    Jun 5, 2025
    Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals. We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data
    201K
  • Ryan Marten
    @ryanmart3n
    Jan 28, 2025
    Announcing the Open Thoughts project. We are building the best reasoning datasets out in the open. Building off our work with Stratos, today we are releasing OpenThoughts-114k and OpenThinker-7B.
    38K
  • Ryan Marten
    @ryanmart3n
    Jun 5, 2025
    Replying to @ryanmart3n
    Paper: arxiv.org/abs/2506.04178
    Model: huggingface.co/open-thoughts/…
    Dataset: huggingface.co/datasets/open-…
    Code: github.com/open-thoughts/…
    Blog: openthoughts.ai/blog/ot3
    (10/N)
    10K
  • Ryan Marten
    @ryanmart3n
    Apr 7, 2025
    OpenThoughts2 is the #1 trending dataset on 🤗
    25K
  • Ryan Marten
    @ryanmart3n
    Jan 26, 2025
    Replying to @teknium
    They do! You just aren't pushing it hard enough ;) @trungthvu went beast mode on it with Curator
    22K
  • Ryan Marten
    @ryanmart3n
    Jun 5, 2025
    Replying to @ryanmart3n
    Highlight 1. Sampling multiple answers for the same question from a teacher model is a surprisingly effective way to increase the dataset size. Would it be better to have 30k questions, each answered once, or 10k questions, each answered 3 times independently? Surprisingly, with
    6K
  • Ryan Marten
    @ryanmart3n
    Jun 5, 2025
    Replying to @ryanmart3n
    Our model surpasses similar-scale models from industry labs, such as Nvidia, Hugging Face, and GPT-4.1, among others. We achieve SOTA on held-out evals, demonstrating strong generalization. OpenThoughts3-1.2M is built through 1,000 ablation experiments. (2/N)
    15K
  • Ryan Marten
    @ryanmart3n
    Jun 5, 2025
    Replying to @ryanmart3n
    Thank you to the whole OpenThoughts team for yet another great effort! @etash_guha, @ryanmart3n, @sedrickkeh2, @NeginRaoof_, @GeorgeSmyrnis1, @hbXNov, @marnezhurina, @MercatJean, @trungthvu, @ZayneSprague, @suvarna_ashima, @FeuerBenjamin, @cliangyu_, @codezakh, @esfrankel,
    3.8K
  • Ryan Marten
    @ryanmart3n
    Jun 5, 2025
    Speaking about OpenThoughts3 (hot off the presses!!) and how you can use our reasoning data recipe lessons to train your own specialized reasoning models. 12:15pm @aiDotEngineer (talk will also be recorded)
    3.1K
  • Ryan Marten
    @ryanmart3n
    Jun 5, 2025
    Replying to @ryanmart3n
    Highlight 2. Models with better performance are not necessarily better teachers. QwQ-32B is a stronger teacher than DeepSeek-R1, although it scores lower on target reasoning benchmarks. (4/N)
    5.2K
  • Ryan Marten
    @ryanmart3n
    Jun 5, 2025
    Replying to @ryanmart3n
    Highlight 5. Question Filtering did work. Filtering questions by LLM-labeled difficulty or LLM response length yields better results than filters typical of pre-training data curation that use embeddings or fastText. (7/N)
    3.6K
  • Ryan Marten
    @ryanmart3n
    Jun 5, 2025
    Replying to @ryanmart3n
    Our dataset also works for post-training Llama! OpenThoughts3 is versatile and works across multiple base models. We train Llama-3.1-8B-Instruct on 100k samples of OpenThoughts3-1.2M, and we see similar or even larger gains on downstream evals. (9/N)
    3.9K
  • Ryan Marten
    @ryanmart3n
    Jun 5, 2025
    Replying to @ryanmart3n
    OpenThoughts3 consists of 850K math, 250K code, and 100K science questions with reasoning traces from QwQ-32B. All completely open! (8/N)
    3.6K
  • Ryan Marten
    @ryanmart3n
    Jun 5, 2025
    Replying to @ryanmart3n
    Highlight 3. Answer Filtering didn’t work. We experimented with numerous verification and answer filtering methods, and none gave significant performance improvements. (5/N)
    3.8K
