Ryan Marten (@ryanmart3n) / X

594 posts

Ryan Marten

@ryanmart3n

Building @harborframework and @terminalbench with @alexgshaw

San Francisco

Joined June 2022

Ryan Marten
@ryanmart3n
Jun 5, 2025
Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals. We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data
201K
Ryan Marten
@ryanmart3n
Jan 28, 2025
Announcing the Open Thoughts project. We are building the best reasoning datasets out in the open. Building off our work with Stratos, today we are releasing OpenThoughts-114k and OpenThinker-7B.
38K
Ryan Marten
@ryanmart3n
Jun 5, 2025
Replying to @ryanmart3n
Paper: arxiv.org/abs/2506.04178 Model: huggingface.co/open-thoughts/… Dataset: huggingface.co/datasets/open-… Code: github.com/open-thoughts/… Blog: openthoughts.ai/blog/ot3 (10/N)
10K
Ryan Marten
@ryanmart3n
Apr 7, 2025
OpenThoughts2 is the #1 trending dataset on 🤗
25K
Ryan Marten
@ryanmart3n
Jan 26, 2025
Replying to @teknium
They do! You just aren't pushing it hard enough ;) @trungthvu went beast mode on it with Curator
22K
Ryan Marten
@ryanmart3n
Jun 5, 2025
Replying to @ryanmart3n
Highlight 1. Sampling multiple answers for the same question from a teacher model is a surprisingly effective way to increase the dataset size. Would it be better to have 30k questions, each answered once, or 10k questions, each answered 3 times independently? Surprisingly, with
6K
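A minimal sketch of the idea in Highlight 1, sampling several independent answers per question to grow a dataset. The names `sample_answers` and `toy_teacher` are hypothetical and stand in for a real teacher-model call; this is not the OpenThoughts pipeline itself.

```python
import random


def sample_answers(questions, teacher, answers_per_question, seed=0):
    """Return one row per (question, sampled answer) pair."""
    rng = random.Random(seed)
    rows = []
    for question in questions:
        # Each call is an independent sample for the same question.
        for _ in range(answers_per_question):
            rows.append({"question": question, "answer": teacher(question, rng)})
    return rows


def toy_teacher(question, rng):
    # Hypothetical stand-in: a real teacher would be a sampled LLM completion.
    return f"answer-{rng.randint(0, 9999)} to {question}"


questions = [f"q{i}" for i in range(10)]
dataset = sample_answers(questions, toy_teacher, answers_per_question=3)
print(len(dataset), "rows from", len(questions), "questions")
```

With 10 questions and 3 samples each, the dataset has the same number of rows as 30 questions answered once, which is the trade-off the post describes at the 10k vs 30k scale.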
Ryan Marten
@ryanmart3n
Jun 5, 2025
Replying to @ryanmart3n
Our model surpasses similar scale models from industry labs, such as Nvidia, Hugging Face, and GPT-4.1, among others. We achieve SOTA on held out evals, demonstrating strong generalization. OpenThoughts3-1.2M is built through 1,000 ablation experiments. (2/N)
15K
Ryan Marten
@ryanmart3n
Jun 5, 2025
Replying to @ryanmart3n
Thank you to the whole OpenThoughts team for yet another great effort! @etash_guha, @ryanmart3n, @sedrickkeh2, @NeginRaoof_, @GeorgeSmyrnis1, @hbXNov, @marnezhurina, @MercatJean, @trungthvu, @ZayneSprague, @suvarna_ashima, @FeuerBenjamin, @cliangyu_, @codezakh, @esfrankel,
3.8K
Ryan Marten
@ryanmart3n
Jun 5, 2025
Speaking about OpenThoughts3 (hot off the presses!!) and how you can use our reasoning data recipe lessons to train your own specialized reasoning models. 12:15pm @aiDotEngineer (talk will also be recorded)
3.1K
Ryan Marten
@ryanmart3n
Jun 5, 2025
Replying to @ryanmart3n
Highlight 2. Models with better performance are not necessarily better teachers. QwQ-32B is a stronger teacher than DeepSeek-R1, although it scores lower on target reasoning benchmarks. (4/N)
5.2K
Ryan Marten
@ryanmart3n
Jun 5, 2025
Replying to @ryanmart3n
Highlight 5. Question Filtering did work. Filtering questions by LLM labeled difficulty or LLM response length yields better results than filters typical to pre-training data curation that use embeddings or fastText. (7/N)
3.6K
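A rough illustration of the response-length filter mentioned in Highlight 5. The post does not say which direction the filter runs or how length is measured, so this sketch assumes that questions with longer responses are kept and that length is a whitespace token count. `Question` and `filter_by_response_length` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Question:
    text: str
    response: str  # LLM response, used here only to measure length


def filter_by_response_length(questions, keep):
    """Keep the `keep` questions with the longest responses (assumed direction)."""
    ranked = sorted(
        questions,
        key=lambda q: len(q.response.split()),  # whitespace token count
        reverse=True,
    )
    return ranked[:keep]


pool = [
    Question("What is 2 + 2?", "4"),
    Question(
        "Prove that sqrt(2) is irrational.",
        "Assume sqrt(2) = p/q in lowest terms. Then p^2 = 2 q^2, so p is even.",
    ),
    Question("Name a prime.", "7 is prime."),
]
selected = filter_by_response_length(pool, keep=2)
print([q.text for q in selected])
```

The same shape works for the difficulty-based filter by swapping the sort key for an LLM-assigned difficulty score.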
Ryan Marten
@ryanmart3n
Jun 5, 2025
Replying to @ryanmart3n
Our dataset also works for post-training Llama! OpenThoughts3 is versatile and works across multiple base models. We train Llama-3.1-8B-Instruct on 100k samples of OpenThoughts3-1.2M, and we see similar or even larger gains on downstream evals. (9/N)
3.9K
Ryan Marten
@ryanmart3n
Jun 5, 2025
Replying to @ryanmart3n
OpenThoughts3 consists of 850K math, 250K code, and 100K science questions with reasoning traces from QwQ-32B. All completely open! (8/N)
3.6K
Ryan Marten
@ryanmart3n
Jun 5, 2025
Replying to @ryanmart3n
Highlight 3. Answer Filtering didn’t work. We experimented with numerous verification and answer filtering methods, and none gave significant performance improvements. (5/N)
3.8K