Ross Taylor (@rosstaylor90) / X

Ross Taylor

@rosstaylor90

Co-founder and CEO @GenReasoning. Previously lots of other things like: reasoning lead Meta AI, Llama 3/2, Galactica, Papers with Code.

∇²f = 0

Joined March 2012

Pinned
Ross Taylor
@rosstaylor90
Apr 23
The continual learning vs in-context learning debate is a sideshow from the fact we lack interesting training environments. If you only care about narrow things like SWE and automated AI research, then you create optimisation pressure towards narrow agentic capabilities. And
13K
Ross Taylor
@rosstaylor90
Nov 14, 2023
I am the first author of the Galactica paper and have been quiet about it for a year. Maybe I will write a blog post talking about what actually happened, but if you want the TLDR:
1. Galactica was a base model trained on scientific literature and modalities.
2. We approached
Sharon Goldman
@sharongoldman
Nov 14, 2023
One year ago — 2 weeks before @OpenAI released ChatGPT — @Meta released Galactica. The LLM was public for only 3 days, but its lessons led to decisions around Llama's release. Thanks to @jpineau1 for chatting w/ me and h/t to @ylecun Read here: ⏬ venturebeat.com/ai/what-meta-l…
961K
Ross Taylor
@rosstaylor90
Apr 16, 2024
I left Meta yesterday. Nothing but positive things to say: FAIR and GenAI are great places to do research and engineering. Will miss my colleagues! LLMs have shown how magical deep learning can be in a data-rich regime. But many domains remain data-constrained, which prevents
161K
Ross Taylor
@rosstaylor90
Sep 17, 2025
Supplementary information for the new DeepSeek R1 Nature paper is very interesting! Details on training data, hyperparameters, base model importance, and more.
182K
Ross Taylor
@rosstaylor90
Jan 20, 2025
Last tweet on this but the way @deepseek_ai does launches is beautiful: no hype, arrogance or vague-posting: just sharing something great with the world. US tech companies look cringe in comparison.
46K
Ross Taylor
@rosstaylor90
Jan 26, 2025
Replying to @goodside
This is so much fun
33K
Ross Taylor
@rosstaylor90
Aug 24, 2025
Most takes on RL environments are bad. 1. There are hardly any high-quality RL environments and evals available. Most agentic environments and evals are flawed when you look at the details. It’s a crisis: and no one is talking about it because they’re being hoodwinked by labs
112K
Ross Taylor
@rosstaylor90
Feb 4, 2025
No one is saying RL didn’t work for reasoning. The argument is about internal reasoning emergence, not absolute performance boosts with RL. Quite the opposite in fact - we had PPO on Llama 2 base models 2 years ago with verifiable rewards and had 90%+ on GSM8k. We already knew
Ricardo Olmedo
@rdolmedo_
Feb 3, 2025
Does reinforcement learning with verifiable rewards work only for recent model families? It turns out that GRPO also works very well for Llama 2 7B, with an impressive +15 accuracy point increase in GSM8K. GRPO over GSM8K train. No bells and whistles. It just works.
134K
Ross Taylor
@rosstaylor90
Jul 16, 2025
It’s funny that people on this site think major LLM efforts are talent-bound rather than org-bound. The talent differential has never been big between major orgs. Most of the difference in outcomes is due to organisational factors - like allocating compute to the right bets, and
75K
Ross Taylor
@rosstaylor90
Apr 8, 2025
Maybe OpenAI had a point with “high taste testers”. I didn’t like the phrase initially because it felt a little elitist. But maybe I can reconcile with it by treating “high taste” as folks who care more about the outputs they are getting, and scrutinise them more carefully. In
60K
Ross Taylor
@rosstaylor90
Dec 7, 2023
Why are LLMs bad at reasoning? One theory says this is due to weaknesses in maximum likelihood, where the probability mass “overgeneralises” to low quality solutions. Because our pretraining objective (likelihood) doesn’t transfer to our evaluation objective (accuracy), the
124K
Ross Taylor
@rosstaylor90
Oct 21, 2023
Here’s my ICML talk on teaching LLMs to reason (video and slides). Like everyone else, I can’t talk about what I’m working on right now, but I tried to provide a useful overview of the history of LLMs and reasoning, current areas of focus, and potential directions. Enjoy!
105K
Ross Taylor
@rosstaylor90
Aug 3, 2025
Replying to @gabriel1
General things that improved my health/productivity/outcomes:
- Having dinner much earlier - no later than 6pm
- Replacing sugary snacks with healthy alternatives - eg almonds, dark chocolate, blueberries.
- Not using laptops / programming in bed
- Expanding my friend circle
37K
Ross Taylor
@rosstaylor90
Apr 13, 2025
Distillation outperforms RL recipes at smaller scales. One simple way to think about this is that a large model is a better way to discover good data than a smaller one. A point made in the original DeepSeek-R1 paper, but missed by many in the wake of that release in the GRPO wave. That
Wenhu Chen
@WenhuChen
Apr 12, 2025
arxiv.org/pdf/2504.07086 is quite interesting. It standardizes the evaluation of all the existing math reasoning models and re-evaluates these models. Takeaway 1: Most RL-trained variants of the DeepSeek R1-Distill model do not yield meaningful performance improvements (except
42K