Jay (@jayendra

1,281 posts

Jay

@jayendra_ram

founder @hud_evals, prev cs+physics @columbia, @ycombinator. inside the sim2real gap

hud.ai

Joined September 2022

Pinned
Jay
@jayendra_ram
Aug 26, 2025
Since everyone is talking about RL environments and GRPO now but no one knows how they work, we thought it would be cool to make an explainer video + code you can run. This is an example of using GRPO to train Qwen 2.5 to play 2048 (code in thread) 🧵:
152K
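The post above mentions GRPO without showing how it scores rollouts. As a rough illustration only (this is not the code from the thread), the core "group-relative" step can be sketched as follows: several rollouts are sampled for the same prompt, and each rollout's reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, the use of the population standard deviation, and the sample rewards are all assumptions made for this sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std.

    `rewards` holds one scalar reward per rollout sampled for the SAME prompt.
    `eps` (an assumed small constant) avoids division by zero when every
    rollout in the group earns the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical final scores for four rollouts of one 2048 game prompt.
rewards = [128.0, 256.0, 64.0, 512.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that beat their group's average get a positive advantage and the rest get a negative one, so no separate value network is needed to provide a baseline.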
Jay
@jayendra_ram
Jul 11, 2024
Replying to @ShitpostGate
This is because 1 mile = 1.60934 km which is very close to the golden ratio. Very cool!
37K
Jay
@jayendra_ram
Dec 1, 2024
Replying to @mr_samosaman
This is some summer-before-freshman-year ass advice
61K
Jay
@jayendra_ram
Jun 11, 2025
6 weeks ago I got a haircut and my barber told me his app idea. I said it was a great idea but I didn’t have time to build it but he should get windsurf and vibe code it. I came in to get a fade and he showed me his app and it was sick! I invested 10k on the spot!
36K
Jay
@jayendra_ram
Jan 24, 2025
I've been working with a small team to evaluate agentic models for computer use agents. Today, we're thrilled to introduce Autonomy, our comprehensive eval for AI agents. We aim to create an eval that's rigorous, tests agency, and moves toward general intelligence. 1/ 🧵
116K
Jay
@jayendra_ram
Aug 26, 2025
This may be the blueprint for AI apps moving forward: 1) Make ChatGPT/Claude wrapper that users love. 2) Collect production traces and create evals 3) SFT an OSS model on the traces and RL on the evals to get parity with ChatGPT/Claude. Similar quality and lower costs.
30K
Jay
@jayendra_ram
May 11, 2025
Replying to @khoomeik
bro leaves OAI and then leaks alpha on the TL that I would be afraid to tell my mom
9.2K
Jay
@jayendra_ram
Aug 13, 2025
I usually don’t talk about RL envs on the tl out of respect for our customers but this take won’t age well. Making problems that provide signal for models to get better is pretty hard, and is only going to get harder every year as models improve. The notion that you can vibe
Rohan Pandey
@khoomeik
Aug 13, 2025
the only RL envs frontier labs will continue buying in the medium term are the idiosyncratic high value ones and if you’re building a high value env, why not just do the RL in house, deploy vertically, and capture orders of magnitude more value?
41K
Jay
@jayendra_ram
Aug 27, 2025
You can't make an RL env out of things that require human behavior inside of the env. Ex: you can't have an accurate RL environment that simulates a Twitch streamer interacting with their fans, because that requires accurate simulation of the human utility function.
22K
Jay
@jayendra_ram
Aug 23, 2025
Since RL environments are becoming a lot more mainstream, it probably makes sense to explain them for people who see them vagueposted incessantly on the TL. An RL environment is the "world" or "problem space" in which a reinforcement learning (RL) agent operates in order to learn.
jason
@jxnlco
Aug 23, 2025
Can someone explain to me what an RL environment is.
20K
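To make the definition above concrete, here is a minimal sketch of what such a "world" can look like in code. It loosely follows the common reset/step convention used by Gym-style libraries, but the `LineWorld` class, its parameters, and the `run_episode` helper are hypothetical names invented for this illustration, not part of any library or of the author's tooling.

```python
class LineWorld:
    """Toy environment: the agent starts at 0 and must reach `goal` on a number line."""

    def __init__(self, goal=3, max_steps=20):
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (observation, reward, done)."""
        if action not in (-1, 1):
            raise ValueError("action must be -1 or +1")
        self.position += action
        self.steps += 1
        reached = self.position == self.goal
        done = reached or self.steps >= self.max_steps
        reward = 1.0 if reached else 0.0  # sparse reward: only paid on reaching the goal
        return self.position, reward, done


def run_episode(env, policy):
    """Roll out one episode with `policy` (a function from observation to action)."""
    obs = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total_reward += reward
    return total_reward
```

For example, `run_episode(LineWorld(goal=3), lambda obs: 1)` walks right until the goal is reached, while a policy that always returns `-1` runs out of steps with zero reward. The reward signal is what an RL algorithm would use to improve the policy.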
Jay
@jayendra_ram
Nov 28, 2024
Replying to @saikatc
The fact that people can be debanked for doing anything that’s not against the law is the absurd part. I’ve heard so many stories of founders / non-establishment types getting maligned by the banking system. It may be “anecdata” but it’s def happening.
8.1K
Jay
@jayendra_ram
May 27, 2025
Over the last few months, the team at @hud_evals has made a lot of evals and environments. When we first started, we ran into a lot of problems: 1) Hosting CUA evals is annoying 2) Creating RL environments and problems is hard 3) Reviewing trajectories was super tedious 4) There
53K
Jay
@jayendra_ram
Aug 27, 2025
This is my new favorite hub.
Prime Intellect
@PrimeIntellect
Aug 27, 2025
Introducing the Environments Hub RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI
12K
Jay
@jayendra_ram
May 3, 2024
Replying to @fluxtheorist
indeed
16K