hud (@hud_evals) / X

hud

70 posts

@hud_evals

RL environments + evals for agents | @ycombinator | we're hiring!

Joined January 2025

Pinned
hud
@hud_evals
Jul 2, 2025
we're actively hiring for these roles btw👀
atlas
@creatine_cycle
Jul 1, 2025
the jobs left after the singularity will be:
- agentic workflow engineer
- twink
- chief of staff
62K
hud reposted
Aaron Epstein
@aaron_epstein
Jun 18
HUD has been on fire, being used by some of the largest companies in the world to build RL environments. Congrats to @hud_evals @jayendra_ram @parth220 @seeklis on the series A from @Standard_Cap!
hud
@hud_evals
Jun 18
Today, HUD is excited to share our Series A funding! We are the platform for building high-quality post-training datasets. Over 50 businesses use HUD to build RL environments, sell them to AI labs, or train their own models from them. Our mission is to enable a generation of
9.5K
hud reposted
Dalton Caldwell
@daltonc
Jun 18
HUD is a very interesting company, the big idea here is to provide entrepreneurs anywhere in the world the tools and infrastructure they need to get into the data business. HUD : ScaleAI :: Airbnb : Hilton
28K
hud
@hud_evals
Jun 18
Today, HUD is excited to share our Series A funding! We are the platform for building high-quality post-training datasets. Over 50 businesses use HUD to build RL environments, sell them to AI labs, or train their own models from them. Our mission is to enable a generation of
72K
hud
@hud_evals
Jun 18
Come join us and celebrate this weekend at HUD's Frontier/RSI RL Environments Hackathon @ the YC HQ in San Francisco. We have an all-star cast of cosponsors such as: @ycombinator, @modal, @GoogleDeepMind, @OpenAI, @AnthropicAI, @daytonaio, @FireworksAI_HQ, @MiniMax_AI,
7.8K
hud
@hud_evals
Jun 18
Here's the link to the event:
HUD Frontier/RSI RL Environments Hackathon
From events.ycombinator.com
1.4K
hud
@hud_evals
Jun 12
Excited to welcome @OpenAI and @arcprize as co-sponsors! HUD RL for RSI hackathon, June 20th-21st @ YC HQ. Signups close tomorrow! 📢
6.9K
hud
@hud_evals
Jun 12
apply here 👇
HUD Frontier/RSI RL Environments Hackathon
From events.ycombinator.com
598
hud
@hud_evals
Jun 10
btw, if you win our RL for RSI hackathon, u get a cool robot dog 🐕‍🦺 June 20th-21st @ YC HQ. Signups close in 3 days! 👇
hud
@hud_evals
May 16
Announcing HUD's RL environments for RSI hackathon! 🎉 Join us June 20–21 in SF if you're interested in RL and want to push the frontier forward! (w/$100,000+ in prizes and compute credits 👀)
3.1K
hud
@hud_evals
Jun 10
apply here by 13th June (!) 👉
HUD Frontier/RSI RL Environments Hackathon
From events.ycombinator.com
307
hud
@hud_evals
May 16
Announcing HUD's RL environments for RSI hackathon! 🎉 Join us June 20–21 in SF if you're interested in RL and want to push the frontier forward! (w/$100,000+ in prizes and compute credits 👀)
70K
hud
@hud_evals
May 16
You can improve models at anything you can verify. The only question left: what will you teach them? Imagine what 2040 looks like. Then work backwards. Build environments and agents to push the frontier in coding, ML research, robotics, manufacturing, autonomous businesses.
2.5K
hud
@hud_evals
May 16
No prior RL experience required. Just ambition. Apply here → events.ycombinator.com/hud-frontier-j… Special thanks to our partners! @ycombinator, @AnthropicAI, @GoogleDeepMind, @modal, @daytonaio, @ExaAILabs, @FireworksAI_HQ, @sixtyfourai, @MiniMax_AI, @AntimLabs.
HUD Frontier/RSI RL Environments Hackathon
From events.ycombinator.com
2.1K
hud
@hud_evals
May 9
This Tuesday, HUD is hosting Strange Evals. This session: if VLM reasoning benchmarks are saturated, why can’t Claude make me a decent PPT? DM if you’d like to join!
Vincent Koc
@vincent_koc
May 4
For my eval-maxxing nerds out there, good friends of mine are running a series called "strange evals", you can benchmaxx now on anything. If in SF swing by! luma.com/lvqbs1mo
1.9K
hud
@hud_evals
Mar 18
AI agents are being deployed to prod, but can they autonomously find and patch unseen critical vulnerabilities? We introduce ZeroDayBench, a benchmark for evaluating LLM agents on proactive cyberdefense. Plus, a novel high-severity (CVSS 8.1) CVE we found partway through... 👀
37K
hud
@hud_evals
Mar 18
Replying to @hud_evals
While creating ZeroDayBench, a member of our team discovered CVE-2025-14279, a high-severity DNS rebinding vulnerability in the MLflow REST server that allows full read/write access to a user’s endpoint w/o authentication. Read more: huntr.com/bounties/ef478…
1.3K
hud
@hud_evals
Mar 18
Check out the full paper by @unrelated333, @louis_sloot, and @WinterCawfie, as well as @Shark_Academia, @super_bavario, and @jdchawla29 from @hud_evals!
arxiv.org
ZeroDayBench: Evaluating LLM Agents on Unseen Zero-Day...
Large language models (LLMs) are increasingly being deployed as software engineering agents that autonomously contribute to repositories. A major benefit these agents present is their ability to...
1.1K