Sachin (@sachdh) / X

Sachin

@sachdh

cooking custom specialized models at @savantedotai

Joined April 2019

Pinned
Sachin
@sachdh
Jul 22, 2025
Excited to share Aryabhatta 1.0, our leading model that scores 90.2% on JEE Mains, outperforming frontier models like o4-mini and Gemini 2.5 Flash. Trained by us at @AthenaAgentRL, in collaboration with @physics__wallah, using custom RLVR training on 130K+ curated JEE problems
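RLVR (reinforcement learning with verifiable rewards) swaps a learned reward model for a programmatic check of the final answer. Below is a minimal sketch under stated assumptions: the model puts its final answer in `\boxed{...}`, and the reward is 1.0 for an exact string match with the reference answer. The tweet does not say how Aryabhatta's rewards were actually computed, and `extract_boxed` / `verifiable_reward` are hypothetical names.

```python
import re
from typing import Optional

# Assumption: final answers are wrapped in \boxed{...} with no nested braces.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")

def extract_boxed(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a completion, or None."""
    matches = BOXED.findall(completion)
    return matches[-1].strip() if matches else None

def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted final answer equals the reference."""
    answer = extract_boxed(completion)
    if answer is None:
        return 0.0  # no parseable final answer earns nothing
    return 1.0 if answer == reference.strip() else 0.0
```

Exact string matching is the simplest possible verifier; a real setup for JEE-style problems would likely need numeric tolerance or symbolic equivalence checks.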
Sachin
@sachdh
Sep 12, 2025
you don't need GRPO; plain old policy gradient works.
this blog post is a short and sweet case study of how to train an LLM with RL for your product use case:
- they have a huge dataset
- experimented with reward structure
- used simple policy gradient
- worked on
Cursor
@cursor_ai
Sep 11, 2025
We've trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL.
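"Plain old policy gradient" in its simplest form is REINFORCE: push up the log-probability of each sampled action in proportion to its reward minus a baseline, with no clipping, KL penalty, or group-std normalization. The sketch below assumes a toy multi-armed bandit (each action stands in for a sampled completion) with a softmax policy over logits and the batch-mean reward as the baseline. It illustrates the generic technique and is not the objective described in Cursor's post; `reinforce_bandit` is a hypothetical name.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(rewards, steps=2000, batch_size=8, lr=0.1, seed=0):
    """Train a softmax policy over len(rewards) actions with REINFORCE + mean baseline."""
    rng = random.Random(seed)
    n = len(rewards)
    logits = [0.0] * n
    for _ in range(steps):
        probs = softmax(logits)
        actions = rng.choices(range(n), weights=probs, k=batch_size)
        batch_rewards = [rewards[a] for a in actions]
        baseline = sum(batch_rewards) / batch_size  # variance reduction only
        grad = [0.0] * n
        for a, r in zip(actions, batch_rewards):
            advantage = r - baseline
            for k in range(n):
                # d/d logit_k of log softmax(logits)[a] = 1[k == a] - probs[k]
                grad[k] += advantage * ((1.0 if k == a else 0.0) - probs[k])
        # Gradient ascent on expected reward, averaged over the batch.
        logits = [logit + lr * g / batch_size for logit, g in zip(logits, grad)]
    return softmax(logits)
```

Calling `reinforce_bandit([0.0, 0.2, 1.0])` should concentrate most of the probability on the third action. GRPO differs mainly in how the advantage is computed and in its clipped, KL-regularized objective.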
Sachin
@sachdh
Sep 23, 2025
the reason why I don’t like Anthropic: this video shows pure arrogance and disdain for anyone who isn’t a big lab. their words and actions around safetyism reflect the same: only we can train the models and you peasants can’t. open source AI is a wonderful thing and will win
高级分析师
@techeconomyana
Sep 22, 2025
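Anthropic CEO Dario on open-source models:
- Opening the weights of a large model is not like open-sourcing software; there is no developer community contributing back.
- Open source is just a gimmick to attract attention; users only care whether the model is good to use. Whether DeepSeek is open source or not doesn't matter; as a very large model, it is hard to run inference on.
- Open source does not mean free; running inference servers costs money.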
Sachin
@sachdh
Apr 22, 2024
YC application submitted at 2:06 AM. Ready to fly at 2:26 AM. Peak engineer? Or master procrastinator? 🤔
Sachin
@sachdh
Oct 20, 2025
PPO is one of the RL algorithms, not the magic algorithm for every RL problem. the same goes for GRPO / GSPO or any other current policy-gradient algo.
rl as a problem formulation is excellent:
- agent sees state
- agent takes action
- agent gets reward and next state
this is so
Csaba Szepesvari
@CsabaSzepesvari
Oct 19, 2025
Replying to @karpathy
@karpathy I think it would be good to distinguish RL as a problem from the algorithms that people use to address RL problems. This would allow us to discuss if the problem is with the algorithms, or if the problem is with posing a problem as an RL problem. 1/x
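The problem formulation (agent sees state, takes action, gets reward and next state) stands on its own, independent of whichever algorithm picks the actions. A minimal sketch of that loop, assuming a hypothetical toy `Corridor` environment; the `policy` argument is any function from state to action, so a hand-written rule, a random policy, or one trained with PPO / GRPO / GSPO all plug into the same loop.

```python
from dataclasses import dataclass

@dataclass
class Corridor:
    """Hypothetical toy environment: walk from position 0 to `length`; reward 1.0 on arrival."""
    length: int = 5
    pos: int = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos  # initial state

    def step(self, action: int):
        """action is -1 (left) or +1 (right); returns (next_state, reward, done)."""
        self.pos = max(0, min(self.length, self.pos + action))
        done = self.pos == self.length
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

def run_episode(env, policy, max_steps=100):
    """Agent sees state, takes action, gets reward and next state, until done."""
    state = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = policy(state)                  # agent takes an action
        state, reward, done = env.step(action)  # env returns reward and next state
        total_reward += reward
        if done:
            return total_reward, t + 1
    return total_reward, max_steps
```

For example, `run_episode(Corridor(), lambda s: 1)` reaches the goal in 5 steps with total reward 1.0. Swapping the algorithm only changes `policy`; the problem stays the same.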
Sachin
@sachdh
Sep 11, 2025
the best / super-efficient RL framework doesn't exist. profile everything and write your own training scripts. experiment with everything: reward functions, calculations of advantages, objective functions, training prompt distributions. GRPO is good; it is not untouchable. it is just
will brown
@willccbb
Sep 11, 2025
"veRL is the best RL framework it's super efficient" really. are you sure about that. are you sure that you need 16 GPUs to tune a 7B model at 8k context. do you think that it's reasonable each step takes 19 minutes for this
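The advantage calculation, one of the knobs listed above, is small enough to swap by hand in your own training script. A sketch of two variants for a group of completions sampled from the same prompt: GRPO-style (reward minus group mean, divided by group standard deviation) and a plain mean baseline without the std division. The function names and the `eps` value are illustrative assumptions, and this uses the population std; some implementations use the sample std instead.

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """GRPO-style: center each reward on the group mean and scale by the group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps guards std == 0
    return [(r - mean) / (std + eps) for r in rewards]

def mean_baseline_advantages(rewards):
    """Plain baseline: subtract the group mean, no std scaling."""
    mean = statistics.fmean(rewards)
    return [r - mean for r in rewards]
```

For a group with rewards `[1.0, 0.0, 0.0, 0.0]`, the mean baseline gives `[0.75, -0.25, -0.25, -0.25]`, while the GRPO-style version scales those by 1/0.433, so the single correct completion gets an advantage of about 1.73. Both yield all-zero advantages when every completion in the group scores the same, so such groups contribute no gradient either way.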
Sachin
@sachdh
Aug 5, 2025
want to prepare for the same skills? want to train models to solve hard tasks? want to join the early, chaotic but extremely ambitious pirate ship? we are hiring and DMs are open. if you don't care about LeetCode and traditional job routes but just want to take on ambitious
TDM (e/λ) (L8 vibe coder 💫)
@cto_junior
Aug 5, 2025
Whatever gold rush is happening in the Valley will happen in Bangalore within the next 2 years at most. So if you're feeling lost seeing those large comps, just prepare for the same skills. Time will reward you well (somewhat PPP adjusted)
Sachin
@sachdh
Jul 27, 2025
I am not a devout Hindu, but the statues and temples in Bali make me feel proud of and connected to the Vedic roots. we should revive Indian cities to reflect Indian Sanskriti
Sachin
@sachdh
Mar 23, 2025
Paras is summarizing the most important lesson of Geeta - perform your own dharma - in the context of societal dynamics.
Paras Chopra
@paraschopra
Mar 23, 2025
Replying to @paraschopra
30/ The point is not to avoid working hard, but it is to work on things you find fulfilling while completely ignoring what you think is the objectively right thing to do (which is often an idea implanted into your head by the society).
Sachin
@sachdh
Aug 14, 2025
The Aryabhatta 1.0 paper drop happened! Finally! Sorry for the delay! This paper covers the composition of our training datasets, some details about our training methodology, and a few interesting tidbits about how we did effective exploration with RLVR. Do check it out and DM me /
Sachin
@sachdh
Sep 13, 2025
it was so much fun discussing my learnings from training a sota maths LLM, aryabhata 1.0.
@lossfunk is where this whole journey started:
- curious conversations about o1
- first talk speculating on how o1 might have been trained
- conversations with @paraschopra about
Naveen Benny
@navbenny
Sep 13, 2025
Yesterday's session by @sachdh at @lossfunk about how they achieved SOTA in JEE with RLFT+ was 🔥 I always learn something from him in every interaction, and he is one of the few ML practitioners I deeply respect (he's too humble for his own sake). My learnings 🧵
Sachin
@sachdh
Aug 2, 2025
story of how I got into machine learning:
- my first job was with a services company
- my manager wanted me to maintain legacy C++ software & I wanted to train models
- found a lead data scientist in the same company on LinkedIn and convinced him to interview me
- manager got
Raj Dabre
@prajdabre
Aug 1, 2025
Share a piece of your "I almost quit my job to do XYZ" lore.
Sachin
@sachdh
Jul 22, 2025
Replying to @sachdh
Aryabhatta 1.0 proves that sometimes David wins, not because of brute force, but because of precision. You mostly don't need 100B+ parameters or 16k+ context lengths.
Sachin
@sachdh
Jul 27, 2025
if you are one of these giants and want to train custom LLMs for your own use case, my DMs are open. we started @AthenaAgentRL to help you train sota LLMs that are cheap to run at inference. please check out our proof of work, Aryabhatta 1.0 on HF, and then ping me
sphinx
@protosphinx
Jul 27, 2025
So apparently China has several top models now. I can understand India’s manufacturing challenges. I can understand the infra gaps. What I can’t understand is how Indian IT giants sitting on billions have made no real AI progress while their core business is under direct threat.