Zain (@ZainHasan6) / X

4,588 posts

Zain

@ZainHasan6

I build and teach AI • AI/ML @togethercompute • EngSci ℕΨ/PhD @UofT • Previously: vector DBs, data scientist, lecturer & health tech founder • 🇺🇸🇨🇦🇵🇰

SF/Toronto

Joined August 2012

Pinned
Zain
@ZainHasan6
Dec 28, 2023
If you cannot explain something in simple terms, you don't understand it.
18K
Zain
@ZainHasan6
Mar 31, 2025
They tested SOTA LLMs on the 2025 US Math Olympiad hours after the problems were released. Tested on 6 problems and spoiler alert! They all suck -> 5%
1.2M
Zain
@ZainHasan6
Nov 21, 2023
The clearest and crispest explanation I've ever heard of how large language models compress and capture a "world-model" in their weights simply by learning to predict the next word accurately. Furthermore, how the raw power of these base models can then be tamed by teaching
971K
Zain
@ZainHasan6
Jun 22, 2024
Curriculum for Karpathy's new planned course is 🌶️ github.com/karpathy/LLM10…
154K
Zain
@ZainHasan6
Mar 11, 2025
Very cool and detailed breakdown of 7 years' worth of advancements in post-training LLMs: SFT, RLHF, DPO, visual SFT, MoE, reasoning
172K
Zain
@ZainHasan6
Feb 17, 2024
"Minimal, clean, educational code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization."👀👀👀
GitHub - karpathy/minbpe: Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly...
From github.com
165K
Zain
@ZainHasan6
Dec 7, 2023
Anthropic was able to solve the "lost in the middle" problem "by adding the sentence “Here is the most relevant sentence in the context:” to the start of Claude’s response. This was enough to raise Claude 2.1’s score from 27% to 98% on the original evaluation." Does it just take
176K
Zain
@ZainHasan6
Jul 5, 2024
⁉️Should you finetune your LLM or just give relevant examples in the prompt? How many examples should you give for best performance?? If you give more will it hurt perf?? Does order of the examples matter!?? New paper from DeepMind answers all these questions and more, so much
110K
Zain
@ZainHasan6
Jul 14, 2024
You don't need a 2 trillion parameter model to tell you the capital of France is Paris. Be smart and route between a panel of models according to query difficulty and model specialty! New paper proposes a framework to train a router that routes queries to the appropriate LLM
69K
Zain
@ZainHasan6
Feb 28, 2025
NeoBERT - Another successor to BERT!
>> SotA performance on MTEB benchmark
>> 250M parameters
>> outperforms BERT-large, RoBERTa-large, NomicBERT, and ModernBERT under identical fine-tuning conditions
>> OSS - release all code, data, checkpoints, and training scripts
37K
Zain
@ZainHasan6
Apr 4, 2025
New DeepSeek paper + model: DeepSeek-GRM models automatically generate judging principles and critiques without needing a human in the loop to achieve better reward scaling with inference-time compute. Open-source model coming!🧵
34K
Zain
@ZainHasan6
Oct 25, 2024
Contextual RAG from Anthropic is pretty cool. Here's an overview of how it works. 👇 Currently re-implementing every part of the pipeline below, to learn it better. Will share a cookbook soon! Contextual RAG: 1. For every chunk - prepend an explanatory context snippet that
39K
Zain
@ZainHasan6
Jun 29, 2024
🤖Can multiple smaller open-source LLMs be combined to outperform larger monolithic LLMs? New paper shows that LLMs tend to generate better responses when presented with outputs from other models, even if less capable. They use this to build a Mixture of Agents (MoA)
39K
Zain
@ZainHasan6
Feb 21, 2024
Replying to @paulg
Can't explain this T-Rex one yet 😅
41K