Here's my attempt at visualizing the training pipeline for DeepSeek-R1(-Zero) and the distillation to smaller models.
Note they retrain DeepSeek-V3-Base with the new 800k curated data instead of continuing to finetune the checkpoint from the first round of cold-start SFT + RL
Research Scientist at @GoogleDeepMind, ML PhD @UofT/@VectorInst. EngSci Grad. Former Canadian Rubik's Cube Champion.
Toronto, Ontario
Joined January 2013




















