New notes: We've been building a research-friendly LLM-RL repo in JAX, and I recently took the time to optimize the sampling/training pipeline.
We're able to match vLLM sampling and get decent training batch sizes now!
notes.kvfrans.com/7-misc/rl-infr…







