Reinforcement Learning from Human Feedback

A short introduction to RLHF and post-training focused on language models.

Nathan Lambert

Course

Course lectures and talks based on the RLHF Book, built with Colloquium.

Welcome to the Course

Introduction and overview of what you'll learn

Lectures

Lecture 1: Overview

Chapters 1-3 · Foundations of RLHF and post-training

Lecture 2: IFT, Reward Models, & Rejection Sampling

Chapters 4, 5, and 9 · Beginning the core optimization methods section

Lecture 3: RL Motivation & Math

Chapter 6, Part 1 · Policy gradient math, intuitions, and theory

Lecture 4: RL Implementation & Practice

Chapter 6, Part 2 · Code, loss aggregation, async training, and practical engineering

Other Lectures

2026

An Introduction to Reinforcement Learning from Human Feedback and Post-training

SALA 2026 · Quito, Ecuador · March 2026

Citation

If you found this useful for your research, please cite it!

@book{rlhf2026lambert,
  author = {Nathan Lambert},
  title = {Reinforcement Learning from Human Feedback},
  year = {2026},
  publisher = {Online},
  url = {https://rlhfbook.com}
}