CS 221M

Course description

What is the internal structure of modern neural networks and how can we study it? This course provides a broad and deep introduction to interpretability, the subfield of machine learning concerned with understanding precisely how models process information and why they produce the outputs they do. We will cover topics such as probing, steering, causal abstraction, and sparse autoencoders, with a particular emphasis on causal methods and large language models. The course will include guest lectures from leading interpretability labs across academia and industry.

Staff

Thomas Icard

Instructor

Atticus Geiger

Instructor

Amir Zur

Instructor

Jing Huang

Instructor

Junyi Tao

Teaching Assistant

Siri Vatsavaya

Course Manager

Please reach the staff at cs221m-spr2526-staff@lists.stanford.edu.

Logistics

Time: Monday, Wednesday 2:30pm-3:50pm
Location: McMurtry Art Building, Oshman
Office hours: By appointment - email course staff to schedule (cs221m-spr2526-staff@lists.stanford.edu)

Coursework

The course will have five weeks of notebook-guided lectures, four weeks of guest lectures, and one week of final presentations. Students will be graded for participation in lectures and for their final project.

Participation: Students are expected to attend all lectures and engage with the course materials; please notify staff in advance if you are unable to attend a lecture.
Final project: Students will implement and present the key results from a recent paper in mechanistic interpretability.

Syllabus

Please download the syllabus here.


Schedule

Note: schedule is subject to change.

Date	Lesson	Readings	Materials
Week 1 Mon. March 30	Introduction	—	—
Week 1 Wed. April 1	Review of language models	Rush et al. 2018 annotated transformer; Elhage et al. 2021 circuits	Slides; Interactive notebook
Week 2 Mon. April 6	Behavioral analysis and input attribution	Jia and Liang 2017 adversarial evaluation; Sundararajan et al. 2017 integrated gradients	—
Week 2 Wed. April 8	Probes for decoding activations	Wendler et al. 2024 llamas think in english; Tenney et al. 2019 BERT rediscovers NLP pipeline; Marks et al. 2023 geometry of truth	—
Week 3 Mon. April 13	Interventions for steering activations	Meng et al. 2022 editing facts; Ravfogel et al. 2020 nullspace projection; Geva et al. 2023 attention knockout	—
Week 3 Wed. April 15	Causal mediation analysis	Vig et al. 2020 analyzing gender bias; Wang et al. 2023 IOI; Mueller et al. 2025 quest for the right mediator	—
Week 4 Mon. April 20	Theory of causal abstraction I	Rubenstein et al. 2017; Beckers et al. 2019; Geiger et al. 2021, 2025	—
Week 4 Wed. April 22	Designing counterfactuals	Prakash et al. 2025 lookback mechanism; Gur-Arieh et al. 2025 entity binding	—
Week 5 Mon. April 27	Automated causal interpretability	Davies et al. 2023; Cao et al. 2020, 2022; Geiger et al. 2023 DAS; Wu et al. 2023 boundless DAS	—
Week 5 Wed. April 29	Theory of causal abstraction II	—	—
Week 6 Mon. May 4	—	—	—
Week 6 Wed. May 6	—	—	—
Week 7 Mon. May 11	—	—	—
Week 7 Wed. May 13	—	—	—
Week 8 Mon. May 18	—	—	—
Week 8 Wed. May 20	—	—	—
Week 9 Mon. May 25	—	—	—
Week 9 Wed. May 27	—	—	—
Week 10 Mon. June 1	Project presentations	—	—
Week 10 Wed. June 3	Project presentations	—	—

Frequently asked questions

I submitted an application but had not heard back by Mar 27th. Is it still possible to enroll in the course?

We have received more than 200 applications, far more than we initially expected. It is truly exciting to see so many students interested in interpretability! We have increased the course capacity to accommodate as many students as we can; however, we are constrained by resources such as course staff, project mentors, and compute. At this point, we do not plan to further increase the class size. We will likely have another iteration of the course next year, so if you are still around, check it out next spring!

Can I audit this course without enrollment?

We generally do not allow auditing. However, you are more than welcome to attend the guest lectures, which will be in the second half of the course. We will also try to make most of the course materials public.

I have enrolled in the class, but cannot attend some lectures in person.

We value participation. Students are expected to attend all lectures and engage with the course materials. If you are unable to attend a lecture due to travel or other unforeseen circumstances, you must notify us by email in advance, i.e., before the lecture. Please include the date of the anticipated absence and the reason for your absence. We will follow up with you as necessary.

CS 221M: Mechanistic Interpretability
