I regret to inform you that once I have actually paged a codebase into my head, it is faster for me to make changes than it is to ask Claude to do it
Edward Z. Yang
- Do you want to work on PyTorch? The PyTorch team at Facebook is hiring! Remote in many locations is OK and most things we do are open source. Reach out to me in DMs if you're interested.
- Thanks Python.
  >>> 9007199254740992 == 9007199254740992.
  True
  >>> 9007199254740993 == 9007199254740993.
  False
  >>> 9007199254740994 == 9007199254740994.
  True
- Announcing "torch.compile: the missing manual": docs.google.com/document/d/1y5… The performance/memory sections are not done yet, but everything else is as much hard-won debugging knowledge as I could write down from Meta's deployments of PT2. Feedback welcome!
- People who run lots of small training jobs for your day job, what is one thing about experiment management / hygiene that you wish you knew when you started out?
- 🚨🚨🚨 HEY EVERYONE I HAVE A PODCAST ABOUT PYTORCH INTERNALS DEVELOPMENT pytorch-dev-podcast.simplecast.com Two episodes public so far, three more recorded and unreleased. Also on Spotify, Apple and Google 🚨🚨🚨
- It's really interesting comparing the Ultra-Scale playbook (huggingface.co/spaces/nanotro…) and How To Scale Your Model, aka the JAX book (jax-ml.github.io/scaling-book/) side-by-side. 🧵
- I never thought I'd say this, but writing the first compiler for PyTorch in C++ instead of Python was such a big mistake 😂
- Compile time just absolutely destroys casual contributions. "Oh, you have a free hour to write a fix? Well, spend it compiling the project first"
- I've been brainstorming episodes for the next season of PyTorch Developer Podcast: DTensor StridedShard, FSDP-TP order; Redistributing a DTensor; Prefetching vs Bucketing; History of FSDP in PyTorch; Multiprocessing: DataParallel versus DistributedDataParallel; Monarch Parallelism
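
A note on the "Thanks Python" post above: the trailing `.` on the right-hand side of each comparison is not punctuation. It makes that operand a float literal, so each line compares an exact int with the nearest 64-bit float. Past 2**53, floats can only represent every other integer, which is presumably the point of the joke. A minimal sketch of the same check follows; the helper name `int_equals_float_literal` is made up for illustration and does not come from the post.

```python
# 2**53 is the first point past which a 64-bit float can no longer
# represent every integer exactly.
LIMIT = 2**53  # 9007199254740992


def int_equals_float_literal(n: int) -> bool:
    """Hypothetical helper: mimic typing `n == n.` at the REPL.

    float(n) rounds n to the nearest representable double, and Python
    then compares the int and the float by exact numeric value.
    """
    return n == float(n)


# Check the three values that appear in the post.
results = {n: int_equals_float_literal(n) for n in (LIMIT, LIMIT + 1, LIMIT + 2)}

for n, same in results.items():
    print(n, "==", repr(float(n)), "->", same)
```

Only the odd value fails, because it sits halfway between two representable floats and gets rounded to a neighbor, while the int on the left-hand side keeps its exact value.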





