Horace He (@cHHillee) / X

Horace He

3,568 posts

@cHHillee

@thinkymachines Formerly @PyTorch "My learning style is Horace twitter threads" - @typedfemale

chhillee

thonking.ai/p/strangely-ma…

Joined February 2010

Pinned
Horace He
@cHHillee
Sep 10, 2025
Apologies that I haven't written anything since joining Thinking Machines but I hope this blog post on a topic very near and dear to my heart (reproducible floating point numerics in LLM inference) will make up for it!
Thinking Machines
@thinkymachines
Sep 10, 2025
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”. We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
542K
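A minimal sketch of why floating-point results can depend on evaluation order, which is one well-known ingredient of non-reproducible numerics. It only shows that floating-point addition is not associative; it is an illustration chosen here, not taken from the linked blog post, and the variable names are hypothetical.

```python
# Illustration only: floating-point addition is not associative,
# so the grouping of a sum can change the last bits of the result.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # group the first two terms
right_to_left = a + (b + c)   # group the last two terms

# The two groupings are mathematically equal but differ in binary64.
print(left_to_right == right_to_left)
print(repr(left_to_right), repr(right_to_left))
```

Any computation whose reduction order varies from run to run (for example, a parallel sum) can in principle show this kind of last-bit difference.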
Horace He
@cHHillee
Mar 14, 2023
I suspect GPT-4's performance is influenced by data contamination, at least on Codeforces. Of the easiest problems on Codeforces, it solved 10/10 pre-2021 problems and 0/10 recent problems. This strongly points to contamination. 1/4
Horace He
@cHHillee
Mar 14, 2023
How is it even … possible to have a codeforces rating of 392? That’s very low. Like, my understanding was as long as you participated in a couple of contests (regardless of how you did), you'd have a rating above 392.
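A rough sketch of how lopsided the 10/10 versus 0/10 split is, using a one-sided Fisher exact test (hypergeometric tail). The function name and the null hypothesis (the 20 problems are exchangeable, so solving is unrelated to problem date) are assumptions made here for illustration; the thread itself does not state a test.

```python
from math import comb

def one_sided_p(solved_old, n_old, solved_new, n_new):
    """Hypergeometric tail: chance that at least `solved_old` of the
    solved problems land in the old group if date were irrelevant."""
    total = n_old + n_new
    solved = solved_old + solved_new
    upper = min(n_old, solved)
    favorable = sum(
        comb(n_old, k) * comb(n_new, solved - k)
        for k in range(solved_old, upper + 1)
    )
    return favorable / comb(total, solved)

# Counts from the tweet: 10/10 pre-2021 solved, 0/10 recent solved.
p = one_sided_p(10, 10, 0, 10)
print(f"{p:.2e}")  # 1 / C(20, 10)
```

A small p-value here only says the split is unlikely under that simple null. It does not by itself rule out other explanations, such as the two sets of problems differing in difficulty.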
Horace He
@cHHillee
Nov 30, 2023
Happy to OSS gpt-fast, a fast and hackable implementation of transformer inference in <1000 lines of native PyTorch with support for quantization, speculative decoding, TP, Nvidia/AMD support, and more! Code: github.com/pytorch-labs/g… Blog: pytorch.org/blog/accelerat… (1/12)
477K
Horace He
@cHHillee
Mar 4, 2025
Life update: After 4 years on PyTorch, I've joined @thinkymachines! Over the years, I've had several people ask me why I'm so reluctant to leave. I want to talk about 1. why I've stayed at PyTorch for 4 years, and 2. why Thinking Machines was so compelling to me. 1/6
268K
Horace He
@cHHillee
Mar 15, 2022
Everybody wants their models to run faster. However, researchers often cargo cult performance without a solid understanding of the underlying principles. To address that, I wrote a post called "Making Deep Learning Go Brrrr From First Principles". (1/3) horace.io/brrr_intro.html
Horace He
@cHHillee
Feb 27, 2023
Recently, Karpathy tweeted that *increasing* the size of his matmul made it run faster. But... why? Many people seem content to leave this as black magic. But luckily, this *can* be understood! Here's a plot of FLOPs achieved for square matmuls. Let's explain each curve! 1/19
549K
Horace He
@cHHillee
Aug 7, 2024
For too long, users have lived under the software lottery tyranny of fused attention implementations. No longer. Introducing FlexAttention, a new PyTorch API allowing for many attention variants to enjoy fused kernels in a few lines of PyTorch. pytorch.org/blog/flexatten… 1/10
288K
Horace He
@cHHillee
Aug 14, 2022
Why is OpenAI's new compiler, Triton, so exciting? And what distinguishes it from other efforts to provide a Python DSL for programming Nvidia GPUs, like Numba? To answer that, we need to look at the operation behind all of deep learning - matrix multiplication. (1/7)
Horace He
@cHHillee
Sep 18, 2025
Lots of sympathy to the Anthropic team 🙏🙏🙏
Claude
@claudeai
Sep 17, 2025
Replying to @claudeai
In our investigation, we uncovered three separate bugs. They were partly overlapping, making diagnosis even trickier. We've now resolved all three bugs and written a technical report on what happened, which you can find here: anthropic.com/engineering/a-…
239K
Horace He
@cHHillee
Aug 7, 2025
You're no match for OpenAI's marketing team.
typedfemale
@typedfemale
Aug 6, 2025
i should work in marketing
155K
Horace He
@cHHillee
Dec 10, 2022
Eager mode was what made PyTorch successful. So why did we feel the need to depart from eager mode in PyTorch 2.0? Answer: it's the damn hardware! Let's tell a story about how the assumptions PyTorch was based on became untrue, and why PyTorch needed to evolve. (1/10)
Horace He
@cHHillee
Sep 7, 2022
Ever since the V100, Nvidia has been cramming more and more "tensor cores" into each GPU generation. But what *are* tensor cores? How can you use them to accelerate deep learning models by >10x? And ... why does their existence make me somewhat sad :( (1/9)
Horace He
@cHHillee
Jan 7, 2025
I gave a talk at Jane Street about building ML systems! Although it's nominally about PyTorch, the talk primarily focuses on the notion of a "programming model", and how compiler optimizations are not a programming model. (1/4)
Sylvain Gugger
@GuggerSylvain
Jan 7, 2025
We had an awesome talk at Jane Street from the amazing @cHHillee on scaling ML systems to and I just realized the recording is now online: youtu.be/139UPjoq7Kw?si…
170K
Horace He
@cHHillee
Dec 12, 2022
I'm going to start posting a series of PyTorch 2.0 benchmarks demonstrating what kinds of things we speed up as well as exploring a variety of ML compiler optimizations! For the first one, let's talk about good old operator fusion - the workhorse of all ML compilers. (1/8)