Horace He
Thinking Machines
3,568 posts
@cHHillee
@thinkymachines. Formerly @PyTorch. "My learning style is Horace twitter threads" - @typedfemale
chhillee
thonking.ai/p/strangely-ma…
Joined February 2010
590 Following
50.7K Followers
  • Pinned
    Horace He
    Thinking Machines
    @cHHillee
    Sep 10, 2025
    Apologies that I haven't written anything since joining Thinking Machines but I hope this blog post on a topic very near and dear to my heart (reproducible floating point numerics in LLM inference) will make up for it!
    Thinking Machines
    @thinkymachines
    Sep 10, 2025
    Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
    542K
  • Horace He
    Thinking Machines
    @cHHillee
    Mar 14, 2023
    I suspect GPT-4's performance is influenced by data contamination, at least on Codeforces. Of the easiest problems on Codeforces, it solved 10/10 pre-2021 problems and 0/10 recent problems. This strongly points to contamination. 1/4
    Horace He
    Thinking Machines
    @cHHillee
    Mar 14, 2023
    How is it even … possible to have a codeforces rating of 392? That’s very low. Like, my understanding was as long as you participated in a couple of contests (regardless of how you did), you'd have a rating above 392.
    1.8M
  • Horace He
    Thinking Machines
    @cHHillee
    Nov 30, 2023
    Happy to OSS gpt-fast, a fast and hackable implementation of transformer inference in <1000 lines of native PyTorch with support for quantization, speculative decoding, TP, Nvidia/AMD support, and more! Code: github.com/pytorch-labs/g… Blog: pytorch.org/blog/accelerat… (1/12)
    477K
  • Horace He
    Thinking Machines
    @cHHillee
    Mar 4, 2025
    Life update: After 4 years on PyTorch, I've joined @thinkymachines! Over the years, I've had several people ask me why I'm so reluctant to leave. I want to talk about 1. why I've stayed at PyTorch for 4 years, and 2. why Thinking Machines was so compelling to me. 1/6
    268K
  • Horace He
    Thinking Machines
    @cHHillee
    Mar 15, 2022
    Everybody wants their models to run faster. However, researchers often cargo cult performance without a solid understanding of the underlying principles. To address that, I wrote a post called "Making Deep Learning Go Brrrr From First Principles". (1/3) horace.io/brrr_intro.html
  • Horace He
    Thinking Machines
    @cHHillee
    Feb 27, 2023
    Recently, Karpathy tweeted that *increasing* the size of his matmul made it run faster. But... why? Many people seem content to leave this as black magic. But luckily, this *can* be understood! Here's a plot of FLOPs achieved for square matmuls. Let's explain each curve! 1/19
    549K
  • Horace He
    Thinking Machines
    @cHHillee
    Aug 7, 2024
    For too long, users have lived under the software lottery tyranny of fused attention implementations. No longer. Introducing FlexAttention, a new PyTorch API allowing for many attention variants to enjoy fused kernels in a few lines of PyTorch. pytorch.org/blog/flexatten… 1/10
    288K
  • Horace He
    Thinking Machines
    @cHHillee
    Aug 14, 2022
    Why is OpenAI's new compiler, Triton, so exciting? And what distinguishes it from other efforts to provide a Python DSL for programming Nvidia GPUs, like Numba? To answer that, we need to look at the operation behind all of deep learning - matrix multiplication. (1/7)
  • Horace He
    Thinking Machines
    @cHHillee
    Sep 18, 2025
    Lots of sympathy to the Anthropic team 🙏🙏🙏
    Claude
    Anthropic
    @claudeai
    Sep 17, 2025
    Replying to @claudeai
    In our investigation, we uncovered three separate bugs. They were partly overlapping, making diagnosis even trickier. We've now resolved all three bugs and written a technical report on what happened, which you can find here: anthropic.com/engineering/a-…
    239K
  • Horace He
    Thinking Machines
    @cHHillee
    Aug 7, 2025
    You're no match for OpenAI's marketing team.
    Image
    "Bar chart comparing SWE-bench Verified accuracy scores between Opus 4 (May 2025) showing 72.5% and Opus 4.1 (Aug 2025) showing 74.5%. The y-axis scale ranges from 71 to 76, making the 2 percentage point improvement appear visually dramatic."
    typedfemale
    @typedfemale
    Aug 6, 2025
    i should work in marketing
    155K
  • Horace He
    Thinking Machines
    @cHHillee
    Dec 10, 2022
    Eager mode was what made PyTorch successful. So why did we feel the need to depart from eager mode in PyTorch 2.0? Answer: it's the damn hardware! Let's tell a story about how the assumptions PyTorch was based on became untrue, and why PyTorch needed to evolve. (1/10)
  • Horace He
    Thinking Machines
    @cHHillee
    Sep 7, 2022
    Ever since the V100, Nvidia has been cramming more and more "tensor cores" into each GPU generation. But what *are* tensor cores? How can you use them to accelerate deep learning models by >10x? And ... why does their existence make me somewhat sad :( (1/9)
  • Horace He
    Thinking Machines
    @cHHillee
    Jan 7, 2025
    I gave a talk at Jane Street about building ML systems! Although it's nominally about PyTorch, the talk primarily focuses on the notion of a "programming model", and how compiler optimizations are not a programming model. (1/4)
    Sylvain Gugger
    @GuggerSylvain
    Jan 7, 2025
    We had an awesome talk at Jane Street from the amazing @cHHillee on scaling ML systems to and I just realized the recording is now online: youtu.be/139UPjoq7Kw?si…
    170K
  • Horace He
    Thinking Machines
    @cHHillee
    Dec 12, 2022
    I'm going to start posting a series of PyTorch 2.0 benchmarks demonstrating what kinds of things we speed up as well as exploring a variety of ML compiler optimizations! For the first one, let's talk about good old operator fusion - the workhorse of all ML compilers. (1/8)