Jacob Buckman
2,437 posts
Jacob Buckman
@jacobmbuckman
Formerly @jhuclsp, @GoogleAI, @SCSatCMU, @MilaMontreal, founder @manifest__ai.
SF
jacobbuckman.com
Joined December 2016
384 Following
5,879 Followers
  • Jacob Buckman
    @jacobmbuckman
    Oct 29, 2025
    The end of the transformer era marches slowly closer: we trained a completely attention-free foundation model at the 14B scale for only $4,000. The performance matches other models of similar scale, including transformers and hybrid models.
    Manifest AI
    @manifest__ai
    Oct 29, 2025
    Today we are releasing Brumby-14B-Base, the strongest attention-free base model around. manifestai.com/articles/relea…
    Readers added context they thought people might want to know
    The post implies from-scratch training of an attention-free model for $4,000, but Brumby-14B repurposes pretrained Qwen3-14B weights via "power retention layers" for rapid adaptation. The author agreed that they should have used different wording. manifestai.com/articles/relea… x.com/jacobmbuckman/…
    214K
  • Jacob Buckman
    @jacobmbuckman
    Jan 10, 2023
    Are you a PhD student struggling to get a job or internship? Jealous of the success of your more-cited peers? More concerned with your career than doing good science? Here is a thread of eight invaluable techniques to "improve" your publication and citation metrics. vv 🧵🧵🧵 vv
    112K
  • Jacob Buckman
    @jacobmbuckman
    Sep 23, 2025
    Transformers are broken. Today, Manifest AI is releasing Power Retention, an open-source architecture to replace them. More below 🧵:
    Manifest AI
    @manifest__ai
    Sep 23, 2025
    Today, we’re releasing Power Retention, a new architecture beyond Transformers. It enables LLMs to handle millions of tokens efficiently, unlocking long-context applications that were too costly before. manifestai.com/articles/relea…
    95K
  • Jacob Buckman
    @jacobmbuckman
    May 30, 2021
    New blog post, "Please Commit More Blatant Academic Fraud": jacobbuckman.com/2021-05-29-ple… Yes, I'm serious. Blatant academic fraud might be our best shot at developing the future of artificial intelligence.
    jacobbuckman.com
    Please Commit More Blatant Academic Fraud
    This week, I was thrilled to read about the first well-documented case of explicit academic fraud in the artificial intelligence community. I hope that this is the beginning of a trend, and that...
  • Jacob Buckman
    @jacobmbuckman
    Jun 26, 2018
    New blog post on understanding TensorFlow abstractions! jacobbuckman.com/post/tensorflo…
  • Jacob Buckman
    @jacobmbuckman
    Dec 27, 2021
    I'm trying to write a good answer for "What is deep learning?" -- an answer that is specific but also complete. What's something that obviously deserves to be considered deep (supervised) learning, but doesn't fit this definition?
  • Jacob Buckman
    @jacobmbuckman
    Jan 18, 2020
    New blog post with @carlesgelada -- "A Sober Look at Bayesian Neural Networks": jacobbuckman.com/2020-01-17-a-s… Without a good prior, Bayesian uncertainties are meaningless. We argue that BNN priors are likely quite poor, and concretely characterize one specific failure mode.
    jacobbuckman.com
    A Sober Look at Bayesian Neural Networks
    by Carles Gelada and Jacob Buckman WARNING: This is an old version of this blogpost, and if you are a Bayesian, it might make you angry. Click here for an updated post with the same content. Context:...
  • Jacob Buckman
    @jacobmbuckman
    Jun 12, 2021
    Paper writing tip: no matter the topic, always remember to cite (1) a random paper by Hinton from the 80s and (2) capsule networks, both within the first two paragraphs. Reviewers will assume that the paper is by Geoff Hinton and give you a free accept!
  • Jacob Buckman
    @jacobmbuckman
    May 9, 2021
    The three worst ideas in deep learning are batchnorm, epochs, and overfitting
  • Jacob Buckman
    @jacobmbuckman
    Apr 9, 2023
    This thread is, unfortunately, a pretty clear indication that he does *not* properly understand some of the concepts underlying DL. While "GPTs are not GANs" is true in the most literal sense, his description of the implications of this is totally off. 1/n
    181K
  • Jacob Buckman
    @jacobmbuckman
    Sep 24, 2019
    New blog post, "Automation via RL": jacobbuckman.com/2019-09-23-aut… RL research should be oriented around the eventual goal of solving real-world tasks with less effort. To progress towards this goal, we need to change how we motivate and evaluate RL algorithms.
  • Jacob Buckman
    @jacobmbuckman
    Jun 15, 2022
    New blog post, "An Actually-Good Argument Against Naive AI Scaling": jacobbuckman.com/2022-06-14-an-… A response to @slatestarcodex and @GaryMarcus, in which I point out that they are both wrong. The current paradigm is certainly limited, but not for the reasons that Gary claims.
    jacobbuckman.com
    An Actually-Good Argument Against Naive AI Scaling
    The past few days have seen a back-and-forth between Scott Alexander and Gary Marcus on the topic of AI scaling (post1, post2, post3, post4). Specifically, the debate is whether scaled-up language...
  • Jacob Buckman
    @jacobmbuckman
    Jan 8, 2024
    Anyone who has trained a Transformer has viscerally felt its O(T^2) cost. It is not tractable to train Transformers end-to-end on long contexts. Here's a writeup of the research direction I believe is most likely to solve this: linear transformers. manifestai.com/blogposts/fast… 1/7
    90K
  • Jacob Buckman
    @jacobmbuckman
    Jun 11, 2021
    Permanent offer: if anyone wants a high-effort, public, non-anonymized, most-likely-critical review, please send a draft of your paper my way. I can't promise I will help you get into conferences, but I will do my best to help improve the quality of the science.