Keiran Paster
2,722 posts
Keiran Paster
@keirp1
MDR
keirp.com
Joined April 2010
905
Following
8,192
Followers
  • Pinned
    Keiran Paster
    @keirp1
    Oct 11, 2023
    Introducing OpenWebMath, a massive dataset containing every math document found on the internet - with equations in LaTeX format! 🤗 Download on @huggingface: huggingface.co/datasets/open-… 📝 Read the paper: arxiv.org/abs/2310.06786 w/ @dsantosmarco, @zhangir_azerbay, @jimmybajimmyba!
    196K
  • Keiran Paster
    @keirp1
    Apr 18, 2025
    In case anyone still doesn't see the insane speed that models are getting smarter and cheaper: Yesterday, Google released Gemini 2.5 Flash, a very efficient reasoning model. Today, Grok 3 mini is stronger on most benchmarks for 7x cheaper!
    xAI
    @xai
    Apr 18, 2025
    Replying to @xai
    Let’s start with Grok 3 Mini. When we set out to build a fast, affordable mini model, we knew it would be good but even we didn’t expect it to be this good. Some highlights: - Grok 3 Mini tops the leaderboards on graduate-level STEM, math, and coding, outcompeting flagship
    2.9M
  • Keiran Paster
    @keirp1
    Feb 20, 2025
    Btw, the chain of thought in the "thinking" mode for Grok 3 is completely open. No summarizers or obfuscation. This is really important and the reasoning process is often fascinating!
    1.6M
  • Keiran Paster
    @keirp1
    Nov 4, 2022
    Can large language models write prompts…for themselves? Yes, at a human-level (!) if they are given the ability to experiment and see what works. arxiv.org/abs/2211.01910 with @Yongchao_Zhou_, @_AndreiMuresanu, @ziwen_h, @silviupitis, @SirrahChan, and @jimmybajimmyba (1/7)
  • Keiran Paster
    @keirp1
    Feb 20, 2025
    cope@64
    Boris Power
    @BorisMPower
    Feb 20, 2025
    Disappointing to see the incentives for the grok team to cheat and deceive in evals. Tl;dr o3-mini is better in every eval compared to grok 3. Grok 3 is genuinely a decent model, but no need to over sell.
    70K
  • Keiran Paster
    @keirp1
    Nov 14, 2023
    Which LLMs are generally good at math and which are overfitting to benchmarks? With the release of Grok, @xai evaluated several closed models on a Hungarian national finals math exam which was published after the models were trained. This means it is impossible to train on or
    116K
  • Keiran Paster
    @keirp1
    Jun 2, 2023
    Meet STEVE-1, an instructable generative model for Minecraft. STEVE-1 follows both text and visual instructions and acts on raw pixel inputs with keyboard and mouse controls. Best of all - it only cost $60 to train! w/ @Shalev_lif @SirrahChan @jimmybajimmyba @SheilaMcIlraith
    170K
  • Keiran Paster
    @keirp1
    Feb 20, 2025
    Replying to @mattshumer_ and @xai
    One thing about Grok 3 since it was trained with much more compute/compression than other models is that it has deep pockets of knowledge in so many unexpected places. Obscure libraries, recent papers, etc.
    13K
  • Keiran Paster
    @keirp1
    Nov 8, 2024
    We released a beta of our API and we are giving everyone $25 free credits a month until the end of the year to give it a try! 🔗 x.ai/api The first model available in our API is grok-beta, a preview of our next-gen Grok model (this version has around the same
    API: Frontier Models for Reasoning & Enterprise | xAI
    From x.ai
    25K
  • Keiran Paster
    @keirp1
    Aug 20, 2024
    Grok 2 should be rolling out now!
    14K
  • Keiran Paster
    @keirp1
    Feb 20, 2023
    Super cool - seems like labeling documents as <good> or <bad> at the beginning of the sequence during pretraining outperforms throwing out the bad data. Lots of parallels to supervised RL / decision transformer as well.
    Aran Komatsuzaki
    @arankomatsuzaki
    Feb 20, 2023
    Pretraining Language Models with Human Preferences Pretraining with human feedback results in much better preference satisfaction than standard LM pretraining followed by finetuning with feedback, i.e., learning and then unlearning undesirable behavior. arxiv.org/abs/2302.08582
    45K
  • Keiran Paster
    @keirp1
    Feb 20, 2025
    Replying to @mk7113252g
    We decided to keep it open and there was some confusion.
    23K
  • Keiran Paster
    @keirp1
    Aug 23, 2024
    We dramatically improved our model in the short time between our sus-column-r and official release, now sitting at the #2 spot overall! We also doubled the speed of our inference in the last week. The rate of progress at xAI is unreal.
    Arena.ai
    @arena
    Aug 23, 2024
    Chatbot Arena update❤️‍🔥 Exciting news—@xai's Grok-2 and Grok-mini are now officially on the leaderboard! With over 6000 community votes, Grok-2 has claimed the #2 spot, surpassing GPT-4o (May) and tying with the latest Gemini! Grok-2-mini also impresses at #5. Grok-2 excels in
    18K
  • Keiran Paster
    @keirp1
    Mar 3, 2025
    4.5 hours later...
    7.4K
