Keiran Paster (@keirp1) / X

2,722 posts

@keirp1

MDR

Joined April 2010

Pinned
Keiran Paster
@keirp1
Oct 11, 2023
Introducing OpenWebMath, a massive dataset containing every math document found on the internet - with equations in LaTeX format! 🤗 Download on @huggingface: huggingface.co/datasets/open-… 📝 Read the paper: arxiv.org/abs/2310.06786 w/ @dsantosmarco, @zhangir_azerbay, @jimmybajimmyba!
196K
Keiran Paster
@keirp1
Apr 18, 2025
In case anyone still doesn't see the insane speed that models are getting smarter and cheaper: Yesterday, Google released Gemini 2.5 Flash, a very efficient reasoning model. Today, Grok 3 mini is stronger on most benchmarks for 7x cheaper!
xAI
@xai
Apr 18, 2025
Replying to @xai
Let’s start with Grok 3 Mini. When we set out to build a fast, affordable mini model, we knew it would be good but even we didn’t expect it to be this good. Some highlights: - Grok 3 Mini tops the leaderboards on graduate-level STEM, math, and coding, outcompeting flagship
2.9M
Keiran Paster
@keirp1
Feb 20, 2025
Btw, the chain of thought in the "thinking" mode for Grok 3 is completely open. No summarizers or obfuscation. This is really important and the reasoning process is often fascinating!
1.6M
Keiran Paster
@keirp1
Nov 4, 2022
Can large language models write prompts…for themselves? Yes, at a human-level (!) if they are given the ability to experiment and see what works. arxiv.org/abs/2211.01910 with @Yongchao_Zhou_, @_AndreiMuresanu, @ziwen_h, @silviupitis, @SirrahChan, and @jimmybajimmyba (1/7)
Keiran Paster
@keirp1
Feb 20, 2025
cope@64
Boris Power
@BorisMPower
Feb 20, 2025
Disappointing to see the incentives for the grok team to cheat and deceive in evals. Tl;dr o3-mini is better in every eval compared to grok 3. Grok 3 is genuinely a decent model, but no need to over sell.
70K
Keiran Paster
@keirp1
Nov 14, 2023
Which LLMs are generally good at math and which are overfitting to benchmarks? With the release of Grok, @xai evaluated several closed models on a Hungarian national finals math exam which was published after the models were trained. This means it is impossible to train on or
116K
Keiran Paster
@keirp1
Jun 2, 2023
Meet STEVE-1, an instructable generative model for Minecraft. STEVE-1 follows both text and visual instructions and acts on raw pixel inputs with keyboard and mouse controls. Best of all - it only cost $60 to train! w/ @Shalev_lif @SirrahChan @jimmybajimmyba @SheilaMcIlraith
170K
Keiran Paster
@keirp1
Feb 20, 2025
Replying to @mattshumer_ and @xai
One thing about Grok 3 since it was trained with much more compute/compression than other models is that it has deep pockets of knowledge in so many unexpected places. Obscure libraries, recent papers, etc.
13K
Keiran Paster
@keirp1
Nov 8, 2024
We released a beta of our API and we are giving everyone $25 free credits a month until the end of the year to give it a try! 🔗 x.ai/api The first model available in our API is grok-beta, a preview of our next-gen Grok model (this version has around the same
25K
Keiran Paster
@keirp1
Aug 20, 2024
Grok 2 should be rolling out now!
14K
Keiran Paster
@keirp1
Feb 20, 2023
Super cool - seems like labeling documents as <good> or <bad> at the beginning of the sequence during pretraining outperforms throwing out the bad data. Lots of parallels to supervised RL / decision transformer as well.
Aran Komatsuzaki
@arankomatsuzaki
Feb 20, 2023
Pretraining Language Models with Human Preferences Pretraining with human feedback results in much better preference satisfaction than standard LM pretraining followed by finetuning with feedback, i.e., learning and then unlearning undesirable behavior. arxiv.org/abs/2302.08582
45K
Keiran Paster
@keirp1
Feb 20, 2025
Replying to @mk7113252g
We decided to keep it open and there was some confusion.
23K
Keiran Paster
@keirp1
Aug 23, 2024
We dramatically improved our model in the short time between our sus-column-r and official release, now sitting at the #2 spot overall! We also doubled the speed of our inference in the last week. The rate of progress at xAI is unreal.
Arena.ai
@arena
Aug 23, 2024
Chatbot Arena update❤️‍🔥 Exciting news—@xai's Grok-2 and Grok-mini are now officially on the leaderboard! With over 6000 community votes, Grok-2 has claimed the #2 spot, surpassing GPT-4o (May) and tying with the latest Gemini! Grok-2-mini also impresses at #5. Grok-2 excels in
18K
Keiran Paster
@keirp1
Mar 3, 2025
4.5 hours later...
7.4K