Peter Gostev (SF: 22-26 June)
Arena.ai
1,828 posts
@petergostev
London 🇬🇧 AI Capability @ Arena.ai linkedin.com/in/peter-goste…
Joined June 2025
888 Following · 12.5K Followers
  • Pinned
    Peter Gostev (SF: 22-26 June)
    Arena.ai
    @petergostev
    Feb 24
    I've got a fun new benchmark for you where most LLMs are doing pretty badly - "Bullshit Benchmark". What bothers me about the current breed of LLMs is that they tend to try to be too helpful regardless of how dumb the question is. So I've built 55 'bullshit' questions that don't
    840K
  • Peter Gostev (SF: 22-26 June)
    Arena.ai
    @petergostev
    Oct 3, 2025
    I think this is my favourite Sora 2 video I generated - Cleopatra visiting modern Egypt
    371K
  • Peter Gostev (SF: 22-26 June)
    Arena.ai
    @petergostev
    Jul 20, 2025
    It is crazy to me that some still don't see how big our GPU shortage is:
    - Most context window is <100k
    - Delayed rollouts of Agents, Codex
    - Full Sora never released
    - Veo 3 roll out taking weeks
    - Claude constant rate limits
    - Even big clouds default rate limits are
    Sam Altman
    OpenAI
    @sama
    Jul 20, 2025
    we will cross well over 1 million GPUs brought online by the end of this year! very proud of the team but now they better get to work figuring out how to 100x that lol
    485K
  • Peter Gostev (SF: 22-26 June)
    Arena.ai
    @petergostev
    Sep 23, 2025
    Nvidia is still cheap
    Sam Altman
    OpenAI
    @sama
    Sep 23, 2025
    Abundant Intelligence: blog.samaltman.com/abundant-intel…
    387K
  • Peter Gostev (SF: 22-26 June)
    Arena.ai
    @petergostev
    Jul 18, 2025
    OpenAI are testing a new model on the Web Dev Arena @lmarena_ai under the name 'Anonymous Chatbot 0717'. I can't believe I'm gonna say this, but it is genuinely at a completely different level of front end coding - far better than Sonnet, o3, Gemini 2.5 Pro, or Grok 4. To test
    496K
  • Peter Gostev (SF: 22-26 June)
    Arena.ai
    @petergostev
    Sep 8, 2025
    I will not back down @OpenAI
    Michaël Trazzi
    @MichaelTrazzi
    Sep 5, 2025
    Hi, my name's Michaël Trazzi, and I'm outside the offices of the AI company Google DeepMind right now because we are in an emergency. I am here in support of Guido Reichstadter, who is also on hunger strike in front of the office of the AI company Anthropic. DeepMind, Anthropic
    249K
  • Peter Gostev (SF: 22-26 June)
    Arena.ai
    @petergostev
    Sep 26, 2025
    OpenAI have 9x'd their capacity this year, and have a goal to 125x further by 2033, thus exceeding the current energy capacity of the whole of India - as per @alexeheath's reporting of @sama's internal Slack note. One thing to note is that a 125x energy capacity increase understates the
    246K
  • Peter Gostev (SF: 22-26 June)
    Arena.ai
    @petergostev
    Oct 14, 2025
    Nvidia's DGX Spark goes on sale today and @lmsysorg have done a brilliant bit of benchmarking vs other systems. In short, it is a very usable system for smaller models and closer in performance to Apple's devices (e.g. Mac Mini M4 Pro), but it is priced at $4,000 vs $1,400 for
    LMSYS Org
    @lmsysorg
    Oct 14, 2025
    🚀 SGLang In-Depth Review of the NVIDIA DGX Spark is LIVE! Thanks to @nvidia’s early access program, SGLang makes its first ever appearance in a consumer product, the brand-new DGX Spark. The DGX Spark’s 128GB Unified Memory and Blackwell architecture set a new standard for
    285K
  • Peter Gostev (SF: 22-26 June)
    Arena.ai
    @petergostev
    Sep 20, 2025
    The amount of compute that labs have is growing dramatically. For example, @OpenAI has 14x more compute than when it launched GPT-4. @xai is growing compute extremely rapidly, and they doubled the compute available between the Grok 3 and Grok 4 launches. @AnthropicAI has the
    121K
  • Peter Gostev (SF: 22-26 June)
    Arena.ai
    @petergostev
    Sep 29, 2025
    Comparisons of Sonnet 4.5 vs GPT-5 Pro. I appreciate the comparison is not exactly fair, but I've had GPT-5 Pro videos ready to go, so forgive me. That said, it does give a good benchmark: I don't think there was a single instance of Sonnet 4.5 being better, and I was testing
    256K
  • Peter Gostev (SF: 22-26 June)
    Arena.ai
    @petergostev
    Oct 26, 2025
    Reminder: OpenAI had a model back in December 2024 (10 months ago) that scored higher than GPT-5-Pro or Sonnet 4.5 on @arcprize
    149K
  • Peter Gostev (SF: 22-26 June)
    Arena.ai
    @petergostev
    Nov 19, 2025
    OpenAI's new GPT-5.1-Codex-Max (Extra-High) - more advanced Golden Gate Bridge prompt with a couple of turns of 5-7 minutes each. This is definitely the best I ever got out of this type of prompt by far.
    199K
  • Peter Gostev (SF: 22-26 June)
    Arena.ai
    @petergostev
    Aug 15, 2025
    Anthropic and Google are losing coding share in recent weeks, according to @openrouter data. This isn’t really because of GPT-5 (it has <5% share), but rather the rapid rise of Qwen 3 Coder, now >20% in this dataset. Caveat: this reflects only a small slice of the market. It
    181K
  • Peter Gostev (SF: 22-26 June)
    Arena.ai
    @petergostev
    Sep 28, 2025
    Nvidia's market cap is $4.3 trillion, what are they going to do with all that cash? One thing they can do is spend ~8.5% of their market cap to acquire the entire US Nuclear Industry - which is only worth $369bn - and build enough nuclear to power all of these GPUs with green
    424K