Jonathan Roberts
177 posts
Jonathan Roberts
@JRobertsAI
PhD Student, Applied Machine Learning, University of Cambridge
Cambridge
jonathanroberts42.github.io
Joined December 2022
409
Following
602
Followers
  • Pinned
    Jonathan Roberts
    @JRobertsAI
    Feb 17, 2025
    Is computer vision “solved”? Not yet. Current models score 0% on ZeroBench 🧵 1/6
    1.4M
  • Jonathan Roberts
    @JRobertsAI
    Feb 17, 2025
    Replying to @JRobertsAI
    ZeroBench includes 100 manually-curated multi-step visual reasoning questions. Questions are curated adversarially. They span both natural and synthetic images. 2/6
    99K
  • Jonathan Roberts
    @JRobertsAI
    Feb 17, 2025
    Replying to @JRobertsAI
    We evaluate 20 LMMs on our benchmark and find that all models score 0% pass@1 (temperature=0) and 0% 5/5 reliability. Several questions are tantalisingly close to current capabilities, with some models correctly answering them in a pass@5 setting. 3/6
    34K
  • Jonathan Roberts
    @JRobertsAI
    Feb 17, 2025
    Replying to @JRobertsAI
    Project page: zerobench.github.io
    Paper: arxiv.org/abs/2502.09696
    Dataset: huggingface.co/datasets/jonat…
    5/6
    25K
  • Jonathan Roberts
    @JRobertsAI
    Feb 17, 2025
    Replying to @JRobertsAI
    If you spot a new issue with any of the ZeroBench questions, please let us know here: docs.google.com/document/d/1qd… (more details on our dataset refinement strategy to come shortly). 4/6
    docs.google.com
    ZeroBench Refinement
    The ZeroBench benchmark contains 100 manually-curated challenging visual reasoning questions. ZeroBench was constructed to be a difficult eval, largely beyond the capabilities of...
    28K
  • Jonathan Roberts
    @JRobertsAI
    Mar 18, 2025
    🥳📢 GPT 4.5 is the new State of the Art on ZeroBench: 1% pass@1, 7% pass@5, 0% 5/5 reliability
    9.6K
  • Jonathan Roberts
    @JRobertsAI
    Feb 17, 2025
    Replying to @JRobertsAI
    This project was carried out with some great collaborators including @taesiri @ioanacroi @vladbogo @vishaal_urao @gyunginshin @anh_ng8 @kaihan_vis @SamuelAlbanie
    23K
  • Jonathan Roberts
    @JRobertsAI
    Apr 17, 2025
    👏 Some recent ZeroBench pass@1 results: o3: 3%; Gemini 2.5 Pro: 3%; o4-mini: 2%; Llama 4 Maverick: 0%; GPT-4.1: 0%
    7.8K
  • Jonathan Roberts
    @JRobertsAI
    Feb 17, 2025
    We need you, eagle-eyed folks of X! Help us red team ZeroBench to find errors. To recognise effort, we will offer co-authorship to those who find new issues. Details below. 1/5
    Jonathan Roberts
    @JRobertsAI
    Feb 17, 2025
    Is computer vision “solved”? Not yet. Current models score 0% on ZeroBench 🧵 1/6
    4.3K
  • Jonathan Roberts
    @JRobertsAI
    Mar 28, 2025
    🔥 Newly released Gemini 2.5 Pro is State of the Art on ZeroBench: 3% pass@1, 5% pass@5, 1% 5/5 reliability
    1.5K
  • Jonathan Roberts
    @JRobertsAI
    Mar 12, 2025
    🚨 ZeroBench: We created the HARDEST visual reasoning benchmark we could—then invited the AI community to red team our data 🔥 One month later, here's what happened... 🧵👇
    4.1K
  • Jonathan Roberts
    @JRobertsAI
    Aug 22, 2024
    🎉📢 New Paper! Introducing GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models (grab-benchmark.github.io). The highest-performing model scores just 21.7%. A thread 🧵
    3.1K
  • Jonathan Roberts
    @JRobertsAI
    Dec 8, 2022
    Although it is a language model, ChatGPT can be used for object recognition! #OpenAI #ChatGPT
  • Jonathan Roberts
    @JRobertsAI
    Mar 12, 2025
    Replying to @JRobertsAI
    Thanks to all those who contributed 🔥
    Updated (v2):
    Paper: arxiv.org/abs/2502.09696
    Dataset: huggingface.co/datasets/jonat…
    203
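
The posts above quote three figures per model: pass@1, pass@5 and 5/5 reliability. As a rough sketch of how such figures could be computed, the snippet below assumes pass@k counts a question as solved if any of k sampled answers is correct, and k/k reliability counts it only if all k are correct. These definitions, the function names and the toy data are illustrative assumptions, not taken from the ZeroBench code or paper.

```python
from typing import Sequence


def pass_at_k(results: Sequence[Sequence[bool]], k: int) -> float:
    """Percentage of questions with at least one correct answer in the first k samples.

    Assumed definition for illustration; `results[i][j]` is True when sample j
    for question i was judged correct.
    """
    solved = sum(any(samples[:k]) for samples in results)
    return 100.0 * solved / len(results)


def k_of_k_reliability(results: Sequence[Sequence[bool]], k: int) -> float:
    """Percentage of questions answered correctly in all of the first k samples.

    Assumed definition for illustration; questions with fewer than k samples
    are counted as not reliable.
    """
    solved = sum(len(samples) >= k and all(samples[:k]) for samples in results)
    return 100.0 * solved / len(results)


# Hypothetical toy data: 4 questions, 5 sampled answers each.
toy_results = [
    [True, True, True, True, True],       # correct every time
    [False, False, True, False, False],   # correct once in five tries
    [False, False, False, False, False],  # never correct
    [False, False, False, False, False],  # never correct
]

print(pass_at_k(toy_results, 1))           # 25.0
print(pass_at_k(toy_results, 5))           # 50.0
print(k_of_k_reliability(toy_results, 5))  # 25.0
```

Note that the thread reports pass@1 at temperature=0, which would be a separate greedy run rather than the first of five sampled answers; the sketch simply reuses the first sample to stay short.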
