Datacurve
19 posts
Datacurve
@datacurve
Research and data to advance frontier models.
San Francisco
datacurve.ai
Joined February 2024
9 Following
8,190 Followers
  • Pinned
    Datacurve
    @datacurve
    Jun 19
    Claude Fable 5 debuts at #1 on DeepSWE, our long-horizon coding benchmark. It outscores the previous best by 3% and sets a new state of the art.
  • Datacurve
    @datacurve
    Jun 20
    GLM 5.2 is now on DeepSWE as the top open-source model on our leaderboard. With a pass@1 score of 44% at max effort, GLM 5.2 is the indisputable #1 open-source model, besting Kimi K2.7 Code by 17%.
    Datacurve
    @datacurve
    Jun 20
    Our updated leaderboard: deepswe.datacurve.ai
  • Datacurve
    @datacurve
    Jun 19
    Replying to @datacurve
    See the full updated leaderboard here:
    DeepSWE: deepswe.datacurve.ai
  • Datacurve
    @datacurve
    Jun 19
    Replying to @datacurve
    Fable 5 scores 70% pass@1 and tracks GPT-5.5 on cost-performance at the default high effort. Kimi K2.7 also joins the leaderboard with a score of 31%.
  • Datacurve reposted
    Artificial Analysis
    @ArtificialAnlys
    Jun 12
    We've updated the Artificial Analysis Coding Agent Index, replacing SWE-Bench Pro with Datacurve's DeepSWE benchmark - the swap lifts Codex with GPT-5.5 (xhigh) above Claude Code with Opus 4.8 (max), while the newly released Claude Fable 5 (max) in Claude Code debuts at the top
  • Datacurve
    @datacurve
    May 30
    Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.
    Datacurve
    @datacurve
    May 30
    Opus 4.8 delivers efficiency gains by solving tasks in fewer steps, directly reducing the total number of input tokens required per task.
    Datacurve
    @datacurve
    May 30
    Full deep dive coming soon. Check out the full benchmark here → deepswe.datacurve.ai
  • Datacurve reposted
    Matthew Berman
    Forward Future
    @MatthewBerman
    May 27
    DeepSWE reflects what I’m hearing from engineers better than any other benchmark. They took the hard path to build a good one.
    Serena Ge (Datacurve)
    Datacurve
    @serenaa_ge
    May 26
    Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.
  • Datacurve reposted
    Garry Tan
    Y Combinator
    @garrytan
    May 26
    This is the new standard for engineering evals
  • Datacurve reposted
    Serena Ge (Datacurve)
    Datacurve
    @serenaa_ge
    Apr 4, 2024
    I presented today at Demo Day (Day 2), and @TechCrunch featured us (@datacurve)! I've been reading TC and listening to TC Daily Crunch every morning since high school... a surreal feeling to see us on it. Also, post-demo sadness because YC is now coming to an end.