Proximal (@ProximalHQ) / X

Proximal

85 posts

@ProximalHQ

Advancing coding intelligence.

Joined October 2022

Pinned
Proximal
@ProximalHQ
Feb 18
Today, we are announcing Proximal. Proximal is a research lab for data. Our core belief is that data which is complex enough to teach today’s frontier models is not bottlenecked by domain experts, but by great ideas and excellent software. We are excited about a world in which
Proximal reposted
elie
@eliebakouch
Jun 16
very impressive performance, and not "only" on open evals: on FrontierSWE (@ProximalHQ) it is almost SOTA. Looking forward to the DeepSWE (@datacurve) and FrontierCode (@cognition) scores as well.
Z.ai
@Zai_org
Jun 16
Introducing GLM-5.2: Frontier Intelligence, Open Weights - Significant improvements in coding and agentic tasks - Strong long-horizon capabilities with a 1M context window - Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong
Proximal
@ProximalHQ
Jun 16
GLM-5.2 ranks #3 on FrontierSWE. It is behind only Fable 5 and Opus 4.8, and it outperforms GPT-5.5. It is the first model to close the large gap between models from Anthropic / OpenAI and those from other providers, and it is by far the strongest open-weight model.
Proximal
@ProximalHQ
Jun 16
In the best@5 ranking, GLM-5.2 ranks behind only Claude Fable 5. It is particularly strong in AI research tasks, achieving the top score in the PCQM4Mv2 Molecular Gap Prediction task.
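The post does not say how the best@5 ranking is computed. A minimal sketch, assuming best@k means "take the best score among k attempts on each task, then average over tasks", and contrasting it with a plain per-attempt mean. The function names, task names, and scores below are hypothetical and are not FrontierSWE data.

```python
from statistics import mean


def best_at_k(scores_by_task, k=5):
    """Average over tasks of the best score among the first k attempts.

    ASSUMPTION: this is one common reading of "best@k"; the benchmark's
    actual aggregation rule is not given in the post.
    """
    per_task_best = []
    for task, scores in scores_by_task.items():
        if len(scores) < k:
            raise ValueError(f"task {task!r} has fewer than {k} attempts")
        per_task_best.append(max(scores[:k]))
    return mean(per_task_best)


def mean_at_k(scores_by_task, k=5):
    """Average over tasks of the mean score across the first k attempts."""
    return mean(mean(scores[:k]) for scores in scores_by_task.values())


# Made-up scores for two hypothetical tasks, five attempts each.
example = {
    "task_a": [0.1, 0.5, 0.2, 0.0, 0.3],
    "task_b": [1.0, 0.0, 0.0, 0.0, 0.0],
}
print(best_at_k(example), mean_at_k(example))
```

Under this reading, a model that occasionally reaches a high score ranks much better on best@k than on the per-attempt mean, which is why the two rankings can disagree.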
Proximal
@ProximalHQ
Jun 16
Check out FrontierSWE:
FrontierSWE
From frontierswe.com
Proximal
@ProximalHQ
Jun 11
Replying to @ProximalHQ
Fable also demonstrates impressive capabilities in implementation tasks: it rebuilt the Dart_Style code formatter in Haskell and built a native Lua compiler targeting standalone x86-64 ELF binaries, thereby saturating two of the five implementation tasks in FrontierSWE.
Proximal
@ProximalHQ
Jun 11
A more thorough analysis will follow. Congratulations to @AnthropicAI! Check out FrontierSWE:
FrontierSWE
From frontierswe.com
Proximal
@ProximalHQ
Jun 11
Claude Fable 5 ranks #1 on FrontierSWE. This represents the biggest capability jump we have observed since releasing the benchmark. On many tasks, Fable 5 works productively for close to 20 hours and fully saturates tasks that were effectively out of reach for earlier models.
Proximal
@ProximalHQ
Jun 11
In the FrogsGame Post-Training task, Fable manages to train Qwen3-8B to solve 67.8% of held-out puzzles, up from 3.8% for Opus 4.8. Its solution relies on synthetic reasoning traces, which it generated by writing a backtracking solver and verbalizing the solver's actions.
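The idea of turning a backtracking solver's actions into a reasoning trace can be sketched as follows. The FrogsGame rules are not given in the post, so this uses N-Queens as a stand-in puzzle; the function name `solve_with_trace` and the wording of the trace sentences are hypothetical, and this is not the solution Fable produced.

```python
def solve_with_trace(n):
    """Solve N-Queens by backtracking and narrate every solver action.

    Returns (solution, trace). `solution` is a list where index = row and
    value = column, or None if no solution exists. `trace` is a list of
    sentences that could serve as a synthetic reasoning trace.
    """
    trace = []
    cols = []  # cols[r] is the column of the queen placed in row r

    def safe(row, col):
        # A square is safe if no earlier queen shares its column or diagonal.
        for r, c in enumerate(cols):
            if c == col or abs(c - col) == abs(r - row):
                return False
        return True

    def place(row):
        if row == n:
            trace.append("All rows are filled, so this is a solution.")
            return True
        for col in range(n):
            if not safe(row, col):
                trace.append(f"Skip row {row}, column {col}: attacked by an earlier queen.")
                continue
            trace.append(f"Try row {row}, column {col}: no conflict, place a queen.")
            cols.append(col)
            if place(row + 1):
                return True
            cols.pop()
            trace.append(f"Dead end after row {row}, column {col}: remove the queen and backtrack.")
        return False

    found = place(0)
    return (list(cols) if found else None), trace


solution, trace = solve_with_trace(4)
print(solution)
print("\n".join(trace))
```

Each (puzzle, trace, solution) triple could then serve as one supervised training example; how the traces were actually formatted and filtered is not described in the post.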
Proximal
@ProximalHQ
Jun 10
Replying to @ProximalHQ
The only way for us to experiment with new data and dogfood our product is to train models ourselves. We are building infrastructure to post-train 1T+ parameter open-source LLMs. So far, we have seen great results building on top of @tinkerapi and @slime_framework.
Proximal
@ProximalHQ
Jun 10
Replying to @ProximalHQ
We've been experimenting with ways to create highly realistic synthetic codebases for creating training data.
Proximal
@ProximalHQ
Jun 10
If you are interested in these problems, please reach out! Blog Post: proximal.ai/blog/our-probl… Open Roles: proximal.ai/careers
proximal.ai
Proximal — Advancing Coding Intelligence
Proximal is a research lab for coding data. We build the data engine behind the next generation of autonomous coding agents.
Proximal
@ProximalHQ
Jun 10
We believe that better training data will come from creative research and engineering ideas, not from hiring annotators. Here are some of the open problems we are working on:
Proximal
@ProximalHQ
Jun 10
When agents attempt ultra-long-horizon tasks, we often want to revert to critical states within a trajectory. For this, we need to build a snapshotting system for our internal sandbox infrastructure.
Proximal
@ProximalHQ
Jun 10
Replying to @ProximalHQ
We are working on better ways to quantify how valuable a single RL task is for a given model. As part of this, we found the paper "The Unlearnability Phenomenon in RLVR for Language Models" by @YulinChen99 et al. very interesting.
Proximal
@ProximalHQ
Jun 10
Replying to @ProximalHQ
Public code serves as a useful seed for data pipelines. To curate this data, we need to scrape all code on the internet.
Proximal
@ProximalHQ
May 28
We evaluated Claude Opus 4.8 on FrontierSWE ahead of today's release. It is now the best-performing model on FrontierSWE.
Proximal
@ProximalHQ
May 28
Replying to @ProximalHQ
It is the first model to satisfy correctness checks for the libswscale performance optimization task. It chose to rewrite libswscale in Zig, and its implementation is only slightly slower than the reference C implementation.
Proximal
@ProximalHQ
May 28
Check out FrontierSWE. Website: frontierswe.com. GitHub: github.com/Proximal-Labs/…