Prime Intellect (@PrimeIntellect) / X

Prime Intellect

2,804 posts

@PrimeIntellect

The Open Stack for Self-Improving Agents discord.gg/primeintellect

Joined June 2020

Pinned
Prime Intellect
@PrimeIntellect
May 7
The next wave of AI will not be won by better prompts. It will be won by systems that learn from experience. Today, Prime Intellect Lab is out of beta, open for you to start training your own models. The era of self-improving agents is here.
1.3M
Prime Intellect reposted
Vibrant Labs
@VibrantLabsAI
13h
1/n For browser agents, a major bottleneck in evaluation is truthful scoring on the live web. A task is only as good as your ability to confirm the agent actually did it, on a real site whose state keeps moving and that the agent can potentially misreport. So we took matters
4.4K
Prime Intellect reposted
Mika Senghaas
@mikasenghaas
Jun 23
this is a good one
Prime Intellect
@PrimeIntellect
Jun 23
Today we're releasing prime-rl v0.6.0 — enabling RL at trillion-parameter MoE scale on agentic workloads at the highest efficiency. We've relentlessly optimized our RL infra. The result: GLM-5 on agentic SWE tasks at 131k context and sub-5-minute step time.
5.2K
Prime Intellect
@PrimeIntellect
Jun 23
Replying to @PrimeIntellect
prime-rl is fully open source, and we're hiring systems engineers to take it further. Read the full prime-rl performance deep dive:
RL at 1T Scale: prime-rl Performance Deep Dive
From primeintellect.ai
17K
Prime Intellect
@PrimeIntellect
Jun 23
Replying to @PrimeIntellect
Huge thanks to the @vllm_project team, and @robertshaw21 in particular, for all the help along the way. Also to the llm-d and Dynamo teams for the collaboration on routing and inference.
3.4K
Prime Intellect
@PrimeIntellect
Jun 23
Replying to @PrimeIntellect
The trainer is 3D-parallel (FSDP2 + CP + EP), built on TorchTitan. FSDP2 shards params, grads & optimizer state. EP keeps experts sharded and routes tokens with all-to-all instead of all-gathering ~80GB of expert weights per layer. CP handles the 131k context and GLM-5's DSA attention.
3.1K
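The expert-parallel dispatch described in the post above can be pictured with a toy, single-process sketch. It assumes experts are split into contiguous, equal-sized blocks per rank and simulates the all-to-all exchange with plain lists; the function names are hypothetical and nothing here is taken from prime-rl or TorchTitan.

```python
def owner_rank(expert_id, num_experts, world_size):
    """Rank that holds `expert_id`, assuming contiguous equal-sized shards."""
    per_rank = num_experts // world_size
    return expert_id // per_rank


def build_send_buffers(local_tokens, num_experts, world_size):
    """Group this rank's (token, expert_id) pairs by destination rank."""
    buffers = [[] for _ in range(world_size)]
    for token, expert_id in local_tokens:
        dst = owner_rank(expert_id, num_experts, world_size)
        buffers[dst].append((token, expert_id))
    return buffers


def all_to_all(send):
    """Simulated all-to-all: send[src][dst] becomes recv[dst][src]."""
    world_size = len(send)
    return [[send[src][dst] for src in range(world_size)]
            for dst in range(world_size)]


# Two ranks, four experts: rank 0 owns experts 0-1, rank 1 owns experts 2-3.
tokens_per_rank = [
    [("t0", 0), ("t1", 3)],  # tokens that start on rank 0
    [("t2", 1), ("t3", 2)],  # tokens that start on rank 1
]
send = [build_send_buffers(toks, num_experts=4, world_size=2)
        for toks in tokens_per_rank]
recv = all_to_all(send)
```

Only token activations move between ranks in this picture; the expert weights stay where they are, which is the contrast with all-gathering weights that the post draws.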
Prime Intellect
@PrimeIntellect
Jun 23
Replying to @PrimeIntellect
Over a long run the trainer and inference policies slowly drift apart, and that mismatch can kill your training. R3 (router replay) captures the routing decisions from the inference engine and replays them on the trainer, so both sides activate the same experts. KL mismatch drops ~10x.
3.3K
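A minimal sketch of the router-replay idea from the post above, assuming top-k routing where the trainer reuses the expert indices recorded at inference time but still computes gate weights from its own logits. The helper names and the logits are made up for illustration; this is not the prime-rl implementation.

```python
import math


def top_k_experts(logits, k):
    """Indices of the k largest router logits (ties go to the lower index)."""
    return sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]


def gate_weights(logits, experts):
    """Softmax restricted to the selected experts' logits."""
    peak = max(logits[i] for i in experts)
    exps = {i: math.exp(logits[i] - peak) for i in experts}
    total = sum(exps.values())
    return {i: v / total for i, v in exps.items()}


def route(trainer_logits, k, replayed_experts=None):
    """Route one token; if indices were recorded at inference, replay them."""
    if replayed_experts is not None:
        experts = replayed_experts
    else:
        experts = top_k_experts(trainer_logits, k)
    return gate_weights(trainer_logits, experts)


# Tiny numerical differences flip the second expert between the two engines.
inference_logits = [2.0, 1.0, 0.99, -1.0]
trainer_logits = [2.0, 0.99, 1.0, -1.0]

recorded = top_k_experts(inference_logits, k=2)          # captured at rollout
without_replay = route(trainer_logits, k=2)              # trainer picks its own
with_replay = route(trainer_logits, k=2, replayed_experts=recorded)
```

Without replay the trainer sends the token through a different expert than the one that produced the rollout; with replay the expert set matches and only the gate weights come from the trainer's logits.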
Prime Intellect
@PrimeIntellect
Jun 23
Replying to @PrimeIntellect
One Mooncake store pools KV cache across all nodes, so any worker can reuse any prefix. The router picks workers by a score over load, queue depth, KV usage and prefix overlap. You get cross-replica cache hits with balanced routing across the whole deployment.
3.5K
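The scoring-based routing in the post above might look roughly like this. The post names the four signals (load, queue depth, KV usage, prefix overlap) but not how they are combined, so the linear form, the weights, and the `Worker` fields below are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    load: float            # fraction of compute in use, 0..1
    queue_depth: int       # requests waiting
    kv_usage: float        # fraction of KV cache in use, 0..1
    cached_prefix: tuple   # token ids this worker can reuse cheaply


def prefix_overlap(prompt, cached):
    """Number of leading tokens shared by the prompt and the cached prefix."""
    n = 0
    for a, b in zip(prompt, cached):
        if a != b:
            break
        n += 1
    return n


def score(worker, prompt, w_prefix=2.0, w_load=1.0, w_queue=0.1, w_kv=0.5):
    """Higher is better: reward prefix reuse, penalize busy workers.

    The weights are hypothetical and would need tuning in a real deployment.
    """
    overlap = prefix_overlap(prompt, worker.cached_prefix) / max(len(prompt), 1)
    return (w_prefix * overlap
            - w_load * worker.load
            - w_queue * worker.queue_depth
            - w_kv * worker.kv_usage)


def pick_worker(workers, prompt):
    return max(workers, key=lambda w: score(w, prompt))


prompt = (1, 2, 3, 4, 5, 6, 7, 8)
warm = Worker("warm", load=0.5, queue_depth=2, kv_usage=0.5, cached_prefix=(1, 2, 3, 4))
cold = Worker("cold", load=0.2, queue_depth=0, kv_usage=0.2, cached_prefix=())
busy_warm = Worker("busy_warm", load=0.9, queue_depth=5, kv_usage=0.9, cached_prefix=(1, 2, 3, 4))

choice_a = pick_worker([warm, cold], prompt).name
choice_b = pick_worker([busy_warm, cold], prompt).name
```

With these made-up weights the cache-warm worker wins while it is moderately loaded, and the idle worker wins once the warm one is saturated, which is the balance between cache hits and load that the post describes.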
Prime Intellect
@PrimeIntellect
Jun 23
Replying to @PrimeIntellect
We disaggregate prefill and decode onto separate workers. A long prefill used to stall decode for everyone. Now it doesn't.
4.2K
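A toy timing model of the stall described in the post above. It assumes a colocated worker runs the whole prefill before any decode step (no chunked prefill) and that a disaggregated decode worker never waits on prefill; the function name and the millisecond figures are illustrative, not measurements.

```python
def decode_finish_times(decode_steps, step_ms, prefill_ms, disaggregated):
    """Finish time of each decode step when a prefill arrives at t=0.

    Colocated: decode waits until the prefill completes.
    Disaggregated: decode runs on its own worker and starts immediately.
    """
    start = 0.0 if disaggregated else prefill_ms
    return [start + step_ms * (i + 1) for i in range(decode_steps)]


colocated = decode_finish_times(3, step_ms=20.0, prefill_ms=4000.0, disaggregated=False)
split = decode_finish_times(3, step_ms=20.0, prefill_ms=4000.0, disaggregated=True)
```

In this model every in-flight decode request on the colocated worker is delayed by the full prefill time, while the split deployment keeps decode latency independent of prompt length.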
Prime Intellect
@PrimeIntellect
Jun 23
Replying to @PrimeIntellect
In RL, inference is the bottleneck — we optimize for throughput, not latency. High concurrency, FP8 precision, and wide expert parallelism over 32+ GPUs. Every GPU holds its own slice of experts and acts as its own endpoint.
5.6K
Prime Intellect reposted
Johannes Hagemann
@johannes_hage
Jun 16
awesome post by @kimbochen covering RL systems end-to-end, including a SWE training run on GLM-5 using our prime-rl framework.
SemiAnalysis
@SemiAnalysis_
Jun 16
RL Systems Mind the Gap: Matching Trainer and Generator Throughput. RL Training Infrastructure, GRPO, PipelineRL, Async RL, Policy Staleness, RL Sandbox Infra, CPU Requirements, TCO Analysis, Thinking Machines Tinker. newsletter.semianalysis.com/p/rl-systems-m…
21K
Prime Intellect reposted
elie
@eliebakouch
Jun 16
nice blog by @kimbochen about the current RL ecosystem, goes into detail about the different settings and tradeoffs to consider when RLing open models
SemiAnalysis
@SemiAnalysis_
Jun 16
RL Systems Mind the Gap: Matching Trainer and Generator Throughput. RL Training Infrastructure, GRPO, PipelineRL, Async RL, Policy Staleness, RL Sandbox Infra, CPU Requirements, TCO Analysis, Thinking Machines Tinker. newsletter.semianalysis.com/p/rl-systems-m…
13K