Prime Intellect
2,804 posts
Prime Intellect
@PrimeIntellect
The Open Stack for Self-Improving Agents discord.gg/primeintellect
primeintellect.ai
Joined June 2020
40 Following · 67.1K Followers
  • Pinned
    Prime Intellect
    @PrimeIntellect
    May 7
    The next wave of AI will not be won by better prompts. It will be won by systems that learn from experience. Today, Prime Intellect Lab is out of beta, open for you to start training your own models. The era of self-improving agents is here.
    1.3M
  • Prime Intellect reposted
    Vibrant Labs
    @VibrantLabsAI
    13h
    1/n For browser agents, a major bottleneck in evaluation is truthful scoring on the live web. A task is only as good as your ability to confirm the agent actually did it, on a real site whose state keeps moving and that the agent can potentially misreport. So we took matters
    Cost vs Accuracy (DOM and CUA)
    4.4K
  • Prime Intellect reposted
    Mika Senghaas
    Prime Intellect
    @mikasenghaas
    Jun 23
    this is a good one
    Prime Intellect
    @PrimeIntellect
    Jun 23
    Today we're releasing prime-rl v0.6.0 — enabling RL at trillion-parameter MoE scale on agentic workloads at the highest efficiency. We've relentlessly optimized our RL infra. The result: GLM-5 on agentic SWE tasks at 131k context and sub-5-minute step time.
    5.2K
  • Prime Intellect
    @PrimeIntellect
    Jun 23
    Replying to @PrimeIntellect
    prime-rl is fully open source, and we're hiring systems engineers to take it further. Read the full prime-rl performance deep dive:
    RL at 1T Scale: prime-rl Performance Deep Dive
    From primeintellect.ai
    17K
  • Prime Intellect
    @PrimeIntellect
    Jun 23
    Replying to @PrimeIntellect
    Huge thanks to the @vllm_project team, and @robertshaw21 in particular, for all the help along the way. Also to the llm-d and Dynamo teams for the collaboration on routing and inference.
    3.4K
  • Prime Intellect
    @PrimeIntellect
    Jun 23
    Replying to @PrimeIntellect
    The trainer is 3D-parallel (FSDP2 + CP + EP), built on TorchTitan. FSDP2 shards params, grads & optimizer state. EP keeps experts sharded and routes tokens with all2all instead of all-gathering ~80GB per layer. CP handles the 131k context and GLM-5's DSA attention.
    3.1K
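The point of EP in the post above is that tokens travel to the experts instead of expert weights travelling to the tokens. A toy, single-process sketch of the dispatch half of that all-to-all is shown here. It is not prime-rl or TorchTitan code: `expert_owner`, `all_to_all_dispatch`, the contiguous expert sharding, and the tiny sizes are all assumptions made for illustration.

```python
def expert_owner(expert_id: int, num_experts: int, world_size: int) -> int:
    """Rank that holds this expert, assuming contiguous, even sharding."""
    experts_per_rank = num_experts // world_size
    return expert_id // experts_per_rank


def all_to_all_dispatch(routing, num_experts: int, world_size: int):
    """Bucket token assignments by the rank that owns the chosen expert.

    routing[src_rank] is a list of (token_id, expert_id) pairs produced by
    the router on that rank. The result, inbox[dst_rank], lists the
    (src_rank, token_id, expert_id) triples that rank would receive.
    Only token activations move; expert weights never leave their rank.
    """
    inbox = [[] for _ in range(world_size)]
    for src_rank, assignments in enumerate(routing):
        for token_id, expert_id in assignments:
            dst_rank = expert_owner(expert_id, num_experts, world_size)
            inbox[dst_rank].append((src_rank, token_id, expert_id))
    return inbox


# Hypothetical layout: 4 experts over 2 ranks (experts 0-1 on rank 0, 2-3 on rank 1).
routing = [
    [(0, 0), (1, 3)],  # rank 0: token 0 -> expert 0, token 1 -> expert 3
    [(0, 2), (1, 1)],  # rank 1: token 0 -> expert 2, token 1 -> expert 1
]
inbox = all_to_all_dispatch(routing, num_experts=4, world_size=2)
```

A real implementation would follow this with the expert forward pass and a second all-to-all that returns outputs to the source ranks.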
  • Prime Intellect
    @PrimeIntellect
    Jun 23
    Replying to @PrimeIntellect
    Over a long run the trainer and inference policies slowly drift apart, and that mismatch can kill your training. R3 (router replay) captures the routing decisions from the inference engine and replays them on the trainer; KL mismatch drops ~10x.
    3.3K
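One way to picture router replay: the trainer stops choosing its own top-k experts and instead reuses the expert indices recorded during inference. The sketch below is a minimal illustration under stated assumptions, not the prime-rl implementation. The function names are hypothetical, and computing gate weights from the trainer's own logits over the replayed experts is an assumption of this sketch.

```python
import math


def top_k_experts(logits, k):
    """Indices of the k largest router logits, best first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def route(trainer_logits, k, replayed=None):
    """Pick experts for one token.

    Without replay the trainer picks its own top-k, which can disagree with
    what the inference engine used. With replay it reuses the recorded choice.
    """
    if replayed is not None:
        return list(replayed)
    return top_k_experts(trainer_logits, k)


def gate_weights(logits, experts):
    """Softmax over the selected experts only (numerically stable)."""
    peak = max(logits[e] for e in experts)
    exps = [math.exp(logits[e] - peak) for e in experts]
    total = sum(exps)
    return [x / total for x in exps]


# Hypothetical logits for one token: the two engines disagree on the second expert.
inference_logits = [2.0, 0.5, 0.4, 0.1]
trainer_logits = [2.0, 0.4, 0.5, 0.1]

recorded = top_k_experts(inference_logits, k=2)          # captured at inference time
without_replay = route(trainer_logits, k=2)              # trainer's own choice
with_replay = route(trainer_logits, k=2, replayed=recorded)
weights = gate_weights(trainer_logits, with_replay)
```

Here the trainer alone would pick experts 0 and 2, while replay forces experts 0 and 1 to match the rollout.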
  • Prime Intellect
    @PrimeIntellect
    Jun 23
    Replying to @PrimeIntellect
    One Mooncake store pools KV cache across all nodes, so any worker can reuse any prefix. The router picks workers by a score over load, queue depth, KV usage and prefix overlap. You get cross-replica cache hits with balanced routing across the whole deployment.
    3.5K
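The routing score described above can be sketched as a weighted sum over the four signals the post names. Everything concrete here is hypothetical: the `Worker` fields, the weights, the linear form of the score, and the per-worker `cached_prefix` used to measure overlap are assumptions, not the actual router.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    load: float            # fraction of compute in use, 0..1
    queue_depth: int       # requests waiting
    kv_usage: float        # fraction of KV cache in use, 0..1
    cached_prefix: tuple   # token ids this worker can reuse cheaply


def prefix_overlap(cached, prompt) -> int:
    """Length of the shared leading run of tokens."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


def score(worker, prompt, w_prefix=2.0, w_load=1.0, w_queue=0.1, w_kv=1.0):
    """Higher is better: reward prefix reuse, penalize busy workers."""
    overlap = prefix_overlap(worker.cached_prefix, prompt) / max(len(prompt), 1)
    return (w_prefix * overlap
            - w_load * worker.load
            - w_queue * worker.queue_depth
            - w_kv * worker.kv_usage)


def pick_worker(workers, prompt):
    return max(workers, key=lambda w: score(w, prompt))


prompt = (1, 2, 3, 4)
busy_full_hit = Worker("A", load=0.9, queue_depth=5, kv_usage=0.8, cached_prefix=(1, 2, 3, 4))
idle_half_hit = Worker("B", load=0.2, queue_depth=1, kv_usage=0.3, cached_prefix=(1, 2))
chosen = pick_worker([busy_full_hit, idle_half_hit], prompt)
```

With these made-up weights the lightly loaded worker with a partial prefix hit beats the overloaded worker with a full hit, which is the balance between cache reuse and load that the post describes.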
  • Prime Intellect
    @PrimeIntellect
    Jun 23
    Replying to @PrimeIntellect
    We disaggregate prefill and decode onto separate workers. A long prefill used to stall decode for everyone. Now it doesn't.
    4.2K
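A toy queueing model shows why a long prefill stalls decode when both share a worker, and why separate pools avoid it. This is a sketch under simplifying assumptions: one worker per pool, FIFO scheduling, made-up durations, and no cost for moving KV cache from the prefill worker to the decode worker. `max_decode_wait` is a hypothetical helper.

```python
def max_decode_wait(events, disaggregated: bool) -> float:
    """Worst queueing delay seen by any decode step.

    events: (arrival_time, kind, duration) tuples sorted by arrival,
    where kind is "prefill" or "decode".
    """
    free_at = {"prefill": 0.0, "decode": 0.0}
    worst = 0.0
    for arrival, kind, duration in events:
        # Colocated: every job shares one worker. Disaggregated: one pool per kind.
        pool = kind if disaggregated else "decode"
        start = max(arrival, free_at[pool])
        free_at[pool] = start + duration
        if kind == "decode":
            worst = max(worst, start - arrival)
    return worst


# Hypothetical trace: one 50-unit prefill lands between 1-unit decode steps.
events = [
    (0, "decode", 1),
    (1, "prefill", 50),
    (2, "decode", 1),
    (3, "decode", 1),
]
colocated = max_decode_wait(events, disaggregated=False)
separate = max_decode_wait(events, disaggregated=True)
```

In this trace the colocated setup makes decode steps wait behind the prefill, while the disaggregated setup lets them start on arrival.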
  • Prime Intellect
    @PrimeIntellect
    Jun 23
    Replying to @PrimeIntellect
    In RL, inference is the bottleneck — we optimize for throughput, not latency. High concurrency, FP8 precision, and wide expert parallelism over 32+ GPUs. Every GPU holds its own slice of experts and acts as its own endpoint.
    5.6K
  • Prime Intellect
    @PrimeIntellect
    Jun 23
    Today we're releasing prime-rl v0.6.0 — enabling RL at trillion-parameter MoE scale on agentic workloads at the highest efficiency. We've relentlessly optimized our RL infra. The result: GLM-5 on agentic SWE tasks at 131k context and sub-5-minute step time.
    257K
  • Prime Intellect reposted
    Johannes Hagemann
    Prime Intellect
    @johannes_hage
    Jun 16
    awesome post by @kimbochen covering RL systems end-to-end, including a SWE training run on GLM-5 using our prime-rl framework.
    SemiAnalysis
    @SemiAnalysis_
    Jun 16
    RL Systems Mind the Gap: Matching Trainer and Generator Throughput RL Training Infrastructure, GRPO, PipelineRL, Async RL, Policy Staleness, RL Sandbox Infra, CPU Requirements, TCO Analysis, Thinking Machines Tinker newsletter.semianalysis.com/p/rl-systems-m…
    21K
  • Prime Intellect reposted
    elie
    Prime Intellect
    @eliebakouch
    Jun 16
    nice blog by @kimbochen about the current RL ecosystem, goes into detail about the different settings and tradeoffs to consider when RLing open models
    SemiAnalysis
    @SemiAnalysis_
    Jun 16
    RL Systems Mind the Gap: Matching Trainer and Generator Throughput RL Training Infrastructure, GRPO, PipelineRL, Async RL, Policy Staleness, RL Sandbox Infra, CPU Requirements, TCO Analysis, Thinking Machines Tinker newsletter.semianalysis.com/p/rl-systems-m…
    13K
