Together AI
@togethercompute
2,819 posts
Accelerate inference, model shaping, and pre-training on a research-optimized platform.
San Francisco, CA · together.ai · Joined November 2022
376 Following · 56.9K Followers
  • Pinned
    Together AI
    @togethercompute
    Jun 21
    GLM-5.2 on Together AI is showing up fast on @OpenRouter ⚡️ The model is strong, and our serving path makes that strength usable in the loop. Together has been pushing hard on inference so long-context coding and agent workloads get more tokens per GPU while staying fast.
    SK
    @Samking207
    Jun 20
    @togethercompute I see you guys, fastest GLM-5.2 TPS on @OpenRouter now. God bless 🙌 #glm #llm #ai #openrouter #zai
    24K
  • Together AI
    @togethercompute
    1h
    LLMs write fast single-GPU kernels. Ask for a multi-GPU one and they fall apart. ParallelKernelBench measures how they fail by benchmarking against 87 problems pulled from real codebases including Megatron-LM, DeepSpeed, DeepEP, TensorRT-LLM, NeMo-RL. New research from Willy
    5.7K
    Together AI
    @togethercompute
    1h
    Replying to @togethercompute
    An agentic loop (compile, test, profile, revise) helps. Gemini 3 Pro went from 24 to 35/87 correct, then plateaued after ~20 steps. Feedback fixes syntax, not rank coordination, collective ordering, or transfer-mechanism choice. TMA and NVLS stay almost unused.
    534
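A minimal sketch of the compile, test, profile, revise loop described above, assuming a fixed step budget (the post reports gains plateauing after about 20 steps). This is not the ParallelKernelBench harness: `Feedback`, `agentic_loop`, and the `evaluate` and `revise` callables are hypothetical stand-ins for a real compiler, test suite, profiler, and LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Feedback:
    """Hypothetical result of one compile + test + profile pass."""
    compiled: bool
    correct: bool
    latency_us: Optional[float]  # profiler timing; None if the kernel never ran
    message: str  # compiler/test/profiler output that gets fed back to the model


def agentic_loop(
    source: str,
    evaluate: Callable[[str], Feedback],  # hypothetical: compile, test, profile
    revise: Callable[[str, Feedback], str],  # hypothetical: LLM rewrite from feedback
    max_steps: int = 20,
) -> Tuple[Optional[str], Optional[float]]:
    """Run a bounded refinement loop and return the fastest correct kernel seen."""
    best_source: Optional[str] = None
    best_latency: Optional[float] = None
    for _ in range(max_steps):
        fb = evaluate(source)
        # Only attempts that compile and pass tests are eligible as "best".
        if fb.compiled and fb.correct and fb.latency_us is not None:
            if best_latency is None or fb.latency_us < best_latency:
                best_source, best_latency = source, fb.latency_us
        # Hand the raw feedback back to the model for the next attempt.
        source = revise(source, fb)
    return best_source, best_latency
```

Keeping the best correct attempt, rather than the last one, guards against a revision that regresses. Because the feedback only carries compile, test, and timing results, a loop like this can repair syntax errors without ever telling the model which collective ordering or transfer mechanism to use.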
    Together AI
    @togethercompute
    1h
    Single-shot generation still surfaces net-new kernels with no public reference: NeMo vocab-parallel log-probs, Hyena context parallelism, SAM 3 mask suppression. One GEMM + All-Gather kernel hit 87.9µs vs 320.6µs for NCCL. PKB is open. Read more and contribute below. Blog:
    375
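As a quick check on the GEMM + All-Gather timing above, the two latencies work out to about a 3.65x speedup over NCCL. This assumes both numbers are per-call latencies measured on the same hardware and problem size, which the post does not spell out; `speedup` is only an illustrative helper.

```python
def speedup(baseline_us: float, candidate_us: float) -> float:
    """Ratio of baseline latency to candidate latency; > 1 means the candidate is faster."""
    return baseline_us / candidate_us


nccl_us = 320.6   # NCCL latency reported in the post
kernel_us = 87.9  # generated GEMM + All-Gather kernel latency reported in the post
print(f"speedup: {speedup(nccl_us, kernel_us):.2f}x")  # speedup: 3.65x
```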
  • Together AI reposted
    Hassan
    Together AI
    @nutlope
    3h
    Ran 10 more tests comparing GLM 5.2 & Opus. On average, GLM 5.2 produced 2x the tokens but was still faster + 3x cheaper with similar quality! I'm open sourcing all these tests tomorrow, including the code, my prompts, and the token/cost stats.
    Hassan
    Together AI
    @nutlope
    Jun 17
    This model is insane at design. I asked GLM 5.2 (left) and Opus 4.8 (right) to build me a landing page and you can't even tell the difference. GLM cost $0.06 while opus cost $0.49. More than 6x cheaper while being faster + more token efficient. Another win for open source AI.
    15K
  • Together AI
    @togethercompute
    21h
    .@cartesia runs one of the hardest inference workloads: real-time voice. Their stack has to keep long-lived streams moving, serve millions of audio minutes a day, and hold model latency around 90ms. Together gives them the managed GPU infrastructure and low-level cluster
    Together AI
    @togethercompute
    22h
    Article
    How Cartesia Runs Real-Time Voice AI on Together AI
    The challenge: Cartesia’s growth in real-time voice AI created four specific infrastructure requirements that many hosted platforms were not fit for. Voice workloads have tight latency budgets: Voice...
    4.1K
  • Together AI
    @togethercompute
    22h
    Article
    How Cartesia Runs Real-Time Voice AI on Together AI
    The challenge: Cartesia’s growth in real-time voice AI created four specific infrastructure requirements that many hosted platforms were not fit for. Voice workloads have tight latency budgets: Voice...
    6.3K
  • Together AI
    @togethercompute
    Jun 22
    Brrrrr 🚀 and it's free to use
    qilua
    @qiluaH02
    Jun 22
    glm 5.2 = 131 token/s 🙀
    59K
  • Together AI reposted
    Hassan
    Together AI
    @nutlope
    Jun 22
    Introducing The Blind Test. Two landing pages. One built by GLM 5.2 and one by Opus 4.8. Can you tell which is which? It's very difficult to get a perfect score, just try :)
    15K
  • Together AI
    @togethercompute
    Jun 22
    The next generation of inference needs purpose-built infrastructure. Together AI and 5C are deploying NVIDIA GB300 NVL72 systems with high-density compute, advanced cooling, and AI-optimized storage for large-scale inference and reasoning.
    5C
    @5CGroupAI
    Jun 16
    Co-built with @togethercompute, our next-gen AI Factory is deploying @nvidia GB300 NVL72 for AI inference and reasoning at scale. With Pegatron, @Vertiv, and @VAST_Data, we’re bringing high-density compute, advanced cooling, and AI-optimized storage together to power the future.
    10K
  • Together AI
    @togethercompute
    Jun 21
    Everyone’s trying to find where to test GLM-5.2. You can try it free on Together Chat (link below). No API setup. Just pick GLM-5.2 and start prompting. Served by Together AI on secure North American infrastructure.
    12K
    Together AI
    @togethercompute
    Jun 21
    Try GLM-5.2 free on Together Chat
    Together Chat
    From chat.together.ai
    2.6K
  • Together AI
    @togethercompute
    Jun 21
    A year ago this would have been an obvious closed-model task. Now GLM-5.2 can read the issue, reason through the scene, patch the code, and keep moving on Together AI.
    Brandon
    @BphilSoChill
    Jun 20
    @togethercompute + @Zai_org GLM 5.2 are a dope combination. The inference is just so fast... I'm no pro with three js, and this is probably amateur work... But Opus 4.8 was having trouble with the 3d -> 2d transition. I started debugging with GLM 5.2 ... #win
    5.5K
  • Together AI
    @togethercompute
    Jun 21
    Voice agents get a lot more interesting when they can use the screen 🔥 This demo runs the full loop on Together AI: STT, voice, and reasoning across Parakeet, MiniMax Speech 2.8, and MiniMax M3. Real-time systems need every layer of the stack to be fast.
    Victor Su-Ortiz
    MiniMax (official)
    @VictorSuOrtiz
    Jun 16
    forked clicky into a tiny Mac top-bar app that reviews my website designs, talks back, and patches the code itself. the loop: “what’s wrong with this?” → @MiniMax_AI M3 reads the screen + points at the weak parts “fix it” → it edits the actual files on disk @togethercompute
    4.6K
  • Together AI
    @togethercompute
    Jun 19
    MiniMax-M3 expands what agents can carry into context: long histories, images, video, documents, and tool outputs. Together’s inference work makes that practical at scale by improving token throughput across the serving path. More tokens per GPU means more work automated per
    Together AI
    @togethercompute
    Jun 2
    Article
    Serving MiniMax-M3 on Together AI: sparse attention, paged decode, and multimodal inference
    MiniMax-M3 combines three serving challenges in one model: a 1M-token context window, native multimodality, and MiniMax Sparse Attention. Together AI is the preferred cloud partner for MiniMax-M3,...
    4.7K
  • Together AI
    @togethercompute
    Jun 19
    GPT Image 2 from @OpenAI is now available on Together AI. Teams can now build image generation and editing into their multimodal apps through Together Serverless Inference, using OpenAI’s flagship image model for layout control, readable text, and reference-guided generation.
    2.9K
    Together AI
    @togethercompute
    Jun 19
    Highlights: 👉 95%+ multilingual text rendering accuracy across Latin, Chinese, Japanese, Korean, Hindi, Bengali, and Arabic scripts 👉 Up to 16 reference images per call for product comps, style transfer, and iterative editing 👉 Native 1K, 2K, and 4K outputs 👉 Built for
    1.5K
    Together AI
    @togethercompute
    Jun 19
    GPT Image 2 is live on Together AI. Try it now:
    GPT Image 2 API | Together AI
    From together.ai
    1.2K