Together AI (@togethercompute) / X

Together AI

2,819 posts

Together AI

@togethercompute

Accelerate inference, model shaping, and pre-training on a research-optimized platform.

San Francisco, CA

Joined November 2022

Pinned
Together AI
@togethercompute
Jun 21
GLM-5.2 on Together AI is showing up fast on @OpenRouter ⚡️ The model is strong, and our serving path makes that strength usable in the loop. Together has been pushing hard on inference so long-context coding and agent workloads get more tokens per GPU while staying fast.
SK
@Samking207
Jun 20
@togethercompute I see you guys, fastest GLM-5.2 TPS on @OpenRouter now. God bless 🙌 #glm #llm #ai #openrouter #zai
24K
Together AI
@togethercompute
1h
LLMs write fast single-GPU kernels. Ask for a multi-GPU one and they fall apart. ParallelKernelBench measures how they fail by benchmarking against 87 problems pulled from real codebases including Megatron-LM, DeepSpeed, DeepEP, TensorRT-LLM, NeMo-RL. New research from Willy
5.7K
Together AI
@togethercompute
1h
Replying to @togethercompute
An agentic loop (compile, test, profile, revise) helps. Gemini 3 Pro went from 24 to 35/87 correct, then plateaued after ~20 steps. Feedback fixes syntax, not rank coordination, collective ordering, or transfer-mechanism choice. TMA and NVLS stay almost unused.
534
Together AI
@togethercompute
1h
Single-shot generation still surfaces net-new kernels with no public reference: NeMo vocab-parallel log-probs, Hyena context parallelism, SAM 3 mask suppression. One GEMM + All-Gather kernel hit 87.9µs vs 320.6µs for NCCL. PKB is open. Read more and contribute below. Blog:
375
Together AI reposted
Hassan
@nutlope
3h
Ran 10 more tests comparing GLM 5.2 & Opus. On average, GLM 5.2 produced 2x the tokens but was still faster + 3x cheaper with similar quality! I'm open sourcing all these tests tomorrow, including the code, my prompts, and the token/cost stats.
00:00
00:11
Hassan
@nutlope
Jun 17
This model is insane at design. I asked GLM 5.2 (left) and Opus 4.8 (right) to build me a landing page and you can't even tell the difference. GLM cost $0.06 while opus cost $0.49. More than 6x cheaper while being faster + more token efficient. Another win for open source AI.
15K
Together AI
@togethercompute
21h
.@cartesia runs one of the hardest inference workloads: real-time voice. Their stack has to keep long-lived streams moving, serve millions of audio minutes a day, and hold model latency around 90ms. Together gives them the managed GPU infrastructure and low-level cluster
Together AI
@togethercompute
22h
Article
How Cartesia Runs Real-Time Voice AI on Together AI
The challenge Cartesia’s growth in real-time voice AI created four specific infrastructure requirements that many hosted platforms were not fit for. Voice workloads have tight latency budgets: Voice...
4.1K
Together AI
@togethercompute
22h
Article
How Cartesia Runs Real-Time Voice AI on Together AI
The challenge Cartesia’s growth in real-time voice AI created four specific infrastructure requirements that many hosted platforms were not fit for. Voice workloads have tight latency budgets: Voice...
6.3K
Together AI
@togethercompute
Jun 22
Brrrrr 🚀 and it's free to use
qilua
@qiluaH02
Jun 22
glm 5.2 = 131 token/s 🙀
59K
Together AI reposted
Hassan
@nutlope
Jun 22
Introducing The Blind Test. Two landing pages. One built by GLM 5.2 and one by Opus 4.8. Can you tell which is which? It's very difficult to get a perfect score, just try :)
00:00
15K
Together AI
@togethercompute
Jun 22
The next generation of inference needs purpose-built infrastructure. Together AI and 5C are deploying NVIDIA GB300 NVL72 systems with high-density compute, advanced cooling, and AI-optimized storage for large-scale inference and reasoning.
5C
@5CGroupAI
Jun 16
Co-built with @togethercompute, our next-gen AI Factory is deploying @nvidia GB300 NVL72 for AI inference and reasoning at scale. With Pegatron, @Vertiv, and @VAST_Data, we’re bringing high-density compute, advanced cooling, and AI-optimized storage together to power the future.
10K
Together AI
@togethercompute
Jun 21
Everyone’s trying to find where to test GLM-5.2. You can try it free on Together Chat (link below) No API setup. Just pick GLM-5.2 and start prompting. Served by Together AI on secure North American infrastructure.
00:00
12K
Together AI
@togethercompute
Jun 21
Try GLM-5.2 free on Together Chat
Together Chat
From chat.together.ai
2.6K
Together AI
@togethercompute
Jun 21
A year ago this would have been an obvious closed-model task. Now GLM-5.2 can read the issue, reason through the scene, patch the code, and keep moving on Together AI.
Brandon
@BphilSoChill
Jun 20
@togethercompute + @Zai_org GLM 5.2 are a dope combination. The inference is just so fast... I'm no pro with three js, and this is probably amateur work... But Opus 4.8 was having trouble with the 3d -> 2d transition. I started debuggin with GLM 5.2 ... #win
00:00
5.5K
Together AI
@togethercompute
Jun 21
Voice agents get a lot more interesting when they can use the screen 🔥 This demo runs the full loop on Together AI: STT, voice, and reasoning across Parakeet, MiniMax Speech 2.8, and MiniMax M3. Real-time systems need every layer of the stack to be fast.
Victor Su-Ortiz
@VictorSuOrtiz
Jun 16
forked clicky into a tiny Mac top-bar app that reviews my website designs, talks back, and patches the code itself. the loop: “what’s wrong with this?” → @MiniMax_AI M3 reads the screen + points at the weak parts “fix it” → it edits the actual files on disk @togethercompute
00:00
4.6K
Together AI
@togethercompute
Jun 19
MiniMax-M3 expands what agents can carry into context: long histories, images, video, documents, and tool outputs. Together’s inference work makes that practical at scale by improving token throughput across the serving path. More tokens per GPU means more work automated per
Together AI
@togethercompute
Jun 2
Article
Serving MiniMax-M3 on Together AI: sparse attention, paged decode, and multimodal inference
MiniMax-M3 combines three serving challenges in one model: a 1M-token context window, native multimodality, and MiniMax Sparse Attention. Together AI is the preferred cloud partner for MiniMax-M3,...
4.7K
Together AI
@togethercompute
Jun 19
GPT Image 2 from @OpenAI is now available on Together AI. Teams can now build image generation and editing into their multimodal apps through Together Serverless Inference, using OpenAI’s flagship image model for layout control, readable text, and reference-guided generation.
2.9K
Together AI
@togethercompute
Jun 19
Highlights: 👉 95%+ multilingual text rendering accuracy across Latin, Chinese, Japanese, Korean, Hindi, Bengali, and Arabic scripts 👉 Up to 16 reference images per call for product comps, style transfer, and iterative editing 👉 Native 1K, 2K, and 4K outputs 👉 Built for
1.5K
Together AI
@togethercompute
Jun 19
GPT Image 2 is live on Together AI. Try it now:
GPT Image 2 API | Together AI
From together.ai
1.2K