Tenstorrent Galaxy™ Blackhole is designed to deliver flexible and scalable AI infrastructure with industry-leading performance across real-world AI workloads.
Performance starts with data movement:
- Unified compute, memory, and networking
- Seamless data flow across chips
Introducing tt-vscode-toolkit, an interactive learning environment designed to accelerate development on Tenstorrent hardware.
The toolkit brings project templates directly into VS Code, with lessons covering model deployment, agent frameworks, video generation, and more.
Get
💡 New Analyst Insight: AI inference is not shaping up to be a one-chip market.
VP & Principal Analyst @MattKimball_MIS examines how @tenstorrent's TT-Deploy launch positions Galaxy Blackhole, RISC-V, open-source software, and cost-per-token economics for a more heterogeneous AI
Run @tenstorrent kernels on your CPU.
ttsim is a bit-exact, full-system simulator of Wormhole and Blackhole. The whole tt-metal stack on Linux, Mac (UTM/QEMU) or Windows (WSL2) — numerics match silicon bit-for-bit.
Bring up kernels or explore the architecture. No silicon
We created an agentic AI model pipeline utilizing a fleet of our new TT-QuietBoxes and Tenstorrent Galaxies to download random models from @huggingface to port to our hardware, compile, and test for accuracy.
After thousands and thousands of models, the model pass rate has been
ICYMI we showcased our 10x faster than real-time video generation demos at TT-Deploy ft. Tenstorrent Galaxy Blackhole in collaboration with @prodialabs .
- 10x faster than real-time, high-quality video gen
- infinite movie w/ real-time prompting
Very big TT-Boltz update!
You can now connect multiple @tenstorrent machines together and TT-Boltz makes it all look like a single big machine. They just have to be on the same network. No expensive cables. You can connect anything.
The video shows a cluster of QuietBox 1,
"But what that means is that not only can you generate any kernel in TT-Lang, you can take kernels written in other languages targeted to other hardware, like CUDA GPUs, for example, and convert them to TT-Lang in seconds and run on our hardware.
So if there ever was a
“There are no Ethernet switches in the picture. All the data is moved through the galaxies themselves and through simple Ethernet cables which is how we achieve our effective cost as well as scale. And it’s really really critical for performance because that’s what gives us our
4x cost reduction in TTS inference with @tenstorrent!
11 NVIDIA L40S GPUs ran 550 simultaneous audio streams at ~$100K.
Now, 27 Tenstorrent P100 chips do the same at ~$27K.
First production-grade TTS to match the cost of text tokens without degradation in audio quality.
Hear it
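For readers who want to check the arithmetic behind the TTS cost claim, here is a minimal sketch using only the approximate figures quoted in the post (11 L40S at ~$100K, 27 P100 at ~$27K, 550 streams). The variable names are illustrative, and the dollar amounts are the post's rounded totals, not a detailed cost model.

```python
# Approximate figures as quoted in the post; names are illustrative only.
streams = 550

l40s_chips, l40s_cost_usd = 11, 100_000   # ~$100K for 11 NVIDIA L40S
p100_chips, p100_cost_usd = 27, 27_000    # ~$27K for 27 Tenstorrent P100

# Hardware cost ratio for the same 550-stream workload.
cost_ratio = l40s_cost_usd / p100_cost_usd

# Hardware cost per simultaneous audio stream on each setup.
l40s_cost_per_stream = l40s_cost_usd / streams
p100_cost_per_stream = p100_cost_usd / streams

print(f"cost ratio: {cost_ratio:.1f}x")
print(f"L40S: ${l40s_cost_per_stream:.2f} per stream")
print(f"P100: ${p100_cost_per_stream:.2f} per stream")
```

On these rounded inputs the ratio comes out to about 3.7x, which the post rounds to 4x.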
"We are not trying to be a narrow provider. We're not trying to be a point solution. We want to solve a lot of different problems." @jimkxa
We're focused on the fundamentals of scale, general purpose, and lower cost compute.