MaxText

High-performance, highly scalable, open-source LLM library and reference implementation written in pure Python/JAX and targeting Google Cloud TPUs and GPUs for training.

High-performance

MaxText achieves high Model FLOPs Utilization (MFU) and tokens/second from a single host to very large clusters, while staying simple and largely "optimization-free" thanks to the power of JAX and the XLA compiler.
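
As a rough illustration of the MFU metric mentioned above, the sketch below divides achieved model FLOPs per second by the hardware's peak FLOPs per second. It assumes the common `6 * N` FLOPs-per-token rule of thumb for dense decoder-only training and ignores attention FLOPs; the function names and all numbers are hypothetical and are not taken from MaxText's own calculation.

```python
def model_flops_per_token(num_params: float) -> float:
    """Approximate training FLOPs per token for a dense decoder-only model.

    Assumption: the 6 * N rule of thumb (forward plus backward matmuls),
    with attention FLOPs ignored. This is not MaxText's formula.
    """
    return 6.0 * num_params


def mfu(tokens_per_second: float, num_params: float,
        num_chips: int, peak_flops_per_chip: float) -> float:
    """Model FLOPs Utilization: achieved model FLOPs/s over peak hardware FLOPs/s."""
    achieved = tokens_per_second * model_flops_per_token(num_params)
    peak = num_chips * peak_flops_per_chip
    return achieved / peak


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    utilization = mfu(
        tokens_per_second=1.0e6,
        num_params=8.0e9,
        num_chips=256,
        peak_flops_per_chip=4.0e14,
    )
    print(f"MFU: {utilization:.1%}")
```

Because MFU counts only the FLOPs the model mathematically requires, it stays comparable across runs even when recomputation or other overheads change the amount of work the hardware actually performs.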

Pre-training

MaxText provides opinionated implementations that show how to achieve optimal performance across dimensions such as sharding, quantization, and checkpointing.

Post-training

MaxText provides a scalable framework to fine-tune proprietary or OSS models using state-of-the-art Reinforcement Learning (RL) algorithms (e.g., GRPO) and techniques (e.g., SFT and Knowledge Distillation).

JAX AI Stack

The JAX AI Stack is a curated collection of libraries that researchers and engineers, both inside and outside of Google, have found useful for implementing and deploying the models behind generative AI tools like Imagen, Gemini, and more.

JAX - Core array operations and program transformations
Flax - For building neural networks
Orbax - For checkpointing and persistence utilities
Optax - For gradient processing and optimization
Tunix - A JAX library with the latest experimental algorithms and post-training techniques
ml_dtypes - NumPy dtype extensions for machine learning
MaxText - A model library for JAX LLMs, highly optimized for TPUs
vLLM on TPU - For high-performance sampling (inference) for Reinforcement Learning (RL)
Pathways - For multi-host inference (sampling) and highly efficient weight transfer
Grain or tf.data - Optional data loading libraries

🔥 Latest news 🔥

[March 6, 2026] New features from DeepSeek-AI are now supported: Conditional Memory via Scalable Lookup (Engram) and Manifold-Constrained Hyper-Connections (mHC). Try them out with our deepseek-custom starter config.
[March 5, 2026] New tpu-post-train target in PyPI. Please also use this installation option for running vllm_decode. See the MaxText installation instructions for more info.
[March 5, 2026] Qwen3-Next is now supported.
[February 27, 2026] New MaxText structure! MaxText has been restructured according to RESTRUCTURE.md. Please feel free to share your thoughts and feedback.
[December 22, 2025] Muon optimizer is now supported.
[December 10, 2025] DeepSeek V3.1 is now supported. Use the existing configs for DeepSeek V3 671B and load a V3.1 checkpoint to use the model.
[December 9, 2025] New RL and SFT Notebook tutorials are available.
[December 4, 2025] The ReadTheDocs documentation site has been reorganized.
[December 3, 2025] Multi-host support for GSPO and GRPO is now available via new RL tutorials.
[November 20, 2025] A new guide, What is Post Training in MaxText?, is now available.
[November 6, 2025] Ironwood TPU co-designed AI stack announced. Read the blog post on its co-design with MaxText.
[October 29, 2025] Optimized models tiering documentation has been refreshed.
[October 12, 2025] Added Versioning. Check out our first set of release notes!
[October 10, 2025] Post-Training (SFT, RL) via Tunix is now available.
[September 26, 2025] Vocabulary tiling (PR) is now supported in MaxText! Adjust the num_vocab_tiling config option to use memory more efficiently.
[September 24, 2025] The GPT-OSS family of models (20B, 120B) is now supported.
[September 15, 2025] MaxText is now available as a PyPI package. Users can now install maxtext through pip.
[September 5, 2025] MaxText has moved to an src layout as part of RESTRUCTURE.md. For existing environments, please run pip install -e . from MaxText root.
[August 13, 2025] The Qwen3 2507 MoE family of models is now supported: the 235B Thinking and 480B Coder MoEs, alongside the existing dense models (0.6B, 4B, 8B, 14B, and 32B).
[July 27, 2025] Updated the TFLOPS/s calculation (PR) to account for causal attention, which halves the attention FLOPs. The reduced attention FLOPs from sliding window and chunked attention are accounted for in PR and PR. These changes affect large-sequence configs, as explained in this doc.
[July 16, 2025] We will be restructuring the MaxText repository for improved organization and clarity. Please review the proposed structure and provide feedback.
[July 11, 2025] Multi-Token Prediction (MTP) training is now supported! It adds an auxiliary loss based on predicting multiple future tokens, inspired by the DeepSeek-V3 paper, to enhance training efficiency.
[June 25, 2025] DeepSeek R1-0528 variant is now supported.
[April 24, 2025] Llama 4 Maverick models are now supported.
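
The July 27, 2025 entry above describes counting fewer attention FLOPs for causal and sliding window attention. The sketch below shows one way to reason about that arithmetic by counting the (query, key) pairs that are actually attended. It is an illustration under stated assumptions, not MaxText's actual calculation: the function names are hypothetical, only the forward-pass matmuls of a single layer are counted, and chunked attention is not covered.

```python
from typing import Optional


def full_pairs(seq_len: int) -> int:
    """(query, key) pairs with no masking: every query attends to every key."""
    return seq_len * seq_len


def causal_pairs(seq_len: int, window: Optional[int] = None) -> int:
    """(query, key) pairs under causal masking, optionally with a sliding window.

    Query i (0-indexed) attends to min(i + 1, window) keys.
    """
    w = seq_len if window is None else min(window, seq_len)
    # The first w queries see 1, 2, ..., w keys; the rest see exactly w keys.
    return w * (w + 1) // 2 + (seq_len - w) * w


def attention_flops(batch: int, num_heads: int, head_dim: int, pairs: int) -> int:
    """Forward-pass matmul FLOPs for one attention layer.

    Two matmuls (Q @ K^T and scores @ V), each 2 FLOPs per multiply-add
    per attended pair per head dimension.
    """
    return batch * pairs * 4 * num_heads * head_dim


if __name__ == "__main__":
    # Hypothetical shapes chosen only for illustration.
    seq_len, heads, dim = 8192, 32, 128
    full = attention_flops(1, heads, dim, full_pairs(seq_len))
    causal = attention_flops(1, heads, dim, causal_pairs(seq_len))
    windowed = attention_flops(1, heads, dim, causal_pairs(seq_len, window=1024))
    print(f"causal / full:   {causal / full:.4f}")
    print(f"windowed / full: {windowed / full:.4f}")
```

For long sequences the causal pair count approaches half of the full count, which matches the halving described in the entry, and a window much smaller than the sequence length reduces the count further. This is why the change matters most for large-sequence configs.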
