A high-performance inference framework for AI

MAX provides powerful libraries and tools to develop, optimize, and deploy AI on GPUs fast.

Why developers use MAX

  • Incredible Performance

    MAX was built from the ground up to deliver out-of-the-box performance for AI workloads.
    See how we measure performance.

  • Hardware Portability

    MAX provides portability across CPU and GPU generations and delivers exceptional utilization, driving real compute cost savings.

  • Complete Control

    Optimize your model's performance, write custom ops, or build your own model. MAX gives you full control over every layer of the stack.

MAX Inference

Deploy GenAI in Seconds

AI Agents

Scale AI Agents seamlessly with enterprise-grade speed, reliability, and efficiency.

RAG & CAG

Enhance decision-making, drive automation, and optimize enterprise operations for efficiency.

Chatbots

Automate conversations and enhance user engagement and satisfaction.

Batch processing

Optimize GPU utilization, reduce latency, and improve throughput for AI workloads.

Code Generation

Generate accurate, efficient, and reliable code that aligns with coding standards.

AI Inference

Build a scalable, cost-effective inference infrastructure for your AI models.

Develop custom GPU research with MAX

Advanced developers

MAX for Research

Advanced tools and libraries for model, kernel, and hardware developers that deliver even more precise control. Capabilities include:

  • Always-on compute with SOTA inference performance

  • Control single- to multi-GPU scaling

  • Program heterogeneous compute

  • Write custom GPU code

  • Low-level host and device control

Works with AI code editors

Plug into any AI Coding Editor

The Modular Platform works great with any AI code editor. Cursor, Claude Code, and Windsurf are all supported with streamlined setup.

Deploy GenAI Now: Step-by-Step Examples

AI Weather Agent

Build an intelligent weather assistant with MAX Serve, FastAPI, and NextJS.

Continuous Chat App

Build a functional chat application using Llama 3 and MAX Serve.
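
Because MAX Serve exposes an OpenAI-compatible API, a continuous chat is just repeated HTTP calls that carry the message history forward. Here is a minimal standard-library sketch; the port (8000), the `/v1/chat/completions` route, and the model name are assumptions to substitute with whatever you actually launched the server with.

```python
import json
from urllib import request

# Assumed local MAX Serve endpoint (OpenAI-compatible chat completions route).
CHAT_URL = "http://localhost:8000/v1/chat/completions"

def chat(messages, model="meta-llama/Meta-Llama-3-8B-Instruct"):
    """Send one chat turn and return the assistant's reply text."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    req = request.Request(
        CHAT_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# For a continuous chat, keep appending turns to the same history list:
history = [{"role": "system", "content": "You are a helpful assistant."}]
# history.append({"role": "user", "content": "Hello!"})
# history.append({"role": "assistant", "content": chat(history)})
```

Keeping the full `history` list in each request is what gives the chat its continuity; the server itself is stateless between calls.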

Use Open WebUI

Use Open WebUI with MAX Serve for RAG and Web Search.

Offline Inference with MAX

Pair MAX with Hugging Face to perform inference locally and efficiently.

Generate embeddings

Generate embeddings with MAX Serve using an OpenAI-compatible API.
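
Since the API follows the OpenAI embeddings format, any OpenAI client works; a bare standard-library sketch looks like the following. The local port and the embedding model name are assumptions, and `embed` expects a MAX Serve instance to already be running.

```python
import json
from urllib import request

EMBEDDINGS_URL = "http://localhost:8000/v1/embeddings"  # assumed local endpoint

def embeddings_body(texts, model):
    """Build the JSON request body in the OpenAI embeddings format."""
    return json.dumps({"model": model, "input": texts})

def embed(texts, model="sentence-transformers/all-mpnet-base-v2"):
    """POST texts to a running MAX Serve instance; returns one vector per text."""
    req = request.Request(
        EMBEDDINGS_URL,
        data=embeddings_body(texts, model).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        # OpenAI-compatible responses put vectors under data[i].embedding.
        return [item["embedding"] for item in json.load(resp)["data"]]

# vectors = embed(["MAX delivers out-of-the-box performance."])
```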

OpenAI Function Calling

Create AI agents with MAX Serve and OpenAI function calling.
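
With OpenAI-style function calling, you describe your tools as JSON schemas and the model returns a structured tool call instead of free text when it decides one is needed. A sketch of the request payload follows; the `get_weather` tool and the model name are hypothetical examples, not part of MAX.

```python
import json

# A hypothetical tool, described in the OpenAI function-calling schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# This payload would be POSTed to /v1/chat/completions on a MAX Serve
# instance; the model name is an assumption.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": tools,
}
print(json.dumps(payload, indent=2))
```

The agent loop then executes whatever tool call comes back in `choices[0].message.tool_calls` and feeds the result into the next request as a `tool` role message.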

FREE for everyone

Paid support for scaled enterprise deployments

  • MAX is available FREE for everyone to self-manage

    • Incredible performance for LLMs, PyTorch, and ONNX models
    • Deploy MAX yourself on-prem or on any cloud provider
    • Community support through Discord and GitHub
  • Paid support is available for scaled enterprise deployments

    • SLA support with guaranteed response times
    • Dedicated Slack channel and account manager
    • Access to the world’s best AI engineering team

Developer Approved

very excited

strangemonad

“I'm very excited to see this coming together and what it represents, not just for MAX, but my hope for what it could also mean for the broader ecosystem that mojo could interact with.”

works across the stack

scrumtuous

“Mojo can replace the C programs too. It works across the stack. It’s not glue code. It’s the whole ecosystem.”

performance is insane

drdude81

“I tried MAX builds last night, impressive indeed. I couldn't believe what I was seeing... performance is insane.”

was a breeze!

NL

“Max installation on Mac M2 and running llama3 in (q6_k and q4_k) was a breeze! Thank you Modular team!”

impressed

justin_76273

“The more I benchmark, the more impressed I am with the MAX Engine.”

huge increase in performance

Aydyn

"C is known for being as fast as assembly, but when we implemented the same logic on Mojo and used some of the out-of-the-box features, it showed a huge increase in performance... It was amazing."

easy to optimize

dorjeduck

“It’s fast which is awesome. And it’s easy. It’s not CUDA programming...easy to optimize.”

impressive speed

Adalseno

"It worked like a charm, with impressive speed. Now my version is about twice as fast as Julia's (7 ms vs. 12 ms for a 10 million vector; 7 ms on the playground. I guess on my computer, it might be even faster). Amazing."

one language all the way through

fnands

“Tired of the two-language problem. I have one foot in the ML world and one foot in the geospatial world, and both struggle with the 'two-language' problem. Having Mojo as one language all the way through would be awesome.”

amazing achievements

Eprahim

“I'm excited, you're excited, everyone is excited to see what's new in Mojo and MAX and the amazing achievements of the team at Modular.”

feeling of superpowers

Aydyn

"Mojo gives me the feeling of superpowers. I did not expect it to outperform a well-known solution like llama.cpp."

12x faster without even trying

svpino

“Mojo destroys Python in speed. 12x faster without even trying. The future is bright!”

Community is incredible

benny.n

“The Community is incredible and so supportive. It’s awesome to be part of.”

completely different ballgame

scrumtuous

“What @modular is doing with Mojo and the MAX Platform is a completely different ballgame.”

actually flies on the GPU

Sanika

"After wrestling with CUDA drivers for years, it felt surprisingly… smooth. No, really: for once I wasn’t battling obscure libstdc++ errors at midnight or re-compiling kernels to coax out speed. Instead, I got a peek at writing almost-Pythonic code that compiles down to something that actually flies on the GPU."

high performance code

jeremyphoward

"Mojo is Python++. It will be, when complete, a strict superset of the Python language. But it also has additional functionality so we can write high performance code that takes advantage of modern accelerators."

pure iteration power

Jayesh

"This is about unlocking freedom for devs like me, no more vendor traps or rewrites, just pure iteration power. As someone working on challenging ML problems, this is a big thing."

surest bet for longterm

pagilgukey

“Mojo and the MAX Graph API are the surest bet for longterm multi-arch future-substrate NN compilation.”

potential to take over

svpino

“A few weeks ago, I started learning Mojo 🔥 and MAX. Mojo has the potential to take over AI development. It's Python++. Simple to learn, and extremely fast.”

Build the future of AI with Modular

View Editions
  • Get started guide

    Install MAX with a few commands and deploy a GenAI model locally.

    Read Guide
  • Browse open models

    500+ models, many optimized for lightning-fast performance

    Browse models