2.2.17 Backend Modular MAX
Handle: `modularmax`
URL: http://localhost:34471
MAX is a platform from Modular (creators of Mojo) for running LLMs.
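Once the service is up, you can talk to it like any OpenAI-compatible endpoint. A minimal sketch, assuming MAX serves the standard `/v1/chat/completions` route on the port above (replace `<model_id>` with the model you configured):

```shell
# Query the OpenAI-compatible chat completions endpoint exposed by MAX.
# Assumes the service is already running (harbor up modularmax).
curl http://localhost:34471/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model_id>",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```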
Warning
Currently, the MAX container only works with Nvidia GPUs or CPUs.
```bash
# [Optional] pre-pull the image
harbor pull modularmax

# Start the service
# ⚠️ This will download the configured model
harbor up modularmax
```

See the troubleshooting guide if you encounter any issues.
- Even when the model is already downloaded, it may take some time to compile during startup (~30s for a 3B model)
- GPU-enabled inference with MAX typically runs BF16 weights (unquantized models) and will require a lot of VRAM (see estimates in the model registry)
- Harbor connects the global HuggingFace cache to the container
- Harbor shares your (configured) HuggingFace token with the container to download gated/private models from HuggingFace
- `modularmax` will be connected to `webui` when running together
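To use that integration, start both services in one command (a sketch; `harbor open` launches a service's UI in your default browser):

```shell
# Start both services; Harbor points the WebUI at the MAX endpoint
harbor up modularmax webui

# Open the WebUI in the default browser
harbor open webui
```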
The best way to find a supported model is to look it up in the Modular model registry. Model IDs are usually handles from HuggingFace.
MAX will automatically download the configured model to the HF cache when it is first used. While downloading, the start might take a while. Use `harbor logs` to monitor the progress.
```bash
# -n 1000 shows the last 1000 lines of logs
harbor logs modularmax -n 1000
```

Configure the model:
```bash
# Set the model
harbor modularmax model <model_id>

# See currently set model
harbor modularmax model
```

Additionally, you can configure any arguments supported by the `max-pipelines` CLI.
```bash
# See possible arguments
harbor run modularmax --help

# Set the arguments
harbor modularmax args "--enable-structured-output --trust-remote-code"

# See currently set arguments
harbor modularmax args
```

Tip
Use `harbor profiles` to quickly switch between configuration presets.
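A sketch of how that workflow might look, assuming the standard `harbor profile` subcommands (`save`/`use`); the profile name `max-dev` is hypothetical:

```shell
# Snapshot the current configuration (model, args, etc.) under a name
harbor profile save max-dev

# ...change the configuration...
harbor modularmax model <model_id>

# Switch back to the saved preset
harbor profile use max-dev
```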
See the official Configuration options for reference.
The following options can be set via `harbor config`:
```bash
# The port on the host where the Modular MAX endpoint will be available
HARBOR_MODULARMAX_HOST_PORT 34471

# The image to use for Modular MAX
# You can switch to max-nvidia-base if needed
HARBOR_MODULARMAX_IMAGE docker.modular.com/modular/max-nvidia-full

# Specific Docker tag to target
HARBOR_MODULARMAX_VERSION latest

# The model that'll be passed to the "--model-path" argument
# Same as in `harbor modularmax model`
HARBOR_MODULARMAX_MODEL Qwen/Qwen3.5-4B

# The arguments to pass to the max-pipelines binary
# Same as in `harbor modularmax args`
HARBOR_MODULARMAX_EXTRA_ARGS
```

See the environment configuration guide to set arbitrary environment variables for the service.
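These values can also be adjusted from the CLI with `harbor config`; a sketch, assuming the dotted field names mirror the `HARBOR_MODULARMAX_*` variables:

```shell
# Pin a specific image tag instead of "latest"
harbor config set modularmax.version latest

# Inspect the resulting value
harbor config get modularmax.version
```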