Installation¶
This guide provides instructions for installing and running tpu-inference.
There are three ways to install tpu-inference:
Install using pip¶
- Create a working directory.
- Set up a Python virtual environment.
- Install vllm-tpu using pip, as shown in the sketch after this list.
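A minimal sketch of these steps, assuming a Linux shell and that the package is published on PyPI as vllm-tpu; the directory and environment names are illustrative:
# Create a working directory (name is illustrative)
mkdir -p ~/vllm-tpu && cd ~/vllm-tpu
# Set up a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install vllm-tpu with pip
pip install vllm-tpu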
Run with Docker¶
Include the --privileged, --net=host, and --shm-size=150gb options to enable TPU interaction and shared memory.
export DOCKER_URI=vllm/vllm-tpu:latest
sudo docker run -it --rm --name $USER-vllm --privileged --net=host \
-v /dev/shm:/dev/shm \
--shm-size 150gb \
-p 8000:8000 \
--entrypoint /bin/bash ${DOCKER_URI}
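The container drops you into a bash shell. From there, a typical next step is to start the OpenAI-compatible server with vllm serve, which listens on port 8000 by default (matching the -p 8000:8000 mapping above); the model name here is only an example:
vllm serve Qwen/Qwen2.5-1.5B-Instruct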
Install from source¶
For debugging or development purposes, you can install tpu-inference from source. Because tpu-inference is a plugin for vllm, both must be installed from source. The steps are listed below, followed by a command sketch.
- Install system dependencies.
- Clone the vllm and tpu-inference repositories.
- Set up a Python virtual environment.
- Install vllm from source, targeting the TPU device.
- Install tpu-inference from source.
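A combined sketch of these steps, assuming a Debian-based host, that both repositories live under github.com/vllm-project, and that VLLM_TARGET_DEVICE=tpu selects vllm's TPU build; the system dependency list is illustrative:
# Install system dependencies (illustrative; adjust for your distribution)
sudo apt-get update && sudo apt-get install -y git python3-venv
# Clone the vllm and tpu-inference repositories (URLs assumed)
git clone https://github.com/vllm-project/vllm.git
git clone https://github.com/vllm-project/tpu-inference.git
# Set up a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install vllm from source, targeting the TPU device
cd vllm
VLLM_TARGET_DEVICE="tpu" pip install -e .
cd ..
# Install tpu-inference from source
cd tpu-inference
pip install -e .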