Installation

This guide provides instructions for installing and running tpu-inference.

There are three ways to install tpu-inference:

  1. Install with pip
  2. Run with Docker
  3. Install from source

Install with pip

  1. Create a working directory:

    mkdir ~/work-dir
    cd ~/work-dir
    
  2. Set up a Python virtual environment:

    python3.12 -m venv vllm_env --symlinks
    source vllm_env/bin/activate
    
  3. Install vllm-tpu using pip:

    pip install vllm-tpu
    
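To confirm the installation, you can import vllm inside the virtual environment and print its version. This is a minimal sanity check only; it does not exercise the TPU:

    python -c "import vllm; print(vllm.__version__)"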

Run with Docker

Run the container with the --privileged, --net=host, and --shm-size=150gb options so it can interact with the TPU and has enough shared memory:

export DOCKER_URI=vllm/vllm-tpu:latest
sudo docker run -it --rm --name $USER-vllm --privileged --net=host \
    -v /dev/shm:/dev/shm \
    --shm-size 150gb \
    -p 8000:8000 \
    --entrypoint /bin/bash ${DOCKER_URI}
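
This command drops you into a shell inside the container. From there, one way to try things out is to start the OpenAI-compatible server with vllm serve. The model name below is only an example; substitute any model you have access to:

    vllm serve Qwen/Qwen2.5-1.5B-Instruct --port 8000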

Install from source

For debugging or development purposes, you can install tpu-inference from source. Because tpu-inference is a plugin for vllm, you need to install both packages from source.

  1. Install system dependencies:

    sudo apt-get update && sudo apt-get install -y libopenblas-base libopenmpi-dev libomp-dev
    
  2. Clone the vllm and tpu-inference repositories:

    git clone https://github.com/vllm-project/vllm.git
    git clone https://github.com/vllm-project/tpu-inference.git
    
  3. Set up a Python virtual environment:

    python3.12 -m venv vllm_env --symlinks
    source vllm_env/bin/activate
    
  4. Install vllm from source, targeting the TPU device:

    cd vllm
    pip install -r requirements/tpu.txt
    VLLM_TARGET_DEVICE="tpu" pip install -e .
    cd ..
    
  5. Install tpu-inference from source:

    cd tpu-inference
    pip install -e .
    cd ..
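
To verify that both editable installs are present in the environment, you can list them with pip. This is a quick sanity check and only confirms that the packages are registered:

    pip list | grep -iE "vllm|tpu"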