Installation

This guide provides instructions for installing and running tpu-inference.

There are three ways to install tpu-inference:

  1. Install with pip
  2. Run with Docker
  3. Install from source

Install with pip

  1. Create a working directory:

    mkdir ~/work-dir
    cd ~/work-dir
    
  2. Set up a Python virtual environment:

    python3.12 -m venv vllm_env --symlinks
    source vllm_env/bin/activate
    
  3. Install vllm-tpu using pip:

    pip install vllm-tpu
    
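To confirm the installation, you can import vllm inside the virtual environment and print its version. This is a minimal sanity check only; it does not exercise the TPU:

    python -c "import vllm; print(vllm.__version__)"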

Run with Docker

Run the container with the --privileged, --net=host, and --shm-size=150gb options so it can interact with the TPU and has enough shared memory:

export DOCKER_URI=vllm/vllm-tpu:latest
sudo docker run -it --rm --name $USER-vllm --privileged --net=host \
    -v /dev/shm:/dev/shm \
    --shm-size 150gb \
    -p 8000:8000 \
    --entrypoint /bin/bash ${DOCKER_URI}
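
This command drops you into a shell inside the container. From there, one way to try things out is to start the OpenAI-compatible server with vllm serve. The model name below is only an example; substitute any model you have access to:

    vllm serve Qwen/Qwen2.5-1.5B-Instruct --port 8000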

Install from source

For debugging or development purposes, you can install tpu-inference from source. Because tpu-inference is a plugin for vllm, you need to install both packages from source.

  1. Install system dependencies:

    sudo apt-get update && sudo apt-get install -y libopenblas-base libopenmpi-dev libomp-dev
    
  2. Clone the vllm and tpu-inference repositories:

    git clone https://github.com/vllm-project/vllm.git
    git clone https://github.com/vllm-project/tpu-inference.git
    
  3. Set up a Python virtual environment:

    python3.12 -m venv vllm_env --symlinks
    source vllm_env/bin/activate
    
  4. Install vllm from source, targeting the TPU device:

    cd vllm
    pip install -r requirements/tpu.txt
    VLLM_TARGET_DEVICE="tpu" pip install -e .
    cd ..
    
  5. Install tpu-inference from source:

    cd tpu-inference
    pip install -e .
    cd ..
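
To verify that both editable installs are present in the environment, you can list them with pip. This is a quick sanity check and only confirms that the packages are registered:

    pip list | grep -iE "vllm|tpu"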