microsoft/onnxruntime-genai

ONNX Runtime GenAI

Description

Run generative AI models with ONNX Runtime. This API gives you an easy, flexible, and performant way of running LLMs on device. It implements the generative AI loop for ONNX models, including pre- and post-processing, inference with ONNX Runtime, logits processing, search and sampling, KV cache management, and grammar specification for tool calling.

ONNX Runtime GenAI powers Foundry Local, Windows ML, and the Visual Studio Code AI Toolkit.

See documentation at the ONNX Runtime website for more details.

Support matrix Supported now Under development On the roadmap
Model architectures: ChatGLM, DeepSeek, Ernie, Fara, Gemma, GPTOSS, Granite, Llama, Mistral, Nemotron, OLMo, Phi, Phi3V, Phi4MM, Qwen, Qwen-2.5VL, SmolLM3, Whisper, Stable diffusion
API: Python, C#, C/C++, Java ^, Objective-C
O/S: Linux, Windows, Mac, Android, iOS
Architecture: x86, x64, arm64
Hardware Acceleration: CPU, CUDA, DirectML, NvTensorRtRtx (TRT-RTX), OpenVINO, QNN, WebGPU, AMD GPU
Features: Multi-LoRA, Continuous decoding, Constrained decoding, Speculative decoding

^ Requires build from source

Installation

See installation instructions or build from source

Sample code for Phi-3 in Python

  1. Download the model

    huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx --include cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/* --local-dir .
  2. Install the API

    pip install numpy
    pip install --pre onnxruntime-genai
  3. Run the model

    import onnxruntime_genai as og
    
    model = og.Model('cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4')
    tokenizer = og.Tokenizer(model)
    tokenizer_stream = tokenizer.create_stream()
     
    # Set the max length to something sensible by default,
    # since otherwise it will be set to the entire context length
    search_options = {}
    search_options['max_length'] = 2048
    search_options['batch_size'] = 1
    
    chat_template = '<|user|>\n{input} <|end|>\n<|assistant|>'
    
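    text = input("Input: ")
    if not text:
        # Exit with a non-zero status and print the message to stderr
        raise SystemExit("Error, input cannot be empty")
    
    # Wrap the user input in the chat template
    prompt = chat_template.format(input=text)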
    
    input_tokens = tokenizer.encode(prompt)
    
    params = og.GeneratorParams(model)
    params.set_search_options(**search_options)
    generator = og.Generator(model, params)
    
    print("Output: ", end='', flush=True)
    
    try:
        generator.append_tokens(input_tokens)
        while True:
            generator.generate_next_token()
            # Check for completion before reading the token (loop order required since v0.11.0)
            if generator.is_done():
                break
            new_token = generator.get_next_tokens()[0]
            print(tokenizer_stream.decode(new_token), end='', flush=True)
    except KeyboardInterrupt:
        print("  --control+c pressed, aborting generation--")
    
    print()
    del generator
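
The prompt-building step in the sample can be factored into a small function, which makes it easy to inspect the exact string passed to tokenizer.encode. The following is a minimal sketch: the template string is taken from the sample above, while the names CHAT_TEMPLATE and build_prompt are hypothetical helpers and not part of the onnxruntime_genai API.

```python
# Template copied from the sample above; the constant name is hypothetical.
CHAT_TEMPLATE = '<|user|>\n{input} <|end|>\n<|assistant|>'


def build_prompt(text: str, template: str = CHAT_TEMPLATE) -> str:
    """Return the prompt string for a single user turn.

    Hypothetical helper, not part of onnxruntime_genai.
    Raises ValueError for empty or whitespace-only input.
    """
    if not text.strip():
        raise ValueError("input cannot be empty")
    # Only the template is interpreted by str.format, so braces in the
    # user's text are inserted literally.
    return template.format(input=text)


if __name__ == "__main__":
    # repr() makes the newlines in the template visible
    print(repr(build_prompt("What is ONNX?")))
```

The returned string can be passed to tokenizer.encode in place of the prompt variable in the sample.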

Choose the correct version of the examples

Due to the evolving nature of this project and ongoing feature additions, examples in the main branch may not always align with the latest stable release. This section outlines how to ensure compatibility between the examples and the corresponding version.

Stable version

Install the package according to the installation instructions. For example, install the Python package.

pip install onnxruntime-genai

Get the version of the package

Linux/Mac:

pip list | grep onnxruntime-genai

Windows:

pip list | findstr "onnxruntime-genai"

Check out the version of the examples that corresponds to that release.

# Clone the repo
git clone https://github.com/microsoft/onnxruntime-genai.git && cd onnxruntime-genai
# Check out the branch for the version you are using
git checkout v0.11.4
cd examples
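
The version lookup and checkout steps above can also be scripted. The sketch below reads the installed version with the standard library's importlib.metadata and prints a matching git checkout command. It assumes that release refs follow the v<version> pattern seen in the v0.11.4 example, which may not hold for pre-release or nightly builds. The function names examples_tag and installed_version are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def examples_tag(package_version: str) -> str:
    """Map a package version to a git ref name.

    Assumption: refs are named 'v<version>', as in the v0.11.4 example.
    """
    return f"v{package_version}"


def installed_version(package: str = "onnxruntime-genai") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("onnxruntime-genai is not installed")
    else:
        print(f"git checkout {examples_tag(found)}")
```

If the package was installed under a different distribution name, pass that name to installed_version.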

Nightly version (main branch)

Check out the main branch of the repo

git clone https://github.com/microsoft/onnxruntime-genai.git && cd onnxruntime-genai

Build from source, using these instructions. For example, to build the Python wheel:

python build.py

Navigate to the examples folder in the main branch.

cd examples

Breaking API changes

v0.11.0

Between v0.10.1 and v0.11.0, there is a breaking change in how the API is used, made to improve model quality during multi-turn conversations.

Previously, the decoding loop could be written as follows.

while not IsDone():
    GenerateToken()
    GetLastToken()
    PrintLastToken()

In v0.11.0, the decoding loop should be written as follows, with the completion check placed after token generation and before the token is read.

while True:
    GenerateToken()
    if IsDone():
        break
    GetLastToken()
    PrintLastToken()
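
As a rough illustration of the v0.11.0 loop order in Python, the sketch below wraps the loop in a generator function that uses the method names from the Phi-3 sample (generate_next_token, is_done, get_next_tokens). FakeGenerator is a hypothetical stand-in so the example runs without a model. Its behavior, including how it signals completion, is an assumption and does not describe the real og.Generator.

```python
from typing import Iterator, List, Optional


def stream_tokens(generator) -> Iterator[int]:
    """Yield new tokens using the v0.11.0 loop order:
    generate, check for completion, then read the token."""
    while True:
        generator.generate_next_token()
        if generator.is_done():
            break
        yield generator.get_next_tokens()[0]


class FakeGenerator:
    """Hypothetical stand-in for og.Generator, for illustration only.

    It replays a fixed token list followed by an end-of-sequence token
    and reports is_done() once that token has been generated.
    """

    def __init__(self, tokens: List[int], eos: int) -> None:
        self._pending = list(tokens) + [eos]
        self._eos = eos
        self._last: Optional[int] = None

    def generate_next_token(self) -> None:
        self._last = self._pending.pop(0)

    def is_done(self) -> bool:
        return self._last == self._eos

    def get_next_tokens(self) -> List[Optional[int]]:
        return [self._last]


if __name__ == "__main__":
    print(list(stream_tokens(FakeGenerator([11, 22, 33], eos=0))))
```

With the real library, each yielded token would be passed to tokenizer_stream.decode, as in the Phi-3 sample.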

Roadmap

See the Discussions to request new features and up-vote existing requests.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Linting

This project uses lintrunner for linting. Install the dependencies and initialize lintrunner with:

pip install -r requirements-lintrunner.txt
lintrunner init

This will install lintrunner on your system and download all the necessary dependencies to run linters locally.

To format local changes:

lintrunner -a

To format all files:

lintrunner -a --all-files

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.