Build once and run across tasks.

Run AI in One Call

Chat, image, speech, and multimodal models, all through a single API.



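```javascript
// Requires Node.js 18+ (global fetch) and an ES module context for top-level await.
const response = await fetch('https://relay.opengpu.network/v2/ollama/api/chat', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.RELAY_API_KEY, // your Relay API key
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-oss:20b',
    messages: [
      {
        role: 'user',
        content: 'Break down the pros and cons of decentralized GPU compute.',
      },
    ],
    stream: false, // return one JSON object instead of a stream of chunks
    think: 'low', // reasoning effort; gpt-oss models accept 'low', 'medium' or 'high'
  }),
});

// Surface HTTP errors (bad key, unknown model, etc.) instead of failing on a missing field.
if (!response.ok) {
  throw new Error(`Relay request failed: ${response.status} ${await response.text()}`);
}

const data = await response.json();
console.info(data.message.content);
```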
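The request above sets `stream: false` and waits for the complete reply. The endpoint path mirrors Ollama's `/api/chat`, where `stream: true` returns newline-delimited JSON, one object per chunk, with `done: true` on the last one. The sketch below assumes Relay passes that format through unchanged; `readNdjson` and `streamChat` are hypothetical helper names, not part of a Relay SDK. It targets Node.js 18+ (global `fetch`, `ReadableStream`, `TextDecoder`).

```javascript
// Sketch: stream a chat reply from Relay token by token (Node.js 18+).
// ASSUMPTION: with stream: true, Relay's Ollama-compatible endpoint returns
// Ollama-style newline-delimited JSON: one object per line, each carrying
// message.content, with done: true on the final object.
const RELAY_CHAT_URL = 'https://relay.opengpu.network/v2/ollama/api/chat';

// Hypothetical helper: turn a byte stream of NDJSON into parsed objects,
// buffering partial lines that are split across network chunks.
async function* readNdjson(body) {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    let newline;
    while ((newline = buffer.indexOf('\n')) !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line) yield JSON.parse(line);
    }
  }
  buffer += decoder.decode(); // flush any bytes still held by the decoder
  if (buffer.trim()) yield JSON.parse(buffer); // last line may lack a trailing newline
}

// Hypothetical helper: print the reply as it arrives and return the full text.
async function streamChat(prompt, model = 'gpt-oss:20b') {
  const response = await fetch(RELAY_CHAT_URL, {
    method: 'POST',
    headers: {
      'X-API-Key': process.env.RELAY_API_KEY,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model,
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    }),
  });
  if (!response.ok) {
    throw new Error(`Relay request failed: ${response.status}`);
  }
  let text = '';
  for await (const chunk of readNdjson(response.body)) {
    const piece = chunk.message?.content ?? '';
    process.stdout.write(piece);
    text += piece;
    if (chunk.done) break;
  }
  return text;
}
```

Call it from an async function, for example `await streamChat('Break down the pros and cons of decentralized GPU compute.')`. Buffering by line matters because a network chunk can end in the middle of a JSON object.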

Relay runs on two compute paths.

Dual Engine Infrastructure

Switch modes anytime, or let Relay decide. Nothing to master or maintain.

Direct Mode

Low-latency inference on trusted hosts, built for speed and reliability.

OpenGPU Mode

Decentralized network where GPU providers and clients set prices and availability in real time.
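Relay's own routing policy is not described on this page. To make the cost-versus-latency trade-off between the two modes concrete, here is a hypothetical client-side sketch: given a list of offers, pick the cheapest one that fits a latency budget, and fall back to the fastest offer if none does. The `Offer` shape, provider names, prices, and latencies are all invented for illustration; this is not Relay's algorithm.

```javascript
// HYPOTHETICAL illustration of a cost/latency trade-off between Direct and
// OpenGPU offers. Field names and numbers are invented; Relay's real routing
// logic is not documented here.

/**
 * @typedef {{ mode: 'direct' | 'opengpu', provider: string,
 *              price: number, latencyMs: number }} Offer
 */

/**
 * Pick the cheapest offer whose expected latency fits the budget.
 * If nothing fits, fall back to the fastest offer so the request still runs.
 * @param {Offer[]} offers
 * @param {number} latencyBudgetMs
 * @returns {Offer}
 */
function chooseOffer(offers, latencyBudgetMs) {
  if (offers.length === 0) throw new Error('no offers available');
  const withinBudget = offers.filter((o) => o.latencyMs <= latencyBudgetMs);
  if (withinBudget.length > 0) {
    return withinBudget.reduce((best, o) => (o.price < best.price ? o : best));
  }
  return offers.reduce((best, o) => (o.latencyMs < best.latencyMs ? o : best));
}

// Invented example offers (price in arbitrary units per request).
const offers = [
  { mode: 'direct', provider: 'trusted-host-a', price: 1.0, latencyMs: 300 },
  { mode: 'opengpu', provider: 'node-17', price: 0.4, latencyMs: 900 },
  { mode: 'opengpu', provider: 'node-42', price: 0.6, latencyMs: 650 },
];

console.log(chooseOffer(offers, 500).provider); // "trusted-host-a": only Direct fits 500 ms
console.log(chooseOffer(offers, 1000).provider); // "node-17": cheapest within 1000 ms
console.log(chooseOffer(offers, 100).provider); // "trusted-host-a": nothing fits, fastest wins
```

Filtering on the latency budget first and then minimizing price is only one simple policy; a production router might also weigh provider reliability, queue depth, or price changes over time.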


Supported models

Available Models

Video Models

Generate high-quality videos from text prompts. From cinematic scenes to creative animations, powered by the latest video generation models.

| Model | Type |
| --- | --- |
| Sora 2 | text-to-video |
| Sora 2 Pro | text-to-video |
| Kling 2.6 | text-to-video |
| Wan 2.5 | text-to-video |
| Wan 2.6 | text-to-video |

Image Models

Create stunning images from text descriptions. High-resolution outputs with fine-grained control over style, composition, and detail.

| Model | Type |
| --- | --- |
| FLUX 2 Pro | text-to-image |
| GPT Image 1.5 | text-to-image |
| Gemini 3 Pro | text-to-image |
| Qwen Image | text-to-image |

Text Models

Large language models for text generation, summarization, code, and more. Fast inference with high throughput.

| Model | Type |
| --- | --- |
| GPT-OSS 120B | text-to-text |

Run your own model or source

AI Service, Your Terms

Publish tasks directly onto the OpenGPU network, or host through our managed layer. Relay adapts to how you build, from experimentation to production scale.

Direct mode

Orchestrated, low-latency inference.

OpenGPU mode

True decentralized compute.

Dashboard view

Just monitor. No clutter.

Launch Your AI Compute Into a New Orbit

Partner with Relay to route workloads smarter, balance cost and latency, and keep your business light-years ahead of the competition.
