This document provides a high-level introduction to Cog, an open-source tool for packaging machine learning models into production-ready Docker containers. It covers Cog's purpose, architecture, and core workflow from model definition to deployment.
For detailed information about specific subsystems, see the related wiki pages.
Cog is a tool that packages machine learning models into standard Docker containers with an HTTP API for running predictions. It was created by Andreas Jansson and Ben Firshman to solve the complexity of deploying ML models to production.
Core Value Proposition:
- Define the environment declaratively in cog.yaml instead of writing complex Dockerfiles
- Implement the model behind a standard Python BasePredictor class
- Serve predictions over an automatically generated HTTP API (the coglet runtime)

Primary Use Case: Researchers and ML engineers can ship models to production without deep Docker or deployment expertise. Cog bridges the gap between model code and production infrastructure.
Sources: README.md1-117 docs/llms.txt1-18
Cog uses a three-layer architecture that separates concerns between build-time tooling, user-facing interfaces, and serving infrastructure.
Layer Responsibilities:
CLI Layer (Go): User-facing commands (cog build, cog predict, cog push) orchestrate the build and runtime operations. Entry point is pkg/cli/root.go:NewRootCommand().
Build System (Go): Transforms cog.yaml into a Docker image by parsing configuration (pkg/config), generating Dockerfiles (pkg/dockerfile), and handling image builds (pkg/image).
Python SDK Layer: Provides the user interface for defining models. Users subclass BasePredictor and implement setup() and predict() methods. The cog.server.http module acts as a shim to coglet.
Runtime Layer (Rust): The coglet binary serves HTTP requests at /predictions and other endpoints, manages prediction lifecycle, and handles metrics collection.
Sources: pkg/cli/root.go14-60 README.md19-53 docs/python.md42-63
The typical Cog workflow progresses from model definition to containerized deployment:
Workflow Steps:
1. Write cog.yaml specifying the Python version, packages, system dependencies, and GPU requirements
2. Write predict.py with a Predictor class that inherits from BasePredictor
3. Run cog build, which:
   - parses cog.yaml via pkg/config/Config
   - generates a Dockerfile with pkg/dockerfile/StandardGenerator
   - resolves cog and coglet wheel versions via pkg/wheels
   - runs docker build
4. Test locally with cog predict -i <inputs> or by running the image with docker run
5. Push the image with cog push to a registry and deploy
6. In production, the container runs python -m cog.server.http, which starts the coglet Rust server

Sources: README.md54-78 docs/getting-started.md44-172 pkg/cli/root.go45-56
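The steps above map onto a command sequence like the following (the image tag and input name are illustrative, not taken from this document):

```shell
cog init                           # generate starter cog.yaml and predict.py
cog build -t my-model              # parse config, generate Dockerfile, run docker build
cog predict -i prompt="hello"      # test a prediction locally
cog push r8.im/username/my-model   # push to a registry and deploy
```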
The command-line interface is implemented in Go and provides these commands:
| Command | Module | Purpose |
|---|---|---|
| `cog build` | pkg/cli/build.go | Build Docker image from cog.yaml |
| `cog predict` | pkg/cli/predict.go | Run a prediction on a model |
| `cog push` | pkg/cli/push.go | Build and push image to registry |
| `cog run` | pkg/cli/run.go | Run arbitrary commands in Docker environment |
| `cog serve` | pkg/cli/serve.go | Start HTTP prediction server |
| `cog init` | pkg/cli/init.go | Generate starter cog.yaml and predict.py |
| `cog login` | pkg/cli/login.go | Authenticate with container registry |
| `cog train` | pkg/cli/train.go | Run model training (fine-tuning API) |
The CLI uses the Cobra framework and is built into a single binary. Global state, such as the version and debug flags, lives in pkg/global/global.go.
Sources: pkg/cli/root.go45-57 pkg/global/global.go1-30
Users interact with Cog through the Python SDK, defining their model's prediction interface:
Core Classes:
- `BasePredictor`: Base class for model implementation with setup() and predict() methods
- `Input()`: Function to define input parameters with validation and documentation
- `Path`: Type for file inputs/outputs (paths on disk)
- `File`: Type for file-like objects
- `Secret`: Type for sensitive string inputs
- `BaseModel`: Base class for custom Output objects

Example Usage:
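A sketch of how these classes fit together; the model, the `load_model` helper, and the parameter names are illustrative, not part of Cog's API:

```python
from cog import BasePredictor, Input, Path

class Predictor(BasePredictor):
    def setup(self):
        # Runs once at container start: load weights into memory.
        # load_model is a hypothetical stand-in for your framework's loader.
        self.model = load_model("./weights")

    def predict(
        self,
        image: Path = Input(description="Input image"),
        scale: float = Input(description="Upscaling factor", ge=1, le=4, default=2),
    ) -> Path:
        # Runs per request: Input() provides validation (ge/le bounds)
        # and documentation for the generated HTTP API.
        output_path = self.model(image, scale)
        return Path(output_path)
```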
The SDK is installed as the cog Python package in the Docker container. It integrates with coglet through the cog.server.http module.
Sources: docs/python.md42-100 README.md34-53
The coglet binary is a high-performance HTTP server written in Rust. It serves the /predictions, /health-check, and /openapi.json endpoints, manages the prediction lifecycle, and collects metrics.

When a Docker image starts, it runs python -m cog.server.http, which launches coglet. The coglet server then imports the user's Predictor class and routes HTTP requests to the predict() method.
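Once the container is running, the server can be exercised with plain HTTP. The port and input name below are illustrative; the `{"input": {...}}` body shape follows Cog's HTTP API:

```shell
# Poll until setup() has completed and the server reports READY
curl http://localhost:5000/health-check

# Run a prediction; inputs go under the "input" key
curl -X POST http://localhost:5000/predictions \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "hello"}}'
```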
Sources: docs/http.md1-86 README.md15-16 docs/deploy.md1-85
Cog integrates with:

- Container registries: r8.im (Replicate), Docker Hub, or any OCI-compliant registry
- PyPI: the source of the cog and coglet Python wheels during build
- GitHub Actions: workflows in .github/workflows/ for releases and testing

Sources: go.mod1-43 docs/deploy.md1-48
The cog.yaml file defines the Docker environment and prediction interface:
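A representative cog.yaml; the package names and versions are illustrative:

```yaml
build:
  gpu: true
  python_version: "3.12"
  python_packages:
    - "torch==2.3.1"
  system_packages:
    - "libgl1"
predict: "predict.py:Predictor"
```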
The configuration is parsed by pkg/config/Config and validated during build.
Sources: docs/yaml.md1-231 README.md21-31
The predict.py file implements the model interface:
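A minimal sketch of such a file; the greeting logic stands in for real model code. With this file, the cog.yaml `predict` entry would be "predict.py:Predictor":

```python
from cog import BasePredictor, Input

class Predictor(BasePredictor):
    def setup(self):
        # Runs once when the container starts; load weights here.
        self.prefix = "hello"  # stand-in for real weight loading

    def predict(self, text: str = Input(description="Text to greet")) -> str:
        # Runs for each prediction request.
        return f"{self.prefix} {text}"
```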
The predict field in cog.yaml must reference this class in the format "file.py:ClassName".
Sources: docs/python.md42-100 README.md34-53
Key Build Steps:
1. pkg/config/Config.Parse() reads cog.yaml and validates settings
2. A base image is selected: cog-base images (fastest), NVIDIA CUDA images, or Python slim images
3. pkg/dockerfile/StandardGenerator creates a multi-stage Dockerfile
4. pkg/wheels/WheelConfig determines where to get the cog and coglet packages (PyPI, local dist/, or a URL via environment variables)
5. The image is built with docker build using BuildKit

Build artifacts are stored in .cog/ (defined as global.CogBuildArtifactsFolder).
Sources: pkg/cli/build.go pkg/dockerfile pkg/image pkg/config pkg/wheels
Execution Flow:
1. The container runs python -m cog.server.http
2. The coglet Rust binary starts and imports predict.py
3. Predictor.setup() runs once to load weights
4. coglet exposes the /health-check endpoint, returning {"status": "READY"}
5. A client POSTs to /predictions with inputs
6. coglet creates a prediction scope for metrics and lifecycle management
7. coglet calls Predictor.predict() with the inputs
8. Outputs are streamed as predict() yields values
9. Custom metrics can be recorded with self.record_metric()

Sources: docs/http.md1-220 docs/deploy.md14-85 docs/python.md249-303
Cog uses lockstep versioning where all packages share the same version number:
- crates/Cargo.toml contains the canonical version (e.g., 0.17.1)
- cog CLI binary (Go): version embedded at build time
- cog Python SDK: published to PyPI
- coglet binary (Rust): published to PyPI as wheels
- coglet crate: published on crates.io

Release Process:
1. The version is bumped in crates/Cargo.toml
2. Pushing a tag such as v0.17.1 triggers .github/workflows/release-build.yaml
3. Artifacts are published by .github/workflows/release-publish.yaml

The cog and coglet wheels can be overridden at build time using environment variables:
- COG_SDK_WHEEL: path/URL to a custom cog wheel
- COGLET_WHEEL: path/URL to a custom coglet wheel

Sources: pkg/global/global.go10-12 docs/yaml.md151-172
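For example, a build against locally built wheels might look like the following; the wheel paths and image tag are illustrative:

```shell
# Use locally built wheels from dist/ instead of the published packages
COG_SDK_WHEEL=./dist/cog-0.17.1-py3-none-any.whl \
COGLET_WHEEL=./dist/coglet-0.17.1-py3-none-any.whl \
cog build -t my-model
```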
| Component | Language | Key Libraries | Purpose |
|---|---|---|---|
| CLI | Go 1.26 | cobra, docker/docker, google/go-containerregistry | User commands, Docker orchestration |
| Python SDK | Python 3.10-3.13 | pydantic | Model interface, input validation |
| Runtime Server | Rust | axum, tokio | High-performance HTTP serving |
| Build System | Go | moby/buildkit | Dockerfile generation, image building |
Sources: go.mod1-43 README.md1-117
For detailed information on specific subsystems, see the related wiki pages.