llm.rb is a runtime for building AI systems that integrate directly with your application. It is not just an API wrapper. It provides a unified execution model for providers, tools, MCP servers, streaming, schemas, files, and state.
It is built for engineers who want control over how these systems run. llm.rb stays close to Ruby, runs on the standard library by default, loads optional pieces only when needed, and remains easy to extend. It also works well in Rails or ActiveRecord applications, where a small wrapper around context persistence is enough to save and restore long-lived conversation state across requests, jobs, or retries.
Most LLM libraries stop at request/response APIs. Building real systems means stitching together streaming, tools, state, persistence, and external services by hand. llm.rb provides a single execution model for all of these, so they compose naturally instead of becoming separate subsystems.
`LLM::Context` is the execution boundary in llm.rb.
It holds:
- message history
- tool state
- schemas
- streaming configuration
- usage and cost tracking
Instead of switching abstractions for each feature, everything builds on the same context object.
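As a plain-Ruby analogy (illustrative only; the class and method names below are hypothetical, not llm.rb's internals or API), the idea is a single object that accumulates every piece of conversational state:

```ruby
# Illustrative sketch: one state object that accumulates message history
# and usage, the way LLM::Context centralizes state in llm.rb.
class ConversationState
  attr_reader :messages, :usage

  def initialize
    @messages = []                       # full message history
    @usage    = { input: 0, output: 0 }  # token counters
  end

  def record(role, content, tokens: 0)
    @messages << { role: role, content: content }
    key = role == :assistant ? :output : :input
    @usage[key] += tokens
  end
end

state = ConversationState.new
state.record(:user, "Hello", tokens: 2)
state.record(:assistant, "Hi there!", tokens: 3)
state.usage # accumulated across both turns
```

Because every turn flows through the same object, history, usage, and any other state stay consistent with each other by construction.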
- A system layer, not just an API wrapper
  Put providers, tools, MCP servers, and application APIs behind one runtime model instead of stitching them together by hand.
- Contexts are central
  Keep history, tools, schema, usage, persistence, and execution state in one place instead of spreading them across your app.
- Contexts can be serialized
  Save and restore live state for jobs, databases, retries, or long-running workflows.
- Streaming and tool execution work together
  Start tool work while output is still streaming so you can hide latency instead of waiting for turns to finish.
- Requests can be interrupted cleanly
  Stop in-flight provider work through the same runtime instead of treating cancellation as a separate concern. `LLM::Context#cancel!` is inspired by Go's context cancellation model.
- Concurrency is a first-class feature
  Use threads, fibers, or async tasks without rewriting your tool layer.
- Advanced workloads are built in, not bolted on
  Streaming, concurrent tool execution, persistence, tracing, and MCP support all fit the same runtime model.
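The cancellation point above follows Go's cooperative-cancellation pattern. Here is a plain-Ruby sketch of that general pattern (illustrative only; this is not llm.rb's implementation of `LLM::Context#cancel!`):

```ruby
# Go-style cooperative cancellation: the worker polls a shared token and
# stops cleanly when another thread flips it. Names are illustrative.
class CancelToken
  def initialize
    @mutex = Mutex.new
    @cancelled = false
  end

  def cancel!
    @mutex.synchronize { @cancelled = true }
  end

  def cancelled?
    @mutex.synchronize { @cancelled }
  end
end

token = CancelToken.new

worker = Thread.new do
  steps = 0
  loop do
    steps += 1                  # stand-in for a unit of streaming/tool work
    break if token.cancelled?   # cooperative check between units of work
    sleep 0.01
  end
  steps
end

sleep 0.05
token.cancel!
worker.value # number of steps completed before cancellation took effect
```

The key property is that cancellation is observed at well-defined points rather than killing the thread mid-operation, so the worker can release resources and leave state consistent.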
- MCP is built in
  Connect to MCP servers over stdio or HTTP without bolting on a separate integration stack.
- Provider support is broad
  Work with OpenAI, OpenAI-compatible endpoints, Anthropic, Google, DeepSeek, Z.ai, xAI, llama.cpp, and Ollama through the same runtime.
- Tools are explicit
  Run local tools, provider-native tools, and MCP tools through the same path with fewer special cases.
- Providers are normalized, not flattened
  Share one API surface across providers without losing access to provider-specific capabilities where they matter.
- Responses keep a uniform shape
  Provider calls return `LLM::Response` objects as a common base shape, then extend them with endpoint- or provider-specific behavior when needed.
- Low-level access is still there
  Normalized responses still keep the raw `Net::HTTPResponse` available when you need headers, status, or other HTTP details.
- Local model metadata is included
  Model capabilities, pricing, and limits are available locally without extra API calls.
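The local-metadata point can be illustrated with a toy cost estimator. The model name and prices below are invented for the example, not values from llm.rb's registry:

```ruby
# Toy cost estimator: per-million-token prices kept locally, so cost can
# be computed without an extra API call. All figures here are hypothetical.
PRICING = {
  "example-model" => { input: 1.50, output: 6.00 } # USD per 1M tokens
}.freeze

def estimate_cost(model, input_tokens, output_tokens)
  p = PRICING.fetch(model)
  (input_tokens * p[:input] + output_tokens * p[:output]) / 1_000_000.0
end

estimate_cost("example-model", 1_000, 500)
```

Keeping the table local means cost tracking adds no latency and works offline; the trade-off is that the table must be refreshed when providers change their pricing.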
- Runs on the stdlib
  Start with Ruby's standard library and add extra dependencies only when you need them.
- It is highly pluggable
  Add tools, swap providers, change JSON backends, plug in tracing, or layer internal APIs and MCP servers into the same execution path.
- It scales from scripts to long-lived systems
  The same primitives work for one-off scripts, background jobs, and more demanding application workloads with streaming, persistence, and tracing.
- Thread boundaries are clear
  Providers are shareable. Contexts are stateful and should stay thread-local.
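The threading rule above (share the provider, keep mutable conversation state per thread) can be sketched with plain Ruby objects; the names here are illustrative, not llm.rb's API:

```ruby
# One shared, frozen provider-like object; one mutable, thread-local
# history per thread. This is the shape of the rule, not llm.rb code.
SharedProvider = Struct.new(:name)
provider = SharedProvider.new("openai").freeze # stateless, safe to share

results = 4.times.map do |i|
  Thread.new do
    local_history = [] # per-thread mutable state, never shared
    local_history << "#{provider.name} handled request #{i}"
    local_history.last
  end
end.map(&:value)

results.sort
```

Freezing the shared object makes accidental cross-thread mutation raise immediately, while each thread's history needs no locking because nothing else can reach it.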
- Chat & Contexts — stateless and stateful interactions with persistence
- Context Serialization — save and restore state across processes or time
- Streaming — visible output, reasoning output, tool-call events
- Request Interruption — stop in-flight provider work cleanly
- Tool Calling — class-based tools and closure-based functions
- Run Tools While Streaming — overlap model output with tool latency
- Concurrent Execution — threads, async tasks, and fibers
- Agents — reusable assistants with tool auto-execution
- Structured Outputs — JSON Schema-based responses
- Responses API — stateful response workflows where providers support them
- MCP Support — stdio and HTTP MCP clients with prompt and tool support
- Multimodal Inputs — text, images, audio, documents, URLs
- Audio — speech generation, transcription, translation
- Images — generation and editing
- Files API — upload and reference files in prompts
- Embeddings — vector generation for search and RAG
- Vector Stores — retrieval workflows
- Cost Tracking — local cost estimation without extra API calls
- Observability — tracing, logging, telemetry
- Model Registry — local metadata for capabilities, limits, pricing
- Persistent HTTP — optional connection pooling for providers and MCP
```
gem install llm.rb
```

```ruby
require "llm"

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: $stdout)

loop do
  print "> "
  ctx.talk(STDIN.gets || break)
  puts
end
```

- deepdive is the examples guide.
- _examples/relay shows a real application built on top of llm.rb.
- doc site has the API docs.

