<10ms — Embedding Inference + Retrieval
150K+ — Package Installs
100% — Offline Indexing + Querying
Trusted by 1000+ teams
How It Works
01
Push your data source (docs, knowledge bases, or live data) via our SDK or portal.
02
We index, sync, and distribute a compact index wherever your agent runs: browser, edge, device, or cloud.
03
Your agent retrieves context locally, in under 10ms. No hops. No lag. No infra to manage.
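The steps above describe local semantic retrieval: a compact index of embeddings ships with the agent, and queries are scored against it in-process rather than over the network. A minimal illustrative sketch of that idea (not the Moss SDK; the toy embeddings and helper names here are assumptions for demonstration):

```python
# Illustrative sketch only, not the Moss SDK: local semantic retrieval
# over a small in-memory index using cosine similarity.
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "index": precomputed document embeddings distributed
# alongside the agent (real embeddings are far higher-dimensional).
index = {
    "How do I track my order?":    [0.9, 0.1, 0.0],
    "What is your refund policy?": [0.1, 0.8, 0.2],
}

def retrieve(query_embedding, top_k=1):
    # Rank documents by similarity to the query, entirely in-process:
    # no network hop, so latency is bounded by local compute.
    scored = sorted(index.items(),
                    key=lambda kv: cosine(query_embedding, kv[1]),
                    reverse=True)
    return [text for text, _ in scored[:top_k]]

print(retrieve([0.85, 0.15, 0.0]))  # best match: the order-tracking doc
```

Because the scoring loop touches only local memory, retrieval latency stays in the microsecond-to-millisecond range for compact indexes, which is what makes sub-10ms end-to-end retrieval plausible.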
Integration
Install Moss and start querying in just a few lines of code
from inferedge_moss import MossClient
client = MossClient(PROJECT_ID, PROJECT_KEY)
docs = [{"text": "How do I track my order?"}]
await client.add_docs("my-index", docs)

Benchmarks
Benchmarked on 100k documents. Latency includes embedding inference and end-to-end cloud roundtrip. View benchmark script
Use Cases
If you're building voice AI, copilots, or multimodal apps where retrieval is on your critical path, Moss is built for you.
Sub-10ms context retrieval for real-time conversation. Your agent recalls, reasons, and responds without the pause.
FAQ
Everything you need to know about Moss and real-time semantic search for AI agents.