<10ms — Embedding Inference + Retrieval
150K+ — Package Installs
100% — Offline Indexing + Querying
Trusted by 1000+ teams
How It Works
01
Push your data source (docs, knowledge bases, or live data) via our SDK or portal.
02
We index, sync, and distribute a compact index wherever your agent runs: browser, edge, device, or cloud.
03
Your agent retrieves context locally, in under 10ms. No hops. No lag. No infra to manage.
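The steps above describe local semantic retrieval: a compact index of embeddings ships with the agent, and queries are scored against it in-process rather than over the network. A minimal illustrative sketch of that idea (not the Moss SDK; the toy embeddings and helper names here are assumptions for demonstration):

```python
# Illustrative sketch only, not the Moss SDK: local semantic retrieval
# over a small in-memory index using cosine similarity.
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "index": precomputed document embeddings distributed
# alongside the agent (real embeddings are far higher-dimensional).
index = {
    "How do I track my order?":    [0.9, 0.1, 0.0],
    "What is your refund policy?": [0.1, 0.8, 0.2],
}

def retrieve(query_embedding, top_k=1):
    # Rank documents by similarity to the query, entirely in-process:
    # no network hop, so latency is bounded by local compute.
    scored = sorted(index.items(),
                    key=lambda kv: cosine(query_embedding, kv[1]),
                    reverse=True)
    return [text for text, _ in scored[:top_k]]

print(retrieve([0.85, 0.15, 0.0]))  # best match: the order-tracking doc
```

Because the scoring loop touches only local memory, retrieval latency stays in the microsecond-to-millisecond range for compact indexes, which is what makes sub-10ms end-to-end retrieval plausible.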
Integration
Install Moss and start querying in just a few lines of code
from inferedge_moss import MossClient
client = MossClient(PROJECT_ID, PROJECT_KEY)
docs = [{"text": "How do I track my order?"}]
await client.add_docs("my-index", docs)

Benchmarks
Benchmarked on 100k documents. Latency includes embedding inference and end-to-end cloud roundtrip. View benchmark script
Use Cases
If you're building voice AI, copilots, or multimodal apps where retrieval is on your critical path, Moss is built for you.
Sub-10ms context retrieval for real-time conversation. Your agent recalls, reasons, and responds without the pause.
FAQ
Everything you need to know about Moss and real-time semantic search for AI agents.