The SQLite for AI memory.
One file. Full RAG. Zero infrastructure.
Quick Start • Performance • How It Works • Install
```swift
import Wax

// Create a memory file
let brain = try await MemoryOrchestrator(
    at: URL(fileURLWithPath: "brain.mv2s")
)

// Remember something
try await brain.remember(
    "User prefers dark mode and gets headaches from bright screens",
    metadata: ["source": "onboarding"]
)

// Recall with RAG
let context = try await brain.recall(query: "user preferences")
// → "User prefers dark mode and gets headaches from bright screens"
//   + relevant context, ranked and token-budgeted
```

That's it. No Docker. No vector DB. No network calls.
You wanted to add memory to your AI app.
Three hours later you're still configuring Docker Compose for a vector database that crashes if you look at it wrong, sends your data to who-knows-where, and needs a DevOps team to keep it running.
Wax replaces your entire RAG stack with a file format.
```
Traditional RAG Stack:       Wax:
┌───────────────┐            ┌───────────────┐
│ Your App      │            │ Your App      │
├───────────────┤            ├───────────────┤
│ ChromaDB      │            │               │
│ PostgreSQL    │    vs.     │   brain.      │
│ Redis         │            │   mv2s        │
│ Elasticsearch │            │               │
│ Docker        │            │               │
└───────────────┘            └───────────────┘
   ~5 services                    1 file
```
| | |
|---|---|
| ⚡ Fast | 0.84ms vector search @ 10K docs (Metal GPU) |
| 🛡️ Durable | Kill -9 safe, power-loss safe, tested |
| 🎯 Deterministic | Same query = same context, every time |
| 📦 Portable | One .mv2s file → move it, back it up, ship it |
| 🔒 Private | 100% on-device. Zero network calls. |
Apple Silicon (M1 Pro)

```
Vector Search Latency (10K × 384-dim)
──────────────────────────────────────────────────
Wax Metal (warm)  █                        0.84ms
Wax Metal (cold)  ████                      9.2ms
Wax CPU           ██████████████             105ms
SQLite FTS5       ████████████████████       150ms
──────────────────────────────────────────────────
```

Cold Open → First Query: 17ms
Hybrid Search @ 10K docs: 105ms
These are reproducible XCTest benchmark baselines captured from the current Wax benchmark harness.
| Workload | Time | Throughput |
|---|---|---|
| smoke (200 docs) | 0.103s | ~1941.7 docs/s |
| standard (1000 docs) | 0.309s | ~3236.2 docs/s |
| stress (5000 docs) | 2.864s | ~1745.8 docs/s |
| 10k | 7.756s | ~1289.3 docs/s |
| Workload | Time | Throughput |
|---|---|---|
| warm CPU smoke | 0.0015s | ~666.7 ops/s |
| warm CPU standard | 0.0033s | ~303.0 ops/s |
| warm CPU stress | 0.0072s | ~138.9 ops/s |
| 10k CPU hybrid iteration | 0.103s | ~9.7 ops/s |
| Workload | Time |
|---|---|
| smoke | 0.103s |
| standard | 0.101s |
Stress recall is currently harness-blocked (signal 11) and treated as a known benchmark issue.
| Mode | Time |
|---|---|
| fast mode | 0.102s |
| dense cached | 0.102s |
For benchmark commands, profiling traces, and methodology, see Tasks/hot-path-specialization-investigation.md.
No, that's not a typo. GPU vector search really is sub-millisecond.
Wax now includes a WAL/storage health track focused on commit latency tails, long-run file growth, and recovery behavior:
- No-op index compaction guards to avoid unnecessary index rewrites.
- Single-pass WAL replay with guarded replay snapshot fast path.
- Proactive WAL-pressure commits for targeted workloads (guarded rollout).
- Scheduled `rewriteLiveSet` maintenance with dead-payload thresholds, validation, and rollback.
- Repeated unchanged index compaction growth improved from +61,768,464 bytes over 8 runs (~7.72 MB/run) to bounded drift (test-gated).
- Commit latency improved in most matrix workloads in recent runs (examples: `medium_hybrid` p95 -13.9%, `large_text_10k` p95 -8.0%, `sustained_write_text` p95 -5.7%).
- Reopen/recovery p95 is generally flat-to-improved across the matrix.
- `sustained_write_hybrid` remains workload-sensitive, so proactive/scheduled maintenance stays guarded by default.
- Proactive pressure commits are tuned for targeted workloads and validated with percentile guardrails.
- Replay snapshot open-path optimization is additive and guarded.
- Scheduled live-set rewrite is configurable and runs deferred from the `flush()` hot path.
- Rewrite candidates are automatically validated and rolled back on verification failure.
```swift
import Wax

var config = OrchestratorConfig.default
config.liveSetRewriteSchedule = LiveSetRewriteSchedule(
    enabled: true,
    checkEveryFlushes: 32,
    minDeadPayloadBytes: 64 * 1024 * 1024,
    minDeadPayloadFraction: 0.25,
    minimumCompactionGainBytes: 0,
    minimumIdleMs: 15_000,
    minIntervalMs: 5 * 60_000,
    verifyDeep: false
)
```

```bash
WAX_BENCHMARK_WAL_COMPACTION=1 \
WAX_BENCHMARK_WAL_OUTPUT=/tmp/wal-matrix.json \
swift test --filter WALCompactionBenchmarks.testWALCompactionWorkloadMatrix
```

```bash
WAX_BENCHMARK_WAL_GUARDRAILS=1 \
swift test --filter WALCompactionBenchmarks.testProactivePressureCommitGuardrails
```

```bash
WAX_BENCHMARK_WAL_REOPEN_GUARDRAILS=1 \
swift test --filter WALCompactionBenchmarks.testReplayStateSnapshotGuardrails
```

See Tasks/wal-compaction-investigation.md and Tasks/wal-compaction-baseline.json for methodology and full baseline artifacts.
```swift
.package(url: "https://github.com/christopherkarani/Wax.git", from: "0.1.6")
```

📝 Text Memory: Documents, notes, conversations
```swift
import Wax

let orchestrator = try await MemoryOrchestrator(at: storeURL)

// Ingest
try await orchestrator.remember(documentText, metadata: ["source": "report.pdf"])

// Recall
let context = try await orchestrator.recall(query: "key findings")
for item in context.items {
    print("[\(item.kind)] \(item.text)")
}
```

📸 Photo Memory: Photo library with OCR + CLIP embeddings
```swift
import Wax

let photoRAG = try await PhotoRAGOrchestrator(
    storeURL: storeURL,
    config: .default,
    embedder: MyCLIPEmbedder() // Your CoreML model
)

// Index local photos (offline-only)
try await photoRAG.syncLibrary(scope: .fullLibrary)

// Search
let ctx = try await photoRAG.recall(.init(text: "Costco receipt"))
```

🎬 Video Memory: Video segments with transcripts
```swift
import Wax

let videoRAG = try await VideoRAGOrchestrator(
    storeURL: storeURL,
    config: .default,
    embedder: MyEmbedder(),
    transcriptProvider: MyTranscriber()
)

// Ingest
try await videoRAG.ingest(files: [videoFile])

// Search by content or transcript
let ctx = try await videoRAG.recall(.init(text: "project timeline discussion"))
```

Wax packs everything into a single .mv2s file:
- ✅ Your raw documents
- ✅ Embeddings (any dimension, any provider)
- ✅ BM25 full-text search index (FTS5)
- ✅ HNSW vector index (USearch)
- ✅ Write-ahead log for crash recovery
- ✅ Metadata & entity graph
The file format is:
- Append-only → Fast writes, no fragmentation
- Checksum-verified → Every byte validated
- Dual-header → Atomic updates, never corrupt
- Self-contained → No external dependencies
```
┌───────────────────────────────────────────┐
│  Header Page A (4KB)                      │
│  Header Page B (4KB)  ← atomic switch     │
├───────────────────────────────────────────┤
│  WAL Ring Buffer                          │
│  (crash recovery log)                     │
├───────────────────────────────────────────┤
│  Document Payloads (compressed)           │
│  Embeddings                               │
├───────────────────────────────────────────┤
│  TOC (Table of Contents)                  │
│  Footer + Checksum                        │
└───────────────────────────────────────────┘
```
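Because the store is one self-contained, checksummed file, durability is easy to exercise end to end: write with one orchestrator, reopen the same path with a fresh one, and recall. The sketch below uses only the APIs shown above; whether an explicit flush is needed before reopening depends on the orchestrator's durability settings, so treat the write-then-reopen step as an assumption.

```swift
import Foundation
import Wax

let url = URL(fileURLWithPath: "brain.mv2s")

// Write with one orchestrator instance...
do {
    let brain = try await MemoryOrchestrator(at: url)
    try await brain.remember(
        "Quarterly report filed on March 3rd",
        metadata: ["source": "notes"]
    )
}

// ...then reopen the same file with a fresh instance and recall.
let reopened = try await MemoryOrchestrator(at: url)
let context = try await reopened.recall(query: "when was the quarterly report filed")
for item in context.items {
    print("[\(item.kind)] \(item.text)")
}
```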
| Feature | Wax | Chroma | Core Data + FAISS | Pinecone |
|---|---|---|---|---|
| Single file | ✅ | ❌ | ❌ | ❌ |
| Works offline | ✅ | ✅ | ✅ | ❌ |
| Crash-safe | ✅ | ❌ | N/A | |
| GPU vector search | ✅ | ❌ | ❌ | ❌ |
| No server required | ✅ | ❌ | ✅ | ❌ |
| Swift-native | ✅ | ❌ | ❌ | ❌ |
| Deterministic RAG | ✅ | ❌ | ❌ | ❌ |
🧠 Query-Adaptive Hybrid Search
Wax doesn't just do vector search. It runs multiple lanes in parallel (BM25, vector, temporal, structured evidence) and fuses results based on query type.
"When was my last dentist appointment?" โ boosts temporal + structured
"Explain quantum computing" โ boosts vector + BM25
🎭 Tiered Memory Compression (Surrogates)
Not all context is equal. Wax generates hierarchical summaries:
- `full` → Complete document (for deep dives)
- `gist` → Key paragraphs (for balanced recall)
- `micro` → One-liner (for quick context)
At query time, it picks the right tier based on query signals and remaining token budget.
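As a purely conceptual illustration (not Wax's internal API), tier selection against a remaining token budget amounts to picking the richest surrogate that still fits:

```swift
// Conceptual sketch of surrogate-tier selection; names are illustrative, not Wax API.
enum SurrogateTier { case full, gist, micro }

struct Surrogate {
    let tier: SurrogateTier
    let tokenCount: Int
}

/// Pick the richest tier whose token count still fits the remaining budget.
func selectTier(from surrogates: [Surrogate], remainingBudget: Int) -> Surrogate? {
    let richestFirst: [SurrogateTier] = [.full, .gist, .micro]
    for tier in richestFirst {
        if let candidate = surrogates.first(where: { $0.tier == tier }),
           candidate.tokenCount <= remainingBudget {
            return candidate
        }
    }
    return nil
}

// Example: with only 120 tokens left, the 90-token gist wins over the 850-token full document.
let tiers = [
    Surrogate(tier: .full, tokenCount: 850),
    Surrogate(tier: .gist, tokenCount: 90),
    Surrogate(tier: .micro, tokenCount: 18)
]
if let pick = selectTier(from: tiers, remainingBudget: 120) {
    print("Selected tier: \(pick.tier)")  // gist
}
```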
🎯 Deterministic Token Budgeting
Strict cl100k_base token counting. No "oops, context window exceeded." No non-deterministic truncation. Reproducible RAG you can test and benchmark.
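That determinism is easy to pin down in a test: the same query against the same file should assemble identical context every time. A minimal sketch follows; the test name and fixture document are assumptions.

```swift
import Foundation
import XCTest
import Wax

final class DeterministicRecallTests: XCTestCase {
    func testSameQueryYieldsSameContext() async throws {
        let url = FileManager.default.temporaryDirectory
            .appendingPathComponent("determinism-\(UUID().uuidString).mv2s")
        let brain = try await MemoryOrchestrator(at: url)
        try await brain.remember("User prefers dark mode", metadata: ["source": "test"])

        let first = try await brain.recall(query: "user preferences")
        let second = try await brain.recall(query: "user preferences")

        // Deterministic ranking + token budgeting should assemble identical context.
        XCTAssertEqual(first.items.map(\.text), second.items.map(\.text))
    }
}
```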
- 🤖 AI assistants that remember users across launches
- 📱 Offline-first apps with serious search requirements
- 🔒 Privacy-critical products where data never leaves the device
- 🧪 Research tooling that needs reproducible retrieval
- 🎮 Agent workflows that require durable state
- Swift 6.2
- iOS 26 / macOS 26
- Apple Silicon (for Metal GPU features)
```bash
git clone https://github.com/christopherkarani/Wax.git
cd Wax
swift test
```

MiniLM CoreML tests are opt-in:

```bash
WAX_TEST_MINILM=1 swift test
```

⭐ Star Wax on GitHub • 📖 Read the Docs • 🐛 Report Issues
Built with 🎯 by Christopher Karani
