๐Ÿฏ Memory layer for on-device AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer.

Wax

The SQLite for AI memory.
One file. Full RAG. Zero infrastructure.

Quick Start • Performance • How It Works • Install


30-Second Demo

import Wax

// Create a memory file
let brain = try await MemoryOrchestrator(
    at: URL(fileURLWithPath: "brain.mv2s")
)

// Remember something
try await brain.remember(
    "User prefers dark mode and gets headaches from bright screens",
    metadata: ["source": "onboarding"]
)

// Recall with RAG
let context = try await brain.recall(query: "user preferences")
// → "User prefers dark mode and gets headaches from bright screens"
//   + relevant context, ranked and token-budgeted

That's it. No Docker. No vector DB. No network calls.


The Problem

You wanted to add memory to your AI app.

Three hours later, you're still configuring Docker Compose for a vector database that crashes if you look at it wrong, sends your data to who-knows-where, and needs a DevOps team to keep running.

Wax replaces your entire RAG stack with a file format.

Traditional RAG Stack:                     Wax:
┌─────────────────┐                        ┌─────────────┐
│  Your App       │                        │  Your App   │
├─────────────────┤                        ├─────────────┤
│  ChromaDB       │                        │             │
│  PostgreSQL     │          vs.           │   brain.    │
│  Redis          │                        │    mv2s     │
│  Elasticsearch  │                        │             │
│  Docker         │                        │             │
└─────────────────┘                        └─────────────┘
    ~5 services                                1 file

Why Wax?

⚡ Fast: 0.84ms vector search @ 10K docs (Metal GPU)
🛡️ Durable: kill -9 safe, power-loss safe, tested
🎯 Deterministic: same query = same context, every time
📦 Portable: one .mv2s file you can move, back up, and ship
🔒 Private: 100% on-device. Zero network calls.

Performance

Apple Silicon (M1 Pro)

Vector Search Latency (10K × 384-dim)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Wax Metal (warm)     ████░░░░░░░░░░░░░░░░  0.84ms
Wax Metal (cold)     █████████████████░░░  9.2ms
Wax CPU              ███████████░░░░░░░░░  105ms
SQLite FTS5          ██████████████████░░  150ms
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Cold Open → First Query: 17ms
Hybrid Search @ 10K docs: 105ms

Core Benchmark Baselines (as of February 17, 2026)

These are reproducible XCTest benchmark baselines captured from the current Wax benchmark harness.

Ingest throughput (testIngestHybridBatchedPerformance)

Workload              Time     Throughput
smoke (200 docs)      0.103s   ~1941.7 docs/s
standard (1000 docs)  0.309s   ~3236.2 docs/s
stress (5000 docs)    2.864s   ~1745.8 docs/s
10k                   7.756s   ~1289.3 docs/s
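
The throughput column is simply document count divided by wall-clock time. A minimal sketch that recomputes it from the figures above, assuming the 10k workload ingests 10,000 documents (the helper name `throughput` is illustrative and not part of Wax):

```swift
import Foundation

/// Documents per second for a workload, rounded to one decimal place.
func throughput(docs: Int, seconds: Double) -> Double {
    let raw = Double(docs) / seconds
    return (raw * 10).rounded() / 10
}

// Workload sizes and wall-clock times copied from the ingest table.
// The 10k row is assumed to mean 10,000 documents.
let ingestRuns: [(name: String, docs: Int, seconds: Double)] = [
    ("smoke", 200, 0.103),
    ("standard", 1000, 0.309),
    ("stress", 5000, 2.864),
    ("10k", 10_000, 7.756),
]

for run in ingestRuns {
    print("\(run.name): ~\(throughput(docs: run.docs, seconds: run.seconds)) docs/s")
}
```

Running the same helper against your own timings is a quick way to compare a local run with these baselines.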

Search latency

Workload                  Time      Throughput
warm CPU smoke            0.0015s   ~666.7 ops/s
warm CPU standard         0.0033s   ~303.0 ops/s
warm CPU stress           0.0072s   ~138.9 ops/s
10k CPU hybrid iteration  0.103s    ~9.7 ops/s

Recall latency (testMemoryOrchestratorRecallPerformance)

Workload  Time
smoke     0.103s
standard  0.101s

The stress recall workload is currently blocked by a harness crash (signal 11, SIGSEGV) and is tracked as a known benchmark issue.

FastRAG builder

Mode          Time
fast mode     0.102s
dense cached  0.102s

For benchmark commands, profiling traces, and methodology, see:

  • Tasks/hot-path-specialization-investigation.md

No, that's not a typo. GPU vector search really is sub-millisecond.


WAL Compaction and Storage Health (2026-02)

Wax now includes a WAL/storage health track focused on commit latency tails, long-run file growth, and recovery behavior:

  • No-op index compaction guards to avoid unnecessary index rewrites.
  • Single-pass WAL replay with guarded replay snapshot fast path.
  • Proactive WAL-pressure commits for targeted workloads (guarded rollout).
  • Scheduled rewriteLiveSet maintenance with dead-payload thresholds, validation, and rollback.

Measured outcomes

  • Repeated unchanged index compaction growth improved from +61,768,464 bytes over 8 runs (~7.72MB/run) to bounded drift (test-gated).
  • Commit latency improved in most matrix workloads in recent runs (examples: medium_hybrid p95 -13.9%, large_text_10k p95 -8.0%, sustained_write_text p95 -5.7%).
  • Reopen/recovery p95 is generally flat-to-improved across the matrix.
  • sustained_write_hybrid remains workload-sensitive, so proactive/scheduled maintenance stays guarded by default.
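
For readers unfamiliar with the p95 figures quoted above, here is a minimal sketch of how a tail percentile and its relative change can be computed. It assumes the nearest-rank definition of a percentile; the benchmark harness may use a different method, and the sample latencies are made up for illustration.

```swift
import Foundation

/// Nearest-rank percentile: the smallest sample such that at least
/// `p` percent of all samples are less than or equal to it.
func percentile(_ samples: [Double], _ p: Double) -> Double? {
    guard !samples.isEmpty, p > 0, p <= 100 else { return nil }
    let sorted = samples.sorted()
    let rank = Int((p * Double(sorted.count) / 100).rounded(.up))
    return sorted[max(rank, 1) - 1]
}

/// Relative change of `after` versus `before`, in percent (negative means faster).
func percentChange(before: Double, after: Double) -> Double {
    (after - before) / before * 100
}

// Hypothetical commit latencies in milliseconds, for illustration only.
let baselineMs: [Double] = (1...100).map { Double($0) }
let candidateMs: [Double] = baselineMs.map { $0 * 0.9 }

if let before = percentile(baselineMs, 95), let after = percentile(candidateMs, 95) {
    let delta = percentChange(before: before, after: after)
    print(String(format: "p95 %.1fms -> %.1fms (%+.1f%%)", before, after, delta))
}
```

Comparing p95 rather than the mean keeps the focus on commit latency tails, which is what this track targets.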

Safe rollout defaults

  • Proactive pressure commits are tuned for targeted workloads and validated with percentile guardrails.
  • Replay snapshot open-path optimization is additive and guarded.
  • Scheduled live-set rewrite is configurable and runs deferred from the flush() hot path.
  • Rewrite candidates are automatically validated and rolled back on verification failure.

Configure scheduled live-set rewrite

import Wax

var config = OrchestratorConfig.default
config.liveSetRewriteSchedule = LiveSetRewriteSchedule(
    enabled: true,
    checkEveryFlushes: 32,
    minDeadPayloadBytes: 64 * 1024 * 1024,  // 64 MiB
    minDeadPayloadFraction: 0.25,           // 25%
    minimumCompactionGainBytes: 0,
    minimumIdleMs: 15_000,                  // 15 seconds
    minIntervalMs: 5 * 60_000,              // 5 minutes
    verifyDeep: false
)
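
To make the thresholds easier to reason about, here is a sketch of one plausible way such a schedule could gate a rewrite. Everything in it is an assumption for illustration: the `RewritePolicy` and `StoreStats` types are hypothetical stand-ins, and requiring every threshold to hold at once is a guess at the semantics, not Wax's documented behavior.

```swift
import Foundation

// Hypothetical mirror of some schedule fields; not Wax's own type.
struct RewritePolicy {
    var checkEveryFlushes = 32
    var minDeadPayloadBytes: UInt64 = 64 * 1024 * 1024
    var minDeadPayloadFraction = 0.25
    var minimumIdleMs = 15_000
    var minIntervalMs = 5 * 60_000
}

// Hypothetical snapshot of store state at a flush boundary.
struct StoreStats {
    var flushCount: Int
    var deadPayloadBytes: UInt64
    var totalPayloadBytes: UInt64
    var idleMs: Int
    var msSinceLastRewrite: Int
}

/// Returns true only when every threshold is met (assumed AND semantics).
func shouldRewrite(_ stats: StoreStats, policy: RewritePolicy) -> Bool {
    // Only evaluate the thresholds on every Nth flush.
    guard stats.flushCount % policy.checkEveryFlushes == 0 else { return false }
    guard stats.totalPayloadBytes > 0 else { return false }
    let deadFraction = Double(stats.deadPayloadBytes) / Double(stats.totalPayloadBytes)
    return stats.deadPayloadBytes >= policy.minDeadPayloadBytes
        && deadFraction >= policy.minDeadPayloadFraction
        && stats.idleMs >= policy.minimumIdleMs
        && stats.msSinceLastRewrite >= policy.minIntervalMs
}

// Example: 100 MiB dead out of 200 MiB, idle for 20 s, last rewrite 10 minutes ago.
let example = StoreStats(
    flushCount: 64,
    deadPayloadBytes: 100 << 20,
    totalPayloadBytes: 200 << 20,
    idleMs: 20_000,
    msSinceLastRewrite: 600_000
)
print(shouldRewrite(example, policy: RewritePolicy()))
```

Combining an absolute byte floor with a fraction keeps small stores from rewriting over trivial garbage while still bounding waste in large ones.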

Reproduce benchmark matrix

WAX_BENCHMARK_WAL_COMPACTION=1 \
WAX_BENCHMARK_WAL_OUTPUT=/tmp/wal-matrix.json \
swift test --filter WALCompactionBenchmarks.testWALCompactionWorkloadMatrix
WAX_BENCHMARK_WAL_GUARDRAILS=1 \
swift test --filter WALCompactionBenchmarks.testProactivePressureCommitGuardrails
WAX_BENCHMARK_WAL_REOPEN_GUARDRAILS=1 \
swift test --filter WALCompactionBenchmarks.testReplayStateSnapshotGuardrails

See Tasks/wal-compaction-investigation.md and Tasks/wal-compaction-baseline.json for methodology and full baseline artifacts.


Quick Start

1. Add to Package.swift

.package(url: "https://github.com/christopherkarani/Wax.git", from: "0.1.6")

2. Choose Your Memory Type

๐Ÿ“ Text Memory โ€” Documents, notes, conversations
import Wax

let orchestrator = try await MemoryOrchestrator(at: storeURL)

// Ingest
try await orchestrator.remember(documentText, metadata: ["source": "report.pdf"])

// Recall
let context = try await orchestrator.recall(query: "key findings")
for item in context.items {
    print("[\(item.kind)] \(item.text)")
}

📸 Photo Memory: photo library with OCR + CLIP embeddings

import Wax

let photoRAG = try await PhotoRAGOrchestrator(
    storeURL: storeURL,
    config: .default,
    embedder: MyCLIPEmbedder()  // Your CoreML model
)

// Index local photos (offline-only)
try await photoRAG.syncLibrary(scope: .fullLibrary)

// Search
let ctx = try await photoRAG.recall(.init(text: "Costco receipt"))

🎬 Video Memory: video segments with transcripts

import Wax

let videoRAG = try await VideoRAGOrchestrator(
    storeURL: storeURL,
    config: .default,
    embedder: MyEmbedder(),
    transcriptProvider: MyTranscriber()
)

// Ingest
try await videoRAG.ingest(files: [videoFile])

// Search by content or transcript
let ctx = try await videoRAG.recall(.init(text: "project timeline discussion"))

How It Works

Wax packs everything into a single .mv2s file:

  • ✅ Your raw documents
  • ✅ Embeddings (any dimension, any provider)
  • ✅ BM25 full-text search index (FTS5)
  • ✅ HNSW vector index (USearch)
  • ✅ Write-ahead log for crash recovery
  • ✅ Metadata & entity graph

The file format is:

  • Append-only: fast writes, no fragmentation
  • Checksum-verified: every byte validated
  • Dual-header: atomic updates, never corrupt
  • Self-contained: no external dependencies

┌─────────────────────────────────────────┐
│  Header Page A (4KB)                    │
│  Header Page B (4KB) ← atomic switch    │
├─────────────────────────────────────────┤
│  WAL Ring Buffer                        │
│  (crash recovery log)                   │
├─────────────────────────────────────────┤
│  Document Payloads (compressed)         │
│  Embeddings                             │
├─────────────────────────────────────────┤
│  TOC (Table of Contents)                │
│  Footer + Checksum                      │
└─────────────────────────────────────────┘
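
The dual-header idea can be sketched in a few lines: a writer updates the inactive header page, and a reader trusts whichever page has a valid checksum and the newest generation. This is a generic illustration of the technique, not the actual .mv2s layout; the `HeaderPage` fields and the FNV-1a checksum are assumptions chosen to keep the example self-contained.

```swift
import Foundation

// Hypothetical header layout; the real .mv2s header fields are not shown in this README.
struct HeaderPage {
    var generation: UInt64   // incremented on every commit
    var tocOffset: UInt64    // where the current table of contents lives
    var checksum: UInt32     // checksum over the two fields above

    /// FNV-1a over the header bytes, standing in for whatever checksum the format uses.
    static func computeChecksum(generation: UInt64, tocOffset: UInt64) -> UInt32 {
        var hash: UInt32 = 2_166_136_261
        for value in [generation, tocOffset] {
            for shift in stride(from: 0, to: 64, by: 8) {
                hash ^= UInt32((value >> UInt64(shift)) & 0xFF)
                hash = hash &* 16_777_619
            }
        }
        return hash
    }

    init(generation: UInt64, tocOffset: UInt64) {
        self.generation = generation
        self.tocOffset = tocOffset
        self.checksum = Self.computeChecksum(generation: generation, tocOffset: tocOffset)
    }

    var isValid: Bool {
        checksum == Self.computeChecksum(generation: generation, tocOffset: tocOffset)
    }
}

/// Pick the valid header with the highest generation; nil if both pages are damaged.
func activeHeader(_ a: HeaderPage, _ b: HeaderPage) -> HeaderPage? {
    [a, b].filter { $0.isValid }.max { $0.generation < $1.generation }
}

// Simulate a crash while page B was being written: its checksum no longer matches.
let pageA = HeaderPage(generation: 7, tocOffset: 8_192)
var pageB = HeaderPage(generation: 8, tocOffset: 12_288)
pageB.checksum = ~pageB.checksum

print(activeHeader(pageA, pageB)?.generation ?? 0)  // falls back to generation 7
```

Because the older page is never overwritten until the newer one is complete, a torn write costs at most the last commit rather than the whole file.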

Comparison

Feature             Wax   Chroma   Core Data + FAISS   Pinecone
Single file         ✅    ❌       ❌                  ❌
Works offline       ✅    ⚠️       ✅                  ❌
Crash-safe          ✅    ❌       ⚠️                  N/A
GPU vector search   ✅    ❌       ❌                  ❌
No server required  ✅    ✅       ✅                  ❌
Swift-native        ✅    ❌       ✅                  ❌
Deterministic RAG   ✅    ❌       ❌                  ❌

Features That Actually Matter

🧠 Query-Adaptive Hybrid Search

Wax doesn't just do vector search. It runs multiple lanes in parallel (BM25, vector, temporal, structured evidence) and fuses results based on query type.

"When was my last dentist appointment?" → boosts temporal + structured
"Explain quantum computing" → boosts vector + BM25

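One common way to fuse ranked lists from several lanes is weighted reciprocal rank fusion (RRF). The sketch below uses it to show how per-lane boosts can change the final order. The README does not say which fusion method Wax uses, so treat RRF, the `Lane` enum, the weights, and the document IDs as illustrative assumptions.

```swift
import Foundation

enum Lane: CaseIterable { case bm25, vector, temporal, structured }

/// Weighted reciprocal rank fusion: each lane contributes weight / (k + rank)
/// for every document it returned, with rank starting at 1.
func fuse(_ rankings: [Lane: [String]], weights: [Lane: Double], k: Double = 60) -> [String] {
    var scores: [String: Double] = [:]
    // Visit lanes in a fixed order so floating-point sums are reproducible.
    for lane in Lane.allCases {
        let weight = weights[lane] ?? 1.0
        for (index, doc) in (rankings[lane] ?? []).enumerated() {
            scores[doc, default: 0] += weight / (k + Double(index + 1))
        }
    }
    // Sort by score descending, breaking ties by document ID ascending.
    return scores
        .sorted { ($0.value, $1.key) > ($1.value, $0.key) }
        .map { $0.key }
}

// Hypothetical per-lane results for one query.
let laneResults: [Lane: [String]] = [
    .bm25: ["d1", "d2", "d3"],
    .vector: ["d2", "d1"],
    .temporal: ["d3"],
    .structured: ["d3", "d2"],
]

// A "when was..." style query boosts temporal + structured lanes.
print(fuse(laneResults, weights: [.temporal: 3, .structured: 3]))
// ["d3", "d2", "d1"]

// An "explain..." style query boosts vector + BM25 lanes.
print(fuse(laneResults, weights: [.bm25: 2, .vector: 2, .temporal: 0.5, .structured: 0.5]))
// ["d2", "d1", "d3"]
```

The fixed lane order and the ID tie-break are what keep the output identical from run to run.
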
🎭 Tiered Memory Compression (Surrogates)

Not all context is equal. Wax generates hierarchical summaries:

  • full: complete document (for deep dives)
  • gist: key paragraphs (for balanced recall)
  • micro: one-liner (for quick context)

At query time, it picks the right tier based on query signals and remaining token budget.
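
A minimal sketch of tier selection under a token budget follows. The `Surrogates` type, the token counts, and the single `preferDetail` flag are hypothetical simplifications; the real query signals are not described here.

```swift
import Foundation

enum Tier { case full, gist, micro }

// Hypothetical surrogate set: a precomputed token count per available tier.
struct Surrogates {
    var tokens: [Tier: Int]
}

/// Choose the most detailed acceptable tier that fits in the remaining budget.
/// `preferDetail` stands in for whatever query signals the real system uses.
func pickTier(for doc: Surrogates, remainingTokens: Int, preferDetail: Bool) -> Tier? {
    let order: [Tier] = preferDetail ? [.full, .gist, .micro] : [.gist, .micro]
    return order.first { tier in
        guard let cost = doc.tokens[tier] else { return false }
        return cost <= remainingTokens
    }
}

let report = Surrogates(tokens: [.full: 1200, .gist: 300, .micro: 25])

print(pickTier(for: report, remainingTokens: 2000, preferDetail: true) as Any)  // full fits
print(pickTier(for: report, remainingTokens: 500, preferDetail: true) as Any)   // falls back to gist
print(pickTier(for: report, remainingTokens: 10, preferDetail: true) as Any)    // nothing fits: nil
```

Returning nil when even the one-liner does not fit lets the caller drop the document instead of overrunning the budget.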

🎯 Deterministic Token Budgeting

Strict cl100k_base token counting. No "oops, context window exceeded." No non-deterministic truncation. Reproducible RAG you can test and benchmark.
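
The budgeting idea can be illustrated with a greedy packer: walk the ranked snippets in order and admit each one only if it fits. This is a sketch under stated assumptions. The whitespace-based `countTokens` is a stand-in for a real cl100k_base tokenizer, and skipping oversized snippets (rather than stopping or truncating) is one possible policy, not necessarily the one Wax uses.

```swift
import Foundation

/// Stand-in token counter that splits on whitespace. A real implementation would
/// use a cl100k_base tokenizer; this keeps the sketch self-contained.
func countTokens(_ text: String) -> Int {
    text.split(whereSeparator: { $0.isWhitespace }).count
}

/// Greedily pack ranked snippets in order, skipping any that would exceed the budget.
func pack(_ ranked: [String], budget: Int) -> (items: [String], used: Int) {
    var items: [String] = []
    var used = 0
    for snippet in ranked {
        let cost = countTokens(snippet)
        if used + cost <= budget {
            items.append(snippet)
            used += cost
        }
    }
    return (items, used)
}

let rankedSnippets = [
    "User prefers dark mode",          // 4 tokens under the stand-in counter
    "Bright screens cause headaches",  // 4 tokens
    "Onboarding completed",            // 2 tokens
]

let packed = pack(rankedSnippets, budget: 6)
print(packed.items, packed.used)
```

Given the same ranked input and budget, the function always returns the same items, which is the property that makes retrieval testable.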


Perfect For

  • 🤖 AI assistants that remember users across launches
  • 📱 Offline-first apps with serious search requirements
  • 🔒 Privacy-critical products where data never leaves the device
  • 🧪 Research tooling that needs reproducible retrieval
  • 🎮 Agent workflows that require durable state

Requirements

  • Swift 6.2
  • iOS 26 / macOS 26
  • Apple Silicon (for Metal GPU features)

Contributing

git clone https://github.com/christopherkarani/Wax.git
cd Wax
swift test

MiniLM CoreML tests are opt-in:

WAX_TEST_MINILM=1 swift test

Ready to stop shipping databases?

โญ Star Wax on GitHub โ€ข ๐Ÿ“– Read the Docs โ€ข ๐Ÿ› Report Issues

Built with 🍯 by Christopher Karani