Honest pull-model memory for LLM agents. The open-source alternative to background-consolidation systems like Anthropic Dreaming, with a different shape: nothing happens in the background, every retrieval shows its work, and deletes are real deletes.
Anthropic shipped Dreaming on May 6, 2026: a managed background consolidation pass that turns episodic conversation traces into semantic memory the next session can use. The OSS reflex is to clone it next weekend with Llama or Qwen. I sat with that idea and walked away. The full reasoning is in the companion essay, Why I refused to build a Dreaming clone for OSS Claude.
The short version: the consolidator IS the model. If you run a smaller LLM to do the summarizing, you get a different feature with the same name and lower quality. Deletion also gets harder once memories are baked into derived summaries.
agentmemory is a different shape that solves the same job: pull-on-demand instead of push-in-background. The latency tax is real (200 ms to 2 s on cold start). In exchange you get full reversibility, no derived artifacts, and visibility: the user can see exactly what was retrieved before it goes into the context.
npm install @mukundakatta/agentmemory

Requires Node 20+. Pure ESM, zero runtime dependencies.
Append-only event log of agent interactions. Embedded at write time when an embedder is configured. Real deletes, no tombstones, no derived artifacts.
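To make those semantics concrete, here is a minimal sketch of an append-only, in-memory event log with real deletes. It is not the library's source: the class name `TinyEpisodicStore`, the event fields, and the keyword-overlap score (fraction of distinct query tokens found in the event text) are assumptions for illustration, and the embedder is synchronous here only to keep the sketch short.

```javascript
// Hypothetical sketch of an append-only event log with real deletes.
// Not agentmemory's implementation; names and scoring are assumptions.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Fallback score: fraction of distinct query tokens that appear in the event text.
function keywordOverlap(query, text) {
  const q = new Set(tokenize(query));
  const t = new Set(tokenize(text));
  if (q.size === 0) return 0;
  let hits = 0;
  for (const tok of q) if (t.has(tok)) hits++;
  return hits / q.size;
}

class TinyEpisodicStore {
  constructor({ embedder } = {}) {
    this.embedder = embedder; // assumed synchronous here: text -> number[]
    this.events = new Map(); // id -> event, in insertion order
    this.nextId = 1;
  }

  append({ sessionId, kind, text, ts = Date.now() }) {
    const id = `e_${this.nextId++}`;
    // Embed once at write time; retrieval never re-embeds stored events.
    const embedding = this.embedder ? this.embedder(text) : null;
    this.events.set(id, { id, sessionId, kind, text, ts, embedding });
    return id;
  }

  retrieve(query, { sessionId, topK = 5 } = {}) {
    const qVec = this.embedder ? this.embedder(query) : null;
    return [...this.events.values()]
      .filter((e) => sessionId === undefined || e.sessionId === sessionId)
      .map((e) => ({
        ...e,
        score: qVec && e.embedding ? cosine(qVec, e.embedding) : keywordOverlap(query, e.text),
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }

  // Real deletes: the Map entry is the only copy, so removing it removes everything.
  deleteEvent(id) {
    return this.events.delete(id);
  }

  deleteSession(sessionId) {
    for (const [id, e] of this.events) if (e.sessionId === sessionId) this.events.delete(id);
  }

  deleteOlderThan(cutoffMs) {
    for (const [id, e] of this.events) if (e.ts < cutoffMs) this.events.delete(id);
  }
}
```

Because the only derived value (the embedding) lives on the event object itself, deleting the event deletes the embedding with it; there is nothing left over to become a tombstone.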
import { EpisodicStore } from "@mukundakatta/agentmemory";
// myEmbedder stands for your own embedding function (text -> number[]).
const store = new EpisodicStore({
  embedder: async (text) => myEmbedder(text), // optional; without it, retrieval falls back to keyword overlap
});
await store.append({
  sessionId: "user-42",
  kind: "user_message",
  text: "I prefer Postgres for the new project",
});
const hits = await store.retrieve("which database should I use", {
  sessionId: "user-42",
  topK: 5,
});
// Real delete: gone, no trace.
const eventId = hits[0].id;
store.deleteEvent(eventId);
// Retention policies are first-class:
store.deleteOlderThan(Date.now() - 30 * 24 * 60 * 60 * 1000); // drop events older than 30 days
store.deleteSession("user-42"); // drop every event in one session

The pull-model context builder. Bring your own LLM. The summary is shown in the trace, never silently injected.
import { OnDemandSummarizer } from "@mukundakatta/agentmemory";
import Anthropic from "@anthropic-ai/sdk";
const claude = new Anthropic();
const summarizer = new OnDemandSummarizer({
  llm: async (prompt) => {
    const r = await claude.messages.create({
      model: "claude-3-5-haiku-latest",
      max_tokens: 400,
      messages: [{ role: "user", content: prompt }],
    });
    return r.content[0].text;
  },
  maxTokens: 300,
});
const events = await store.retrieve("pick a database", { topK: 5 });
const { summary, trace } = await summarizer.summarize(events, "pick a database");
console.log("Summary:", summary);
console.log("Built from event ids:", trace.eventIds);
console.log("Prompt sent to LLM:", trace.prompt);

The trace lets you show the user (or log to your audit trail) exactly which events fed the summary. This is the key honesty property: nothing silent, nothing magical.
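Under the hood, a pull-model summarizer only has to turn the retrieved events into a prompt, call the LLM you pass in, and return the prompt and event ids alongside the result. The sketch below assumes that shape; `buildPrompt`, `summarizeWithTrace`, and the prompt wording are hypothetical and are not the library's API.

```javascript
// Hypothetical pull-model summarizer: explicit call in, summary + trace out.
function buildPrompt(events, query) {
  const lines = events.map((e) => `[${e.id}] ${e.text}`);
  return [
    "Summarize only the memory events below that are relevant to the query.",
    `Query: ${query}`,
    "Events:",
    ...lines,
  ].join("\n");
}

async function summarizeWithTrace(events, query, llm) {
  // Nothing retrieved: skip the LLM call instead of inventing context.
  if (events.length === 0) {
    return { summary: "", trace: { eventIds: [], prompt: null } };
  }
  const prompt = buildPrompt(events, query);
  const summary = (await llm(prompt)).trim();
  // The trace is exactly what produced the summary: source ids and the prompt text.
  return { summary, trace: { eventIds: events.map((e) => e.id), prompt } };
}
```

Since `llm` is just a function argument, swapping Claude for a local model changes nothing in the trace.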
Watches retrieval quality over time. If yesterday's "remember when we discussed X" stops returning anything because user intent has drifted, you get a signal instead of a silent regression.
import { MemoryDriftWatcher } from "@mukundakatta/agentmemory";
const watcher = new MemoryDriftWatcher({
  windowSize: 20,
  dropThreshold: 0.15, // alert on a 15% drop in mean retrieval score
});
// After every retrieval call:
watcher.record({ ts: Date.now(), scores: hits.map((h) => h.score) });
const state = watcher.state();
if (state.alert) {
  console.warn("Memory drift alert:", state.reason);
}

For the heavy-duty drift math (MMD, sliced Wasserstein, KS, PSI, and k-means cluster shift across five dimensions), see the sibling library ragdrift.
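One simple way to produce that signal is to keep the last `windowSize` per-call mean scores and compare the newer half of the window against the older half. The sketch below does exactly that; the half-window comparison, the cold-start rule, and the `reason` strings are assumptions, not a description of `MemoryDriftWatcher` internals.

```javascript
// Hypothetical sliding-window drift detector (not the library's internals).
class TinyDriftWatcher {
  constructor({ windowSize = 20, dropThreshold = 0.15 } = {}) {
    this.windowSize = windowSize;
    this.dropThreshold = dropThreshold;
    this.means = []; // one mean score per recorded retrieval call
  }

  record({ scores }) {
    const mean = scores.length ? scores.reduce((a, b) => a + b, 0) / scores.length : 0;
    this.means.push(mean);
    // Keep only the most recent windowSize calls.
    if (this.means.length > this.windowSize) this.means.shift();
  }

  state() {
    // Cold start: wait for a full window before judging.
    if (this.means.length < this.windowSize) return { alert: false, reason: "warming up" };
    const half = Math.floor(this.windowSize / 2);
    const avg = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
    const baseline = avg(this.means.slice(0, half));
    const recent = avg(this.means.slice(half));
    // Relative drop of the recent half versus the older half.
    const drop = baseline > 0 ? (baseline - recent) / baseline : 0;
    return drop >= this.dropThreshold
      ? { alert: true, reason: `mean score fell ${(drop * 100).toFixed(1)}% (${baseline.toFixed(3)} -> ${recent.toFixed(3)})` }
      : { alert: false, reason: "stable" };
  }
}
```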
- No background work. Every operation runs inside a call you make and is finished when that call returns. No cron, no consolidation pass, no "memories are being baked" race conditions.
- Real deletes. No tombstones. No derived artifacts that survive after the source is deleted. If a user asks you to forget something, you can.
- Pull, never push. The summarizer is called explicitly from the agent's main loop. Nothing gets injected without a call.
- Show the trace. Every summary returns the event ids and the exact prompt that produced it.
- BYO LLM. No assumption about which model summarizes. Use Claude, GPT, Gemini, or a local model. The library is the same.
- Zero runtime dependencies. The whole library is < 500 lines. Easy to read end-to-end.
- Not a Dreaming clone. Different shape on purpose.
- Not a vector database. The default in-memory store is for tests and small agents. For production, swap for a persistent backend that satisfies the same interface.
- Not a "memory framework." Three small classes you compose into your existing agent loop.
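Because a persistent backend only has to satisfy the same interface, a cheap duck-typing check can catch a mis-wired adapter at startup. This helper is hypothetical and not part of the library: the method list is taken from the store calls shown in this README, and it only checks that the methods exist, not how they behave.

```javascript
// Hypothetical startup check: does this object expose the store methods used in this README?
const STORE_METHODS = ["append", "retrieve", "deleteEvent", "deleteSession", "deleteOlderThan"];

function missingStoreMethods(store) {
  return STORE_METHODS.filter((name) => typeof store?.[name] !== "function");
}

function assertStoreShape(store) {
  const missing = missingStoreMethods(store);
  if (missing.length > 0) {
    throw new TypeError(`store is missing: ${missing.join(", ")}`);
  }
  return store; // returned unchanged so the check can wrap construction inline
}
```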
Compatibility with the @mukundakatta/agent* reliability stack
agentmemory pairs cleanly with the existing zero-dep agent stack:
| Library | What it does |
|---|---|
| @mukundakatta/agentfit | Token-aware truncation. Use to fit a summary plus the new turn into your context budget. |
| @mukundakatta/agentguard | Network egress allowlist. Use to keep retrieved memories from triggering unrelated tool calls. |
| @mukundakatta/agentsnap | Tool-call trace snapshots. Snapshot the agent's behavior with and without memory. |
| @mukundakatta/agentvet | Tool-argument validation before execution. |
| @mukundakatta/agentcast | Structured output enforcer. Use to make the summarizer return JSON when needed. |
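The agentfit pairing comes down to a budget check: the memory summary and the new turn must fit together. The sketch below does not use agentfit's API. It assumes a crude 4-characters-per-token estimate (swap in a real tokenizer in practice) and trims the summary, never the user's turn, until both fit.

```javascript
// Hypothetical budget fit, NOT agentfit's API.
// Crude heuristic: roughly 4 characters per token.
const estimateTokens = (text) => Math.ceil(text.length / 4);

function fitToBudget(summary, newTurn, budgetTokens) {
  const turnTokens = estimateTokens(newTurn);
  if (turnTokens > budgetTokens) {
    throw new RangeError("the new turn alone exceeds the budget");
  }
  const remainingChars = (budgetTokens - turnTokens) * 4;
  // Keep the user's turn intact; cut the memory summary from the end.
  const fittedSummary = summary.length <= remainingChars ? summary : summary.slice(0, remainingChars);
  return { summary: fittedSummary, newTurn, truncated: fittedSummary.length < summary.length };
}
```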
Two runnable demos need no API key and no network. They print every retrieval block (event id, timestamp, similarity score) and demonstrate the two honesty properties from the design rules: retrievals show their work, and deletes are real.
node examples/visible_retrieval_demo.js
node examples/vs_background_consolidator.js

The first script is a 5-turn mock agent loop. On every turn it prints the exact memory it pulled, summarizes it with a local echo "LLM", and times the retrieval. On turn 3 the user asks the agent to forget one event, and turn 4 re-queries the same topic to prove the forgotten event does not come back. A small slice of the transcript:
TURN 2
USER: And what was the caching layer plan?
RETRIEVED (query="cache layer plan with Redis") [3 item(s)]:
- e_2 ts=2026-05-20T09:15:00Z score=1.000 "Use Redis as the cache layer with a 5 minute TTL."
- e_6 ts=2026-05-24T16:14:03Z score=0.333 "Which database did we pick again?"
- e_1 ts=2026-05-20T09:00:00Z score=0.272 "I prefer Postgres for the new project, skip Mongo."
RETRIEVAL LATENCY: 0.03 ms
----------------------------------------------------------------
TURN 3 (forget request)
USER: Forget the cache-layer note (event e_2).
forget(e_2) => true
(Real delete: no tombstone, no derived artifact.)
----------------------------------------------------------------
TURN 4 (reversibility check)
USER: Remind me of the cache plan.
RETRIEVED (after forget) [3 item(s)]:
- e_6 ts=2026-05-24T16:14:03Z score=0.333 "Which database did we pick again?"
- e_1 ts=2026-05-20T09:00:00Z score=0.272 "I prefer Postgres for the new project, skip Mongo."
- e_3 ts=2026-05-20T09:45:00Z score=0.000 "Security review must happen before deploy."
Forgotten event e_2 re-appeared? false
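The transcript format is simple enough to reproduce in your own audit logs. This sketch renders scored events in the same shape as the demo output above and checks whether a forgotten id came back; the event fields (`id`, `ts` in epoch milliseconds, `score`, `text`) are assumed from the transcript, and both functions are hypothetical, not library exports.

```javascript
// Hypothetical renderer matching the demo's RETRIEVED block layout.
function renderRetrieval(label, hits) {
  const header = `${label} [${hits.length} item(s)]:`;
  const rows = hits.map((h) => {
    // ISO timestamp without a zero-millisecond suffix, e.g. 2026-05-20T09:00:00Z
    const ts = new Date(h.ts).toISOString().replace(".000Z", "Z");
    return `- ${h.id} ts=${ts} score=${h.score.toFixed(3)} "${h.text}"`;
  });
  return [header, ...rows].join("\n");
}

// Reversibility check: after forget(id), the id must not appear in any later retrieval.
function reappeared(forgottenId, hitsAfterForget) {
  return hitsAfterForget.some((h) => h.id === forgottenId);
}
```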
The second script runs the same input stream through a "sham" background
consolidator that folds every 3 turns into one summary string, and through
agentmemory side by side. The sham consolidator demonstrates three failure
modes agentmemory does not share: forget() is a no-op once the trace has
been baked into the summary, retrievals return an opaque blob with no event
ids, and the source bytes are not guaranteed to survive consolidation.
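The first two failure modes fit in a few lines. The class below is a hypothetical stand-in for the sham consolidator, not the code in examples/vs_background_consolidator.js: it folds every 3 events into one summary string, after which forgetting a source event no longer removes its content and retrieval returns blobs without event ids.

```javascript
// Hypothetical sham consolidator: folds every `every` events into one opaque blob.
class ShamConsolidator {
  constructor(every = 3) {
    this.every = every;
    this.pending = []; // raw events not yet consolidated
    this.summaries = []; // opaque strings; no event ids survive
  }

  append(event) {
    this.pending.push(event);
    if (this.pending.length === this.every) {
      // "Consolidation": concatenate the texts and drop the source events.
      this.summaries.push(this.pending.map((e) => e.text).join(" / "));
      this.pending = [];
    }
  }

  forget(id) {
    const before = this.pending.length;
    this.pending = this.pending.filter((e) => e.id !== id);
    // Baked events are unreachable by id, so forget() quietly does nothing for them.
    return this.pending.length < before;
  }

  // Retrieval returns matching blobs, never event ids.
  retrieve(query) {
    return this.summaries.filter((s) => s.toLowerCase().includes(query.toLowerCase()));
  }
}
```

After three appends, `forget("e_2")` returns `false` and the Redis text is still retrievable. With an append-only store that never derives artifacts, deleting e_2 removes the only copy.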
Source: examples/visible_retrieval_demo.js
and examples/vs_background_consolidator.js.
A small runnable demo wires EpisodicStore, OnDemandSummarizer, and the Anthropic SDK together:
npm install @anthropic-ai/sdk
ANTHROPIC_API_KEY=sk-ant-... node examples/claude-agent.js

The demo shows two sessions, retrieval across them, the summary printed before injection (so you can see exactly what's going into Claude's context), and a real delete that removes a memory with no tombstone left behind. Source: examples/claude-agent.js.
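Printing the summary before injection is easiest when the request is built in one place. The sketch below shows one possible layout: the summary and its source event ids go into the `system` string of a Messages API request body, which you can log before sending. The function name, the wording, and the choice of the system prompt rather than the user turn are assumptions; examples/claude-agent.js may do this differently.

```javascript
// Hypothetical request builder: memory enters the context only through this explicit call.
function buildClaudeRequest({ summary, eventIds }, userTurn, model = "claude-3-5-haiku-latest") {
  const memoryBlock = summary
    ? `Relevant memory (from events ${eventIds.join(", ")}):\n${summary}`
    : "No relevant memory was retrieved.";
  return {
    model,
    max_tokens: 400,
    system: `You are a helpful assistant.\n\n${memoryBlock}`,
    messages: [{ role: "user", content: userTurn }],
  };
}
```

Logging `JSON.stringify(request, null, 2)` before `claude.messages.create(request)` gives the user the exact memory that went in.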
The default EpisodicStore is in-memory. For production, swap in PostgresEpisodicStore (same interface, real deletes via DELETE):
import pg from "pg";
import { PostgresEpisodicStore } from "@mukundakatta/agentmemory/postgres";
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
const store = new PostgresEpisodicStore({ pool, embedder: myEmbedder });
await store.init(); // creates `agentmemory_events` table + indexes if missing
await store.append({ sessionId: "user-42", kind: "user_message", text: "hi" });
const hits = await store.retrieve("greetings", { sessionId: "user-42", topK: 5 });
await store.deleteEvent(hits[0].id); // real delete, no tombstone

The schema is documented in src/adapters/postgres.js. It works on plain Postgres; if you have pgvector, you can swap the FLOAT8[] embedding column for vector(N) and rewrite the ORDER BY in retrieve to use indexed cosine distance.
Peer dependency: npm install pg.
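For the pgvector variant, the rewritten ranking query might look like the sketch below. It assumes column names (`id`, `session_id`, `kind`, `text`, `ts`, `embedding`) that may not match the real schema in src/adapters/postgres.js, so check that file first. pgvector's `<=>` operator returns cosine distance, so `1 - distance` is a similarity score, and an index built with `vector_cosine_ops` can serve the `ORDER BY`. The query embedding is passed as a text literal such as `[0.1,0.2]` and cast to `vector`.

```javascript
// Hypothetical query builder for a pgvector-backed events table.
// Column names are assumptions; check the real schema before using.
function toVectorLiteral(embedding) {
  return `[${embedding.join(",")}]`;
}

function buildRetrieveQuery(queryEmbedding, { sessionId, topK = 5 }) {
  // Returns a node-postgres query config object: pool.query({ text, values }).
  return {
    text: `SELECT id, kind, text, ts, 1 - (embedding <=> $1::vector) AS score
           FROM agentmemory_events
           WHERE session_id = $2
           ORDER BY embedding <=> $1::vector
           LIMIT $3`,
    values: [toVectorLiteral(queryEmbedding), sessionId, topK],
  };
}
```

Usage would be along the lines of `await pool.query(buildRetrieveQuery(vec, { sessionId: "user-42", topK: 5 }))`, with `vec` coming from the same embedder used at write time.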
npm test # in-memory store + summarizer + drift + example smoke (25 tests)
npm run test:postgres # Postgres adapter (skipped unless DATABASE_URL set)
npm run test:all # everything

25 in-memory tests + 9 Postgres tests, all passing. Tests cover:
- EpisodicStore: append, embed, retrieve (cosine + keyword fallback), filters (session, time, kind), deleteEvent, deleteSession, deleteOlderThan, sessions
- OnDemandSummarizer: requires LLM, empty-events shortcut, prompt structure, summary trim, custom system prompt
- MemoryDriftWatcher: cold-start, stable scores, drop alert, sliding window, reset
- Integration: end-to-end flow + drift watcher catching memory-quality decay
- Example smoke: visible_retrieval_demo.js and vs_background_consolidator.js end-to-end, with assertions on the retrieved blocks, the forget round-trip, and the three sham failure modes
MIT. See LICENSE.
- Companion essay: Why I refused to build a Dreaming clone for OSS Claude
- Sibling library for full drift math: ragdrift
- The rest of the agent reliability stack: @mukundakatta on npm
This repository includes a dependency-free health check for core documentation, metadata, and CI wiring. Run it locally before publishing changes:
python3 scripts/check_repository_health.py

The same check runs in GitHub Actions on pushes and pull requests.