agentmemory

Honest pull-model memory for LLM agents. The open-source alternative to background-consolidation systems like Anthropic Dreaming, with a different shape: nothing happens in the background, every retrieval shows its work, and deletes are real deletes.

Why this exists

Anthropic shipped Dreaming on May 6, 2026: a managed background consolidation pass that turns episodic conversation traces into semantic memory the next session can use. The OSS reflex is to clone it next weekend with Llama or Qwen. I sat with that and walked away. Full reasoning in Why I refused to build a Dreaming clone for OSS Claude.

The short version: the consolidator IS the model. Run a smaller LLM to summarize and you get a different feature with the same name and lower quality. Deletion also gets harder once memories are baked.

agentmemory is a different shape that solves the same job: pull-on-demand instead of push-in-background. The latency tax is real (200ms-2s on cold start). In exchange you get full reversibility, no derived artifacts, and the user can see exactly what was retrieved before it goes into the context.

Install

npm install @mukundakatta/agentmemory

Requires Node 20+. Pure ESM, zero runtime dependencies.

Three pieces

1. EpisodicStore

Append-only event log of agent interactions. Embedded at write time when an embedder is configured. Real deletes, no tombstones, no derived artifacts.

import { EpisodicStore } from "@mukundakatta/agentmemory";

const store = new EpisodicStore({
  embedder: async (text) => myEmbedder(text), // optional; falls back to keyword overlap
});

await store.append({
  sessionId: "user-42",
  kind: "user_message",
  text: "I prefer Postgres for the new project",
});

const hits = await store.retrieve("which database should I use", {
  sessionId: "user-42",
  topK: 5,
});

// Real delete: gone, no trace.
const eventId = hits[0].id;
store.deleteEvent(eventId);

// Retention policies are first-class:
store.deleteOlderThan(Date.now() - 30 * 24 * 60 * 60 * 1000);
store.deleteSession("user-42");
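
To make the keyword-overlap fallback concrete, here is a minimal sketch of one way such a score could be computed (Jaccard overlap on lowercase word tokens). This is an assumption for illustration, not the library's actual formula; `tokenize`, `keywordOverlap`, and `rankByKeywords` are hypothetical names.

```javascript
// Hypothetical helper, not part of agentmemory: lowercase word tokens as a Set.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard overlap: |A ∩ B| / |A ∪ B|, always in [0, 1].
function keywordOverlap(query, text) {
  const a = tokenize(query);
  const b = tokenize(text);
  if (a.size === 0 || b.size === 0) return 0;
  let shared = 0;
  for (const token of a) if (b.has(token)) shared += 1;
  return shared / (a.size + b.size - shared);
}

// Score every event against the query and keep the best topK.
function rankByKeywords(query, events, topK = 5) {
  return events
    .map((event) => ({ ...event, score: keywordOverlap(query, event.text) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const ranked = rankByKeywords("which database should I use", [
  { id: "e_1", text: "I prefer Postgres for the new project" },
  { id: "e_2", text: "Which database did we pick again?" },
]);
console.log(ranked.map((e) => `${e.id} ${e.score.toFixed(3)}`));
```

In this sketch the literal match (e_2) outranks the Postgres preference (e_1), even though e_1 is the more useful memory. That gap is the usual reason to configure an embedder when you can.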

2. OnDemandSummarizer

The pull-model context builder. Bring your own LLM. The summary is shown in the trace, never silently injected.

import { OnDemandSummarizer } from "@mukundakatta/agentmemory";
import Anthropic from "@anthropic-ai/sdk";

const claude = new Anthropic();

const summarizer = new OnDemandSummarizer({
  llm: async (prompt) => {
    const r = await claude.messages.create({
      model: "claude-3-5-haiku-latest",
      max_tokens: 400,
      messages: [{ role: "user", content: prompt }],
    });
    return r.content[0].text;
  },
  maxTokens: 300,
});

const events = await store.retrieve("pick a database", { topK: 5 });
const { summary, trace } = await summarizer.summarize(events, "pick a database");

console.log("Summary:", summary);
console.log("Built from event ids:", trace.eventIds);
console.log("Prompt sent to LLM:", trace.prompt);

The trace lets you show the user (or log to your audit trail) exactly which events fed the summary. This is the key honesty property: nothing silent, nothing magical.
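
If you want the trace in an audit trail, one option is to write a JSON line per summary. A rough sketch, assuming the result has the `{ summary, trace: { eventIds, prompt } }` shape used above; `toAuditLine` and the record field names are hypothetical, not part of the library.

```javascript
// Hypothetical helper: turn a summarize() result into one JSON line for an audit log.
// Assumes result = { summary, trace: { eventIds, prompt } } as in the example above.
function toAuditLine(query, result, now = Date.now()) {
  const record = {
    ts: new Date(now).toISOString(),
    query,
    eventIds: [...result.trace.eventIds], // copy so later mutation cannot change the log
    promptChars: result.trace.prompt.length,
    prompt: result.trace.prompt,
    summary: result.summary,
  };
  return JSON.stringify(record);
}

// Stand-in result; in real use this comes from summarizer.summarize(events, query).
const exampleResult = {
  summary: "User prefers Postgres.",
  trace: {
    eventIds: ["e_1"],
    prompt: "Summarize: I prefer Postgres for the new project",
  },
};
console.log(toAuditLine("pick a database", exampleResult));
```

Keep in mind that an audit log holding the full prompt is itself a copy of the source text. If a user asks you to forget an event, that log needs the same delete path as the store.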

3. MemoryDriftWatcher

Watches retrieval quality over time. If yesterday's "remember when we discussed X" stops returning anything because user intent has drifted, you get a signal instead of a silent regression.

import { MemoryDriftWatcher } from "@mukundakatta/agentmemory";

const watcher = new MemoryDriftWatcher({
  windowSize: 20,
  dropThreshold: 0.15, // alert when the mean score drops by 15%
});

// After every retrieval call:
watcher.record({ ts: Date.now(), scores: hits.map((h) => h.score) });

const state = watcher.state();
if (state.alert) {
  console.warn("Memory drift alert:", state.reason);
}

For the heavy-duty drift math (MMD, sliced Wasserstein, KS, PSI, k-means cluster shift across five dimensions) see the sibling library ragdrift.
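
For intuition, here is a minimal sketch of one possible drop rule: compare the mean score of the latest window against the window before it, and alert when the relative drop exceeds the threshold. This is an assumption about how such a check could work, not MemoryDriftWatcher's actual logic; `SimpleDriftCheck` is a hypothetical class.

```javascript
// Hypothetical sketch, not the library's implementation.
class SimpleDriftCheck {
  constructor({ windowSize = 20, dropThreshold = 0.15 } = {}) {
    this.windowSize = windowSize;
    this.dropThreshold = dropThreshold;
    this.means = []; // one mean score per recorded retrieval
  }

  record({ scores }) {
    const mean = scores.length
      ? scores.reduce((sum, s) => sum + s, 0) / scores.length
      : 0; // an empty retrieval counts as a score of 0
    this.means.push(mean);
    // Keep only the two windows being compared.
    if (this.means.length > 2 * this.windowSize) this.means.shift();
  }

  state() {
    if (this.means.length < 2 * this.windowSize) {
      return { alert: false, reason: "warming up" };
    }
    const avg = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;
    const baseline = avg(this.means.slice(0, this.windowSize));
    const recent = avg(this.means.slice(this.windowSize));
    const drop = baseline > 0 ? (baseline - recent) / baseline : 0;
    return drop > this.dropThreshold
      ? { alert: true, reason: `mean score fell ${(drop * 100).toFixed(1)}%` }
      : { alert: false, reason: "stable" };
  }
}

const check = new SimpleDriftCheck({ windowSize: 2, dropThreshold: 0.15 });
for (const scores of [[0.9, 0.8], [0.9, 0.8], [0.5, 0.4], [0.5, 0.4]]) {
  check.record({ scores });
}
console.log(check.state());
```

A two-window comparison like this reacts only after a full window of degraded retrievals. A smaller window reacts faster but is noisier.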

Design rules

  1. No background work. Everything is synchronous from the caller's perspective. No cron, no consolidation pass, no "memories are being baked" race conditions.
  2. Real deletes. No tombstones. No derived artifacts that survive after the source is deleted. If a user asks you to forget something, you can.
  3. Pull, never push. The summarizer is called explicitly from the agent's main loop. Nothing gets injected without a call.
  4. Show the trace. Every summary returns the event ids and the exact prompt that produced it.
  5. BYO LLM. No assumption about which model summarizes. Use Claude, GPT, Gemini, or a local model. The library is the same.
  6. Zero runtime dependencies. The whole library is under 500 lines, so it is easy to read end-to-end.
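
Rules 3 and 4 can be sketched as a single agent turn. This is a minimal sketch, assuming only the `retrieve(query, opts)` and `summarize(events, query)` shapes from the examples above; `runTurn`, `callModel`, and the stub objects are hypothetical and exist so the sketch runs without a network or an API key.

```javascript
// Sketch of one agent turn under the pull model. `callModel` is a hypothetical
// function that sends the final context to whatever LLM you use.
async function runTurn({ store, summarizer, callModel, sessionId, userText }) {
  // 1. Pull: retrieval happens only because this line runs.
  const events = await store.retrieve(userText, { sessionId, topK: 5 });

  // 2. Summarize on demand and keep the trace next to the summary.
  const { summary, trace } = await summarizer.summarize(events, userText);

  // 3. Build the context explicitly, so the caller can see what gets injected.
  const context = summary
    ? `Relevant memory:\n${summary}\n\n${userText}`
    : userText;
  const reply = await callModel(context);

  return { reply, context, trace };
}

// Stubs standing in for EpisodicStore and OnDemandSummarizer.
const stubStore = {
  retrieve: async () => [
    { id: "e_1", text: "I prefer Postgres for the new project" },
  ],
};
const stubSummarizer = {
  summarize: async (events, query) => ({
    summary: events.map((e) => e.text).join(" "),
    trace: {
      eventIds: events.map((e) => e.id),
      prompt: `Summarize for: ${query}`,
    },
  }),
};

runTurn({
  store: stubStore,
  summarizer: stubSummarizer,
  callModel: async (context) => `echo: ${context}`,
  sessionId: "user-42",
  userText: "which database should I use",
}).then(({ context, trace }) => {
  console.log(context);
  console.log("Built from event ids:", trace.eventIds);
});
```

Returning `context` and `trace` alongside the reply is the point of the sketch: the caller can display or log them before anything reaches the model.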

What this is not

  • Not a Dreaming clone. Different shape on purpose.
  • Not a vector database. The default in-memory store is for tests and small agents. For production, swap in a persistent backend that satisfies the same interface.
  • Not a "memory framework." Three small classes you compose into your existing agent loop.

Compatibility with the @mukundakatta/agent* reliability stack

agentmemory pairs cleanly with the existing zero-dep agent stack:

| Library | What it does |
| --- | --- |
| @mukundakatta/agentfit | Token-aware truncation. Use to fit a summary plus the new turn into your context budget. |
| @mukundakatta/agentguard | Network egress allowlist. Use to keep retrieved memories from triggering unrelated tool calls. |
| @mukundakatta/agentsnap | Tool-call trace snapshots. Snapshot the agent's behavior with and without memory. |
| @mukundakatta/agentvet | Tool arg validation before execution. |
| @mukundakatta/agentcast | Structured output enforcer. Use to make the summarizer return JSON when needed. |

Visible retrieval demo

Two runnable demos that need no API key and no network. They print every retrieval block (event id, timestamp, similarity score) and prove the two honesty properties from the design rules.

node examples/visible_retrieval_demo.js
node examples/vs_background_consolidator.js

The first script is a 5-turn mock agent loop. On every turn it prints the exact memory it pulled, summarizes via a local echo "LLM", and times the retrieval. On turn 3 the user asks the agent to forget one event, and turn 4 re-queries the same topic to prove the forgotten event does not come back. A small slice of the transcript:

TURN 2
USER: And what was the caching layer plan?

  RETRIEVED (query="cache layer plan with Redis") [3 item(s)]:
    - e_2  ts=2026-05-20T09:15:00Z  score=1.000  "Use Redis as the cache layer with a 5 minute TTL."
    - e_6  ts=2026-05-24T16:14:03Z  score=0.333  "Which database did we pick again?"
    - e_1  ts=2026-05-20T09:00:00Z  score=0.272  "I prefer Postgres for the new project, skip Mongo."
  RETRIEVAL LATENCY: 0.03 ms

----------------------------------------------------------------
TURN 3 (forget request)
USER: Forget the cache-layer note (event e_2).
  forget(e_2) => true
  (Real delete: no tombstone, no derived artifact.)
----------------------------------------------------------------
TURN 4 (reversibility check)
USER: Remind me of the cache plan.

  RETRIEVED (after forget) [3 item(s)]:
    - e_6  ts=2026-05-24T16:14:03Z  score=0.333  "Which database did we pick again?"
    - e_1  ts=2026-05-20T09:00:00Z  score=0.272  "I prefer Postgres for the new project, skip Mongo."
    - e_3  ts=2026-05-20T09:45:00Z  score=0.000  "Security review must happen before deploy."

  Forgotten event e_2 re-appeared? false

The second script runs the same input stream through a "sham" background consolidator that folds every 3 turns into one summary string, and through agentmemory side by side. The sham consolidator demonstrates three failure modes agentmemory does not share: forget() is a no-op once the trace has been baked into the summary, retrievals return an opaque blob with no event ids, and the source bytes are not guaranteed to survive consolidation.

Source: examples/visible_retrieval_demo.js and examples/vs_background_consolidator.js.
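
The contrast can be boiled down to a few lines. The following is a simplified sketch, not the code in examples/vs_background_consolidator.js: `ShamConsolidator` and `EventLog` are hypothetical classes, and the consolidator here folds every event immediately instead of every 3 turns.

```javascript
// Hypothetical "sham" consolidator: folds events into one summary string.
class ShamConsolidator {
  constructor() {
    this.summary = "";
  }
  ingest(event) {
    // The source text is merged into the blob; the event id is not kept.
    this.summary += (this.summary ? " " : "") + event.text;
  }
  forget(_eventId) {
    return false; // nothing to delete: the blob has no per-event structure
  }
  retrieve() {
    return this.summary; // opaque string, no event ids
  }
}

// Minimal event log with real deletes (Map.delete removes the entry).
class EventLog {
  constructor() {
    this.events = new Map();
  }
  append(event) {
    this.events.set(event.id, event);
  }
  forget(eventId) {
    return this.events.delete(eventId);
  }
  retrieve() {
    return [...this.events.values()];
  }
}

const input = [
  { id: "e_1", text: "I prefer Postgres for the new project, skip Mongo." },
  { id: "e_2", text: "Use Redis as the cache layer with a 5 minute TTL." },
];
const sham = new ShamConsolidator();
const log = new EventLog();
for (const event of input) {
  sham.ingest(event);
  log.append(event);
}

sham.forget("e_2");
log.forget("e_2");
console.log("sham still mentions Redis:", sham.retrieve().includes("Redis"));
console.log("log still has e_2:", log.retrieve().some((e) => e.id === "e_2"));
```

After both `forget` calls, the summary string still contains the Redis note while the event log no longer holds e_2.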

End-to-end Claude demo

A small runnable demo wires EpisodicStore, OnDemandSummarizer, and the Anthropic SDK together:

npm install @anthropic-ai/sdk
ANTHROPIC_API_KEY=sk-ant-... node examples/claude-agent.js

The demo shows two sessions, retrieval across them, the summary printed before injection (so you can see exactly what's going into Claude's context), and a real delete that removes a memory with no tombstone left behind. Source: examples/claude-agent.js.

Postgres adapter (production backend)

The default EpisodicStore is in-memory. For production, swap in PostgresEpisodicStore (same interface, real deletes via DELETE):

import pg from "pg";
import { PostgresEpisodicStore } from "@mukundakatta/agentmemory/postgres";

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
const store = new PostgresEpisodicStore({ pool, embedder: myEmbedder });
await store.init();  // creates `agentmemory_events` table + indexes if missing

await store.append({ sessionId: "user-42", kind: "user_message", text: "hi" });
const hits = await store.retrieve("greetings", { sessionId: "user-42", topK: 5 });
await store.deleteEvent(hits[0].id);  // real delete, no tombstone

Schema is documented in src/adapters/postgres.js. Works on plain Postgres; if you have pgvector you can swap the embedding FLOAT8[] column for vector(N) and rewrite the retrieve ORDER BY for indexed cosine.

Peer dependency: npm install pg.
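
For the pgvector route, the retrieve query might look roughly like the sketch below. It only builds the parameterized query; `buildCosineQuery` is a hypothetical helper, and the column names (`id`, `session_id`, `text`, `embedding`) are assumptions, so check them against the schema in src/adapters/postgres.js. The `<=>` operator is pgvector's cosine distance.

```javascript
// Sketch only: builds a parameterized query for pgvector cosine ordering.
// `agentmemory_events` is the table created by init(); column names are assumed.
function buildCosineQuery(queryEmbedding, { sessionId, topK = 5 } = {}) {
  // pgvector accepts a vector literal such as '[0.1,0.2,0.3]'.
  const vectorLiteral = `[${queryEmbedding.join(",")}]`;
  const values = [vectorLiteral];
  let where = "";
  if (sessionId !== undefined) {
    values.push(sessionId);
    where = `WHERE session_id = $${values.length}`;
  }
  values.push(topK);
  // 1 - cosine distance = cosine similarity, reported as the score.
  const text = `
    SELECT id, text, 1 - (embedding <=> $1::vector) AS score
    FROM agentmemory_events
    ${where}
    ORDER BY embedding <=> $1::vector
    LIMIT $${values.length}`;
  return { text, values };
}

const cosineQuery = buildCosineQuery([0.1, 0.2, 0.3], {
  sessionId: "user-42",
  topK: 5,
});
console.log(cosineQuery.text.trim());
console.log(cosineQuery.values);
// With a pg Pool: const { rows } = await pool.query(cosineQuery);
```

For the ORDER BY to use an index, the column needs a pgvector index built with `vector_cosine_ops` (HNSW or IVFFlat). Without one, the query still works but scans the table.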

Testing

npm test            # in-memory store + summarizer + drift + example smoke (25 tests)
npm run test:postgres   # Postgres adapter (skipped unless DATABASE_URL set)
npm run test:all    # everything

25 in-memory tests + 9 Postgres tests, all passing. Tests cover:

  • EpisodicStore: append, embed, retrieve (cosine + keyword fallback), filters (session, time, kind), deleteEvent, deleteSession, deleteOlderThan, sessions
  • OnDemandSummarizer: requires LLM, empty-events shortcut, prompt structure, summary trim, custom system prompt
  • MemoryDriftWatcher: cold-start, stable scores, drop alert, sliding window, reset
  • Integration: end-to-end flow + drift watcher catching memory-quality decay
  • Example smoke: visible_retrieval_demo.js and vs_background_consolidator.js end-to-end, with assertions on the retrieved blocks, the forget round-trip, and the three sham failure modes

License

MIT. See LICENSE.

Repository Health

This repository includes a dependency-free health check for core documentation, metadata, and CI wiring. Run it locally before publishing changes:

python3 scripts/check_repository_health.py

The same check runs in GitHub Actions on pushes and pull requests.
