Hoard everything. Find anything.
A private, self-hosted "save anything" inbox with a brain —
every link, screenshot, snippet and thought you throw at it is understood, filed, and answerable.
100% on your own hardware. No accounts. No API keys. No cloud.
You already have a place where you dump things — a Discord channel with one member, Telegram "Saved Messages", a notes file called stuff.txt. The problem is never saving. It's finding.
Stash replaces that dump with an inbox that reads everything you save. Paste anything, hit Enter, done — it's instantly stored, then a local AI classifies it, titles it, summarizes it, tags it, and indexes it for semantic search in the background. Later, switch to Ask mode and just talk to your stash:
"what was that article about ARM SVE?" "what plant do I need to water, and when?" "show me that pasta recipe with the burnt butter"
Answers come from your own saved items only, with the source cards cited and highlighted.
- Instant saves, lazy AI — items appear the moment you hit Enter (heuristic-classified); enrichment, captioning, and indexing happen in a background queue. Nothing blocks, nothing gets lost.
- Automatic understanding — every item gets a type (
link/video/code/quote/reminder/image/text), a topical category, a title, a one-line summary, and tags. Reminders get a parsed ISO due date. - Searchable images — paste a screenshot and a local vision model captions it so it shows up in search and answers. The vision model loads on demand and unloads itself when idle.
- Rich link & video cards — oEmbed + OpenGraph metadata (title, description, thumbnail) fetched once and cached locally, so YouTube videos and articles look like cards, not URLs.
- Ask mode with citations — questions are embedded, the closest items retrieved by cosine similarity, and the LLM answers using only those items, citing each by number.
- Chat history — every conversation is saved and browsable; pick up where you left off or prune what you don't need.
- Model settings — swap the language, embedding, and vision models from curated presets in the UI, with device badges (
PI ★/M2 ★), real download sizes, and live progress. Switching the embedding model re-indexes everything automatically. - Private by construction — all inference runs on-device via QVAC. Notes live in a flat JSON file, uploads in a folder. Nothing ever leaves the machine.
git clone https://github.com/savewithstash/stash.git
cd stash
npm install
npm start # → http://localhost:5173The UI is usable immediately. On first launch Stash downloads ~1.3 GB of default model weights in the background (progress bar in the UI); AI classification and Ask mode switch on automatically at Ready.
docker run -d --name stash \
-p 5173:5173 \
-v stash-data:/app/data \
-v stash-models:/app/models \
savewithstash/stash:1.0.0Stash ships in a community app store. In umbrelOS: App Store → ⋯ → Community App Stores, add
https://github.com/savewithstash/umbrel-app-store
then install Stash from the store.
Everything is configurable from the Settings tab — pick per role, watch it download, keep using the app while it swaps.
| Role | Default | Raspberry Pi pick | Apple Silicon pick |
|---|---|---|---|
| Language — classifies & answers | Qwen3 1.7B | Qwen3 0.6B / Llama 3.2 1B | Qwen3 4B |
| Embedding — semantic search | EmbeddingGemma Q8 | EmbeddingGemma Q4 | GTE Large |
| Vision — image captions | Qwen3-VL 2B | SmolVLM2 0.5B | Qwen3-VL 2B |
Rule of thumb: on a Pi, the starred Pi trio is ~1.5 GB total and keeps everything responsive; on an M-series Mac the defaults are near-instant and the 4B language model is a free quality upgrade.
What to expect on small hardware: a Raspberry Pi 5 produces a few tokens per second — fine for short answers, not a chat firehose. 8 GB RAM recommended (4 GB works if you skip images).
┌─ public/ React UI (in-browser Babel — no build step, fully offline vendored)
├─ server.js dependency-free Node HTTP server + JSON API
├─ lib/
│ ├─ qvac.js model manager: presets, load/swap, classify, embed, caption, answer
│ ├─ store.js notes persistence (data/notes.json) + cosine-similarity retrieval
│ ├─ meta.js oEmbed / OpenGraph fetcher with local thumbnail cache
│ ├─ chats.js saved conversations (data/chats.json)
│ └─ settings.js model selection persistence (data/settings.json)
├─ data/ your notes, uploads, chats, settings (git-ignored)
└─ models/ cached model weights (git-ignored)
Saving is a two-phase pipeline: the note is persisted instantly with a heuristic type and title, then a FIFO background queue enriches it — fetch link metadata, caption images, LLM-classify (grammar-constrained JSON), embed, update in place. If a model is still downloading or fails, Stash degrades gracefully to the heuristics instead of erroring.
Storage is honest and portable: flat JSON with embeddings inline. Good to a few thousand items; swap lib/store.js for a vector DB if you hoard harder than that.
Stash is a deliberately small showcase of @qvac/sdk — four SDK calls do all the AI:
| SDK API | Used for |
|---|---|
loadModel({ modelSrc, modelType, onProgress }) |
Loads the three models from the QVAC registry with live download progress; the vision model adds projectionModelSrc for its mmproj and loads lazily on first image |
completion({ modelId, history, responseFormat }) |
Classification uses responseFormat: { type: 'json_schema' } — the LLM's output is grammar-constrained to a JSON schema, so parsing never fails. Vision calls attach the saved file via attachments: [{ path }] |
embed({ modelId, text }) |
Embeds every saved item and every question |
unloadModel / close |
Idle vision unload + clean shutdown |
Model weights are kept inside the project (./models/): on boot the server writes qvac.config.json with cacheDirectory and sets QVAC_CONFIG_PATH before importing the SDK.
The linux/arm64 image carries two hard-won fixes worth knowing about:
- Slimming —
@qvac/*packages ship native prebuilds for every platform (iOS/Android/Windows/macOS/Linux). The build strips all butlinux-arm64, cutting the image from 7.6 GB to ~3.4 GB. - ARM SVE workaround — the
linux-arm64prebuild of@qvac/translation-nmtcppcontains SVE instructions, which SIGILL on any CPU without SVE (every Raspberry Pi, Apple Silicon under emulation) and kill the QVAC worker at startup. Stash never uses translation, so the build replaces that one native binding with a JS stub (docker/nmtcpp-binding-stub.js). Remove the stub once upstream ships non-SVE (or runtime-dispatched) arm64 builds.
docker buildx build --platform linux/arm64 -t savewithstash/stash:1.0.0 .Issues and PRs welcome. The codebase is intentionally small and dependency-free on the server side — please keep it that way. If you're adding a feature, open an issue first so we can talk shape. Humans reviewed AI PRs are welcome.
MIT © 2026 Suryaansh Singh