A minimal LLM chatbot for Vector built with:
- Vector SDK (Vector Bot client)
- dotenvy for environment config
- reqwest for OpenAI-compatible LLM calls
- Serde + std::fs for simple per-chat JSON memory stores
Core behavior:
- Per-chat memory: one JSON file per chat under `data/`, retaining the system prompt and up to N recent turns
- LLM calls use the OpenAI-compatible `/chat/completions` schema, so you can point it to OpenAI, llama.cpp servers, Groq, etc.
- Listens for gift-wrap (NIP-59) events, unwraps via the Vector SDK, processes direct messages only, and replies privately
- `Cargo.toml` — crate metadata and dependencies
- `src/main.rs` — Vector bot bootstrap, subscription loop, unwrap and reply flow
- `src/config.rs` — loads configuration from environment variables
- `src/llm.rs` — OpenAI-compatible client
- `src/memory.rs` — simple per-chat JSON memory with trimming
- `.env.example` — template for configuration
- `data/` — per-chat memory files (gitignored)
- Rust (stable) and Cargo installed
- Network access to your LLM endpoint (OpenAI, local llama.cpp, etc.)
Copy the example environment file and edit values:
```
cp .env.example .env
```

Environment variables (defaults shown):
- `LLM_BASE_URL=https://api.openai.com/v1` - Base URL for an OpenAI-compatible API. Examples:
  - OpenAI: `https://api.openai.com/v1`
  - llama.cpp server: `http://localhost:8080/v1`
  - Groq (OpenAI-compatible): `https://api.groq.com/openai/v1`
- `LLM_API_KEY=sk-...` (optional for local servers)
- `LLM_MODEL=gpt-4o-mini` (e.g., `llama-3.1-8b-instruct`)
- `LLM_TEMPERATURE=0.2`
- `HISTORY_LIMIT=16` (number of user/assistant messages to keep per chat)
- `DATA_DIR=data`
- `SYSTEM_PROMPT="You are a helpful assistant. Keep responses concise and factual."`
- `VECTOR_SECRET_KEY=` (optional; nsec bech32 or hex). If unset, ephemeral keys are generated
- `TYPING_INDICATOR=true` (send a kind 30078 typing indicator before replying)
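As a rough illustration, `src/config.rs` might assemble these settings with `dotenvy` and fall back to the defaults above. This is a minimal sketch; the struct, field, and function names here are assumptions, not necessarily what the crate actually uses:

```rust
use std::env;

/// Runtime configuration assembled from environment variables.
/// Names are illustrative; the real src/config.rs may differ.
#[derive(Debug, Clone)]
pub struct Config {
    pub llm_base_url: String,
    pub llm_api_key: Option<String>,
    pub llm_model: String,
    pub llm_temperature: f32,
    pub history_limit: usize,
    pub data_dir: String,
    pub system_prompt: String,
    pub vector_secret_key: Option<String>,
    pub typing_indicator: bool,
}

impl Config {
    pub fn from_env() -> Self {
        // Load .env if present; ignore the error when the file is missing.
        let _ = dotenvy::dotenv();

        // Read a variable or fall back to the documented default.
        let get = |key: &str, default: &str| env::var(key).unwrap_or_else(|_| default.to_string());

        Self {
            llm_base_url: get("LLM_BASE_URL", "https://api.openai.com/v1"),
            llm_api_key: env::var("LLM_API_KEY").ok().filter(|s| !s.is_empty()),
            llm_model: get("LLM_MODEL", "gpt-4o-mini"),
            llm_temperature: get("LLM_TEMPERATURE", "0.2").parse().unwrap_or(0.2),
            history_limit: get("HISTORY_LIMIT", "16").parse().unwrap_or(16),
            data_dir: get("DATA_DIR", "data"),
            system_prompt: get(
                "SYSTEM_PROMPT",
                "You are a helpful assistant. Keep responses concise and factual.",
            ),
            vector_secret_key: env::var("VECTOR_SECRET_KEY").ok().filter(|s| !s.is_empty()),
            typing_indicator: get("TYPING_INDICATOR", "true").parse().unwrap_or(true),
        }
    }
}
```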
```
cargo run
```

On start, the bot will:
- Load environment config
- Create keys (generate if `VECTOR_SECRET_KEY` is not set)
- Create a VectorBot with default metadata and relays
- Subscribe to gift-wrap events and process direct messages
- For each message:
  - Load per-chat memory (JSON) and append the user message
  - Trim memory to `HISTORY_LIMIT`
  - Call your configured LLM (`/chat/completions`), as sketched below
  - Persist the assistant reply and send it back privately
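The LLM step is a plain OpenAI-style chat-completions POST via reqwest. Below is a minimal sketch of what `src/llm.rs` could look like, assuming reqwest with its `json` feature; the types and function signature are illustrative assumptions, not the module's actual API:

```rust
use serde::{Deserialize, Serialize};

/// One chat message in the OpenAI /chat/completions format.
#[derive(Serialize, Deserialize, Clone)]
pub struct ChatMessage {
    pub role: String, // "system" | "user" | "assistant"
    pub content: String,
}

#[derive(Serialize)]
struct ChatRequest<'a> {
    model: &'a str,
    messages: &'a [ChatMessage],
    temperature: f32,
}

#[derive(Deserialize)]
struct ChatResponse {
    choices: Vec<Choice>,
}

#[derive(Deserialize)]
struct Choice {
    message: ChatMessage,
}

/// POST {base_url}/chat/completions and return the first choice's content.
pub async fn complete(
    client: &reqwest::Client,
    base_url: &str,
    api_key: Option<&str>,
    model: &str,
    temperature: f32,
    messages: &[ChatMessage],
) -> Result<String, Box<dyn std::error::Error>> {
    let url = format!("{}/chat/completions", base_url.trim_end_matches('/'));

    let mut req = client.post(&url).json(&ChatRequest { model, messages, temperature });
    if let Some(key) = api_key {
        // Local servers such as llama.cpp usually accept requests without this header.
        req = req.bearer_auth(key);
    }

    let resp: ChatResponse = req.send().await?.error_for_status()?.json().await?;
    let content = resp
        .choices
        .first()
        .map(|c| c.message.content.clone())
        .ok_or("LLM returned no choices")?;
    Ok(content)
}
```

Because the request and response shapes follow the standard OpenAI schema, the same call works against OpenAI, llama.cpp's server, Groq, or any other compatible endpoint just by changing `LLM_BASE_URL`.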
Run your llama.cpp server with OpenAI-compatible HTTP (ensure `/v1/chat/completions` is served):
```
# Example; your binary/flags may differ
./llama.cpp/server -m ./models/llama-3.1-8b-instruct.gguf --port 8080 --api
```

Use a `.env` like:

```
LLM_BASE_URL=http://localhost:8080/v1
LLM_MODEL=llama-3.1-8b-instruct
LLM_TEMPERATURE=0.2
HISTORY_LIMIT=12
DATA_DIR=data
TYPING_INDICATOR=true
# No API key required for local server
```
Then:
```
cargo run
```

For OpenAI, use a `.env` like:

```
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=sk-...
LLM_MODEL=gpt-4o-mini
HISTORY_LIMIT=16
DATA_DIR=data
TYPING_INDICATOR=true
```
Then:

```
cargo run
```

For a chat ID (sender npub), the bot stores:
```
data/npub1abc...xyz.json
```
It contains:
- system prompt (string)
- history_limit (usize)
- messages: array of user/assistant messages
Oldest turns are trimmed when exceeding `HISTORY_LIMIT`. The system prompt is not counted against this limit.
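A sketch of that file's shape and the trimming rule, assuming serde/serde_json and illustrative type names (the real `src/memory.rs` may organize this differently):

```rust
use serde::{Deserialize, Serialize};
use std::{fs, path::Path};

#[derive(Serialize, Deserialize)]
pub struct ChatMemory {
    /// System prompt stored alongside the history; never trimmed.
    pub system_prompt: String,
    /// Maximum number of user/assistant messages to retain.
    pub history_limit: usize,
    /// Conversation turns, oldest first.
    pub messages: Vec<StoredMessage>,
}

#[derive(Serialize, Deserialize, Clone)]
pub struct StoredMessage {
    pub role: String, // "user" or "assistant"
    pub content: String,
}

impl ChatMemory {
    /// Load an existing memory file, if present and valid JSON.
    pub fn load(path: &Path) -> Option<Self> {
        let bytes = fs::read(path).ok()?;
        serde_json::from_slice(&bytes).ok()
    }

    /// Drop the oldest turns until the history fits within the limit.
    pub fn trim(&mut self) {
        if self.messages.len() > self.history_limit {
            let excess = self.messages.len() - self.history_limit;
            self.messages.drain(..excess);
        }
    }

    /// Persist as pretty-printed JSON, e.g. data/<npub>.json.
    pub fn save(&self, path: &Path) -> std::io::Result<()> {
        let json = serde_json::to_string_pretty(self)
            .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?;
        fs::write(path, json)
    }
}
```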
- Unwrapping messages uses the Vector SDK (no direct nostr-sdk usage required in your code).
- The bot filters only `Kind::PrivateDirectMessage` after unwrap and ignores other kinds.
- If `TYPING_INDICATOR=true`, it sends a brief typing indicator before the LLM completion.
- Startup errors about missing environment variables usually mean your `.env` isn't loaded; confirm it exists and its values are present.
- If the LLM returns HTTP 401/403, confirm `LLM_API_KEY` and `LLM_BASE_URL` are correct.
- For local servers, confirm they implement `/v1/chat/completions` with the standard OpenAI payload/response shape.
MIT