Configuration

Config File

Default location:

| Platform | Path |
| --- | --- |
| macOS / Linux | ~/.msgvault/config.toml |
| Windows | C:\Users\<you>\.msgvault\config.toml |

Override the data directory with the MSGVAULT_HOME environment variable or the --home flag (see Overriding the Home Directory).

[data]
# Base data directory (default: ~/.msgvault)
data_dir = "/path/to/msgvault/data"
# Database URL (default: {data_dir}/msgvault.db)
database_url = "/path/to/msgvault.db"
[oauth]
# Path to Google OAuth client secrets JSON for browser OAuth
client_secrets = "/path/to/client_secret.json"
# Google service account key for Workspace domain-wide delegation (optional)
# service_account_key = "/path/to/service-account.json"
# Named OAuth apps for Google Workspace orgs (optional)
[oauth.apps.acme]
client_secrets = "/path/to/acme_workspace_secret.json"
# service_account_key = "/path/to/acme_service_account.json"
[microsoft]
# Azure AD app registration client ID (required for M365)
client_id = "your-azure-app-client-id"
# tenant_id = "your-tenant-id" # optional, default "common"
[log]
# Persistent structured file logging (opt-in)
enabled = true
# dir = "/path/to/logs" # default: <data_dir>/logs
# level = "info" # debug, info, warn, error
# sql_trace = false # log every SQL query (verbose)
# sql_slow_ms = 100 # slow query threshold in ms
[sync]
# Gmail API rate limit (requests per second)
rate_limit_qps = 5
[server]
# API server settings (used by `msgvault serve`)
api_port = 8080
bind_addr = "127.0.0.1"
api_key = "your-secret-key"
[remote]
# Remote msgvault endpoint for CLI remote mode
url = "http://nas-ip:8080"
api_key = "remote-api-key"
allow_insecure = true
# Scheduled sync accounts (email is required for each entry)
[[accounts]]
email = "you@example.com"
schedule = "0 * * * *"
enabled = true
[vector]
# Semantic and hybrid search (opt-in; requires a build with sqlite_vec)
enabled = true
backend = "sqlite-vec"
[vector.embeddings]
endpoint = "http://localhost:11434/v1"
model = "nomic-embed-text"
dimension = 768
eta_window = 10
[vector.preprocess]
strip_quotes = true
strip_signatures = true
strip_html = true
strip_base64 = true
strip_url_tracking = true
collapse_whitespace = true
[[synctech_sms.sources]]
name = "phone-backups"
enabled = true
backend = "drive"
folder_id = "google-drive-folder-id"
google_account = "you@example.com"
owner_phone = "+14155551234"
schedule = "30 4 * * *"

Windows Paths

TOML treats backslashes inside double-quoted strings as escape characters. On Windows, this means native paths like "C:\Users\you\..." cause a parse error: \U starts a Unicode escape, which must be followed by eight hexadecimal digits.

Use one of these formats instead:

# Forward slashes (recommended)
client_secrets = "C:/Users/you/Downloads/client_secret.json"
# Single-quoted string (backslashes are literal)
client_secrets = 'C:\Users\you\Downloads\client_secret.json'

Sections

[data]

| Key | Default | Description |
| --- | --- | --- |
| data_dir | ~/.msgvault | Base directory for all data |
| database_url | {data_dir}/msgvault.db | SQLite database path |

Attachments and OAuth tokens are stored in subdirectories of data_dir (attachments/ and tokens/ respectively). These paths are not independently configurable.

[oauth]

| Key | Default | Description |
| --- | --- | --- |
| client_secrets | | Path to Google OAuth client_secret.json for browser OAuth flows |
| service_account_key | | Path to a Google service account key JSON for Workspace domain-wide delegation |

[oauth.apps.<name>]

Named OAuth apps for Google Workspace organizations that require their own OAuth credentials. Each entry can define a separate browser OAuth client_secret.json, service account key, or both. Use --oauth-app <name> with add-account to bind an account to a named app.

| Key | Default | Description |
| --- | --- | --- |
| client_secrets | | Path to the org’s client_secret.json |
| service_account_key | | Path to the org’s Google service account key JSON |

See OAuth Setup: Google Workspace Accounts for when and why you need named apps.

When service_account_key is configured, msgvault add-account <email> validates the delegated Gmail profile and registers the account without storing a per-user refresh token. On Unix-like systems, the service account key file must be accessible only by its owner, for example chmod 600 /path/to/service-account.json.
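
As a minimal sketch of the named-app layout described above, the fragment below defines two apps. The app names acme and globex and all paths are hypothetical placeholders; only the key names come from this page.

```toml
# Hypothetical example: two Workspace orgs, each with its own credentials.

[oauth.apps.acme]
# Browser OAuth only: accounts bound to this app go through the browser flow.
client_secrets = "/path/to/acme_workspace_secret.json"

[oauth.apps.globex]
# Service account only: add-account validates the delegated Gmail profile
# and stores no per-user refresh token.
service_account_key = "/path/to/globex_service_account.json"
```

An account is then bound to one of these apps by passing --oauth-app acme or --oauth-app globex to add-account.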

[microsoft]

Configuration for Microsoft 365 / Outlook.com OAuth. Required only if you use add-o365.

| Key | Default | Description |
| --- | --- | --- |
| client_id | | Azure AD Application (client) ID (required) |
| tenant_id | common | Azure AD tenant ID; common allows both personal and org accounts |

See OAuth Setup: Microsoft 365 for app registration steps.

[log]

Structured file logging, disabled by default. Enable it to get persistent, machine-readable logs for troubleshooting. Each CLI invocation stamps every log line it writes with a unique run_id, so you can trace a single run across shared daily log files.

| Key | Default | Description |
| --- | --- | --- |
| enabled | false | Turn on persistent file logging. Setting dir also enables it implicitly. |
| dir | <data_dir>/logs | Directory for log files |
| level | info | Log level: debug, info, warn, error |
| sql_trace | false | Log every SQL query at info level (verbose, for debugging) |
| sql_slow_ms | 100 | Threshold in ms above which SQL queries are logged at warn level. 0 uses the built-in default (100 ms). |

Log files are named msgvault-YYYY-MM-DD.log (UTC date), written as newline-delimited JSON. When a daily log exceeds 50 MiB it rotates to .log.1, .log.2, etc. (up to 5 rotated files).

When SQL logging is enabled, slow/error entries include query arguments and streaming query durations, which makes it easier to diagnose expensive reads without enabling full trace output.

Use msgvault logs to view and tail log files. See CLI Reference: logs.
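
For a temporary troubleshooting session, a configuration along these lines might be a reasonable starting point. The 250 ms threshold is an arbitrary illustrative value; the keys are the ones listed in the table above.

```toml
[log]
enabled = true
level = "debug"
# Log every SQL query at info level. Verbose, so turn it off again afterwards.
sql_trace = true
# Log queries slower than 250 ms at warn level (illustrative threshold).
sql_slow_ms = 250
```

Because dir is omitted, log files go to the default <data_dir>/logs directory.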

[sync]

| Key | Default | Description |
| --- | --- | --- |
| rate_limit_qps | 5 | Gmail API requests per second |

[server]

Settings for the web server started by msgvault serve. See Web Server for endpoint documentation.

| Key | Default | Description |
| --- | --- | --- |
| api_port | 8080 | Port the server listens on |
| bind_addr | 127.0.0.1 | Bind address |
| api_key | | API key for authentication |
| allow_insecure | false | Allow non-loopback binding without api_key |
| cors_origins | [] | Allowed CORS origins |
| cors_credentials | false | Allow credentials in CORS requests |
| cors_max_age | 0 | CORS preflight cache duration in seconds |
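
The fragment below sketches a server that listens on all interfaces and accepts browser requests from a single origin. The origin URL and the 600-second cache duration are hypothetical; api_key is set because allow_insecure is left at its default of false.

```toml
[server]
api_port = 8080
# Non-loopback address: reachable from other machines on the network.
bind_addr = "0.0.0.0"
# Set an API key rather than enabling allow_insecure.
api_key = "your-secret-key"
# Hypothetical web front end allowed to call the API from a browser.
cors_origins = ["https://mail.example.internal"]
cors_credentials = false
# Let browsers cache CORS preflight responses for 10 minutes.
cors_max_age = 600
```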

[remote]

When set, CLI commands read from the remote server by default for supported operations.

| Key | Default | Description |
| --- | --- | --- |
| url | | Remote API base URL (e.g. http://nas-ip:8080) |
| api_key | | API key used by remote commands |
| allow_insecure | false | Allow HTTP remote connections |

Affected CLI commands: search, show-message, stats, list-accounts, tui.

[[accounts]]

Scheduled sync accounts for the web server. Each [[accounts]] entry defines a cron schedule for automatic background syncing. Gmail and IMAP sources are supported; for IMAP, use the account display name/email when available rather than the raw imaps://... source identifier.

| Key | Default | Description |
| --- | --- | --- |
| email | (required) | Account identifier or display name to sync |
| schedule | | Cron expression for sync schedule (e.g., 0 * * * *) |
| enabled | true | Whether scheduled sync is active for this account |
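
Since [[accounts]] is a TOML array of tables, each account gets its own [[accounts]] header. A minimal sketch with two entries follows; the addresses and schedules are placeholders.

```toml
# First account: sync at the top of every hour.
[[accounts]]
email = "you@example.com"
schedule = "0 * * * *"

# Second account: a daily 02:30 schedule is defined, but scheduled sync
# is switched off for now.
[[accounts]]
email = "archive@example.com"
schedule = "30 2 * * *"
enabled = false
```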

SyncTech SMS Sources

Scheduled SMS Backup & Restore sources are configured with [[synctech_sms.sources]] entries. These are created automatically by msgvault add-synctech-sms-drive, but can also be edited directly.

| Key | Default | Description |
| --- | --- | --- |
| name | (required) | Source name used by sync-synctech-sms <name> and scheduler logs |
| enabled | true | Whether the source is active |
| backend | local | local for a path on disk, or drive for Google Drive |
| path | | Local XML/ZIP file or directory when backend = "local" |
| folder_id | | Google Drive folder ID when backend = "drive" |
| google_account | | Google account used for Drive access |
| owner_phone | (required) | Owner phone number in E.164 format |
| schedule | | Cron expression used by msgvault serve |
| include_sms | true | Import SMS records |
| include_mms | true | Import MMS records |
| include_calls | true | Import call logs |
| include_attachments | true | Import MMS attachments |
| stable_after | 10m | How long Drive files must remain unchanged before import |
| oauth_app | | Named Google OAuth app to use |
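
The sample configuration at the top of this page shows a drive source. For comparison, a local source might look like the sketch below; the source name and directory path are hypothetical.

```toml
[[synctech_sms.sources]]
name = "local-backups"
backend = "local"
# Directory (or a single XML/ZIP file) containing the backups.
path = "/path/to/sms-backups"
owner_phone = "+14155551234"
schedule = "30 4 * * *"
# Import SMS and MMS text only; skip call logs and MMS attachments.
include_calls = false
include_attachments = false
```

The Drive-specific keys (folder_id, google_account) are left out because they apply only when backend = "drive".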

[vector]

Top-level toggle and backend selection for semantic/hybrid search. Requires a build with sqlite_vec support (default via make build). See Vector Search for prerequisites, initial embedding, and the full workflow.

| Key | Default | Description |
| --- | --- | --- |
| enabled | false | Turn on vector and hybrid search. When false, mode=vector and mode=hybrid return vector_not_enabled. |
| backend | sqlite-vec | Vector backend. Only sqlite-vec is supported. |
| db_path | <data_dir>/vectors.db | Path to the co-located vectors database. |

[vector.embeddings]

External OpenAI-compatible embedding endpoint used to convert message text into vectors. msgvault does not host a model; it calls the endpoint you configure. Use a local or self-hosted endpoint (Ollama, llama.cpp server, LM Studio, etc.) when message text must stay on your machine or network. Hosted endpoints also work but receive the text being embedded.

| Key | Default | Description |
| --- | --- | --- |
| endpoint | (required) | HTTP(S) base URL for an OpenAI-compatible embeddings API. msgvault appends /embeddings (for example, set http://localhost:11434/v1, not .../embeddings). |
| model | (required) | Model name to pass in each request (e.g., nomic-embed-text). |
| dimension | (required) | Vector dimension. Must match the model’s output dimension. |
| api_key_env | | Name of an environment variable containing the API key. Omit for anonymous endpoints. |
| batch_size | 32 | Embedding inputs per HTTP call. Long messages can contribute multiple chunk inputs. |
| timeout | 30s | Per-request timeout. |
| max_retries | 3 | Retries per batch on transient failures. |
| max_input_chars | 32768 | Character cap per embedding chunk. Set below your model’s context window (e.g., 2000 for Ollama’s default nomic-embed-text). |
| eta_window | 10 | Number of recent progress samples used for ETA smoothing. |

The index generation fingerprint includes the model, dimension, preprocessing settings, max_input_chars, and embedding policy. Changing those settings triggers a stale-index error on the next vector/hybrid query until you run msgvault embeddings build --full-rebuild.
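
Putting the table together for a local Ollama endpoint, a configuration might look like the following sketch. The batch_size value and the EMBEDDINGS_API_KEY variable name are illustrative assumptions; the endpoint, model, dimension, and the 2000-character cap are the examples used on this page.

```toml
[vector.embeddings]
# Base URL only; msgvault appends /embeddings itself.
endpoint = "http://localhost:11434/v1"
model = "nomic-embed-text"
dimension = 768
# Keep chunks below the model's context window.
max_input_chars = 2000
# Smaller batches than the default of 32 (illustrative value).
batch_size = 16
# For an endpoint that needs a key, name the environment variable holding it.
# The variable name here is hypothetical.
# api_key_env = "EMBEDDINGS_API_KEY"
```

Note that model, dimension, and max_input_chars are part of the index fingerprint, so changing any of them on an existing index requires msgvault embeddings build --full-rebuild.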

[vector.preprocess]

Controls text normalization before embedding.

| Key | Default | Description |
| --- | --- | --- |
| strip_quotes | true | Drop quoted reply blocks (> ... lines, reply preambles) before embedding. |
| strip_signatures | true | Drop trailing signature blocks (content after -- ). |
| strip_html | true | Convert HTML-only bodies to text and remove HTML markup before embedding. |
| strip_base64 | true | Remove base64/data blobs before HTML stripping so encoded data does not crowd out prose. |
| strip_url_tracking | true | Remove common tracking parameters such as utm_*, fbclid, and gclid from URLs. |
| collapse_whitespace | true | Normalize repeated horizontal whitespace and blank lines. |

[vector.search]

Hybrid ranking parameters applied at query time.

| Key | Default | Description |
| --- | --- | --- |
| rrf_k | 60 | Reciprocal Rank Fusion constant. Higher values flatten score differences between signals. |
| k_per_signal | 100 | Candidate pool size drawn from each signal (BM25 or vector) before fusion. |
| subject_boost | 2.0 | Multiplier applied when a query term matches a message’s subject line. |
| max_page_size_hybrid | 50 | Hard cap on page_size for vector/hybrid responses. Set to 0 to disable clamping. |
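
The sample configuration at the top of this page has no [vector.search] section, so here is a sketch of one. The values for k_per_signal and subject_boost are illustrative departures from the defaults, not recommendations.

```toml
[vector.search]
# Default fusion constant.
rrf_k = 60
# Draw a larger candidate pool from each signal (BM25 and vector) before fusion.
k_per_signal = 200
# Weight subject-line matches less heavily than the default of 2.0.
subject_boost = 1.5
# Keep the default cap on page_size for vector/hybrid responses.
max_page_size_hybrid = 50
```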

[vector.embed.schedule]

Optional background scheduling for the embed worker inside msgvault serve. Empty config disables scheduled embedding; you can still run msgvault embeddings build by hand.

| Key | Default | Description |
| --- | --- | --- |
| cron | | 5-field cron expression. Empty string disables the standalone cron. |
| run_after_sync | false | When true, an embed pass runs after every successful scheduled sync. |
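
A minimal sketch that combines both triggers follows; the cron expression is an arbitrary example.

```toml
[vector.embed.schedule]
# Standalone embed pass at 15 minutes past every hour.
cron = "15 * * * *"
# Also run an embed pass after each successful scheduled sync.
run_after_sync = true
```

Either key can be used on its own. Both take effect only inside msgvault serve.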

Overriding the Home Directory

By default, msgvault stores everything under ~/.msgvault (macOS/Linux) or C:\Users\<you>\.msgvault (Windows). To use a different location, you have two options:

--home flag (per-command):

msgvault sync --home /mnt/data/msgvault

MSGVAULT_HOME environment variable (persistent):

export MSGVAULT_HOME=/mnt/data/msgvault

Both options are equivalent: config.toml is loaded from the specified directory, and all data (database, tokens, attachments) is stored there. The --home flag takes priority over MSGVAULT_HOME.

Environment Variables

| Variable | Description |
| --- | --- |
| MSGVAULT_HOME | Base directory for all data (default: ~/.msgvault) |
| MSGVAULT_REMOTE_URL | Remote URL for export-token (flag > env > config) |
| MSGVAULT_REMOTE_API_KEY | Remote API key for export-token (flag > env > config) |

File Locations

All data lives under the msgvault home directory (~/.msgvault on macOS/Linux, C:\Users\<you>\.msgvault on Windows). The directory is created automatically on first use.

| File | Description |
| --- | --- |
| config.toml | Configuration file |
| msgvault.db | SQLite database (system of record) |
| attachments/ | Content-addressed attachment files |
| tokens/ | OAuth tokens per account |
| logs/ | Structured log files (when file logging is enabled) |
| analytics/ | Parquet cache files for TUI |