JasonHonKL/PardusDB

PardusDB

A fast, SQLite-like embedded vector database with graph-based approximate nearest neighbor search
Open-source project from the team behind Pardus AI

PardusDB is designed for developers building local AI applications — RAG pipelines, semantic search, recommendation systems, or any project that needs lightweight, persistent vector storage without external dependencies.

While Pardus AI gives non-technical users a powerful no-code platform to ask questions of their CSV, JSON, and PDF data in plain English, PardusDB gives developers the same speed and privacy in an embeddable, fully open-source vector database.

Features

  • Single-file storage — Everything lives in one .pardus file, just like SQLite
  • Multiple tables — Store different vector dimensions and metadata in the same database
  • Familiar SQL-like syntax — CREATE, INSERT, SELECT, UPDATE, DELETE feel natural
  • UNIQUE constraints — O(1) duplicate detection using HashSet
  • GROUP BY with aggregates — O(n) hash aggregation with COUNT, SUM, AVG, MIN, MAX
  • JOINs — O(n+m) hash join algorithm for INNER, LEFT, RIGHT joins
  • Fast vector similarity search — Graph-based approximate nearest neighbor search
  • Thread-safe — Safe concurrent reads in multi-threaded applications
  • Full transactions — BEGIN/COMMIT/ROLLBACK for atomic operations
  • Optional GPU acceleration — For large batch inserts and queries
  • Zero external dependencies — Pure Rust, MIT licensed

Installation

Quick Install (Recommended)

git clone https://github.com/pardus-ai/pardusdb
cd pardusdb
./setup.sh

This will build PardusDB and install it as the pardusdb command, available system-wide.

Manual Install

git clone https://github.com/pardus-ai/pardusdb
cd pardusdb
cargo build --release

The binary will be at target/release/pardusdb.

Quick Start

Interactive REPL

pardusdb
╔═══════════════════════════════════════════════════════════════╗
║                    PardusDB REPL                      ║
║          Vector Database with SQL Interface           ║
╚═══════════════════════════════════════════════════════════════╝

pardusdb [memory]> .create mydb.pardus
Created and opened: mydb.pardus

pardusdb [mydb.pardus]> CREATE TABLE docs (embedding VECTOR(768), content TEXT);
Table 'docs' created

pardusdb [mydb.pardus]> INSERT INTO docs (embedding, content)
VALUES ([0.1, 0.2, 0.3, ...], 'Hello World');
Inserted row with id=1

pardusdb [mydb.pardus]> SELECT * FROM docs
WHERE embedding SIMILARITY [0.1, 0.2, 0.3, ...] LIMIT 5;

Found 1 similar rows:
  id=1, distance=0.0000, values=[Vector([...]), Text("Hello World")]

pardusdb [mydb.pardus]> quit
Saved to: mydb.pardus
Goodbye!

Command Line

# Persistent file
pardusdb mydata.pardus

# In-memory only
pardusdb

SQL Syntax

Supported Data Types

| Type | Description | Example |
|------|-------------|---------|
| VECTOR(n) | n-dimensional float vector | VECTOR(768) |
| TEXT | UTF-8 string | 'hello world' |
| INTEGER | 64-bit integer | 42 |
| FLOAT | 64-bit float | 3.14 |
| BOOLEAN | true/false | true |

Basic Operations

CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    embedding VECTOR(768),
    title TEXT,
    category TEXT,
    score FLOAT
);

INSERT INTO documents (embedding, title, category, score)
VALUES ([0.1, 0.2, ...], 'Introduction to Rust', 'tutorial', 0.95);

SELECT * FROM documents WHERE category = 'tutorial' LIMIT 10;

UPDATE documents SET score = 0.99 WHERE id = 1;

DELETE FROM documents WHERE id = 1;
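
When statements like the INSERT above are generated from application code, the vector literal has to be rendered as a bracketed list of floats. The following is a minimal Python sketch of that string building. The helper names `vector_literal` and `insert_statement` are hypothetical, and doubling single quotes to escape them is an assumption borrowed from standard SQL, not something this README confirms for PardusDB.

```python
def vector_literal(values):
    """Render a sequence of numbers as a bracketed vector literal, e.g. [0.1, 0.2]."""
    return "[" + ", ".join(repr(float(v)) for v in values) + "]"


def insert_statement(table, embedding, title, category, score):
    """Build an INSERT statement in the shape used in Basic Operations.

    Assumption: single quotes inside text values are escaped by doubling
    them, as in standard SQL. PardusDB's real escaping rules may differ.
    """
    def quote(text):
        return "'" + text.replace("'", "''") + "'"

    return (
        f"INSERT INTO {table} (embedding, title, category, score) "
        f"VALUES ({vector_literal(embedding)}, {quote(title)}, "
        f"{quote(category)}, {float(score)!r});"
    )


# Three values for brevity; a VECTOR(768) column needs 768 of them.
stmt = insert_statement(
    "documents", [0.1, 0.2, 0.3], "Introduction to Rust", "tutorial", 0.95
)
print(stmt)
```

The resulting text can be pasted into the REPL or sent through whatever interface the application uses to run statements.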

UNIQUE Constraint

Ensure column values are unique with O(1) duplicate detection:

CREATE TABLE users (
    embedding VECTOR(128),
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE
);

-- The first insert succeeds
INSERT INTO users (embedding, id, email) VALUES ([0.1, ...], 1, 'user@example.com');
-- The second insert fails because the email is already present
INSERT INTO users (embedding, id, email) VALUES ([0.2, ...], 2, 'user@example.com');
-- Error: Duplicate value for UNIQUE column 'email'
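
The O(1) claim comes from keeping the values already stored in a hash set, so each insert needs only one membership test. The Python sketch below illustrates the idea; it is not PardusDB's Rust implementation, and the class name `UniqueColumn` is hypothetical.

```python
class UniqueColumn:
    """Tracks the values already stored in a UNIQUE column using a hash set."""

    def __init__(self, name):
        self.name = name
        self._seen = set()

    def check_and_add(self, value):
        # Set membership is an average-case O(1) hash lookup,
        # independent of how many rows the table already holds.
        if value in self._seen:
            raise ValueError(f"Duplicate value for UNIQUE column '{self.name}'")
        self._seen.add(value)


email = UniqueColumn("email")
email.check_and_add("user@example.com")  # first insert succeeds

try:
    email.check_and_add("user@example.com")  # second insert is rejected
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```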

GROUP BY with Aggregates

Group and aggregate data with O(n) hash aggregation:

-- Aggregate functions: COUNT, SUM, AVG, MIN, MAX
SELECT category, COUNT(*), AVG(score), SUM(amount)
FROM sales
GROUP BY category;

-- With HAVING clause for filtered groups
SELECT category, SUM(amount) as total
FROM sales
GROUP BY category
HAVING SUM(amount) > 1000;
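
Hash aggregation makes a single pass over the rows and keeps one accumulator per group key, which is where the O(n) bound comes from. The following Python sketch mirrors the two queries above on a tiny in-memory table. The function name and sample data are illustrative only, and the sketch does not reflect PardusDB's internals.

```python
from collections import defaultdict


def group_aggregate(rows, key, value):
    """One pass over rows, accumulating COUNT and SUM per group key."""
    groups = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for row in rows:
        acc = groups[row[key]]  # average-case O(1) hash lookup per row
        acc["count"] += 1
        acc["sum"] += row[value]
    # AVG is derived from SUM and COUNT once the pass is complete.
    return {
        k: {"count": a["count"], "sum": a["sum"], "avg": a["sum"] / a["count"]}
        for k, a in groups.items()
    }


sales = [
    {"category": "books", "amount": 600.0},
    {"category": "books", "amount": 500.0},
    {"category": "toys", "amount": 250.0},
]

totals = group_aggregate(sales, "category", "amount")

# HAVING SUM(amount) > 1000 is a filter applied after grouping.
having = {k: v["sum"] for k, v in totals.items() if v["sum"] > 1000}
print(totals)
print(having)
```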

JOINs

Join tables with O(n+m) hash join algorithm:

-- INNER JOIN
SELECT * FROM orders
INNER JOIN users ON orders.user_id = users.id;

-- LEFT JOIN (include all left rows)
SELECT users.email, orders.product
FROM users
LEFT JOIN orders ON users.id = orders.user_id;

-- RIGHT JOIN (include all right rows)
SELECT * FROM users
RIGHT JOIN orders ON users.id = orders.user_id;
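
A hash join first builds a hash index over one table (O(m)) and then probes it once per row of the other (O(n)), giving O(n+m) plus the size of the output. The Python sketch below shows the INNER and LEFT variants on in-memory rows. It illustrates the general algorithm under those assumptions and is not taken from PardusDB's source; the function name and sample rows are made up.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key, how="inner"):
    """Join two lists of dict rows on left[left_key] == right[right_key]."""
    # Build phase: index the right-hand rows by join key.
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)

    # Probe phase: one hash lookup per left-hand row.
    out = []
    for l in left:
        matches = index.get(l[left_key], [])
        if matches:
            for r in matches:
                out.append((l, r))
        elif how == "left":
            # LEFT JOIN keeps unmatched left rows, padded with None.
            out.append((l, None))
    return out


users = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
]
orders = [
    {"user_id": 1, "product": "lamp"},
    {"user_id": 1, "product": "desk"},
]

inner = hash_join(users, orders, "id", "user_id")
left = hash_join(users, orders, "id", "user_id", how="left")
print(len(inner), len(left))
```

A RIGHT JOIN can be obtained from the same routine by swapping the two inputs and their keys.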

Vector Similarity Search

SELECT * FROM documents
WHERE embedding SIMILARITY [0.12, 0.24, ...]
LIMIT 10;

Results are automatically ordered by distance (closest first).
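
For intuition, the result of a SIMILARITY query approximates what an exact brute-force scan would return: every row scored against the query vector, sorted by distance, and truncated to LIMIT. The Python sketch below shows that exact baseline. It assumes Euclidean distance, which this README does not state, and it is not the graph-based index PardusDB actually uses.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(rows, query, k):
    """Score every (id, vector) row against the query and keep the k closest."""
    scored = [(euclidean(vec, query), row_id) for row_id, vec in rows]
    scored.sort()  # closest first, matching the ordering described above
    return [(row_id, dist) for dist, row_id in scored[:k]]


rows = [(1, [0.0, 0.0]), (2, [1.0, 0.0]), (3, [0.0, 3.0])]
top = exact_top_k(rows, [0.0, 0.0], k=2)
print(top)
```

The brute-force scan costs O(n) distance computations per query, which is the cost an approximate graph index is meant to avoid.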

Utility Commands

SHOW TABLES;
DROP TABLE documents;

REPL Commands

| Command | Description |
|---------|-------------|
| .create <file> | Create and open a new database |
| .open <file> | Open an existing database |
| .save | Force save current database |
| .tables | List tables |
| .clear | Clear screen |
| help | Show help |
| quit | Exit (auto-saves if file open) |

Performance (Apple Silicon M-series)

| Operation | Time |
|-----------|------|
| Single insert | ~160 µs/doc |
| Batch insert (1,000 docs) | ~6 ms |
| Query (k=10) | ~3 µs |

Benchmark: PardusDB vs Neo4j

Real-world benchmark comparing PardusDB against Neo4j 5.15 for vector similarity operations.

Test Configuration:

  • Vector dimension: 128
  • Number of vectors: 10,000
  • Number of queries: 100
  • Top-K: 10

Results

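| Database | Insert (10K vectors) | Search (100 queries) | Single Search |
|----------|----------------------|----------------------|---------------|
| PardusDB | 18ms (543K/s) | 355µs (281K/s) | 3µs |
| Neo4j | 35.70s (280/s) | 153ms (650/s) | 1ms |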

Speedup

| Operation | PardusDB Advantage |
|-----------|--------------------|
| Insert | 1983x faster |
| Search | 431x faster |
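
The speedup figures are the ratio of the Neo4j time to the PardusDB time for the same workload. The short Python check below reproduces them from the timings in the Results table; the variable names are illustrative.

```python
# Timings from the Results table, converted to seconds.
insert_neo4j_s = 35.70
insert_pardus_s = 18e-3
search_neo4j_s = 153e-3
search_pardus_s = 355e-6

# Speedup = slower time / faster time for the same workload.
insert_speedup = insert_neo4j_s / insert_pardus_s
search_speedup = search_neo4j_s / search_pardus_s

print(round(insert_speedup), round(search_speedup))
```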

Batch Insert Performance

Batching inserts cuts the time to load 10,000 vectors from 1.52 s (one insert at a time) to 6 ms at a batch size of 1,000:

| Batch Size | Insert (10K vecs) | Speedup vs Individual |
|------------|-------------------|-----------------------|
| Individual | 1.52s | 1.0x |
| 100 | 33ms | 45x |
| 500 | 10ms | 149x |
| 1000 | 6ms | 220x |

Feature Comparison

| Feature | PardusDB | Neo4j |
|---------|----------|-------|
| Architecture | Embedded (SQLite-like) | Client-Server |
| Implementation | Rust (native) | Java (JVM) |
| Setup Time | 0 seconds | 5-10 minutes |
| Memory Overhead | Minimal (~50MB) | High (JVM ~1GB+) |
| Deployment | Single binary/file | Server + Docker/K8s |
| Query Language | SQL-like | Cypher |

Run the benchmark yourself:

# Without Neo4j (PardusDB only)
cargo run --release --bin benchmark_neo4j

# With Neo4j comparison (requires Neo4j running)
docker run -d -p 7687:7687 -e NEO4J_AUTH=neo4j/password123 neo4j:5.15
cargo run --release --features neo4j --bin benchmark_neo4j

Search Accuracy

Accuracy comparison against brute-force exact search (ground truth).

PardusDB Results:

| Metric | K=10 | K=5 | K=1 | Description |
|--------|------|-----|-----|-------------|
| Recall@K | 99.2% | 94.8% | 68.0% | True neighbors found |
| Precision@K | 99.2% | 94.8% | 68.0% | Correct results ratio |
| MRR | 0.292 | 0.439 | 0.680 | Mean Reciprocal Rank |
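
As a reference for reading these metrics, the Python sketch below computes Recall@K and a reciprocal rank for one query by comparing an approximate result list with the brute-force ground truth. The exact definitions used by `benchmark_accuracy` are not given in this README. In particular, taking the reciprocal rank of the single true nearest neighbor is an assumption, and the function names and sample IDs are made up.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that appear in the approximate top-k."""
    truth = set(exact_ids[:k])
    return len(truth.intersection(approx_ids[:k])) / k


def reciprocal_rank(approx_ids, true_nearest):
    """1/rank of the true nearest neighbor in the approximate list, 0 if absent.

    Assumption: the benchmark's MRR averages this value over all queries.
    """
    for rank, row_id in enumerate(approx_ids, start=1):
        if row_id == true_nearest:
            return 1.0 / rank
    return 0.0


approx = [7, 3, 9, 4]  # IDs returned by the approximate index
exact = [3, 7, 9, 5]   # IDs from brute-force search, closest first

r4 = recall_at_k(approx, exact, k=4)
r1 = recall_at_k(approx, exact, k=1)
rr = reciprocal_rank(approx, exact[0])
print(r4, r1, rr)
```

When both lists are truncated to the same K, precision and recall coincide, which is consistent with the identical Recall@K and Precision@K rows in the table.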

PardusDB vs Neo4j Accuracy Comparison:

| Metric | PardusDB | Neo4j | Winner |
|--------|----------|-------|--------|
| Recall@10 | 99.2% | 3.0% | PardusDB |
| Recall@5 | 94.8% | 2.8% | PardusDB |
| Recall@1 | 68.0% | 2.0% | PardusDB |
| MRR | 0.292 | 0.010 | PardusDB |

Run accuracy benchmark:

# Without Neo4j (PardusDB only)
cargo run --release --bin benchmark_accuracy

# With Neo4j comparison (requires Neo4j running)
cargo run --release --features neo4j --bin benchmark_accuracy

Benchmark: PardusDB vs HelixDB

Comparison against HelixDB, an open-source graph-vector database built in Rust.

Test Configuration:

  • Vector dimension: 128
  • Number of vectors: 10,000
  • Number of queries: 100
  • Top-K: 10

Results

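| Database | Insert (10K vectors) | Search (100 queries) | Single Search |
|----------|----------------------|----------------------|---------------|
| PardusDB | 14ms (696K/s) | 280µs (357K/s) | 2µs |
| HelixDB | 2.87s (3.5K/s) | 17ms (5.8K/s) | 172µs |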

Speedup

| Operation | PardusDB Advantage |
|-----------|--------------------|
| Insert | 200x faster |
| Search | 62x faster |

Feature Comparison

| Feature | PardusDB | HelixDB |
|---------|----------|---------|
| Architecture | Embedded (SQLite-like) | Server (Docker) |
| Implementation | Rust (native) | Rust (native) |
| Vector Index | HNSW (optimized) | HNSW |
| Graph Support | No | Yes |
| Deployment | Single binary/file | Docker + CLI |
| Setup Time | 0 seconds | 5-10 minutes |
| Memory Overhead | Minimal (~50MB) | Docker container |
| Query Language | SQL-like | HelixQL |
| Network Latency | None (in-process) | HTTP API overhead |
| Persistence | Single file (.pardus) | LMDB |
| License | MIT | AGPL-3.0 |

Run the benchmark yourself:

# Without HelixDB (PardusDB only)
cargo run --release --bin benchmark_helix

# With HelixDB comparison (requires HelixDB running)
curl -sSL "https://install.helix-db.com" | bash
mkdir helix_bench && cd helix_bench
helix init
# Add schema.hx and queries.hx for vectors
helix push dev
cargo run --release --features helix --bin benchmark_helix

Examples

Rust Example

A complete RAG example demonstrating PardusDB's features:

cargo run --example simple_rag --release

This shows:

  • Creating tables with VECTOR columns
  • Individual inserts with insert_direct()
  • Batch inserts with insert_batch_direct()
  • Similarity search with search_similar()

Python Example

See examples/python/simple_rag.py — a RAG demo using Ollama for embeddings and PardusDB as the vector store.

cd examples/python
pip install requests
python simple_rag.py

Why We Built PardusDB

The Pardus AI team built PardusDB because we believe private, local-first AI tools should be accessible to everyone — from individual developers to large teams.

PardusDB gives you the low-level building block for fast, private vector search, while Pardus AI delivers the high-level no-code experience for analysts, marketers, and business users who just want answers from their data.

If you enjoy working with PardusDB, we’d love for you to try Pardus AI — upload your spreadsheets or documents and ask questions in plain English. Free tier available, no credit card required.

License

MIT License — use it freely in personal and commercial projects.


⭐ Star us on GitHub if you find this useful!
🚀 Building something cool with PardusDB? Share it with us on X or Discord — we’d love to hear from you.

Pardus AI: https://pardusai.org/
