🧠 Built a Vectorless RAG: no embeddings, no vector DB, no cosine similarity.

Most RAG pipelines follow the same pattern:
Chunking → Embedding → Vector DB → Query by similarity search

To challenge that assumption, I built a RAG system using Google Gemini + PageIndex that skips the vector layer entirely. And yes, Gemini because it's free 😂

---

⚙️ How it works:
Instead of embedding-based retrieval, it uses PageIndex to index document pages directly. At query time, Gemini reasons over the indexed pages to find and synthesize the relevant information. No embeddings are generated and no ANN lookup is performed.

---

✅ Advantages:
• No vector DB setup or maintenance overhead
• No embedding model needed, which cuts the infra cost and latency of the embedding step
• Simpler pipeline: fewer moving parts = easier to debug
• Works well for structured/paginated documents (PDFs, reports)
• Faster to prototype and deploy

---

⚠️ Limitations:
• Less effective for large unstructured corpora, where semantic similarity shines
• Dependent on Gemini's reasoning quality for relevance judgment
• PageIndex is optimized for document-style inputs, so it is less flexible for arbitrary text chunks

---

This is more of an architectural experiment than a production-ready system, but it raises an interesting question: do we always need vectors for retrieval-augmented generation? For certain document-heavy, structured use cases, maybe not.

I put together a simple repo to try it out.
🔗 Repo: https://lnkd.in/gE8wyRDX

Inspired by Krish Naik's work. His video was a key reference while building this. Check out his channel if you're diving into AI/ML!
🔗 Video: https://lnkd.in/ghzKykWg
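
---

🧪 A rough sketch of the idea:
For anyone who wants to see the "How it works" flow in code, here is a minimal, offline sketch. It is not the code from the repo, and it does not use the real PageIndex or Gemini APIs. Every name in it (`build_page_index`, `select_pages`, `answer`, `fake_llm`) is a hypothetical stand-in, and `fake_llm` is just keyword matching so the example runs without an API key. In a real setup you would pass in a function that calls your model instead.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Any function that takes a prompt and returns text (assumption: in practice
# this would wrap a real model call; here it is left abstract on purpose).
LLM = Callable[[str], str]


@dataclass
class Page:
    number: int
    text: str


def build_page_index(pages: List[str]) -> List[Page]:
    """Index pages directly: no chunking, no embeddings, no vector store."""
    return [Page(number=i + 1, text=t) for i, t in enumerate(pages)]


def select_pages(query: str, index: List[Page], llm: LLM) -> List[Page]:
    """Ask the model which pages are relevant and parse the numbers it returns."""
    listing = "\n".join(
        f"[{p.number}] {' '.join(p.text.split())[:200]}" for p in index
    )
    prompt = (
        f"Question: {query}\n"
        f"Pages:\n{listing}\n"
        "Reply with only the relevant page numbers, comma-separated."
    )
    wanted = {int(n) for n in re.findall(r"\d+", llm(prompt))}
    return [p for p in index if p.number in wanted]


def answer(query: str, index: List[Page], llm: LLM) -> str:
    """Retrieve by reasoning over pages, then synthesize from the selected ones."""
    pages = select_pages(query, index, llm)
    context = "\n".join(f"(page {p.number}) {p.text}" for p in pages)
    return llm(f"Context:\n{context}\nAnswer the question: {query}")


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a model call (hypothetical, keyword overlap only)."""
    if prompt.startswith("Question:"):
        question = prompt.splitlines()[0].lower()
        q_words = {w for w in re.findall(r"[a-z]+", question) if len(w) > 3}
        hits = []
        for num, text in re.findall(r"^\[(\d+)\] (.*)$", prompt, flags=re.M):
            if q_words & set(re.findall(r"[a-z]+", text.lower())):
                hits.append(num)
        return ", ".join(hits)
    # For the synthesis prompt, just echo the first context line.
    return "Based on " + prompt.splitlines()[1]


if __name__ == "__main__":
    index = build_page_index([
        "Revenue grew 12% in 2023.",
        "The office moved to Berlin.",
        "Revenue forecast for 2024 is flat.",
    ])
    print(answer("What happened to revenue?", index, fake_llm))
```

The design point is that relevance is decided by the model reading page text, not by distance between vectors, which is also why answer quality depends so heavily on the model's reasoning.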