Inspiration
Walking into a massive, “vibecoded” repo can feel like being dropped into a city without a map: stressful, slow, lonely. We built NaviGit to turn that anxiety into momentum: a private, offline guide that shows you where everything lives, how pieces connect, and what matters now, so you move from doubt to flow, fast. Current solutions don’t suffice: cloud assistants risk code leakage and add latency, keyword search misses intent and architecture, LLMs hallucinate when context is thin, and docs and diagrams go stale, leaving you guessing instead of building.
What it does
NaviGit ingests your repo locally, builds a semantic index and a code graph, and answers deep questions about structure and behavior with citations to exact files and lines. Ask “what wraps this function,” “who calls this,” or “how does auth flow end-to-end,” and it returns precise, grounded answers—no hallucinations, no data leaving your machine. In practice, it feels like pairing with a senior teammate who knows the entire codebase and respects your privacy.
How we built it
Ingestion: Parse the repo, respect .gitignore, normalize paths, and extract symbols via ASTs for each language (functions, classes, imports, inheritance, calls).
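For Python sources, the symbol pass can lean on the standard library's ast module. A minimal sketch of the idea (the names here are illustrative; a real extractor covers more node types and more languages):

```python
import ast

def extract_symbols(source: str, path: str):
    """Walk a Python module's AST and collect functions, classes,
    and imports, each tagged with its file:line location."""
    symbols = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(("function", node.name, f"{path}:{node.lineno}"))
        elif isinstance(node, ast.ClassDef):
            symbols.append(("class", node.name, f"{path}:{node.lineno}"))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                symbols.append(("import", alias.name, f"{path}:{node.lineno}"))
        elif isinstance(node, ast.ImportFrom):
            symbols.append(("import", node.module or "", f"{path}:{node.lineno}"))
    return symbols
```

Every symbol keeps its file:line anchor from the start, which is what later makes citations possible.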
Chunking: Token- and syntax-aware splits (per function/class/module) with overlap to preserve context; docstrings and comments are linked to code.
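The overlap idea is easiest to see on plain line windows; the syntax-aware version aligns window boundaries to function and class extents instead. A sketch (window sizes are illustrative defaults, not the tuned values):

```python
def chunk_by_lines(lines, max_lines=40, overlap=5):
    """Split a file into fixed-size line windows with overlap so a
    symbol cut at a boundary still appears whole in some chunk."""
    chunks = []
    step = max_lines - overlap
    for start in range(0, len(lines), step):
        chunk = lines[start:start + max_lines]
        if chunk:
            chunks.append((start + 1, chunk))  # record 1-based start line
        if start + max_lines >= len(lines):
            break
    return chunks
```

Each chunk carries its starting line number so retrieval results can be cited back to exact locations.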
Embeddings: Generate local vector embeddings for each chunk and symbol summary; store in a fast ANN index (e.g., FAISS/HNSW) on disk.
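The retrieval contract is simple regardless of backend: add vectors, query by similarity. A toy brute-force cosine index shows the shape; in production a FAISS or HNSW index replaces the linear scan:

```python
import math

class VectorIndex:
    """Toy nearest-neighbour index over (chunk_id, vector) pairs.
    Stand-in for FAISS/HNSW: same interface, linear-scan search."""
    def __init__(self):
        self.items = []  # list of (chunk_id, vector)

    def add(self, chunk_id, vector):
        self.items.append((chunk_id, vector))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def search(self, query, k=3):
        # Score every stored vector, return the top-k chunk ids.
        scored = [(self._cosine(query, v), cid) for cid, v in self.items]
        scored.sort(reverse=True)
        return [cid for _, cid in scored[:k]]
```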
Code graph: Build a multi-edge graph (calls, imports, inheritance, composition, file structure, tests ↔ sources). Each node links back to file:line.
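A typed adjacency structure is enough to express a multi-edge graph of this kind. A minimal sketch (names are illustrative):

```python
from collections import defaultdict

class CodeGraph:
    """Multi-edge code graph: nodes are symbols anchored to file:line,
    edges are typed relations (calls, imports, inherits, ...)."""
    def __init__(self):
        self.location = {}              # node -> "file:line"
        self.edges = defaultdict(list)  # node -> [(kind, neighbour)]

    def add_node(self, name, file_line):
        self.location[name] = file_line

    def add_edge(self, src, kind, dst):
        self.edges[src].append((kind, dst))

    def neighbours(self, node, kind=None):
        # All neighbours, optionally filtered by edge kind.
        return [d for k, d in self.edges[node] if kind is None or k == kind]
```

Because every node stores its anchor, any traversal result can be reported as a concrete file:line citation.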
Query: Convert questions into vectors, retrieve nearest chunks, expand via graph traversal for relational context, and compose answers locally. The composer is constrained to retrieved context only, with source citations and confidence indicators.
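The graph-expansion step can be sketched as a bounded breadth-first walk from the retrieved chunks, pulling in callers, callees, and imports as extra context:

```python
def expand_context(seed_chunks, edges, hops=1):
    """Grow the retrieved set by following graph edges a fixed number
    of hops, so relational context rides along with the nearest chunks.
    `edges` maps a node to the list of its neighbours."""
    context = set(seed_chunks)
    frontier = set(seed_chunks)
    for _ in range(hops):
        nxt = set()
        for node in frontier:
            for neighbour in edges.get(node, []):
                if neighbour not in context:
                    nxt.add(neighbour)
        context |= nxt
        frontier = nxt  # only newly reached nodes expand further
    return context
```

Keeping the hop count small bounds the context window while still surfacing the relations a pure vector search would miss.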
Privacy/Guardrails: Air‑gapped by default—no network calls, no telemetry. The system abstains when context is insufficient rather than guessing.
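The abstain behaviour reduces to a gate in front of the composer: if the best-retrieved evidence is too weak, refuse rather than guess. A sketch, with a hypothetical score threshold:

```python
def gate_answer(retrieved, min_score=0.55):
    """Abstain when retrieval evidence is weak, instead of letting the
    composer guess. `retrieved` is a list of (score, citation) pairs;
    min_score is an illustrative threshold, not a tuned value."""
    if not retrieved or max(s for s, _ in retrieved) < min_score:
        return ("abstain", [])
    # Only citations above the threshold are passed to the composer.
    return ("answer", [c for s, c in retrieved if s >= min_score])
```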
Challenges we ran into
Grounding over guessing: Forcing the answerer to cite retrieved context and abstain when unsure required careful prompt design, context windows, and retrieval filters.
Language diversity: Building robust parsers and symbol extraction across Python/TS/Go/Java without brittle heuristics.
Speed at scale: Keeping retrieval sub‑100ms and end‑to‑end responses ~4s on large repos meant optimizing chunk sizes, HNSW parameters, disk layout, and caching.
UX trust: Presenting answers with clear “why” (citations, graph hops) to reduce anxiety and build user confidence.
Accomplishments that we're proud of
Speed: Typical deep code questions answered in ~4s vs ~18s on a comparable cloud tool, about 4.5× faster (78% less wait).
Fidelity: Zero‑hallucination policy in practice: answers are grounded in cited files/lines, or the system abstains.
Understanding: “What wraps what,” cross‑module flows, and dependency chains visualized via the code graph, turning days of onboarding into hours.
What we learned
Speed is emotional: Faster answers don’t just save time, they preserve focus and reduce stress.
Grounding builds trust: Citations and graph context make insights actionable; guesswork erodes confidence.
Privacy is non‑negotiable: Offline by default removes a major barrier for teams with sensitive programs, proprietary codebases, and regulated data.
What's next for NaviGit
Incremental indexing and background watch for near‑instant updates as code changes.
Deeper IDE integration (inline explanations, hover graphs, “why did this test fail?” one‑click traces).
Multi‑repo reasoning and cross‑service maps for microservices and monorepos.
Visual architecture explorer: interactive call/import graphs and dataflow diagrams.
Local model optimizations (quantization, GPU/CPU selection) and team‑mode knowledge sharing without sharing code.
An online version with a zero‑knowledge architecture.