Inspiration
Every developer has stared at a cryptic piece of legacy code and wondered, "Why was this built this way?" — afraid to touch it, afraid to leave it. This is Chesterton's Fence: the principle that you should never remove something until you understand why it was put there. Studies show developers spend 3-5 hours per week investigating legacy code, and 68% report uncertainty when modifying it. The result is a lose-lose — leave bad code alone and accumulate invisible technical debt, or change it blindly and risk production incidents.
We built GitLab Oracle to break this cycle.
What it does
GitLab Oracle is a RAG-powered semantic search engine that reconstructs the why behind your code. It ingests your GitLab project's complete history — merged MRs, code review discussions, commit messages, and linked issues — and makes it instantly searchable with natural language queries.
Ask "Why was the auth timeout set to 30 seconds?" and get a cited answer pulled directly from the MR discussion where that decision was made.
How we built it
We designed a three-stage pipeline:
- Ingestion — A job fetches merged MRs and linked issues via the GitLab API, chunks content into semantic units (512 tokens max), and generates embeddings with the all-mpnet-base-v2 sentence-transformers model. A pool of 5-10 async workers handles concurrent processing with rate limiting.
- Storage & Search — Chunks and their 768-dimensional embeddings are stored in PostgreSQL with the pgvector extension. Queries are embedded at search time and matched via cosine similarity. A multi-factor ranker scores results by similarity (40%), recency (20%), discussion depth (20%), source type (10%), and resolution state (10%).
- Delivery — Results are served through a FastAPI REST API and exposed via the Model Context Protocol (MCP), integrating natively with AI coding tools like Claude Code, Cursor, and GitLab Duo Workflows — putting historical context right where developers need it.
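The multi-factor ranker can be sketched roughly as follows. The weights match the ones above; the field names on `Chunk`, the recency half-life, and the discussion-depth saturation point are illustrative assumptions, not the project's actual schema or constants:

```python
# Sketch of the multi-factor ranker. Weights are from the write-up;
# Chunk fields, half-life, and saturation threshold are assumptions.
import math
from dataclasses import dataclass

WEIGHTS = {
    "similarity": 0.40,   # cosine similarity from pgvector
    "recency": 0.20,      # newer MRs rank higher
    "discussion": 0.20,   # depth of the review thread
    "source": 0.10,       # e.g. MR discussion > commit message
    "resolution": 0.10,   # resolved threads are more authoritative
}

@dataclass
class Chunk:
    similarity: float      # cosine similarity in [0, 1]
    age_days: float        # days since the source MR was merged
    comment_count: int     # comments in the discussion thread
    source_weight: float   # 1.0 for MR discussion, lower for other sources
    resolved: bool

def score(chunk: Chunk, half_life_days: float = 365.0) -> float:
    """Combine the five factors into a single ranking score in [0, 1]."""
    # Exponential decay: a chunk one half-life old scores 0.5 on recency.
    recency = math.exp(-chunk.age_days * math.log(2) / half_life_days)
    # Saturate discussion depth so huge threads don't dominate.
    discussion = min(chunk.comment_count / 10.0, 1.0)
    return (
        WEIGHTS["similarity"] * chunk.similarity
        + WEIGHTS["recency"] * recency
        + WEIGHTS["discussion"] * discussion
        + WEIGHTS["source"] * chunk.source_weight
        + WEIGHTS["resolution"] * (1.0 if chunk.resolved else 0.0)
    )
```

Blending signals this way lets a well-discussed, recently merged MR outrank a marginally more similar but stale commit message.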
Challenges we ran into
- Rate limiting at scale — GitLab's API has strict rate limits. We implemented exponential backoff with thread-safe rate limiting to stay within bounds while processing thousands of MRs.
- Chunk quality — Naive text splitting lost context across chunk boundaries. We iterated on token-aware chunking that preserves semantic coherence within the 512-token window.
- Ranking relevance — Pure vector similarity surfaced old, irrelevant results. Adding recency, discussion depth, and source-type weighting dramatically improved answer quality.
- Agent search loops — Early MCP agent prompts caused AI clients to loop endlessly through search results. We tightened the prompt to prevent this.
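The backoff-plus-rate-limiting pattern from the first challenge can be sketched like this. The function and class names are illustrative, not the project's actual code, and `RateLimitError` stands in for whatever exception a GitLab client raises on HTTP 429:

```python
# Sketch: exponential backoff with jitter plus a thread-safe rate limiter
# for GitLab API calls. Names and constants are illustrative assumptions.
import random
import threading
import time

class RateLimitError(Exception):
    """Stand-in for the error a GitLab client raises on HTTP 429."""

def with_backoff(call, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Retry `call` on rate-limit errors, doubling the wait each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay,
            # with jitter so concurrent workers don't retry in lockstep.
            delay = min(base_delay * 2 ** attempt, max_delay)
            time.sleep(delay + random.uniform(0, delay * 0.1))

class RateLimiter:
    """Thread-safe limiter: allow at most `rate` calls per second."""
    def __init__(self, rate: float):
        self.min_interval = 1.0 / rate
        self.lock = threading.Lock()
        self.next_allowed = 0.0

    def wait(self):
        with self.lock:
            now = time.monotonic()
            wait_for = max(0.0, self.next_allowed - now)
            self.next_allowed = max(now, self.next_allowed) + self.min_interval
        if wait_for > 0:
            time.sleep(wait_for)
```

Each worker calls `limiter.wait()` before an API request and wraps the request in `with_backoff`, so the pool stays under the quota in the steady state and recovers gracefully when it still trips a 429.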
What we learned
The biggest insight was that the most valuable engineering knowledge isn't in the code — it's in the conversations around the code. MR discussions, review comments, and issue threads contain the reasoning, trade-offs, and constraints that shaped every decision. By making this searchable, we turned tribal knowledge into institutional memory.
Built with
Python, FastAPI, PostgreSQL, pgvector, Sentence Transformers, Model Context Protocol (MCP), GitLab API, asyncio, Docker