Inspiration
While studying RAG, I noticed even well-built pipelines miss critical information. Similarity alone doesn't guarantee completeness.
As a software architect, I'm wary of the non-deterministic nature of LLMs and RAG systems. Traditional software is predictable; RAG pipelines can fail silently, missing facts or hallucinating details, which makes them hard to trust and debug.
I wondered: what if the system could learn from its own mistakes?
What it does
EvoContext is a CLI tool that demonstrates adaptive context retrieval:
- Run 1 retrieves context using similarity alone — and misses critical facts
- Evaluation scores the answer and identifies what's missing
- Run 2 uses that feedback to retrieve better — and scores higher
The trace explains exactly why.
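The loop above can be sketched in a few lines. EvoContext itself is a .NET CLI built on Qdrant and an LLM; the toy corpus, the keyword-overlap "similarity", and the function names below are illustrative stand-ins for the idea, not the tool's actual API.

```python
# Toy sketch of the Run 1 -> evaluate -> Run 2 loop. Corpus, scoring, and
# names are illustrative assumptions, not EvoContext internals.

CORPUS = [
    "Refunds are processed within 14 days of a return request.",
    "Returns require the original receipt and undamaged packaging.",
    "Exception: digital goods are non-refundable once downloaded.",
]

def similarity(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

def retrieve(query: str, k: int = 2) -> list:
    return sorted(CORPUS, key=lambda doc: -similarity(query, doc))[:k]

def evaluate(context: list, required_facts: list) -> list:
    # Rule-based rubric: which required facts never appear in the context?
    joined = " ".join(context).lower()
    return [fact for fact in required_facts if fact not in joined]

def expand_query(query: str, missing: list) -> str:
    # Feedback-driven expansion: fold the missing facts' terms into the query.
    return query + " " + " ".join(missing)

query = "how long do refunds take"
run1 = retrieve(query)                            # similarity only
missing = evaluate(run1, ["non-refundable"])      # rubric flags the gap
run2 = retrieve(expand_query(query, missing))     # feedback-aware retrieval
```

Run 1 ranks only by word overlap and misses the digital-goods exception; the rubric names what's missing, and the expanded query pulls it in on Run 2.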
How we built it
We combined Qdrant for vector retrieval, rule-based evaluation rubrics, and feedback-driven query expansion. Every run is traced for full observability.
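A minimal sketch of what a per-run trace could record, so the "why" behind each answer is inspectable. The field names and values here are assumptions for illustration, not EvoContext's actual trace schema.

```python
# Hypothetical per-run trace record; field names are illustrative assumptions.
import dataclasses
import json

@dataclasses.dataclass
class RunTrace:
    run: int                # 1 = similarity-only, 2 = feedback-aware
    query: str              # the (possibly expanded) retrieval query
    retrieved_ids: list     # chunk ids returned by the vector store
    rubric: dict            # rubric item -> satisfied?
    score: float            # fraction of rubric items satisfied
    feedback_terms: list    # terms fed into the next run's query expansion

trace = RunTrace(
    run=1,
    query="how long do refunds take",
    retrieved_ids=["refund_window", "return_conditions"],
    rubric={"mentions_14_days": True, "mentions_digital_exception": False},
    score=0.5,
    feedback_terms=["digital", "non-refundable"],
)
print(json.dumps(dataclasses.asdict(trace), indent=2))
```

Because each trace carries both the rubric outcome and the feedback terms, the improvement between runs is explainable rather than magic.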
Challenges we ran into
- Designing a scenario where Run 1 reliably fails without feeling artificial
- Fighting non-determinism with temperature 0, fixed prompts, and strict output contracts
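The "strict output contract" part of that fight can be sketched as a validator that rejects any model reply not matching the expected JSON shape, so failures are loud rather than silent. The key names below are illustrative assumptions, not EvoContext's actual contract.

```python
# Sketch of a strict output contract: hypothetical key names, not the
# project's real schema.
import json

CONTRACT_KEYS = {"answer", "missing_facts", "confidence"}

def enforce_contract(raw: str) -> dict:
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not JSON: {exc}")
    if not isinstance(obj, dict) or set(obj) != CONTRACT_KEYS:
        raise ValueError("model output violates the contract")
    return obj

reply = enforce_contract(
    '{"answer": "14 days", "missing_facts": [], "confidence": 0.9}'
)
```

Combined with temperature 0 and fixed prompts, this keeps each run's output machine-checkable.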
Accomplishments that we're proud of
A reproducible demo where improvement isn't magic — it's traceable and explainable.
What we learned
- Similarity is necessary but not sufficient
- Observability turns a black box into something you can debug
What's next for EvoContext
- More domains: Testing beyond policy Q&A — runbooks, legal docs, technical manuals
- Smarter memory: Usefulness-weighted retrieval that improves over many runs, not just two
- Integration: Packaging as a library that plugs into existing RAG pipelines
- Benchmarking: Measuring improvement across diverse question types and document sets
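One possible reading of "usefulness-weighted retrieval" is blending vector similarity with a running estimate of how often each chunk actually helped in past runs. The 0.7/0.3 blend and the moving-average update below are assumptions sketched for illustration, not a commitment from the roadmap.

```python
# Hypothetical usefulness-weighted scoring; the weights and update rule
# are assumptions, not EvoContext's design.

usefulness: dict = {}   # chunk_id -> running usefulness estimate in [0, 1]

def record_outcome(chunk_id: str, helped: bool, lr: float = 0.2) -> None:
    # Moving average toward 1 if the chunk helped an answer, toward 0 if not.
    current = usefulness.get(chunk_id, 0.5)
    usefulness[chunk_id] = current + lr * ((1.0 if helped else 0.0) - current)

def blended_score(chunk_id: str, sim: float, alpha: float = 0.7) -> float:
    # Similarity still dominates; accumulated usefulness shifts ranking
    # gradually over many runs rather than just two.
    return alpha * sim + (1 - alpha) * usefulness.get(chunk_id, 0.5)

record_outcome("digital_exception", helped=True)
score = blended_score("digital_exception", sim=0.4)
```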
The long-term goal: RAG systems that get better with use — not just at inference, but at knowing what to retrieve.
Built With
- .net
- openai
- qdrant