Inspiration
While working with Retrieval-Augmented Generation (RAG), I noticed something uncomfortable:
Even well-built pipelines miss critical information.
Similarity-based retrieval often produces answers that look correct but are incomplete, and there is no built-in way to detect what is missing or to verify that a change actually improves results.
As a software architect, I find this troubling: traditional systems are predictable and testable, while RAG systems can fail silently.
The question became:
Can a system detect its own mistakes and improve within the same execution?
What it does
EvoContext runs the same question twice and improves the result.
Run 1
Retrieves context using similarity, generates an answer, and produces a score (example: 62/100).
Evaluation
Identifies missing facts and produces targeted feedback.
Run 2
Uses that feedback to retrieve better context, generates a new answer, and scores higher (example: 84/100).
The system shows exactly what changed and why.
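The two-run flow above can be sketched as follows. All names here (retrieve, generate, evaluate) are placeholders rather than the actual EvoContext API, and the real system is built in .NET:

```python
# Illustrative sketch of the two-run loop. The retrieve/generate/evaluate
# callables are hypothetical stand-ins for the Qdrant search, the LLM call,
# and the rule-based evaluator.

def run_twice(question, retrieve, generate, evaluate):
    # Run 1: similarity retrieval, answer generation, rule-based score.
    context = retrieve(question)
    answer_1 = generate(question, context)
    score_1, missing_facts = evaluate(answer_1)

    # Evaluation feedback: expand the query with the facts that were flagged
    # as missing, steering Run 2's retrieval toward them.
    expanded_query = question + " " + " ".join(missing_facts)

    # Run 2: retrieve with the expanded query, regenerate, and re-score.
    context = retrieve(expanded_query)
    answer_2 = generate(question, context)
    score_2, _ = evaluate(answer_2)

    return (answer_1, score_1), (answer_2, score_2)
```

Because the evaluator is deterministic, the score delta between the two runs is reproducible rather than a one-off artifact of LLM randomness.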
How we built it
EvoContext combines:
- Vector retrieval (Qdrant)
- Structured chunking with overlap
- Rule-based evaluation (no LLM judge)
- Feedback-driven query expansion
- Full trace logging for every run
The evaluation layer defines expected facts and scoring rules, making the outcome deterministic and reproducible.
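A minimal sketch of what such an evaluation profile might look like, assuming a keyword-style rule format (the project's actual rule schema is not shown here, so both the fact names and keyword lists are invented for illustration):

```python
# Hypothetical evaluation profile: each expected fact maps to keywords
# that must all appear in the answer for the fact to count as covered.
EXPECTED_FACTS = {
    "refund window is 30 days": ["30 days", "refund"],
    "exchanges need a receipt": ["receipt", "exchange"],
}

def evaluate(answer: str) -> tuple[int, list[str]]:
    """Score an answer against the profile; return (score, missing facts)."""
    text = answer.lower()
    missing = [
        fact
        for fact, keywords in EXPECTED_FACTS.items()
        if not all(keyword in text for keyword in keywords)
    ]
    covered = len(EXPECTED_FACTS) - len(missing)
    score = round(100 * covered / len(EXPECTED_FACTS))
    return score, missing
```

Because the rules are plain data and the scoring is arithmetic, the same answer always gets the same score, which is what makes improvement measurable across runs.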
Challenges we ran into
- Designing a scenario where Run 1 fails consistently without feeling artificial
- Controlling LLM variability (temperature 0, strict output structure)
- Preventing false positives in evaluation (negation handling, rule precision)
- Making improvement measurable instead of subjective
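The negation challenge above can be illustrated with a small sketch: a keyword only counts as a fact hit when no negation word appears in the same clause. This is a hypothetical guard for illustration, not the project's actual rule engine:

```python
import re

# Negation words that invalidate a keyword match within the same clause.
NEGATIONS = ("no", "not", "never", "without")

def fact_present(answer: str, keyword: str) -> bool:
    """Return True if keyword appears in a clause with no negation word.

    Splitting on clause boundaries (.,;) keeps a negation in one clause
    from suppressing a genuine match in another.
    """
    for clause in re.split(r"[.;,]", answer.lower()):
        if keyword in clause:
            words = clause.split()
            if not any(neg in words for neg in NEGATIONS):
                return True
    return False
```

Without a guard like this, "refunds are not available" would be scored as covering the refund fact, which is exactly the kind of false positive the challenge refers to.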
Accomplishments that we're proud of
A demo where improvement is visible, measurable, and reproducible.
The system does not just generate a better answer. It shows:
- which documents were retrieved
- which facts were missing
- how retrieval changed
- how the score improved
What we learned
- Similarity alone does not guarantee completeness
- Evaluation must be explicit and rule-based to be reliable
- Observability is required to debug RAG systems
- Feedback loops can turn static pipelines into adaptive systems
What's next for EvoContext
- Apply the approach to regulated domains (legal, medical, operations)
- Extend beyond two runs into continuous improvement loops
- Build tooling for authoring evaluation profiles at scale
- Integrate as a component inside existing RAG systems
The goal is simple: systems that can measure their own quality and improve it.
Built With
- .NET
- OpenAI
- Qdrant