Inspiration

While working with Retrieval-Augmented Generation (RAG), I noticed something uncomfortable:

Even well-built pipelines miss critical information.

Similarity-based retrieval often produces answers that look correct but are incomplete. There is no built-in way to detect what is missing or to verify that a change actually improves the result.

From a software architect's perspective, this is a problem. Traditional systems are predictable and testable. RAG systems can fail silently.

The question became:

Can a system detect its own mistakes and improve within the same execution?


What it does

EvoContext runs the same question twice and improves the result.

Run 1
Retrieves context using similarity, generates an answer, and produces a score (example: 62/100).

Evaluation
Identifies missing facts and produces targeted feedback.

Run 2
Uses that feedback to retrieve better context, generates a new answer, and scores higher (example: 84/100).

The system shows exactly what changed and why.
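In code, the two-run flow could be sketched roughly like this. Everything here is a stand-in: EvoContext uses Qdrant retrieval and an LLM, while this sketch stubs both with keyword matching and a tiny corpus, and the facts and scores are illustrative. Only the feedback mechanics are the point:

```python
# Minimal sketch of the two-run loop. retrieve/generate are stand-ins
# for the real vector search and LLM call; the fact list is illustrative.

EXPECTED_FACTS = ["30 days", "shipping"]  # hypothetical evaluation profile

CORPUS = [
    "Refunds: the refund window is 30 days from delivery.",
    "Shipping fees are excluded from refunds.",
    "We ship worldwide within 5 business days.",
]

def retrieve(query: str) -> list[str]:
    """Stand-in for similarity search: crude word overlap with the query."""
    terms = set(query.lower().split())
    return [d for d in CORPUS if terms & set(d.lower().split())]

def generate(context: list[str]) -> str:
    """Stand-in for the LLM: the answer is just the retrieved context."""
    return " ".join(context)

def evaluate(answer: str) -> tuple[int, list[str]]:
    """Rule-based check: which expected facts are missing from the answer?"""
    missing = [f for f in EXPECTED_FACTS if f not in answer.lower()]
    score = round(100 * (1 - len(missing) / len(EXPECTED_FACTS)))
    return score, missing

# Run 1: similarity alone misses the shipping document.
query = "what is the refund window"
score1, missing = evaluate(generate(retrieve(query)))

# Run 2: feedback-driven query expansion appends the missing facts.
query2 = query + " " + " ".join(missing)
score2, _ = evaluate(generate(retrieve(query2)))
print(score1, score2)  # → 50 100
```

The essential property is that the evaluation output (the `missing` list) is machine-readable, so Run 2 can consume it directly instead of relying on a human to notice the gap.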


How we built it

EvoContext combines:

  • Vector retrieval (Qdrant)
  • Structured chunking with overlap
  • Rule-based evaluation (no LLM judge)
  • Feedback-driven query expansion
  • Full trace logging for every run

The evaluation layer defines expected facts and scoring rules, making the outcome deterministic and reproducible.
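One possible shape for such a deterministic evaluation profile; the field names, patterns, and weights below are illustrative assumptions, not EvoContext's actual schema:

```python
# Illustrative evaluation profile: expected facts, each with a match
# pattern and a weight. Same answer in, same score out: no LLM judge.
import re

PROFILE = [
    {"fact": "refund window",  "pattern": r"\b30\s+days\b", "weight": 60},
    {"fact": "shipping costs", "pattern": r"\bshipping\b",  "weight": 40},
]

def score_answer(answer: str) -> tuple[int, list[str]]:
    text = answer.lower()
    total = sum(rule["weight"] for rule in PROFILE)
    earned, missing = 0, []
    for rule in PROFILE:
        if re.search(rule["pattern"], text):
            earned += rule["weight"]
        else:
            missing.append(rule["fact"])
    return round(100 * earned / total), missing

print(score_answer("Refunds are accepted within 30 days."))
# → (60, ['shipping costs'])
```

Because the rules are plain data, a profile can be versioned, diffed, and reviewed like any other artifact, which is what makes the scores reproducible across runs.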


Challenges we ran into

  • Designing a scenario where Run 1 fails consistently without feeling artificial
  • Controlling LLM variability (temperature 0, strict output structure)
  • Preventing false positives in evaluation (negation handling, rule precision)
  • Making improvement measurable instead of subjective
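To illustrate the negation problem: a naive substring rule would count "we do not refund shipping" as evidence that a shipping fact is covered. A crude guard, purely as a sketch of the idea rather than EvoContext's actual rule engine:

```python
import re

# Illustrative negation cues; a real rule set would be domain-tuned.
NEGATIONS = {"not", "no", "never", "without", "excludes", "excluded"}

def fact_present(answer: str, pattern: str) -> bool:
    """Match a fact pattern, but reject sentences where a negation word
    precedes the match -- a crude guard against false positives."""
    for sentence in re.split(r"[.!?]", answer.lower()):
        m = re.search(pattern, sentence)
        if m and not (NEGATIONS & set(sentence[:m.start()].split())):
            return True
    return False

assert fact_present("Shipping is refunded in full.", r"\bshipping\b")
assert not fact_present("We do not refund shipping.", r"\bshipping\b")
```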

Accomplishments that we're proud of

A demo where improvement is visible, measurable, and reproducible.

The system does not just generate a better answer. It shows:

  • which documents were retrieved
  • which facts were missing
  • how retrieval changed
  • how the score improved

What we learned

  • Similarity alone does not guarantee completeness
  • Evaluation must be explicit and rule-based to be reliable
  • Observability is required to debug RAG systems
  • Feedback loops can turn static pipelines into adaptive systems

What's next for EvoContext

  • Apply the approach to regulated domains (legal, medical, operations)
  • Extend beyond two runs into continuous improvement loops
  • Build tooling for authoring evaluation profiles at scale
  • Integrate as a component inside existing RAG systems

The goal is simple: systems that can measure their own quality and improve it.
