Inspiration

While studying RAG, I noticed even well-built pipelines miss critical information. Similarity alone doesn't guarantee completeness.

As a software architect, I'm concerned about the non-deterministic nature of LLMs and RAG pipelines. Traditional software is predictable; RAG pipelines can fail silently, missing facts or hallucinating details. That makes them hard to trust and hard to debug.

I wondered: what if the system could learn from its own mistakes?

What it does

EvoContext is a CLI tool that demonstrates adaptive context retrieval:

  • Run 1 retrieves context using similarity alone — and misses critical facts
  • Evaluation scores the answer and identifies what's missing
  • Run 2 uses that feedback to retrieve better — and scores higher

The trace explains exactly why.
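The two-run loop above can be sketched in a few lines. This is a toy illustration, not EvoContext's actual code: retrieval is stubbed with keyword overlap instead of vector search, and the rubric is a hand-written fact-to-keyword map, so the feedback mechanism is visible end to end.

```python
# Toy corpus and rubric (illustrative; stand-ins for real docs and LLM calls).
DOCS = [
    "Employees accrue 1.5 vacation days per month.",
    "Unused vacation days expire at the end of the fiscal year.",
    "Remote workers must log hours in the HR portal.",
]
RUBRIC = {"accrual rate": "accrue", "expiration policy": "expire"}

def retrieve(query, k=1):
    """Rank docs by naive word overlap with the query (stand-in for vector search)."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(DOCS, key=overlap, reverse=True)[:k]

def evaluate(context):
    """Score retrieved context against the rubric; return keywords for missing facts."""
    text = " ".join(context).lower()
    missing = [kw for fact, kw in RUBRIC.items() if kw not in text]
    return 1 - len(missing) / len(RUBRIC), missing

def expand(query, missing):
    """Feedback-driven query expansion: append a term for each missing fact."""
    return query + " " + " ".join(missing)

query = "How many vacation days do employees accrue"
ctx1 = retrieve(query, k=1)
score1, missing = evaluate(ctx1)        # Run 1 retrieves the accrual doc only
ctx2 = retrieve(expand(query, missing), k=2)
score2, _ = evaluate(ctx2)              # Run 2 also pulls in the expiry doc
```

Run 1 scores 0.5 because similarity alone favors the accrual document; the evaluator's feedback adds "expire" to the query, and Run 2 scores 1.0.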

How we built it

We built EvoContext on Qdrant for vector retrieval, rule-based evaluation rubrics, and feedback-driven query expansion. Every run is traced end to end for full observability.
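The per-run tracing can be pictured as a structured event log. The shape below is hypothetical (field and stage names are illustrative, not EvoContext's actual schema); the point is that each stage appends a record, so a failed run can be explained step by step rather than debugged blind.

```python
import json
import time
from dataclasses import dataclass, field

@dataclass
class RunTrace:
    """Append-only event log for one pipeline run (illustrative sketch)."""
    run_id: int
    events: list = field(default_factory=list)

    def log(self, stage, **data):
        # Record the stage name, a timestamp, and any stage-specific payload.
        self.events.append({"stage": stage, "ts": time.time(), **data})

    def to_json(self):
        # Serialize the whole run for inspection or diffing across runs.
        return json.dumps({"run": self.run_id, "events": self.events}, indent=2)

trace = RunTrace(run_id=1)
trace.log("retrieve", query="vacation accrual", hits=3, top_score=0.82)
trace.log("evaluate", score=0.5, missing=["expiration policy"])
trace.log("expand", added_terms=["expire"])
```

Because the trace is plain JSON, two runs can be diffed to show exactly which feedback changed which retrieval.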

Challenges we ran into

  • Designing a scenario where Run 1 reliably fails without feeling artificial
  • Fighting non-determinism with temperature 0, fixed prompts, and strict output contracts
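A "strict output contract" can be as simple as validating the evaluator's response against a fixed shape and failing loudly on any deviation. The check below is an illustrative sketch, not EvoContext's actual validator; the field names are assumptions.

```python
import json

# Required fields and their types for an evaluation payload (hypothetical schema).
REQUIRED = {"score": float, "missing_facts": list}

def parse_evaluation(raw: str) -> dict:
    """Parse and validate an evaluation payload; raise on any contract breach."""
    data = json.loads(raw)  # malformed JSON fails fast here
    for key, typ in REQUIRED.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        if not isinstance(data[key], typ):
            raise ValueError(f"{key} must be {typ.__name__}")
    if not 0.0 <= data["score"] <= 1.0:
        raise ValueError("score out of range")
    return data

ok = parse_evaluation('{"score": 0.5, "missing_facts": ["expiration policy"]}')
```

Rejecting malformed output at the boundary keeps non-determinism from leaking into downstream stages: a bad run halts with a clear error instead of producing a plausible-looking answer.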

Accomplishments that we're proud of

A reproducible demo where improvement isn't magic — it's traceable and explainable.

What we learned

  • Similarity is necessary but not sufficient
  • Observability turns a black box into something you can debug

What's next for EvoContext

  • More domains: Testing beyond policy Q&A — runbooks, legal docs, technical manuals
  • Smarter memory: Usefulness-weighted retrieval that improves over many runs, not just two
  • Integration: Packaging as a library that plugs into existing RAG pipelines
  • Benchmarking: Measuring improvement across diverse question types and document sets

The long-term goal: RAG systems that get better with use — not just at inference, but at knowing what to retrieve.
