Inspiration

We wanted to address the lack of transparency in AI evaluations, especially when general-purpose models are used for fact-checking and data generation. Our goal was to build a solution that provides clarity and trust, along with easy integration across various domains.

The tool is currently available as a Docker container and as an API.

What it does

EvalGuard automates fact-checking by verifying data against trusted sources like news articles, scientific research papers, legal documents, and medical codes. It generates a "trustworthiness score" to support human judgment. Additionally, it allows for the generation of reliable, domain-specific synthetic data.
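
As a purely illustrative example, a fact-check result could pair the score with the evidence that produced it. This response shape is our assumption for illustration, not the tool's documented output format:

```json
{
  "claim": "Drug X was approved by the FDA in 2021.",
  "trustworthiness_score": 0.87,
  "verdict": "supported",
  "sources": [
    {"url": "https://example.org/fda-approval-notice", "relevance": 0.92}
  ]
}
```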

Our tool comes with predefined schemas for:

  • News checker: Verifies data from news sources.
  • Scientific research checker: Cross-references academic papers.
  • Legal corpus checker: Fact-checks legal documents.
  • Medical coding: Ensures accuracy in healthcare data.

Users can also bring their own data sources and extraction schemas using a simple JSON configuration. We use Weave tracing to log the entire process, ensuring full transparency.
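
For illustration only, a bring-your-own-source configuration might look like the following. The field names here are hypothetical and do not reflect EvalGuard's actual schema format:

```json
{
  "name": "company-filings-checker",
  "sources": [
    {"type": "web", "domains": ["sec.gov"]}
  ],
  "extraction_schema": {
    "claim": "string",
    "filing_date": "date",
    "reported_revenue": "number"
  }
}
```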

How we built it

We used several key technologies:

  • LangGraph for managing the flow of data.
  • OpenRouter for routing requests to different LLM providers.
  • Weave for tracing, to ensure transparency and process logging.
  • OpenAI models for fact-checking and data generation, though the models can be swapped based on user needs.
  • Tavily API for web search capabilities.
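
To make the wiring concrete, here is a minimal sketch of how these pieces can fit together: a two-node LangGraph pipeline that gathers evidence with Tavily, judges the claim with an OpenRouter-hosted model, and traces every step in Weave. The node names, state fields, Weave project name, model slug, and prompt are our illustrative assumptions, not the project's actual code.

```python
# Minimal sketch of the pipeline wiring, under the assumptions stated above.
from typing import TypedDict

import weave
from langgraph.graph import StateGraph, END
from openai import OpenAI
from tavily import TavilyClient

weave.init("evalguard-demo")  # hypothetical Weave project name

# OpenRouter exposes an OpenAI-compatible endpoint, so the standard
# OpenAI client can talk to it; the model slug below is just an example.
llm = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="OPENROUTER_API_KEY")
search = TavilyClient(api_key="TAVILY_API_KEY")


class CheckState(TypedDict):
    claim: str
    evidence: str
    verdict: str


@weave.op()  # every call is logged to Weave for transparency
def retrieve(state: CheckState) -> dict:
    """Gather supporting evidence for the claim via Tavily search."""
    results = search.search(state["claim"], max_results=3)
    evidence = "\n".join(r["content"] for r in results["results"])
    return {"evidence": evidence}


@weave.op()
def judge(state: CheckState) -> dict:
    """Ask the model whether the evidence supports the claim."""
    resp = llm.chat.completions.create(
        model="openai/gpt-4o-mini",  # swappable: any OpenRouter model slug
        messages=[{
            "role": "user",
            "content": (
                f"Claim: {state['claim']}\n"
                f"Evidence: {state['evidence']}\n"
                "Is the claim supported by the evidence? Answer briefly."
            ),
        }],
    )
    return {"verdict": resp.choices[0].message.content}


# Wire the two steps into a linear LangGraph pipeline.
graph = StateGraph(CheckState)
graph.add_node("retrieve", retrieve)
graph.add_node("judge", judge)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "judge")
graph.add_edge("judge", END)
app = graph.compile()

print(app.invoke({"claim": "The Eiffel Tower is in Berlin.",
                  "evidence": "", "verdict": ""}))
```

Because OpenRouter speaks the OpenAI wire protocol, swapping the underlying model is a one-line change, which is what makes the "models can be swapped" point above practical.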

Challenges we ran into

Ensuring flexibility across different domains while maintaining transparency was a challenge. We needed to make sure our synthetic data generation and fact-checking pipelines worked seamlessly, regardless of the field.

Accomplishments that we're proud of

We built a transparent, customizable solution that automates fact-checking and data generation. The integration of Weave tracing and the ability for users to bring their own schemas are major achievements. We're also proud of the predefined schemas for the news, research, legal, and medical domains.

What we learned

We learned the value of transparency and flexibility when working with AI models across different domains. Integrating multiple tools, like Weave tracing and the Tavily API, taught us how to build more adaptable solutions.

What's next for Verify Mate

We plan to expand our library of predefined schemas, refine our trustworthiness score system, and continue integrating more APIs to make the platform even more customizable and user-friendly.

Built With

docker · langgraph · openai · openrouter · tavily · weave
