Inspiration

We were motivated by the widespread challenge of working with unstructured documents—files filled with text, tables, and images that are difficult to search through or extract meaningful insights from. Many businesses and organizations struggle with managing and understanding such data efficiently.

What it does

EpicRAG is a solution for processing and understanding unstructured documents. It accepts files containing a mix of text, images, and tables, and intelligently breaks them down using a combination of Hugging Face and OpenAI models. The extracted content then grounds an OpenAI-powered chatbot that can answer questions about the document, enabling fast, structured information retrieval from even the most complex files.

How we built it

EpicRAG uses a hybrid chunking strategy to process documents and separate them into text, tables, and images. Text and tables are passed through the Hugging Face BGE-M3 model to generate vector embeddings. Images are processed using the OpenAI CLIP model to extract their vector representations. All embeddings are stored in a Chroma DB vector database, while the original file content is saved in a document database.
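The ingestion flow above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the `embed_text` and `embed_image` stubs stand in for calls to the BGE-M3 model (e.g. via `sentence-transformers`) and CLIP (e.g. via `transformers`), and the in-memory `Stores` object stands in for Chroma DB and the document database.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    kind: str      # "text", "table", or "image"
    content: str   # raw text, serialized table, or image path
    page: int

def embed_text(content: str) -> list:
    # Stub: in the real pipeline this would call the Hugging Face
    # BGE-M3 model to produce a dense vector embedding.
    return [float(len(content)), float(content.count(" "))]

def embed_image(path: str) -> list:
    # Stub: in the real pipeline this would call OpenAI CLIP
    # to produce an image embedding.
    return [float(len(path)), 0.0]

@dataclass
class Stores:
    vectors: list = field(default_factory=list)    # stand-in for Chroma DB
    documents: dict = field(default_factory=dict)  # stand-in for the doc DB

def ingest(chunks, stores):
    """Route each chunk to the right embedder and persist it in both stores."""
    for i, chunk in enumerate(chunks):
        if chunk.kind == "image":
            vec = embed_image(chunk.content)
        else:  # text and tables share the text embedder (BGE-M3)
            vec = embed_text(chunk.content)
        stores.vectors.append(
            {"id": i, "embedding": vec,
             "metadata": {"kind": chunk.kind, "page": chunk.page}}
        )
        stores.documents[i] = chunk.content

stores = Stores()
ingest([Chunk("text", "Revenue grew 12% year over year.", 1),
        Chunk("image", "figures/chart_q3.png", 2)], stores)
print(len(stores.vectors))  # 2
```

The key design point is the split routing: text and tables go through one embedder while images go through another, yet all vectors land in a single store so a later query can search across modalities at once.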

A custom retriever then pulls relevant information and metadata—including confidence scores, image paths, page numbers, and content descriptions—from both databases. This retrieved data is fed into an OpenAI chatbot, enabling it to answer user queries about the document with accuracy and context.
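A retrieval step like the one described might look like the sketch below, assuming cosine-similarity search over the stored vectors. The function names, record layout, and prompt template are illustrative assumptions, but the returned fields (confidence score, page number, content) mirror the metadata listed above.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors; 0.0 if either is zero-length.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, vector_store, doc_store, k=3):
    """Return the top-k chunks with their metadata and a confidence score."""
    ranked = sorted(vector_store,
                    key=lambda rec: cosine(query_vec, rec["embedding"]),
                    reverse=True)[:k]
    return [{"content": doc_store[rec["id"]],
             "confidence": cosine(query_vec, rec["embedding"]),
             **rec["metadata"]}  # e.g. page number, image path, kind
            for rec in ranked]

def build_prompt(question, hits):
    """Assemble the retrieved context into a prompt for the chat model."""
    context = "\n".join(f"[page {h['page']}] {h['content']}" for h in hits)
    return f"Answer using only this context:\n{context}\n\nQ: {question}"

vector_store = [
    {"id": 0, "embedding": [1.0, 0.0], "metadata": {"page": 1, "kind": "text"}},
    {"id": 1, "embedding": [0.0, 1.0], "metadata": {"page": 2, "kind": "table"}},
]
doc_store = {0: "Revenue grew 12%.", 1: "Q3 | 4.2M"}
hits = retrieve([0.9, 0.1], vector_store, doc_store, k=1)
print(hits[0]["page"])  # 1
```

In the real system the resulting prompt would be sent to the OpenAI chat API; the prompt builder here only shows how retrieved chunks and their page metadata can be stitched into the model's context.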

The chatbot is served through a frontend built with Streamlit.

Challenges we ran into

While not documented explicitly, typical challenges for this type of system include:

  • Properly handling and processing images from PDF documents
  • Creating effective embeddings for both text and images
  • Balancing retrieval time and answer quality
  • Optimizing parameters such as text chunk count, image count, and temperature settings
  • Integrating all components into a cohesive system
  • Maintaining conversation context while keeping responses relevant

Accomplishments that we're proud of

The system has several notable accomplishments:

  • Successful integration of both text and image modalities in a RAG system
  • Implementation of a parameter optimization framework that balances performance and quality
  • Comprehensive evaluation metrics to measure system effectiveness
  • Multiple user interfaces for different use cases
  • Calculator tool integration for handling numerical queries
  • Conversation history management for contextual follow-up questions

What we learned

The project demonstrates learning in:

  • Building effective multimodal retrieval systems
  • Optimizing RAG parameters for best performance
  • Evaluating RAG systems with meaningful metrics
  • Balancing between retrieval speed and answer quality
  • Working with advanced LLM APIs (Mistral AI)
  • Creating effective interfaces for AI systems

What's next for EpicRAG

Potential next steps could include:

  • Supporting additional document formats beyond PDFs
  • Implementing more sophisticated image understanding
  • Adding additional modalities (e.g., audio, video)
  • Improving the optimization algorithms for better parameter tuning
  • Creating more specialized versions for specific domains (e.g., financial, healthcare)
  • Adding features for document comparison and summarization across multiple sources
  • Implementing more advanced evaluation metrics for answer quality

Built With

  • langchain
  • multimodal-rag
  • python
  • rag
  • streamlit