🏠 Document & Policy Assistant

AI-powered mortgage document extraction and intelligent policy Q&A system with RAG (Retrieval-Augmented Generation).

🚀 Features

📤 Smart Document Upload - Multi-file drag & drop with automatic processing
🔍 Intelligent Text Extraction - OCR for images, PDF parsing, key-value pair extraction
💬 Unified AI Assistant - Query uploaded documents & official mortgage policies
📚 Policy Knowledge Base - Pre-trained on Fannie Mae, FHA, USDA, Freddie Mac guidelines (14,383+ chunks)
🔊 Text-to-Speech - Optional voice responses with ElevenLabs AI
🌐 Web Search Fallback - Live mortgage data when documents don't contain the answer
🎯 Smart Document Routing - Single-page: key-value extraction | Multi-page: RAG indexing

🛠️ Tech Stack

Backend

Technology	Purpose
FastAPI	High-performance Python web framework
Uvicorn	ASGI server
Python 3.9+	Runtime

AI & Machine Learning

Technology	Purpose
Google Gemini 2.0 Flash	Large language model for intelligent responses
Sentence Transformers	Text embeddings (all-MiniLM-L6-v2)
ChromaDB	Vector database for document similarity search
LangChain	Text splitting and chunking utilities
ElevenLabs	Text-to-speech AI (multilingual)
SerpAPI	Real-time web search integration

Document Processing

Technology	Purpose
Google Document AI	Advanced OCR & entity extraction (optional)
PyPDF2 / pdfplumber	PDF text extraction
Pytesseract	OCR for images
Pillow	Image processing

Frontend

Technology	Purpose
React 18	UI framework
TypeScript	Type-safe JavaScript
Tailwind CSS	Utility-first styling
React Router	Client-side routing

📦 Installation

Prerequisites

# Python 3.9+
python3 --version

# Node.js 16+
node --version

# Tesseract OCR (for image text extraction)
brew install tesseract  # macOS
# or: sudo apt-get install tesseract-ocr  # Linux

Backend Setup

cd backend

# Install Python dependencies
pip install -r requirements.txt

# Configure environment variables
cp .env.example .env
# Edit .env with your API keys:
# - GEMINI_API_KEY
# - SERPAPI_API_KEY
# - ELEVENLABS_API_KEY (optional, for TTS)

# Start backend server
python3 -m uvicorn main:app --reload --host 0.0.0.0 --port 8000

Frontend Setup

cd frontend

# Install dependencies
npm install

# Build production version
npm run build

# Serve static build
npx serve -s build -l 3000

# Or run development server
npm start

🔑 API Keys Required

Essential

GEMINI_API_KEY - Get from Google AI Studio
SERPAPI_API_KEY - Get from SerpAPI

Optional

ELEVENLABS_API_KEY - Get from ElevenLabs (for TTS)
DOCAI_PROJECT_ID - Google Cloud Document AI (fallback uses free OCR)

Add these to backend/.env:

GEMINI_API_KEY="your_key_here"
SERPAPI_API_KEY="your_key_here"
ELEVENLABS_API_KEY="your_key_here"  # Optional

🏗️ Architecture

System Flow

┌─────────────────────────────────────────────────────┐
│  Frontend (React + TypeScript + Tailwind)          │
│  http://localhost:3000                              │
└────────────────────┬────────────────────────────────┘
                     │
                     ↓ HTTP/REST API
┌─────────────────────────────────────────────────────┐
│  Backend (FastAPI + Python)                         │
│  http://localhost:8000                              │
├─────────────────────────────────────────────────────┤
│  ┌─────────────────┐  ┌─────────────────┐          │
│  │ Document AI     │  │ Mortgage KB     │          │
│  │ OCR + Extraction│  │ RAG System      │          │
│  └─────────────────┘  └─────────────────┘          │
│                                                      │
│  ┌─────────────────────────────────────┐           │
│  │ ChromaDB Vector Database             │           │
│  │ • Policy Collection: 14,383 chunks   │           │
│  │ • User Collection: Dynamic           │           │
│  └─────────────────────────────────────┘           │
│                                                      │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │ Gemini   │  │ SerpAPI  │  │ ElevenLabs│         │
│  │ AI       │  │ Search   │  │ TTS       │         │
│  └──────────┘  └──────────┘  └──────────┘         │
└─────────────────────────────────────────────────────┘

Document Processing Pipeline

Upload File
    ↓
Page Count Detection
    ↓
┌───────────┴───────────┐
│                       │
1 Page              2+ Pages
│                       │
Key-Value           Key-Value +
Extraction          RAG Indexing
    ↓                   ↓
Display          Queryable in Chat

📊 API Endpoints

Document Management

POST   /api/documents/upload          Upload files
POST   /api/documents/{id}/process    Process & extract
GET    /api/documents                 List all documents
GET    /api/documents/{id}            Get document details
DELETE /api/documents/{id}            Delete document

Knowledge Base & Chat

POST   /api/mortgage-kb/query         Query unified KB (ChromaDB → Web)
GET    /api/mortgage-kb/stats         Get KB statistics
POST   /api/mortgage-kb/tts           Text-to-speech conversion

Legacy Endpoints (Still Available)

POST   /api/rag/query                 RAG query with web search
GET    /api/rag/stats                 RAG statistics

🗂️ Project Structure

HackUTA/
├── backend/
│   ├── main.py                    # FastAPI application
│   ├── routers.py                 # API endpoints
│   ├── mortgage_kb_service.py     # Unified RAG + TTS
│   ├── document_ai_service.py     # OCR & extraction
│   ├── rag_service.py             # RAG with web search
│   ├── storage_service.py         # Google Cloud Storage
│   ├── audio_cache/               # TTS audio files
│   ├── uploads/                   # User uploaded files
│   ├── chroma_storage/            # Vector DB
│   └── RAG/
│       └── documents/             # Policy PDFs (14K+ chunks)
│
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── DocumentUpload.tsx
│   │   │   ├── MortgageKnowledgeBase.tsx  # Unified chat
│   │   │   ├── ExtractedDataView.tsx
│   │   │   └── Header.tsx
│   │   ├── pages/
│   │   │   └── Dashboard.tsx
│   │   └── config/
│   │       └── constants.ts
│   └── build/                     # Production build
│
├── requirements.txt               # Python dependencies
└── README.md                      # This file

💡 How It Works

1. Document Upload

User uploads PDFs, images, or text files
System detects page count automatically
Single-page: Extract key-values (invoices, IDs)
Multi-page: Extract + add to RAG for Q&A

2. Intelligent Query System

Priority Order:

🥇 ChromaDB - Search user documents + policy documents first
🥈 Web Search - Only if no relevant docs found in ChromaDB
🥉 Gemini AI - Generate intelligent answers from context

3. Text-to-Speech (Optional)

Toggle TTS on/off in UI
Responses read aloud with professional voice
Cached for instant replay (saves API calls)

🎯 Use Cases

For Mortgage Professionals

✅ Upload client loan applications
✅ Extract key information automatically
✅ Query multi-page loan agreements
✅ Compare against policy guidelines

For Homebuyers

✅ Understand document requirements
✅ Get mortgage process guidance
✅ Ask policy-related questions
✅ Organize loan documents

🧪 Testing

Test Document Upload

# Visit http://localhost:3000
# Upload a PDF or image
# Watch automatic processing
# View extracted key-value pairs

Test Knowledge Base

# In the Document & Policy Assistant:
# Ask: "What are Fannie Mae debt-to-income requirements?"
# Get answer with policy document citations

Test TTS

# Enable TTS toggle (purple checkbox)
# Ask any question
# Hear response read aloud

📈 Performance

Document Processing

Single-page extraction: < 2 seconds
Multi-page RAG indexing: ~1 second per page
OCR fallback: 2-5 seconds (images)

Query Response Time

ChromaDB search: < 500ms
Gemini generation: 1-3 seconds
Cached TTS: < 50ms (instant!)
New TTS: 2-4 seconds

Knowledge Base

14,383+ policy document chunks indexed
Semantic search across 500+ pages
Sub-second retrieval

🔒 Security Notes

✅ CORS configured for local development
✅ API keys stored in .env (never committed)
✅ File uploads validated and sanitized
✅ UUID-based file naming prevents collisions
⚠️ For production: Add authentication, rate limiting, and HTTPS

🐛 Troubleshooting

Backend won't start

# Check Python version
python3 --version  # Must be 3.9+

# Install dependencies
pip install -r requirements.txt

# Check API keys
grep -E "GEMINI|SERP" backend/.env

TTS not working

# Verify ElevenLabs key is set
grep ELEVENLABS backend/.env

# Check backend logs
tail -20 backend/backend.log | grep -i elevenlabs

Documents not queryable

Single-page docs are NOT queryable (by design)
Only multi-page PDFs are added to RAG
Check processing logs for "added to RAG" message

📚 Dependencies

Python (Backend)

google-generativeai  # Gemini AI
chromadb            # Vector database
sentence-transformers  # Embeddings
elevenlabs          # Text-to-speech
google-search-results  # SerpAPI
fastapi             # Web framework
pdfplumber          # PDF processing
pytesseract         # OCR

Node.js (Frontend)

react              # UI framework
typescript         # Type safety
tailwindcss        # Styling
react-router-dom   # Routing

🎨 Screenshots

Main Dashboard

Document upload with drag & drop
Unified AI assistant interface
TTS toggle for voice responses
Document list with status tracking

Knowledge Base Chat

Color-coded sources (blue = user docs, green = policies)
Real-time query with source citations
Suggested questions
Auto-play audio responses

🤝 Contributing

This project was built for HackUTA. Key features:

Clean, modular architecture
Comprehensive error handling
Persistent vector storage
Intelligent fallback systems
Production-ready codebase

📝 License

MIT License - Built for HackUTA 2025

🌟 Credits

Technologies Used:

Google Gemini AI
ElevenLabs TTS
ChromaDB Vector Database
Sentence Transformers
SerpAPI
FastAPI
React + TypeScript

Built with ❤️ at HackUTA

🚀 Quick Start

# 1. Clone and install
cd backend && pip install -r requirements.txt
cd frontend && npm install

# 2. Configure API keys in backend/.env
GEMINI_API_KEY="your_key"
SERPAPI_API_KEY="your_key"
ELEVENLABS_API_KEY="your_key"  # Optional

# 3. Start services
cd backend && python3 -m uvicorn main:app --reload --host 0.0.0.0 --port 8000 &
cd frontend && npx serve -s build -l 3000 &

# 4. Open browser
open http://localhost:3000

You're ready to go! 🎉

For detailed documentation, see:

System architecture & API docs: http://localhost:8000/docs
Frontend: http://localhost:3000

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
backend		backend
frontend		frontend
.gitignore		.gitignore
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

🏠 Document & Policy Assistant

🚀 Features

🛠️ Tech Stack

Backend

AI & Machine Learning

Document Processing

Frontend

📦 Installation

Prerequisites

Backend Setup

Frontend Setup

🔑 API Keys Required

Essential

Optional

🏗️ Architecture

System Flow

Document Processing Pipeline

📊 API Endpoints

Document Management

Knowledge Base & Chat

Legacy Endpoints (Still Available)

🗂️ Project Structure

💡 How It Works

1. Document Upload

2. Intelligent Query System

3. Text-to-Speech (Optional)

🎯 Use Cases

For Mortgage Professionals

For Homebuyers

🧪 Testing

Test Document Upload

Test Knowledge Base

Test TTS

📈 Performance

Document Processing

Query Response Time

Knowledge Base

🔒 Security Notes

🐛 Troubleshooting

Backend won't start

TTS not working

Documents not queryable

📚 Dependencies

Python (Backend)

Node.js (Frontend)

🎨 Screenshots

Main Dashboard

Knowledge Base Chat

🤝 Contributing

📝 License

🌟 Credits

🚀 Quick Start

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages