Inspiration
The Torpe Hitachi Classifier was inspired by the growing need for robust, AI-driven document management in enterprise environments, where data security, compliance, and efficiency are paramount. In industries like finance, healthcare, and legal, mishandling sensitive documents can lead to severe consequences such as data breaches, regulatory fines, or reputational damage. We were particularly motivated by Google's Gemini 2.0 Flash model, which offers fast, multimodal AI capabilities for processing text and images. Combining this with Retrieval Augmented Generation (RAG) for policy grounding, Context Augmented Generation (CAG) for optimization, and Solana blockchain for immutable audits felt like a perfect synergy. Our goal was to create a system that not only classifies documents accurately but also incorporates human-in-the-loop (HITL) feedback for continuous improvement, drawing from real-world challenges in data governance and the desire to make AI tools more trustworthy and auditable.
What it does
The Torpe Hitachi Classifier is an enterprise-grade document classification system that sorts uploaded PDFs into four sensitivity levels: UNSAFE (harmful content), CONFIDENTIAL (trade secrets, PII such as SSNs), SENSITIVE (internal memos, employee data), or PUBLIC (marketing materials). It processes documents with OCR for text and images, applies AI-driven classification backed by RAG and CAG for accuracy, and validates results with a dual-layer consensus mechanism that auto-approves at 90%+ confidence. Every action is anchored on the Solana blockchain for immutable audits and mirrored in SQLite for local queries. A Flask-based web UI handles uploads, statistics dashboards, and HITL reviews, where human corrections feed back into the system as few-shot examples to improve future classifications. Compliance is supported by PII pattern detection and structured outputs that include reasoning, policy citations, and blockchain hashes.
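To make the consensus rule concrete, here is a minimal sketch of the auto-approval check, assuming a simple structured result object; the `ClassificationResult` dataclass and `auto_approve` helper are illustrative names, not our actual code:

```python
from dataclasses import dataclass, field
from typing import List, Optional

AUTO_APPROVE_THRESHOLD = 0.90  # 90%+ confidence auto-approves, per the consensus rule above

@dataclass
class ClassificationResult:
    # Illustrative structured output: label, confidence, reasoning, citations, audit hash
    label: str                      # UNSAFE | CONFIDENTIAL | SENSITIVE | PUBLIC
    confidence: float               # 0.0 to 1.0
    reasoning: str
    citations: List[str] = field(default_factory=list)
    blockchain_hash: Optional[str] = None

def auto_approve(primary: ClassificationResult, validator: ClassificationResult) -> bool:
    """Dual-layer consensus: both passes must agree on the label and both must
    clear the confidence threshold; anything else goes to the HITL review queue."""
    return (
        primary.label == validator.label
        and primary.confidence >= AUTO_APPROVE_THRESHOLD
        and validator.confidence >= AUTO_APPROVE_THRESHOLD
    )
```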
How we built it
We built the system in three phases using Python 3.9+, with key technologies like Gemini 2.0 Flash, Solana devnet, Tesseract OCR, PyMuPDF, and Flask.
Phase 1: Foundation & Policy RAG: We created a knowledge base with JSON files for categories, PII patterns, and SME-validated examples. Integrated Gemini's File Search for RAG to ground classifications in policies. Developed document processing to handle PDFs, extract text/images via OCR, and map citations with bounding boxes.
Phase 2: Core AI Engine with RAG/CAG: We implemented a dynamic prompt tree for sequential classification (checking UNSAFE first for safety), used Gemini's caching API for CAG over document content, and built structured JSON outputs with dual validation for confidence scoring.
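The prompt-tree ordering can be sketched roughly like this; `classify_as` is a placeholder for a Gemini call against the cached document context, not our actual API surface:

```python
# Safety first, then descending sensitivity; PUBLIC is the fallback.
CATEGORY_ORDER = ["UNSAFE", "CONFIDENTIAL", "SENSITIVE", "PUBLIC"]

def classify_document(document_text, classify_as):
    """Walk the prompt tree in priority order and return the first confident match.
    `classify_as(category, text)` is a placeholder for a model call returning
    (is_match, confidence, reasoning)."""
    for category in CATEGORY_ORDER[:-1]:
        is_match, confidence, reasoning = classify_as(category, document_text)
        if is_match:
            return {"label": category, "confidence": confidence, "reasoning": reasoning}
    # Simplified fallback: nothing higher-sensitivity matched
    return {"label": "PUBLIC", "confidence": 0.99, "reasoning": "No higher-sensitivity category matched."}
```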
Phase 3: Auditability, UX & Compliance: Integrated Solana for blockchain audits and SQLite for local logs. Created a Flask web app with endpoints for uploads, stats, and HITL queues. Ensured feedback loops update the RAG base automatically.
We managed dependencies via requirements.txt, configured API keys in .env, and tested with sample PDFs to verify end-to-end functionality.
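For completeness, loading the API keys might look roughly like this with python-dotenv; the variable names shown are examples rather than our exact keys:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment

GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")  # example variable name
SOLANA_RPC_URL = os.getenv("SOLANA_RPC_URL", "https://api.devnet.solana.com")

if not GEMINI_API_KEY:
    raise RuntimeError("GEMINI_API_KEY is missing from .env")
```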
Challenges we ran into
One major challenge was integrating multimodal processing—OCR accuracy varied with scanned PDFs, requiring tweaks to Tesseract and bounding box mapping for precise citations. Blockchain integration with Solana devnet posed issues like network outages, leading us to implement simulated hashes as fallbacks. Balancing AI speed and cost with CAG was tricky; initial prompts were too verbose, causing higher latency, so we optimized the prompt tree. Handling edge cases, like password-protected PDFs or ambiguous PII, required robust error handling. Finally, ensuring the HITL feedback loop updated the RAG without manual intervention involved automating JSON updates and reloads, which tested our modular design.
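The devnet fallback boils down to something like the sketch below, where `submit_to_solana` stands in for the real on-chain write (a hypothetical name):

```python
import hashlib
import json

def anchor_record(result: dict) -> dict:
    """Hash the classification record and try to anchor it on Solana devnet;
    on an outage or RPC error, return the locally computed hash flagged as simulated."""
    digest = hashlib.sha256(json.dumps(result, sort_keys=True).encode()).hexdigest()
    try:
        signature = submit_to_solana(digest)  # hypothetical on-chain memo write
        return {"hash": digest, "tx": signature, "simulated": False}
    except Exception:
        # Network outage: keep the audit trail consistent with a simulated entry
        return {"hash": digest, "tx": None, "simulated": True}
```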
Accomplishments that we're proud of
We're thrilled with achieving an 85%+ auto-approval rate through dual validation, making the system efficient for high-volume use. Integrating Solana for immutable audits adds a layer of trust rarely seen in AI classifiers. The HITL feedback loop is a standout feature, turning human corrections into AI improvements seamlessly. Our web UI provides a user-friendly experience with real-time dashboards and audio availability for results. Overall, building a comprehensive, self-improving system from scratch that handles multimodal docs securely feels like a big win, especially as a demo project that could scale to production.
What we learned
This project taught us the power of hybrid AI architectures—RAG and CAG drastically reduce errors by grounding models in context. We gained expertise in blockchain for non-financial apps, appreciating Solana's speed for audits. Working with multimodal AI highlighted the importance of preprocessing (e.g., OCR calibration). We learned about compliance nuances, like PII regex patterns and hierarchical classification to prioritize safety. Full-stack development reinforced modular coding for maintainability, and testing revealed the value of few-shot learning for AI adaptability. Ultimately, we understood that enterprise tools thrive on balancing automation with human oversight.
What's next for Torpe Hitachi
Next, we plan to expand to more file types (e.g., DOCX, images) and integrate additional AI models for ensemble classification. We'll enhance blockchain features for mainnet production use and add role-based access in the UI for enterprise teams. Exploring federated learning could allow collaborative improvements across deployments without sharing data. We're also aiming for cloud deployment (e.g., via Docker) and open-sourcing parts to gather community feedback. Long-term, integrating real-time monitoring and API expansions could turn it into a full-fledged SaaS platform for document governance.


