Inspiration

In many developing regions, including parts of South Asia, the "Digital Health" revolution has hit a major roadblock: Paper.

Patient history is often trapped in fragmented, handwritten prescription slips that look more like scribbles than medical records. A 2023 study showed that medication errors cause at least one death every day and injure approximately 1.3 million people annually in the US alone; the numbers in developing nations are likely far higher, with illegible handwriting a leading cause of dispensing errors.

We saw a massive gap between High-Tech AI and Low-Tech Reality. We didn't just want to build another chatbot; we wanted to build a bridge. We were inspired to create RxTract to serve as an "Autonomous Guardian": a system that doesn't just "read" text but understands medical context, catches dangerous drug interactions, and digitizes patient history from a single photo, potentially saving lives in the process.

What it does

RxTract is an intelligent, multi-agent system that converts chaotic, handwritten medical documents into structured, life-saving data.

  1. Visual Extraction: The user uploads a photo of a handwritten prescription or lab report.
  2. Intelligent Digitization: Our fine-tuned PaddleOCR-VL model cuts through the noise, deciphering cursive handwriting, medical abbreviations (e.g., "OD", "BD"), and complex layouts.
  3. Active Reasoning: The data is passed to ERNIE 4.5 (fine-tuned via Unsloth), which acts as a "Digital Pharmacist," structuring the unstructured text into a standard JSON format.
  4. Safety Auditing: This is where the magic happens. Using CAMEL-AI, we spawn two agents:
     • The Scribe Agent proposes the digitized text.
     • The Auditor Agent cross-references the text with medical knowledge. If the Scribe reads "5000mg Aspirin," the Auditor flags it as a likely OCR error (since the maximum dose is far lower) and corrects it to "500mg."
  5. Risk Alerting: The system finally outputs a "Digital Health Passport" and alerts the user to any potential drug-drug interactions based on their previous history.
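As a rough illustration, the end product and the final interaction check can be sketched in plain Python. The field names and the interaction table below are illustrative stand-ins, not our exact production schema or a real clinical database:

```python
# Sketch of a "Digital Health Passport" and a minimal interaction check.
# Field names and the interaction table are illustrative only.
passport = {
    "patient_id": "demo-001",
    "allergies": ["penicillin"],
    "history": ["warfarin"],  # previously dispensed drugs
    "prescription": [
        {"drug": "aspirin", "dose_mg": 500, "frequency": "BD"},  # BD = twice a day
    ],
}

# Toy interaction table (unordered pairs); a real system would query a
# curated drug-interaction knowledge base.
INTERACTIONS = {frozenset({"aspirin", "warfarin"}): "increased bleeding risk"}

def check_interactions(passport):
    """Flag any known interaction between newly prescribed drugs and history."""
    alerts = []
    for item in passport["prescription"]:
        for prior in passport["history"]:
            pair = frozenset({item["drug"], prior})
            if pair in INTERACTIONS:
                alerts.append(f"{item['drug']} + {prior}: {INTERACTIONS[pair]}")
    return alerts

print(check_interactions(passport))
# ['aspirin + warfarin: increased bleeding risk']
```

In production the lookup runs against the patient's full stored history rather than a single document.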

How we built it

We architected RxTract as a pipeline of specialized experts rather than a single monolithic model.

  1. The Vision Layer (PaddleOCR-VL): We started by tackling the hardest part: handwriting. We used PaddlePaddle to fine-tune PaddleOCR-VL on a curated dataset of synthetic and real noisy medical receipts.
     • Technique: We employed transfer learning on the text detection head to better recognize the small, cramped fonts often found on doctors' pads.

  2. The Brain (ERNIE + Unsloth): We needed a model that understood medical context, so we used Unsloth to fine-tune ERNIE efficiently.
     • Fine-Tuning: We used Unsloth's optimized training pipeline to teach ERNIE to output strict JSON schemas from messy text inputs, minimizing the loss against schema-formatted target outputs.

  3. The Multi-Agent System (CAMEL-AI): To reduce hallucinations, we implemented a "Role-Playing" framework using CAMEL-AI. We defined a Pharmacist role and a DataEntry role. The Inception Prompting mechanism ensured that the agents stayed in character, debating the ambiguity of specific handwritten words until a consensus confidence threshold (θ > 0.95) was reached.
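The shape of that debate loop can be sketched in plain Python. The two "agent" functions below are hard-coded stand-ins for the actual CAMEL-AI role-playing LLM calls, kept deterministic so the example runs as-is; the threshold matches the θ = 0.95 above:

```python
# Minimal sketch of the Scribe/Auditor consensus loop. The agents here are
# stand-ins for CAMEL-AI role-playing calls, with canned responses so the
# example is runnable without an LLM backend.
THETA = 0.95  # consensus confidence threshold

def scribe_propose(image_region, round_no):
    # Stand-in: the real Scribe is an OCR-backed LLM agent that re-reads
    # the crop each round, incorporating the Auditor's feedback.
    guesses = [("5000mg Aspirin", 0.60), ("500mg Aspirin", 0.97)]
    return guesses[min(round_no, len(guesses) - 1)]

def auditor_review(text, confidence):
    # Stand-in: the real Auditor checks dosage plausibility against
    # medical knowledge before accepting a reading.
    dose_plausible = "5000mg" not in text
    return dose_plausible and confidence >= THETA

def debate(image_region, max_rounds=5):
    """Loop until the Auditor accepts the Scribe's reading or rounds run out."""
    for round_no in range(max_rounds):
        text, conf = scribe_propose(image_region, round_no)
        if auditor_review(text, conf):
            return text, conf
    return None, 0.0  # no consensus reached

print(debate("crop_of_dose_field.png"))
# ('500mg Aspirin', 0.97)
```

The real loop exchanges natural-language critiques between the roles; the control flow, though, is exactly this: propose, audit, and only emit once the confidence threshold is cleared.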

Challenges we ran into

  • The "Doctor's Handwriting" Paradox: Standard OCR models failed miserably on cursive medical abbreviations (e.g., confusing "QD" (once a day) with "QID" (four times a day)). We had to aggressively augment our training data with noise, rotation, and blur to make the PaddleOCR model robust.
  • Hallucination Risks: Early versions of the LLM would invent drugs that didn't exist to fill in gaps. Implementing the CAMEL-AI debate loop solved this; the second agent effectively "grounded" the first one by asking, "Does this drug dosage make sense medically?"
  • Integration: Connecting the Baidu AI Studio API with the Unsloth local training workflow required building a custom wrapper to handle tokenization differences.
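The noise/rotation/blur augmentation mentioned above can be sketched with NumPy alone (our actual pipeline used OpenCV; shapes and parameters here are illustrative):

```python
import numpy as np

# NumPy-only sketch of the training-data augmentations (noise, rotation,
# blur). The real pipeline used OpenCV; values here are illustrative.
rng = np.random.default_rng(0)

def add_noise(img, sigma=10.0):
    """Additive Gaussian noise, clipped back to the valid pixel range."""
    noisy = img + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def coarse_rotate(img):
    """Rotation stand-in: a random multiple of 90 degrees.
    (OpenCV's warpAffine allows arbitrary small angles instead.)"""
    return np.rot90(img, k=int(rng.integers(0, 4)))

def box_blur(img, k=3):
    """Simple k x k mean blur via shifted sums over an edge-padded copy."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return (out / (k * k)).astype(np.uint8)

img = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)  # fake crop
augmented = box_blur(add_noise(coarse_rotate(img)))
print(augmented.shape)  # (32, 32)
```

Stacking several randomized transforms per sample is what made the detector robust to crumpled, skewed, and smudged slips.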

Accomplishments that we're proud of

  • Unsloth Efficiency: We achieved a 2x faster training speed and 60% less memory usage during the ERNIE fine-tuning process compared to standard methods, allowing us to run on consumer-grade hardware.
  • Accuracy Boost: Our fine-tuned PaddleOCR model achieved an 89% accuracy rate on our "Hard" test set of cursive prescriptions, compared to just 65% for the base model.
  • Real-World Utility: We successfully processed a completely crumpled, coffee-stained prescription during our final internal demo, and the system correctly identified a "Penicillin" allergy warning.

What we learned

  • The Power of Ecosystems: We learned that combining specialized tools (Paddle for Vision, ERNIE for Logic) beats a "Jack of all trades" model every time.
  • Agentic Workflows: We gained deep insight into Prompt Engineering for agents. Getting two AI agents to "argue" productively requires very specific system prompts.
  • Data Quality is King: The quality of our OCR fine-tuning data was the single biggest factor in our success.

What's next for RxTract

  • Edge Deployment: We plan to port the inference model to the D-Robotics RDK X5 kit to create a standalone "Kiosk" for rural pharmacies that don't have stable internet.
  • FHIR Standard Integration: We aim to make our JSON output fully compliant with the Fast Healthcare Interoperability Resources (FHIR) standard so it can push data directly into hospital EMR systems.
  • Mobile App: Wrapping the web interface into a React Native app for on-the-go scanning by patients.
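To give a taste of the FHIR goal, one extracted prescription line could map onto a MedicationRequest-shaped structure roughly like this. This is a simplified, unvalidated sketch; real FHIR resources carry coded concepts (e.g., RxNorm) rather than free text:

```python
# Simplified sketch of mapping one extracted prescription line onto a
# FHIR-R4 MedicationRequest-like dict. Not schema-validated; illustrative only.
extracted = {"drug": "Aspirin", "dose_mg": 500, "frequency_per_day": 2}

def to_medication_request(item, patient_ref="Patient/demo-001"):
    return {
        "resourceType": "MedicationRequest",
        "status": "active",
        "intent": "order",
        "medicationCodeableConcept": {"text": item["drug"]},
        "subject": {"reference": patient_ref},
        "dosageInstruction": [{
            "timing": {"repeat": {"frequency": item["frequency_per_day"],
                                  "period": 1, "periodUnit": "d"}},
            "doseAndRate": [{"doseQuantity": {"value": item["dose_mg"],
                                              "unit": "mg"}}],
        }],
    }

print(to_medication_request(extracted)["resourceType"])  # MedicationRequest
```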

Built With

  • baidu-ai-studio
  • camel-ai
  • ernie-4.5
  • llm
  • opencv
  • paddlepaddle
  • python
  • streamlit
  • unsloth