BioMatch: AI-Driven Patient Recruitment & Clinical Trial Synchronization

Inspiration

Clinical trial recruitment remains one of the most significant bottlenecks in medical advancement. Many clinical trials fail to meet their enrollment timelines, which delays life-saving treatments. This inefficiency is largely due to the gap between complex medical eligibility criteria and fragmented patient data. BioMatch was developed to bridge this gap using high-precision semantic search and automated data normalization.

What it does

BioMatch is a dual-sided coordination platform designed to streamline the connection between patients and clinical research.

Semantic Patient Profiling: Patients provide medical history, symptoms, and current medications, which are processed to create high-dimensional medical embeddings.
Vector-Based Matching: The engine utilizes Natural Language Processing (NLP) and Transformers to perform semantic matches, identifying correlations between patient descriptions and trial requirements that traditional keyword searches would overlook.
Researcher Oversight: Research professionals can claim official studies from ClinicalTrials.gov and access a prioritized leaderboard of eligible participants based on a multi-factor scoring algorithm.

How we built it

Backend Architecture: Developed with FastAPI, backed by a PostgreSQL database that uses the pgvector extension for efficient vector similarity operations, alongside an agentic workflow deployed on AWS.
Data Pipeline: An automated ETL pipeline ingests data from ClinicalTrials.gov and Agentic Medical Anamneses, normalizing heterogeneous data points (such as age ranges and study phases) into standardized formats.
NLP & Embeddings: Study summaries and patient profiles are transformed into 384-dimensional vectors using the all-MiniLM-L6-v2 Transformer model.
Frontend: A responsive interface built with React, Tailwind CSS, and Framer Motion, designed for clarity and clinical professional standards.
Authentication: Multi-role identity management (Patient, Researcher) implemented via Clerk.
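
The vector similarity step can be illustrated without a database. The sketch below is a minimal, pure-Python stand-in for the ranking that pgvector's cosine distance operator (`<=>`) performs inside PostgreSQL. The function names, the toy three-dimensional vectors, and the trial IDs are hypothetical and are not taken from the BioMatch codebase; in the real pipeline the vectors would come from all-MiniLM-L6-v2.

```python
import math
from typing import Dict, List, Tuple


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Return 1 - cosine similarity, the quantity pgvector's <=> operator computes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def rank_trials(
    patient_vec: List[float],
    trial_vecs: Dict[str, List[float]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the top_k (trial_id, distance) pairs, closest first."""
    scored = [
        (trial_id, cosine_distance(patient_vec, vec))
        for trial_id, vec in trial_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = better match
    return scored[:top_k]


# Toy data (hypothetical): real embeddings would have far more dimensions.
patient = [0.9, 0.1, 0.0]
trials = {
    "TRIAL-A": [1.0, 0.0, 0.0],
    "TRIAL-B": [0.0, 1.0, 0.0],
    "TRIAL-C": [0.7, 0.7, 0.0],
}
ranking = rank_trials(patient, trials, top_k=2)
print(ranking)
```

With pgvector, the same ordering would typically be expressed in SQL as an `ORDER BY embedding <=> :patient_vec LIMIT :k` clause, so the database does the sorting rather than application code.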

Challenges we ran into

Natural Language Normalization: Standardizing diverse age formats (e.g., "6 months" vs "18 years") and complex exclusion criteria into queryable numerical constraints.
Database Schema Optimization: Transitioning from a fragmented relational model to a Unified User Model using PostgreSQL JSONB to handle the inherent semi-structured nature of medical data.
Cross-Origin Resource Sharing (CORS) & Auth Integration: Ensuring secure, low-latency synchronization between the Clerk authentication provider and the internal SQLAlchemy relational logic.
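
The age normalization challenge can be sketched as follows. This is a minimal illustration, assuming age strings shaped like the examples above ("6 months", "18 years"); the helper names, the choice of months as the common unit, and the approximate week and day conversion factors are assumptions made for this example, not the project's actual implementation.

```python
import re
from typing import Optional

# Conversion factors to months. Weeks and days are approximations
# chosen for illustration only.
_MONTHS_PER_UNIT = {
    "year": 12.0,
    "month": 1.0,
    "week": 12.0 / 52.0,
    "day": 12.0 / 365.0,
}

# Matches strings such as "6 months", "18 Years", or "2.5 year".
_AGE_PATTERN = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*(year|month|week|day)s?\s*$",
    re.IGNORECASE,
)


def age_to_months(raw: Optional[str]) -> Optional[float]:
    """Convert a free-text age into months, or None if it cannot be parsed."""
    if raw is None:
        return None
    match = _AGE_PATTERN.match(raw)
    if match is None:
        return None  # e.g. "N/A" or an empty string
    value, unit = match.groups()
    return float(value) * _MONTHS_PER_UNIT[unit.lower()]


def is_age_eligible(
    patient_age_months: float,
    min_age: Optional[str],
    max_age: Optional[str],
) -> bool:
    """Check a patient age against optional free-text bounds.

    A missing or unparseable bound is treated as "no limit" here; a
    production system might instead flag it for manual review.
    """
    lower = age_to_months(min_age)
    upper = age_to_months(max_age)
    if lower is not None and patient_age_months < lower:
        return False
    if upper is not None and patient_age_months > upper:
        return False
    return True


print(age_to_months("6 months"))   # 6.0
print(age_to_months("18 years"))   # 216.0
```

Storing both bounds in one numeric unit turns the eligibility check into a plain range comparison that a SQL `WHERE` clause can evaluate.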

Accomplishments that we're proud of

High-Performance Retrieval: Implementing a matching engine capable of sub-second similarity searches across extensive clinical trial datasets.
Unified Data Model: Successfully centralizing patient and researcher data into a single scalable architecture.
Algorithmic Accuracy: Engineering a scoring system that balances broad semantic similarity with strict medical eligibility filters.
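
One way to balance semantic similarity against strict eligibility is to apply the hard filters first and only then rank the survivors by a weighted score. The sketch below illustrates that idea under stated assumptions: the `Candidate` and `TrialCriteria` fields, the proximity factor, and the 0.8/0.2 weights are all hypothetical, since the writeup does not specify the factors or weights BioMatch uses.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    patient_id: str
    age_months: float
    similarity: float    # cosine similarity in [0, 1], assumed precomputed
    has_exclusion: bool  # True if any exclusion criterion applies
    distance_km: float   # hypothetical second factor: distance to the site


@dataclass
class TrialCriteria:
    min_age_months: Optional[float]
    max_age_months: Optional[float]


def passes_filters(c: Candidate, t: TrialCriteria) -> bool:
    """Hard eligibility rules: failing any one removes the candidate."""
    if c.has_exclusion:
        return False
    if t.min_age_months is not None and c.age_months < t.min_age_months:
        return False
    if t.max_age_months is not None and c.age_months > t.max_age_months:
        return False
    return True


def score(
    c: Candidate,
    max_distance_km: float = 500.0,
    w_similarity: float = 0.8,
    w_proximity: float = 0.2,
) -> float:
    """Soft ranking score: weighted sum of similarity and proximity."""
    proximity = max(0.0, 1.0 - c.distance_km / max_distance_km)
    return w_similarity * c.similarity + w_proximity * proximity


def leaderboard(
    candidates: List[Candidate], trial: TrialCriteria
) -> List[Tuple[str, float]]:
    """Filter strictly, then rank the eligible candidates by score."""
    eligible = [c for c in candidates if passes_filters(c, trial)]
    ranked = sorted(eligible, key=score, reverse=True)
    return [(c.patient_id, round(score(c), 3)) for c in ranked]


trial = TrialCriteria(min_age_months=216.0, max_age_months=780.0)
candidates = [
    Candidate("p1", 300.0, 0.90, False, 50.0),
    Candidate("p2", 120.0, 0.95, False, 10.0),  # too young: filtered out
    Candidate("p3", 400.0, 0.70, False, 0.0),
    Candidate("p4", 400.0, 0.99, True, 0.0),    # exclusion: filtered out
]
board = leaderboard(candidates, trial)
print(board)
```

Keeping the filters separate from the score means a very high similarity can never override a failed eligibility rule, which is the property the accuracy claim above depends on.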

What we learned

Schema Flexibility: In health-tech, utilizing flexible storage like JSONB is critical for accommodating the evolving requirements of medical metadata without frequent migration downtime.
Semantic Utility: We confirmed that semantic search significantly outperforms keyword-based systems in clinical settings, where terminology varies widely between patients and professionals.
Infrastructure Reliability: Robust error handling is essential when integrating third-party government APIs and generating embeddings in real time.
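
A common pattern for the error handling mentioned above is retrying transient failures with exponential backoff. The following is a minimal sketch using only the standard library; `call_with_retries`, its parameters, and the simulated `flaky_fetch` function are hypothetical names introduced for illustration and do not reflect the project's actual code or any specific API client.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on transient errors with exponential backoff.

    The delay doubles after each failed attempt. The last failure is
    re-raised so the caller can log it or degrade gracefully.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Simulated upstream call (hypothetical): fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated upstream failure")
    return "ok"


# Record the delays instead of actually sleeping, to keep the demo fast.
delays: List[float] = []
result = call_with_retries(flaky_fetch, attempts=3, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the helper easy to test, because the backoff schedule can be recorded without waiting in real time.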

What's next for BioMatch

RAG-Based Explanations: Implementing Retrieval-Augmented Generation (RAG) to provide patients with natural language explanations of why they matched a specific trial.
Internationalization: Expanding the pipeline to include multi-language clinical trial databases and cross-border research coordination.