The Problem

Annually, 18 million individuals attempt to immigrate to new countries, yet over 70% of these attempts fail. This failure rate is frequently due not to a lack of qualifications but to the prohibitive complexity of navigating global immigration systems. Government websites fragment information across dozens of pages, eligibility criteria are buried in dense legal jargon, and applicants often spend months and thousands of dollars only to discover they do not qualify.

This challenge was validated through direct experience assisting a family member with a skilled worker visa application. The official government website scattered information across more than 15 different pages, lacked a clear eligibility calculator, and listed conflicting requirements. Having to spend $2,000 on an immigration consultant for basic guidance made it clear that a more efficient solution was needed.

New Roots was conceptualized to address this gap by leveraging artificial intelligence to read, synthesize, and centralize real-time government immigration data, making this critical information accessible and actionable for everyone.

Product Overview

New Roots is an AI-powered immigration platform designed to make global mobility accessible. The platform's core functionalities include:

Intelligent Program Discovery

Using xAI's Grok-3 API, our proprietary scraper analyzes government immigration websites in real time, automatically discovering and extracting comprehensive program data. Unlike traditional scrapers, this system understands semantic context. It identifies related pages (eligibility, requirements, FAQs, application processes) and merges them into a single, coherent program profile.

Instant Eligibility Matching

Users create a detailed profile outlining their education, work experience, language skills, and nationality. Our matching algorithm instantly calculates eligibility scores across all available programs, showing precisely where they qualify and identifying any gaps in their qualifications.
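
As a rough illustration of how such matching could work, the sketch below scores a profile against one program's minimum requirements and lists the gaps. The field names, the three factors, and the equal weighting are assumptions made for illustration; the actual algorithm weighs 8+ factors and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    education_level: int      # hypothetical scale: 1 = secondary ... 4 = doctorate
    years_experience: float
    language_score: float     # hypothetical normalized test score, 0-10


@dataclass
class Requirements:
    min_education_level: int
    min_years_experience: float
    min_language_score: float


def match(profile: Profile, req: Requirements) -> tuple[float, list[str]]:
    """Return (fraction of requirements met, list of human-readable gaps)."""
    checks = [
        ("education level", profile.education_level, req.min_education_level),
        ("years of work experience", profile.years_experience, req.min_years_experience),
        ("language score", profile.language_score, req.min_language_score),
    ]
    gaps = []
    met = 0
    for name, have, need in checks:
        if have >= need:
            met += 1
        else:
            gaps.append(f"{name}: has {have}, needs {need}")
    return met / len(checks), gaps


profile = Profile(education_level=3, years_experience=2, language_score=7.5)
req = Requirements(min_education_level=3, min_years_experience=3, min_language_score=7.0)
score, gaps = match(profile, req)
print(round(score, 2), gaps)
```

Returning the gaps alongside the score is what would let the interface tell a user which qualification is missing rather than only showing a number.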

Secure Document Vault

Applicants can store all necessary documents (passports, diplomas, work certificates) in a Firebase-backed vault, enabling reusability across multiple applications without redundant uploads.

Application Workflow Automation

Users can track application status, receive deadline reminders, and manage multiple immigration programs from a centralized dashboard. This eliminates the need to navigate numerous disparate government portals across different countries.

Government Portal (Planned)

A portal is planned that will let immigration officials review applications, request documents, and update applicant statuses directly within the platform.

Technical Architecture

Frontend (React + TypeScript + Firebase)

  • React 18 with TypeScript for type-safe, maintainable code
  • TailwindCSS for rapid, responsive UI development
  • TanStack Query for efficient data caching and real-time updates
  • Firebase Authentication for secure user management
  • Firebase Storage for document management with signed URLs

Backend (Django + Firebase Admin SDK)

  • Django REST Framework providing RESTful API endpoints
  • Firebase Firestore as our NoSQL database, streamlining development
  • Firebase Admin SDK for server-side authentication verification
  • Pydantic for robust data validation
  • Custom eligibility scoring algorithm weighing 8+ factors

Smart Scraper (Python + xAI Grok-3)

This is the core technological innovation of the platform:

  1. Multi-Page Discovery: For each program, Grok analyzes the main page and intelligently discovers related pages (eligibility criteria, document requirements, application steps, FAQs).

  2. Deep Extraction: Each page is fed to Grok-3 with a comprehensive prompt asking for maximum detail extraction, including 15+ FAQs, 7+ application stages, 10+ requirements, fees, processing times, and contact info.

  3. Intelligent Merging: Grok merges data from all related pages, resolving conflicts and creating one comprehensive program profile, often exceeding 120,000 characters of information per program.

  4. Quality Validation: A 70-point scoring system ensures only comprehensive programs (defined by 10+ FAQs, 7+ stages, 8+ benefits) are saved to Firebase.

  5. Automated Updates: GitHub Actions runs the scraper daily, keeping all program data current.
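
The quality gate in step 4 can be sketched as follows. The weighting behind the 70-point score is not described here, so this sketch only checks the three stated minimums (10+ FAQs, 7+ stages, 8+ benefits). The dictionary keys `faqs`, `stages`, and `benefits` are hypothetical field names.

```python
# Minimum counts taken from the description above; the keys are assumed names.
MIN_COUNTS = {"faqs": 10, "stages": 7, "benefits": 8}


def missing_minimums(program: dict) -> dict:
    """Return {field: (found, required)} for every field below its minimum."""
    shortfalls = {}
    for field, required in MIN_COUNTS.items():
        found = len(program.get(field) or [])
        if found < required:
            shortfalls[field] = (found, required)
    return shortfalls


def is_comprehensive(program: dict) -> bool:
    """A program passes the gate only if no minimum is missed."""
    return not missing_minimums(program)


complete = {"faqs": ["q"] * 15, "stages": ["s"] * 7, "benefits": ["b"] * 10}
thin = {"faqs": ["q"] * 4, "stages": ["s"] * 7}
print(is_comprehensive(complete))
print(missing_minimums(thin))
```

Reporting the shortfalls, not just a pass or fail, makes it easier to see why a scraped program was rejected.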

Key Technical Innovation: Traditional scrapers are brittle and break when website layouts change. By leveraging Grok's natural language understanding, our scraper adapts automatically. It understands the query "What documents do I need?" regardless of how different governments phrase it.

Infrastructure

  • Docker Compose for local development (reducing the stack to 2 containers from a traditional 9!).
  • GitHub Actions for CI/CD and automated scraping.
  • Firebase integration eliminates approximately 80% of typical DevOps complexity (removing the need for PostgreSQL, Redis, or an S3 alternative).

Development Challenges and Resolutions

1. Schema Inconsistency in NoSQL Database

A schema mismatch occurred where the frontend expected applicationProcess.steps[] while the scraper saved stages[]. This inconsistency led to a critical data loss incident during a cleanup operation, as the validation logic was referencing an incorrect field name.

Resolution: A data validation script was implemented to enforce schema consistency across all field mappings. The data restore script was updated to write createdAt and updatedAt timestamps so that the backend can query restored records properly.
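
A minimal sketch of such a normalization step is shown below. It assumes the scraper wrote `stages` under `applicationProcess` and that timestamps are stored as ISO 8601 strings; both are assumptions, and `normalize_program` is a hypothetical helper name.

```python
import copy
from datetime import datetime, timezone


def normalize_program(doc: dict) -> dict:
    """Return a copy of doc with stages renamed to steps and timestamps set."""
    out = copy.deepcopy(doc)  # never mutate the record being restored
    process = out.setdefault("applicationProcess", {})
    if "stages" in process:
        stages = process.pop("stages")
        process.setdefault("steps", stages)  # keep existing steps if both exist
    if not isinstance(process.get("steps"), list):
        raise ValueError("applicationProcess.steps must be a list")
    now = datetime.now(timezone.utc).isoformat()
    out.setdefault("createdAt", now)  # preserve an existing creation time
    out["updatedAt"] = now
    return out


raw = {
    "name": "Example Program",
    "applicationProcess": {"stages": ["Submit profile", "Receive invitation"]},
}
clean = normalize_program(raw)
print(clean["applicationProcess"]["steps"])
```

Raising on a missing `steps` list, instead of silently writing an empty one, is the kind of check that would have surfaced the field-name mismatch before any cleanup ran.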

2. API Timeout and Performance

Initial extractions from large, multi-page government websites (e.g., Canada's Express Entry) exceeded API timeout limits, with processes failing after 20+ minutes.

Resolution: Timeouts were extended from 60s to 180s, max_tokens was increased from 4000 to 6000, and fallback logic was implemented: if multi-page merging fails, the system ingests the main page data alone to ensure service continuity. Retry logic with exponential backoff was also added.
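
The retry and fallback behavior can be sketched as below. The function names, the attempt count, and the base delay are illustrative assumptions; the two extraction callables stand in for the real Grok requests, which are not shown.

```python
import time


def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); after each failure wait base_delay * 2**n, then retry."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * 2 ** attempt)


def extract_program(extract_merged, extract_main_page, **retry_kwargs):
    """Try the multi-page merge first; fall back to the main page only."""
    try:
        return with_retries(extract_merged, **retry_kwargs)
    except Exception:
        return with_retries(extract_main_page, **retry_kwargs)


# Demo with stand-in callables; delays are recorded instead of slept.
delays = []


def failing_merge():
    raise TimeoutError("merge timed out")


def main_page():
    return {"source": "main_page"}


result = extract_program(
    failing_merge, main_page, attempts=3, base_delay=1.0, sleep=delays.append
)
print(result, delays)
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting.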

3. Dynamic URL and Source Instability

Government websites reorganize frequently, leading to URL decay. Of our initial 18 program URLs, 8 returned 404 errors during scraping.

Resolution: Graceful error handling was implemented. The scraper now logs failures, continues to the next program, and generates a report at the end. This allows for manual updates to broken URLs in sources.yaml without halting the entire process.
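
A sketch of this log-and-continue loop follows. The source record shape (`id`, `url`) and the `scrape_one` callable are assumptions for illustration; the actual layout of sources.yaml is not shown here.

```python
import logging

logger = logging.getLogger("scraper")


def scrape_all(sources, scrape_one):
    """Scrape every source; collect failures instead of stopping at the first."""
    succeeded, failed = [], []
    for source in sources:
        try:
            scrape_one(source)
            succeeded.append(source["id"])
        except Exception as exc:
            logger.warning("Skipping %s (%s): %s", source["id"], source["url"], exc)
            failed.append(
                {"id": source["id"], "url": source["url"], "error": str(exc)}
            )
    return {"succeeded": succeeded, "failed": failed}


# Demo with placeholder URLs and a stand-in scraper that fails on one of them.
sources = [
    {"id": "program-a", "url": "https://example.com/a"},
    {"id": "program-b", "url": "https://example.com/moved"},
]


def fake_scrape(source):
    if source["url"].endswith("/moved"):
        raise RuntimeError("404 Not Found")


report = scrape_all(sources, fake_scrape)
print(report)
```

The `failed` list doubles as the end-of-run report: each entry names the URL that needs a manual fix.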

4. Backend Cache Invalidation Issues

After restoring programs to Firebase, the data was not appearing on the frontend. This was traced to a stale backend cache, as the Django application had been running for over 3 hours without a restart.

Resolution: Restarting the Docker container resolved the immediate issue. Health checks and a proper cache invalidation strategy were then implemented to prevent a recurrence.
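
One possible shape for such a strategy is a time-limited cache with an explicit invalidation hook, sketched below. The source does not describe the backend's actual caching mechanism, so the `TTLCache` class, its 300-second default, and the loader are all assumptions.

```python
import time


class TTLCache:
    """Cache one loader's result for ttl seconds, with explicit invalidation."""

    def __init__(self, loader, ttl=300.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means "nothing cached yet"

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self):
        """Force the next get() to reload, e.g. after a data restore."""
        self._expires_at = None


# Demo: a dict stands in for the database.
store = {"programs": ["A"]}
cache = TTLCache(lambda: list(store["programs"]), ttl=300)
first = cache.get()
store["programs"].append("B")   # simulate restoring a program
stale = cache.get()             # still the cached copy
cache.invalidate()
fresh = cache.get()             # reloaded after invalidation
print(first, stale, fresh)
```

With this shape, a restore script would call `invalidate()` after writing, and the expiry time bounds staleness even if that call is forgotten.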

5. Upstream API Deprecation

During development, we received a 404 error indicating the grok-beta model was deprecated, which broke the entire scraping pipeline.

Resolution: A one-line fix changing the model dependency to grok-3 resolved the issue. This update also improved results, as Grok-3 extracts even greater detail (16,728 chars vs. 15,851 for one Australian program).

Key Accomplishments

  • Scalable, Real-Time AI Data Extraction: Successfully scraped 6 comprehensive immigration programs from 6 countries, each with 15 FAQs, 7 application stages, and 10+ benefits, all extracted by xAI Grok-3.

  • Intelligent Multi-Source Data Fusion: Our scraper discovers up to 9 related pages per program and merges over 120,000 characters of data into coherent profiles. The Canada Express Entry profile combines 8 distinct government pages into one.

  • Streamlined, Production-Ready Architecture: A Firebase-first design eliminated approximately 80% of typical backend complexity. No PostgreSQL, Redis, or MinIO was required, enabling deployment with just 2 Docker containers.

  • High-Fidelity Data Assurance: Our 70-point validation system ensures every program profile is comprehensive, prioritizing data quality and completeness over sheer volume.

  • Rapid Prototyping: A full-stack platform with intelligent scraping, user authentication, document management, and application workflow was developed in 36 hours.

Key Learnings

Technical Lessons

  • Efficacy of LLMs for Unstructured Data: xAI Grok proved effective at parsing unstructured web data and adapted to layout changes better than traditional, brittle regex-based scrapers.
  • Firebase for Rapid Development: Firebase significantly accelerated the development lifecycle by abstracting database setup, authentication, and file storage implementation.
  • NoSQL Schema Discipline: The flexibility of Firestore requires strict, application-level schema validation to prevent data corruption.
  • Value of Type Safety: TypeScript was instrumental in preventing dozens of bugs before runtime, especially when managing complex, nested data structures.

Product Lessons

  • Market Validation: The $30B immigration services market confirms the difficulty of the problem, indicating a strong user willingness to pay for a high-quality solution.
  • Data Comprehensiveness is Key: Users require comprehensive, all-in-one data. Six complete program profiles are more valuable than 100 incomplete ones.
  • AI as a Tool, Not a Panacea: Scraping failed for 8 of the 18 initial program URLs (about 44%) because of broken links, confirming that human-in-the-loop oversight is still necessary.

Team Lessons

  • System Resilience and Recovery: A critical data deletion incident 90 minutes before a deadline underscored the importance of calm, systematic recovery procedures.
  • Importance of Internal Documentation: Comprehensive logging was essential for tracing every scraper decision and debugging issues quickly.

Future Roadmap

Immediate Term

  1. Data Expansion: Correct the 8 failed program URLs and re-scrape to achieve 14+ total programs.
  2. Geographic Expansion: Expand to 10+ new countries, including France, Spain, Portugal, Japan, and the UAE.
  3. Feature Enhancement: Implement the smart eligibility calculator to provide users with actionable feedback (e.g., "Requires 1 more year of work experience").
  4. Automation: Integrate Document OCR to extract data automatically from uploaded diplomas and certificates.

Medium Term (6 Months) - B2B Pivot

We will target immigration law firms as a primary B2B channel:

  • Offering: A white-label SaaS platform for firms to manage their clients.
  • Pricing Model: $500/month per attorney plus a $50/client fee.
  • Market Analysis: The 15,000 immigration attorneys in the US represent about $90M in potential subscription ARR (15,000 × $500 × 12 months), before per-client fees.

Long-Term Vision (1-2 Years)

  • Government Partnerships: Evolve into the official application portal for smaller countries.
  • AI Application Generation: Develop an AI assistant to generate entire visa applications from user profiles.
  • Predictive Analytics: Build an ML model to predict application approval likelihood based on historical data.
  • Ecosystem Marketplace: Connect users with vetted attorneys, translators, and medical exam providers.

New Roots is positioned to become a foundational platform for the future of global mobility.
