Inspiration

We wanted a way to quantify “who’s most likely to podium” before a race starts, and to explain why. Racing fans, teams, and commentators benefit from probability-based reasoning and interactive “what-if” scenario simulation (e.g., change the start position or the expected pace). PodiumCast combines race telemetry and historical results with ensemble ML to produce probabilities along with factor importances, so each prediction can be explained.

What it does

  • Loads historical race CSVs (Sebring, Road America) and produces: win probability, podium (top-3) probability, top-5/top-10 probabilities, expected finishing position, and a DNF risk estimate.
  • Provides confidence intervals and a factor analysis dashboard showing which features (starting pos, past track history, pace consistency, etc.) moved the probability most.
  • Offers an interactive scenario simulator — change inputs (e.g., starting position) and re-run predictions to see new odds.
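
As a rough illustration of the scenario simulator idea, the sketch below changes one input and re-scores the prediction. Everything in it is hypothetical: the `Scenario` fields, the `podium_probability` function, and its coefficients are made up for illustration and are not PodiumCast's trained models.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scenario:
    start_position: int   # grid slot, 1 = pole (hypothetical input)
    pace_delta_s: float   # expected lap-time gap to field average in seconds; negative = faster


def podium_probability(s: Scenario) -> float:
    """Toy logistic model. The coefficients are invented for illustration only."""
    z = 1.5 - 0.35 * (s.start_position - 1) - 1.2 * s.pace_delta_s
    return 1.0 / (1.0 + math.exp(-z))


def what_if(base: Scenario, **changes) -> tuple[float, float]:
    """Return (baseline probability, probability after applying the changes)."""
    return podium_probability(base), podium_probability(replace(base, **changes))


baseline = Scenario(start_position=8, pace_delta_s=0.0)
before, after = what_if(baseline, start_position=3)
print(f"P(podium) starting P8: {before:.2f}, starting P3: {after:.2f}")
```

Keeping the scenario immutable and deriving a modified copy means the baseline is never overwritten, so the UI can always show the before and after odds side by side.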

How I built it

Data ingestion from structured race CSVs → feature engineering (50+ features) → models (multiple predictors: podium predictor, win probability model, DNF risk model, position predictor) → ensemble aggregator → FastAPI backend → React + Vite frontend dashboard with a 3D podium visualization and scenario controls. Architecture diagram and workflow are in the README.
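
The feature-engineering step can be pictured with a minimal sketch like the one below. The column names (`driver`, `track`, `start_pos`, `finish_pos`) and the sample rows are assumptions invented for this example, not the real CSV schema or real results, and the handful of features shown only stands in for the larger feature set.

```python
import csv
import io
import statistics
from collections import defaultdict

# Made-up rows in an assumed schema; the real race CSVs may look different.
SAMPLE_CSV = """driver,track,start_pos,finish_pos
A,Sebring,3,2
A,Road America,5,4
B,Sebring,1,1
B,Road America,2,6
"""


def driver_features(csv_text: str) -> dict:
    """Aggregate per-driver history into a few simple, interpretable features."""
    by_driver = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_driver[row["driver"]].append(row)

    features = {}
    for driver, rows in by_driver.items():
        starts = [int(r["start_pos"]) for r in rows]
        finishes = [int(r["finish_pos"]) for r in rows]
        gained = [s - f for s, f in zip(starts, finishes)]
        features[driver] = {
            "races": len(rows),
            "avg_start": statistics.mean(starts),
            "avg_finish": statistics.mean(finishes),
            "podium_rate": sum(f <= 3 for f in finishes) / len(finishes),
            "avg_positions_gained": statistics.mean(gained),
        }
    return features


for name, feats in driver_features(SAMPLE_CSV).items():
    print(name, feats)
```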

A compact ensemble representation (used conceptually) is $P_{\text{ensemble}} = \sum_i w_i P_i$, where $P_i$ are the model probabilities and $w_i$ are the ensemble weights, normalized so that $\sum_i w_i = 1$. This lets us combine specialist models (win, DNF, finishing-position) into final odds.
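
A minimal sketch of that weighted sum is shown below. The function name `ensemble_probability` and the example numbers are assumptions for illustration; the weights are normalized inside the function so callers can pass raw scores.

```python
def ensemble_probability(probs, weights):
    """Weighted average of model probabilities, with weights normalized to sum to 1."""
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum((w / total) * p for p, w in zip(probs, weights))


# Three hypothetical models estimating the same event, the first trusted twice as much.
combined = ensemble_probability([0.30, 0.40, 0.20], [2, 1, 1])
print(f"ensemble probability: {combined:.3f}")
```

Because the normalized weights are non-negative and sum to 1, the result always stays between the smallest and largest input probability.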

Challenges we ran into

  • Feature design. Extracting robust, interpretable features from messy CSVs (different races/tracks) required many edge-case rules.

  • Small dataset size. Motorsport datasets are often small — that forced careful validation and conservative regularization to avoid overfitting.

  • Calibration. Getting predicted probabilities and their confidence intervals to match observed outcome frequencies (probability calibration) took extra validation, including isotonic and Platt-style calibration checks.
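
To make the calibration point concrete, here is a small sketch of two common diagnostics, the Brier score and a binned reliability table. It is a generic check written for this write-up, not the project's validation code, and the sample predictions are made up.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def reliability_table(probs, outcomes, n_bins=5):
    """Per equal-width bin: (mean predicted probability, observed frequency, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if not members:
            continue  # skip empty bins rather than report a meaningless frequency
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        rows.append((mean_p, freq, len(members)))
    return rows


# Made-up podium predictions and whether each driver actually finished top-3.
predicted = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
finished_top3 = [1, 1, 0, 1, 0, 0]
print(f"Brier score: {brier_score(predicted, finished_top3):.3f}")
for mean_p, freq, count in reliability_table(predicted, finished_top3, n_bins=2):
    print(f"predicted {mean_p:.2f} vs observed {freq:.2f} (n={count})")
```

A well-calibrated model shows predicted and observed values close together in each bin; large gaps are where an isotonic or Platt-style correction would be worth trying.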

Accomplishments that I'm proud of

  • End-to-end pipeline from raw CSV to live interactive predictions.
  • Factor analysis view that helps users understand why odds change when a parameter is modified.
  • Scenario simulator that recalculates probabilities on the fly in the UI.

What I learned

  • Thoughtful feature engineering and simple ensembles often beat complex black-boxes on small racing datasets.
  • Exposing model uncertainty (confidence intervals) makes it easier for users to judge how much to trust a prediction.
  • Building UI controls that let users test “what-if” scenarios is an effective way to surface a model's strengths and limits.

What's next for PodiumCast

  • Add more tracks and seasons to improve model robustness.
  • Experiment with time-series models (per-lap modelling) and stronger calibration.
  • Add live data integrations (race weekend telemetry) and a lightweight API key service for team integrations.
