Inspiration

In spec racing series like the Toyota GR Cup, every driver competes in an identical Toyota GR86. Theoretically, this levels the playing field, making driver skill the only differentiator. However, in reality, a massive gap remains: Data Strategy.

While professional teams deploy armies of engineers to crunch telemetry, privateer drivers and smaller teams often rely on gut instinct. We wanted to democratize this strategic advantage. Motorsport's audience is also becoming more diverse: the female share of Formula 1's fan base nearly doubled from 10% in 2017 to 18.3% in 2021, and 40% of NASCAR's fan base is now female. As the sport broadens, so does the demand for accessible, high-level racing tools. By automating the work of a race engineer, RacePulse aims to ensure that a spec series is decided by who drives best, not by who can afford the most expensive data analyst.

What it does

RacePulse is a comprehensive race strategy command center that transforms raw telemetry into winning decisions across three phases:

  • The Past (Analysis): It ingests historical telemetry from 7 different circuits (Barber, COTA, VIR, etc.) to learn how tire degradation and track conditions affect lap times.
  • The Present (Monitoring): During the race, it visualizes real-time driver inputs (throttle aggression, braking intensity) and tracks "Traffic Stress" - quantifying how much time a driver loses while stuck behind other cars.
  • The Future (Prediction): Using an XGBoost machine learning model, it predicts future lap times with 94.2% accuracy. It identifies the "Undercut Window" - the precise moment to pit to jump ahead of a rival - by calculating when fresh tires will outweigh the time lost in the pits.
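
The undercut calculation can be illustrated with a minimal sketch. It assumes a lap-time predictor that depends only on tire age; `laps_to_break_even` and `toy_model` are hypothetical names, and the toy model is a stand-in for the trained XGBoost model, not the project's actual code.

```python
from typing import Callable, Optional


def laps_to_break_even(
    predict_lap_time: Callable[[int], float],
    current_tire_age: int,
    pit_loss: float,
    max_laps: int = 40,
) -> Optional[int]:
    """Return how many laps on fresh tires it takes to win back the pit stop.

    predict_lap_time maps tire age (in laps) to a lap time in seconds.
    Returns None if the stop is not repaid within max_laps.
    """
    gained = 0.0
    for n in range(1, max_laps + 1):
        stay_out = predict_lap_time(current_tire_age + n)  # keep the worn set
        fresh = predict_lap_time(n)                        # n laps into a new set
        gained += stay_out - fresh
        if gained >= pit_loss:
            return n
    return None


# Hypothetical stand-in for the trained model:
# 90 s base lap, 0.15 s lost per lap of tire age.
def toy_model(tire_age: int) -> float:
    return 90.0 + 0.15 * tire_age


print(laps_to_break_even(toy_model, current_tire_age=15, pit_loss=30.0))
```

A real undercut decision would also account for the gap to the rival and traffic on the out-lap; this sketch covers only the tire-versus-pit-loss trade-off described above.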

How we built it

We architected a high-performance data pipeline centered on speed and accuracy:

  • Data Ingestion: We used the Hugging Face dataset mihirphalke/race-pulse-parquet, choosing Apache Parquet over CSV for its columnar storage efficiency; 3,791 laps of high-frequency data compress to ~50 MB.
  • Machine Learning: We trained an XGBoost Regressor on features like tire age, track temperature, and traffic stress. The model was optimized to predict lap times based on current telemetry signatures.
  • Backend: We built an asynchronous FastAPI service to serve predictions and perform heavy statistical analysis (e.g., consistency metrics and tire fall-off curves).
  • Frontend: The dashboard was built with React and TypeScript, using Recharts for visualizing complex telemetry traces. We designed a custom "GR Factory" theme to match the Toyota Gazoo Racing aesthetic.
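
As a rough illustration of the statistics the backend serves, the sketch below computes a consistency metric and a tire fall-off slope for one stint. The function names and exact definitions (sample standard deviation for consistency, least-squares slope for fall-off) are assumptions made for this example; the real service may define them differently. It uses only the Python standard library and leaves out the FastAPI layer.

```python
from statistics import mean, stdev
from typing import Sequence


def consistency(lap_times: Sequence[float]) -> float:
    """Sample standard deviation of lap times (seconds); lower is more consistent."""
    return stdev(lap_times)  # needs at least two laps


def fall_off_per_lap(lap_times: Sequence[float]) -> float:
    """Least-squares slope of lap time against tire age (seconds lost per lap).

    Assumes lap_times[i] was set on tires that were i laps old.
    """
    ages = range(len(lap_times))
    age_mean = mean(ages)
    time_mean = mean(lap_times)
    covariance = sum((a - age_mean) * (t - time_mean) for a, t in zip(ages, lap_times))
    variance = sum((a - age_mean) ** 2 for a in ages)
    return covariance / variance


# Made-up stint for illustration only.
stint = [90.0, 90.2, 90.4, 90.6, 90.8]
print(consistency(stint), fall_off_per_lap(stint))
```

A straight-line slope is only a first approximation of the fall-off curve, since it cannot represent a late-stint cliff.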

Challenges we ran into

  • The "Traffic" Problem: Raw telemetry tells you where a car is, but not why it's slow. Differentiating between a driver having a bad lap vs. being stuck in traffic was difficult. We had to engineer a "Traffic Stress" feature that correlates gap times with lap pace to isolate clean air performance from dirty air struggles.
  • Data Volume vs. Latency: Processing 3,791 laps of high-frequency sensor data (throttle, brake, steering) in real time is computationally heavy. Switching to the Parquet format was a breakthrough: it let us query specific channels (like brake pressure) without loading the entire dataset.
  • Modeling Tire Wear: Tire degradation isn't linear; it's exponential at the end of a stint. Our early linear regression models failed to capture the "cliff" where tires suddenly lose grip. Moving to Gradient Boosting (XGBoost) allowed us to capture these non-linear relationships accurately.
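
One way to build a "Traffic Stress" style feature is sketched below. It assumes each lap carries a gap to the car ahead, treats laps within a 1.0 s gap as "in traffic", and measures time lost against the median clean-air lap. The `Lap` fields, the threshold, and the formula are illustrative assumptions, not the feature definition RacePulse actually uses.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class Lap:
    number: int
    lap_time: float   # seconds
    gap_ahead: float  # seconds to the car in front


def traffic_stress(laps: List[Lap], gap_threshold: float = 1.0) -> Dict[int, float]:
    """Estimate seconds lost to traffic on each lap.

    Laps with a gap above the threshold count as clean air and set the baseline.
    Laps at or below the threshold are scored against that baseline.
    """
    clean = [lap.lap_time for lap in laps if lap.gap_ahead > gap_threshold]
    if not clean:
        raise ValueError("no clean-air laps to use as a baseline")
    baseline = median(clean)
    stress = {}
    for lap in laps:
        in_traffic = lap.gap_ahead <= gap_threshold
        stress[lap.number] = max(0.0, lap.lap_time - baseline) if in_traffic else 0.0
    return stress


# Made-up laps for illustration only.
laps = [
    Lap(1, 90.0, 3.0),
    Lap(2, 90.5, 2.5),
    Lap(3, 91.5, 0.4),  # stuck behind another car
    Lap(4, 90.25, 4.0),
]
print(traffic_stress(laps))
```

A slow lap in clean air scores zero here, which separates "bad lap" from "stuck in traffic" in the way the first bullet describes.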

Accomplishments that we're proud of

  • Model Precision: Achieving 94.2% prediction accuracy on lap times. In a sport where pole position is decided by thousandths of a second, this level of precision is actionable.
  • The "Race Storyboard": We moved beyond dry charts to create a narrative-driven visualization that highlights key moments like overtakes and incidents, making the data accessible to non-engineers.
  • System Architecture: Building a fully decoupled architecture (Parquet -> Python/FastAPI -> React) that is production-ready and scalable to other racing series.

What we learned

  • Parquet Power: We learned that for time-series telemetry, columnar storage (Parquet) beats row-based text storage (CSV): a query reads only the channels it needs, which made real-time queries practical where they had previously been too slow.
  • The Art of Feature Engineering: We discovered that raw sensor data is noisy. The real value came from derived features - calculating "Braking Intensity" or "Throttle Aggression" gave our model the context it needed to understand driving style rather than just car position.
  • Strategy is Probability: We learned that race strategy isn't about certainty; it's about probability management. Predicting the range of possible outcomes is often more valuable than predicting a single specific lap time.
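
A simple way to turn a single predicted lap time into a range is to shift it by percentiles of past prediction errors. The sketch below assumes a list of held-out residuals (actual minus predicted, in seconds) is available; `prediction_interval` is a hypothetical helper and the residual values are invented for the example.

```python
from statistics import quantiles
from typing import Sequence, Tuple


def prediction_interval(
    point_estimate: float,
    residuals: Sequence[float],
    lower_pct: int = 5,
    upper_pct: int = 95,
) -> Tuple[float, float]:
    """Return (low, high) lap-time bounds from empirical residual percentiles."""
    if not 1 <= lower_pct < upper_pct <= 99:
        raise ValueError("percentiles must satisfy 1 <= lower < upper <= 99")
    cuts = quantiles(residuals, n=100)  # 99 cut points; cuts[k - 1] is the k-th percentile
    return point_estimate + cuts[lower_pct - 1], point_estimate + cuts[upper_pct - 1]


# Invented residuals for illustration only.
residuals = [-0.5, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.6]
low, high = prediction_interval(91.0, residuals)
print(f"predicted 91.0 s, plausible range {low:.2f} s to {high:.2f} s")
```

With only a handful of residuals the tail percentiles are extrapolated, so a real interval would need far more held-out laps.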

What's next for RacePulse

  • Monte Carlo Simulations: We plan to implement real-time simulations that run 10,000 virtual race scenarios per lap to estimate the probability of winning for every possible pit strategy.
  • Multi-Driver Comparison: We plan to expand the platform to overlay telemetry from multiple drivers simultaneously, so teammates can compare throttle/brake traces and learn from each other.
  • Automated Video Highlights: We plan to integrate computer vision to sync telemetry spikes (like hard braking or sudden G-force changes) with onboard video and clip race highlights automatically.
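
The planned Monte Carlo approach could look roughly like the sketch below. It is a toy under stated assumptions: a one-stop race against a single rival with a fixed pit lap, linear tire degradation, and Gaussian lap-time noise. Every parameter value and function name here is hypothetical.

```python
import random
from typing import Dict, Iterable


def simulate_race_time(
    pit_lap: int,
    total_laps: int,
    rng: random.Random,
    base: float = 90.0,      # clean lap on new tires, seconds
    deg: float = 0.08,       # seconds lost per lap of tire age
    pit_loss: float = 30.0,  # seconds lost by pitting
    noise_sd: float = 0.3,   # lap-to-lap randomness, seconds
) -> float:
    """Total race time for one simulated one-stop race."""
    total = 0.0
    tire_age = 0
    for lap in range(1, total_laps + 1):
        total += base + deg * tire_age + rng.gauss(0.0, noise_sd)
        tire_age += 1
        if lap == pit_lap:
            total += pit_loss
            tire_age = 0
    return total


def win_probability(
    pit_laps: Iterable[int],
    total_laps: int = 30,
    rival_pit_lap: int = 15,
    runs: int = 10_000,
    seed: int = 0,
) -> Dict[int, float]:
    """Fraction of simulated races in which each pit lap beats the rival."""
    rng = random.Random(seed)  # seeded so results are reproducible
    candidates = list(pit_laps)
    wins = {p: 0 for p in candidates}
    for _ in range(runs):
        rival_time = simulate_race_time(rival_pit_lap, total_laps, rng)
        for p in candidates:
            if simulate_race_time(p, total_laps, rng) < rival_time:
                wins[p] += 1
    return {p: wins[p] / runs for p in candidates}


print(win_probability([5, 10, 15, 20], runs=2_000))
```

The output is an estimate whose precision grows with the number of runs, which is why the result is a probability rather than a guaranteed outcome.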
