TTC Delay Forecasting Project

Inspiration

Anyone who’s taken the TTC knows the frustration of unexpected delays—standing in a packed subway station with no idea when the next train will arrive or watching a streetcar take ages to move through traffic. These delays don’t just inconvenience commuters; they undermine confidence in public transit and make car travel more appealing, counteracting sustainability efforts. We wanted to tackle this issue by predicting when and where delays are most likely to happen, helping both riders and transit operators plan ahead.

What it does

Our project forecasts TTC delays by analyzing past trends and patterns. It predicts the occurrence, location, and duration of disruptions across streetcars, buses, and subways. The goal? Provide actionable insights that could help optimize transit operations, improve service reliability, and make commuting just a little less stressful.

How we built it

Data Collection & Cleaning: TTC delay reports aren’t perfect—there were missing values, inconsistent formats, and plenty of noise. We cleaned and structured the data to make it usable.
Exploratory Data Analysis (EDA): We dug into trends, identifying key factors that contribute to delays (time of day, weather, route type, etc.).
Machine Learning Models: We experimented with Random Forest and Multinomial Logistic Regression models, tuning them to balance accuracy and interpretability.
Visualization & Insights: To make the findings accessible, we built an app that highlights delay trends and predictions.
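
As a rough illustration of the modeling step, the sketch below fits the two model families named above on synthetic data and compares their held-out accuracy. Everything in it is an assumption made for illustration: the feature names (`hour`, `is_weekend`, `temperature_c`), the rule that generates the labels, the three delay classes, and the use of scikit-learn. It is not the project's actual pipeline or dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for cleaned delay records (NOT real TTC data).
rng = np.random.default_rng(0)
n = 600
hour = rng.integers(0, 24, size=n)
is_weekend = rng.integers(0, 2, size=n)
temperature_c = rng.normal(5.0, 10.0, size=n)

# Made-up rule: rush hours and cold weather push delays up, weekends pull them down.
rush = ((hour >= 7) & (hour <= 9)) | ((hour >= 16) & (hour <= 18))
score = (2.0 * rush + 1.0 * (temperature_c < -5) - 0.5 * is_weekend
         + rng.normal(0.0, 0.5, size=n))
# Hypothetical target: 0 = short, 1 = medium, 2 = long delay.
delay_class = np.digitize(score, bins=[0.75, 2.0])

X = np.column_stack([hour, is_weekend, temperature_c])
X_train, X_test, y_train, y_test = train_test_split(
    X, delay_class, test_size=0.25, random_state=0, stratify=delay_class
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # With more than two classes, LogisticRegression's default lbfgs solver
    # fits a multinomial (softmax) model.
    "multinomial_logreg": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)  # held-out accuracy
    print(f"{name}: {accuracy[name]:.3f}")
```

The logistic regression exposes per-feature coefficients that are easy to read, while the random forest can capture non-linear effects such as rush-hour windows. That is one way to think about the accuracy versus interpretability balance mentioned above.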

Challenges we ran into

Messy Data: Public transit data isn’t always well-organized, so cleaning and standardizing it took longer than expected.
Choosing the Right Features: Finding the best predictors of delays wasn’t straightforward—should we include weather? Time of day? Previous delays? It took multiple iterations to get it right.
Model Performance: Some models were too slow, while others lacked accuracy. We had to strike a balance between efficiency and reliability.
External Factors: Things like sudden traffic congestion or unexpected technical issues don’t always show up in the data, making 100% accuracy impossible.
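
To make the feature question concrete, here is a minimal sketch of how time-of-day and previous-delay features could be derived from cleaned records. The record layout (`timestamp`, `route`, `min_delay`), the rush-hour windows, and the helper names are hypothetical. Weather is left out because it would need an external data source joined on the timestamp.

```python
from datetime import datetime

# Hypothetical cleaned records; values are illustrative, not real TTC data.
RECORDS = [
    {"timestamp": datetime(2024, 1, 8, 8, 15), "route": "501", "min_delay": 12},
    {"timestamp": datetime(2024, 1, 8, 8, 50), "route": "501", "min_delay": 6},
    {"timestamp": datetime(2024, 1, 8, 13, 0), "route": "504", "min_delay": 4},
    {"timestamp": datetime(2024, 1, 13, 17, 30), "route": "501", "min_delay": 9},
]


def is_rush_hour(ts):
    """Assumed rush windows: weekdays, 07:00-09:59 and 16:00-18:59."""
    return ts.weekday() < 5 and (7 <= ts.hour < 10 or 16 <= ts.hour < 19)


def build_features(records):
    """Return one feature row per record, in chronological order."""
    last_delay_by_route = {}
    rows = []
    for rec in sorted(records, key=lambda r: r["timestamp"]):
        ts = rec["timestamp"]
        rows.append({
            "hour": ts.hour,
            "day_of_week": ts.weekday(),  # Monday = 0
            "is_rush_hour": is_rush_hour(ts),
            # Lag feature: the most recent earlier delay on the same route,
            # or None when the route has no history yet.
            "prev_delay_same_route": last_delay_by_route.get(rec["route"]),
            "target_min_delay": rec["min_delay"],
        })
        last_delay_by_route[rec["route"]] = rec["min_delay"]
    return rows


features = build_features(RECORDS)
for row in features:
    print(row)
```

The lag feature only looks backward in time, so a row never sees its own delay or any later one. That matters when the same records are later split into training and test sets.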

Accomplishments that we're proud of

Built a functional model that can forecast TTC delays with solid accuracy.
Developed interactive visualizations to make insights useful for both transit authorities and commuters.
Gained a deeper understanding of transit data and the many moving parts behind TTC delays.
Managed to make predictions despite messy and incomplete datasets.
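
For a sense of what handling messy and incomplete records can involve, the sketch below normalizes inconsistent date formats and drops rows with missing values. The column names, the sample rows, and the list of accepted date formats are all assumptions for illustration. The real TTC delay reports may be laid out differently.

```python
from datetime import datetime

# Hypothetical raw rows with the kinds of problems described above.
RAW_ROWS = [
    {"date": "2024-01-05", "time": "08:15", "route": "501", "min_delay": "12"},
    {"date": "05/01/2024", "time": "8:40", "route": " 504 ", "min_delay": ""},
    {"date": "2024-01-06", "time": "17:05", "route": "", "min_delay": "7"},
    {"date": "Jan 7, 2024", "time": "23:50", "route": "29", "min_delay": "n/a"},
    {"date": "2024/01/08", "time": "06:30", "route": "510", "min_delay": "4"},
]

# Date layouts this sketch accepts (an assumption, not a TTC specification).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d")


def parse_timestamp(date_text, time_text):
    """Try each known date format; return None if none of them match."""
    combined = f"{date_text.strip()} {time_text.strip()}"
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(combined, fmt + " %H:%M")
        except ValueError:
            continue
    return None


def parse_delay(text):
    """Return the delay in whole minutes, or None if it is missing or invalid."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def clean_rows(rows):
    """Keep only rows with a parseable timestamp, a route, and a delay."""
    cleaned = []
    for row in rows:
        timestamp = parse_timestamp(row["date"], row["time"])
        route = row["route"].strip()
        delay = parse_delay(row["min_delay"])
        if timestamp is None or not route or delay is None:
            continue  # drop incomplete or unparseable rows
        cleaned.append({"timestamp": timestamp, "route": route, "min_delay": delay})
    return cleaned


cleaned = clean_rows(RAW_ROWS)
print(f"kept {len(cleaned)} of {len(RAW_ROWS)} rows")
```

Dropping rows is the simplest policy. Imputing missing values or flagging them with an indicator column are alternatives when too much data would otherwise be lost.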

What we learned

Data cleaning is everything—a bad dataset leads to bad predictions, no matter how fancy the model.
Transit systems are complex—delays aren’t just about schedules; they’re influenced by human behavior, weather, infrastructure, and more.
Machine learning isn’t magic—it requires constant tweaking, testing, and fine-tuning to be useful in the real world.
Predicting the future is hard—but even small improvements in forecasting can have a big impact on transit efficiency.

What's next for the TTC Delay Forecasting Project

Improve Accuracy: Incorporate more real-time data sources like live traffic and weather feeds.
Deployment: Turn this into a real-world tool—maybe an API or dashboard that TTC operators and riders can actually use.
Expand to Other Cities: Transit delays aren’t just a Toronto problem. We’d love to apply this model to other systems.
Collaboration with the TTC: Ideally, this could be used to help optimize schedules, allocate resources more effectively, and make commuting in Toronto a smoother experience.

We set out to make TTC delays a little more predictable. While there’s still room for improvement, this project is a step toward a smarter, more reliable transit system.

Submitted to

SDSS Datathon

Created by

I contributed to the development of the machine learning models, specifically implementing Multinomial Logistic Regression and Random Forest Regression to forecast TTC delays. This was my first time building and running predictive models, and I worked on testing and evaluating their accuracy. Additionally, I handled data cleaning, eliminating null values and preparing the raw dataset for analysis. This experience deepened my understanding of both model implementation and preprocessing techniques essential for real-world data science applications.

Chay Park
Adelyn Lee
naraeleee Lee
Sara Sanas

Updates

naraeleee Lee started this project — Mar 02, 2025 12:01 PM EST
