When deciding what to build for this hackathon, we considered both what would be fun and what was relevant to us. What actually triggered our choice of project was the buses in London. Especially in the morning, as the days get colder and the weather harsher, buses running late become much more common and more inconvenient. Since we figured we couldn't predict exactly when the buses were going to come, we decided to investigate planes instead.

Our solution examines historical flight data across multiple locations and weather conditions and uses a machine learning algorithm to try to predict whether a flight on a given day with given weather will be delayed. While this may not be life-changing, we think it could easily make many lives easier and less painful.

To build this, we first had to gather large amounts of data. Thankfully, a US government site had a large amount of data on previous flights, including the day of the flight, delay times, airline and location. This was good data; however, it lacked any parameters to train our algorithms against. To fix this, we found a weather API that provided statistics by the day, including temperature, wind speed, and snow and rainfall amounts. Combining this with the flight data gave us a large data set. After this, we created some algorithms to fit a delay probability to the gathered statistics, and built a front end to display our results.
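As a rough illustration of the combining step, here is a minimal sketch in plain Python. The field names (`airport`, `date`, `delay_min`, `temp_c`, `wind_kph`, `snow_mm`, `rain_mm`), the sample values and the helper `join_flights_with_weather` are all assumptions made up for this example; they are not the actual schema of the government data set or the weather API we used.

```python
from datetime import date

# Hypothetical flight records (made-up values, assumed field names).
flights = [
    {"airport": "JFK", "date": date(2019, 1, 5), "airline": "AA", "delay_min": 45},
    {"airport": "JFK", "date": date(2019, 1, 6), "airline": "DL", "delay_min": 0},
    {"airport": "ORD", "date": date(2019, 1, 5), "airline": "UA", "delay_min": 10},
]

# Hypothetical daily weather summaries, keyed by (airport, date).
# The four values mirror the four weather parameters described above.
weather = {
    ("JFK", date(2019, 1, 5)): {"temp_c": -3.0, "wind_kph": 40.0, "snow_mm": 12.0, "rain_mm": 0.0},
    ("JFK", date(2019, 1, 6)): {"temp_c": 1.0, "wind_kph": 15.0, "snow_mm": 0.0, "rain_mm": 2.0},
}


def join_flights_with_weather(flights, weather):
    """Attach the daily weather of the flight's airport to each flight.

    Flights with no matching weather record are skipped, so every
    returned row has both the delay and all four weather values.
    """
    rows = []
    for flight in flights:
        daily = weather.get((flight["airport"], flight["date"]))
        if daily is None:
            continue  # no weather for this airport/day: cannot be used for training
        rows.append({**flight, **daily})
    return rows


rows = join_flights_with_weather(flights, weather)
```

Here the ORD flight is dropped because no weather record exists for it, leaving two combined rows. Keying on airport and calendar day is also what makes every flight on the same day share identical weather values.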

At first, the largest challenge was simply finding data. Much of the flight data was not very accessible, with almost all of it locked behind a paywall. It took longer than we hoped, but we did eventually find sources that could supply the data we needed. The biggest issue, though, came from the data itself. It seemed extremely good at first; however, it had some major problems that we did not notice right away.

Accomplishments that we're proud of

The data had two problems. The first was how incredibly lopsided it was. The overwhelming majority of flights were not delayed by weather: only about 1% of the flights we had gathered were delayed by more than 30 minutes due to weather conditions. This made for an extremely lopsided data set and made it very hard for the algorithm to identify patterns, especially with only the four parameters we provided. The second problem was a more fundamental one, and also extremely harmful to the data set. For days in the past, we could only get weather data by the day, so if poor weather lasted only a few hours, it was still recorded against the whole day. Those conditions may ground all aircraft while they last, but once they pass, later flights are only slightly delayed, or not delayed at all. This threw off our algorithms significantly, because massively different outputs (the delay times) originated from exactly the same input (the weather conditions of the day). We tried a few different methods to fix this, including weighted normalization of the data and removal of data based on circumstance. Eventually we arrived at a solution that we were happy with, and we have now successfully implemented it.
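To make the lopsidedness concrete, here is a minimal sketch of one common way to weight such data: inverse class frequency, the same heuristic scikit-learn uses for `class_weight="balanced"`. This is only an illustration of the general idea, not necessarily the weighting we ended up with. The sample delays are made up, and the function names `label_delays` and `inverse_frequency_weights` are hypothetical.

```python
from collections import Counter

DELAY_THRESHOLD_MIN = 30  # a flight counts as delayed above 30 minutes


def label_delays(delay_minutes, threshold=DELAY_THRESHOLD_MIN):
    """Turn delay times into binary labels: 1 = delayed, 0 = not delayed."""
    return [1 if d > threshold else 0 for d in delay_minutes]


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get large weights, so that each class contributes
    the same total weight during training.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up sample mimicking the roughly 1% delayed share described above.
delays = [0] * 99 + [45]
labels = label_delays(delays)
weights = inverse_frequency_weights(labels)

# One weight per flight, usable as per-sample weights by many learners.
sample_weights = [weights[y] for y in labels]
```

With 99 on-time flights and one delayed flight, the delayed flight gets a weight of 50 and each on-time flight roughly 0.505, so both classes carry the same total weight. Note that this only addresses the imbalance; it does nothing about identical daily weather mapping to very different delay times.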

Over the course of the hackathon we all learned a lot, not only about coding but also about collaboration and work endurance. Perhaps most importantly, we were able to interact with the wonderful mentors who helped us with this project. This was an enjoyable experience for all of us, and we hope to keep attending both this hackathon and others in the future.
