Desktop Disciples

Inspiration

Over 80 percent of the world's flowering plants depend on pollinators such as monarch butterflies. In recent years, however, monarch butterfly sighting counts have declined drastically. Because monarchs are among the main pollinators in the United States, this decline could harm the natural ecosystem, agriculture, and the economy.

What it does

We test whether the monarch butterfly decline is correlated with temperature levels and air quality. Our model suggests that there is some correlation between these variables.

How we built it

We pulled data from the monarch butterfly and AQI sites provided to us, extracted fields such as daily AQI, temperature, and monarch butterfly count, and compared these values with one another. We then built a custom LLM via OpenAI that can process and recreate CSV files with the relevant season, county, AQI, temperature, and FIPS columns given only a date, state, and city.

We first visualized the monarch counts: we wrote a script to normalize our data, then generated a heat map of the U.S. showing the dispersion of monarch sightings by county over the last 15 years, using Python libraries such as matplotlib and geopandas. Next, we mapped this information against related data such as milkweed growth rates, temperature, and AQI by county to look for similarities in migration patterns.

We then began a deeper statistical analysis of the relevant variables, starting with a linear regression model before pivoting to a negative binomial regression model, because our Y value is a count variable and our dataset is heavily over-dispersed. This allowed us to find a correlation between season, year, air quality, and temperature across 15 years to map and explain the decline of monarch butterflies, as well as to propose a potential solution.
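The over-dispersion behind the model pivot can be checked with a quick variance-to-mean ratio. A Poisson-style count model assumes the variance is roughly equal to the mean, so a ratio far above 1 is a common signal that a negative binomial model may fit better. The sketch below is a minimal illustration, not our actual script: the function name `dispersion_ratio` and the example counts are hypothetical, made up only to show the idea.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return the sample variance divided by the mean of a list of counts.

    A ratio near 1 is consistent with a Poisson-like variable; a ratio
    well above 1 suggests over-dispersion.
    """
    if len(counts) < 2:
        raise ValueError("need at least two observations")
    avg = mean(counts)
    if avg == 0:
        raise ValueError("mean is zero; ratio is undefined")
    return variance(counts) / avg


# Hypothetical per-county sighting counts, invented for illustration only.
example_counts = [0, 0, 1, 0, 2, 0, 0, 35, 1, 0, 120, 3]

ratio = dispersion_ratio(example_counts)
print(f"variance / mean = {ratio:.1f}")
```

Sighting data tends to look like the example: many zeros and small values with a few very large counts, which is the pattern that pushes the ratio well above 1.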

Challenges we ran into

Our original AI analysis consistently found no correlation, contrary to our pre-processing research. After reviewing our model choice, we realized we weren't using the best approach and had to pivot at the last minute. Our data pre-processing took a long time because we maintained a grueling verification process for our data transformations. However, this let us verify datasets as we worked, rather than encountering errors and having to fix them later in the process.

Accomplishments that we're proud of

We created CSVs through batch processing so we could test at a small scale, then scale up to find multiple correlation factors. We created several CSVs using Python, which allowed us to visualize our data. We also used multiple AI models and visualization techniques.
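As a rough illustration of the batch processing idea, the sketch below reads CSV rows in fixed-size batches, so a pipeline can be tried on one small batch before running on everything. It uses only the Python standard library. The helper name `iter_batches`, the column names, and the sample rows are all hypothetical and do not come from our datasets.

```python
import csv
import io
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of up to batch_size rows from any iterable of rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical in-memory CSV standing in for a real sightings file.
sample_csv = io.StringIO(
    "date,state,city,count\n"
    "2023-09-01,TX,San Antonio,4\n"
    "2023-09-02,TX,Austin,7\n"
    "2023-09-03,OK,Tulsa,2\n"
)

reader = csv.DictReader(sample_csv)
batches = list(iter_batches(reader, batch_size=2))
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {len(batch)} rows")
```

Because `iter_batches` is a generator, a real file can be streamed one batch at a time instead of being collected into a list as in this small example.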

What we learned

Python external models, LLM model creation, AI analysis, data processing and CSV manipulation, data modeling, and statistical analysis.

What's next for Desktop Disciples

If we continued this project, we would keep fine-tuning our model to explain a higher percentage of variance, and we would add more specific analysis of how our X factors affect butterfly feeding patterns.

Built With

jupyter
matplotlib
pandas
python
scikit-learn

Submitted to

Rowdy Datathon 2024
- Winner Intermediate Track - 1st Place

Created by

Created the initial heat map generation script for counties' butterfly count.
Created scripts to scrape, format, and replicate datasets for butterfly, pollution, and temperature data sorted by day, month, season, and year. Created a custom LLM via GPT to process datasets and apply relevant FIPS, county, AQI, and temperature data given only a date, state, and city. Designed and wrote scripts for multiple statistical models, starting with a linear regression model before pivoting to a negative binomial regression model that better fit our dataset, because our Y value is a count variable and our dataset is heavily over-dispersed.

Teagan Smith
Cloud / DevOps Engineering Intern with specialties in Security, Kubernetes, Terraform, and multiple cloud platforms. Security+ Certified.
Mei Sullum
Paolo Lay
Nathan Zuniga

Updates

Nathan Zuniga started this project — Oct 06, 2024 09:06 AM EDT
