Summary

For this challenge our team was given several tasks, including plotting children in poverty and determining the distribution of particular features within a much larger collection of datasets. The broader scenario was to help determine the best way to allocate a surplus of government funding by distributing it back to the people. While everyone's goal was to find insights in the data and identify good targets for funding, we made it our own goal to make our findings as accessible as possible. A lot of good analysis gets done but is never seen by the public. We wanted to build a well-made, highly accessible dashboard that not only displays our findings but also empowers others to explore the data themselves.

We approached the problem in a standard data-first way: digging into the data, its types, mistakes, errors, and so on. We eventually loaded the data into a PostgreSQL database hosted on Heroku so that it is public for anyone to use. This follows the accessibility principle we settled on at the beginning: the data should be as easy to access as the analysis itself.

In the end we used mostly Python, with SQL for database management. The database was hosted on Heroku, and the website was built and deployed with Streamlit. Together these tools are quick to demo while still being well built and highly interactive.

By far the biggest thing we learned is just how much time one can waste trying to get data into a perfect state.

Data

  1. CRDC and SAIPE datasets: csv (CRDC), xls (SAIPE)
  2. HMDA: csv
  3. HMDA, SDGR, and ACLF: csv (HMDA), csv and sas7bdat versions (SDGR), pdf (SDGR), txt (ACLF)

Plan

[] Exploratory

  • Data Quality (nulls, duplicates, etc.)
  • Data Types (continuous, categorical, etc.)
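
The exploratory checks above can be sketched with the standard library alone. This is a minimal illustration, not the code we ran: the function names, the set of tokens treated as missing, and the sample column names are all assumptions made for the example.

```python
import csv
import io
from collections import Counter

# Assumption: these tokens mark a missing value; adjust per dataset.
MISSING_TOKENS = {"", "na", "n/a", "null", "none", "nan"}


def is_missing(value):
    """True for absent fields and for common missing-value tokens."""
    return value is None or value.strip().lower() in MISSING_TOKENS


def infer_kind(values):
    """Label a column 'continuous' if every present value parses as a float."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return "empty"
    try:
        for v in present:
            float(v)
    except ValueError:
        return "categorical"
    return "continuous"


def profile_csv(csv_text):
    """Count rows, exact duplicate rows, nulls per column, and column kinds."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    row_counts = Counter(tuple(row.get(c) for c in columns) for row in rows)
    return {
        "rows": len(rows),
        "duplicate_rows": sum(n - 1 for n in row_counts.values()),
        "nulls": {c: sum(is_missing(row.get(c)) for row in rows) for c in columns},
        "kinds": {c: infer_kind([row.get(c) for row in rows]) for c in columns},
    }
```

Note that numeric codes such as district or tract identifiers would be labeled continuous by this check, so the inferred kinds still need a manual review.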

[] Modeling

  • Start simple (linear regression, no transforms)
  • Do some feature importance or PCA
  • Determine better model based on the data
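
As a rough sketch of the "start simple" step, a one-feature ordinary least squares fit needs no libraries at all. The function names are hypothetical, and in practice the inputs would be one predictor (for example a district poverty rate) and one outcome column pulled from the database. Feature importance and PCA are not covered here.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance, so the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Apply the fitted line to new x values."""
    return [intercept + slope * xi for xi in x]
```

A baseline like this gives a reference score that any more elaborate model has to beat before it is worth the added complexity.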

[] Evaluation

  • Performance metrics
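
The plan does not name specific metrics. Assuming a regression setting, a minimal sketch of three common ones (MAE, RMSE, and R²) could look like this; the function name and the choice of metrics are ours for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired true and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        # R^2 is undefined when the true values have no variance.
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
    }
```

These should be computed on held-out data rather than on the rows used for fitting.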

[] Reporting

  • Introduce the data
  • Speak on the process throughout
  • Present the finalized model, performance, and what it means

Determination of the socioeconomic and/or environmental factors that influence educational attainment.
A recommendation on how to allocate resources based on the need to address the most significant factors.

https://ocrdata.ed.gov/
https://www.census.gov/data/datasets/2017/demo/saipe

Data Summary

  • geolocation
  • census tracts and districts, segmented by different combinations

TODO

  1. Load data into PostgreSQL
  2. Connect Python to the db with SQLAlchemy
  3. Start querying
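
The three steps above can be sketched with SQLAlchemy as follows. This is an illustration under stated assumptions, not our actual schema: the `district_poverty` table, its columns, and the helper names are hypothetical, and the connection string is read from a `DATABASE_URL` environment variable with an in-memory SQLite fallback so the sketch runs without a PostgreSQL server. Connecting to PostgreSQL additionally requires a driver such as psycopg2.

```python
import os

from sqlalchemy import create_engine, text


def normalize_db_url(url):
    """SQLAlchemy 1.4+ rejects the 'postgres://' scheme; use 'postgresql://'."""
    if url.startswith("postgres://"):
        return "postgresql://" + url[len("postgres://"):]
    return url


def make_engine(url=None):
    """Build an engine from an explicit URL or from DATABASE_URL."""
    url = url or os.environ.get("DATABASE_URL", "sqlite:///:memory:")
    return create_engine(normalize_db_url(url))


def load_rows(engine, rows):
    """Step 1: create the (hypothetical) table and insert a list of dicts."""
    with engine.begin() as conn:  # commits on success, rolls back on error
        conn.execute(text(
            "CREATE TABLE IF NOT EXISTS district_poverty ("
            "district_id TEXT PRIMARY KEY, state TEXT, poverty_rate REAL)"
        ))
        conn.execute(
            text(
                "INSERT INTO district_poverty (district_id, state, poverty_rate) "
                "VALUES (:district_id, :state, :poverty_rate)"
            ),
            rows,
        )


def mean_poverty_by_state(engine):
    """Step 3: a sample aggregate query, returned as (state, mean) pairs."""
    with engine.connect() as conn:
        result = conn.execute(text(
            "SELECT state, AVG(poverty_rate) FROM district_poverty "
            "GROUP BY state ORDER BY state"
        ))
        return [(row[0], row[1]) for row in result]
```

The URL rewrite matters on Heroku, where the provided connection string may begin with `postgres://`.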
