- Load data into PostgreSQL
- Connect Python to the database with SQLAlchemy
- Start querying
- Plot the number of students
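A minimal sketch of the load, connect, and query steps, assuming pandas and SQLAlchemy are installed. To stay self-contained it uses an in-memory SQLite engine as a stand-in for PostgreSQL. The table name `enrollment` and the columns `district`, `school`, and `students` are hypothetical and not taken from any dataset schema.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Stand-in engine. For PostgreSQL the URL would instead look like
# "postgresql+psycopg2://user:password@localhost:5432/dbname" (needs psycopg2).
engine = create_engine("sqlite://")

# Hypothetical rows; in practice this frame would come from pd.read_csv(...).
df = pd.DataFrame(
    {
        "district": ["A", "A", "B", "C"],
        "school": ["s1", "s2", "s3", "s4"],
        "students": [120, 80, 200, 50],
    }
)

# Load: write the DataFrame to a table, replacing it if it already exists.
df.to_sql("enrollment", engine, if_exists="replace", index=False)

# Query: total students per district.
query = text(
    "SELECT district, SUM(students) AS total_students "
    "FROM enrollment GROUP BY district ORDER BY district"
)
with engine.connect() as conn:
    totals = pd.read_sql(query, conn)

print(totals)
```

For the plot, `totals.plot.bar(x="district", y="total_students")` would draw the per-district counts, assuming matplotlib is installed.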
- CRDC and SAIPE datasets: CSV (CRDC), XLS (SAIPE)
- HMDA: CSV
- HMDA, SDGR, and ACLF: CSV (HMDA); CSV and sas7bdat versions plus PDF (SDGR); TXT (ACLF)
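Since the sources arrive in several formats, one option is a small loader that picks a pandas reader by file extension. This is a sketch under assumptions: the tab delimiter for `.txt` (ACLF) is a guess to verify against that dataset's documentation, `.xls` reading needs the `xlrd` package, and PDFs are treated as non-tabular and rejected. The function name `load_table` is hypothetical.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd


def load_table(path, txt_sep="\t"):
    """Read one raw data file into a DataFrame, choosing the reader by extension.

    txt_sep is an assumption; check the actual .txt delimiter before use.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in (".xls", ".xlsx"):
        # .xls needs xlrd installed; .xlsx needs openpyxl.
        return pd.read_excel(path)
    if suffix == ".sas7bdat":
        return pd.read_sas(path, format="sas7bdat")
    if suffix == ".txt":
        return pd.read_csv(path, sep=txt_sep)
    # PDFs and anything else are not handled as tabular input here.
    raise ValueError(f"Unsupported file type: {suffix}")


# Round-trip check on small temporary files with hypothetical contents.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "sample.csv")
    txt_path = os.path.join(tmp, "sample.txt")
    frame = pd.DataFrame({"tract": [101, 102], "loans": [3, 5]})
    frame.to_csv(csv_path, index=False)
    frame.to_csv(txt_path, sep="\t", index=False)
    from_csv = load_table(csv_path)
    from_txt = load_table(txt_path)
```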
[] Exploratory
- Data quality (nulls, duplicates, etc.)
- Data types (continuous, categorical, etc.)
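A first-pass profile covering both exploratory checks could look like the sketch below. The continuous/categorical split is a simple heuristic assumed here (numeric columns with more than `max_categories` distinct values count as continuous); the cutoff is arbitrary and should be reviewed per column. Column names in the demo frame are hypothetical.

```python
import pandas as pd


def profile(df, max_categories=20):
    """Summarize nulls, distinct values, and a rough variable type per column,
    plus the number of fully duplicated rows."""
    rows = []
    for col in df.columns:
        s = df[col]
        n_unique = s.nunique(dropna=True)
        # Heuristic: many-valued numeric columns are treated as continuous.
        if pd.api.types.is_numeric_dtype(s) and n_unique > max_categories:
            kind = "continuous"
        else:
            kind = "categorical"
        rows.append(
            {
                "column": col,
                "dtype": str(s.dtype),
                "nulls": int(s.isna().sum()),
                "n_unique": n_unique,
                "kind": kind,
            }
        )
    summary = pd.DataFrame(rows).set_index("column")
    n_duplicate_rows = int(df.duplicated().sum())
    return summary, n_duplicate_rows


df = pd.DataFrame(
    {
        "district": ["A", "A", "B", None],
        "poverty_rate": [0.12, 0.12, 0.30, 0.25],
    }
)
summary, dups = profile(df, max_categories=2)
print(summary)
print("duplicate rows:", dups)
```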
[] Modeling
- Start simple (linear regression, no transformations)
- Run a feature-importance analysis or PCA
- Choose a better model based on what the data show
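A sketch of the baseline and the two diagnostics, written with NumPy only. It assumes the features are already numeric in a matrix `X` with target `y`; the data below are synthetic, not from any of the datasets. Standardized coefficients are used as a crude importance measure, which can mislead when features are correlated.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta[0], beta[1:]


def standardized_coefficients(X, y):
    """OLS coefficients on z-scored features, usable as a rough importance ranking."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    _, coefs = fit_ols(Xz, y)
    return coefs


def pca_explained_variance_ratio(X):
    """Share of total variance captured by each principal component (via SVD)."""
    Xc = X - X.mean(axis=0)
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    var = singular_values ** 2
    return var / var.sum()


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Synthetic, noise-free target: feature 0 matters most, feature 2 not at all.
y = 1.0 + 3.0 * X[:, 0] + 0.5 * X[:, 1]

intercept, coefs = fit_ols(X, y)
importance = standardized_coefficients(X, y)
ratios = pca_explained_variance_ratio(X)
```

scikit-learn's `LinearRegression` and `PCA` (with its `explained_variance_ratio_` attribute) would do the same jobs if that dependency is acceptable.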
[] Evaluation
- Performance metrics
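Assuming the target stays continuous (as the linear-regression starting point suggests), common metrics are RMSE, MAE, and R². A minimal NumPy sketch follows; ideally `y_true` and `y_pred` would come from held-out data rather than the training set. The example values are made up.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """RMSE, MAE, and R^2 for a regression model's predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "mae": float(np.mean(np.abs(residuals))),
        "r2": float(1.0 - ss_res / ss_tot),
    }


metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(metrics)
```

Here the residuals are 1, 0, -1, 0, so MAE is 0.5, RMSE is √0.5, and R² is 1 − 2/20 = 0.9.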
[] Reporting
- Introduce the data
- Explain the process at each step
- Present the finalized model, performance, and what it means
Determine the socioeconomic and/or environmental factors that influence educational attainment.
Recommend how to allocate resources based on the need to address the most significant factors.
https://ocrdata.ed.gov/ (CRDC)
https://www.census.gov/data/datasets/2017/demo/saipe (SAIPE)
- geolocation
- census tracts and districts; segment by different combinations
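One way to segment by every combination of geographic keys is to loop over `itertools.combinations` and group on each. This sketch assumes the tract- and district-level data have already been joined into one frame (the join itself, which may need a geographic crosswalk, is not shown). The columns `district`, `tract`, and `grad_rate` are hypothetical.

```python
from itertools import combinations

import pandas as pd


def segment_summaries(df, keys, value):
    """Mean of `value` for every non-empty combination of the segmenting keys."""
    results = {}
    for r in range(1, len(keys) + 1):
        for combo in combinations(keys, r):
            results[combo] = df.groupby(list(combo))[value].mean()
    return results


# Hypothetical rows; names and values are illustrative only.
df = pd.DataFrame(
    {
        "district": ["D1", "D1", "D2", "D2"],
        "tract": ["T1", "T2", "T3", "T3"],
        "grad_rate": [0.80, 0.90, 0.70, 0.60],
    }
)
summaries = segment_summaries(df, ["district", "tract"], "grad_rate")
for combo, table in summaries.items():
    print(combo)
    print(table)
```

With more keys the number of combinations grows as 2ⁿ − 1, so it may be worth restricting to the combinations that matter for the analysis.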