- To generate the data, go to the course project directory and run `python execute.py emilymo`.
- In order to run the program, you will need to get an API key for Google Maps geocoding and put it into `auth.json` in the following format: `{"services": {"googlemaps": {"key": }}}`
- To run the web app (will only work AFTER running `execute.py`), go to the directory `emilymo/site` and run `python site.py`.
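As a minimal sketch of how a script might read the key from `auth.json`, assuming the nested layout described above: the function names `key_from_auth` and `load_googlemaps_key` are hypothetical and are not taken from `execute.py`.

```python
import json


def key_from_auth(auth):
    """Pull the Google Maps key out of an already-parsed auth dictionary."""
    # Assumes the layout {"services": {"googlemaps": {"key": ...}}}.
    return auth["services"]["googlemaps"]["key"]


def load_googlemaps_key(path="auth.json"):
    """Read auth.json from disk and return the Google Maps geocoding key."""
    with open(path) as f:
        return key_from_auth(json.load(f))
```

A missing `services`, `googlemaps`, or `key` entry raises a `KeyError`, which makes a malformed `auth.json` easy to spot before any geocoding requests are sent.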
Despite Boston’s cultural diversity, many of its sub-groups across different demographic boundaries struggle to interact with each other and remain socially isolated. This project modeled the problem of choosing a neighborhood in Boston in which to place a new community center. This would not be any ordinary community center, but one that would incorporate creative and inclusive programming given the needs and concerns of the surrounding communities; therefore, it would be important to place it in a neighborhood whose members feel a lack of togetherness, and a neighborhood with generational diversity in its population, because there may exist social segregation across generational lines as well. In order to choose a neighborhood that matched as many of these criteria as possible, I took into account the quantities of public and non-public schools in each neighborhood, the number of existing community centers and YMCA locations in each neighborhood, and the aggregate average by neighborhood of several variables from the 2010 Boston Neighborhood Survey, which polled Boston residents on their sense of togetherness and security within their neighborhoods.
On the web page generated through site.py, all of the data gathered from the neighborhoods is visualized in a single plot, with circles sized to represent the quantities of community resources and the survey variable scores for each neighborhood, and with borders on the circles to indicate whether a neighborhood satisfied the selection constraints. This design was implemented in order to provide an intuitive illustration of community resources and neighborhood togetherness in Boston, and to make it easier to compare features across neighborhoods. Additionally, the page includes a scatterplot feature in which the user can choose which of those variables are plotted on the X and Y axes in order to visualize the relationships between the variables.
By obtaining data about the number of community resources and the student population of each neighborhood, it was possible to create a set of criteria with which to define a community that may benefit from a new community center. My original plan was to compute the quantities of community centers, public schools, and non-public schools per neighborhood, as the neighborhoods were defined by the City of Boston, and then join the Boston Neighborhood Survey data to each neighborhood. However, the City of Boston and the Boston Neighborhood Survey segmented Boston into different sets of neighborhoods, and the survey dataset only provided mean survey responses (each question was scored on a scale of 1 to 5) for each of the neighborhoods that it defined, so it was not possible to directly join the data from the two sources. In practice, it would be necessary to contact the agency that conducted the survey and obtain the raw data in order to recalculate the means according to the neighborhood partitions provided by the city. Since I did not have the time or resources to contact this agency, I consolidated regions in both the City of Boston set and the survey set according to my own knowledge of Boston’s geography, at the expense of representational accuracy. When consolidating multiple survey-defined neighborhoods into one city-defined neighborhood, I took the minimum of each survey variable across all of the survey-defined neighborhoods being combined, and used those minima as the survey variables in my final dataset. This gave me at least a “worst case scenario” for the survey averages of that region, since I eventually added constraints under which neighborhoods with lower survey values were more likely to be selected.
For instance, if the survey measured statistics for multiple sub-communities of Dorchester, but the City of Boston included the entirety of “Dorchester” as one neighborhood, I took the minimum of the survey variables across all of its sub-communities. This is because the City of Boston did not provide more specific geographical data for the bounds of Dorchester’s sub-communities, so I would not be able to sort the community centers/schools into more specific sub-communities, whereas under my consolidation method, I would at least have a lower bound for the neighborhood averages of the survey variables, though more accuracy would be preferable. When I consolidated multiple city-defined neighborhoods with one survey-defined neighborhood, I attributed the survey averages for that neighborhood to all of the consolidated city-defined neighborhoods, and then joined the shapefile regions for the city-defined neighborhoods. This is a problematic approach because each of the city-defined regions within one survey-defined neighborhood would almost definitely have different averages that were weighted differently to contribute to the overall average for the survey-defined neighborhood. For instance, if the city of Boston defined separate neighborhoods for “Allston” and “Brighton”, but the survey only provided statistics on “Allston/Brighton”, I assigned the same survey variables to separate neighborhoods of “Allston” and “Brighton”.
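The two consolidation rules described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual code: the sub-community names, the survey numbers, and the `city_to_survey` mapping are all made up for the example.

```python
# Hypothetical survey means on the 1-to-5 scale; the values are invented.
survey_means = {
    "Dorchester (sub-community 1)": {"cohesion": 3.4, "control": 3.6, "exchange": 2.9},
    "Dorchester (sub-community 2)": {"cohesion": 3.1, "control": 3.8, "exchange": 3.0},
    "Allston/Brighton": {"cohesion": 3.0, "control": 3.2, "exchange": 2.7},
}

# Hypothetical hand-built mapping: city-defined name -> survey-defined names.
city_to_survey = {
    "Dorchester": ["Dorchester (sub-community 1)", "Dorchester (sub-community 2)"],
    "Allston": ["Allston/Brighton"],
    "Brighton": ["Allston/Brighton"],
}


def consolidate(survey_means, city_to_survey):
    """Assign survey variables to each city-defined neighborhood.

    Many survey regions -> one city region: take the per-variable minimum.
    One survey region -> many city regions: each city region gets a copy,
    which is just the minimum over a single-element list.
    """
    consolidated = {}
    for city_name, survey_names in city_to_survey.items():
        variables = survey_means[survey_names[0]].keys()
        consolidated[city_name] = {
            var: min(survey_means[name][var] for name in survey_names)
            for var in variables
        }
    return consolidated


consolidated = consolidate(survey_means, city_to_survey)
```

Note that the minimum is taken per variable, so the consolidated row for a region may mix values from different sub-communities. That is what makes it a lower bound rather than a real average.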
To obtain the data that was used for the final constraints, I first used the City of Boston data portal to get an inventory of the community centers, public schools, and non-public schools in Boston, as well as the geographic boundaries of each neighborhood in Boston. Next, I scraped the YMCA locations from the YMCA Boston website, geocoded them using the Google Maps API, and combined these locations with the list of community centers. Then, I downloaded the Boston Neighborhood Survey data from the BARI data portal, and consolidated the neighborhoods from the City of Boston and the Boston Neighborhood Survey as previously detailed. The aggregate survey variables that I considered were social cohesion ("the strength of positive social relationships between people in the neighborhood"), social control ("the perceived ability of the neighborhood to enforce shared norms"), and reciprocal exchange ("the perceived degree to which neighbors interact with one another"). All of these variables seemed to give insight into the degree of integration and social connection within a neighborhood, as perceived by its residents. With the neighborhood shapefiles provided by the City of Boston, I ran a MapReduce sorting computation that output the number of community centers/YMCAs, public schools, and non-public schools in each neighborhood. This computation took the neighborhood shapefiles and the coordinates of the community centers/schools, and used the Shapely Geometry package in Python to determine how many objects of each category fell within the geographical bounds of each neighborhood. Writing code to extract the coordinates for each of the neighborhoods required careful inspection of the data, because some of the coordinates were nested within multiple lists whereas others were not.
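The counting step and the nested-coordinate issue can be illustrated with a small, dependency-free sketch. It deliberately swaps Shapely for a plain ray-casting test so that it runs on the standard library alone. It also assumes GeoJSON-style nesting (three levels of lists for a single polygon, four for a multi-part region) and ignores interior holes. All function names and the sample shapes are hypothetical.

```python
def nesting_depth(coords):
    """Count how many list levels wrap the innermost coordinate number."""
    depth = 0
    while isinstance(coords, (list, tuple)):
        coords = coords[0]
        depth += 1
    return depth


def exterior_rings(coords):
    """Return the outer ring of every polygon, whatever the nesting."""
    if nesting_depth(coords) == 3:          # single polygon: [ring, ...]
        return [coords[0]]
    return [polygon[0] for polygon in coords]  # multi-part: [[ring, ...], ...]


def point_in_ring(x, y, ring):
    """Ray-casting test: toggle on each edge crossed to the right of the point."""
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i][:2]
        x2, y2 = ring[(i + 1) % len(ring)][:2]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_neighborhood(neighborhoods, points):
    """Count how many points fall inside each neighborhood's boundary."""
    counts = {name: 0 for name in neighborhoods}
    for x, y in points:
        for name, coords in neighborhoods.items():
            if any(point_in_ring(x, y, r) for r in exterior_rings(coords)):
                counts[name] += 1
                break  # a point is attributed to at most one neighborhood
    return counts


# Made-up shapes: "A" is a single polygon, "B" has two separate parts.
neighborhoods = {
    "A": [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]],
    "B": [[[[3, 0], [4, 0], [4, 1], [3, 1], [3, 0]]],
          [[[5, 0], [6, 0], [6, 1], [5, 1], [5, 0]]]],
}
points = [(1, 1), (0.5, 1.5), (3.5, 0.5), (5.5, 0.5), (10, 10)]
counts = count_by_neighborhood(neighborhoods, points)
```

Running the same function once per category (community centers/YMCAs, public schools, non-public schools) would produce the three counts per neighborhood that feed the later constraints.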
Finally, all six of these variables for each neighborhood were joined into one data set, and I produced another small data set with the mean value of each survey variable across all neighborhoods, so that criteria could be added comparing each neighborhood’s survey variables to the overall means.
Afterwards, I made a prioritized list of criteria to narrow the search for a suitable neighborhood, in the following order:
- The selected neighborhood has at most 2 community centers.
- The selected neighborhood’s aggregate score for social cohesion was below the average across all neighborhoods.
- The selected neighborhood’s aggregate score for social control was below the average across all neighborhoods.
- The selected neighborhood’s aggregate score for reciprocal exchange was below the average across all neighborhoods.
- The selected neighborhood has at least 2 public schools.
- The selected neighborhood has at least 1 non-public school.
These criteria and their prioritization were based on personal intuitions about what sort of neighborhood might benefit from a new community center (a neighborhood that is socially disconnected and lacks community centers at the moment, but has a solid population of children and young adults whom a community center could help). In practice, however, these criteria should be based on research supporting the premise that a neighborhood with such characteristics would benefit from, and gain a greater sense of togetherness from, the addition of a new community center. The selection algorithm, which uses the Z3 Solver, added these criteria to the model one by one, in order, and tested whether any neighborhood satisfied all of the criteria added so far. If adding a new constraint made the model unsatisfiable, the algorithm dropped that constraint in order to keep the previously added, higher-priority constraints intact. In the end, the only dropped constraint was the one involving public schools, and the only neighborhood that satisfied the remaining criteria was Allston/Brighton.
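The prioritized add-or-drop loop can be sketched without a solver. The version below is plain Python rather than Z3: because each criterion applies to a single neighborhood, "the model is satisfiable" reduces to "at least one neighborhood passes every kept criterion". The neighborhood names, the numbers, and the function name `select_neighborhood` are made up for illustration.

```python
# Hypothetical joined data set; every value here is invented.
neighborhoods = {
    "A": {"centers": 1, "cohesion": 3.0, "control": 3.1, "exchange": 2.8,
          "public": 1, "nonpublic": 2},
    "B": {"centers": 4, "cohesion": 2.9, "control": 3.0, "exchange": 2.7,
          "public": 5, "nonpublic": 1},
    "C": {"centers": 2, "cohesion": 3.6, "control": 3.7, "exchange": 3.3,
          "public": 3, "nonpublic": 0},
}

# Mean of each survey variable across all neighborhoods.
means = {
    var: sum(n[var] for n in neighborhoods.values()) / len(neighborhoods)
    for var in ("cohesion", "control", "exchange")
}

# Criteria in priority order, as (label, predicate on one neighborhood).
criteria = [
    ("at most 2 community centers", lambda n: n["centers"] <= 2),
    ("cohesion below average", lambda n: n["cohesion"] < means["cohesion"]),
    ("control below average", lambda n: n["control"] < means["control"]),
    ("exchange below average", lambda n: n["exchange"] < means["exchange"]),
    ("at least 2 public schools", lambda n: n["public"] >= 2),
    ("at least 1 non-public school", lambda n: n["nonpublic"] >= 1),
]


def select_neighborhood(neighborhoods, criteria):
    """Add criteria in order; drop any criterion that leaves no candidate."""
    candidates = list(neighborhoods)
    dropped = []
    for label, predicate in criteria:
        remaining = [name for name in candidates
                     if predicate(neighborhoods[name])]
        if remaining:
            candidates = remaining   # still satisfiable: keep the criterion
        else:
            dropped.append(label)    # unsatisfiable: drop it and move on
    return candidates, dropped


selected, dropped = select_neighborhood(neighborhoods, criteria)
```

With a solver such as Z3, the same loop would typically be written by adding each constraint inside a push/pop scope and keeping it only when the check still succeeds. The filtering version above trades that generality for having no dependencies.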
One major issue was a strange bug in the MapReduce call that sorted the schools/community centers into neighborhoods: the map function would emit NaNs instead of 1s and 0s, which were the only two values it was supposed to emit. Because I could not determine the cause after several hours of debugging, I rewrote the same logic in plain Python rather than as a Pymongo MapReduce call, and the plain Python version functioned properly. This workaround is in sort.py, and the original problematic Pymongo code is preserved there in a multi-line comment.
Had time permitted, the neighborhood selection process should have used better and more specific criteria, potentially taking into account financial limitations and the neighborhoods’ total populations, and should have used more up-to-date data. The survey data was from 2010, so it may not be an accurate reflection of the current state of community cohesion in Boston neighborhoods. All of these improvements would involve reaching out to the agencies from which the data originated, and having conversations to make sure the data from different sources could be joined and consolidated without concealing or distorting the truth. It would also be vital to communicate with residents of the neighborhoods themselves, and to get a representative perspective on the issues that they felt mattered most in their communities, in order to come up with fairer selection criteria.