Stories by Jessica Ayodele on Medium

Data Analyst Interview Questions Pt. 1

Jessica Ayodele — Sun, 06 Aug 2023 14:25:28 GMT

40+ Questions I’ve been asked in interviews since 2021


15 Volunteer Organizations to gain Tech and Data skills

Jessica Ayodele — Sun, 18 Sep 2022 19:46:21 GMT

Are you looking to gain real world Tech experience while learning and before landing a job?


Analysis of New York City Motor Vehicles Collisions

Jessica Ayodele — Thu, 01 Jul 2021 05:26:05 GMT

Hands-on Tutorials

A Data analyst interview case study using Google BigQuery and Tableau

In my last article, I spoke about my transition into Data Analytics and how I recently landed a full-time Data Analyst position. Throughout the month of April ’21, I was breezing in and out of interviews with various North American companies. For some of these companies, I had to take Excel, SQL, or Python tests, while a few others had me work on case studies. In this article, I will walk you through one such case study, which I passed, and my approach to tackling the problem.

Photo by Luke Stackpoole on Unsplash

The Task

First, case studies are a way for companies to test core skills before considering you for advanced interview stages. For this case study, I was tasked with analysing the New York City Motor Vehicles Collision dataset in Google BigQuery from Jan 2014 to Dec 2017 and providing recommendations to reduce the occurrence of accidents in Brooklyn, a borough of New York City. The entire dataset currently has over 1.7 million records from 2012 to date and can be accessed here.

Side note: Google BigQuery has several public datasets that are updated periodically and can be used to build projects for your portfolio.

My Approach

My first instinct was to search the web for articles related to the task because “there’s nothing new under the sun”. I found earlier articles that were useful in developing my approach. A summary of my approach is shown in the image below.

My Case Study Approach (Image by Author)

First Steps

Here are a few tricks and steps you should use to approach future case studies.

Understanding the task: This is relevant for any case study to ensure that your analysis does not go off-point. It is important to follow the instructions first before going the extra mile. In this case study, I almost missed the instruction in the brief to analyse only 2014–2017 data.
Prepping the Data: Identifying the primary key and checking for duplicates and null values should be a no-brainer when exploring your dataset. Also look out for fields that might be relevant to your analysis, so you do not end up importing irrelevant fields into your Business Intelligence tool. This is where SQL came in handy.

Checking for Duplicates in Google BigQuery (P.S. A query that returns no results is a good thing in this case.)
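The check in the screenshot was a SQL query in BigQuery. The same idea can be sketched in plain Python, assuming the table has a single-column primary key. The field names (`unique_key`, `borough`, `persons_killed`) and the rows are made up for illustration, and one duplicate and one null are planted on purpose so the check has something to report.

```python
from collections import Counter

# Toy rows standing in for the collisions table; field names are assumptions.
rows = [
    {"unique_key": 101, "borough": "BROOKLYN", "persons_killed": 0},
    {"unique_key": 102, "borough": "BROOKLYN", "persons_killed": 1},
    {"unique_key": 103, "borough": None, "persons_killed": 0},
    {"unique_key": 103, "borough": "QUEENS", "persons_killed": 0},
]

def duplicate_keys(records, key):
    """Return key values that occur more than once, in sorted order."""
    counts = Counter(record[key] for record in records)
    return sorted(value for value, n in counts.items() if n > 1)

def null_counts(records):
    """Count missing (None) values per field."""
    counts = Counter()
    for record in records:
        for field, value in record.items():
            counts[field] += value is None  # True adds 1, False adds 0
    return dict(counts)

dupes = duplicate_keys(rows, "unique_key")
nulls = null_counts(rows)
print(dupes)
print(nulls)
```

An empty `dupes` list is the equivalent of the query returning no results. If the data is already in a pandas dataframe, `DataFrame.duplicated` and `DataFrame.isna` cover the same ground.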

Deep Dive

To analyse the dataset, I made use of Tableau Public for two reasons: I wanted to create an interactive dashboard and Tableau was one of the skill sets mentioned in the job description. From exploring the dataset, I got ideas of key features to do an in-depth analysis on. Some are highlighted below while others can be explored in the final dashboard.

Collision Analysis: This was done to reveal top causes that led to collisions and fatalities. We can see here that most fatalities were caused by Driver Inattention/Distraction.

Top 7 Collision Contributing factors by fatalities (Image by Author using Tableau)

Time Series Analysis: This revealed which times of day and days of the week had the most collisions. We can see from the chart below that most collisions occurred during rush hour (4PM–5PM). We also see significant numbers in the early hours of the day.

Collisions Time Series Analysis (Image by Author using Tableau)
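The chart above was built in Tableau, but the aggregation behind it is a count of collisions per hour and per weekday. A minimal sketch in plain Python follows; the timestamps and their format are hypothetical, not taken from the dataset.

```python
from collections import Counter
from datetime import datetime

# Hypothetical collision timestamps; the real dataset has one per record.
timestamps = [
    "2016-03-04 16:15", "2016-03-04 16:50", "2016-03-07 08:05",
    "2016-03-07 16:30", "2016-03-09 01:20", "2016-03-11 17:45",
]

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def count_by(stamps, part):
    """Count collisions per hour (0-23) or per weekday name."""
    parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M") for s in stamps]
    if part == "hour":
        return Counter(t.hour for t in parsed)
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    return Counter(DAY_NAMES[t.weekday()] for t in parsed)

by_hour = count_by(timestamps, "hour")
by_day = count_by(timestamps, "weekday")
peak_hour, peak_count = by_hour.most_common(1)[0]
print(peak_hour, peak_count)
print(by_day.most_common(1)[0])
```

Plotting `by_hour` as a bar chart gives the same rush-hour view, just without the interactivity.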

Fatality Analysis: This revealed that pedestrians were killed more often than other road users whenever collisions occurred.

Total Annual Fatalities by road users (Image by Author using Tableau)

Bringing it all together

Using the insights gathered from my analysis, I prepared a slide deck to provide recommendations. An additional tip is to ensure any recommendation you provide is backed up by your analysis, not prior knowledge. Also, most companies give between a few hours and 5 business days to complete a case study. If you see you have more time, please try not to rush through it.

Recommendations provided based on my analysis (Image by Author)

The final submission for this case study was a slide deck and dashboard. The latter was an add-on because this was a major tech company and they loved it :). A preview of the interactive dashboard is shown below. I designed the background in Figma, and the rest of the magic happened in Tableau.

Final Tableau dashboard (Image by Author)

Relevant Links

Analysis of New York City Motor Vehicles Collisions was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.

BRIDGERTON: An analysis of Netflix’s most-streamed TV series

Jessica Ayodele — Fri, 29 Jan 2021 20:41:46 GMT

An analysis of over 300,000 tweets on the Bridgerton TV series using NLP techniques in Python & Tableau


The Year 2020: Analyzing Twitter Users’ Reflections using NLP

Jessica Ayodele — Wed, 30 Dec 2020 18:50:26 GMT

A Sentiment Analysis Project using Python and Tableau


Analysis of Toronto Neighbourhoods using Machine Learning

Jessica Ayodele — Fri, 27 Nov 2020 01:19:46 GMT

A New Immigrant’s Guide to settling in the City of Toronto

Introduction

When I began this project, I came across a news article which read “Canada to welcome 1.2 million immigrants by 2023” [1]. This made me excited for the millions of people looking for a pathway to Canada since I recently relocated here. A 2020 U.S. News ranking showed Canada as the 2nd best country in the world, so it is not a surprise that every year, thousands of people choose to migrate here [2]. Aside from having a stable economy and many growth opportunities, Canada has offered many immigrants a new home. In 2019, Canada opened its borders to 341,000 people with 35% of them settling in the City of Toronto [3]. Hence, it is safe to say that the City of Toronto is a top destination for most new immigrants.

Problem Statement

The City of Toronto has 140 neighbourhoods spanning 6 districts. As a new immigrant, a vital question to answer is “What neighbourhood do I settle in?”. The aim of this project is to group Toronto neighbourhoods in order of desirability using Machine Learning and Data Visualization techniques.

Photo by Matthew Lai on Unsplash

Basis

There are several factors to consider when settling down in any location. For this project, I performed my analysis using the following criteria:

Total number of Essential Venues in each neighbourhood
Primary and Secondary Benchmarks: Primary benchmarks considered were Unemployment rate, Crime rate and COVID-19 rates for each neighbourhood while the Secondary benchmark was housing price for a one-bedroom apartment in each neighbourhood.

Data Description

Most of the datasets were obtained from the City of Toronto Open Data Portal. Other datasets were scraped from the web. They include:

Neighbourhood Boundaries Map (GeoJSON): This file contains standard geospatial data and was critical for map visualizations
COVID-19 dataset for Toronto: Total cases as of October 22nd, 2020
Crime rates dataset for Toronto Neighbourhoods: for the Years 2014 to 2019
Neighbourhood Profiles/Census dataset: Based on data collected by Statistics Canada in the last Census campaign held in 2016
Housing rental prices: Contains median rental prices per neighbourhood

Methodology

The Python libraries used on this project were NumPy, pandas, GeoPandas, Plotly, scikit-learn, Requests and geopy. All visualizations were done using the Plotly library because its charts are interactive and can be produced with fewer lines of code.

The GitHub repo for this project can be found here while the Jupyter notebook can be viewed here.

The main steps for this project are summarized in the flowchart below:

Project Flowchart

Data Exploration

The interactive charts and maps in the rest of this article are best viewed on a computer or tablet.

Exploring Venues in City of Toronto

First, I obtained the top 100 venues in each neighbourhood by sending requests to the Foursquare API. A total of 2118 venues and 291 unique venue categories were returned.

Using one-hot encoding, I converted the venue categories to numerical values for each neighbourhood to carry out further analysis. The total number of essential venues such as restaurants, schools, train stations, malls etc. was computed for each neighbourhood. From the Sunburst chart below, we can see all 6 Toronto districts and their respective neighbourhoods. Each neighbourhood is sized in proportion to the total number of essential venues present in it. Click/Tap on chart to explore further.

https://medium.com/media/5296cbaf173a147962b6e8c731bf8886/href
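As a rough illustration of the one-hot encoding step, here is a sketch in plain Python. The venue list and the set of categories treated as "essential" are assumptions for the example, not the values used in the project.

```python
from collections import defaultdict

# Hypothetical (neighbourhood, venue category) pairs; not real API output.
venues = [
    ("Annex", "Restaurant"), ("Annex", "Coffee Shop"), ("Annex", "Bank"),
    ("Rouge", "Park"), ("Rouge", "Restaurant"),
]
# Which categories count as "essential" is an assumption for this sketch.
ESSENTIAL = {"Restaurant", "Bank", "School", "Train Station"}

categories = sorted({category for _, category in venues})

def one_hot(category):
    """Encode one category as a 0/1 vector over all known categories."""
    return [int(category == c) for c in categories]

# Summing the one-hot vectors per neighbourhood gives per-category counts.
counts = defaultdict(lambda: [0] * len(categories))
for neighbourhood, category in venues:
    encoded = one_hot(category)
    counts[neighbourhood] = [a + b for a, b in zip(counts[neighbourhood], encoded)]

# Total essential venues per neighbourhood.
essential_totals = {
    name: sum(n for c, n in zip(categories, vector) if c in ESSENTIAL)
    for name, vector in counts.items()
}
print(categories)
print(dict(counts))
print(essential_totals)
```

With pandas, `pd.get_dummies` followed by a group-by on the neighbourhood column produces the same per-category counts.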

Quick Facts Check: There are more coffee shops and restaurants in Toronto than there are neighbourhoods, with over 900 restaurants across the city.

Exploring Toronto Neighbourhoods using Primary benchmarks

After a clean-up of the individual datasets for the primary benchmarks, I merged them into one Pandas dataframe as shown below.
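A merge of this kind joins the separate tables on the neighbourhood name. The sketch below does it in plain Python with made-up figures and keeps only neighbourhoods that appear in every table, which is an assumption about how missing rows were handled.

```python
# Hypothetical per-neighbourhood values; the figures are made up.
unemployment = {"Annex": 6.9, "Rouge": 8.1, "Malvern": 10.4}
crime_rate = {"Annex": 1500, "Rouge": 700, "Malvern": 1100}
covid_rate = {"Annex": 400, "Malvern": 900}  # "Rouge" deliberately missing

def inner_merge(**tables):
    """Keep only neighbourhoods present in every table, like an inner join."""
    shared = set.intersection(*(set(table) for table in tables.values()))
    return {
        name: {column: table[name] for column, table in tables.items()}
        for name in sorted(shared)
    }

benchmarks = inner_merge(
    unemployment=unemployment, crime_rate=crime_rate, covid_rate=covid_rate
)
for name, row in benchmarks.items():
    print(name, row)
```

In pandas, chaining `DataFrame.merge` calls on the neighbourhood column with `how="inner"` gives the same result.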

The dataframe was converted to the interactive bubble chart below. Crime rates are represented by bubble size. Click/Tap on the legend on the Bubble chart to isolate a district and explore further.

https://medium.com/media/b7e0e9ac209ebb0c9d95c79cfe3ed319/href

Quick Stats Check: The average unemployment rate is 8.3%. The average number of crimes committed per 100,000 people is 1378, and 1 in 100 persons had contracted COVID-19 as of October 2020.

Machine Learning

Clustering Toronto Neighbourhoods

A clustering algorithm, “k-means”, was used to group the neighbourhoods by desirability for new immigrants. k-means is an unsupervised machine learning algorithm that partitions the data points into k clusters, so that neighbourhoods with similar feature values end up in the same cluster.

Steps for Clustering Toronto Neighbourhoods

The steps below were used to segment the neighbourhoods:

1. Determine optimum number of clusters using the “Elbow” method
2. Group neighbourhoods using total number of essential venues. These essential venues included places such as Schools, Train stations, Restaurants, Banks, Shopping Malls, Bus Stations etc. This resulted in 3 distinct neighbourhood clusters and the outcome was represented in the final Choropleth map as “Venue Density”
3. Group neighbourhoods using the primary benchmarks: Unemployment, Crime and COVID-19 rates. The result of this clustering attempt is shown below
4. Group the neighbourhoods in the “Low” cluster from Step 3 using the secondary benchmark i.e. Housing prices

https://medium.com/media/acb88ea31bc065f81e868dad53a76ffa/href
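To make the elbow step concrete, here is a small from-scratch sketch of k-means (Lloyd's algorithm) in plain Python. It is not the project code, which used scikit-learn. The points are made-up, already-scaled scores, the deterministic starting centroids are a simplification, and the rule that picks the elbow is a heuristic assumption; in practice the elbow is usually judged by eye from a plot of inertia against k.

```python
import math

# Made-up, already-scaled benchmark scores for nine neighbourhoods.
points = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
    (1.00, 1.10), (1.10, 1.00), (1.05, 1.05),
    (2.00, 2.10), (2.10, 2.00), (2.05, 2.05),
]

def assign(data, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in data
    ]

def kmeans(data, k, iterations=50):
    """Lloyd's algorithm; starts from evenly spaced points of the sorted data."""
    ordered = sorted(data)
    step = len(ordered) / k
    centroids = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iterations):
        labels = assign(data, centroids)
        updated = []
        for c in range(k):
            members = [p for p, label in zip(data, labels) if label == c]
            if members:
                # Move the centroid to the mean of its members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(data, centroids)
    # Inertia: sum of squared distances from each point to its centroid.
    inertia = sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(data, labels))
    return labels, inertia

def elbow(scores, threshold=0.05):
    """Heuristic: first k whose next drop is below threshold * inertia at the smallest k."""
    ks = sorted(scores)
    for k, next_k in zip(ks, ks[1:]):
        if scores[k] - scores[next_k] < threshold * scores[ks[0]]:
            return k
    return ks[-1]

inertias = {k: kmeans(points, k)[1] for k in range(1, 5)}
best_k = elbow(inertias)
labels, inertia = kmeans(points, best_k)
print(inertias)
print(best_k, labels)
```

With scikit-learn, the same curve comes from fitting `KMeans` for each candidate k and reading the `inertia_` attribute. Features on different scales, such as a percentage and a count per 100,000 people, should be standardised first so that no single benchmark dominates the distance.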

Results

The outcome of the clustering steps above was used to rank the neighbourhoods into four categories. Neighbourhoods that belonged to the Mid and High clusters in Step 3 were labelled Least Desirable, while those with Low, Mid and High housing prices in Step 4 were labelled Most Desirable, Desirable and Semi-Desirable respectively. The final neighbourhood desirability index was made into the choropleth map below using the Plotly library.

https://medium.com/media/7e0631382ca5a6caf804470b808a1a88/href
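The ranking rule described above can be written as a small lookup. This is a sketch assuming the clusters carry the string labels "Low", "Mid" and "High"; the neighbourhood names and their labels are placeholders, not results from the project.

```python
def desirability(benchmark_cluster, housing_cluster=None):
    """Map cluster labels to the four desirability categories."""
    # Mid or High on the primary benchmarks (Step 3) ends the ranking early.
    if benchmark_cluster in ("Mid", "High"):
        return "Least Desirable"
    # Otherwise the housing-price cluster (Step 4) decides the category.
    by_housing = {
        "Low": "Most Desirable",
        "Mid": "Desirable",
        "High": "Semi-Desirable",
    }
    return by_housing[housing_cluster]

# Placeholder neighbourhoods: (benchmark cluster, housing cluster or None).
clusters = {
    "Neighbourhood A": ("Low", "Low"),
    "Neighbourhood B": ("Low", "High"),
    "Neighbourhood C": ("High", None),
}
ranking = {name: desirability(*pair) for name, pair in clusters.items()}
for name, category in ranking.items():
    print(name, category)
```

Only neighbourhoods in the Low benchmark cluster need a housing-price label, which mirrors the order of Steps 3 and 4.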

Conclusion

From the results, we can make the following deductions:

Only 10% of Toronto neighbourhoods have high venue density with Mount Pleasant West, Church-Yonge Corridor, Yonge-St. Clair and Bay Street Corridor taking the lead
Most Desirable Neighbourhoods: Consider neighbourhoods in Scarborough area if searching for less pricey apartments. Other neighbourhoods to consider are Banbury Don-Mills and Annex in North York and York districts respectively
Looking for Entertainment: Look no further than Downtown Toronto which is also known as the Entertainment District. This area was classified Semi-desirable owing to the higher housing prices. However, if you’re looking for fun and have the $$$, it is a great place to settle in
Presence of Essential Venues: If you are keen on proximity to essential venues, the neighbourhoods to consider, which are also in the Desirable category, are Mount Pleasant West, Yonge-St. Clair and Greenwood-Coxwell
Avoid if you Can: Most neighbourhoods in the North-Western region of Toronto i.e. Etobicoke district were classified as the Least desirable due to the high crime and COVID-19 rates in those neighbourhoods. It is also interesting that this region is home to Jane and Finch which is a “red” neighbourhood.

References

All references used for this project have been hyperlinked within the write-up. For the complete Python code written on Jupyter Notebook, GitHub repo with the dataset and my social media pages, please use the links below:

Analysis of Toronto Neighbourhoods using Machine Learning was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.