Jupyter Notebooks on Various Topics

A collection of Jupyter Notebooks that explore and visualize datasets from a range of topics including economy, football, geography, knowledge bases, online communities and more.

You can download the notebooks published here from this GitHub repository. I use the Anaconda Python distribution, which includes over 150 packages, among them NumPy, Matplotlib and pandas, which are used frequently in these notebooks. The versions of Python and the main libraries are listed in the signature output at the end of each notebook.

Setting a column based on another one and multiple conditions in pandas

This notebook shows a way to set the value of one column in a CSV file for rows that satisfy multiple conditions, by extracting information from another column using the pandas library and the standard library re module.
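As a rough illustration of the idea (not the notebook's actual code), the sketch below fills a column only where two conditions hold, using a value pulled out of another column with a regular expression. The DataFrame, the column names `description` and `year`, and the helper `extract_year` are all hypothetical and chosen just for this example.

```python
import re

import pandas as pd

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({
    "description": ["Model X-200 (2015)", "Model Y-50", "unknown", "Model Z-7 (2019)"],
    "year": [float("nan"), float("nan"), float("nan"), 2018.0],
})

# A four-digit number in parentheses, e.g. "(2015)".
YEAR_PATTERN = re.compile(r"\((\d{4})\)")


def extract_year(text):
    """Return the year found in text as a float, or NaN if there is none."""
    match = YEAR_PATTERN.search(text)
    return float(match.group(1)) if match else float("nan")


# Extract a candidate value from the source column for every row.
extracted = df["description"].apply(extract_year)

# Condition 1: the target column is still empty.
# Condition 2: the source column actually contained a year.
mask = df["year"].isna() & extracted.notna()

# Only rows that satisfy both conditions are updated.
df.loc[mask, "year"] = extracted[mask]

print(df)
```

Combining boolean Series with `&` and assigning through `df.loc[mask, ...]` keeps existing values untouched, so the last row keeps its original year even though its description mentions a different one.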

Runtime comparison of pandas crosstab, groupby and pivot_table

In this notebook we'll compare the runtime of three different ways to group and summarize data using the pandas crosstab, groupby and pivot_table functions.
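For readers who want a feel for the comparison, here is a minimal sketch, assuming a small made-up DataFrame with hypothetical columns `team` and `result`. It builds the same count table three ways and times each with the standard library `timeit` module. It is not the benchmark from the notebook, and timings on a toy dataset like this say little about real workloads.

```python
import timeit

import pandas as pd

# Hypothetical toy data: column names and values are illustrative only.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b", "c"],
    "result": ["win", "loss", "win", "win", "loss", "win"],
})


def with_crosstab():
    # Count occurrences of each (team, result) pair.
    return pd.crosstab(df["team"], df["result"])


def with_groupby():
    # Group sizes, then move "result" from the index into the columns.
    return df.groupby(["team", "result"]).size().unstack(fill_value=0)


def with_pivot_table():
    # aggfunc="size" counts rows per (team, result) cell.
    return df.pivot_table(index="team", columns="result", aggfunc="size", fill_value=0)


# Time each approach; the number of repetitions is an arbitrary choice.
for func in (with_crosstab, with_groupby, with_pivot_table):
    seconds = timeit.timeit(func, number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```

All three functions return a table with one row per team and one column per result, which makes it easy to check that the approaches agree before comparing how long they take.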

Word Cloud From Reddit Comments Gilded 10 Or More Times

In this notebook we create a word cloud from reddit comments that were gilded 10 or more times from the start of reddit to February 2019.

Stack Overflow Survey 2018: Respondents World Map

In this notebook I create a choropleth map that shows how many people responded to the Stack Overflow Developer Survey 2018 in relation to their countries' populations.

Stack Overflow Survey 2018: Technology Preferences

This notebook looks at the technologies that respondents of the Stack Overflow Survey 2018 worked with in the past and want to work with in the future, and at where these overlap and where they do not.

Creating a world map of metal bands in Python

This Jupyter notebook shows how to create a map of metal bands, focusing on the issues that came up during the process and how they were solved.

Creating a Choropleth Map of the World in Python using GeoPandas

This Jupyter notebook walks through the process of creating a choropleth map of the world in Python using the GeoPandas package.

What Software Engineers Earn Compared to the General Population

In this notebook we'll compare the median annual income of software engineers to the average annual income (GDP per capita) in 50 countries.

Word Cloud of the Most Frequent Words in the Canon of Sherlock Holmes

In this notebook I show how to create a shaped word cloud based on word frequencies in the canon of Sherlock Holmes by Sir Arthur Conan Doyle.

The Best Times to Post to reddit Revisited

In this notebook we revisit previous analyses of the best times to post to reddit by looking at several individual subreddits.

How US Presidents Died According to Wikidata

This notebook shows how you can query the new Wikidata query service from Python to learn what caused the deaths of past US presidents.

Adding Branding Images to Plots in Matplotlib

This Jupyter notebook shows two methods of adding images to plots in Matplotlib for branding purposes.

Creating a Choropleth Map of the World in Python using Basemap

This Jupyter notebook walks through the process of creating a choropleth map of the world in Python using the Matplotlib Basemap Toolkit.

Ranking Subreddits by Comments, Authors and Comment/Author Ratios

In this notebook we create charts ranking subreddits by number of comments, number of authors and comments per author, based on a dataset aggregated from ~54 million comments posted in May 2015.

What are the Most Edited Pages in the English Wikipedia?

In this notebook we chart the top 30 most frequently edited article and talk pages in the English Wikipedia. Some of these pages could be expected but several of them are quite baffling.

Did the 3-Point Rule Affect Results in the German Fußball-Bundesliga?

In this IPython Notebook I address the question of whether the introduction of the 3-point rule in the German Fußball-Bundesliga had a visible impact on match results.

Exploring the Top Incomes Database with Pandas and Matplotlib

In this IPython Notebook we explore data from the World Top Incomes Database using Pandas and Matplotlib. The database contains information for more than 20 countries and 100 years.

Drawing a Map from Pub Locations with the Matplotlib Basemap Toolkit

A tutorial on how to create a map of Britain and Ireland drawn from pub locations extracted from OpenStreetMap using Pandas and the Matplotlib Basemap Toolkit.

Exploring Movie Body Counts

A look at movie body counts based on information from the website Movie Body Counts, a forum where users collect on-screen body counts for a selection of films and for the characters and actors who appear in them.

Creating Volcano Maps with Pandas and the Matplotlib Basemap Toolkit

An IPython notebook that shows how to create maps of volcanoes with Python using Pandas and the Matplotlib Basemap Toolkit.