Python Pandas Visualization


Turning raw data into insight is a skill that is valued and sought after in the workplace. As the volume of available information grows, data visualization matters more than ever. A good chart does more than look attractive: it turns a complex series of numbers into a story that everyone can understand.

It is hard to appreciate a symphony by studying each note in isolation. Raw data is much the same: complex, and difficult to take in one value at a time. Data visualization plays the role of the conductor, bringing out the patterns, outliers, and trends hidden in the data.

Pandas is best known for manipulating tabular data in Python, but it is also a capable ally for data visualization. It builds on Matplotlib, the well-known plotting library, and lets you generate many types of charts from your data with very little code.

In this guide, we explore data visualization with Pandas. Whether you are new to data visualization and analytics or already a proficient data scientist, it will help you draw useful information out of your data by analyzing and visualizing it. We cover everything from the basic plotting tools to more advanced visualizations and interactive exploration. Pandas provides what you need to present data clearly and with impact. Think of it as the place where data meets art: Pandas is your brush, and your creativity does the rest.

Understanding the Basics

A Quick Start to Pandas DataFrame and Series

The two most important data structures in Pandas are the DataFrame and the Series. You can think of them as the building blocks that help you organize and work with your data.

DataFrame

A DataFrame is a two-dimensional, tabular data structure that resembles an Excel sheet or SQL table. It has rows and columns, and each column can hold a different type of data. You can think of it as a collection of Series that share the same index.

To create a DataFrame, you can load a dataset, convert a dictionary, or reshape another data structure.

import pandas as pd

# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}

df = pd.DataFrame(data)
print(df)


Series

A Series is a labeled one-dimensional array that can hold any type of data. It can be thought of as a single DataFrame column.

Creating a Series from a list or an array is simple.

# Creating a Series from a list
ages = pd.Series([25, 30, 35], name='Age')
print(ages)
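To make the relationship between the two structures concrete, here is a minimal sketch. The 'Height' values and the index labels 'a', 'b', 'c' are made up for illustration and do not come from the dataset above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

# Selecting one column of a DataFrame returns a Series
# that keeps the DataFrame's row index
age_column = df['Age']
print(type(age_column).__name__)  # Series

# A Series can also be built from a NumPy array,
# here with hypothetical custom index labels
heights = pd.Series(np.array([1.65, 1.80, 1.75]),
                    index=['a', 'b', 'c'], name='Height')
print(heights['b'])
```

Looking values up by label, as in heights['b'], is what distinguishes a Series from a plain array.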

Basic Plotting with Pandas

Now that we know how to work with Pandas’s data structures, let’s look at how to visualize data.

Basic Plotting Functions

Pandas’s ‘.plot()’ method makes simple plots much easier. It can be called directly on a DataFrame or Series, which makes it a handy tool for quick data display.

# Plotting a Series
ages.plot(kind='bar', title='Age Distribution')

For DataFrames, you can either call “.plot()” on the DataFrame itself or tell it which columns to show.

# Plotting a DataFrame
df.plot(x='Name', y='Age', kind='bar', title='Age Distribution by Name')
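The difference between plotting the whole DataFrame and naming specific columns can be sketched as follows. The 'scores' DataFrame and its 'Math' and 'Physics' columns are hypothetical, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd

# Hypothetical data with two numeric columns
scores = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                       'Math': [82, 91, 77],
                       'Physics': [75, 88, 93]})

# No y given: every numeric column is drawn against x
ax_all = scores.plot(x='Name', kind='bar', title='All numeric columns')

# y given: only the listed columns are drawn
ax_one = scores.plot(x='Name', y=['Math'], kind='bar', title='Math only')
```

Each call without an 'ax' argument creates its own figure, so the two charts do not overwrite each other.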

Showcasing Plot Types

# Line Plot
df.plot(x='Name', y='Age', kind='line', title='Age Trend')

# Bar Chart
df.plot(x='Name', y='Age', kind='bar', title='Age Comparison')

# Scatter Plot
df.plot(x='Age', y='City', kind='scatter', title='Age vs. City')
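Scatter plots are usually drawn from two numeric columns, and a text column such as 'City' may not be accepted by every pandas version. As a hedged alternative, the sketch below plots 'Age' against a 'Salary' column, which is an assumption added for illustration. It also uses the 'df.plot.scatter' accessor, which is equivalent to passing kind='scatter'.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd

# 'Salary' is a hypothetical numeric column added for this sketch
people = pd.DataFrame({'Age': [25, 30, 35],
                       'Salary': [50000, 60000, 75000]})

# Accessor form of people.plot(kind='scatter', ...)
ax = people.plot.scatter(x='Age', y='Salary', title='Age vs. Salary')
```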

These fundamental ideas and plotting techniques will help you get started with more advanced data visualization in Pandas. As we move on, we’ll look at how to improve your data storytelling through customization, advanced plot types, and interactive visualizations.

An example of a Bar Plot

import pandas as pd
import matplotlib.pyplot as plt

# Example dataset
data = {'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
        'Population': [8398748, 3990456, 2705994, 2325502, 1680992],
        'Area (sq. mi)': [302.6, 468.7, 227.3, 637.5, 517.6]}

df = pd.DataFrame(data)

# Plotting a bar plot
df.plot(kind='bar', x='City', y='Population', color='skyblue')
plt.title('Population of Major US Cities')
plt.xlabel('City')
plt.ylabel('Population')
plt.xticks(rotation=45)
plt.show()


Here is another example, this time a line plot

import pandas as pd
import matplotlib.pyplot as plt

# Sample data
data = {'Year': [2010, 2011, 2012, 2013, 2014],
        'Sales': [100, 120, 140, 160, 180]}

# Create a DataFrame
df = pd.DataFrame(data)

# Plotting the data
df.plot(x='Year', y='Sales', kind='line', marker='o', color='skyblue', linestyle='-')

# Adding title and labels
plt.title('Yearly Sales')
plt.xlabel('Year')
plt.ylabel('Sales')

# Displaying the plot
plt.grid(True)
plt.show()


Diving Deeper

Customizing Plots

Customizing your plots can take them from merely informative to genuinely impactful. Let’s look at a few specific ways to customize plots with Pandas.

Colors, Labels, and Titles

Pandas plotting makes it easy to change colors, labels, and titles. Here is an example:

# Customizing colors, labels, and titles
df.plot(x='Name', y='Age', kind='bar', color=['skyblue', 'salmon', 'lightgreen'], 
        title='Customized Age Comparison', legend=False)

Here we chose our own colors, removed the legend, and gave the plot a title.

With Axes and Grids

Changing the axes and adding gridlines can make your visualizations easier to read.

# Customizing axes and adding gridlines
ax = df.plot(x='Name', y='Age', kind='bar', title='Age Comparison with Customized Axes')
ax.set_ylabel('Age (years)')
ax.set_xlabel('Names')
ax.grid(axis='y', linestyle='--', alpha=0.7)
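Because '.plot()' returns a Matplotlib Axes object, other Axes methods can be applied in the same way. The sketch below is one possible variation, assuming the small Name/Age dataset from earlier; the axis limit of 40 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

ax = df.plot(x='Name', y='Age', kind='bar', legend=False,
             title='Age Comparison with Fixed Axis Range')
ax.set_ylim(0, 40)                            # fix the y-axis range
ax.tick_params(axis='x', labelrotation=0)     # keep the names horizontal
ax.set_axisbelow(True)                        # draw gridlines behind the bars
ax.grid(axis='y', linestyle='--', alpha=0.7)
```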

Multiple Plots and Subplots

Making multiple plots or subplots may be helpful when working with complex datasets or evaluating variables.

Multiple Plots

Putting several plots into one figure makes it easier to compare them.

# Creating multiple plots in a single figure
# (assumes df also has a numeric 'Salary' column)
ax1 = df.plot(x='Name', y='Age', kind='bar', title='Age Comparison')
ax2 = df.plot(x='Name', y='Salary', kind='bar', title='Salary Comparison', ax=ax1, secondary_y=True)
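A self-contained variation on the same idea is sketched below. It draws both columns as lines in one call and sends 'Salary' to a secondary y-axis, so the two scales do not squash each other. The salary figures are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd

# Hypothetical data; 'Salary' is not part of the earlier dataset
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'Salary': [50000, 60000, 75000]})

# One call, two y-axes: Age on the left, Salary on the right
ax = df.plot(x='Name', y=['Age', 'Salary'], secondary_y=['Salary'],
             marker='o', title='Age and Salary by Name')
ax.set_ylabel('Age (years)')
ax.right_ax.set_ylabel('Salary')  # pandas attaches the secondary axis as right_ax
```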

Subplots

When working with different aspects of your data, subplots help you organize your visualizations.

# Creating subplots
# (assumes df also has 'Salary', 'Experience', and 'Productivity' columns)
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 8))

df.plot(x='Name', y='Age', kind='bar', ax=axes[0, 0], title='Age Comparison')
df.plot(x='Name', y='Salary', kind='bar', ax=axes[0, 1], title='Salary Comparison')
df.plot(x='Name', y='Experience', kind='bar', ax=axes[1, 0], title='Experience Comparison')
df.plot(x='Name', y='Productivity', kind='bar', ax=axes[1, 1], title='Productivity Comparison')

plt.tight_layout()
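Pandas can also build the grid itself through the 'subplots' and 'layout' arguments of '.plot()'. The sketch below assumes a DataFrame with the four numeric columns used above; all of its values are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values for the columns referenced in the text
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'Salary': [50000, 60000, 75000],
                   'Experience': [2, 5, 9],
                   'Productivity': [0.8, 0.9, 0.7]})

# One subplot per numeric column, arranged in a 2x2 grid
axes = df.plot(x='Name', kind='bar', subplots=True, layout=(2, 2),
               figsize=(10, 8), legend=False)
plt.tight_layout()
```

This is shorter than creating the grid with plt.subplots, at the cost of less control over which column lands in which cell.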

With these ways to customize and organize plots, you have much more control over your visualizations. As you explore Pandas’ plotting options, you’ll be able to explain complicated ideas precisely and cleanly.

Advanced Visualizations

Histograms and Density Plots

Understanding Data Distribution

It is hard to overstate how important it is to understand how data is distributed. The distribution reveals a dataset’s main tendencies, variation, and patterns. By plotting the distribution, you can tell whether the data is symmetric, skewed, or has more than one peak. This information can be crucial for making sound decisions and choosing appropriate statistical techniques. Histograms and density plots are excellent ways to see how data is distributed.

Creating Histograms and Density Plots with Pandas

Pandas makes it easy to see how data is distributed using histograms and density plots.

Histogram

A histogram shows how the values of a continuous variable are distributed. It divides the range of values into intervals called “bins” and shows how many observations fall into each bin.

# Creating a histogram
df['Age'].plot(kind='hist', bins=10, edgecolor='black', title='Age Distribution Histogram')
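To see what the bars of a histogram actually represent, the sketch below computes the bin edges and counts with NumPy and then draws the same bins with Pandas. The ten age values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import numpy as np
import pandas as pd

# Hypothetical sample of ages
ages = pd.Series([22, 25, 27, 31, 34, 35, 38, 41, 47, 58], name='Age')

# Four equal-width bins between the minimum (22) and maximum (58)
counts, edges = np.histogram(ages, bins=4)
print(edges)   # bin boundaries
print(counts)  # number of observations per bin

# The same bins drawn as a histogram
ax = ages.plot(kind='hist', bins=4, edgecolor='black',
               title='Age Distribution Histogram')
```

Changing 'bins' changes the width of each interval, which can make the same data look smoother or more jagged.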

Density Plot

A density plot is like a histogram but with a smooth curve. It shows the distribution as a continuous function and can be very helpful when working with small datasets.

# Creating a density plot (pandas relies on SciPy for the density estimate)
df['Age'].plot(kind='density', title='Age Distribution Density Plot')

By examining how the “Age” variable is distributed in this case, we can find the most common age groups and tell whether the distribution is skewed or symmetric.
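A visual impression of skew can be backed up with a number. The sketch below uses the 'Series.skew()' method on two small, made-up samples: a value near zero suggests a symmetric distribution, and a positive value suggests a longer tail on the right.

```python
import pandas as pd

# Hypothetical samples chosen to show the two cases
symmetric = pd.Series([1, 2, 3, 4, 5])
right_skewed = pd.Series([1, 1, 2, 2, 3, 10])

symmetric_skew = symmetric.skew()
right_skew = right_skewed.skew()

print(symmetric_skew)   # close to zero for symmetric data
print(right_skew > 0)   # True
```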

Here is an example of a Histogram Plot

import pandas as pd
import matplotlib.pyplot as plt

# Example dataset
data = {'Age': [25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]}

df = pd.DataFrame(data)

# Plotting a histogram
df['Age'].plot(kind='hist', bins=10, color='salmon', edgecolor='black')
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()


Box Plots and Violin Plots

Box plots and violin plots help you see how data is spread out and compare variables across different groups.

Box Plot

A box plot shows the key summary statistics of a set of data, such as the median, quartiles, and possible outliers. It’s especially useful for comparing the spread and central tendency of several variables.

# Creating a box plot
df[['Age', 'Salary', 'Experience']].plot(kind='box', title='Box Plot of Age, Salary, and Experience')
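The numbers behind a box plot can be computed directly. The sketch below works out the quartiles and flags outliers with the 1.5 × IQR rule, which matches Matplotlib's default whisker setting. The salary values (in thousands) are invented for illustration.

```python
import pandas as pd

# Hypothetical salaries in thousands, with one unusually large value
salaries = pd.Series([42, 45, 47, 50, 52, 55, 58, 120], name='Salary (k)')

q1 = salaries.quantile(0.25)   # lower edge of the box
median = salaries.median()     # line inside the box
q3 = salaries.quantile(0.75)   # upper edge of the box
iqr = q3 - q1                  # height of the box

# Points beyond 1.5 * IQR from the box are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = salaries[(salaries < lower_fence) | (salaries > upper_fence)]

print(q1, median, q3)
print(outliers.tolist())  # [120]
```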

Violin Plot

A violin plot combines elements of a box plot and a kernel density plot to give a more complete picture of how the data is distributed.

# Creating a violin plot
# (pandas' .plot() has no 'violin' kind, so Matplotlib's violinplot is called directly)
plt.violinplot(df[['Age', 'Salary', 'Experience']].to_numpy(), showmedians=True)
plt.title('Violin Plot of Age, Salary, and Experience')

Here is an Example of a Violin Plot

import pandas as pd
import matplotlib.pyplot as plt

# Example dataset
data = {'Category': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'Value': [10, 15, 20, 25, 30, 35, 40, 45, 50]}

df = pd.DataFrame(data)

# Plotting a violin plot
plt.figure(figsize=(8, 6))
plt.title('Distribution of Values by Category')
plt.xlabel('Category')
plt.ylabel('Value')
plt.grid(True)
plt.violinplot(dataset=[df[df['Category'] == 'A']['Value'],
                        df[df['Category'] == 'B']['Value'],
                        df[df['Category'] == 'C']['Value']],
               showmedians=True)
plt.xticks([1, 2, 3], ['A', 'B', 'C'])
plt.show()


These plots are handy when working with several variables because they make it easy to compare how the variables are distributed.

Time Series Visualizations

How to Handle Time Series Data with Pandas: Time series data consists of observations recorded over time, such as sales figures, stock prices, or temperature readings. Pandas has tools designed specifically for working with time series data.

Time Series Plot

A time series plot helps you visualize how a variable changes over time.

# Creating a time series plot
time_series_df = pd.read_csv('time_series_data.csv', parse_dates=['Date'], index_col='Date')
time_series_df['Sales'].plot(title='Time Series Plot of Sales')
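If no CSV file is at hand, a time series can be built in memory to try the same ideas. The sketch below generates 90 days of made-up sales with 'pd.date_range', then resamples them to monthly averages before plotting. The values and dates are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd

# 90 consecutive days of hypothetical, steadily rising sales
dates = pd.date_range('2023-01-01', periods=90, freq='D')
ts = pd.DataFrame({'Sales': range(100, 190)}, index=dates)

# With a DatetimeIndex, rows can be selected by a partial date string
february = ts.loc['2023-02']

# Resample daily values to the mean of each month ('MS' = month start)
monthly = ts['Sales'].resample('MS').mean()

ax = monthly.plot(marker='o', title='Average Monthly Sales')
```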

Rolling Statistics

You can discover trends and patterns in time series data by computing and plotting rolling statistics, such as the rolling mean and rolling standard deviation.

# Creating rolling statistics plot
rolling_mean = time_series_df['Sales'].rolling(window=30).mean()
rolling_std = time_series_df['Sales'].rolling(window=30).std()

plt.plot(time_series_df['Sales'], label='Original')
plt.plot(rolling_mean, label='Rolling Mean', color='orange')
plt.plot(rolling_std, label='Rolling Std', color='green')

plt.title('Rolling Statistics for Sales')
plt.legend()

To make forecasts and sound decisions about particular time periods, you need to understand time series data and the patterns built into it. With these visualizations, you can look more closely at how your data changes and trends over time.

Here’s an Example of an Area Plot

import pandas as pd
import matplotlib.pyplot as plt

# Sample data
data = {'Year': [2010, 2011, 2012, 2013, 2014],
        'Sales': [100, 120, 140, 160, 180],
        'Expenses': [80, 90, 100, 110, 120]}

# Create a DataFrame
df = pd.DataFrame(data)

# Plotting the data as an area plot
df.plot(x='Year', y=['Sales', 'Expenses'], kind='area', stacked=False)

# Adding title and labels
plt.title('Yearly Sales and Expenses')
plt.xlabel('Year')
plt.ylabel('Amount')

# Displaying the plot
plt.grid(True)
plt.show()


Interactive Visualization with Pandas

Interactive visualizations make exploring and analyzing data more engaging. Interactive plotting libraries like Plotly or Bokeh can be used alongside Pandas with little effort to improve the user experience.

Introduction to Plotly and Bokeh

Plotly or Bokeh must be installed before you can run the examples. You can install them using:

pip install plotly
#or
pip install bokeh

Now, let’s use Plotly to show what an interactive plot looks like.

Creating Interactive Plots (Scatter Plot)

import pandas as pd
import plotly.express as px

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 75000]}
df = pd.DataFrame(data)

# Create an interactive scatter plot with Plotly
fig = px.scatter(df, x='Age', y='Salary', text='Name', title='Interactive Scatter Plot')

# Customize the layout
fig.update_layout(
    xaxis_title='Age',
    yaxis_title='Salary',
    hovermode='closest',  # Show data points on hover
    showlegend=False      # Hide legend for simplicity
)

# Show the interactive plot
fig.show()


In this example, we create an interactive scatter plot with Plotly Express. You can hover over data points to see more information, and the plot can be customized in many ways. Plotly offers a wide range of options and features for building interactive visualizations that suit your needs.

If you prefer Bokeh, you can get a similar level of interactivity with a different syntax.

Here’s a simple example of how to use Bokeh:

import pandas as pd
from bokeh.plotting import figure, show
from bokeh.models import HoverTool, ColumnDataSource

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 75000]}
df = pd.DataFrame(data)

# Create a ColumnDataSource
source = ColumnDataSource(df)

# Create an interactive scatter plot with Bokeh
# (the hover tool is added separately below, so it is not listed in tools)
p = figure(title='Interactive Scatter Plot', x_axis_label='Age', y_axis_label='Salary', tools='pan,box_zoom,reset')
p.scatter('Age', 'Salary', size=10, source=source)

# Add a hover tool with custom tooltips
hover = HoverTool(tooltips=[('Name', '@Name'), ('Age', '@Age'), ('Salary', '@Salary')])
p.add_tools(hover)

# Show the interactive plot
show(p)


In this Bokeh example, a scatter plot with hover tooltips is created. Bokeh offers many kinds of interaction, and you can customize the plot to fit your needs.

You can use more of Plotly’s and Bokeh’s features to build interactive visualizations that meet your data analysis needs.

Summary

In this guide, we covered a lot of ground in data visualization with the Pandas library, with material for readers at every skill level. From the basic Pandas structures, DataFrame and Series, to simple charts like line charts and scatter plots, we went through the essentials of effective data communication.

Diving deeper, we looked at customization and organization, which help you present your ideas clearly through precise, well-organized plots. Histograms, density plots, box plots, and violin plots offered another way to look at data, giving a better understanding of data distributions and comparisons between variables.

Time series visualizations showed how to read trends and fluctuations over time, with rolling statistics as a way to bring those trends out.

Finally, we saw that Pandas can be combined with Plotly and Bokeh to add interactivity for deeper and more engaging analysis.

Whatever analysis you carry out, Pandas is a dependable companion for turning raw data into clear stories. These techniques are useful whatever your experience or level in data science: they will strengthen your data analysis skills and your ability to uncover the stories hidden in your datasets.

Use the power of Pandas’ visualization tools and let your data speak for itself clearly and concisely. Happy visualizing!
