How to Sort a Pandas DataFrame

Turning raw, structured data into usable information is one of the core skills in data analysis, and it matters most when the volume of data is large. Picture a steady stream of thousands of records arriving all the time.

The dataset could be anything, from financial records to survey responses, and it rarely arrives in a useful order. Sorting is one of the most basic tools a data analyst or data scientist has for dealing with this. Arranging rows in a defined order makes trends and patterns in large, complex data much easier to spot, and that in turn supports better decisions.

This is where Pandas comes in. Pandas is a popular Python library known for making it easy to work with large amounts of tabular data. This article covers sorting in Pandas, which is an important part of organizing and making sense of data sets both large and small.

Basic Sorting in Pandas:

The ‘sort_values()’ method in Pandas makes sorting straightforward. It is a flexible tool for arranging a DataFrame by the values in one or more columns. Let’s start with the basics and sort on a single column.

# Importing the Pandas library
import pandas as pd

# Creating a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 32, 28, 22],
    'Salary': [50000, 60000, 75000, 45000]
}

df = pd.DataFrame(data)

# Sorting the DataFrame by the 'Name' column in ascending order
df_sorted_name_asc = df.sort_values(by='Name')

print("DataFrame sorted by Name in ascending order:")
print(df_sorted_name_asc)

This code calls ‘sort_values()’ on the DataFrame ‘df’ to put the rows in ascending order by the ‘Name’ column. The method returns a new DataFrame, stored here as ‘df_sorted_name_asc’, and leaves ‘df’ unchanged. Sorting can also use several columns at once. Here is an example that sorts by both ‘Age’ and ‘Salary’, which shows that you can sort hierarchically:

# Sorting the DataFrame by 'Age' in ascending order and then by 'Salary' in descending order
df_sorted_multi_cols = df.sort_values(by=['Age', 'Salary'], ascending=[True, False])

print("\nDataFrame sorted by Age in ascending order, then by Salary in descending order:")
print(df_sorted_multi_cols)

‘sort_values()’ first sorts the DataFrame by ‘Age’ in ascending order, then by ‘Salary’ in descending order within each group of equal ages. In this sample every age is unique, so the ‘Salary’ ordering only comes into play when two rows share the same age. You control the direction with the ‘ascending’ parameter: ‘True’, the default, sorts from smallest to largest, and ‘False’ sorts from largest to smallest. Passing a list gives each column its own direction.
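To see the second column act as a tie-breaker, the data needs repeated values in the first column. The following is a minimal sketch using a hypothetical ‘staff’ DataFrame (not part of the examples above) in which two pairs of people share the same age:

```python
import pandas as pd

# Hypothetical sample data: two pairs of people share the same age
staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 32, 25, 32],
    'Salary': [50000, 60000, 75000, 45000],
})

# Age ascending first; Salary descending breaks ties within each age
staff_sorted = staff.sort_values(by=['Age', 'Salary'], ascending=[True, False])

print(staff_sorted)
```

Within each age, the higher salary comes first, so Charlie is listed before Alice and Bob before David.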

With these basics, you can use ‘sort_values()’ to organize a Pandas DataFrame, whether you are putting names in alphabetical order or ordering records by several columns.

Sorting by Index:

Sorting a DataFrame by its index is useful when the rows need to follow the order of their labels. The ‘sort_index()’ method in Pandas does this by arranging rows according to the DataFrame’s index. Let’s look at how to use it in both ascending and descending order.

# Importing the Pandas library
import pandas as pd

# Creating a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 32, 28, 22],
    'Salary': [50000, 60000, 75000, 45000]
}

df = pd.DataFrame(data)

# Sorting the DataFrame by index in ascending order (the default)
df_sorted_index_asc = df.sort_index()

print("DataFrame sorted by index in ascending order:")
print(df_sorted_index_asc)

Calling ‘sort_index()’ on the DataFrame ‘df’ with its default settings arranges the rows in ascending order of their index labels, and the result is stored as ‘df_sorted_index_asc’. Because ‘df’ was created with the default index 0 to 3, it is already in index order, so the result looks the same as the original.

To reverse the order, set the ‘ascending’ parameter to ‘False’. Here is the descending case:

# Sorting the DataFrame by index in descending order
df_sorted_index_desc = df.sort_index(ascending=False)

print("\nDataFrame sorted by index in descending order:")
print(df_sorted_index_desc)

Here ‘sort_index()’ is called with ‘ascending’ set to ‘False’. The rows are ordered from the highest index label to the lowest, and the result is a new DataFrame called ‘df_sorted_index_desc’. Index sorting matters most when the index carries meaning, such as names or dates. In that case ‘sort_index()’ lets you put the records in alphabetical, chronological, or any other label order your analysis needs.
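Index sorting is easier to see when the index is not the default 0, 1, 2 sequence. The following is a small sketch that assumes a hypothetical ‘scores’ DataFrame whose index holds names in no particular order:

```python
import pandas as pd

# Hypothetical data indexed by name, deliberately out of alphabetical order
scores = pd.DataFrame(
    {'Score': [88, 92, 79]},
    index=['Charlie', 'Alice', 'Bob'],
)

# Ascending (default) and descending order of the index labels
by_label_asc = scores.sort_index()
by_label_desc = scores.sort_index(ascending=False)

print(by_label_asc)
print(by_label_desc)
```

Each row keeps its own values; only the order of the rows changes to follow the labels.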

Custom Sorting:

Pandas also lets you sort by custom rules rather than by the raw column values. ‘sort_values()’ has a ‘key’ parameter that takes a function. Pandas applies this function to each column being sorted and orders the rows by the result. The function receives a whole Series and must return a Series of the same length. Let’s look at a few examples of sorting by our own criteria.

# Importing the Pandas library
import pandas as pd

# Creating a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 32, 28, 22],
    'Salary': [50000, 60000, 75000, 45000]
}

df = pd.DataFrame(data)

# Custom sorting based on the length of names in descending order
df_custom_sorted_length = df.sort_values(by='Name', key=lambda x: x.str.len(), ascending=False)

print("DataFrame sorted by the length of names in descending order:")
print(df_custom_sorted_length)

As shown above, ‘sort_values()’ sorts the DataFrame ‘df’ by the ‘Name’ column. The lambda ‘x.str.len()’ passed as ‘key’ computes the length of each name, and the rows are ordered by those lengths instead of by the names themselves. The resulting DataFrame, ‘df_custom_sorted_length’, lists the rows by name length in descending order.

Now consider custom sorting based on the sum of ‘Age’ and ‘Salary’. The ‘key’ function is applied to each column in ‘by’ separately, so it cannot combine two columns. Instead, compute the sum as a temporary column, sort by it, and drop it afterwards:

# Custom sorting based on the sum of 'Age' and 'Salary' in ascending order
# ('Total' is a temporary helper column, dropped after sorting)
df_custom_sorted_sum = df.assign(Total=df['Age'] + df['Salary']).sort_values(by='Total').drop(columns='Total')

print("\nDataFrame sorted by the sum of Age and Salary in ascending order:")
print(df_custom_sorted_sum)

Here ‘assign()’ adds a column holding each row’s ‘Age’ plus ‘Salary’, ‘sort_values()’ orders the rows by that column, and ‘drop()’ removes it again. The resulting DataFrame, ‘df_custom_sorted_sum’, shows the rows sorted by the custom criterion.

Custom sorting opens up many options by letting you order your data in ways that suit a particular situation or analysis. The ‘key’ parameter of ‘sort_values()’ is a useful tool for this.
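Another common use of ‘key’ is case-insensitive sorting. The sketch below assumes a hypothetical ‘cities’ DataFrame with mixed-case names and a Pandas version that supports ‘key’ (1.1.0 or later). Plain string sorting places uppercase letters before lowercase ones, and lowercasing inside the key function removes that effect:

```python
import pandas as pd

# Hypothetical data with inconsistent capitalization
cities = pd.DataFrame({'City': ['delhi', 'Berlin', 'amsterdam', 'Cairo']})

# Default string order: uppercase letters sort before lowercase letters
default_order = cities.sort_values(by='City')

# The key receives the whole column as a Series and returns a Series
case_insensitive = cities.sort_values(by='City', key=lambda s: s.str.lower())

print(default_order)
print(case_insensitive)
```

The key only affects the ordering. The values stored in the ‘City’ column keep their original capitalization.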

Sorting Strategies:

The ‘sort_values()’ method lets you choose not only the sort criteria but also the algorithm used. Different algorithms suit different kinds and sizes of data, and each has its own pros and cons. The ‘kind’ parameter of ‘sort_values()’ selects the algorithm. For a DataFrame, it only takes effect when sorting by a single column or label. Let’s look at the most common options:

1. Quicksort:

Quicksort is a widely used sorting algorithm because it performs well in most cases. It is a divide-and-conquer algorithm: it partitions the data around a pivot value, sorts each part, and combines the results. It is not stable, so rows with equal values may not keep their original relative order.

df_quicksort = df.sort_values(by='Age', kind='quicksort')

2. Mergesort:

Mergesort is a stable sorting algorithm that splits the data into smaller pieces, sorts each piece separately, and then merges the pieces back together. Because it is stable, rows with equal values keep their original relative order, which makes it a good choice when that order matters.

df_mergesort = df.sort_values(by='Age', kind='mergesort')

3. Heapsort:

Heapsort builds a binary heap from the data and repeatedly removes the root element of the heap. It is used less often than quicksort or mergesort, but it can be useful in some situations.

df_heapsort = df.sort_values(by='Age', kind='heapsort')

4. Default Sorting Algorithm:

If the ‘kind’ parameter is not given, Pandas uses its default, which is quicksort. Pandas could change the default in a future release, so if your code depends on a particular behavior, such as stability, specify the algorithm explicitly.

df_default_sort = df.sort_values(by='Age')  # Uses the default sorting algorithm (quicksort as of now)

By setting the ‘kind’ parameter of ‘sort_values()’ explicitly, you control how your DataFrame is sorted. When picking an algorithm, think about the type of data you have and how large it is. This helps you get the best performance for your needs.
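One practical difference between the algorithms is stability, meaning whether rows with equal sort values keep their original relative order. The argument of ‘sort_values()’ that selects the algorithm is named ‘kind’. The following is a minimal sketch with a hypothetical ‘orders’ DataFrame that contains repeated priorities:

```python
import pandas as pd

# Hypothetical data: several rows share the same priority
orders = pd.DataFrame({
    'Customer': ['Ann', 'Ben', 'Cal', 'Dee', 'Eli'],
    'Priority': [2, 1, 2, 1, 2],
})

# Mergesort is stable: rows with equal priority stay in their original order
stable_sorted = orders.sort_values(by='Priority', kind='mergesort')

print(stable_sorted)
```

With a stable sort, Ben stays ahead of Dee, and Ann, Cal, and Eli keep their original order. Quicksort and heapsort give no such guarantee for ties, even if they happen to preserve the order on small inputs.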

Handling Missing Values during Sorting:

How missing values are handled during sorting is an important detail if you want the result to match your analysis goals. By default, ‘sort_values()’ places NaN (Not a Number) values at the end of the result, in both ascending and descending order. This means the non-null values come first unless you say otherwise.

1. Sorting with Missing Values:

By default, ‘sort_values()’ puts rows with missing values in the sort column(s) at the end of the sorted result.

# Importing the Pandas library
import pandas as pd

# Creating a sample DataFrame with missing values
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 32, 28, None, 22],
    'Salary': [50000, 60000, 75000, 45000, None]
}

df = pd.DataFrame(data)

# Sorting the DataFrame by 'Age'
df_sorted_age = df.sort_values(by='Age')

print("DataFrame sorted by 'Age' with missing values:")
print(df_sorted_age)

In this case, the row with a missing age (NaN) is placed at the end of the sorted DataFrame.

2. Controlling the Placement of NaN Values:

You can change where NaN values are placed with the ‘na_position’ parameter of ‘sort_values()’. It can be set to ‘first’ or ‘last’, and ‘last’ is the default.

First: places NaN values at the top of the sorted result.

Last: places NaN values at the end of the sorted result, which is the default behavior.

# Sorting the DataFrame by 'Salary' with NaN values placed at the beginning
df_sorted_salary_first = df.sort_values(by='Salary', na_position='first')

print("\nDataFrame sorted by 'Salary' with NaN values at the beginning:")
print(df_sorted_salary_first)

In this example, setting ‘na_position’ to ‘first’ places the row with a NaN salary at the top of the sorted DataFrame.

The right choice for ‘na_position’ depends on your analysis. Putting rows with missing values at the top can help you decide which records need a closer look. If you want to see complete, non-null records first, placing the missing values at the end is the better option.
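The position of NaN values does not depend on the sort direction, which is easy to overlook. The sketch below uses a hypothetical ‘people’ DataFrame to show a descending sort with the default placement and with ‘na_position’ set to ‘first’:

```python
import pandas as pd

# Hypothetical data with one missing age
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 32, None, 22],
})

# Descending sort: NaN still goes last by default
desc_default = people.sort_values(by='Age', ascending=False)

# Descending sort with NaN moved to the top
desc_nan_first = people.sort_values(by='Age', ascending=False, na_position='first')

print(desc_default)
print(desc_nan_first)
```

In both results the non-missing ages run from 32 down to 22. Only the row with the missing age moves.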

Summary

Sorting is a core part of managing and manipulating data with Pandas, and it helps analysts turn raw data into results they can act on. Pandas can order data in a number of ways, including alphabetical order, chronological order, and custom orderings, mostly through ‘sort_values()’. Depending on the data, you can also sort by index with ‘sort_index()’ and choose the sorting algorithm. This article covered the main sorting tools that Pandas offers.

Remember that you decide how the result is produced. Both methods return a new sorted DataFrame by default, which keeps the original available for comparison, or you can pass ‘inplace=True’ to modify the existing DataFrame instead. Missing values are taken into account during sorting as well, and the ‘na_position’ parameter controls where they end up.
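The difference between the two approaches can be sketched as follows, using a hypothetical ‘people’ DataFrame. Note that ‘inplace=True’ makes ‘sort_values()’ return ‘None’, so its result should not be assigned and used as a DataFrame:

```python
import pandas as pd

# Hypothetical sample data
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 32, 28, 22],
})

# Default behavior: a new sorted DataFrame is returned, 'people' is untouched
sorted_copy = people.sort_values(by='Age')
order_before = people['Name'].tolist()

# In-place sorting: 'people' itself is reordered and the call returns None
returned = people.sort_values(by='Age', inplace=True)
order_after = people['Name'].tolist()

print(order_before)
print(order_after)
print(returned)
```

Whether in-place sorting actually saves memory or time depends on the Pandas version and the data, so it is worth measuring rather than assuming.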

As you start to analyze data today, remember what you’ve learned in this article. May your data be well organized, your insights broad, and your analysis thorough. Have fun sorting!

PythonGeeks Team

The PythonGeeks Team delivers expert-driven tutorials on Python programming, machine learning, Data Science, and AI. We simplify Python concepts for beginners and professionals to help you master coding and advance your career.
