Replace Multiple Values in a Pandas DataFrame Using replace()

While I was working on a data analysis project, I needed to replace multiple values in a Pandas DataFrame. This is a common task when cleaning and preparing data for analysis.

In this article, I’ll share five useful methods to replace multiple values in Pandas DataFrames. These techniques will help you clean your data faster and more effectively.

Let's get started.

Method 1: Use the replace() Method

The pandas replace() method is the simplest way to replace multiple values in a DataFrame.

Here’s a simple example with US state abbreviations:

import pandas as pd

# Sample data of US customer information
data = {
    'State': ['CA', 'NY', 'TX', 'FL', 'CA', 'NY'],
    'Sales': [1200, 1500, 900, 1100, 1300, 1400]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Replace state abbreviations with full names
df['State'] = df['State'].replace({
    'CA': 'California',
    'NY': 'New York',
    'TX': 'Texas',
    'FL': 'Florida'
})

print("\nDataFrame after replacing state abbreviations:")
print(df)

Output:

Original DataFrame:
  State  Sales
0    CA   1200
1    NY   1500
2    TX    900
3    FL   1100
4    CA   1300
5    NY   1400

DataFrame after replacing state abbreviations:
        State  Sales
0  California   1200
1    New York   1500
2       Texas    900
3     Florida   1100
4  California   1300
5    New York   1400

In this example, I’m replacing state abbreviations with their full names using a dictionary that maps old values to new values.

The replace() method can also work on the entire DataFrame:

# Replace values across the entire DataFrame
df_replaced = df.replace({
    'California': 'CA',
    'New York': 'NY',
    1200: 'Low Sales',
    1500: 'High Sales'
})

print("\nDataFrame after multiple replacements:")
print(df_replaced)
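
The output of this whole-DataFrame call is not shown, so here is a minimal sketch of what to expect. It uses a small stand-in DataFrame (the three rows and the names df_demo and df_demo_replaced are made up for illustration). The point it illustrates: once a text label is written into the numeric Sales column, that column holds a mix of text and numbers, so it is worth checking the dtype before doing arithmetic on it.

```python
import pandas as pd

# Hypothetical stand-in data, not the article's full table
df_demo = pd.DataFrame({
    'State': ['California', 'New York', 'Texas'],
    'Sales': [1200, 1500, 900]
})

# One dictionary applied to every column: text keys match State,
# numeric keys match Sales
df_demo_replaced = df_demo.replace({
    'California': 'CA',
    'New York': 'NY',
    1200: 'Low Sales',
    1500: 'High Sales'
})

print(df_demo_replaced)

# Sales now mixes labels and numbers, so inspect its dtype
print(df_demo_replaced['Sales'].dtype)
```

Values with no entry in the dictionary, such as Texas and 900 here, are left unchanged.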

Method 2: Use loc[] for Conditional Replacement

The pandas loc[] indexer lets you assign values only to the rows that satisfy a condition, which gives you more flexibility than a fixed value-to-value mapping.

Here’s an example with sales data categorization:

import pandas as pd

# Sample US sales data
data = {
    'Product': ['Laptop', 'Smartphone', 'Tablet', 'Monitor', 'Keyboard'],
    'Sales': [1200, 1800, 950, 500, 300]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Replace values based on conditions
df.loc[df['Sales'] >= 1500, 'Category'] = 'High Value'
df.loc[(df['Sales'] < 1500) & (df['Sales'] >= 800), 'Category'] = 'Medium Value'
df.loc[df['Sales'] < 800, 'Category'] = 'Low Value'

print("\nDataFrame after conditional replacement:")
print(df)

Output:

Original DataFrame:
      Product  Sales
0      Laptop   1200
1  Smartphone   1800
2      Tablet    950
3     Monitor    500
4    Keyboard    300

DataFrame after conditional replacement:
      Product  Sales      Category
0      Laptop   1200  Medium Value
1  Smartphone   1800    High Value
2      Tablet    950  Medium Value
3     Monitor    500     Low Value
4    Keyboard    300     Low Value

This method is particularly useful when you need to create new categories based on existing values.
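
The same pattern can also overwrite values in an existing column instead of creating a new one. The sketch below is only an illustration: the name df_products and the label 'Accessory' are made up, and the 800 threshold is borrowed from the example above.

```python
import pandas as pd

df_products = pd.DataFrame({
    'Product': ['Laptop', 'Smartphone', 'Tablet', 'Monitor', 'Keyboard'],
    'Sales': [1200, 1800, 950, 500, 300]
})

# Overwrite Product in place for every row where Sales is below 800.
# 'Accessory' is a hypothetical replacement label.
df_products.loc[df_products['Sales'] < 800, 'Product'] = 'Accessory'

print(df_products)
```

Rows that do not meet the condition keep their original Product value.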

Method 3: Use map() Function

The map() method of a pandas Series is another elegant way to replace values in a single column:

import pandas as pd

# Sample US customer data
data = {
    'CustomerID': [101, 102, 103, 104, 105],
    'State': ['California', 'New York', 'Texas', 'Florida', 'California'],
    'Status': ['Active', 'Inactive', 'Active', 'Active', 'Inactive']
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Create a mapping dictionary
status_map = {
    'Active': 'Current Customer', 
    'Inactive': 'Former Customer'
}

# Apply mapping to the Status column
df['Status'] = df['Status'].map(status_map)

print("\nDataFrame after mapping:")
print(df)

Output:

Original DataFrame:
   CustomerID       State    Status
0         101  California    Active
1         102    New York  Inactive
2         103       Texas    Active
3         104     Florida    Active
4         105  California  Inactive

DataFrame after mapping:
   CustomerID       State            Status
0         101  California  Current Customer
1         102    New York   Former Customer
2         103       Texas  Current Customer
3         104     Florida  Current Customer
4         105  California   Former Customer

The advantage of map() is its simplicity and readability. However, it only works on a single column at a time, and any value that is missing from the mapping dictionary becomes NaN.
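
One difference between map() and replace() is worth checking before you choose between them: what happens to values that have no entry in the dictionary. The sketch below adds a hypothetical 'Pending' status that is not in status_map to show the behavior, along with one possible way to keep the original value by using fillna().

```python
import pandas as pd

# 'Pending' is a made-up value with no entry in the mapping
status = pd.Series(['Active', 'Inactive', 'Pending'])

status_map = {
    'Active': 'Current Customer',
    'Inactive': 'Former Customer'
}

# map(): unmapped values become NaN
mapped = status.map(status_map)

# map() + fillna(): fall back to the original value where the map had no entry
mapped_keep = status.map(status_map).fillna(status)

# replace(): unmapped values are left as they are
replaced = status.replace(status_map)

print(mapped)
print(mapped_keep)
print(replaced)
```

If every value in the column is guaranteed to be in the dictionary, the two methods give the same result.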

Method 4: Use numpy.where() for Conditional Replacement

For more complex conditional replacements, NumPy's numpy.where() function provides an efficient, vectorized solution:

import pandas as pd
import numpy as np

# Sample US election data
data = {
    'State': ['California', 'Texas', 'New York', 'Florida', 'Ohio'],
    'Votes_2016': [8753788, 9000000, 7721453, 9420039, 5496487],
    'Votes_2020': [9420039, 8753788, 8804012, 7721453, 5496487]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Add a column showing vote change trend
df['Trend'] = np.where(
    df['Votes_2020'] > df['Votes_2016'], 
    'Increased', 
    np.where(
        df['Votes_2020'] < df['Votes_2016'], 
        'Decreased', 
        'No Change'
    )
)

print("\nDataFrame with vote trend analysis:")
print(df)

The numpy.where() function works like an if-else statement: if the condition is true, return the first value; otherwise, return the second value. You can nest these functions for more complex conditions.
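
When the nesting gets deep, numpy.select() is an alternative that lists the conditions and their labels side by side. This is a different function from the numpy.where() call above, shown here only as a sketch on made-up numbers (the df_votes frame and its Before and After columns are hypothetical).

```python
import numpy as np
import pandas as pd

# Hypothetical small dataset for illustration
df_votes = pd.DataFrame({
    'Before': [10, 20, 30],
    'After': [15, 20, 25]
})

# Conditions are checked in order; the first one that is True wins
conditions = [
    df_votes['After'] > df_votes['Before'],
    df_votes['After'] < df_votes['Before'],
]
labels = ['Increased', 'Decreased']

# default is used for rows where no condition is True
df_votes['Trend'] = np.select(conditions, labels, default='No Change')

print(df_votes)
```

Each new category needs only one more entry in each list, rather than another level of nesting.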

Method 5: Use apply() with a Custom Function

For the most complex replacements, you can use the pandas apply() method with a custom function:

import pandas as pd

# Sample US housing data
data = {
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Price': [1200000, 900000, 450000, 350000, 420000]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Define a function to categorize housing prices
def price_category(price):
    if price > 1000000:
        return 'Luxury'
    elif price > 500000:
        return 'Premium'
    elif price > 300000:
        return 'Standard'
    else:
        return 'Budget'

# Apply the function to create a new column
df['Category'] = df['Price'].apply(price_category)

print("\nDataFrame with price categories:")
print(df)

The apply() method is extremely flexible because you can define any logic in your custom function.

You can also use lambda functions for simpler transformations:

# Using a lambda function for simpler transformations
df['Price_in_K'] = df['Price'].apply(lambda x: f"${x/1000:.0f}K")

print("\nDataFrame with formatted prices:")
print(df)
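
When the custom function only sorts numbers into ranges, as price_category() does, pandas.cut() may be able to do the same job without a Python-level function call per row. The sketch below is one possible equivalent, not a replacement for apply() in general. The bin edges are chosen to mirror the thresholds in price_category(), and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

prices = pd.Series([1200000, 900000, 450000, 350000, 420000])

# Each bin is open on the left and closed on the right, e.g. (300000, 500000],
# which matches the "price > threshold" checks in price_category()
bins = [-np.inf, 300000, 500000, 1000000, np.inf]
labels = ['Budget', 'Standard', 'Premium', 'Luxury']

categories = pd.cut(prices, bins=bins, labels=labels)

print(categories)
```

The result is a categorical Series. Call .astype(str) on it if you need plain text labels.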

Multiple Column Replacements in Pandas

Sometimes you may need to replace values in multiple columns. Here’s how you can do it:

import pandas as pd

# Sample US demographics data
data = {
    'Gender': ['M', 'F', 'M', 'F', 'M'],
    'Education': ['HS', 'BA', 'MA', 'PHD', 'HS'],
    'Employment': ['FT', 'PT', 'UN', 'FT', 'PT']
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Define replacement dictionaries for each column
replacements = {
    'Gender': {'M': 'Male', 'F': 'Female'},
    'Education': {'HS': 'High School', 'BA': 'Bachelor', 'MA': 'Master', 'PHD': 'Doctorate'},
    'Employment': {'FT': 'Full-time', 'PT': 'Part-time', 'UN': 'Unemployed'}
}

# Apply replacements to multiple columns
for column, mapping in replacements.items():
    df[column] = df[column].replace(mapping)

print("\nDataFrame after multiple column replacements:")
print(df)

This approach allows you to organize your replacements by column, making your code more maintainable.
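
The same nested dictionary can also be passed directly to DataFrame.replace(), which treats the outer keys as column names and the inner dictionaries as value mappings. A minimal sketch on a shortened, made-up version of the data:

```python
import pandas as pd

# Shortened hypothetical data for illustration
df_demo = pd.DataFrame({
    'Gender': ['M', 'F', 'M'],
    'Education': ['HS', 'BA', 'MA'],
    'Employment': ['FT', 'PT', 'UN']
})

replacements = {
    'Gender': {'M': 'Male', 'F': 'Female'},
    'Education': {'HS': 'High School', 'BA': 'Bachelor', 'MA': 'Master'},
    'Employment': {'FT': 'Full-time', 'PT': 'Part-time', 'UN': 'Unemployed'}
}

# Outer keys select the column; inner dictionaries map old values to new ones
df_nested = df_demo.replace(replacements)

print(df_nested)
```

Because each inner mapping applies only to its own column, the 'M' in Gender does not interfere with 'MA' in Education.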

Replace Values with Regular Expressions in Pandas

For more complex pattern matching, you can use regular expressions with the pandas replace() method:

import pandas as pd

# Sample US phone numbers data
data = {
    'Name': ['John Smith', 'Jane Doe', 'Robert Johnson', 'Mary Williams'],
    'Phone': ['(212) 555-1234', '312-555-9876', '(415) 555-5678', '702.555.1234']
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Standardize phone number format using regex
df['Phone'] = df['Phone'].replace({
    r'\((\d{3})\)[\s-]?': r'\1-',  # Replace (XXX) with XXX-
    r'\.': '-'                     # Replace dots with dashes
}, regex=True)

print("\nDataFrame with standardized phone numbers:")
print(df)

Regular expressions are useful but can be complex. Use them when you need to match patterns rather than exact values.
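
For text columns, the Series.str.replace() accessor is another way to do the same cleanup. Without regex=True, replace() matches whole cell values, while str.replace() always works on substrings inside each string. The sketch below reuses the phone numbers from the example and chains two calls. The variable names are made up for illustration.

```python
import pandas as pd

phones = pd.Series([
    '(212) 555-1234',
    '312-555-9876',
    '(415) 555-5678',
    '702.555.1234'
])

cleaned = (
    phones
    # Regex: turn "(XXX) " or "(XXX)-" into "XXX-"
    .str.replace(r'\((\d{3})\)[\s-]?', r'\1-', regex=True)
    # Literal match: turn every dot into a dash
    .str.replace('.', '-', regex=False)
)

print(cleaned.tolist())
# ['212-555-1234', '312-555-9876', '415-555-5678', '702-555-1234']
```

Passing regex=False for the dot avoids having to escape it, since an unescaped dot in a regular expression matches any character.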

I hope you found this article helpful. This tutorial covered five methods: replace(), loc[], map(), numpy.where() for conditional replacement, and apply() with a custom function. It also showed how to replace values in multiple columns and how to use regular expressions with replace().

Bijay Kumar

I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy, and Scikit-Learn, working for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and other countries.

enjoysharepoint.com/