As a Python developer working on web scraping projects, I often encounter strings that contain unwanted HTML tags. These tags can interfere with further processing and analysis of the extracted data. In this tutorial, I will explain how to remove HTML tags from a string in Python. There are two common ways to do this: regular expressions and the BeautifulSoup library. Let us look at each with an example.
Remove HTML Tags from a String in Python
Let’s say you’re working on a project for a client in the United States. You’ve scraped some data from a website, but the extracted strings contain HTML tags. For example:
html_string = "<p>John Doe, a <strong>renowned scientist</strong> from <a href='https://example.com'>New York</a>, discovered a new species.</p>"

Your task is to remove all the HTML tags from this string while preserving the text content.
Method 1: Use Regular Expressions
One common approach to remove HTML tags is by using regular expressions. Python’s re module provides powerful tools for pattern matching and string manipulation. Here’s how you can use regular expressions to remove HTML tags:
import re

def remove_html_tags(text):
    # Non-greedy pattern: "<", then as few characters as possible, then ">"
    clean = re.compile(r'<.*?>')
    # Replace every match with an empty string
    return re.sub(clean, '', text)

html_string = "<p>John Doe, a <strong>renowned scientist</strong> from <a href='https://example.com'>New York</a>, discovered a new species.</p>"
clean_text = remove_html_tags(html_string)
print(clean_text)

Output:

John Doe, a renowned scientist from New York, discovered a new species.

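In this example, we define a function called remove_html_tags() that takes a string text as input. We compile the regular expression pattern '<.*?>' using re.compile(). The pattern matches a <, then as few characters as possible (the ? makes .* non-greedy), up to the next >, which is what an HTML tag looks like. Without the ?, the pattern would be greedy and would match from the first < to the last > in the string, removing the text in between as well.
We then use the re.sub() function to replace every occurrence of the pattern with an empty string '', which removes the HTML tags from the string. The resulting clean text is stored in the clean_text variable and printed. Keep in mind that a regular expression does not parse HTML: by default . does not match a newline, so a tag split across lines is left in place, and HTML entities such as &amp; are not decoded.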
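The regex approach can be extended a little to cope with tags that span several lines and with HTML entities. The following is a minimal sketch, not part of the original example: the helper name strip_tags_and_unescape and the sample string are hypothetical, while re.DOTALL and html.unescape() come from the standard library.

```python
import html
import re

# Non-greedy match for anything between "<" and ">".
# re.DOTALL lets "." match newlines, so a tag broken across lines still matches.
TAG_PATTERN = re.compile(r"<.*?>", re.DOTALL)


def strip_tags_and_unescape(text):
    """Hypothetical helper: remove tags, then decode entities such as &amp;."""
    without_tags = TAG_PATTERN.sub("", text)
    return html.unescape(without_tags)


# Sample input (made up for illustration): an entity and a tag split over two lines.
sample = "<p>Tom &amp; Jerry live in <a\nhref='https://example.com'>New York</a>.</p>"
print(strip_tags_and_unescape(sample))
# Output: Tom & Jerry live in New York.
```

Even with these tweaks, the pattern treats any < ... > pair as a tag, so ordinary text containing both characters (for example a comparison such as "a < b and c > d") would lose everything between them. For untrusted or messy markup, a real parser is usually the safer choice.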
Method 2: Use BeautifulSoup
Another popular approach to remove HTML tags is by using the BeautifulSoup library. BeautifulSoup is a third-party library for parsing HTML and XML documents (install it with pip install beautifulsoup4), and it provides a convenient way to extract data from web pages. Here’s how you can use BeautifulSoup to remove HTML tags:
from bs4 import BeautifulSoup

def remove_html_tags(text):
    # Parse the markup with Python's built-in HTML parser
    soup = BeautifulSoup(text, "html.parser")
    # Return only the text nodes, with all tags dropped
    return soup.get_text()

html_string = "<p>John Doe, a <strong>renowned scientist</strong> from <a href='https://example.com'>New York</a>, discovered a new species.</p>"
clean_text = remove_html_tags(html_string)
print(clean_text)

Output:

John Doe, a renowned scientist from New York, discovered a new species.

In this example, we import the BeautifulSoup class from the bs4 module. We define a function remove_html_tags() that takes a string text as input.
Inside the function, we create a BeautifulSoup object by passing the text and specifying the HTML parser as "html.parser". This creates a parsed representation of the HTML document.
We then use the get_text() method of the BeautifulSoup object to extract all the text content from the parsed HTML, effectively removing the HTML tags. The resulting clean text is returned by the function.
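One detail worth knowing about get_text() is that, by default, it joins text nodes with nothing in between, so text from adjacent block elements runs together. The sketch below shows the separator and strip arguments of get_text() as one possible way to handle this; the function name remove_html_tags_spaced and the sample markup are hypothetical and not from the original example.

```python
from bs4 import BeautifulSoup


def remove_html_tags_spaced(text):
    """Hypothetical variant: join text nodes with a space and trim each one."""
    soup = BeautifulSoup(text, "html.parser")
    # separator=" " is placed between text nodes; strip=True trims whitespace
    # around each node and skips nodes that are empty after trimming.
    return soup.get_text(separator=" ", strip=True)


# Sample input (made up for illustration): two adjacent block elements and an entity.
html_string = "<h1>Report</h1><p>Tom &amp; Jerry</p>"

print(BeautifulSoup(html_string, "html.parser").get_text())
# Output: ReportTom & Jerry

print(remove_html_tags_spaced(html_string))
# Output: Report Tom & Jerry
```

Note that BeautifulSoup decodes the &amp; entity on its own. The separator is inserted between every pair of text nodes, including those around inline tags such as <strong>, so punctuation that directly follows an inline tag will end up with a space in front of it. Whether that trade-off is acceptable depends on the markup you are cleaning.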
Conclusion
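In this tutorial, I explained how to remove HTML tags from a string in Python. I covered two methods: using regular expressions with the re module, which is quick for simple, well-formed snippets, and using BeautifulSoup, which parses the markup and is more reliable on real-world HTML.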

I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, and scikit-learn, working for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and elsewhere.