Sentiment Analysis with Python


Films have long been a part of our lives. We all love watching movies and have our opinions about them. Sentiment analysis is a natural language processing (NLP) technique, often built with machine learning, that identifies and classifies the opinions expressed in text. In this blog, we will learn how to perform sentiment analysis on movie reviews using Python.

Sentiment analysis determines the emotional tone of a piece of text, which can range from positive through neutral to negative. In the context of movie reviews, it can be used to decide whether a review is positive, negative, or neutral.

Sentiment analysis can be performed using various techniques such as rule-based systems, machine learning, and deep learning. In this blog, we will focus on using machine learning to perform sentiment analysis on movie reviews.

Prerequisites:

Before diving into the project, it is essential to have some prior knowledge of the following:

1. Python basics – understanding of data types, operators, functions, and loops.
2. Libraries – NLTK, Pandas, NumPy, Scikit-learn, Matplotlib.
3. Understanding of Natural Language Processing (NLP) concepts.
4. Basic knowledge of Machine Learning algorithms.

Steps to Perform Sentiment Analysis on Movie Reviews:

1. Data Collection: We need to collect the data to train our machine learning model. We can use various datasets available online, such as the IMDb movie reviews dataset.
2. Data Preprocessing: The collected data needs to be preprocessed before training the machine learning model. This involves removing stop words, converting the text to lowercase, and stemming.
3. Feature Extraction: We need to extract the features from the preprocessed text. To achieve this, methods such as bag-of-words, n-grams, and word embeddings can be utilized.
4. Model Training: We need to train the machine learning model using the extracted features and the corresponding sentiment labels.
5. Model Evaluation: We need to evaluate the performance of the machine learning model using various metrics such as accuracy, precision, recall, and F1-score.
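Steps 3 to 5 can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not a tuned model: the eight training reviews and two test reviews are invented so the snippet runs on its own, and a real project would load a labelled corpus such as the IMDb reviews instead. CountVectorizer builds bag-of-words features with unigrams and bigrams, and LogisticRegression is just one reasonable classifier choice among many.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; replace with a real labelled corpus (e.g. IMDb reviews)
train_texts = [
    "a wonderful film with brilliant acting",
    "i loved every minute of this movie",
    "great story and a moving soundtrack",
    "an enjoyable and beautifully shot film",
    "a boring plot and terrible acting",
    "i hated this movie, a complete waste of time",
    "dull characters and an awful script",
    "the worst film i have seen this year",
]
train_labels = ["positive"] * 4 + ["negative"] * 4

test_texts = ["brilliant and moving film", "awful, boring waste of time"]
test_labels = ["positive", "negative"]

# Feature extraction: lowercased bag-of-words with unigrams and bigrams,
# dropping scikit-learn's built-in English stop words
vectorizer = CountVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2))

# Model training: the pipeline vectorizes the text, then fits the classifier
model = make_pipeline(vectorizer, LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

# Model evaluation on the held-out reviews
predictions = model.predict(test_texts)
accuracy = accuracy_score(test_labels, predictions)
precision, recall, f1, _ = precision_recall_fscore_support(
    test_labels, predictions, average="macro", zero_division=0
)
print(f"Accuracy:  {accuracy:.2f}")
print(f"Precision: {precision:.2f}  Recall: {recall:.2f}  F1: {f1:.2f}")
```

make_pipeline keeps the vectorizer and the classifier together, so exactly the same feature extraction is applied during training and prediction. On a real dataset, train_test_split from sklearn.model_selection is the usual way to hold out test reviews; metrics computed on two examples, as here, say nothing about real-world performance.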

Python Libraries for Sentiment Analysis:

Python provides various libraries that can be used for performing sentiment analysis on movie reviews. Some of the popular libraries are:

1. TextBlob: A Python library for processing textual data. It provides a simple API for performing common natural language processing tasks such as part-of-speech tagging, noun phrase extraction, and sentiment analysis.
2. NLTK: A Python library for natural language processing. It provides various tools and resources such as corpora, lexicons, and algorithms for performing various natural language processing tasks.
3. Scikit-learn: A Python library for machine learning. It provides various tools and algorithms for performing machine learning tasks such as classification, regression, and clustering.

Code Implementation:

Let us now implement sentiment analysis on movie reviews using Python. We will start with the TextBlob library, whose default analyzer scores text using a built-in sentiment lexicon, so it works without training a model first.

First, we need to install the TextBlob library using the following command:

pip install textblob

Next, we need to import the TextBlob library and create a TextBlob object as shown below:

from textblob import TextBlob

text = "This movie is very good. I enjoyed it a lot."

blob = TextBlob(text)

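We can then use the sentiment property of the TextBlob object to get the polarity, a float from -1.0 (most negative) to 1.0 (most positive), and the subjectivity, a float from 0.0 (very objective) to 1.0 (very subjective), as shown below: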

polarity = blob.sentiment.polarity
subjectivity = blob.sentiment.subjectivity

print("Sentiment Polarity:", polarity)
print("Sentiment Subjectivity:", subjectivity)

Output:

Sentiment Polarity: 0.705
Sentiment Subjectivity: 0.74
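Because polarity is a continuous score, turning it into a positive, negative, or neutral label needs a cut-off. The sketch below uses a hypothetical dead zone of 0.05 around zero; that value is an assumption for illustration, not a TextBlob default, and is worth tuning on labelled reviews.

```python
def polarity_to_label(polarity, threshold=0.05):
    """Map a polarity score in [-1.0, 1.0] to 'positive', 'negative' or 'neutral'.

    threshold is a hypothetical dead zone around 0.0 that counts as neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity must be between -1.0 and 1.0, got {polarity}")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Label the polarity from the output above, plus two other sample values
for score in (0.705, -0.4, 0.0):
    print(score, "->", polarity_to_label(score))
# 0.705 -> positive
# -0.4 -> negative
# 0.0 -> neutral
```

If TextBlob is installed, the same helper can be applied to any review as polarity_to_label(TextBlob(review).sentiment.polarity).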

Example of a Popular Movie Review:

To understand sentiment analysis in practice, let's work through the reviews of a popular movie. The first step is to extract the data from a web page or an existing dataset. We will use the requests library to download the page and BeautifulSoup, a Python package for parsing HTML, to extract the review text.

After extracting the data, the next step is to clean the data by removing unnecessary characters, stop words, and HTML tags. Once the data is clean, we can tokenize the text into individual words. Tokenizing is the process of breaking a large text into smaller chunks or tokens.

Once we have the tokens, we can normalize them by converting everything to lowercase and removing punctuation. After normalization, each review can be labelled as positive, negative, or neutral. These labelled examples are what supervised learning needs: a model is trained on reviews with known labels and then used to classify new ones.

Steps to Collect and Prepare the Reviews in Python:

1. Data Collection – Scrape the reviews from a website, or load them from an existing dataset.
Code:

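import requests
from bs4 import BeautifulSoup

url = 'https://www.imdb.com/title/tt1375666/reviews?ref_=tt_urv'
# Send a browser-like User-Agent, set a timeout, and fail fast on HTTP errors
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')
# This class name reflects the page markup when the tutorial was written; check it if no reviews are found
reviews = soup.find_all('div', {'class': 'text show-more__control'})
# Extract the plain text of each review element
review_texts = [review.get_text(separator=' ', strip=True) for review in reviews]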
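Because live pages change and sites may block scrapers, it helps to check the extraction logic offline first. The snippet below runs the same class-based lookup as above, using find_all (the current name for findAll), on a small hand-written HTML string; that HTML is hypothetical and only imitates the review markup assumed above. It also shows select, which takes a CSS selector and matches elements that have both classes in any order.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML imitating the review markup targeted above
sample_html = """
<div class="review">
  <div class="text show-more__control">A mind-bending film.<br>Loved it!</div>
</div>
<div class="review">
  <div class="text show-more__control">Too confusing for me.</div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Exact match on the full class attribute string (class order matters)
reviews = soup.find_all("div", {"class": "text show-more__control"})

# CSS selector: matches elements having both classes, in any order
same_reviews = soup.select("div.text.show-more__control")

# Join the text pieces of each review with a space
review_texts = [r.get_text(separator=" ", strip=True) for r in reviews]
print(review_texts)
# ['A mind-bending film. Loved it!', 'Too confusing for me.']
```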

2. Data Cleaning – Clean the data by removing unwanted characters and HTML tags.
Code:

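import re

import nltk
from nltk.corpus import stopwords

# Download NLTK's stopword list once, then build a set for fast lookups
nltk.download('stopwords', quiet=True)
STOP_WORDS = set(stopwords.words('english'))

def clean_review(text):
    # Remove HTML tags (replace with a space so adjacent words don't merge)
    text = re.sub(r'<.*?>', ' ', text)
    # Replace punctuation, digits and other non-letters with a space
    text = re.sub(r'[^a-zA-Z]+', ' ', text)
    # Lowercase and remove stopwords
    return ' '.join(word.lower() for word in text.split() if word.lower() not in STOP_WORDS)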

3. Tokenizing – Tokenize the text into individual words.
Code:

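import nltk
from nltk.tokenize import word_tokenize

for resource in ('punkt', 'punkt_tab'):  # Punkt models ('punkt_tab' in newer NLTK)
    nltk.download(resource, quiet=True)

def tokenize(text):
    return word_tokenize(text)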

4. Normalizing – Normalize the text by converting it to lowercase and removing punctuation.
Code:

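import re

def normalize(tokens, stop_words=frozenset()):
    # stop_words: a set of words to drop, e.g. set(nltk.corpus.stopwords.words('english'))
    normalized_tokens = []
    for token in tokens:
        # Convert to lowercase
        token = token.lower()
        # Remove punctuation and digits
        token = re.sub('[^a-zA-Z]+', '', token)
        # Skip tokens that became empty (pure punctuation) and stopwords
        if token and token not in stop_words:
            normalized_tokens.append(token)
    return normalized_tokens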
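The overview earlier also mentions stemming, which the steps above do not perform. One option is NLTK's PorterStemmer, shown below on a few example tokens; stem_tokens is a hypothetical helper name used only for illustration. Stemming strips suffixes by rule, so a stem is not always a dictionary word, as "movies" shows.

```python
from nltk.stem import PorterStemmer

# PorterStemmer is rule-based, so it needs no nltk.download() call
stemmer = PorterStemmer()

def stem_tokens(tokens):
    # Hypothetical helper: reduce each token to its Porter stem
    return [stemmer.stem(token) for token in tokens]

print(stem_tokens(["running", "movies", "acting"]))
# ['run', 'movi', 'act']
```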

5. Labelling – Label the sentiments as positive, negative, or neutral.
Code:

import pandas as pd

# Load the data (assumes a CSV file with a numeric 'score' column, e.g. a 1-10 rating)
data = pd.read_csv('movie_reviews.csv')

# Label the sentiments: above 6 is positive, below 5 is negative, 5 to 6 is neutral
def label_sentiment(score):
    if score > 6:
        return 'positive'
    elif score < 5:
        return 'negative'
    else:
        return 'neutral'

data['sentiment'] = data['score'].apply(label_sentiment)
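To try the labelling step without a movie_reviews.csv file, the data can be built in memory. The reviews and scores below are invented for illustration and assume the same 1-10 scale; label_sentiment is repeated so the snippet runs on its own. Since a two-class (positive/negative) classifier has no use for neutral examples, the last lines show one common option: checking the label balance and then dropping the neutral rows.

```python
import pandas as pd

def label_sentiment(score):
    # Same thresholds as above: >6 positive, <5 negative, otherwise neutral
    if score > 6:
        return "positive"
    elif score < 5:
        return "negative"
    return "neutral"

# Hypothetical in-memory stand-in for movie_reviews.csv (1-10 rating scale assumed)
data = pd.DataFrame({
    "review": ["Loved it", "Not bad", "Fell asleep", "Masterpiece", "It was fine"],
    "score": [9, 6, 2, 10, 5],
})
data["sentiment"] = data["score"].apply(label_sentiment)

# Check how balanced the labels are before training
print(data["sentiment"].value_counts())

# For a two-class (positive/negative) model, drop the neutral reviews
binary_data = data[data["sentiment"] != "neutral"].reset_index(drop=True)
print(binary_data[["review", "sentiment"]])
```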

Conclusion

In conclusion, sentiment analysis is an essential tool for analyzing text data and extracting meaningful insights from it. Python provides several libraries that make it easy to perform sentiment analysis on textual data, such as TextBlob, NLTK, and scikit-learn, with NLTK being one of the most popular.

In this blog, we covered the basics of sentiment analysis and how to implement it in Python: scoring text with TextBlob, and collecting, cleaning, tokenizing, normalizing, and labelling movie reviews as positive, negative, or neutral with requests, BeautifulSoup, NLTK, and pandas. With the increasing availability of textual data, sentiment analysis is valuable for businesses and individuals alike, and learning how to use it can provide insights into customer opinions, trends, and behaviour.



PythonGeeks Team

PythonGeeks Team is dedicated to creating beginner-friendly and advanced tutorials on Python programming, AI, ML, Data Science and more. From web development to machine learning, we help learners build strong foundations and excel in their Python journey.
