This project analyzes YouTube comments related to Jake Paul and Mike Tyson using natural language processing (NLP) to classify sentiments and detect emotions. The objective is to reveal public opinions and emotional responses toward these figures through data-driven insights.
Sentiment analysis, also known as opinion mining, is the process of evaluating text to determine its emotional tone, typically categorized as positive, negative, or neutral. Businesses use it to understand customer feedback; here, we apply it to YouTube comments to see how viewers perceive the two fighters.
To replicate this analysis, install the following Python libraries:
- pandas: For managing and structuring data
- numpy: For numerical computations
- seaborn: For advanced data visualization
- matplotlib: For creating plots and charts
- nltk: For text preprocessing and NLP tasks
- wordcloud: For generating visual word clouds
The dataset is cleaned and prepared through these steps:
- Remove Duplicates: Eliminates redundant comments for accurate analysis.
- Handle Missing Values: Drops rows with incomplete data to ensure quality.
- Drop Unnecessary Columns: Removes published_at and author, as they're not relevant to the analysis.
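The three steps above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the `clean_comments` helper and the toy data are hypothetical, and only the column names `published_at` and `author` come from the description above.

```python
import pandas as pd


def clean_comments(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three cleaning steps to a raw comments DataFrame."""
    cleaned = df.drop_duplicates()  # remove exact duplicate rows
    cleaned = cleaned.dropna()      # drop rows with any missing value
    # errors="ignore" keeps the call safe if a column is already absent
    cleaned = cleaned.drop(columns=["published_at", "author"], errors="ignore")
    return cleaned.reset_index(drop=True)


# Toy data standing in for the real export (all values are made up)
raw = pd.DataFrame(
    {
        "comment": ["Great fight", "Great fight", None, "Boring"],
        "like_count": [3, 3, 1, 0],
        "published_at": ["2024-11-16"] * 4,
        "author": ["a", "a", "b", "c"],
    }
)
cleaned = clean_comments(raw)
print(cleaned)
```

Duplicates are removed before the columns are dropped, so two rows count as duplicates only if every original field matches. Running the steps in a different order could remove a different set of rows.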
The processed dataset includes:
- comment: The text of each YouTube comment
- like_count: Number of likes, reflecting engagement
- sentiment: Classified as Positive, Neutral, or Negative
- emotions: Detected emotions (e.g., Joy, Anger, Trust)
- Emotion intensity scores: Numeric values for anger, fear, joy, trust, anticipation, surprise, sadness, disgust, positive, and negative
The analysis leverages multiple visualization techniques to interpret the data:
A word cloud gives a visual representation of frequently used words in comments, where word size indicates frequency. This highlights prominent themes or terms.
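The frequency counts behind such a picture can be sketched as follows. This is an assumption-laden example: the `word_frequencies` and `render_word_cloud` helpers, the tiny stopword set, and the sample comments are all hypothetical, and the project may well tokenize and filter with NLTK instead.

```python
import re
from collections import Counter

# A tiny illustrative stopword set; a fuller list (e.g. from NLTK) is likely better
STOPWORDS = {"the", "a", "is", "and", "to", "of", "was"}


def word_frequencies(comments):
    """Count lowercase word tokens across comments, skipping stopwords."""
    counts = Counter()
    for text in comments:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


def render_word_cloud(freqs, path="wordcloud.png"):
    """Render the frequencies as an image (requires the wordcloud package)."""
    from wordcloud import WordCloud  # imported lazily; only needed for rendering

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(freqs).to_file(path)


# Made-up comments for illustration
freqs = word_frequencies(["Tyson was the better boxer", "Tyson is a legend"])
print(freqs.most_common(3))
```

Calling `render_word_cloud(freqs)` would then write the image to disk, with more frequent words drawn larger.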
A plot showing the proportion of positive, neutral, and negative comments, providing a snapshot of overall sentiment.
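The proportions behind such a plot can be computed directly with pandas. The snippet below is a sketch on made-up labels; only the `sentiment` column name and its three categories come from the dataset description.

```python
import pandas as pd

# Made-up labels for illustration only
df = pd.DataFrame({"sentiment": ["Positive", "Negative", "Positive", "Neutral"]})

# normalize=True returns fractions that sum to 1 instead of raw counts
proportions = df["sentiment"].value_counts(normalize=True)
print(proportions)
```

With matplotlib installed, `proportions.plot(kind="bar")` is one way to turn the result into a chart.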
A color-coded matrix revealing relationships between emotions. Key observations:
- anger and negative: Strong correlation (0.79)
- positive and trust: High correlation (0.81)
- joy and positive: Strong link (0.80)
- sadness and joy: Weak correlation (0.28)
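A matrix like this is typically built from pairwise correlations between the emotion score columns. The sketch below assumes Pearson correlation (the pandas default), which the write-up does not confirm; the scores are made up and the `plot_heatmap` helper is hypothetical.

```python
import pandas as pd

# Made-up intensity scores; the real dataset has one row per comment
scores = pd.DataFrame(
    {
        "anger": [0.0, 0.2, 0.8, 0.6],
        "negative": [0.1, 0.3, 0.9, 0.5],
        "joy": [0.9, 0.5, 0.0, 0.2],
    }
)

# Pairwise Pearson correlation between the emotion columns
corr = scores.corr()
print(corr.round(2))


def plot_heatmap(corr_matrix):
    """Draw the color-coded matrix (requires seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Fix the color scale to [-1, 1] so colors are comparable across datasets
    sns.heatmap(corr_matrix, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
    plt.show()
```

Calling `plot_heatmap(corr)` would display the annotated matrix.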
A ranking of emotions by their average intensity across comments, identifying the most prevalent feelings.
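One way to produce such a ranking is to average each emotion column and sort the result. The scores below are made up for illustration.

```python
import pandas as pd

# Made-up intensity scores for three of the emotion columns
scores = pd.DataFrame(
    {
        "joy": [0.5, 0.7],
        "anger": [0.1, 0.3],
        "trust": [0.4, 0.4],
    }
)

# Mean intensity per emotion, highest first
ranking = scores.mean().sort_values(ascending=False)
print(ranking)
```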
A grid of plots showing pairwise relationships between emotions, with kernel density estimates (KDE) on the diagonal. Insights include wider distributions for the positive and negative scores, and lower intensity for sadness and anger.
- Sentiment Breakdown: Uncovers the dominant sentiment among viewers.
- Emotional Patterns: Highlights connections, such as the strong tie between positivity and trust.
- Dominant Emotions: Identifies which emotions resonate most in the comment dataset.