📖 About the Project

Inspiration

This project was inspired by the growing spread of misinformation on NBA Twitter. Parody accounts like @NBACentel closely imitate credible sources like @TheNBACentral, and their made-up trades and fake news frequently go viral. As a basketball fan and a Data Science student, I wanted to build something practical that helps users quickly determine whether a tweet is real or fake, especially during trade deadlines and other high-profile events.


What it does

Central or Centel? is a real-time sports tweet verifier that uses machine learning to determine whether an NBA-related tweet is real or fake.

  • It analyzes tweet content using a trained Naive Bayes classifier
  • Displays a verdict (real/fake) and a confidence score
  • Includes formatting instructions so users paste tweets correctly
  • Logs every prediction (tweet, label, confidence) to a CSV file
  • Offers a clean, interactive web app interface via Streamlit
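The prediction log can be sketched with Python's standard csv module. This is a minimal sketch under assumptions: the helper names `log_prediction` and `read_log` are hypothetical, and the columns simply follow the fields listed above (tweet, label, confidence), with log.csv as the file name.

```python
import csv
import os

LOG_PATH = "log.csv"  # file name used by the app; the column layout below is an assumption
FIELDNAMES = ["tweet", "label", "confidence"]


def log_prediction(tweet: str, label: str, confidence: float, path: str = LOG_PATH) -> None:
    """Append one prediction to the CSV log, writing a header row if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if is_new:
            writer.writeheader()
        writer.writerow({"tweet": tweet, "label": label, "confidence": round(confidence, 4)})


def read_log(path: str = LOG_PATH) -> list[dict]:
    """Return every logged prediction as a dict (e.g. to show in the UI); empty if no log exists yet."""
    if not os.path.exists(path):
        return []
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Using `csv.DictWriter` in append mode means the log keeps growing across sessions, and quoting of commas or line breaks inside tweets is handled by the csv module rather than by hand.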

How I Built It

  • Created a balanced dataset of 1,000 placeholder tweets (500 real, 500 fake) modeled on the style of real NBA journalists and parody accounts
  • Used CountVectorizer to convert tweet text into word frequency vectors
  • Trained a Multinomial Naive Bayes classifier using scikit-learn
  • Built a Streamlit app where users can paste any tweet and see whether it’s likely real or fake, along with a confidence score
  • Added a logging feature that records each prediction to a CSV (log.csv) and displays it in the UI
  • Ensured the app validates tweet formatting and gives the user helpful feedback
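The training steps above can be sketched as a scikit-learn pipeline. The six tweets below are hypothetical stand-ins for the 1,000-tweet placeholder dataset, and the helper name `classify` is illustrative; confidence is assumed here to be the winning class's probability from `predict_proba`, which may differ from how the app computes it.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for the placeholder dataset (the real one has 500 real + 500 fake tweets).
tweets = [
    "Sources: the Suns are finalizing a trade for a future first-round pick",
    "Per league sources, the Jazz and the guard have agreed to a two-year extension",
    "Sources say the Bulls are in talks on a deal ahead of the trade deadline",
    "BREAKING: the Knicks have traded their entire roster for a courtside hot dog",
    "BREAKING: the Warriors sign a retired mascot to a 10-year max contract",
    "BREAKING: the Heat have traded the arena for two second-round picks and a hot dog",
]
labels = ["real", "real", "real", "fake", "fake", "fake"]

# CountVectorizer turns each tweet into word counts; MultinomialNB models those counts per class.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(tweets, labels)


def classify(tweet: str) -> tuple[str, float]:
    """Return (label, confidence), where confidence is the winning class's probability."""
    probs = model.predict_proba([tweet])[0]  # one probability per class, ordered as model.classes_
    best = int(probs.argmax())
    return str(model.classes_[best]), float(probs[best])
```

With two classes, the confidence returned by `classify` is always between 0.5 and 1. Words the vectorizer never saw during training are simply ignored at prediction time, which is one reason a small or synthetic dataset limits accuracy on real tweets.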

Challenges I Faced

  • Due to Twitter's scraping limitations, I couldn't use snscrape or access live tweets without verified Twitter Developer credentials
  • Had to train on placeholder tweets rather than real ones, which made the model slightly less accurate
  • Designing a reliable UI and ensuring clear input formatting for model predictions
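One way to enforce clear input formatting is a small check that runs before the model sees the text. The rules here are assumptions, not the app's actual formatting instructions: whitespace is collapsed, empty input and input without letters are rejected, and 280 characters (the standard tweet length limit) is used as an upper bound. The name `validate_tweet` is hypothetical.

```python
MAX_TWEET_LENGTH = 280  # assumed upper bound, matching the standard tweet length limit


def validate_tweet(text: str) -> tuple[bool, str]:
    """Check pasted input (hypothetical rules).

    On success returns (True, normalized_tweet); otherwise (False, message_for_the_user).
    """
    cleaned = " ".join(text.split())  # collapse line breaks and repeated spaces
    if not cleaned:
        return False, "Please paste the text of a tweet."
    if len(cleaned) > MAX_TWEET_LENGTH:
        return False, (
            f"That is {len(cleaned)} characters; paste only the tweet body "
            f"(max {MAX_TWEET_LENGTH})."
        )
    if not any(ch.isalpha() for ch in cleaned):
        return False, "The tweet needs some words for the model to analyze."
    return True, cleaned
```

Returning a user-facing message instead of raising an exception makes it easy to show the feedback directly in the UI.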

Accomplishments that I'm proud of

  • Implemented a logging system to track predictions over time
  • Developed a project with a meaningful connection to pop culture
  • Learned to work around real-world scraping limitations (like Twitter’s API restrictions) while still delivering a fully functioning product

What I Learned

  • How to build a machine learning pipeline from scratch for text classification
  • The use of CountVectorizer to turn tweets into numerical data
  • Training and deploying a Multinomial Naive Bayes model
  • Designing a beginner-friendly UI with Streamlit
  • Implementing a logging system that tracks every prediction with confidence scores
  • The importance of curating a meaningful dataset when real-time data collection is limited

What's next for Central or Centel?

  • Add Twitter API integration to scrape live tweets
  • Expand the dataset to other sports such as football, baseball, tennis, and cricket to cover misinformation across all major leagues
  • Add a feedback feature so users can report incorrect predictions