📖 About the Project
Inspiration
This project was inspired by the growing trend of misinformation on NBA Twitter. Fake accounts like @NBACentel closely imitate credible sources like @TheNBACentral, and they frequently go viral with made-up trades and fake news. As a basketball fan and a Data Science student, I wanted to build something practical that helps users quickly determine whether a tweet is real or fake, especially during trade deadlines and high-profile events.
What it does
Central or Centel? is a real-time sports tweet verifier that uses machine learning to determine whether an NBA-related tweet is real or fake.
- It analyzes tweet content using a trained Naive Bayes classifier
- Displays a verdict (real/fake) and a confidence score
- Includes formatting instructions so users paste tweets correctly
- Logs every prediction (tweet, label, confidence) to a CSV file
- Offers a clean, interactive web app interface via Streamlit
How I Built It
- Used placeholder tweets modeled on those posted by real NBA journalists and parody accounts to create a balanced dataset of 1,000 tweets (500 real, 500 fake)
- Used `CountVectorizer` to convert tweet text into word-frequency vectors
- Trained a Multinomial Naive Bayes classifier using `scikit-learn` (see the training sketch after this list)
- Built a Streamlit app where users can paste any tweet and see whether it's likely real or fake, along with a confidence score (see the app sketch after this list)
- Added a logging feature that records each prediction to a CSV (`log.csv`) and displays it in the UI
- Ensured the app validates tweet formatting and provides helpful feedback to the user
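To make the pipeline concrete, here is a minimal training sketch under a few assumptions: the placeholder tweets are assumed to live in a hypothetical `tweets.csv` with `text` and `label` columns, and the saved file names (`vectorizer.joblib`, `model.joblib`) are illustrative rather than the project's actual layout.

```python
# train_model.py - sketch of the CountVectorizer + Multinomial Naive Bayes pipeline
import pandas as pd
import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Hypothetical dataset file: one tweet per row, "text" column plus a
# "label" column ("real" or "fake"), 500 of each.
df = pd.read_csv("tweets.csv")

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# Convert tweet text into word-frequency vectors
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Train the Multinomial Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train_vec, y_train)

print("Test accuracy:", accuracy_score(y_test, model.predict(X_test_vec)))

# Persist both pieces so the Streamlit app can load them
joblib.dump(vectorizer, "vectorizer.joblib")
joblib.dump(model, "model.joblib")
```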
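And a rough sketch of how the Streamlit front end, confidence score, and `log.csv` logging could fit together; the widgets, the minimum-length check, and the "real"/"fake" label strings are assumptions, not the app's exact implementation.

```python
# app.py - sketch of the Streamlit verifier with prediction logging
import csv
import os
from datetime import datetime

import joblib
import streamlit as st

vectorizer = joblib.load("vectorizer.joblib")
model = joblib.load("model.joblib")

st.title("Central or Centel?")
st.caption("Paste the tweet text only - no links, images, or @handles.")

tweet = st.text_area("Tweet text")

if st.button("Verify"):
    if len(tweet.strip()) < 10:
        # Basic formatting validation with feedback to the user
        st.warning("That doesn't look like a full tweet - please paste the complete text.")
    else:
        vec = vectorizer.transform([tweet])
        label = model.predict(vec)[0]
        confidence = model.predict_proba(vec).max()

        st.subheader("Real ✅" if label == "real" else "Fake ❌")
        st.write(f"Confidence: {confidence:.1%}")

        # Append the prediction (tweet, label, confidence) to log.csv
        new_file = not os.path.exists("log.csv")
        with open("log.csv", "a", newline="") as f:
            writer = csv.writer(f)
            if new_file:
                writer.writerow(["timestamp", "tweet", "label", "confidence"])
            writer.writerow([datetime.now().isoformat(), tweet, label, round(confidence, 4)])
```

Appending to a flat `log.csv` on every prediction keeps the history easy to read back into the UI without needing a database.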
Challenges I Faced
- Due to Twitter's scraping limitations, I couldn't use `snscrape` or access live tweets without verified Twitter Developer credentials
- Had to rely on placeholder tweets, which made the model slightly less accurate
- Designing a reliable UI and ensuring clear input formatting for model predictions
Accomplishments that I'm proud of
- Implemented a logging system to track predictions over time
- Developed a project with a meaningful connection to pop culture
- Learned to work around real-world scraping limitations (like Twitter’s API restrictions) while still delivering a fully functioning product
What I Learned
- How to build a machine learning pipeline from scratch for text classification
- How to use `CountVectorizer` to turn tweets into numerical data
- Training and deploying a Multinomial Naive Bayes model
- Designing a beginner-friendly UI with Streamlit
- Implementing a logging system that tracks every prediction with confidence scores
- The importance of curating a meaningful dataset when real-time data collection is limited
What's next for Central or Centel?
- Add Twitter API integration to pull live tweets
- Expand the dataset to include other sports like football, baseball, tennis, and cricket to cover misinformation across all major leagues
- Add a feedback feature so users can report incorrect predictions
Built With
- naive-bayes
- python
- streamlit