📖 About the Project
Inspiration
This project was inspired by the growing trend of misinformation on NBA Twitter. Fake accounts like @NBACentel closely imitate credible sources like @TheNBACentral, and they frequently go viral with made-up trades and fake news. As a basketball fan and a Data Science student, I wanted to build something practical that helps users quickly determine whether a tweet is real or fake, especially during trade deadlines and high-profile events.
What it does
Central or Centel? is a real-time sports tweet verifier that uses machine learning to determine whether an NBA-related tweet is real or fake.
- It analyzes tweet content using a trained Naive Bayes classifier
- Displays a verdict (real/fake) and a confidence score
- Includes formatting instructions so users paste tweets correctly
- Logs every prediction (tweet, label, confidence) to a CSV file
- Offers a clean, interactive web app interface via Streamlit
How I Built It
- Used placeholder tweets modeled on those posted by real NBA journalists and parody accounts to create a balanced dataset of 1,000 tweets (500 real, 500 fake)
- Used `CountVectorizer` to convert tweet text into word-frequency vectors
- Trained a Multinomial Naive Bayes classifier using `scikit-learn` (see the training sketch after this list)
- Built a Streamlit app where users can paste any tweet and see whether it's likely real or fake, along with a confidence score (see the app sketch after this list)
- Added a logging feature that records each prediction to a CSV (`log.csv`) and displays it in the UI
- Ensured the app validates tweet formatting and provides helpful feedback to the user
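To make the pipeline concrete, here is a minimal training sketch under a few assumptions: the placeholder tweets are assumed to live in a hypothetical `tweets.csv` with `text` and `label` columns, and the saved file names (`vectorizer.joblib`, `model.joblib`) are illustrative rather than the project's actual layout.

```python
# train_model.py - sketch of the CountVectorizer + Multinomial Naive Bayes pipeline
import pandas as pd
import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Hypothetical dataset file: one tweet per row, "text" column plus a
# "label" column ("real" or "fake"), 500 of each.
df = pd.read_csv("tweets.csv")

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# Convert tweet text into word-frequency vectors
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Train the Multinomial Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train_vec, y_train)

print("Test accuracy:", accuracy_score(y_test, model.predict(X_test_vec)))

# Persist both pieces so the Streamlit app can load them
joblib.dump(vectorizer, "vectorizer.joblib")
joblib.dump(model, "model.joblib")
```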
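And a rough sketch of how the Streamlit front end, confidence score, and `log.csv` logging could fit together; the widgets, the minimum-length check, and the "real"/"fake" label strings are assumptions, not the app's exact implementation.

```python
# app.py - sketch of the Streamlit verifier with prediction logging
import csv
import os
from datetime import datetime

import joblib
import streamlit as st

vectorizer = joblib.load("vectorizer.joblib")
model = joblib.load("model.joblib")

st.title("Central or Centel?")
st.caption("Paste the tweet text only - no links, images, or @handles.")

tweet = st.text_area("Tweet text")

if st.button("Verify"):
    if len(tweet.strip()) < 10:
        # Basic formatting validation with feedback to the user
        st.warning("That doesn't look like a full tweet - please paste the complete text.")
    else:
        vec = vectorizer.transform([tweet])
        label = model.predict(vec)[0]
        confidence = model.predict_proba(vec).max()

        st.subheader("Real ✅" if label == "real" else "Fake ❌")
        st.write(f"Confidence: {confidence:.1%}")

        # Append the prediction (tweet, label, confidence) to log.csv
        new_file = not os.path.exists("log.csv")
        with open("log.csv", "a", newline="") as f:
            writer = csv.writer(f)
            if new_file:
                writer.writerow(["timestamp", "tweet", "label", "confidence"])
            writer.writerow([datetime.now().isoformat(), tweet, label, round(confidence, 4)])
```

Appending to a flat `log.csv` on every prediction keeps the history easy to read back into the UI without needing a database.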
Challenges I Faced
- Due to Twitter's scraping limitations, I couldn't use `snscrape` or access live tweets without verified Twitter Developer credentials
- Had to rely on placeholder tweets, which made the model slightly less accurate
- Designing a reliable UI and ensuring clear input formatting for model predictions
Accomplishments that I'm proud of
- Implemented a logging system to track predictions over time
- Developed a project with a meaningful connection to pop culture
- Learned to work around real-world scraping limitations (like Twitter’s API restrictions) while still delivering a fully functioning product
What I Learned
- How to build a machine learning pipeline from scratch for text classification
- How to use `CountVectorizer` to turn tweets into numerical data
- Training and deploying a Multinomial Naive Bayes model
- Designing a beginner-friendly UI with Streamlit
- Implementing a logging system that tracks every prediction with confidence scores
- The importance of curating a meaningful dataset when real-time data collection is limited
What's next for Central or Centel?
- Add Twitter API integration to pull live tweets
- Expand the dataset to include other sports like football, baseball, tennis, and cricket to cover misinformation across all major leagues
- Add a feedback feature so users can report incorrect predictions
Built With
- naive-bayes
- python
- streamlit