This project is a ready-to-use application that classifies customer reviews into predefined categories. It uses a fine-tuned DistilBERT model for predictions and provides a user-friendly Streamlit frontend. The whole application is designed to be deployed on the Hugging Face Hub.
- Data Preprocessing: Scripts to clean and prepare raw customer review data.
- Model Training: A robust training pipeline for fine-tuning a DistilBERT model for multi-class classification.
- Model Hosting: The trained model is hosted on the Hugging Face Hub, enabling direct use without local storage.
- Comprehensive Evaluation: The `evaluate.py` script generates a full suite of metrics, including a confusion matrix and visualizations of model performance.
- Streamlit App: A single-file, production-ready Streamlit application that handles both the frontend and backend logic.
- Professional UI: The app includes a responsive design, clear metric explanations, and professional-looking charts.
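To illustrate the kind of cleaning the preprocessing feature refers to, here is a minimal sketch. It is not the project's actual `prepare_data.py`: the column names `review` and `category`, the helper names, and the sample rows are all assumptions made for the example.

```python
import csv
import io
import re


def clean_text(text: str) -> str:
    """Strip HTML tags and URLs, collapse whitespace, and lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)        # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    return re.sub(r"\s+", " ", text).strip().lower()


def encode_labels(labels):
    """Map each distinct label to an integer id, in sorted order."""
    mapping = {name: idx for idx, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping


def prepare(rows):
    """Clean review text, drop empty reviews, and attach integer labels."""
    kept = [(clean_text(r["review"]), r["category"]) for r in rows]
    kept = [(text, cat) for text, cat in kept if text]
    ids, mapping = encode_labels([cat for _, cat in kept])
    cleaned = [
        {"review": text, "category": cat, "label": idx}
        for (text, cat), idx in zip(kept, ids)
    ]
    return cleaned, mapping


# Hypothetical rows standing in for data/sample_data.csv.
raw = io.StringIO(
    'review,category\n'
    '"Great   product! <b>Love</b> it",Praise\n'
    '"Refund never arrived http://example.com/x",Billing\n'
    '"   ",Billing\n'
)
cleaned, mapping = prepare(csv.DictReader(raw))
```

The integer `label` column is what a multi-class classifier is typically trained on, and the returned `mapping` is worth saving so predictions can be translated back to category names.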
- Data Preparation: The `prepare_data.py` script cleans and preprocesses the raw data from `data/sample_data.csv`.
- Model Training: The `train.py` script fine-tunes the DistilBERT model. The output model and tokenizer are saved locally.
- Model Upload: The `upload_to_hf.py` script pushes the trained model and tokenizer to the Hugging Face Hub, making it accessible to the app.
- Model Evaluation: The `evaluate.py` script evaluates the model's performance on a test set and saves the results (`summary_report.json`, `confusion_matrix.png`, etc.) to `output/results/`.
- App Deployment: The `app.py` script serves as the main application. It loads the model directly from the Hugging Face Hub and provides a user interface for classification and metric visualization.
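The evaluation step above reports a confusion matrix and F1 scores. The sketch below shows how those numbers can be derived from true and predicted label ids using only the standard library. The function names, the report keys, and the toy labels are assumptions for illustration; the project's `evaluate.py` may compute and name them differently.

```python
import json


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_f1(matrix):
    """F1 for each class from the confusion matrix (0.0 when undefined)."""
    scores = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fp = sum(row[k] for row in matrix) - tp   # predicted k, actually another class
        fn = sum(matrix[k]) - tp                  # actually k, predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def summarize(y_true, y_pred, num_classes):
    """Collect accuracy, per-class F1, macro F1, and the matrix in one dict."""
    matrix = confusion_matrix(y_true, y_pred, num_classes)
    f1 = per_class_f1(matrix)
    correct = sum(matrix[k][k] for k in range(num_classes))
    return {
        "accuracy": correct / len(y_true),
        "f1_per_class": f1,
        "macro_f1": sum(f1) / num_classes,
        "confusion_matrix": matrix,
    }


# Toy labels for three classes; real values would come from the test set.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
report = summarize(y_true, y_pred, num_classes=3)
print(json.dumps(report, indent=2))
```

A dictionary like `report` can be written with `json.dump` to a file such as `output/results/summary_report.json`. In practice, scikit-learn's `confusion_matrix` and `f1_score` provide the same quantities.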
The application uses a simplified architecture in which the Streamlit frontend loads and runs the model directly, in the same process. This removes the need for a separate FastAPI backend and is a common pattern for deploying a single-model app on Hugging Face Spaces.
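A minimal sketch of this single-process design follows. It is not the project's `app.py`: the repo id `your-username/customer-review-classifier` is a placeholder, the function names are hypothetical, and the UI is reduced to a text box and a button. It assumes the `transformers` and `streamlit` packages are installed.

```python
MODEL_ID = "your-username/customer-review-classifier"  # placeholder Hub repo id


def load_classifier(model_id: str = MODEL_ID):
    """Fetch the model and tokenizer from the Hub (or reuse the local cache)."""
    from transformers import pipeline  # imported lazily; needs `transformers`

    return pipeline("text-classification", model=model_id)


def classify(text: str, classifier):
    """Return (label, confidence) for a single review."""
    top = classifier(text)[0]
    return top["label"], float(top["score"])


def main():
    """Streamlit entry point: frontend and inference share one process."""
    import streamlit as st

    # cache_resource keeps one model instance alive across Streamlit reruns.
    classifier = st.cache_resource(load_classifier)()

    st.title("Customer Review Classifier")
    review = st.text_area("Review text")
    if st.button("Classify") and review.strip():
        label, score = classify(review, classifier)
        st.write(f"Prediction: {label} (confidence {score:.2f})")
```

In a real app script, `main()` would be called at module level. Keeping `classify` free of Streamlit calls makes it easy to test with a stand-in classifier.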
Customer-Review-Classifier/
├── data/
│   ├── sample_data.csv                  # Raw customer review data
│   └── sample_data_cleaned.csv          # Cleaned and encoded data
├── output/
│   ├── distilbert_model/                # Local copy of the trained model (optional)
│   └── results/
│       ├── summary_report.json          # JSON report of all evaluation metrics
│       ├── f1_scores.png                # F1 scores visualization
│       ├── confidence_distribution.png  # Confidence distribution visualization
│       └── confusion_matrix.png         # Visual representation of model confusion
├── src/
│   ├── prepare_data.py                  # Script for data cleaning and encoding
│   ├── train.py                         # Script for fine-tuning the model
│   ├── evaluate.py                      # Script for model evaluation and reporting
│   ├── app.py                           # The main Streamlit application
│   └── upload_to_hf.py                  # Script to push the model to the Hugging Face Hub
├── requirements.txt                     # Project dependencies
├── .gitignore                           # Specifies files to ignore in Git (e.g., large models)
└── README.md
- Install dependencies: `pip install -r requirements.txt`
- Prepare data: `python src/prepare_data.py`
- Train model: `python src/train.py`
- Evaluate model: `python src/evaluate.py`
- Upload model to Hugging Face Hub: run `huggingface-cli login`, then `python src/upload_to_hf.py`
- Start the app: `streamlit run src/app.py`
- Access the app: open http://localhost:8501 in your browser after running the command above.
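For the upload step, the following sketch shows one way a script could push a saved model folder to the Hub with the `huggingface_hub` library. It is an assumption about the approach, not the contents of `upload_to_hf.py`: the function names and the pre-upload file check are hypothetical, and the repo id is supplied by the caller.

```python
from pathlib import Path

# Two of the files that save_pretrained() writes for a model and its tokenizer.
REQUIRED_FILES = ("config.json", "tokenizer_config.json")


def missing_files(model_dir):
    """Return the required files that are absent from model_dir."""
    folder = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


def upload_model(model_dir, repo_id):
    """Create the Hub repo if needed, then upload the whole folder."""
    # Lazy import; relies on the token stored by `huggingface-cli login`.
    from huggingface_hub import HfApi

    missing = missing_files(model_dir)
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")

    api = HfApi()
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    api.upload_folder(folder_path=str(model_dir), repo_id=repo_id, repo_type="model")
```

A call such as `upload_model("output/distilbert_model", "<user>/<repo>")` would upload the local copy. Checking for the expected files first avoids publishing an empty or partial folder.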
- Large model files are not tracked in Git. See `.gitignore`.
- For custom categories, update `cat_map` in `app.py` and `api.py`.
- For issues or improvements, open an issue or pull request.
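As a rough picture of what customizing the categories involves, the sketch below assumes `cat_map` is a dictionary from integer label ids to display names. That structure, the category names, and the `decode_label` helper are all assumptions; check the actual definition in `app.py` before editing.

```python
# Hypothetical category names; the real cat_map defines the project's own.
cat_map = {
    0: "Billing",
    1: "Delivery",
    2: "Product Quality",
}


def decode_label(raw_label, mapping=cat_map):
    """Turn a label id (int) or a default 'LABEL_<id>' string into a category name."""
    if isinstance(raw_label, str):
        # Transformers names labels 'LABEL_0', 'LABEL_1', ... when id2label is not set.
        raw_label = int(raw_label.rsplit("_", 1)[-1])
    return mapping.get(raw_label, "Unknown")
```

If the mapping changes, the ids must still match the label encoding used when the model was trained; otherwise predictions will be shown under the wrong category.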
Author: Sayantan Ghosh (Lazycoder03)