Inspiration
We were inspired by the paper associated with the original dataset, which predicted quality with a different approach: training separate models on the red and white wine data.
What it does
Our classification model predicts wine quality from 12 input features: 11 physicochemical properties plus color. Quality is classified as good, ok, or bad based on rating intervals (7-10 is good, 5-6 is ok, 0-4 is bad).
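For concreteness, the binning can be written as a one-liner with pandas; this is a minimal sketch, and the helper name is our own rather than what the notebook necessarily uses:

```python
import pandas as pd

def bin_quality(quality: pd.Series) -> pd.Series:
    """Map integer ratings 0-10 to the three classes described above."""
    # (-1, 4] -> bad, (4, 6] -> ok, (6, 10] -> good
    return pd.cut(quality, bins=[-1, 4, 6, 10], labels=["bad", "ok", "good"])
```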
How we built it
We built a stacking ensemble of several scikit-learn classifiers, with XGBoost added as one of the base models.
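A rough sketch of what such a stack looks like is below; the specific base estimators, hyperparameters, and meta-learner are illustrative placeholders, not our exact final configuration.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
        ("xgb", XGBClassifier(eval_metric="mlogloss", random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
    cv=5,  # base models produce out-of-fold predictions for the meta-learner
)
# stack.fit(X_train, y_train); stack.score(X_test, y_test)
```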
Challenges we ran into
There were quite a few outliers: we removed them in columns where they were rare, and capped them in the rest to preserve data (see the sketch below). Training was also harder because we combined the red and white wine quality datasets and used three classes instead of two; at first we even approached it as a regression task, which was harder still. We also didn't have time for as much hyperparameter tuning as we would have liked. Lastly, we almost ran out of time to submit the video, which is why it is a bit hectic and chipmunky.
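As a sketch of the two outlier strategies, assuming the common 1.5×IQR rule (the threshold and helper names are illustrative):

```python
import pandas as pd

def iqr_bounds(s: pd.Series, k: float = 1.5):
    """Return the (lower, upper) fences of the k*IQR rule."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Remove rows whose value falls outside the fences (few-outlier columns)."""
    lo, hi = iqr_bounds(df[col])
    return df[df[col].between(lo, hi)]

def cap_outliers(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Clip values to the fences instead of dropping rows (other columns)."""
    out = df.copy()
    lo, hi = iqr_bounds(out[col])
    out[col] = out[col].clip(lo, hi)
    return out
```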
Accomplishments that we're proud of
We achieved an accuracy of 84% on the test data (it varies slightly from run to run due to the randomness in some models) and implemented an interactive widget at the end for user input.
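The widget idea, in miniature, might look like the following with ipywidgets; the single slider and the trained `stack` model are stand-ins for the full 12-feature form.

```python
import ipywidgets as widgets

def predict(alcohol: float):
    # In the real notebook, a full 12-feature row would be assembled here
    # and passed to the trained classifier, e.g. stack.predict(row).
    print(f"alcohol={alcohol:.1f} -> (model prediction shown here)")

widgets.interact(predict, alcohol=widgets.FloatSlider(min=8.0, max=15.0, value=10.5))
```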
What we learned
XGBoost is pretty powerful by itself, but stacking still helps mitigate overfitting. We also learned that it is important to consider whether a discrete numeric target calls for a regression model or a classification model, and if it is the latter, how many classes we should assign.
What's next for Wine Quality Classifier
We would have liked to build a proper front end rather than embedded widgets, but we lacked the time to learn unfamiliar technologies. We also had ideas for integrating an AI API, but lacked the resources to do so.
Built With
- github
- ipywidgets
- matplotlib
- numpy
- pandas
- python
- scikit-learn
- seaborn
- singlestore
- xgboost