Inspiration
We were inspired by the paper associated with the original dataset, which predicted quality with a different approach: training separate models on the red and white wine data.
What it does
Our classification model predicts wine quality from 12 input features: 11 physicochemical properties plus color. Quality is classified as good, ok, or bad based on rating intervals (7-10 is good, 5-6 is ok, 0-4 is bad).
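For concreteness, the binning can be written as a one-liner with pandas; this is a minimal sketch, and the helper name is our own rather than what the notebook necessarily uses:

```python
import pandas as pd

def bin_quality(quality: pd.Series) -> pd.Series:
    """Map integer ratings 0-10 to the three classes described above."""
    # (-1, 4] -> bad, (4, 6] -> ok, (6, 10] -> good
    return pd.cut(quality, bins=[-1, 4, 6, 10], labels=["bad", "ok", "good"])
```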
How we built it
We built a stacking ensemble of several scikit-learn classifiers, with XGBoost added as one of the base models.
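A rough sketch of what such a stack looks like is below; the specific base estimators, hyperparameters, and meta-learner are illustrative placeholders, not our exact final configuration.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
        ("xgb", XGBClassifier(eval_metric="mlogloss", random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
    cv=5,  # base models produce out-of-fold predictions for the meta-learner
)
# stack.fit(X_train, y_train); stack.score(X_test, y_test)
```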
Challenges we ran into
There were quite a few outliers: we removed them in columns where they were rare, and capped them in the rest to preserve data (see the sketch below). Training was also harder because we combined the red and white wine quality datasets and used three classes instead of two; at first we even approached it as a regression task, which was harder still. We also didn't have time for as much hyperparameter tuning as we would have liked. Lastly, we almost ran out of time to submit the video, which is why it is a bit hectic and chipmunky.
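As a sketch of the two outlier strategies, assuming the common 1.5×IQR rule (the threshold and helper names are illustrative):

```python
import pandas as pd

def iqr_bounds(s: pd.Series, k: float = 1.5):
    """Return the (lower, upper) fences of the k*IQR rule."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Remove rows whose value falls outside the fences (few-outlier columns)."""
    lo, hi = iqr_bounds(df[col])
    return df[df[col].between(lo, hi)]

def cap_outliers(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Clip values to the fences instead of dropping rows (other columns)."""
    out = df.copy()
    lo, hi = iqr_bounds(out[col])
    out[col] = out[col].clip(lo, hi)
    return out
```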
Accomplishments that we're proud of
We achieved an accuracy of 84% on the test data (it varies slightly from run to run due to the randomness in some models) and implemented an interactive widget at the end for user input.
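The widget idea, in miniature, might look like the following with ipywidgets; the single slider and the trained `stack` model are stand-ins for the full 12-feature form.

```python
import ipywidgets as widgets

def predict(alcohol: float):
    # In the real notebook, a full 12-feature row would be assembled here
    # and passed to the trained classifier, e.g. stack.predict(row).
    print(f"alcohol={alcohol:.1f} -> (model prediction shown here)")

widgets.interact(predict, alcohol=widgets.FloatSlider(min=8.0, max=15.0, value=10.5))
```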
What we learned
XGBoost is pretty powerful by itself, but stacking still helps mitigate overfitting. We also learned that it is important to consider whether a discrete numeric target calls for a regression model or a classification model, and if it is the latter, how many classes we should assign.
What's next for Wine Quality Classifier
We would have liked to build a proper front end rather than embedded widgets, but we lacked the time to learn unfamiliar technologies. We also had ideas for integrating an AI API, but lacked the resources to do so.
Built With
- github
- ipywidgets
- matplotlib
- numpy
- pandas
- python
- scikit-learn
- seaborn
- singlestore
- xgboost