Add Model Ensembling Tutorial #640
Conversation
I'm looking for feedback on the implementation and structure of this tutorial. Let me know how we can improve it. Thanks!
Thanks, Darryl, for the tutorial. My first suggestion is to start simple, without needing to train any model for the ensembling step, for example with a simple voting mechanism that identifies a top-K ranked list from the outputs of two models (BPR and WMF). We can then use that as a baseline for more sophisticated ensembling techniques.
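To make the voting idea concrete, here is a minimal sketch of a Borda count over two top-K lists. It is not the tutorial's code: the function name `borda_count`, the item ids, and the two example lists are hypothetical, and the lists stand in for recommendations already produced by trained BPR and WMF models.

```python
from collections import defaultdict


def borda_count(ranked_lists, k):
    """Merge several ranked item lists into one top-k list by Borda count."""
    scores = defaultdict(int)
    for ranking in ranked_lists:
        n = len(ranking)
        for position, item in enumerate(ranking):
            # The first item of a list of length n earns n points, the last earns 1.
            scores[item] += n - position
    # Sort by descending score; break ties by item id so the result is deterministic.
    ordered = sorted(scores, key=lambda item: (-scores[item], item))
    return ordered[:k]


# Hypothetical top-5 lists for one user from two already-trained models.
bpr_top = ["i3", "i1", "i7", "i2", "i9"]
wmf_top = ["i1", "i3", "i4", "i7", "i8"]

ensemble_top = borda_count([bpr_top, wmf_top], k=3)
print(ensemble_top)
```

Items that appear in only one list still collect points from that list, so the merged ranking can surface items that a single model ranked highly. No additional training is involved, which is what makes this a convenient baseline.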
The simplest bagging approach could be as follows:
For a more sophisticated approach, think about this as a meta-learning problem. We treat the predictions of M base models as input features for a meta-model that learns on top of them. This meta-model could be any ML model: linear regression, random forests, etc. We can structure this part to be flexible so that anyone can experiment with other libraries (e.g., scikit-learn, lightgbm, xgboost).
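The meta-learning setup can be sketched with scikit-learn as below. This is an illustration under stated assumptions rather than the tutorial's implementation: the base-model scores are synthetic (a noisy copy of a made-up target, standing in for the predictions of M = 2 trained models), and the variable names are hypothetical. Any regressor exposing `fit` and `predict` could be placed in `meta_models`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_pairs = 200

# Synthetic ground truth for (user, item) pairs; an assumption for illustration only.
target = rng.uniform(1.0, 5.0, size=n_pairs)

# Each column holds one base model's predicted score for every pair (M = 2 here).
base_scores = np.column_stack([
    target + rng.normal(0.0, 0.5, size=n_pairs),  # stand-in for model 1
    target + rng.normal(0.0, 1.0, size=n_pairs),  # stand-in for model 2
])

# Hold out the last 50 pairs so the meta-model is checked on unseen data.
train, test = slice(0, 150), slice(150, None)

# The meta-model is interchangeable: anything with fit/predict works.
meta_models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

predictions = {}
for name, model in meta_models.items():
    # Base-model predictions are the features; the observed target is the label.
    model.fit(base_scores[train], target[train])
    predictions[name] = model.predict(base_scores[test])
```

Keeping the feature matrix as a plain NumPy array of shape (pairs, M) is what lets the meta-model be swapped for one from another library without changing the surrounding code.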
Thanks Darryl. This looks great! Here are some comments:
qtuantruong
left a comment
LGTM. Let’s merge when ready.
* add model ensembling tutorial
* update ensembling notebook
* refractor codes
* Revamped tutorial to include simple borda count
* Restructured model ensembling tutorial
* Update tutorial result representation
* Update tutorial
* Update tutorial
* Update model ensembling tutorial based on feedback
* Update linear regression/random forest inference data set
* WIP Initial experimental calculation
* preliminary calculation of experimental comparison
* Update recall@K and precision@K evaluation
* Add borda count, enhanced wmf to comparison
* Revised model ensembling tutorial
* Fix bug
* Revised markdown and description of tutorial
* Enhance tutorial content
* Simplify introduction
* Enhance tutorial
* Update tutorial
* Updated model ensembling tutorial
* Optimize inference code
* Optimize inference code

---------

Co-authored-by: tqtg <[email protected]>
Description
This PR adds a model ensembling tutorial. The tutorial uses scikit-learn to perform ensembling on top of models trained with Cornac.
Related Issues
Checklist:
* README.md (if you are adding a new model).
* examples/README.md (if you are adding a new example).
* datasets/README.md (if you are adding a new dataset).