{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,27]],"date-time":"2026-04-27T13:19:47Z","timestamp":1777295987382,"version":"3.51.4"},"reference-count":73,"publisher":"Wiley","issue":"5","license":[{"start":{"date-parts":[[2022,8,2]],"date-time":"2022-08-02T00:00:00Z","timestamp":1659398400000},"content-version":"am","delay-in-days":365,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#am"},{"start":{"date-parts":[[2021,8,2]],"date-time":"2021-08-02T00:00:00Z","timestamp":1627862400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["DMS\u20101554804"],"award-info":[{"award-number":["DMS\u20101554804"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Statistical Analysis"],"published-print":{"date-parts":[[2021,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>A common issue for classification in scientific research and industry is the existence of imbalanced classes. When sample sizes of different classes are imbalanced in training data, naively implementing a classification method often leads to unsatisfactory prediction results on test data. Multiple resampling techniques have been proposed to address the class imbalance issues. Yet, there is no general guidance on when to use each technique. In this article, we provide a paradigm\u2010based review of the common resampling techniques for binary classification under imbalanced class sizes. The paradigms we consider include the classical paradigm that minimizes the overall classification error, the cost\u2010sensitive learning paradigm that minimizes a cost\u2010adjusted weighted type I and type II errors, and the Neyman\u2013Pearson paradigm that minimizes the type II error subject to a type I error constraint. Under each paradigm, we investigate the combination of the resampling techniques and a few state\u2010of\u2010the\u2010art classification methods. For each pair of resampling techniques and classification methods, we use simulation studies and a real dataset on credit card fraud to study the performance under different evaluation metrics. From these extensive numerical experiments, we demonstrate under each classification paradigm, the complex dynamics among resampling techniques, base classification methods, evaluation metrics, and imbalance ratios. We also summarize a few takeaway messages regarding the choices of resampling techniques and base classification methods, which could be helpful for practitioners.<\/jats:p>","DOI":"10.1002\/sam.11538","type":"journal-article","created":{"date-parts":[[2021,8,2]],"date-time":"2021-08-02T14:47:49Z","timestamp":1627915669000},"page":"383-406","update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":38,"title":["Imbalanced classification: A paradigm\u2010based review"],"prefix":"10.1002","volume":"14","author":[{"given":"Yang","family":"Feng","sequence":"first","affiliation":[{"name":"Department of Biostatistics School of Global Public Health, New York University New York New York USA"}]},{"given":"Min","family":"Zhou","sequence":"additional","affiliation":[{"name":"Division of Science and Technology Beijing Normal University\u2010Hong Kong Baptist University United International College Zhuhai China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8534-3827","authenticated-orcid":false,"given":"Xin","family":"Tong","sequence":"additional","affiliation":[{"name":"Department of Data Sciences and Operations Marshall School of Business, University of Southern California Los Angeles California USA"}]}],"member":"311","published-online":{"date-parts":[[2021,8,2]]},"reference":[{"key":"e_1_2_10_2_1","doi-asserted-by":"publisher","DOI":"10.1142\/S021964922040016X"},{"issue":"3","key":"e_1_2_10_3_1","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1080\/00031305.1992.10475879","article-title":"An introduction to kernel and nearest\u2010neighbor nonparametric regression","volume":"46","author":"Altman N. S.","year":"1992","journal-title":"Am. Stat."},{"key":"e_1_2_10_4_1","doi-asserted-by":"publisher","DOI":"10.5539\/mas.v14n7p92"},{"key":"e_1_2_10_5_1","first-page":"89","article-title":"Efficient surface finish defect detection using reduced rank spline smoothers and probabilistic classifiers","volume":"18","author":"Arnqvist N. P.","year":"2021","journal-title":"Econom. Stat."},{"key":"e_1_2_10_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ajhg.2014.05.003"},{"key":"e_1_2_10_7_1","doi-asserted-by":"crossref","unstructured":"J. P.Bradford C.Kunz R.Kohavi C.Brunk andC. E.Brodley Pruning decision trees with misclassification costs European Conference on Machine Learning Springer 1998 pp. 131\u2013136.","DOI":"10.1007\/BFb0026682"},{"key":"e_1_2_10_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0031-3203(96)00142-2"},{"key":"e_1_2_10_9_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"e_1_2_10_10_1","unstructured":"A.Cannon J.Howse D.Hush andC.Scovel Learning with the Neyman\u2013Pearson and min\u2013max criteria Los Alamos National Laboratory Tech. Rep. LA\u2010UR 02\u20102951 2002."},{"key":"e_1_2_10_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.compmedimag.2013.12.003"},{"key":"e_1_2_10_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2013.2246188"},{"key":"e_1_2_10_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2013.05.059"},{"key":"e_1_2_10_14_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.953"},{"key":"e_1_2_10_15_1","doi-asserted-by":"crossref","unstructured":"T.ChenandC.Guestrin XGBoost: A scalable tree boosting system Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining ACM 2016 pp. 785\u2013794.","DOI":"10.1145\/2939672.2939785"},{"key":"e_1_2_10_16_1","unstructured":"T.Chen T.He M.Benesty V.Khotilovich Y.Tang H.Cho K.Chen R.Mitchell I.Cano T.Zhou M.Li J.Xie M.Lin Y.Geng andY.Li xgboost: Extreme gradient boosting. R package version 0.90.0.2 2019 available athttps:\/\/CRAN.R\u2010project.org\/package=xgboost."},{"key":"e_1_2_10_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11517-016-1482-0"},{"key":"e_1_2_10_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00994018"},{"key":"e_1_2_10_19_1","doi-asserted-by":"crossref","unstructured":"J.DavisandM.Goadrich The relationship between precision\u2013recall and ROC curves Proceedings of the 23rd International Conference on Machine Learning ACM 2006 pp. 233\u2013240.","DOI":"10.1145\/1143844.1143874"},{"key":"e_1_2_10_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2015.04.022"},{"key":"e_1_2_10_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/312129.312220"},{"key":"e_1_2_10_22_1","unstructured":"C.Elkan The foundations of cost\u2010sensitive learning International Joint Conference on Artificial Intelligence vol. 17 Lawrence Erlbaum Associates Ltd 2001 pp. 973\u2013978."},{"key":"e_1_2_10_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13748-012-0027-5"},{"key":"e_1_2_10_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2011.06.013"},{"key":"e_1_2_10_25_1","doi-asserted-by":"crossref","unstructured":"M.Goadrich L.Oliphant andJ.Shavlik Learning ensembles of first\u2010order clauses for recall\u2010precision curves: A case study in biomedical information extraction International Conference on Inductive Logic Programming Springer 2004 pp. 98\u2013115.","DOI":"10.1007\/978-3-540-30109-7_11"},{"key":"e_1_2_10_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2016.12.035"},{"key":"e_1_2_10_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/11538059_91"},{"key":"e_1_2_10_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-84858-7"},{"key":"e_1_2_10_29_1","unstructured":"H.He Y.Bai E. A.Garcia andS.Li ADASYN: Adaptive synthetic sampling approach for imbalanced learning 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence) IEEE 2008 pp. 1322\u20131328."},{"key":"e_1_2_10_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2005.50"},{"key":"e_1_2_10_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4614-7138-7"},{"key":"e_1_2_10_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-22147-7"},{"key":"e_1_2_10_33_1","first-page":"3","article-title":"Supervised machine learning: A review of classification techniques","volume":"160","author":"Kotsiantis S. B.","year":"2007","journal-title":"Emerg. Artif. Intell. Appl. Comput. Eng."},{"key":"e_1_2_10_34_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007452223027"},{"key":"e_1_2_10_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4614-6849-3"},{"key":"e_1_2_10_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13748-014-0045-6"},{"issue":"3","key":"e_1_2_10_37_1","first-page":"18","article-title":"Classification and regression by randomforest","volume":"2","author":"Liaw A.","year":"2002","journal-title":"R News"},{"key":"e_1_2_10_38_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1012406528296"},{"key":"e_1_2_10_39_1","doi-asserted-by":"crossref","unstructured":"C. X.Ling Q.Yang J.Wang andS.Zhang Decision trees with minimal costs Proceedings of the Twenty\u2010First International Conference on Machine Learning ACM 2004 p. 69.","DOI":"10.1145\/1015330.1015369"},{"key":"e_1_2_10_40_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.fss.2014.01.015"},{"key":"e_1_2_10_41_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2013.07.007"},{"key":"e_1_2_10_42_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2011.12.043"},{"key":"e_1_2_10_43_1","volume-title":"Discriminant analysis and statistical pattern recognition","author":"McLachlan G. J.","year":"2004"},{"key":"e_1_2_10_44_1","unstructured":"D.Meyer E.Dimitriadou K.Hornik A.Weingessel andF.Leisch e1071: Misc functions of the Department of Statistics Probability Theory Group (formerly: E1071) TU Wien. R package version 1.7\u20102 2019 available athttps:\/\/CRAN.R\u2010project.org\/package=e1071."},{"key":"e_1_2_10_45_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2015.10.031"},{"key":"e_1_2_10_46_1","doi-asserted-by":"publisher","DOI":"10.2307\/2344614"},{"issue":"4","key":"e_1_2_10_47_1","first-page":"761","article-title":"Infinitely imbalanced logistic regression","volume":"8","author":"Owen A. B.","year":"2007","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_10_48_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1541-0420.2008.01017.x"},{"key":"e_1_2_10_49_1","doi-asserted-by":"publisher","DOI":"10.1198\/jasa.2010.tm08487"},{"key":"e_1_2_10_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-7908-2628-9_14"},{"key":"e_1_2_10_51_1","first-page":"2831","article-title":"Neyman\u2013Pearson classification, convexity and stochastic constraints","volume":"12","author":"Rigollet P.","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_10_52_1","unstructured":"I.Rish An empirical study of the naive Bayes classifier IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence vol. 3 2001 pp. 41\u201346."},{"key":"e_1_2_10_53_1","doi-asserted-by":"crossref","unstructured":"D. E.Rumelhart G. E.Hinton andR. J.Williams Learning internal representations by error propagation California Univ San Diego La Jolla Inst for Cognitive Science 1985.","DOI":"10.21236\/ADA164453"},{"key":"e_1_2_10_54_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2014.08.051"},{"key":"e_1_2_10_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/21.97458"},{"key":"e_1_2_10_56_1","first-page":"868","article-title":"Discriminative training of Markov logic networks","volume":"5","author":"Singla P.","year":"2005","journal-title":"AAAI"},{"key":"e_1_2_10_57_1","volume-title":"smotefamily: A collection of oversampling techniques for class imbalance problem based on SMOTE. R package version 1.3.1","author":"Siriseriwan W.","year":"2019"},{"key":"e_1_2_10_58_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2007.04.009"},{"key":"e_1_2_10_59_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001409007326"},{"key":"e_1_2_10_60_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2014.11.014"},{"issue":"1","key":"e_1_2_10_61_1","first-page":"3011","article-title":"A plug\u2010in approach to Neyman\u2013Pearson classification","volume":"14","author":"Tong X.","year":"2013","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_10_62_1","doi-asserted-by":"publisher","DOI":"10.1126\/sciadv.aao1659"},{"key":"e_1_2_10_63_1","doi-asserted-by":"publisher","DOI":"10.1002\/wics.1376"},{"key":"e_1_2_10_64_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.120"},{"key":"e_1_2_10_65_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11634-014-0167-5"},{"key":"e_1_2_10_66_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-009-0198-y"},{"key":"e_1_2_10_67_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11280-012-0178-0"},{"key":"e_1_2_10_68_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2008.06.108"},{"key":"e_1_2_10_69_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4020-6264-3_67"},{"key":"e_1_2_10_70_1","first-page":"435","article-title":"Cost\u2010sensitive learning by cost\u2010proportionate example weighting","volume":"3","author":"Zadrozny B.","year":"2003","journal-title":"ICDM"},{"key":"e_1_2_10_71_1","doi-asserted-by":"crossref","unstructured":"C.Zhang W.Gao J.Song andJ.Jiang An imbalanced data classification algorithm of improved autoencoder neural network 2016 Eighth International Conference on Advanced Computational Intelligence (ICACI) IEEE 2016 pp. 95\u201399.","DOI":"10.1109\/ICACI.2016.7449810"},{"key":"e_1_2_10_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007730.1007741"},{"key":"e_1_2_10_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2006.17"},{"key":"e_1_2_10_74_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.bdr.2015.12.001"}],"container-title":["Statistical Analysis and Data Mining: The ASA Data Science Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/sam.11538","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/full-xml\/10.1002\/sam.11538","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/am-pdf\/10.1002\/sam.11538","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/sam.11538","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,5]],"date-time":"2024-09-05T17:00:59Z","timestamp":1725555659000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/sam.11538"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,2]]},"references-count":73,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2021,10]]}},"alternative-id":["10.1002\/sam.11538"],"URL":"https:\/\/doi.org\/10.1002\/sam.11538","archive":["Portico"],"relation":{},"ISSN":["1932-1864","1932-1872"],"issn-type":[{"value":"1932-1864","type":"print"},{"value":"1932-1872","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,8,2]]},"assertion":[{"value":"2020-08-12","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-06-30","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-08-02","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}