{"id":11687,"date":"2021-01-12T15:45:57","date_gmt":"2021-01-12T15:45:57","guid":{"rendered":"https:\/\/www.askpython.com\/?p=11687"},"modified":"2022-08-06T13:29:28","modified_gmt":"2022-08-06T13:29:28","slug":"roc-curves-machine-learning","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python\/examples\/roc-curves-machine-learning","title":{"rendered":"ROC curves in Machine Learning"},"content":{"rendered":"\n<p>The ROC curve stands for <strong>Receiver Operating Characteristic curve<\/strong>. ROC curves display the performance of a classification model across its classification thresholds. <\/p>\n\n\n\n<p>The ROC curve tells us how good the model is at distinguishing between the given classes, based on its predicted probabilities.<\/p>\n\n\n\n<p>In this article, we will look at what ROC curves and AUC are, and work through a binary classification problem to see how to plot the ROC curve for a model. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction to ROC Curves<\/h2>\n\n\n\n<p>Suppose we have a <a href=\"https:\/\/www.askpython.com\/python\/examples\/logistic-regression-from-scratch\" class=\"rank-math-link\">Logistic regression model<\/a> that classifies an event as True or False. The default threshold value for classifying a point as True or False is 0.5 in Logistic regression, but we can change this threshold value to suit our needs.<\/p>\n\n\n\n<p>So, the ROC curve is a plot of the false positive rate (FPR) (x-axis) vs. 
the true positive rate (TPR) (y-axis) for a number of different candidate threshold values between 0.0 and 1.0.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Understanding TPR and FPR<\/h2>\n\n\n\n<p>As mentioned, a ROC curve depends on the True Positive Rate and the False Positive Rate. Let&#8217;s see what they are.<\/p>\n\n\n\n<p><strong>True Positive Rate:<\/strong> The true positive rate is calculated as the number of true positives divided by the sum of the number of true positives and the number of false negatives. It is also known as sensitivity or recall.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nTrue Positive Rate = True Positives \/ (True Positives + False Negatives)\n<\/pre><\/div>\n\n\n<p> <strong>False Positive Rate:<\/strong> The false positive rate is calculated as the number of false positives divided by the sum of the number of false positives and the number of true negatives.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nFalse Positive Rate = False Positives \/ (False Positives + True Negatives)\n<\/pre><\/div>\n\n\n<p>Each threshold value generally gives a different pair of TPR and FPR values, and plotting these pairs traces out the ROC curve. 
<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"526\" height=\"409\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/12\/ROC-Curve.jpeg\" alt=\"ROC Curve\" class=\"wp-image-11889\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/12\/ROC-Curve.jpeg 526w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/12\/ROC-Curve-300x233.jpeg 300w\" sizes=\"auto, (max-width: 526px) 100vw, 526px\" \/><figcaption><strong>ROC Curve<\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Why do we use ROC Curves?<\/h2>\n\n\n\n<p><strong>ROC Curves are useful for the following reasons:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>The <strong><em>area under the curve (AUC)<\/em><\/strong> can be used as an indicator of the performance of the model.<\/li><li>Different models can be compared against each other based on their ROC curves.<\/li><\/ul>\n\n\n\n<p><strong>To get the best model, we want to increase our True Positive Rate and reduce our False Positive Rate (ideally TPR = 1, FPR = 0). <\/strong><\/p>\n\n\n\n<p>A model that reaches this point separates the classes perfectly. Models whose ROC curve lies above the diagonal are known as skillful models. In real life, perfect separation is rarely achieved. <\/p>\n\n\n\n<p> A model with no skill is represented by a diagonal line from the bottom left of the plot to the top right (the blue line in the above figure). Such models have equal TPR and FPR at every threshold value and an AUC of 0.5. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Plotting ROC Curves in Python<\/h2>\n\n\n\n<p>Let&#8217;s now build a binary classifier and plot its ROC curve to better understand the process.<\/p>\n\n\n\n<p>We will use a Logistic Regression model for this example. 
We&#8217;re working with three important libraries here &#8211; <a href=\"https:\/\/www.askpython.com\/python-modules\/matplotlib\/python-matplotlib\" class=\"rank-math-link\">Matplotlib<\/a>, <a href=\"https:\/\/www.askpython.com\/python-modules\/numpy\/python-numpy-module\" class=\"rank-math-link\">Numpy<\/a>, and sklearn.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n#Importing Required Modules\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import make_classification\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import roc_curve\n\n#Creating a Dataset\nX, label = make_classification(n_samples=500, n_classes=2, weights=&#x5B;1,1], random_state=100)\n\n#Splitting the data into train and test sets\nX_train, X_test, y_train, y_test = train_test_split(X, label, test_size=0.3, random_state=1)\n\n#Creating the model object and fitting it on the training data\nmodel = LogisticRegression()\nmodel.fit(X_train, y_train)\n\n#Predicting class probabilities for the test set\nprobs = model.predict_proba(X_test)\n\n#Keeping only the positive class probabilities\nprobs = probs&#x5B;:, 1]\n\n#Calculating the FPR and TPR from the true test labels\nfpr, tpr, thresholds = roc_curve(y_test, probs)\n\n#Plotting the figure\nplt.figure(figsize = (10,6))\nplt.plot(fpr, tpr, color=&#039;red&#039;, label=&#039;ROC&#039;)\nplt.plot(&#x5B;0, 1], &#x5B;0, 1], color=&#039;darkblue&#039;, linestyle=&#039;--&#039;)\nplt.xlabel(&#039;False Positive Rate&#039;)\nplt.ylabel(&#039;True Positive Rate&#039;)\nplt.title(&#039;Receiver Operating Characteristic Curve&#039;)\nplt.legend()\nplt.show()\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1536\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/12\/ROC-curve-of-Logistic-Regression-Model-scaled.jpeg\" alt=\"ROC Curve Of Logistic Regression 
Model\" class=\"wp-image-11901\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/12\/ROC-curve-of-Logistic-Regression-Model-scaled.jpeg 2560w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/12\/ROC-curve-of-Logistic-Regression-Model-300x180.jpeg 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/12\/ROC-curve-of-Logistic-Regression-Model-1024x614.jpeg 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/12\/ROC-curve-of-Logistic-Regression-Model-768x461.jpeg 768w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/12\/ROC-curve-of-Logistic-Regression-Model-1536x922.jpeg 1536w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/12\/ROC-curve-of-Logistic-Regression-Model-2048x1229.jpeg 2048w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/><figcaption>ROC Curve Of Logistic Regression Model<\/figcaption><\/figure><\/div>\n\n\n\n<p>The sklearn module provides us with the <code>roc_curve<\/code> function, which returns the False Positive Rates, the True Positive Rates, and the thresholds at which they were computed. <\/p>\n\n\n\n<p>This function takes in the actual class labels and the array of predicted probabilities for the positive class, which we calculate using the <code>.predict_proba()<\/code> method of the <code>LogisticRegression<\/code> class.<\/p>\n\n\n\n<p>There you go, now we know how to plot the ROC curve for a binary classification model.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>In this article, we learned about ROC curves and why they are important. We also got an idea of True Positive Rates and False Positive Rates and how ROC curves depend on them. Finally, we looked at the code to plot the ROC curve for a Logistic Regression model.<\/p>\n\n\n\n<p>Happy Learning!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The ROC curve stands for Receiver Operating Characteristic curve. ROC curves display the performance of a classification model. 
ROC tells us how good the model is for distinguishing between the given classes, in terms of the predicted probability. In this article, we will understand ROC curves, what is AUC, and implement a binary classification problem [&hellip;]<\/p>\n","protected":false},"author":16,"featured_media":11905,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-11687","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-examples"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/11687","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=11687"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/11687\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/11905"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=11687"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=11687"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=11687"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}