{"id":11006,"date":"2020-11-30T14:57:41","date_gmt":"2020-11-30T14:57:41","guid":{"rendered":"https:\/\/www.askpython.com\/?p=11006"},"modified":"2022-08-06T13:15:20","modified_gmt":"2022-08-06T13:15:20","slug":"logistic-regression","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python\/examples\/logistic-regression","title":{"rendered":"Logistic Regression &#8211; Simple Practical Implementation"},"content":{"rendered":"\n<p>Hello, readers! In this article, we will focus on a practical implementation of <strong>Logistic Regression<\/strong> in Python.<\/p>\n\n\n\n<p>In our Machine Learning with Python series, we have already covered several Supervised ML models such as <a aria-label=\"Linear Regression (opens in a new tab)\" rel=\"noreferrer noopener\" href=\"https:\/\/www.askpython.com\/python\/examples\/linear-regression-in-python\" target=\"_blank\" class=\"rank-math-link\">Linear Regression<\/a>, <a aria-label=\"K Nearest Neighbor (opens in a new tab)\" href=\"https:\/\/www.askpython.com\/python\/examples\/knn-in-python\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"rank-math-link\">K Nearest Neighbor<\/a>, etc. Today, we will focus on Logistic Regression and use it to solve a real-life problem! Excited? Yea! \ud83d\ude42<\/p>\n\n\n\n<p>Let us begin!<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">First, what is Logistic Regression?<\/h2>\n\n\n\n<p>Before beginning with Logistic Regression, let us understand where we need it.<\/p>\n\n\n\n<p>As we all know, Supervised Machine Learning models work on continuous as well as categorical data values. Of these, <a href=\"https:\/\/www.askpython.com\/python\/examples\/label-encoding\" class=\"rank-math-link\">categorical data values<\/a> are the data elements that comprise groups and categories. 
<\/p>\n\n\n\n<p>So, when the dependent variable is categorical, Logistic Regression comes into the picture for making predictions.<\/p>\n\n\n\n<p><strong>Logistic Regression<\/strong> is a Supervised Machine Learning model which works on <strong>binary<\/strong> or <strong>multi-class categorical variables<\/strong> as the dependent variable. That is, it is a <strong>Classification algorithm<\/strong> which assigns each observation to one of the binary or multi-class labels.<\/p>\n\n\n\n<p>For example, if a problem asks us to predict an outcome as &#8216;Yes&#8217; or &#8216;No&#8217;, we can use Logistic Regression to classify the dependent variable and determine the outcome.<\/p>\n\n\n\n<p>Logistic Regression makes use of the <strong>logit function<\/strong> to fit the training data to the binary dependent variable. The logit is the logarithm of the <strong>odds<\/strong>, p\/(1 - p), where p is the <strong>probability<\/strong> of the positive outcome. The model treats these log-odds as a linear function of the predictors and uses them to predict the binary response variable.<\/p>\n\n\n\n<p>Let us now have a look at the implementation of Logistic Regression.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Practical Approach &#8211; Logistic Regression<\/h2>\n\n\n\n<p>In this article, we will use the <strong>Bank Loan Defaulter problem<\/strong>, wherein we are expected to predict whether a customer is a loan defaulter or not.<\/p>\n\n\n\n<p>You can find the dataset <strong><a href=\"https:\/\/github.com\/Safa1615\/Dataset--loan\/blob\/main\/bank-loan.csv\" target=\"_blank\" aria-label=\"here (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"rank-math-link\">here<\/a><\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">1. 
Loading the dataset<\/h3>\n\n\n\n<p>As the first step, we need to load the dataset into the environment using the <a href=\"https:\/\/www.askpython.com\/python-modules\/python-csv-module\" target=\"_blank\" aria-label=\"pandas.read_csv() (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"rank-math-link\">pandas.read_csv()<\/a> function.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\nimport numpy as np\nloan = pd.read_csv(&quot;bank-loan.csv&quot;) # dataset\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">2. Sampling of the dataset<\/h3>\n\n\n\n<p>Having loaded the dataset, let us now split it into training and testing datasets using the <a href=\"https:\/\/www.askpython.com\/python\/examples\/split-data-training-and-testing-set\" target=\"_blank\" aria-label=\"train_test_split() (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"rank-math-link\">train_test_split()<\/a> function.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfrom sklearn.model_selection import train_test_split\n# X: all predictor columns; Y: the &#039;default&#039; target column\nX = loan.drop(&#x5B;&#039;default&#039;],axis=1)\nY = loan&#x5B;&#039;default&#039;].astype(str)\n# hold out 20% of the rows for testing\nX_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.20, random_state=0)\n<\/pre><\/div>\n\n\n<p>Here, X holds the predictor variables, that is, all the columns except the response\/target value, and Y holds only the response variable. The train_test_split() function then divides both X and Y into training (80%) and testing (20%) subsets.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3. 
Defining Error metrics for the model<\/h3>\n\n\n\n<p>Now, before moving on to model building, let us define the error metrics that will help us analyze the model better.<\/p>\n\n\n\n<p>Here, we define a function that takes a <a aria-label=\"Confusion Matrix (opens in a new tab)\" rel=\"noreferrer noopener\" href=\"https:\/\/www.askpython.com\/python\/examples\/confusion-matrix\" target=\"_blank\" class=\"rank-math-link\">Confusion Matrix<\/a> and calculates the Precision, Recall, Accuracy, and F1 score, along with the Specificity and the False Positive and False Negative rates.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndef err_metric(CM): \n    # CM: 2x2 DataFrame with actual classes as rows and predicted classes as columns\n    TN = CM.iloc&#x5B;0,0]\n    FN = CM.iloc&#x5B;1,0]\n    TP = CM.iloc&#x5B;1,1]\n    FP = CM.iloc&#x5B;0,1]\n    precision =(TP)\/(TP+FP)\n    accuracy_model  =(TP+TN)\/(TP+TN+FP+FN)\n    recall_score  =(TP)\/(TP+FN)\n    specificity_value =(TN)\/(TN + FP)\n    \n    False_positive_rate =(FP)\/(FP+TN)\n    False_negative_rate =(FN)\/(FN+TP)\n\n    f1_score =2*(( precision * recall_score)\/( precision + recall_score))\n\n    print(&quot;Precision value of the model: &quot;,precision)\n    print(&quot;Accuracy of the model: &quot;,accuracy_model)\n    print(&quot;Recall value of the model: &quot;,recall_score)\n    print(&quot;Specificity of the model: &quot;,specificity_value)\n    print(&quot;False Positive rate of the model: &quot;,False_positive_rate)\n    print(&quot;False Negative rate of the model: &quot;,False_negative_rate)\n    print(&quot;f1 score of the model: &quot;,f1_score)\n    \n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">4. Apply the model on the dataset<\/h3>\n\n\n\n<p>Now it&#8217;s finally time to build the model on the dataset. 
Have a look at the code below!<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfrom sklearn.linear_model import LogisticRegression\n# fit on the training split, then predict the held-out test split\nlogit= LogisticRegression(class_weight=&#039;balanced&#039; , random_state=0).fit(X_train,Y_train)\ntarget = logit.predict(X_test)\n# rows: actual classes, columns: predicted classes\nCM_logit = pd.crosstab(Y_test,target)\nerr_metric(CM_logit)\n<\/pre><\/div>\n\n\n<p><strong>Explanation:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>First, we fit the <code>LogisticRegression()<\/code> model on the training dataset.<\/li><li>Next, we use the fitted model to predict the values of the test dataset using the <a href=\"https:\/\/www.askpython.com\/python\/examples\/python-predict-function\" target=\"_blank\" aria-label=\"predict() (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"rank-math-link\">predict()<\/a> function.<\/li><li>Finally, we create a confusion matrix using <code>crosstab()<\/code> and pass it to the custom error metrics function (created previously) to judge the outcome.<\/li><\/ul>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nPrecision value of the model:  0.30158730158730157\nAccuracy of the model:  0.6382978723404256\nRecall value of the model:  0.7307692307692307\nSpecificity of the model:  0.6173913043478261\nFalse Positive rate of the model:  0.3826086956521739\nFalse Negative rate of the model:  0.2692307692307692\nf1 score of the model:  0.42696629213483145\n<\/pre><\/div>\n\n\n<p>So, as seen above, our model achieves an accuracy of about <strong>64%<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>With this, we have come to the end of this topic. Feel free to comment below in case you have any questions. For more such posts related to Python and ML, stay tuned, and till then,<\/p>\n\n\n\n<p>Happy Learning!! 
\ud83d\ude42<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello, readers! In this article, we will be focusing on the Practical Implementation of Logistic Regression in Python. In our series of Machine Learning with Python, we have already understood about various Supervised ML models such as Linear Regression, K Nearest Neighbor, etc. Today, we will be focusing on Logistic Regression and will be solving [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":11016,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-11006","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-examples"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/11006","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=11006"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/11006\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/11016"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=11006"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=11006"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=11006"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}