{"id":36435,"date":"2022-11-19T08:05:48","date_gmt":"2022-11-19T08:05:48","guid":{"rendered":"https:\/\/www.askpython.com\/?p=36435"},"modified":"2022-11-19T08:33:04","modified_gmt":"2022-11-19T08:33:04","slug":"linear-discriminant-analysis","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python\/examples\/linear-discriminant-analysis","title":{"rendered":"Linear Discriminant Analysis in Python &#8211; A Detailed Guide"},"content":{"rendered":"\n<p>Linear Discriminant Analysis is a Dimensional Reduction Technique to solve multi-classifier problems. It is been also used for most supervised classification problems. It provides a method to find the linear combination between features of objects or classes. We can better understand this well by analyzing the steps involved in this analysis process.<\/p>\n\n\n\n<p><strong><em>Also read: <a href=\"https:\/\/www.askpython.com\/python\/examples\/latent-dirichlet-allocation-lda\" data-type=\"post\" data-id=\"32588\">Latent Dirichlet Allocation (LDA) Algorithm in Python<\/a><\/em><\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Calculating Means of features of different classes.<\/li>\n\n\n\n<li>Calculating <strong>Within the Scatter class<\/strong> and <strong>Between the Scatter class<\/strong>.<\/li>\n<\/ul>\n\n\n\n<p>The Scatter Class Matrix is determined by the formula <img loading=\"lazy\" decoding=\"async\" width=\"183\" height=\"80\" class=\"wp-image-36478\" style=\"width: 150px\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/10\/Capture-10.png\" alt=\"Capture\">Where c is the total no. of classes and<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" width=\"233\" height=\"80\" class=\"wp-image-36814\" style=\"width: 150px\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/11\/Capture-5.png\" alt=\"Capture\"> , <img loading=\"lazy\" decoding=\"async\" width=\"124\" height=\"75\" class=\"wp-image-36480\" style=\"width: 124px\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/10\/Capture-12.png\" alt=\"Capture\"> where X<sub>k<\/sub> is the sample and n is the no. of the sample.<\/p>\n\n\n\n<p>Between class scatter matrix is determined by <img loading=\"lazy\" decoding=\"async\" width=\"257\" height=\"80\" class=\"wp-image-36484\" style=\"width: 150px\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/10\/Capture-13.png\" alt=\"Capture\"> ; where <img loading=\"lazy\" decoding=\"async\" width=\"160\" height=\"89\" class=\"wp-image-36485\" style=\"width: 150px\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/10\/Capture-14.png\" alt=\"Capture\">and <img loading=\"lazy\" decoding=\"async\" width=\"189\" height=\"88\" class=\"wp-image-36487\" style=\"width: 150px\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/10\/Capture-15.png\" alt=\"Capture\"><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Calculating Eigen values and Eigen vectors using algorithms.<\/li>\n\n\n\n<li>transforming the eigenvalue and eigen vectr sinto matrix.<\/li>\n\n\n\n<li>once the matrix is formed, it can be used for classification ad dimensionality reduction.<\/li>\n<\/ul>\n\n\n\n<p>Let us take we are having two classes of different features as we know the classification of these two classes using just a single feature is difficult. So we need to maximize the features to make our classification easier. That&#8217;s our goal in this topic as well.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Applications of Linear Discriminant Analysis<\/h2>\n\n\n\n<p><strong>Let us have a look at the applications of linear discriminant analysis<\/strong>.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Classification such as classifying emails as spam, important, or anything else.<\/li>\n\n\n\n<li>Face recognition.<\/li>\n\n\n\n<li>Barcode and QR code scanning.<\/li>\n\n\n\n<li>Customer Identification using Artificial Intelligence in shopping platforms.<\/li>\n\n\n\n<li>Decision Making.<\/li>\n\n\n\n<li>Prediction of future prediction.<\/li>\n<\/ul>\n\n\n\n<p>We can much better understand this by creating a model and using the same. We are taking our preloaded dataset in python which is the biochemist dataset. We will classify the marital status based on features in this dataset. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Implementing the Linear Discriminant Analysis Algorithm in Python<\/h2>\n\n\n\n<p>To do so, from this dataset, we will fetch some data and load it into our variables as independent and dependent respectively. then we will apply the linear discriminant analysis as well to reduce the dimensionality of those variables and plot the same in the graph. Let&#8217;s follow our code snippet below.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Importing Modules<\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\nfrom pydataset import data\nfrom matplotlib import pyplot as plt\nfrom sklearn.discriminant_analysis import LinearDiscriminantAnalysis\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import classification_report\nfrom sklearn import metrics\n<\/pre><\/div>\n\n\n<p>In the above code snippet, we imported our required modules as well. in case it shows any errors while importing the above modules or libraries you can install them manually in your command prompt using your pip installer.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Loading Dataset<\/h3>\n\n\n\n<p>In our code today, We will use a preloaded dataset to work on. We will fetch a dataset and load the same into a data frame df. Have a quick look at the codes below.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n#loading our preloaded dataset into a dataframe df\ndf = data(&#039;bioChemists&#039;)\n\n#printing our dataframe\ndf.head()\n<\/pre><\/div>\n\n\n<h4 class=\"wp-block-heading\">Step 3: Assigning values for Independent and dependent variables respectively<\/h4>\n\n\n\n<p>We will assign some value or data to our independent and dependent variables respectively. Before doing so, We will create columns of some required data and add them all into our data frame for analysis well.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n#creating columns for each value in fem and assigning 1 for positve and 0 for negative\ndummy = pd.get_dummies(df&#x5B;&#039;fem&#039;])\n#adding the resultant icolumns to our dataframe using concat() method\ndf = pd.concat(&#x5B;df, dummy], axis = 1)\n\n#repeat the same for the values of mar columns\ndummy = pd.get_dummies(df&#x5B;&#039;mar&#039;])\ndf = pd.concat(&#x5B;df, dummy], axis = 1)\n\n#independent variable\nx = df&#x5B;&#x5B;&#039;Men&#039;, &#039;kid5&#039;, &#039;phd&#039;, &#039;ment&#039;, &#039;art&#039;]]\n\n#dependent variable\ny = df&#x5B;&#039;Married&#039;]\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">Step 4: Splitting<\/h3>\n\n\n\n<p>We will use the <strong><a href=\"https:\/\/www.askpython.com\/python\/examples\/split-data-training-and-testing-set\" data-type=\"post\" data-id=\"9234\">train_test_split()<\/a> <\/strong>method to <strong>split arrays and metrics into data subsets as trains and tests <\/strong>respectively (2-D array into Linear). We used the parameter<strong> <code>random_state=0<\/code> to get the same train and test sets after each execution<\/strong>. <\/p>\n\n\n\n<p>We have passed <strong><code>test_size=0.3<\/code> which means 30% of the data will be in the test set and the rest will be in the train set.<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nx_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 0)\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">Step 5: Creating Model<\/h3>\n\n\n\n<p>We will create our required<strong> linear_discriminantanalysis model<\/strong> and we will check <strong>its accuracy of it&#8217;s working<\/strong>.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n#creating our linear discrimanant analysis model \nclf = LinearDiscriminantAnalysis()\n\n#checking for the model accuracy using score method\nclf.fit(x_train, y_train).score(x_train, y_train)\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">Step 6: ROC (Receiver&nbsp;Operating&nbsp;Characteristic)<\/h3>\n\n\n\n<p><strong>A ROC curve (Reciever Operating Characteristics) is a graph showing the performance of a classification model at all threshold levels<\/strong>. <\/p>\n\n\n\n<p>This\u00a0curve\u00a0plots\u00a0two\u00a0parameters:\u00a0<strong>True\u00a0Positive\u00a0Rate.\u00a0False\u00a0Positive\u00a0Rate<\/strong>. <\/p>\n\n\n\n<p>The function below computes the <strong>Reciever Operating Characteristics using the two reduced dimensions<\/strong>. Instead of plotting the linear reduced variables, We will only plot the ROC curve for the same. <\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfpr, tpr, threshold = metrics.roc_curve(y_test, y_pred)\n\n#calculating area under the curve(AUC)\nauc = metrics.auc(fpr, tpr)\nauc\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">Step 7: Plotting the data using pyplot<\/h3>\n\n\n\n<p>Now We will plot the <strong>Receiver\u00a0Operating\u00a0Characteristic curve<\/strong> for the True positive rate and false positive rate obtained from the reduced dimensions for both dependent and independent variables respectively.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n#plotting our ROC curve using above terminologies\nplt.title(&quot;Linear Discriminant Analysis&quot;)\n\nplt.clf()\n\n#plotting for roc curve\nplt.plot(fpr, tpr, color=&quot;navy&quot;, linestyle=&quot;--&quot;, label = &quot;roc_curve = %0.2f&quot;% auc)\nplt.legend(loc = &quot;upper center&quot;)\n\n#assigning the axis values\nplt.plot(&#x5B;0,1.5], &#x5B;0,1.5], ls = &#039;-&#039;, color=&quot;red&quot;)\n\nplt.xlabel(&quot;False Positive Rate&quot;)\nplt.ylabel(&quot;True Positive Rate&quot;)\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"614\" height=\"356\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/11\/Capture-4.png\" alt=\"\" class=\"wp-image-36811\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/11\/Capture-4.png 614w, https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/11\/Capture-4-300x174.png 300w\" sizes=\"auto, (max-width: 614px) 100vw, 614px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<p>Today We covered a sample Linear Discriminant Analysis Model. Hope you guys must have learned it with our code snippet. We must visit again with some exciting topics.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Linear Discriminant Analysis is a Dimensional Reduction Technique to solve multi-classifier problems. It is been also used for most supervised classification problems. It provides a method to find the linear combination between features of objects or classes. We can better understand this well by analyzing the steps involved in this analysis process. Also read: Latent [&hellip;]<\/p>\n","protected":false},"author":47,"featured_media":36817,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-36435","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-examples"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/36435","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/47"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=36435"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/36435\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/36817"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=36435"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=36435"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=36435"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}