{"id":17650,"date":"2021-06-02T23:33:54","date_gmt":"2021-06-02T23:33:54","guid":{"rendered":"https:\/\/www.askpython.com\/?p=17650"},"modified":"2021-06-08T04:55:14","modified_gmt":"2021-06-08T04:55:14","slug":"iris-dataset-classification","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python\/examples\/iris-dataset-classification","title":{"rendered":"Iris Dataset Classification with Multiple ML Algorithms"},"content":{"rendered":"\n<p>Hello there! Today we are going to work with the iris dataset. It records four measurements of each flower (sepal length, sepal width, petal length, and petal width), and the task is to classify each flower into one of three species based on those measurements.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Importing Modules<\/h2>\n\n\n\n<p>The first step in any project is to import the basic modules, which include <a href=\"https:\/\/www.askpython.com\/python-modules\/numpy\/python-numpy-module\" data-type=\"post\" data-id=\"7694\">numpy<\/a>, <a href=\"https:\/\/www.askpython.com\/python-modules\/pandas\/python-pandas-module-tutorial\" data-type=\"post\" data-id=\"2986\">pandas<\/a> and <a href=\"https:\/\/www.askpython.com\/python-modules\/matplotlib\/python-matplotlib\" data-type=\"post\" data-id=\"3182\">matplotlib<\/a>.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\nimport numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">2. Loading and Preparing the Iris Dataset<\/h2>\n\n\n\n<p>To load the data we will download the dataset from Kaggle. 
You can download the dataset <a class=\"rank-math-link\" href=\"https:\/\/www.kaggle.com\/uciml\/iris\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>, but make sure that the file is in the same directory as the code file.<\/p>\n\n\n\n<p>We then separate the features from the labels by slicing the DataFrame: in the Kaggle file, column 0 is a row Id, columns 1 to 4 hold the four measurements, and column 5 holds the species label.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\ndata = pd.read_csv(&#039;Iris.csv&#039;)\ndata_points = data.iloc&#x5B;:, 1:5]  # the four measurement columns\nlabels = data.iloc&#x5B;:, 5]  # the species column\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">3. Split Data Into Testing and Training Data<\/h2>\n\n\n\n<p>Before training any kind of ML model, we first need to <a href=\"https:\/\/www.askpython.com\/python\/examples\/split-data-training-and-testing-set\" data-type=\"post\" data-id=\"9234\">split data into testing and training data<\/a> using the <code>train_test_split<\/code> function from sklearn. Here <code>test_size=0.2<\/code> holds out 20% of the rows for testing. Since no <code>random_state<\/code> is set, the split is different on every run, so your accuracies may differ slightly from the ones reported in this tutorial.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\nfrom sklearn.model_selection import train_test_split\nx_train, x_test, y_train, y_test = train_test_split(data_points, labels, test_size=0.2)\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">4. Normalization\/Standardization of Data<\/h2>\n\n\n\n<p>Before training the models, we standardize the features so that each one has zero mean and unit variance. The scaler is fitted on the training data only and then applied to both sets, so that no information from the test set leaks into training.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\nfrom sklearn.preprocessing import StandardScaler\nscaler = StandardScaler()\nscaler.fit(x_train)  # learn mean and variance from the training set only\nx_train_std = scaler.transform(x_train)\nx_test_std = scaler.transform(x_test)\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">5. 
Applying Classification ML Models<\/h2>\n\n\n\n<p>Now that our data is prepared, we will train several classification models and compare their accuracy.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5.1 SVM (Support Vector Machine)<\/h3>\n\n\n\n<p>The first model we are going to test is the SVM classifier. The code is shown below.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\nfrom sklearn.svm import SVC\nsvm = SVC(kernel=&#039;rbf&#039;, random_state=0, gamma=.10, C=1.0)\nsvm.fit(x_train_std, y_train)\nprint(&#039;Training data accuracy {:.2f}&#039;.format(svm.score(x_train_std, y_train)*100))\nprint(&#039;Testing data accuracy {:.2f}&#039;.format(svm.score(x_test_std, y_test)*100))\n<\/pre><\/div>\n\n\n<p>In our run, the classifier gave training and testing accuracies of about 97% and 93% respectively, which is pretty decent.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5.2 KNN (K-Nearest Neighbors)<\/h3>\n\n\n\n<p>The <a href=\"https:\/\/www.askpython.com\/python\/examples\/knn-in-python\" data-type=\"post\" data-id=\"9333\">KNN algorithm<\/a> is one of the simplest and most beginner-friendly classification models in ML. It classifies a point by a majority vote among its k nearest training points. 
The code to run it is shown below.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\nfrom sklearn.neighbors import KNeighborsClassifier\nknn = KNeighborsClassifier(n_neighbors=7, p=2, metric=&#039;minkowski&#039;)  # p=2 gives Euclidean distance\nknn.fit(x_train_std, y_train)\nprint(&#039;Training data accuracy {:.2f}&#039;.format(knn.score(x_train_std, y_train)*100))\nprint(&#039;Testing data accuracy {:.2f}&#039;.format(knn.score(x_test_std, y_test)*100))\n<\/pre><\/div>\n\n\n<p>The testing accuracy in this case is only about 80%, which is lower than the SVM result, but that is understandable as the model is very basic and has several limitations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5.3 Decision Tree<\/h3>\n\n\n\n<p>Next, we will implement the decision tree model, which is simple to interpret yet able to learn complex decision rules. The code is shown below.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\nfrom sklearn import tree\ndecision_tree = tree.DecisionTreeClassifier(criterion=&#039;gini&#039;)\ndecision_tree.fit(x_train_std, y_train)\nprint(&#039;Training data accuracy {:.2f}&#039;.format(decision_tree.score(x_train_std, y_train)*100))\nprint(&#039;Testing data accuracy {:.2f}&#039;.format(decision_tree.score(x_test_std, y_test)*100))\n<\/pre><\/div>\n\n\n<p>The testing accuracy of this model is also around 80%, so SVM gives the best results so far.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5.4 Random Forest<\/h3>\n\n\n\n<p>Random Forest is an ensemble of decision trees that usually generalizes better than a single tree. 
The implementation is shown below.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\nfrom sklearn.ensemble import RandomForestClassifier\nrandom_forest = RandomForestClassifier()\nrandom_forest.fit(x_train_std, y_train)\nprint(&#039;Training data accuracy {:.2f}&#039;.format(random_forest.score(x_train_std, y_train)*100))\nprint(&#039;Testing data accuracy {:.2f}&#039;.format(random_forest.score(x_test_std, y_test)*100))\n<\/pre><\/div>\n\n\n<p>The accuracy levels are very good here: the training accuracy is 100%, while the testing accuracy is 90%, which is decent as well. The gap between the two suggests that the forest fits the training set more closely than it generalizes.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Congratulations! In this tutorial we applied several different algorithms to the same dataset and obtained different results for each model. Hope you liked it! Keep reading to learn more!<\/p>\n\n\n\n<p>Thank you for reading!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello there! Today we are going to learn about a new dataset &#8211; the iris dataset. The dataset is very interesting and fun as it deals with the various properties of the flowers and then classifies them according to their properties. 1. 
Importing Modules The first step in any project is to import the basic [&hellip;]<\/p>\n","protected":false},"author":28,"featured_media":17651,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-17650","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-examples"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/17650","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/28"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=17650"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/17650\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/17651"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=17650"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=17650"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=17650"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}