{"id":11001,"date":"2020-11-30T12:17:10","date_gmt":"2020-11-30T12:17:10","guid":{"rendered":"https:\/\/www.askpython.com\/?p=11001"},"modified":"2023-02-16T19:56:57","modified_gmt":"2023-02-16T19:56:57","slug":"label-encoding","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python\/examples\/label-encoding","title":{"rendered":"Label Encoding in Python &#8211; A Quick Guide!"},"content":{"rendered":"\n<p>Hello, readers! In this article, we will be focusing on <strong>Label Encoding<\/strong> in Python.<\/p>\n\n\n\n<p>In our last article, we covered the working and implementation of <a aria-label=\"One hot Encoding (opens in a new tab)\" rel=\"noreferrer noopener\" href=\"https:\/\/www.askpython.com\/python\/examples\/one-hot-encoding\" target=\"_blank\" class=\"rank-math-link\">One hot Encoding<\/a>, in which Label Encoding is the initial step of the process.<\/p>\n\n\n\n<p>Today, we&#8217;ll have a look at one of the most fundamental steps in the categorical encoding of data values.<\/p>\n\n\n\n<p>So, without any further delay, let us begin!<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">What is Label Encoding in Python?<\/h2>\n\n\n\n<p>Before diving into Label Encoding itself, let us understand what a &#8216;label&#8217; means in a dataset.<\/p>\n\n\n\n<p>A <strong>label<\/strong> is a number or a string that represents a particular set of entities. 
Labels help the model make better sense of the dataset and enable it to learn more complex structures.<\/p>\n\n\n\n<p><em>Recommended &#8211; <a href=\"https:\/\/www.askpython.com\/python\/examples\/standardize-data-in-python\" class=\"rank-math-link\">How to standardize datasets for Machine learning?<\/a><\/em><\/p>\n\n\n\n<p><strong>Label Encoder<\/strong> converts these labels of categorical data into a numeric format.<\/p>\n\n\n\n<p>For example, if a dataset contains a variable &#8216;Gender&#8217; with labels &#8216;Male&#8217; and &#8216;Female&#8217;, then the label encoder would convert these labels into the integers 0 and 1. Codes are assigned in sorted order of the labels, so &#8216;Female&#8217; becomes 0 and &#8216;Male&#8217; becomes 1.<\/p>\n\n\n\n<p>Thus, by converting the labels into integers, we give the machine learning model a numeric input that it can operate on.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Label Encoding &#8211; Syntax to know!<\/h2>\n\n\n\n<p>Python&#8217;s <strong>sklearn library<\/strong> provides us with a ready-made class, <code>LabelEncoder<\/code>, to carry out Label Encoding on the dataset.<\/p>\n\n\n\n<p><strong>Syntax:<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfrom sklearn import preprocessing\n\n# Named &#039;encoder&#039; rather than &#039;object&#039; to avoid shadowing the Python built-in\nencoder = preprocessing.LabelEncoder()\n<\/pre><\/div>\n\n\n<p>Here, we create an object of the LabelEncoder class and then use that object to apply label encoding to the data.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">1. Label Encoding with sklearn<\/h3>\n\n\n\n<p>Let&#8217;s get right into the process of label encoding. The first step to encoding a dataset is to have a dataset.<\/p>\n\n\n\n<p>So, we&#8217;ll create a simple dataset here. 
<strong>Example: Creation of a dataset<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\n\n# Small example dataset with one categorical column (&#039;Gender&#039;)\ndata = {&quot;Gender&quot;: &#x5B;&#039;M&#039;,&#039;F&#039;,&#039;F&#039;,&#039;M&#039;,&#039;F&#039;,&#039;F&#039;,&#039;F&#039;], &quot;NAME&quot;: &#x5B;&#039;John&#039;,&#039;Camili&#039;,&#039;Rheana&#039;,&#039;Joseph&#039;,&#039;Amanti&#039;,&#039;Alexa&#039;,&#039;Siri&#039;]}\nblock = pd.DataFrame(data)\nprint(&quot;Original Data frame:\\n&quot;)\nprint(block)\n<\/pre><\/div>\n\n\n<p>Here, we have created a <a aria-label=\"dictionary (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"rank-math-link\" href=\"https:\/\/www.askpython.com\/python\/dictionary\/python-dictionary-dict-tutorial\" target=\"_blank\">dictionary<\/a> &#8216;data&#8217; and then transformed it into a DataFrame using the <code>pandas.DataFrame()<\/code> constructor.<\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nOriginal Data frame:\n\n  Gender    NAME\n0      M    John\n1      F  Camili\n2      F  Rheana\n3      M  Joseph\n4      F  Amanti\n5      F   Alexa\n6      F    Siri\n<\/pre><\/div>\n\n\n<p>From the above dataset, it is clear that the variable &#8216;Gender&#8217; has the labels &#8216;M&#8217; and &#8216;F&#8217;.<\/p>\n\n\n\n<p>Next, let us import the <strong>LabelEncoder<\/strong> class and apply it to the &#8216;Gender&#8217; variable of the dataset.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfrom sklearn import preprocessing\n\n# Create the encoder, then fit it on the column and transform it in one step\nlabel = preprocessing.LabelEncoder()\nblock&#x5B;&#039;Gender&#039;] = label.fit_transform(block&#x5B;&#039;Gender&#039;])\nprint(block&#x5B;&#039;Gender&#039;].unique())\n<\/pre><\/div>\n\n\n<p>We have used the <code>fit_transform()<\/code> method to apply the 
functionality of the label encoder referenced by the object to the &#8216;Gender&#8217; column.<\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n&#x5B;1 0]\n<\/pre><\/div>\n\n\n<p>So, you see, the labels have been transformed into the integers 0 and 1. The encoder assigns codes in sorted order of the labels, so &#8216;F&#8217; is encoded as 0 and &#8216;M&#8217; as 1.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nprint(block)\n<\/pre><\/div>\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n   Gender    NAME\n0       1    John\n1       0  Camili\n2       0  Rheana\n3       1  Joseph\n4       0  Amanti\n5       0   Alexa\n6       0    Siri\n<\/pre><\/div>\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>2. Label Encoding using Category codes<\/strong><\/h3>\n\n\n\n<p>This approach starts from the original, unencoded dataset. If you have already run the previous section, recreate <code>block<\/code> from the dictionary first, because &#8216;Gender&#8217; is now an integer column. Let us first check the data types of the variables of our dataset.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nblock.dtypes\n<\/pre><\/div>\n\n\n<p><strong>Data type<\/strong>:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nGender    object\nNAME      object\ndtype: object\n<\/pre><\/div>\n\n\n<p>Now, convert the data type of the variable &#8216;Gender&#8217; to the <strong>category<\/strong> type.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nblock&#x5B;&#039;Gender&#039;] = block&#x5B;&#039;Gender&#039;].astype(&#039;category&#039;)\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nblock.dtypes\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; 
notranslate\" title=\"\">\nGender    category\nNAME        object\ndtype: object\n<\/pre><\/div>\n\n\n<p>Now, let us transform the labels to integers using the <code>cat.codes<\/code> attribute of the categorical column (<code>pandas.Series.cat.codes<\/code>).<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nblock&#x5B;&#039;Gender&#039;] = block&#x5B;&#039;Gender&#039;].cat.codes\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nprint(block)\n<\/pre><\/div>\n\n\n<p>As seen below, the variable &#8216;Gender&#8217; has been encoded to the integer values 0 and 1. The categories are sorted, so &#8216;F&#8217; maps to 0 and &#8216;M&#8217; to 1, matching the result from sklearn.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n   Gender    NAME\n0       1    John\n1       0  Camili\n2       0  Rheana\n3       1  Joseph\n4       0  Amanti\n5       0   Alexa\n6       0    Siri\n<\/pre><\/div>\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.<\/p>\n\n\n\n<p>For a deeper understanding of the topic, try applying the Label Encoder to different datasets and variables. Do let us know your experience in the comment section! \ud83d\ude42<\/p>\n\n\n\n<p>For more such posts related to Python, stay tuned and till then, Happy Learning!! \ud83d\ude42<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">References<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.LabelEncoder.html\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"rank-math-link\">Label Encoder &#8211; Documentation<\/a><\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Hello, readers! 
In this article, we will be focusing on Label Encoding in Python. In our last article, we understood the working and implementation of One hot Encoding wherein Label Encoding is the initial step of the process. Today, we&#8217;ll have a look at one of the most fundamental steps in the categorical encoding of [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":11011,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-11001","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-examples"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/11001","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=11001"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/11001\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/11011"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=11001"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=11001"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=11001"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}