{"id":60762,"date":"2024-03-30T11:25:35","date_gmt":"2024-03-30T11:25:35","guid":{"rendered":"https:\/\/www.askpython.com\/?p=60762"},"modified":"2025-04-10T20:32:27","modified_gmt":"2025-04-10T20:32:27","slug":"pyjanitor-miscellaneous-functions","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python-modules\/pyjanitor-miscellaneous-functions","title":{"rendered":"10 PyJanitor&#8217;s Miscellaneous Functions for Enhancing Data Cleaning"},"content":{"rendered":"\n<p>In the previous post, we reviewed some of the basic data-cleaning functions available in PyJanitor. This post covers some of the miscellaneous functions that PyJanitor offers. <\/p>\n\n\n\n<p>For starters, PyJanitor is a data cleaning and processing API inspired by the R package janitor and built on top of the <a href=\"https:\/\/www.askpython.com\/python-modules\/pandas\/python-pandas-module-tutorial\" data-type=\"post\" data-id=\"2986\"><strong>Pandas<\/strong><\/a> library, and it makes data cleaning easy and enjoyable. It also offers several miscellaneous functions for domains such as finance, engineering, biology, and time series analysis.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>PyJanitor, a data cleaning and processing API built on top of the Pandas library, offers a wide range of miscellaneous functions for various domains such as finance, engineering, biology, and time series analysis. These functions include general utilities like counting cumulative unique values, dropping constant or duplicate columns, finding and replacing elements, and introducing noise with jitter. 
Beyond these, PyJanitor provides math functions for computing empirical cumulative distributions, exponentiation, sigmoid, softmax, and z-score standardization, making it a versatile tool for data cleaning and manipulation tasks.<\/em><\/p>\n<\/blockquote>\n\n\n\n<p><a href=\"https:\/\/www.askpython.com\/python-modules\/pyjanitor\" data-type=\"post\" data-id=\"60190\"><strong><em>You can read the previous post here!<\/em><\/strong><\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Introduction to PyJanitor&#8217;s Miscellaneous Functions<\/strong><\/h2>\n\n\n\n<p>Let us discuss the miscellaneous functions offered by PyJanitor.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>General Functions in PyJanitor<\/strong><\/h3>\n\n\n\n<p>In this section, we will talk about a few important miscellaneous functions listed under the <a href=\"https:\/\/pyjanitor-devs.github.io\/pyjanitor\/api\/functions\/\" target=\"_blank\" rel=\"noopener\">functions<\/a> menu of the PyJanitor documentation.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>1. Count Cumulative Unique<\/strong><\/h4>\n\n\n\n<p>Remember computing cumulative frequencies in school mathematics? A similar idea applies here. The count_cumulative_unique function adds a column that holds, for each row, the number of distinct values seen so far in the specified column. In other words, it is a running count of unique values.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ncount_cumulative_unique(df, column_name, dest_column_name, case_sensitive=True)\n<\/pre><\/div>\n\n\n<p>The function takes the data frame, the name of the column whose unique values are counted, and the name of the new destination column that stores the running count. When case_sensitive is True (the default), upper- and lower-case characters are counted as different values (a != A). When it is False, they are counted as the same value (a == A).  
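<\/p>\n\n\n\n<p>To make the counting rule concrete, here is a minimal plain-pandas sketch of the same idea. It does not call PyJanitor and is not PyJanitor&#8217;s implementation. The helper name <code>running_unique_count<\/code> is hypothetical, and the sketch assumes the column holds strings whenever <code>case_sensitive<\/code> is False.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\n\n\ndef running_unique_count(s: pd.Series, case_sensitive: bool = True) -&gt; pd.Series:\n    # Hypothetical helper: a running count of distinct values seen so far.\n    if not case_sensitive:\n        # Compare values case-insensitively (assumes a string column).\n        s = s.str.lower()\n    # duplicated() is False at the first occurrence of each value,\n    # so ~duplicated() flags new values and cumsum() counts them.\n    return (~s.duplicated()).cumsum()\n\n\nletters = pd.Series(list(&quot;abABcdef&quot;))\nprint(running_unique_count(letters).tolist())\nprint(running_unique_count(letters, case_sensitive=False).tolist())\n<\/pre><\/div>\n\n\n<p>The example below also uses the letters abABcdef. For these letters, the first call prints the counts 1 through 8. The case-insensitive call prints 1, 2, 2, 2, 3, 4, 5, 6, because A and B repeat a and b.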
<\/p>\n\n\n\n<p><a href=\"https:\/\/www.askpython.com\/python-modules\/pandas\/dynamically-create-dataframe\">Learn how to create a data frame dynamically here!<\/a><\/p>\n\n\n\n<p>Let us see an example.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\nimport janitor\ndf = pd.DataFrame({\n    &quot;letters&quot;: list(&quot;abABcdef&quot;),\n    &quot;numbers&quot;: range(4, 12),\n})\ndf\n<\/pre><\/div>\n\n\n<p>In this code snippet, we create a data frame with two columns, letters and numbers. The letters column holds the values a, b, A, B, c, d, e, and f. The numbers column holds the integers 4 through 11, because range(4, 12) excludes 12. The last line displays the data frame.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"217\" height=\"310\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-12.png\" alt=\"Data Frame 1\" class=\"wp-image-60855\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-12.png 217w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-12-210x300.png 210w\" sizes=\"auto, (max-width: 217px) 100vw, 217px\" \/><figcaption class=\"wp-element-caption\">Data Frame 1<\/figcaption><\/figure>\n\n\n\n<p>Now, we count the unique values in the letters column, starting with case_sensitive set to True.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.count_cumulative_unique(\n    column_name=&quot;letters&quot;,\n    dest_column_name=&quot;letters_count&quot;,\n    case_sensitive=True,\n)\n<\/pre><\/div>\n\n\n<p>The column in which we want to count the unique values is the letters column. 
The results are stored in a new column called letters_count, and case_sensitive is set to True. <\/p>\n\n\n\n<p>In the output, the first row holds the letter <code>a<\/code>, so the unique count becomes 1. Next comes <code>b<\/code>, which is also new, so the count rises to 2. Then comes the upper-case <code>A<\/code>. Because case_sensitive is True, a is not equal to A, so A counts as a new value and the count rises from 2 to 3. B is likewise treated as new. Every remaining letter is also new, so the count ends at 8.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"329\" height=\"320\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-13.png\" alt=\"Case matters\" class=\"wp-image-60858\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-13.png 329w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-13-300x292.png 300w\" sizes=\"auto, (max-width: 329px) 100vw, 329px\" \/><figcaption class=\"wp-element-caption\">Case matters<\/figcaption><\/figure>\n\n\n\n<p>Next, we repeat the call with case_sensitive set to False.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.count_cumulative_unique(\n    column_name=&quot;letters&quot;,\n    dest_column_name=&quot;letters_count&quot;,\n    case_sensitive=False,\n)\n<\/pre><\/div>\n\n\n<p>Here, case_sensitive is False, so upper- and lower-case letters are treated as the same value (a == A). A and B therefore leave the count at 2, and the final count is 6 instead of 8. 
<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"341\" height=\"311\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-14.png\" alt=\"Case doesn't matter\" class=\"wp-image-60860\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-14.png 341w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-14-300x274.png 300w\" sizes=\"auto, (max-width: 341px) 100vw, 341px\" \/><figcaption class=\"wp-element-caption\">Case doesn&#8217;t matter<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>2. Drop Constant Columns<\/strong><\/h4>\n\n\n\n<p>This function drops every column whose values are all the same (constant). Such a column carries no information that distinguishes one row from another.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndrop_constant_columns(df)\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\nimport janitor\n\ndata = {&#039;A&#039;: &#x5B;3, 3, 3],\n        &#039;B&#039;: &#x5B;3, 2, 1],\n        &#039;C&#039;: &#x5B;3, 1, 2],\n        &#039;D&#039;: &#x5B;&quot;Noodles&quot;, &quot;China&quot;, &quot;Japan&quot;],\n        &#039;E&#039;: &#x5B;&quot;Pao&quot;, &quot;China&quot;, &quot;Kimchi&quot;],\n        &#039;F&#039;: &#x5B;&quot;Japan&quot;, &quot;China&quot;, &quot;Korea&quot;]}\ndf = pd.DataFrame(data)\ndf\n<\/pre><\/div>\n\n\n<p>We create a dictionary called data that holds numbers, countries, and food items. 
It is then converted into a data frame called df.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"327\" height=\"150\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-15.png\" alt=\"Data Frame 2\" class=\"wp-image-60864\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-15.png 327w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-15-300x138.png 300w\" sizes=\"auto, (max-width: 327px) 100vw, 327px\" \/><figcaption class=\"wp-element-caption\">Data Frame 2<\/figcaption><\/figure>\n\n\n\n<p>Now we call drop_constant_columns() on df.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.drop_constant_columns()\n<\/pre><\/div>\n\n\n<p>Column A holds the same value (3) in every row, so it is the only column removed. Columns B through F are kept.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"271\" height=\"141\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-16.png\" alt=\"New data frame\" class=\"wp-image-60865\" style=\"width:271px;height:auto\"\/><figcaption class=\"wp-element-caption\">New data frame<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>3. Drop Duplicate Columns<\/strong><\/h4>\n\n\n\n<p>This function is useful when a data frame has several columns with the same name. We pass the duplicated column name and nth_index, which selects the occurrence to drop among the columns sharing that name (0 for the first, 1 for the second, and so on).<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndrop_duplicate_columns(df, column_name, nth_index=0)\n<\/pre><\/div>\n\n\n<p>The example is given below. 
<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\nimport janitor\ndf = pd.DataFrame({\n    &quot;a&quot;: range(2, 5),\n    &quot;b&quot;: range(3, 6),\n    &quot;A&quot;: range(4, 7),\n    &quot;b*&quot;: range(6, 9),\n}).clean_names(remove_special=True)\ndf\n<\/pre><\/div>\n\n\n<p>The data frame df starts with four columns: a, b, A, and b*. The clean_names() function lowercases column names by default, and remove_special=True strips the special character (*) from b*. As a result, the cleaned data frame has two pairs of duplicate names: a (from a and A) and b (from b and b*). <\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"140\" height=\"152\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-17.png\" alt=\"Data frame 3\" class=\"wp-image-60876\"\/><figcaption class=\"wp-element-caption\">Data frame 3<\/figcaption><\/figure>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.drop_duplicate_columns(column_name=&quot;b&quot;, nth_index=0)\n<\/pre><\/div>\n\n\n<p>Because nth_index is 0, the first column named b (the one holding 3, 4, and 5) is dropped. The b column that came from b* is kept.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"874\" height=\"216\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-18.png\" alt=\"Data frame after dropping the first b column\" class=\"wp-image-60877\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-18.png 874w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-18-300x74.png 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-18-768x190.png 768w\" sizes=\"auto, (max-width: 874px) 100vw, 874px\" \/><figcaption class=\"wp-element-caption\">Data frame after dropping the first b column<\/figcaption><\/figure>\n\n\n\n<h4 
class=\"wp-block-heading\"><strong>4. Find Replace<\/strong><\/h4>\n\n\n\n<p>As its name suggests, find_replace finds values in one or more columns of a data frame and replaces them with other values. Each keyword argument maps a column name to a dictionary. In that dictionary, the keys are the values to find and the values are their replacements.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfind_replace(df, match=&#039;exact&#039;, **mappings)\n<\/pre><\/div>\n\n\n<p>The match parameter accepts two options. The default, <code>exact<\/code>, performs full-value matching: a cell is replaced only when its entire value equals one of the keys. With <code>regex<\/code>, the keys are treated as regular expressions, which also allows substring matching. For strings, matching is case-sensitive.<\/p>\n\n\n\n<p>Let us see an example. The following data frame lists four popular songs and their singers. <\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\nimport janitor\n\ndf = pd.DataFrame({\n    &quot;song&quot;: &#x5B;&quot;We don&#039;t talk anymore&quot;, &quot;Euphoria&quot;, &quot;Dangerously&quot;, &quot;As It Was&quot;],\n    &quot;singer&quot;: &#x5B;&quot;C.Puth&quot;, &quot;JK&quot;, &quot;C.Puth&quot;, &quot;Harry Styles&quot;],\n})\ndf\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"310\" height=\"168\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-19.png\" alt=\"Singer Data frame\" class=\"wp-image-60887\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-19.png 310w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-19-300x163.png 300w\" sizes=\"auto, (max-width: 310px) 100vw, 310px\" \/><figcaption class=\"wp-element-caption\">Singer Data frame<\/figcaption><\/figure>\n\n\n\n<p>Now, let us replace C.Puth with Charlie and JK with Jungkook in the singer column.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code 
\"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# find_replace is available as a DataFrame method after import janitor\ndf = df.find_replace(\n    match=&quot;exact&quot;,\n    singer={&quot;C.Puth&quot;: &quot;Charlie&quot;, &quot;JK&quot;: &quot;Jungkook&quot;},\n)\ndf\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"322\" height=\"179\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-20.png\" alt=\"Data frame with replaced singer names \" class=\"wp-image-60888\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-20.png 322w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-20-300x167.png 300w\" sizes=\"auto, (max-width: 322px) 100vw, 322px\" \/><figcaption class=\"wp-element-caption\">Data frame with replaced singer names <\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>5. Jitter<\/strong><\/h4>\n\n\n\n<p>The jitter function adds random Gaussian noise to a numeric column and writes the noisy values to a new column. Each value is replaced by a sample drawn from a normal distribution centered on that value, with standard deviation equal to scale. NaN values are ignored, and the jittered value for a NaN entry is also NaN.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\njitter(df, column_name, dest_column_name, scale, clip=None, random_state=None)\n<\/pre><\/div>\n\n\n<p>We pass the column to jitter, the name of the destination column for the noisy values, and scale, the standard deviation of the noise. The optional clip argument takes a (min, max) tuple that bounds the jittered values. The optional random_state argument seeds the random number generator so that the result is reproducible. <\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport numpy as np\nimport pandas as pd\nimport janitor\ndf1 = pd.DataFrame({&quot;a&quot;: &#x5B;3, 4, 5, np.nan],\n                    &quot;b&quot;: &#x5B;1, 2, 3, 4]})\ndf1\n<\/pre><\/div>\n\n\n<p>We create a data frame called df1 with two columns, a and b. 
Column a has one missing value. We add jitter to column a and store the result in a new column called jit. The entry that corresponds to the missing value stays NaN.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf1.jitter(&quot;a&quot;, dest_column_name=&quot;jit&quot;, scale=2, random_state=0)\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"230\" height=\"171\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-21.png\" alt=\"Jitter\" class=\"wp-image-60912\"\/><figcaption class=\"wp-element-caption\">Jitter<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Math Functions in PyJanitor<\/strong><\/h3>\n\n\n\n<p>Let us discuss some of the math functions available under the <a href=\"https:\/\/pyjanitor-devs.github.io\/pyjanitor\/api\/math\/\" target=\"_blank\" rel=\"noopener\">math<\/a> menu of the documentation.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>1. ECDF<\/strong><\/h4>\n\n\n\n<p>The ecdf function computes the empirical cumulative distribution function (ECDF) of a numeric series. It returns two arrays. The first, x, holds the values of the series sorted in ascending order. The second, y, gives for each value in x the fraction of data points less than or equal to it. Null values must be dropped from the series beforehand.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\necdf(s)\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\nimport janitor\ns = pd.Series(&#x5B;5, 1, 3, 4, 2])\nx, y = janitor.ecdf(s)\nprint(&quot;The sorted array of values:&quot;, x)\nprint(&quot;The values less than equal to x:&quot;, y)\n<\/pre><\/div>\n\n\n<p>In this code, we define a series called s and pass it to ecdf, which sorts the values of the series into the array x. 
The second array, y, holds the cumulative fractions. Because the five values are distinct, y is 0.2, 0.4, 0.6, 0.8, and 1.0. One fifth of the data is less than or equal to 1, two fifths is less than or equal to 2, and so on. <\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"577\" height=\"64\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-22.png\" alt=\"ecdf\" class=\"wp-image-60915\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-22.png 577w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-22-300x33.png 300w\" sizes=\"auto, (max-width: 577px) 100vw, 577px\" \/><figcaption class=\"wp-element-caption\">ecdf<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>2. Exponent<\/strong><\/h4>\n\n\n\n<p>The exp(s) function takes a series as input and returns e raised to the power of each value, that is, the element-wise exponential.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nexp(s)\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\nimport janitor\ns = pd.Series(&#x5B;1, 2, 7, 6, 5])\nexp_values = s.exp()\nprint(exp_values)\n<\/pre><\/div>\n\n\n<p>We define a series called s that contains the values 1, 2, 7, 6, and 5. The exp_values variable stores the result of applying exp to s. For example, the first entry is e raised to the power 1, approximately 2.718. 
<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"456\" height=\"358\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-23.png\" alt=\"Exponent\" class=\"wp-image-60917\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-23.png 456w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-23-300x236.png 300w\" sizes=\"auto, (max-width: 456px) 100vw, 456px\" \/><figcaption class=\"wp-element-caption\">Exponent<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>3. Sigmoid<\/strong><\/h4>\n\n\n\n<p>PyJanitor&#8217;s sigmoid function applies the logistic sigmoid to each element of a series. It maps every value into the open interval (0, 1); for example, the sigmoid of 0 is 0.5. <\/p>\n\n\n\n<p>The sigmoid function is given below:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nsigmoid(x) = 1 \/ (1 + exp(-x))\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"423\" height=\"289\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-24.png\" alt=\"Sigmoid \" class=\"wp-image-60919\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-24.png 423w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-24-300x205.png 300w\" sizes=\"auto, (max-width: 423px) 100vw, 423px\" \/><figcaption class=\"wp-element-caption\">Sigmoid <\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>4. Softmax<\/strong><\/h4>\n\n\n\n<p>The softmax function computes softmax values for the elements of a series or a one-dimensional NumPy array. Every output is positive, and the outputs sum to 1, so they can be read as probabilities. 
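<\/p>\n\n\n\n<p>Softmax exponentiates each element and divides by the sum of the exponentials, so it is easy to check by hand. The sketch below uses plain NumPy and pandas rather than PyJanitor&#8217;s own code, and <code>softmax_sketch<\/code> is a hypothetical name. It subtracts the maximum before exponentiating. This common trick leaves the result unchanged, because the factor exp(-max) cancels between numerator and denominator, and it prevents overflow for large inputs. Whether PyJanitor does the same internally is not assumed here.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport numpy as np\nimport pandas as pd\n\n\ndef softmax_sketch(s: pd.Series) -&gt; pd.Series:\n    # Hypothetical helper: softmax(x) = exp(x) \/ sum(exp(x)).\n    # Shifting by the maximum does not change the result but avoids overflow.\n    e = np.exp(s - s.max())\n    return e \/ e.sum()\n\n\ns = pd.Series(&#x5B;1, -2, 5])\np = softmax_sketch(s)\nprint(p.round(4))\nprint(p.sum())\n<\/pre><\/div>\n\n\n<p>For this series, the three outputs are roughly 0.018, 0.0009, and 0.981, and they sum to 1 up to floating-point rounding. The largest input dominates because exponentiation magnifies differences between values.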
<\/p>\n\n\n\n<p>The softmax function can be defined as follows.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nsoftmax(x) = exp(x)\/sum(exp(x))\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\nimport janitor\ns = pd.Series(&#x5B;1, -2, 5])\ns.softmax()\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"356\" height=\"173\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-25.png\" alt=\"Softmax\" class=\"wp-image-60920\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-25.png 356w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-25-300x146.png 300w\" sizes=\"auto, (max-width: 356px) 100vw, 356px\" \/><figcaption class=\"wp-element-caption\">Softmax<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>5. Z-Score<\/strong><\/h4>\n\n\n\n<p>The z-score is widely used in statistics and machine learning. Also called the standard score, it measures how many standard deviations a value lies above or below the mean of its group. Positive scores lie above the mean and negative scores lie below it. 
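<\/p>\n\n\n\n<p>As a hand check of the definition, the following plain-pandas sketch computes standard scores directly, without PyJanitor. It uses the same series as the PyJanitor example later in this section. Note that <code>Series.std()<\/code> returns the sample standard deviation (ddof=1) by default. NumPy&#8217;s <code>np.std<\/code> instead defaults to the population version (ddof=0), so the two give slightly different scores on small samples.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\n\ns = pd.Series(&#x5B;0, 1, 3, 9, -2])\n\n# Standard score: distance from the mean in units of standard deviation.\n# pandas&#039; std() uses ddof=1 (sample standard deviation) by default.\nz = (s - s.mean()) \/ s.std()\n\nprint(&quot;mean:&quot;, s.mean())\nprint(&quot;std:&quot;, round(s.std(), 4))\nprint(z.round(4).tolist())\n<\/pre><\/div>\n\n\n<p>The mean is 2.2 and the sample standard deviation is about 4.207. The value 9 therefore lies roughly 1.6 standard deviations above the mean, while -2 lies about one standard deviation below it. The resulting scores have mean 0 and standard deviation 1.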
<\/p>\n\n\n\n<p>PyJanitor&#8217;s z_score function computes the standard score of each element in a series.<\/p>\n\n\n\n<p>The z-score formula is given below.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nz = (s - s.mean()) \/ s.std()\n<\/pre><\/div>\n\n\n<p>Let us see an example.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\nimport janitor\ns = pd.Series(&#x5B;0, 1, 3, 9, -2])\ns.z_score()\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"442\" height=\"256\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-26.png\" alt=\"Z Score\" class=\"wp-image-60922\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-26.png 442w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/image-26-300x174.png 300w\" sizes=\"auto, (max-width: 442px) 100vw, 442px\" \/><figcaption class=\"wp-element-caption\">Z Score<\/figcaption><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>All of the functions discussed above are useful across statistics, engineering, machine learning, and data visualization. That breadth is why this post groups them as miscellaneous helpers for data cleaning and exploration.<\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Summing It Up <\/strong><\/h2>\n\n\n\n<p>To recap, we discussed several functions from two sections of the PyJanitor documentation, general functions and math, along with their syntax and examples. These functions are only a small sample, and the PyJanitor library offers many more for domains such as finance, engineering, biology, and chemistry. 
<\/p>\n\n\n\n<p>Can you find all the PyJanitor functions in each domain that are still supported and not deprecated?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>References<\/strong><\/h2>\n\n\n\n<p><a href=\"https:\/\/pyjanitor-devs.github.io\/pyjanitor\/api\/functions\/#janitor.functions\/\" target=\"_blank\" rel=\"noopener\">PyJanitor functions<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the previous post, we reviewed some of the basic data-cleaning functions available in PyJanitor. This post covers some of the miscellaneous functions that PyJanitor offers. For starters, PyJanitor is a data cleaning and processing API inspired by the R package janitor and built on top of the Pandas library [&hellip;]<\/p>\n","protected":false},"author":55,"featured_media":63892,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-60762","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python-modules"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/60762","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/55"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=60762"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/60762\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/63892"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=60762"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"
https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=60762"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=60762"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}