{"id":9815,"date":"2020-10-26T22:19:17","date_gmt":"2020-10-26T22:19:17","guid":{"rendered":"https:\/\/www.askpython.com\/?p=9815"},"modified":"2021-02-12T12:33:22","modified_gmt":"2021-02-12T12:33:22","slug":"normal-distribution","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python\/normal-distribution","title":{"rendered":"Normal Distribution in Python"},"content":{"rendered":"\n<p>Even if you are not in the field of statistics, you have probably come across the term \u201c<strong>Normal Distribution<\/strong>\u201d. <\/p>\n\n\n\n<p>A <a aria-label=\"probability distribution (opens in a new tab)\" rel=\"noreferrer noopener\" href=\"https:\/\/en.wikipedia.org\/wiki\/Probability_distribution\" target=\"_blank\" class=\"rank-math-link\">probability distribution<\/a> is a statistical function that describes the likelihood of obtaining the possible values that a random variable can take. In other words, it tells us which values a variable can take and how likely each one is when we pick values at random. <\/p>\n\n\n\n<p><strong>A probability distribution can be discrete or continuous.<\/strong><\/p>\n\n\n\n<p>Suppose that, in a city, the heights of adults aged 20-30 years range from 4.5 ft. to 7 ft. <\/p>\n\n\n\n<p>If we were asked to pick one adult at random and predict his\/her height (assuming gender does not affect height), there\u2019s no way to know exactly what the height will be. But if we have the distribution of heights of adults in the city, we can bet on the most probable outcome.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What is Normal Distribution?<\/h2>\n\n\n\n<p>A <strong>Normal Distribution<\/strong> is also known as a <strong>Gaussian distribution<\/strong> or, more famously, the <strong>Bell Curve<\/strong>. People use these terms interchangeably, and they all mean the same thing. 
It is a continuous probability distribution.<\/p>\n\n\n\n<p>The probability density function (pdf) of the Normal Distribution is:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"703\" height=\"156\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Probability-density-function-of-Normal-Distribution.jpg\" alt=\"Probability Density Function Of Normal Distribution\" class=\"wp-image-9817\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Probability-density-function-of-Normal-Distribution.jpg 703w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Probability-density-function-of-Normal-Distribution-300x67.jpg 300w\" sizes=\"auto, (max-width: 703px) 100vw, 703px\" \/><figcaption><strong>Probability Density Function Of Normal Distribution<\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<p>where \u03bc = mean, \u03c3 = standard deviation, and x = input value.<\/p>\n\n\n\n<p><strong>Terminology:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Mean<\/strong> &#8211; The <a href=\"https:\/\/www.askpython.com\/python\/examples\/mean-and-standard-deviation-python\" class=\"rank-math-link\">mean<\/a> is the usual average: the sum of all data points divided by the number of points.<\/li><li><strong>Standard Deviation<\/strong> &#8211; <a aria-label=\"Standard deviation (opens in a new tab)\" rel=\"noreferrer noopener\" href=\"https:\/\/en.wikipedia.org\/wiki\/Standard_deviation\" target=\"_blank\" class=\"rank-math-link\">Standard deviation<\/a> tells us how &#8220;spread out&#8221; the data is. It is a measure of how far the observed values typically lie from the mean.<\/li><\/ul>\n\n\n\n<p>Looks daunting, doesn&#8217;t it? But it is very simple.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. Example Implementation of Normal Distribution<\/h3>\n\n\n\n<p>Let&#8217;s have a look at the code below.  
We&#8217;ll use <a href=\"https:\/\/www.askpython.com\/python-modules\/numpy\/python-numpy-module\" class=\"rank-math-link\">numpy<\/a> and <a href=\"https:\/\/www.askpython.com\/python-modules\/matplotlib\/python-matplotlib\" class=\"rank-math-link\">matplotlib<\/a> for this demonstration:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Importing required libraries\n\nimport numpy as np\nimport matplotlib.pyplot as plt\n\n# Creating 200 evenly spaced data points in the range 1-50.\nx = np.linspace(1,50,200)\n\n# Normal pdf: 1\/(sd*sqrt(2*pi)) * exp(-0.5*((x-mean)\/sd)**2)\ndef normal_dist(x , mean , sd):\n    prob_density = 1\/(sd * np.sqrt(2*np.pi)) * np.exp(-0.5*((x-mean)\/sd)**2)\n    return prob_density\n\n#Calculate mean and Standard deviation.\nmean = np.mean(x)\nsd = np.std(x)\n\n#Apply function to the data.\npdf = normal_dist(x,mean,sd)\n\n#Plotting the Results\nplt.plot(x,pdf , color = &#039;red&#039;)\nplt.xlabel(&#039;Data points&#039;)\nplt.ylabel(&#039;Probability Density&#039;)\nplt.show()\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1200\" height=\"600\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Normal-Curve.jpeg\" alt=\"Normal Curve\" class=\"wp-image-9823\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Normal-Curve.jpeg 1200w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Normal-Curve-300x150.jpeg 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Normal-Curve-1024x512.jpeg 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Normal-Curve-768x384.jpeg 768w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><figcaption>Normal Curve<\/figcaption><\/figure><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">2. 
Properties of Normal Distribution<\/h3>\n\n\n\n<p>The normal distribution density function simply accepts a data point along with a mean value and a standard deviation and returns a value which we call <strong>probability density<\/strong>. <\/p>\n\n\n\n<p>We can alter the position and shape of the bell curve by changing the mean and standard deviation. <\/p>\n\n\n\n<p>Changing the mean will shift the curve towards that mean value. This means we can change the position of the curve by altering the mean value while the shape of the curve remains intact. <\/p>\n\n\n\n<p>The shape of the curve is controlled by the value of the standard deviation. A smaller standard deviation will result in a narrow, tall curve while a larger value will result in a flatter, more spread out curve.<\/p>\n\n\n\n<p><strong>Some excellent properties of a normal distribution:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>The mean, mode, and median are all equal.<\/li><li>The total area under the curve is equal to 1.<\/li><li>The curve is symmetric around the mean.<\/li><\/ul>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1200\" height=\"600\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Standard-deviation-around-mean.jpg\" alt=\"Percentage Distribution of Data Around Mean\" class=\"wp-image-9829\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Standard-deviation-around-mean.jpg 1200w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Standard-deviation-around-mean-300x150.jpg 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Standard-deviation-around-mean-1024x512.jpg 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Standard-deviation-around-mean-768x384.jpg 768w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><figcaption>Percentage Distribution of Data Around Mean<\/figcaption><\/figure><\/div>\n\n\n\n<p><strong>The empirical rule tells 
us that:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>About 68% of the data falls within one standard deviation of the mean.<\/li><li>About 95% of the data falls within two standard deviations of the mean.<\/li><li>About 99.7% of the data falls within three standard deviations of the mean.<\/li><\/ul>\n\n\n\n<p>It is one of the most important distributions in all of statistics. The normal distribution is so useful because many naturally occurring phenomena approximately follow it. For example, blood pressure, IQ scores, and heights are approximately normally distributed.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Calculating Probabilities with Normal Distribution<\/h2>\n\n\n\n<p>To find the probability of a value occurring within a range in a normal distribution, we just need to find the area under the curve in that range, i.e. we need to integrate the density function. <\/p>\n\n\n\n<p>Since the normal distribution is a continuous distribution, probabilities are represented by areas under the curve rather than by the height of the curve at a single point. <\/p>\n\n\n\n<p>Before getting into the details, let&#8217;s first see what a Standard Normal Distribution is.<\/p>\n\n\n\n<p><strong>A standard normal distribution<\/strong> is simply a normal distribution with mean = 0 and standard deviation = 1. Any value x from a normal distribution with mean \u03bc and standard deviation \u03c3 can be converted to the standard normal scale as follows:<\/p>\n\n\n\n<p><code>Z = (x-\u03bc)\/ \u03c3<\/code><\/p>\n\n\n\n<p>The Z value above is also known as a<strong> <a aria-label=\"z-score (opens in a new tab)\" rel=\"noreferrer noopener\" href=\"https:\/\/en.wikipedia.org\/wiki\/Standard_score\" target=\"_blank\" class=\"rank-math-link\">z-score<\/a><\/strong>. A z-score tells you how many standard deviations a data point is from the mean. 
<\/p>\n\n\n\n<p>If we intend to calculate the probabilities manually, we will need to look up our z-value in a <a aria-label=\"z-table (opens in a new tab)\" rel=\"noreferrer noopener\" href=\"https:\/\/en.wikipedia.org\/wiki\/Standard_normal_table\" target=\"_blank\" class=\"rank-math-link\">z-table<\/a> to see the cumulative percentage value. Python provides us with modules to do this work for us. Let&#8217;s get into it.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. Creating the Normal Curve<\/h3>\n\n\n\n<p id=\"0ad9\">We&#8217;ll use the <code>norm<\/code> object from <code>scipy.stats<\/code> to calculate probabilities from the normal distribution.<\/p>\n\n\n\n<p>Suppose we have data on the heights of adults in a town, the data follows a normal distribution, and we have a sufficient sample size with a mean of 5.3 ft and a standard deviation of 1 ft. <\/p>\n\n\n\n<p><strong>This information is sufficient to make a normal curve.<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# import required libraries\nfrom scipy.stats import norm\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sb\n\n# Creating the distribution\ndata = np.arange(1,10,0.01)\npdf = norm.pdf(data , loc = 5.3 , scale = 1 )\n\n#Visualizing the distribution\n\nsb.set_style(&#039;whitegrid&#039;)\n# x and y are passed as keywords; recent seaborn versions reject them as positional arguments\nsb.lineplot(x = data, y = pdf , color = &#039;black&#039;)\nplt.xlabel(&#039;Heights&#039;)\nplt.ylabel(&#039;Probability Density&#039;)\nplt.show()\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1200\" height=\"600\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Heights-distribution.jpeg\" alt=\"Heights Distribution\" class=\"wp-image-9875\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Heights-distribution.jpeg 1200w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Heights-distribution-300x150.jpeg 
300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Heights-distribution-1024x512.jpeg 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Heights-distribution-768x384.jpeg 768w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><figcaption>Heights Distribution<\/figcaption><\/figure><\/div>\n\n\n\n<p>The <code>norm.pdf( )<\/code> method takes the data along with <code>loc<\/code> and <code>scale<\/code> as input arguments and returns the probability density value. <code>loc<\/code> is the mean and <code>scale<\/code> is the standard deviation of the data. The code is similar to what we created in the prior section but much shorter. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2. Calculating Probability of Specific Data Occurrence<\/h3>\n\n\n\n<p>Now, if we were asked to pick one person randomly from this distribution, what is the probability that the height of the person will be smaller than 4.5 ft.?<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1200\" height=\"600\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Area-under-the-curve-as-probability.jpeg\" alt=\"Area Under The Curve As Probability\" class=\"wp-image-9877\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Area-under-the-curve-as-probability.jpeg 1200w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Area-under-the-curve-as-probability-300x150.jpeg 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Area-under-the-curve-as-probability-1024x512.jpeg 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Area-under-the-curve-as-probability-768x384.jpeg 768w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><figcaption>Area Under The Curve As Probability<\/figcaption><\/figure><\/div>\n\n\n\n<p>The area under the curve as shown in the figure above will be the probability that the 
height of the person will be smaller than 4.5 ft if chosen randomly from the distribution. Let&#8217;s see how we can calculate this in Python.<\/p>\n\n\n\n<p>The area under the curve is simply the integral of the density function from -\u221e to 4.5. <\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nnorm(loc = 5.3 , scale = 1).cdf(4.5)\n<\/pre><\/div>\n\n\n<pre class=\"wp-block-preformatted\"><strong>0.211855 or 21.19 %<\/strong><\/pre>\n\n\n\n<p>The single line of code above shows that there is a 21.19% chance that if a person is chosen randomly from the normal distribution with a mean of 5.3 and a <a href=\"https:\/\/www.askpython.com\/python\/examples\/mean-and-standard-deviation-python\" class=\"rank-math-link\">standard deviation<\/a> of 1, then the height of the person will be below 4.5 ft.<\/p>\n\n\n\n<p>We create a distribution object by calling <code>norm<\/code> with the mean and standard deviation, then call its <code>.cdf( )<\/code> method with the value up to which we need the cumulative probability. The cumulative distribution function (CDF) calculates the cumulative probability for a given x-value.<\/p>\n\n\n\n<p>The cumulative probability value from -\u221e to \u221e will be equal to 1.<\/p>\n\n\n\n<p>Now suppose we are again asked to pick one person randomly from this distribution. What is the probability that the height of the person will be between 4.5 and 6.5 ft. 
?<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1200\" height=\"600\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Area-under-the-curve-between-4.5-and-6.5-ft.jpeg\" alt=\"Area Under The Curve as a probability calculation\" class=\"wp-image-9881\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Area-under-the-curve-between-4.5-and-6.5-ft.jpeg 1200w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Area-under-the-curve-between-4.5-and-6.5-ft-300x150.jpeg 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Area-under-the-curve-between-4.5-and-6.5-ft-1024x512.jpeg 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Area-under-the-curve-between-4.5-and-6.5-ft-768x384.jpeg 768w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><figcaption>Area Under The Curve Between 4.5 And 6.5 Ft<\/figcaption><\/figure><\/div>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ncdf_upper_limit = norm(loc = 5.3 , scale = 1).cdf(6.5)\ncdf_lower_limit = norm(loc = 5.3 , scale = 1).cdf(4.5)\n\nprob = cdf_upper_limit - cdf_lower_limit\nprint(prob)\n<\/pre><\/div>\n\n\n<pre class=\"wp-block-preformatted\"><strong>0.673075 or 67.31 %<\/strong><\/pre>\n\n\n\n<p>The above code first calculates the cumulative probability value from -\u221e to 6.5 and then the cumulative probability value from -\u221e to 4.5. 
If we subtract the CDF at 4.5 from the CDF at 6.5, the result is the area under the curve between the limits 4.5 and 6.5.<\/p>\n\n\n\n<p>Now, what if we were asked about the probability that the height of a person chosen randomly will be above 6.5ft?<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1200\" height=\"600\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Area-under-the-curve-between-6.5-and-infinity.jpeg\" alt=\"Area Under The Curve Between a value And Infinity\" class=\"wp-image-9888\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Area-under-the-curve-between-6.5-and-infinity.jpeg 1200w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Area-under-the-curve-between-6.5-and-infinity-300x150.jpeg 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Area-under-the-curve-between-6.5-and-infinity-1024x512.jpeg 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Area-under-the-curve-between-6.5-and-infinity-768x384.jpeg 768w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><figcaption>Area Under The Curve Between 6.5ft and Infinity<\/figcaption><\/figure><\/div>\n\n\n\n<p>It&#8217;s simple. We know the total area under the curve equals 1, so if we calculate the cumulative probability value from -\u221e to 6.5 and subtract it from 1, the result will be the probability that the height of a person chosen randomly will be above 6.5ft.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ncdf_value = norm(loc = 5.3 , scale = 1).cdf(6.5)\nprob = 1- cdf_value\nprint(prob)\n<\/pre><\/div>\n\n\n<pre class=\"wp-block-preformatted\"><strong>0.115070 or 11.51 %<\/strong><\/pre>\n\n\n\n<p>That&#8217;s a lot to take in, but I encourage all to keep practicing this essential concept along with the implementation using 
Python.<\/p>\n\n\n\n<p><strong>The complete code from the above implementation:<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# import required libraries\nfrom scipy.stats import norm\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sb\n\n# Creating the distribution\ndata = np.arange(1,10,0.01)\npdf = norm.pdf(data , loc = 5.3 , scale = 1 )\n\n#Probability of height to be under 4.5 ft.\nprob_1 = norm(loc = 5.3 , scale = 1).cdf(4.5)\nprint(prob_1)\n\n#probability that the height of the person will be between 6.5 and 4.5 ft.\n\ncdf_upper_limit = norm(loc = 5.3 , scale = 1).cdf(6.5)\ncdf_lower_limit = norm(loc = 5.3 , scale = 1).cdf(4.5)\n\nprob_2 = cdf_upper_limit - cdf_lower_limit\nprint(prob_2)\n\n#probability that the height of a person chosen randomly will be above 6.5ft\n\ncdf_value = norm(loc = 5.3 , scale = 1).cdf(6.5)\nprob_3 = 1- cdf_value\nprint(prob_3)\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>In this article, we got an idea of what the Normal Distribution is, what a normal curve looks like, and most importantly, how to implement it in Python. <\/p>\n\n\n\n<p>Happy Learning!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Even if you are not in the field of statistics, you must have come across the term \u201cNormal Distribution\u201d. A probability distribution is a statistical function that describes the likelihood of obtaining the possible values that a random variable can take. 
By this, we mean the range of values that a parameter can take when [&hellip;]<\/p>\n","protected":false},"author":16,"featured_media":9892,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-9815","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/9815","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=9815"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/9815\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/9892"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=9815"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=9815"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=9815"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}