{"id":60964,"date":"2024-03-29T12:55:43","date_gmt":"2024-03-29T12:55:43","guid":{"rendered":"https:\/\/www.askpython.com\/?p=60964"},"modified":"2025-04-10T20:34:19","modified_gmt":"2025-04-10T20:34:19","slug":"empirical-distribution-in-python","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python\/examples\/empirical-distribution-in-python","title":{"rendered":"Empirical Distribution in Python: Histograms, CDFs, and PMFs"},"content":{"rendered":"\n<p>An empirical distribution describes how data is distributed based on what is actually observed, rather than on an assumed underlying model. It represents the frequency or proportion of observations falling into a particular range, or taking a particular value, using histograms, cumulative distribution functions (CDFs), or probability mass functions (PMFs).<\/p>\n\n\n\n<p>It is a data-driven technique that draws conclusions about a distribution directly from the observed data. This is especially useful when the underlying distribution is unknown or too complex to fit a standard parametric family.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>An empirical distribution describes the distribution of data based on observations, without relying on underlying assumptions. It represents the frequency or proportion of observations using histograms, cumulative distribution functions (CDFs), or probability mass functions (PMFs). 
Empirical distribution is data-driven, flexible, and non-parametric, making it valuable for exploratory data analysis and decision-making in various fields.<\/em><\/p>\n<\/blockquote>\n\n\n\n<p>In this article, we will look at what an empirical distribution is and how to implement it in Python so that you can use it in your exploratory data analysis projects.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Key Characteristics of Empirical Distribution<\/h2>\n\n\n\n<p>Empirical distributions have several key characteristics:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Data-driven<\/strong>: Empirical distributions are built directly from the observations, so they reflect the dataset itself rather than a model chosen in advance. They are also easy to interpret when visualized.<\/li>\n\n\n\n<li><strong>Flexible<\/strong>: They can be represented as histograms, cumulative distribution functions, or probability mass functions.<\/li>\n\n\n\n<li><strong>Non-parametric<\/strong>: Since these distributions depend only on the observed data, they do not require the predefined parameters of an assumed distribution, which makes them suitable for exploratory data analysis.<\/li>\n<\/ul>\n\n\n\n<p><em>Suggested: <a href=\"https:\/\/www.askpython.com\/python\/examples\/applied-predictive-modeling-python\">Applied Predictive Modeling in Python<\/a>.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Implementing Empirical Distribution in Python<\/h2>\n\n\n\n<p>In this section, we will explore empirical distribution in Python in three different ways: histograms, cumulative distribution functions (CDFs), and probability mass functions (PMFs).<\/p>\n\n\n\n<p>For the histogram and CDF, we are going to generate random continuous data, and for the PMF we are going to generate random discrete data. You can use any dataset of your choice for this part. 
We will be using <a href=\"https:\/\/numpy.org\/\" data-type=\"link\" data-id=\"https:\/\/numpy.org\/\" target=\"_blank\" rel=\"noopener\">NumPy<\/a> and <a href=\"https:\/\/matplotlib.org\/\" data-type=\"link\" data-id=\"https:\/\/matplotlib.org\/\" target=\"_blank\" rel=\"noopener\">Matplotlib<\/a> as our main libraries in this section.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Histogram<\/strong>: A histogram is a graphical representation of how many data points fall into each of a set of intervals (bins). Below is the code for plotting a histogram of synthetic data. With density=True, the bar heights are normalized so that the total area of the bars equals 1.<\/li>\n<\/ul>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Importing required modules\nimport matplotlib.pyplot as plt\nimport numpy as np\n\n# Generating sample data: 1000 draws from a standard normal distribution\ndata = np.random.normal(loc=0, scale=1, size=1000)\n\n# Plotting a histogram; density=True normalizes the bars to a total area of 1\nplt.hist(data, bins=30, density=True, alpha=0.7, color=&#039;blue&#039;)\nplt.title(&#039;Histogram of Sample Data&#039;)\nplt.xlabel(&#039;Value&#039;)\nplt.ylabel(&#039;Density&#039;)\nplt.show()\n<\/pre><\/div>\n\n\n<p>The output of the above code is:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"455\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Histogram.png\" alt=\"Histogram\" class=\"wp-image-60983\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Histogram.png 567w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Histogram-300x241.png 300w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><figcaption class=\"wp-element-caption\">Histogram<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cumulative distribution function (CDF)<\/strong>: The CDF gives the probability that a random variable is less than or equal to a given value. The empirical CDF estimates it as the proportion of observations at or below that value. 
<\/li>\n<\/ul>\n\n\n\n<p>The code for plotting the empirical CDF is given below:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Importing required modules\nimport matplotlib.pyplot as plt\nimport numpy as np\n\n# Generating sample data: 1000 draws from a standard normal distribution\ndata = np.random.normal(loc=0, scale=1, size=1000)\n\n# Plotting the empirical CDF as a cumulative, normalized histogram\n# (a binned approximation in which the last bar reaches 1)\nplt.hist(data, bins=30, density=True, cumulative=True, alpha=0.7, color=&#039;green&#039;)\nplt.title(&#039;Empirical CDF of Sample Data&#039;)\nplt.xlabel(&#039;Value&#039;)\nplt.ylabel(&#039;Cumulative Probability&#039;)\nplt.show()\n<\/pre><\/div>\n\n\n<p>The output of the above code is:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"455\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Cumulative-Distribution-Function.png\" alt=\"Cumulative Distribution Function\" class=\"wp-image-60985\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Cumulative-Distribution-Function.png 567w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Cumulative-Distribution-Function-300x241.png 300w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><figcaption class=\"wp-element-caption\">Cumulative Distribution Function<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Probability mass function (PMF)<\/strong>: For discrete data, the probability mass function gives the probability that a random variable takes on a specific value. The empirical PMF estimates it as the proportion of observations equal to each value. 
<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"576\" height=\"455\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Probability-mass-function.png\" alt=\"Probability Mass Function\" class=\"wp-image-60989\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Probability-mass-function.png 576w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Probability-mass-function-300x237.png 300w\" sizes=\"auto, (max-width: 576px) 100vw, 576px\" \/><figcaption class=\"wp-element-caption\">Probability Mass Function<\/figcaption><\/figure>\n\n\n\n<p><em>Recommended: <a href=\"https:\/\/www.askpython.com\/python\/examples\/boxplots\">Boxplots: Everything you need to know<\/a>.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Significance and Application of Empirical Distribution<\/h2>\n\n\n\n<p>The empirical distribution plays an important role in statistics. It provides insight into the features of observed data, allowing analysts and data specialists to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Understand the central tendency and variability of the data.<\/li>\n\n\n\n<li>Identify outliers or unusual patterns in the observed data.<\/li>\n\n\n\n<li>Compare observed data with theoretical distributions or model predictions.<\/li>\n\n\n\n<li>Make informed decisions and draw conclusions from empirical evidence.<\/li>\n<\/ul>\n\n\n\n<p>The empirical distribution is used in a variety of fields such as finance, marketing, and environmental studies. It is also used for exploratory data analysis and model validation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<p>The empirical distribution is invaluable for understanding the central tendency, variability, and patterns within observed data. 
It enables analysts and decision-makers to gain insights, identify outliers, compare data with theoretical distributions, and make informed decisions based on empirical evidence.<\/p>\n\n\n\n<p>The applications of empirical distribution span diverse fields, from finance and marketing to environmental studies and beyond. Its versatility and ability to provide meaningful insights make it a powerful tool in the data scientist&#8217;s arsenal. Are you going to use this technique in your next data science project?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Empirical distribution in Python describes the distribution of data from what is observed rather than having an underlying assumption. It represents the frequency or proportion of observations falling into a particular range by using histograms, cumulative distribution functions (CDFs), or probability mass functions (PMFs). It is a type of deductive distribution technique that makes direct [&hellip;]<\/p>\n","protected":false},"author":49,"featured_media":63913,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-60964","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-examples"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/60964","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/49"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=60964"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/60964\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\
/wp-json\/wp\/v2\/media\/63913"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=60964"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=60964"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=60964"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}