{"id":9423,"date":"2020-10-17T07:03:39","date_gmt":"2020-10-17T07:03:39","guid":{"rendered":"https:\/\/www.askpython.com\/?p=9423"},"modified":"2023-02-16T19:56:59","modified_gmt":"2023-02-16T19:56:59","slug":"density-plots-in-python","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python\/examples\/density-plots-in-python","title":{"rendered":"Density Plots in Python &#8211; A Comprehensive Overview"},"content":{"rendered":"\n<p class=\"has-text-align-left\">A density plot visualizes the distribution of a continuous numerical variable in a dataset. It is also known as a <em>kernel density plot<\/em>.<\/p>\n\n\n\n<p>It\u2019s good practice to know your data well before applying any machine learning techniques to it.<\/p>\n\n\n\n<p>As good ML practitioners, we should be asking questions like:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>What does our data look like?<\/li><li>Is it normally distributed, or does it have a different shape?<\/li><li>Do the algorithms we intend to apply make any underlying assumptions about the distribution of the data?<\/li><\/ul>\n\n\n\n<p>Addressing such questions right after we acquire our data can improve the results in later stages and save us a lot of time.<\/p>\n\n\n\n<p>Plots like <a href=\"https:\/\/www.askpython.com\/python-modules\/pandas\/plot-graph-for-a-dataframe\" class=\"rank-math-link\">histograms<\/a> and density plots give us ways to answer the questions mentioned above.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why understand histograms before learning about density plots?<\/h2>\n\n\n\n<p>A density plot is closely analogous to a histogram. A histogram lets us visualize the shape of a distribution. 
Histograms are created by binning the data and counting the number of observations in each bin. In a histogram, the y-axis usually shows bin counts, but it can also show counts per unit, also called densities.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2000\" height=\"1000\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/A-Histogram-with-less-number-of-bins.jpeg\" alt=\"A Histogram With Fewer Bins\" class=\"wp-image-9428\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/A-Histogram-with-less-number-of-bins.jpeg 2000w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/A-Histogram-with-less-number-of-bins-300x150.jpeg 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/A-Histogram-with-less-number-of-bins-1024x512.jpeg 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/A-Histogram-with-less-number-of-bins-768x384.jpeg 768w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/A-Histogram-with-less-number-of-bins-1536x768.jpeg 1536w\" sizes=\"auto, (max-width: 2000px) 100vw, 2000px\" \/><figcaption><strong>A Histogram With Fewer Bins<\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<p>If we increase the number of bins in our histogram, the shape of the distribution appears smoother.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2000\" height=\"1000\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Histogram-having-more-number-of-bins.jpeg\" alt=\"A Histogram With More Bins\" class=\"wp-image-9430\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Histogram-having-more-number-of-bins.jpeg 2000w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Histogram-having-more-number-of-bins-300x150.jpeg 300w, 
https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Histogram-having-more-number-of-bins-1024x512.jpeg 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Histogram-having-more-number-of-bins-768x384.jpeg 768w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Histogram-having-more-number-of-bins-1536x768.jpeg 1536w\" sizes=\"auto, (max-width: 2000px) 100vw, 2000px\" \/><figcaption><strong>A Histogram With More Bins<\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<p>Now, imagine a smooth continuous line passing through the top of each bin, creating an outline of the shape of our distribution. The result is what we call a density plot.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"512\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Density-plot-with-Histogram-1024x512.jpeg\" alt=\"Density Plot With Histogram\" class=\"wp-image-9432\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Density-plot-with-Histogram-1024x512.jpeg 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Density-plot-with-Histogram-300x150.jpeg 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Density-plot-with-Histogram-768x384.jpeg 768w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Density-plot-with-Histogram-1536x768.jpeg 1536w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Density-plot-with-Histogram.jpeg 2000w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption><strong>Density Plot With Histogram<\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Understanding The Density Plot<\/strong><\/h2>\n\n\n\n<p>We can think of density plots as smoothed histograms, which should be intuitive by now. Density plots mostly use a\u00a0<em><a 
href=\"https:\/\/en.wikipedia.org\/wiki\/Kernel_density_estimation\" class=\"rank-math-link\" target=\"_blank\" rel=\"noopener\">kernel density estimate<\/a><\/em>. A kernel density estimate gives a smoother picture of the distribution by smoothing out the noise.<\/p>\n\n\n\n<p>Density plots are not affected by the number of bins, which is a major parameter for histograms, so they let us visualize the distribution of our data better. They depend instead on a smoothing parameter (the bandwidth).<\/p>\n\n\n\n<p>So, in summary, a density plot is like a histogram with a smooth curve drawn through the top of each bin.<\/p>\n\n\n\n<p>Distributions come in several shapes. Some of the most common shapes we are likely to encounter are:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"791\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Some-shapes-of-Distributions-1024x791.jpeg\" alt=\"Some Shapes Of Distributions\" class=\"wp-image-9436\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Some-shapes-of-Distributions-1024x791.jpeg 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Some-shapes-of-Distributions-300x232.jpeg 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Some-shapes-of-Distributions-768x594.jpeg 768w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Some-shapes-of-Distributions-1536x1187.jpeg 1536w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Some-shapes-of-Distributions-2048x1583.jpeg 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption><strong>Some Shapes Of Distributions<\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Density Plots with 
Python<\/h2>\n\n\n\n<p>We can draw a density plot in Python in several ways. Let\u2019s look at a few commonly used methods.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. <strong>Using the Python scipy.stats module<\/strong><\/h3>\n\n\n\n<p>The <code>scipy.stats<\/code> module provides the <code>gaussian_kde<\/code> class to estimate the density of a given dataset.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom scipy.stats import gaussian_kde\n\ndata = np.random.normal(10, 3, 100)  # Generate 100 samples (mean 10, std 3)\ndensity = gaussian_kde(data)\n\nx_vals = np.linspace(0, 20, 200)  # Points at which to evaluate the density\ndensity.covariance_factor = lambda: 0.5  # Smoothing parameter\ndensity._compute_covariance()  # Recompute the kernel covariance with the new factor\n\nplt.plot(x_vals, density(x_vals))\nplt.show()\n\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"400\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/density-plot-using-scipy.jpeg\" alt=\"Density Plot Using Scipy\" class=\"wp-image-9442\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/density-plot-using-scipy.jpeg 600w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/density-plot-using-scipy-300x200.jpeg 300w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><figcaption><strong>Density Plot Using Scipy<\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<p>We override the <code>covariance_factor<\/code> function of the <code>gaussian_kde<\/code> object and try different values; larger values give a smoother plot. Remember to call <code>_compute_covariance<\/code> after changing the function.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2. 
<strong>Using the Seaborn <code>kdeplot<\/code> function<\/strong><\/h3>\n\n\n\n<p>Seaborn provides an easier and more flexible way to do the same thing.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport numpy as np\nimport seaborn as sb\nimport matplotlib.pyplot as plt\n\ndata = np.random.normal(10, 3, 300)  # Generating data\nplt.figure(figsize = (5, 5))\nsb.kdeplot(x = data, bw_method = 0.5, fill = True)  # bw_method sets the smoothing\nplt.show()\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"400\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/density-plot-using-kdeplot.jpeg\" alt=\"Density Plot Using Kdeplot\" class=\"wp-image-9445\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/density-plot-using-kdeplot.jpeg 600w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/density-plot-using-kdeplot-300x200.jpeg 300w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><figcaption><strong>Density Plot Using Kdeplot<\/strong><\/figcaption><\/figure>\n\n\n\n<p>Seaborn <code>kdeplot<\/code> takes a univariate data array or a pandas Series as input. The <code>bw_method<\/code> argument (called <code>bw<\/code> in older seaborn releases) is equivalent to the <code>covariance_factor<\/code> of the <code>gaussian_kde<\/code> class demonstrated above. We can pass <code>fill<\/code> = <code>False<\/code> to draw only the curve without filling the area under it.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3. 
<strong>Using the pandas plot method<\/strong><\/h3>\n\n\n\n<p>The pandas <code>plot<\/code> method can also draw density plots when we pass <code>kind = 'density'<\/code> as an argument.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\n\nx_values = np.random.normal(10, 3, 300)  # Generating data\ndf = pd.DataFrame(x_values, columns = &#x5B;&#039;var_name&#039;])  # Converting array to pandas DataFrame\ndf.plot(kind = &#039;density&#039;)  # The density estimate relies on SciPy being installed\nplt.show()\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"400\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/density-plot-using-Pandas.jpeg\" alt=\"Density Plot Using Pandas\" class=\"wp-image-9451\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/density-plot-using-Pandas.jpeg 600w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/density-plot-using-Pandas-300x200.jpeg 300w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><figcaption><strong>Density Plot Using Pandas<\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>4. Using Seaborn <code>distplot<\/code><\/strong><\/h3>\n\n\n\n<p>We can also use the seaborn <code>distplot<\/code> function to visualize the distribution of continuous numerical data. 
<code>seaborn.distplot()<\/code> takes a univariate data variable as input, which can be a pandas Series, a 1-D array, or a list. Note that recent seaborn releases deprecate <code>distplot<\/code> in favor of <code>histplot<\/code> and <code>displot<\/code>.<\/p>\n\n\n\n<p>Some important arguments we can pass to <code>seaborn.distplot()<\/code> to tweak the plot according to our needs are:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong><code>hist<\/code><\/strong> : (<em>Type &#8211; Bool<\/em>) whether to plot a histogram.<\/li><li><strong><code>kde<\/code><\/strong> : (<em>Type &#8211; Bool<\/em>) whether to plot a Gaussian kernel density estimate.<\/li><li><strong><code>bins<\/code><\/strong> : (<em>Type &#8211; Number<\/em>) the number of bins in the histogram.<\/li><li><strong><code>hist_kws<\/code><\/strong> : (<em>Type &#8211; Dict<\/em>) keyword arguments for\u00a0<a aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"rank-math-link\" href=\"https:\/\/matplotlib.org\/api\/_as_gen\/matplotlib.axes.Axes.hist.html#matplotlib.axes.Axes.hist\" target=\"_blank\">matplotlib.axes.Axes.hist()<\/a>, passed as a dictionary.<\/li><li><strong><code>kde_kws<\/code><\/strong> : (<em>Type &#8211; Dict<\/em>) keyword arguments for\u00a0<a aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"rank-math-link\" href=\"https:\/\/seaborn.pydata.org\/generated\/seaborn.kdeplot.html#seaborn.kdeplot\" target=\"_blank\">kdeplot()<\/a>, passed as a dictionary.<\/li><\/ul>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sb\n\ndata = np.random.normal(10, 3, 1000)  # Generating data randomly from a normal distribution\n\nsb.set_style(&quot;whitegrid&quot;)  # Setting the style (optional)\nplt.figure(figsize = (10, 5))  # Specifying the figure size (optional)\nsb.distplot(data, bins = 10, kde = True, color = &#039;teal&#039;\\\n             , 
kde_kws=dict(linewidth = 4 , color = &#039;black&#039;))\nplt.show()\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"512\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Density-plot-using-distplot-1-1024x512.jpeg\" alt=\"Density Plot Using Distplot 1\" class=\"wp-image-9461\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Density-plot-using-distplot-1-1024x512.jpeg 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Density-plot-using-distplot-1-300x150.jpeg 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Density-plot-using-distplot-1-768x384.jpeg 768w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Density-plot-using-distplot-1-1536x768.jpeg 1536w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Density-plot-using-distplot-1.jpeg 2000w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption><strong>Density Plot Using Seaborn <code>distplot<\/code><\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<p>To learn more about seaborn <code>distplot<\/code>, see the article on seaborn distplots.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>That brings us to the end of the article! We hope you&#8217;ve learned a lot about the different ways to draw density plots today. You can read these articles to learn more about the <a href=\"https:\/\/www.askpython.com\/python-modules\/pandas\/python-pandas-module-tutorial\" class=\"rank-math-link\">Pandas<\/a> and <a href=\"https:\/\/www.askpython.com\/python-modules\/matplotlib\/python-matplotlib\" class=\"rank-math-link\">Matplotlib<\/a> libraries that we&#8217;ve used in this article.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A density plot is used to visualize the distribution of a continuous numerical variable in a dataset. It is also known as Kernel Density Plots. 
It\u2019s a good practice to know your data well before starting to apply any machine learning techniques to it. As a good ML practitioner we should be asking some questions [&hellip;]<\/p>\n","protected":false},"author":16,"featured_media":9496,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-9423","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-examples"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/9423","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=9423"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/9423\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/9496"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=9423"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=9423"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=9423"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}