{"id":9144,"date":"2020-10-07T12:29:45","date_gmt":"2020-10-07T12:29:45","guid":{"rendered":"https:\/\/www.askpython.com\/?p=9144"},"modified":"2024-10-31T09:17:07","modified_gmt":"2024-10-31T09:17:07","slug":"calculate-summary-statistics","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python\/examples\/calculate-summary-statistics","title":{"rendered":"How to Calculate Summary Statistics in Python?"},"content":{"rendered":"\n<p>To calculate summary statistics in Python, you can use the <strong>.describe()<\/strong> method in <a href=\"https:\/\/www.askpython.com\/python-modules\/pandas\/python-pandas-module-tutorial\" class=\"rank-math-link\">Pandas<\/a>. The <strong>.describe()<\/strong> method works on both numeric data and <a href=\"https:\/\/www.askpython.com\/python\/oops\/python-classes-objects\" class=\"rank-math-link\">object<\/a> data such as strings or timestamps. <\/p>\n\n\n\n<p>The output for the two will contain different fields. For numeric data the result will include:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>count<\/li><li><a href=\"https:\/\/www.askpython.com\/python\/examples\/mean-and-standard-deviation-python\" class=\"rank-math-link\">mean<\/a><\/li><li><a href=\"https:\/\/www.askpython.com\/python\/examples\/standard-deviation\" class=\"rank-math-link\">standard deviation<\/a><\/li><li>minimum<\/li><li>maximum<\/li><li>25th percentile<\/li><li>50th percentile (median)<\/li><li>75th percentile<\/li><\/ul>\n\n\n\n<p>For object data the result will include:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>count<\/li><li>unique<\/li><li>top<\/li><li>freq<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Calculate Summary Statistics in Python Using the describe() method<\/h2>\n\n\n\n<p>In this tutorial, we will see how to use the .describe() method with numeric and object data. <\/p>\n\n\n\n<p>We will also see how to analyze a large dataset and a timestamp series using the .describe() method. 
<\/p>\n\n\n\n<p>Let&#8217;s get started. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. Summary Statistics for Numeric data <\/h3>\n\n\n\n<p>Let&#8217;s define a pandas Series containing the numbers 1 to 6 and get its summary statistics. <\/p>\n\n\n\n<p>We will start by importing pandas.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\n<\/pre><\/div>\n\n\n<p>Now we can define a series as:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ns = pd.Series(&#x5B;1, 2, 3, 4, 5, 6])\n<\/pre><\/div>\n\n\n<p>To display summary statistics use:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ns.describe()\n<\/pre><\/div>\n\n\n<p>The complete code and output are as follows:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\ns = pd.Series(&#x5B;1, 2, 3, 4, 5, 6])\ns.describe()\n<\/pre><\/div>\n\n\n<p>Output:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ncount    6.000000\nmean     3.500000\nstd      1.870829\nmin      1.000000\n25%      2.250000\n50%      3.500000\n75%      4.750000\nmax      6.000000\ndtype: float64\n<\/pre><\/div>\n\n\n<p>Let&#8217;s understand what each of these values means.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>count <\/td><td>Number of non-null entries <\/td><\/tr><tr><td>mean<\/td><td>Average of all the entries<\/td><\/tr><tr><td>std<\/td><td>Sample standard deviation<\/td><\/tr><tr><td>min<\/td><td>Minimum value<\/td><\/tr><tr><td>25%<\/td><td>25th percentile<\/td><\/tr><tr><td>50%<\/td><td>50th percentile (median)<\/td><\/tr><tr><td>75%<\/td><td>75th percentile<\/td><\/tr><tr><td>max<\/td><td>Maximum value 
<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">2. Summary Statistics for Python Object data<\/h3>\n\n\n\n<p>Let&#8217;s define a series of single-character strings and use the .describe() method on it to calculate summary statistics. <\/p>\n\n\n\n<p>We can define the series as:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ns = pd.Series(&#x5B;&#039;a&#039;, &#039;a&#039;, &#039;b&#039;, &#039;c&#039;])\n<\/pre><\/div>\n\n\n<p>To get the summary statistics use:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ns.describe()\n<\/pre><\/div>\n\n\n<p>The complete code and output are as follows:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\ns = pd.Series(&#x5B;&#039;a&#039;, &#039;a&#039;, &#039;b&#039;, &#039;c&#039;])\ns.describe()\n<\/pre><\/div>\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ncount     4\nunique    3\ntop       a\nfreq      2\ndtype: object\n<\/pre><\/div>\n\n\n<p>Let&#8217;s understand what each of these fields means:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>count<\/td><td>Number of non-null entries<\/td><\/tr><tr><td>unique<\/td><td>Number of distinct entries<\/td><\/tr><tr><td>top<\/td><td>Most frequent entry<\/td><\/tr><tr><td>freq<\/td><td>Frequency of the most frequent entry<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">3. Summary statistics of a large data set <\/h3>\n\n\n\n<p>You can use pandas to get the summary statistics from a large dataset as well. You just need to load the dataset into a pandas DataFrame and then call the .describe() method. 
<\/p>\n\n\n\n<p>In this tutorial, we will be using the California Housing dataset as the sample dataset. <\/p>\n\n\n\n<p>Let&#8217;s start by <a href=\"https:\/\/www.askpython.com\/python-modules\/python-csv-module\" class=\"rank-math-link\">importing the CSV dataset<\/a> and then calling the .describe() method on it. <\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\nhousing = pd.read_csv(&quot;\/content\/sample_data\/california_housing.csv&quot;)\nhousing.describe()\n<\/pre><\/div>\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"255\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/describe-1024x255.png\" alt=\"Describe\" class=\"wp-image-9146\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/describe-1024x255.png 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/describe-300x75.png 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/describe-768x192.png 768w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/describe.png 1171w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<p>We can see that the result contains the summary statistics for every numeric column in our dataset. By default, .describe() on a DataFrame with mixed column types reports only the numeric columns.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4. Summary Statistics for timestamp series<\/h3>\n\n\n\n<p>You can use .describe() to get summary statistics for a timestamp series as well.  
Let&#8217;s start by defining a timestamp series.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\nimport numpy as np\ns = pd.Series(&#x5B;np.datetime64(&quot;2000-01-01&quot;), np.datetime64(&quot;2010-01-01&quot;), np.datetime64(&quot;2010-01-01&quot;), np.datetime64(&quot;2002-05-08&quot;)])\n<\/pre><\/div>\n\n\n<p>Now you can call .describe() on this timestamp series. <\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ns.describe()\n<\/pre><\/div>\n\n\n<p>The complete code and output are as follows:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\nimport numpy as np\ns = pd.Series(&#x5B;np.datetime64(&quot;2000-01-01&quot;), np.datetime64(&quot;2010-01-01&quot;), np.datetime64(&quot;2010-01-01&quot;), np.datetime64(&quot;2002-05-08&quot;)])\ns.describe()\n<\/pre><\/div>\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ncount                       4\nunique                      3\ntop       2010-01-01 00:00:00\nfreq                        2\nfirst     2000-01-01 00:00:00\nlast      2010-01-01 00:00:00\ndtype: object\n<\/pre><\/div>\n\n\n<p>These fields (count, unique, top, freq, first and last) are what pandas versions before 2.0 report for datetime data; pandas 2.0 and later always summarize datetime data the way they summarize numeric data.<\/p>\n\n\n\n<p>In those older versions, you can also instruct .describe() to treat <strong>datetime values as numeric<\/strong>. This will display the result in a manner similar to that of numeric data. 
You get the mean, median, 25th percentile and 75th percentile as datetime values.<\/p>\n\n\n\n<p>In pandas versions before 2.0, this can be done using the datetime_is_numeric parameter, which was removed in pandas 2.0 because that version already treats datetime data as numeric by default:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ns.describe(datetime_is_numeric=True)\n<\/pre><\/div>\n\n\n<p>The output is as follows:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ncount                      4\nmean     2005-08-03 00:00:00\nmin      2000-01-01 00:00:00\n25%      2001-10-05 12:00:00\n50%      2006-03-05 12:00:00\n75%      2010-01-01 00:00:00\nmax      2010-01-01 00:00:00\n<\/pre><\/div>\n\n\n<p>You can see that the result contains the mean, median, 25th percentile and 75th percentile in <a href=\"https:\/\/www.askpython.com\/python-modules\/python-datetime-module\" class=\"rank-math-link\">DateTime format<\/a>. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>This tutorial was about computing summary statistics in Python. We looked at numeric data, object data, large datasets and timestamp series to calculate summary statistics. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>To calculate summary statistics in Python you need to use the .describe() method under Pandas. The .describe() method works on both numeric data as well as object data such as strings or timestamps. The output for the two will contain different fields. 
For numeric data the result will include: count mean standard deviation minimum maximum [&hellip;]<\/p>\n","protected":false},"author":14,"featured_media":9147,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-9144","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-examples"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/9144","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/14"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=9144"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/9144\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/9147"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=9144"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=9144"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=9144"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}