{"id":60761,"date":"2024-03-29T06:39:39","date_gmt":"2024-03-29T06:39:39","guid":{"rendered":"https:\/\/www.askpython.com\/?p=60761"},"modified":"2025-04-10T20:34:47","modified_gmt":"2025-04-10T20:34:47","slug":"non-parametric-statistics-in-python","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python\/examples\/non-parametric-statistics-in-python","title":{"rendered":"Non-Parametric Statistics in Python: Exploring Distributions and Hypothesis Testing"},"content":{"rendered":"\n<p>Non-parametric statistics make no strong assumptions about the distribution of the data, in contrast to parametric statistics. Non-parametric methods typically work with ranks and signs rather than raw values. <\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Non-parametric statistics focus on analyzing data without making strong assumptions about the underlying distribution. Python offers various methods for exploring data distributions, such as histograms, kernel density estimation (KDE), and Q-Q plots. In addition, non-parametric hypothesis testing techniques like the Wilcoxon rank-sum test, Kruskal-Wallis test, and chi-square test allow for inferential analysis without relying on parametric assumptions.<\/em><\/p>\n<\/blockquote>\n\n\n\n<p>In this article, we have divided non-parametric statistics into two parts: methods for exploring the underlying distribution, and hypothesis testing and inference. <\/p>\n\n\n\n<p><strong><em>Recommended: <a href=\"https:\/\/www.askpython.com\/python\/examples\/how-to-calculate-power-statistics\">How To Calculate Power Statistics?<\/a><\/em><\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Exploring Data Distributions<\/h2>\n\n\n\n<p>Exploring the distribution helps us visualize the data and judge which theoretical distribution, if any, it resembles. 
It also helps us summarize the data.<\/p>\n\n\n\n<p>In this section, we will look at histograms, kernel density estimation, and Q-Q plots, and implement each of them in Python.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Visualizing Data with Histograms<\/strong><\/h3>\n\n\n\n<p>Histograms are used to visualize the distribution of numerical data. A histogram divides the range of the data into bins and shows how many values fall into each bin. It looks similar to a bar chart, but its bars represent intervals of a numerical variable rather than separate categories. Let us understand it further with Python code.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport matplotlib.pyplot as plt\n\n# Sample data (replace with your actual data)\ndata = &#x5B;2, 5, 7, 8, 2, 1, 9, 4, 5, 3, 7, 8, 2, 6, 1]\n\n# Create the histogram\nplt.hist(data, bins=10, edgecolor=&#039;black&#039;)  # Adjust &#039;bins&#039; for different bin counts\n\n# Customize the plot (optional)\nplt.xlabel(&#039;Data Values&#039;)\nplt.ylabel(&#039;Frequency&#039;)\nplt.title(&#039;Histogram of Sample Data&#039;)\nplt.grid(True)\n\n# Display the plot\nplt.show()\n\n# Output: This will display the generated histogram.\n\n<\/pre><\/div>\n\n\n<p>Let us look at the output for the above code.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"455\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Histogram-output.png\" alt=\"Histogram Output\" class=\"wp-image-60766\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Histogram-output.png 567w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Histogram-output-300x241.png 300w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><figcaption class=\"wp-element-caption\"><strong><em>Histogram Output<\/em><\/strong><\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Estimating Probability Density with Kernel Density 
Estimation<\/strong><\/h3>\n\n\n\n<p>Kernel Density Estimation (KDE) estimates the probability density function (PDF) of a random variable from a sample. It gives a continuous and smoother view of the distribution than a histogram. Let us look at the Python code for the same.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport seaborn as sns\nimport matplotlib.pyplot as plt\n\n# Sample data (replace with your actual data)\ndata = &#x5B;2, 5, 7, 8, 2, 1, 9, 4, 5, 3, 7, 8, 2, 6, 1]\n\n# Create the KDE plot\nsns.kdeplot(data)\n\n# Customize the plot (optional)\nplt.xlabel(&#039;Data Values&#039;)\nplt.ylabel(&#039;Probability Density&#039;)\nplt.title(&#039;KDE Plot of Sample Data&#039;)\nplt.grid(True)\n\n# Display the plot\nplt.show()\n\n# Output: This will display the generated KDE plot.\n<\/pre><\/div>\n\n\n<p>Let us look at the output of the above code.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"576\" height=\"455\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Kernel-Density-Estimation-Plot.png\" alt=\"Kernel Density Estimation Plot\" class=\"wp-image-60767\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Kernel-Density-Estimation-Plot.png 576w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Kernel-Density-Estimation-Plot-300x237.png 300w\" sizes=\"auto, (max-width: 576px) 100vw, 576px\" \/><figcaption class=\"wp-element-caption\"><strong><em>Kernel Density Estimation Plot<\/em><\/strong><\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Comparing Distributions with Q-Q Plots<\/strong><\/h3>\n\n\n\n<p>Q-Q plots, or quantile-quantile plots, are used to compare two probability distributions. They help us visualize whether two datasets came from the same population or have the same distribution. 
Let us look at the Python code for the same.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport matplotlib.pyplot as plt\nimport numpy as np\n\n# Sample data sets (replace with your actual data)\ndata1 = &#x5B;2, 5, 7, 8, 2, 1, 9, 4, 5, 3, 7, 8, 2, 6, 1]\ndata2 = &#x5B;3, 6, 8, 9, 3, 2, 10, 5, 6, 4, 8, 9, 3, 7, 2]\n\n# Calculate quantiles\nq1 = np.quantile(data1, np.linspace(0, 1, 100))\nq2 = np.quantile(data2, np.linspace(0, 1, 100))\n\n# Create the Q-Q plot\nplt.plot(q1, q2, &#039;o&#039;, markersize=5)\n\n# Reference line for perfect match (optional)\nplt.plot(q1, q1, color=&#039;red&#039;, linestyle=&#039;--&#039;)\n\n# Customize the plot (optional)\nplt.xlabel(&#039;Quantiles of Data Set 1&#039;)\nplt.ylabel(&#039;Quantiles of Data Set 2&#039;)\nplt.title(&#039;Q-Q Plot of Sample Data Sets&#039;)\nplt.grid(True)\n\n# Display the plot\nplt.show()\n\n<\/pre><\/div>\n\n\n<p>Let us look at the output of the plot.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"562\" height=\"455\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Q-Q-Plot-output.png\" alt=\"Q Q Plot Output\" class=\"wp-image-60768\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Q-Q-Plot-output.png 562w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Q-Q-Plot-output-300x243.png 300w\" sizes=\"auto, (max-width: 562px) 100vw, 562px\" \/><figcaption class=\"wp-element-caption\"><strong><em>Q Q Plot Output<\/em><\/strong><\/figcaption><\/figure>\n\n\n\n<p>Now let us move on to the methods for hypothesis testing and inference.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Non-Parametric Hypothesis Testing and Inference<\/h2>\n\n\n\n<p>Non-parametric hypothesis tests make minimal assumptions about the underlying distribution and rely mainly on rank-based 
statistics.<\/p>\n\n\n\n<p>In this section, we will learn about the Wilcoxon rank-sum, Kruskal-Wallis, and chi-square tests. Let us learn all of these with their Python implementation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Comparing Two Groups with the Wilcoxon Rank-Sum Test<\/strong><\/h3>\n\n\n\n<p>The Wilcoxon rank-sum test, also known as the Mann-Whitney U test, is a non-parametric statistical test used to compare two independent groups. It works on the ranks of the observations rather than on their means, and its null hypothesis is that both samples come from the same distribution. In the code below, we have two datasets, and we want to check whether there is evidence of a difference between them. Let us look at the code below.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport scipy.stats as stats\n\n# Sample data (replace with your actual data)\ndata1 = &#x5B;2, 5, 7, 10, 12]\ndata2 = &#x5B;3, 6, 8, 9, 11, 13]\n\n# Perform Wilcoxon Rank Sum Test\nstatistic, pvalue = stats.ranksums(data1, data2)\n\n# Print test results\nprint(&quot;Test Statistic:&quot;, statistic)\nprint(&quot;p-value:&quot;, pvalue)\n\n# Decide on rejecting the null hypothesis based on significance level (e.g., 0.05)\nif pvalue &lt; 0.05:\n    print(&quot;Reject null hypothesis: There is a significant difference between the distributions.&quot;)\nelse:\n    print(&quot;Fail to reject null hypothesis: Insufficient evidence to conclude a difference.&quot;)\n<\/pre><\/div>\n\n\n<p>Let us look at its output.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"93\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Wilcox-rank-sum-test-1024x93.png\" alt=\"Wilcoxon Rank Sum Test\" class=\"wp-image-60769\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Wilcox-rank-sum-test-1024x93.png 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Wilcox-rank-sum-test-300x27.png 300w, 
https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Wilcox-rank-sum-test-768x70.png 768w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Wilcox-rank-sum-test.png 1067w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><strong><em>Wilcoxon Rank Sum Test<\/em><\/strong><\/figcaption><\/figure>\n\n\n\n<p>Since the p-value is greater than 0.05, we fail to reject the null hypothesis: the data do not provide sufficient evidence of a difference between the two groups.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>One-way ANOVA on ranks\/Kruskal-Wallis Test<\/strong><\/h3>\n\n\n\n<p>One-way <a href=\"https:\/\/www.askpython.com\/python\/examples\/anova-test-in-python\" data-type=\"post\" data-id=\"12164\">ANOVA<\/a> on ranks, or the Kruskal-Wallis test, is a non-parametric test for comparing three or more independent groups. It works on ranks and does not assume normally distributed data. Let us look at the code below, where we use three datasets.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport scipy.stats as stats\n\n# Sample data (replace with your actual data)\ndata1 = &#x5B;2, 5, 7, 10, 12]\ndata2 = &#x5B;3, 6, 8, 9, 11, 13]\ndata3 = &#x5B;1, 4, 6, 9, 10]\n\n# Perform Kruskal-Wallis test\nstatistic, pvalue = stats.kruskal(data1, data2, data3)\n\n# Print test results\nprint(&quot;Test Statistic:&quot;, statistic)\nprint(&quot;p-value:&quot;, pvalue)\n\n# Decide on rejecting the null hypothesis based on significance level (e.g., 0.05)\nif pvalue &lt; 0.05:\n    print(&quot;Reject null hypothesis: There is a significant difference between distributions.&quot;)\nelse:\n    print(&quot;Fail to reject null hypothesis: Insufficient evidence to conclude a difference.&quot;)\n<\/pre><\/div>\n\n\n<p>Let us look at the output of the code above.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img 
loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"91\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Krusal-Wallis-Test-output-1024x91.png\" alt=\"Kruskal-Wallis Test Output\" class=\"wp-image-60771\" style=\"width:424px;height:auto\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Krusal-Wallis-Test-output-1024x91.png 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Krusal-Wallis-Test-output-300x27.png 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Krusal-Wallis-Test-output-768x68.png 768w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Krusal-Wallis-Test-output.png 1057w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><strong><em>Kruskal-Wallis Test Output<\/em><\/strong><\/figcaption><\/figure>\n\n\n\n<p>We fail to reject the null hypothesis since the p-value is greater than 0.05.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Testing Categorical Variables with Chi-Square Test<\/strong><\/h3>\n\n\n\n<p>The <a href=\"https:\/\/www.askpython.com\/python\/examples\/chi-square-test\" data-type=\"post\" data-id=\"12161\">chi-square test<\/a> compares observed and expected frequencies in one or more categorical variables. It is used as a goodness-of-fit test and to test whether two categorical variables are independent. 
Let us look at the code below.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport scipy.stats as stats\n\n# Sample contingency table (replace with your actual data)\nobserved_data = &#x5B;&#x5B;10, 20],\n                 &#x5B;15, 25]]\n\n# Perform Chi-square test\n# chi2_contingency returns the statistic, p-value, degrees of freedom, and expected frequencies\nchi2_statistic, pvalue, dof, expected_counts = stats.chi2_contingency(observed_data)\n\n# Print test results\nprint(&quot;Chi-square statistic:&quot;, chi2_statistic)\nprint(&quot;p-value:&quot;, pvalue)\n\n# Decide on rejecting the null hypothesis based on significance level (e.g., 0.05)\nif pvalue &lt; 0.05:\n    print(&quot;Reject null hypothesis: There is a significant association between the variables.&quot;)\nelse:\n    print(&quot;Fail to reject null hypothesis: Insufficient evidence to conclude an association.&quot;)\n\n<\/pre><\/div>\n\n\n<p>Let us look at the output of the code above.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"80\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Chi-Square-test-output-1024x80.png\" alt=\"Chi Square Test Output\" class=\"wp-image-60772\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Chi-Square-test-output-1024x80.png 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Chi-Square-test-output-300x23.png 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Chi-Square-test-output-768x60.png 768w, https:\/\/www.askpython.com\/wp-content\/uploads\/2024\/03\/Chi-Square-test-output.png 1105w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><strong><em>Chi-Square Test Output<\/em><\/strong><\/figcaption><\/figure>\n\n\n\n<p>Since the p-value is greater than 0.05, we fail to reject the null hypothesis: there is insufficient evidence of an association between the two variables.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Here we 
go! Now you know what non-parametric tests are. In this article, we explored data distributions and performed hypothesis tests without assuming a particular parametric distribution. We also learned about different tests for comparing datasets.<\/p>\n\n\n\n<p>Hope you enjoyed reading it!!<\/p>\n\n\n\n<p><strong><em>Recommended: <a href=\"https:\/\/www.askpython.com\/python\/examples\/chi-square-test\">Chi-square test in Python \u2014 All you need to know!!<\/a><\/em><\/strong><\/p>\n\n\n\n<p><strong><em>Recommended: <a href=\"https:\/\/www.askpython.com\/resources\/python-influence-on-cloud-computing-statistics\">Python\u2019s Influence on Cloud Computing Projects: Revealing the Statistics<\/a><\/em><\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Non-parametric statistics make no strong assumptions about the distribution of the data, in contrast to parametric statistics. Non-parametric statistics focus on ranks and signs along with minimal assumptions. Non-parametric statistics focus on analyzing data without making strong assumptions about the underlying distribution. 
Python offers various methods for exploring data distributions, such as histograms, kernel density estimation [&hellip;]<\/p>\n","protected":false},"author":80,"featured_media":63921,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-60761","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-examples"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/60761","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/80"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=60761"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/60761\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/63921"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=60761"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=60761"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=60761"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}