{"id":42298,"date":"2023-03-30T16:24:03","date_gmt":"2023-03-30T16:24:03","guid":{"rendered":"https:\/\/www.askpython.com\/?p=42298"},"modified":"2023-03-30T16:24:04","modified_gmt":"2023-03-30T16:24:04","slug":"joint-probability-distribution","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python\/examples\/joint-probability-distribution","title":{"rendered":"Understanding Joint Probability Distribution with Python"},"content":{"rendered":"\n<p>In this tutorial, we will explore the concept of joint probability and joint probability distribution in mathematics and demonstrate how to implement them in Python. Joint distribution is essential for finding relationships between variables and has numerous applications in data science.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Understanding Joint Probability<\/h2>\n\n\n\n<p>Let&#8217;s understand this with an example(Don&#8217;t worry we won&#8217;t roll a die here :))<\/p>\n\n\n\n<p>Suppose we have a bag that contains 10 red balls and 5 green balls. We randomly select two balls from the bag without replacing them. 
Say we want to determine the joint probability of selecting a green ball on the first draw and a red ball on the second draw.<\/p>\n\n\n\n<p>To find the joint probability, we can reason as follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The probability of getting a green ball on the first draw is 5\/15, since there are 5 green balls out of 15 total balls.<\/li>\n\n\n\n<li>The probability of selecting a red ball on the second draw, given that a green ball was chosen on the first draw, is 10\/14, since all 10 red balls are still among the 14 balls remaining in the bag.<\/li>\n<\/ul>\n\n\n\n<p>To find the joint probability, we multiply the probability of the first event by the conditional probability of the second:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Joint probability is given by <strong>P(A and B) = P(A \u2229 B) = P(A) * P(B|A)<\/strong>, which holds for any two events A and B. Only when A and B are independent does it simplify to P(A) * P(B); here the draws are dependent because the first ball is not replaced.<\/li>\n\n\n\n<li>Note here that <strong>P(B|A)<\/strong> represents <a href=\"https:\/\/en.wikipedia.org\/wiki\/Conditional_probability\" target=\"_blank\" rel=\"noopener\">conditional probability<\/a>.<\/li>\n\n\n\n<li>P(Green on first draw and Red on second draw) = P(Green on first draw) x P(Red on second draw | Green on first draw)<\/li>\n\n\n\n<li>P(Green on first draw and Red on second draw) = (5\/15) x (10\/14) = 50\/210<\/li>\n\n\n\n<li>P(Green on first draw and Red on second draw) \u2248 0.2381<\/li>\n<\/ul>\n\n\n\n<p>So the joint probability of selecting a green ball on the first draw and a red ball on the second draw is about 0.2381, or approximately 23.81%. 
This tells us that, of the four possible ordered outcomes (Red-Red, Red-Green, Green-Red, and Green-Green), the Green-Red outcome occurs in approximately 23.81% of two-ball draws made this way.<\/p>\n\n\n\n<p><em>Now that we understand joint probability, let&#8217;s dive deeper into joint probability distribution.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Exploring Joint Probability Distribution<\/h2>\n\n\n\n<p>In short, a joint probability distribution gives the probability of every combination of values of two or more random variables, which lets us study how those variables relate to each other. Let&#8217;s use a very basic example to understand it.<\/p>\n\n\n\n<p>Suppose we have a group of 50 young boys and girls, and a survey asks each of them whether they prefer anime or horror. The data is represented in the table below.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><\/td><td><strong>Boys<\/strong><\/td><td><strong>Girls<\/strong><\/td><td><strong>Total<\/strong><\/td><\/tr><tr><td><strong>Anime<\/strong><\/td><td>19<\/td><td>11<\/td><td>30<\/td><\/tr><tr><td><strong>Horror<\/strong><\/td><td>8<\/td><td>12<\/td><td>20<\/td><\/tr><tr><td><strong>Total<\/strong><\/td><td>27<\/td><td>23<\/td><td>50<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">Survey data<\/figcaption><\/figure>\n\n\n\n<p>Some straightforward questions can be answered directly from the data:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>How many girls participated in the survey? (Answer: 23)<\/li>\n\n\n\n<li>How many boys like horror? (Answer: 8)<\/li>\n\n\n\n<li>How many people in the sample are girls and like anime? (Answer: 11)<\/li>\n\n\n\n<li>How many people like horror? (Answer: 20)<\/li>\n<\/ul>\n\n\n\n<p>So far, things are pretty simple, right? 
Now, with a small change, we will convert this data into a probability distribution table.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><\/td><td><strong>Boys<\/strong><\/td><td><strong>Girls<\/strong><\/td><td><strong>Total<\/strong><\/td><\/tr><tr><td><strong>Anime<\/strong><\/td><td>0.38<\/td><td>0.22<\/td><td>0.6<\/td><\/tr><tr><td><strong>Horror<\/strong><\/td><td>0.16<\/td><td>0.24<\/td><td>0.4<\/td><\/tr><tr><td><strong>Total<\/strong><\/td><td>0.54<\/td><td>0.46<\/td><td>1<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">Probability distribution of the survey<\/figcaption><\/figure>\n\n\n\n<p>We have expressed every count as a fraction of the whole by dividing it by 50 (the total number of people), which turns the table into a joint probability distribution. The Total row and column hold the marginal probabilities, for example P(Boy) = 0.54 and P(Horror) = 0.4. Now, if we pick a person from the sample at random, what is the probability that the person is a boy and likes horror? The answer is 0.16, <strong>which is the joint probability of the person being a boy and liking the horror genre<\/strong>. In other words, <strong>P(Boy and Horror) = P(Boy \u2229 Horror) = 0.16<\/strong><\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>A joint probability distribution represents the probability distribution of two or more random variables and explores their relationship. 
It can be visualized in Python using libraries like NumPy, Pandas, and Seaborn to analyze and plot the data.<\/p>\n<\/blockquote>\n\n\n\n<p>Let&#8217;s apply the concepts we&#8217;ve learned to a simulated example and implement a joint probability distribution using Python.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Implementing Joint Probability Distribution in Python<\/h2>\n\n\n\n<p>We will implement and visualize a joint probability distribution using Python.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Let&#8217;s start with 2 random variables A and B.<\/li>\n\n\n\n<li>These variables are <a href=\"https:\/\/www.askpython.com\/python\/normal-distribution\">normally distributed<\/a>.<\/li>\n<\/ul>\n\n\n\n<p>We will start by importing the required modules.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\nimport numpy as np\nimport seaborn as sns\nimport pandas as pd\n<\/pre><\/div>\n\n\n<p>Now we will define our normal variables A and B by using the <code>normal()<\/code> function of NumPy&#8217;s <code>random<\/code> module. <code>np.random.normal(size=100)<\/code> generates an array of 100 random numbers drawn from a standard normal distribution, that is, with a mean of 0 and a standard deviation of 1 (the function&#8217;s defaults). The <code>size<\/code> parameter specifies the size of the array to generate.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\nA = np.random.normal(size=100)\nB = np.random.normal(size=100)\n<\/pre><\/div>\n\n\n<p>The generated samples are then used to build a pandas DataFrame with two columns, &#8220;A&#8221; and &#8220;B&#8221;. 
The final DataFrame has two columns and 100 rows.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\ndf = pd.DataFrame({&#039;A&#039;: A, &#039;B&#039;: B})\n<\/pre><\/div>\n\n\n<p>Next, the <code>jointplot()<\/code> function from seaborn produces a joint plot: by default, a scatter plot of the two variables with a histogram of each one along the margins. The <code>x<\/code> and <code>y<\/code> parameters specify the columns in the DataFrame to use for the x and y axes, respectively. In this case, &#8216;A&#8217; is used for the x-axis and &#8216;B&#8217; for the y-axis. The <code>data<\/code> parameter specifies the DataFrame to use as the data source for the plot.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\nsns.jointplot(x=&#039;A&#039;, y=&#039;B&#039;, data=df)\n<\/pre><\/div>\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"636\" height=\"636\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-26-201900.png\" alt=\"Screenshot 2023 03 26 201900\" class=\"wp-image-47759\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-26-201900.png 636w, https:\/\/www.askpython.com\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-26-201900-300x300.png 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-26-201900-150x150.png 150w\" sizes=\"auto, (max-width: 636px) 100vw, 636px\" \/><figcaption class=\"wp-element-caption\">The joint probability distribution for normal random variables A and B<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Summary<\/h3>\n\n\n\n<p>In this tutorial, we explored joint probability and joint probability distribution in mathematics and demonstrated their implementation in Python using libraries like NumPy, <a 
href=\"https:\/\/www.askpython.com\/python-modules\/pandas\/python-pandas-module-tutorial\" data-type=\"post\" data-id=\"2986\">Pandas<\/a>, and <a href=\"https:\/\/www.askpython.com\/python-modules\/python-seaborn-tutorial\" data-type=\"post\" data-id=\"4055\">Seaborn<\/a>. Understanding these concepts is crucial for anyone interested in pursuing a career in machine learning and AI. As you explore further into probability and data science, consider how joint probability distributions can be used to reveal deeper insights into relationships between variables in various datasets.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we will explore the concept of joint probability and joint probability distribution in mathematics and demonstrate how to implement them in Python. Joint distribution is essential for finding relationships between variables and has numerous applications in data science. Understanding Joint Probability Let&#8217;s understand this with an example(Don&#8217;t worry we won&#8217;t roll a 
[&hellip;]<\/p>\n","protected":false},"author":56,"featured_media":47652,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-42298","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-examples"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/42298","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/56"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=42298"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/42298\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/47652"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=42298"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=42298"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=42298"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}