{"id":54196,"date":"2023-07-27T08:59:33","date_gmt":"2023-07-27T08:59:33","guid":{"rendered":"https:\/\/www.askpython.com\/?p=54196"},"modified":"2023-07-27T08:59:34","modified_gmt":"2023-07-27T08:59:34","slug":"bias-and-variance-python3","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python-modules\/numpy\/bias-and-variance-python3","title":{"rendered":"What Is Bias And Variance In Python3?"},"content":{"rendered":"\n<p>Bias and variance are two distinct sources of prediction error in <a href=\"https:\/\/www.askpython.com\/python\/machine-learning-introduction\" data-type=\"post\" data-id=\"22853\">Machine Learning<\/a> and Deep Learning. The primary objective when working with any machine learning model is to make accurate predictions, and we can improve prediction accuracy by striking a balance between these two sources of error. This balance is commonly known as the <a href=\"https:\/\/www.askpython.com\/python\/bias-variance-tradeoff\" data-type=\"post\" data-id=\"11682\">Bias-Variance tradeoff<\/a>. This article explains what bias and variance are and how they affect different models, with examples in Python. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What Exactly is Bias? <\/h2>\n\n\n\n<p>In machine learning, bias is a systematic error in the predictions made by a model: the predictions consistently deviate from the actual values (the ground truth), which can lead to inaccurate or unfair predictions. Bias can stem from various sources during the model&#8217;s training process, for example a model that is too simple for the real-world problem it is meant to solve, or training data that does not represent that problem well. Two closely related situations are underfitting and overfitting: underfitting is associated with high bias, while overfitting is associated with low bias combined with high variance. 
<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Underfitting (High Bias) Error in Machine Learning Models<\/h3>\n\n\n\n<p>High bias corresponds to underfitting. It occurs when the model is too simple for the real-world problem it has to solve. Such a model ignores some of the underlying patterns in the training data and fails to capture both the complexity of the data and the relationship between the input features and the target outputs. As a result, it performs poorly on both the training data and the test\/validation data. <\/p>\n\n\n\n<p>To reduce underfitting, we can increase the complexity of the model, for example by adding parameters, and provide more informative training data so that the model can capture the real-world problem effectively. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Overfitting (Low Bias, High Variance) Error in Machine Learning Models<\/h3>\n\n\n\n<p>Overfitting is the opposite of underfitting: the model is too complex for the real-world problem. Instead of learning only the underlying pattern, it also picks up noise and random fluctuations in the training data, so it scores well on the training data but poorly on new data. To reduce overfitting, it is recommended to use a simpler model and to apply regularization techniques. Training the model on a larger and more diverse set of examples also helps. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Mathematical Formula of Bias <\/h3>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nPredicted Output (Y) = b + w1*x1 + w2*x2 + w3*x3 + ... + wn*xn\n<\/pre><\/div>\n\n\n<p>This is the linear regression equation, where b is the bias term (also called the intercept) and each weight wi is associated with the feature xi. The bias term shifts every prediction by the same amount, so it plays an important role in the accuracy of the model, and the same kind of term appears in other models as well. Keep in mind that this bias term is a learned model parameter and is not the same as the bias error described above, which is the difference between the model&#8217;s average prediction (over models trained on different training sets) and the true value. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Concept of Variance in the Machine Learning Domain<\/h2>\n\n\n\n<p>In machine learning, variance describes how a model&#8217;s predictions react to changes in the training data. It quantifies how much the model&#8217;s output fluctuates when it is trained on different training datasets. High variance means that the model is overly responsive to specific instances in the training data, which can leave it unable to generalize and predict outcomes for new, unseen data. Let&#8217;s discuss high and low variance in machine learning models.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">High Variance Error in Machine Learning Models<\/h3>\n\n\n\n<p>High variance usually occurs when the model is too complex for the real-world problem. Such a model works very well on the training data and reaches high accuracy there, but when it is evaluated on unseen data, it produces poor results with much lower accuracy. A large gap between training accuracy and accuracy on new datasets is therefore the characteristic sign of high variance.<\/p>\n\n\n\n<p>In supervised learning, a model with high variance fits the training data tightly but struggles to make accurate predictions on unseen examples. This lack of generalization leads to poor performance on the test data. Complex neural networks call for particular caution. 
While a network with many layers and parameters may fit the training data flawlessly, it can stumble when faced with new data because it lacks the ability to generalize.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Low Variance Error in Machine Learning Models<\/h3>\n\n\n\n<p>A model with low variance makes similar predictions no matter which sample of training data it was fitted on, so it generalizes more reliably to new data points. Low-variance models tend to be simpler and to capture the primary patterns in the data rather than its noise. Low variance alone does not guarantee accurate predictions, however: a model that is too simple can have low variance but high bias. For example, a simple linear regression model typically exhibits low variance, and if the data really follows a linear relationship, its bias is low as well. Consequently, it performs well on both the training and test datasets.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Mathematical Formula of Variance<\/h3>\n\n\n\n<p>\u03c3<sup>2<\/sup> = \u03a3 (x<sub>i<\/sub> &#8211; \u03bc)<sup>2<\/sup> \/ n. Here, \u03c3<sup>2<\/sup> is the variance of the dataset, x<sub>i<\/sub> is a single data point, \u03bc is the mean of the dataset, and n is the number of data points in the dataset. In the bias-variance context, the same formula is applied to a model&#8217;s predictions for a fixed input across models trained on different training sets. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Difference Between Bias and Variance<\/h2>\n\n\n\n<p>In machine learning, bias and variance are two distinct types of error that can significantly affect the performance of a model, and they have different characteristics. Bias comes from a model that is too simple to capture the underlying patterns in the data, which leads to underfitting. Variance, on the other hand, is the result of a model being overly sensitive to fluctuations in the training data. 
This sensitivity often results in overfitting and a reduced ability to generalize.<\/p>\n\n\n\n<p>The objective in machine learning is to strike the right balance between bias and variance: to build a model that captures the patterns in the training data accurately while still generalizing well to new data. This delicate equilibrium is commonly known as the &#8220;bias-variance trade-off.&#8221; Ideally, the best model has both low bias and low variance, but because reducing one often increases the other, in practice we look for the level of model complexity that keeps the total error lowest. Let&#8217;s look at a simple implementation to understand this concept. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Examples of Bias and Variance <\/h2>\n\n\n\n<p>In the following scenario, the dataset has two features, and the objective is to classify the data into two distinct classes using a Support Vector Machine (SVM) classifier. To demonstrate this task, we will generate an artificial dataset with Scikit-learn&#8217;s make_classification function.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\nimport numpy as np\nfrom sklearn.datasets import make_classification\nfrom sklearn.svm import SVC\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import accuracy_score\n\n# Synthetic dataset: 100 samples, 2 informative features, 2 classes\nX, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=42)\n# Hold out 30% of the samples as the test set\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)\n\n# A linear SVM separates the two classes with a straight line\nsvm_model = SVC(kernel=&#039;linear&#039;)\nsvm_model.fit(X_train, y_train)\n\ny_train_pred = svm_model.predict(X_train)\ny_test_pred = svm_model.predict(X_test)\n\n# Comparing training and test accuracy shows how well the model generalizes\ntrain_accuracy = accuracy_score(y_train, y_train_pred)\ntest_accuracy = accuracy_score(y_test, y_test_pred)\n\nprint(&quot;Training Accuracy:&quot;, train_accuracy)\nprint(&quot;Test Accuracy:&quot;, test_accuracy)\n<\/pre><\/div>\n\n\n<p>In this example, a linear SVM classifier is used to separate the two classes in the dataset. Now, let&#8217;s explore an instance of bias by introducing class imbalance into the dataset: one class will be deliberately made more prevalent than the other.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"469\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2023\/07\/bias-and-variance-example-class-1-1024x469.png\" alt=\"Bias And Variance Example Class 1\" class=\"wp-image-54358\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2023\/07\/bias-and-variance-example-class-1-1024x469.png 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2023\/07\/bias-and-variance-example-class-1-300x137.png 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2023\/07\/bias-and-variance-example-class-1-768x352.png 768w, https:\/\/www.askpython.com\/wp-content\/uploads\/2023\/07\/bias-and-variance-example-class-1-1536x704.png 1536w, https:\/\/www.askpython.com\/wp-content\/uploads\/2023\/07\/bias-and-variance-example-class-1.png 1665w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Bias And Variance Example Class 1<\/figcaption><\/figure>\n\n\n\n<p>In this second example, we duplicate 30 randomly chosen class 0 samples so that class 0 becomes more prevalent than class 1, creating a class imbalance. 
When you run the following code, the test accuracy of the imbalanced model may come out higher than that of the original model. This gain is not guaranteed and should not be taken at face value: class 0 now dominates, and because the duplicates are added before the train\/test split, copies of the same sample can end up in both the training set and the test set.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\n# Continues the previous snippet: reuses np, X, y, SVC, train_test_split and accuracy_score\nrng = np.random.default_rng(42)  # seeded so the duplicated samples are reproducible\n\nclass0_indices = np.where(y == 0)&#x5B;0]\n\n# Duplicate 30 randomly chosen class 0 samples to make class 0 the majority class\nnum_samples_to_add = 30\nadditional_samples_indices = rng.choice(class0_indices, num_samples_to_add, replace=False)\nX_imbalanced = np.vstack((X, X&#x5B;additional_samples_indices]))\ny_imbalanced = np.hstack((y, y&#x5B;additional_samples_indices]))\n\n# Splitting after duplication lets copies of a sample land in both sets\nX_train_imbalanced, X_test_imbalanced, y_train_imbalanced, y_test_imbalanced = train_test_split(\n    X_imbalanced, y_imbalanced, test_size=0.3, random_state=42)\nsvm_model_imbalanced = SVC(kernel=&#039;linear&#039;)\nsvm_model_imbalanced.fit(X_train_imbalanced, y_train_imbalanced)\n\ny_train_pred_imbalanced = svm_model_imbalanced.predict(X_train_imbalanced)\ny_test_pred_imbalanced = svm_model_imbalanced.predict(X_test_imbalanced)\n\ntrain_accuracy_imbalanced = accuracy_score(y_train_imbalanced, y_train_pred_imbalanced)\ntest_accuracy_imbalanced = accuracy_score(y_test_imbalanced, y_test_pred_imbalanced)\n\nprint(&quot;Training Accuracy (Imbalanced):&quot;, train_accuracy_imbalanced)\nprint(&quot;Test Accuracy (Imbalanced):&quot;, test_accuracy_imbalanced)\n<\/pre><\/div>\n\n\n<p>The bias introduced here shows up as an apparent performance advantage of the imbalanced model on the test data, which is a result of the artificial class imbalance. 
The model favors the majority class (class 0) and may not generalize well to real-world situations with balanced class distributions.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"474\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2023\/07\/bias-and-variance-example-for-imbalanced-class-1024x474.png\" alt=\"Bias And Variance Example For Imbalanced Class\" class=\"wp-image-54359\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2023\/07\/bias-and-variance-example-for-imbalanced-class-1024x474.png 1024w, https:\/\/www.askpython.com\/wp-content\/uploads\/2023\/07\/bias-and-variance-example-for-imbalanced-class-300x139.png 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2023\/07\/bias-and-variance-example-for-imbalanced-class-768x355.png 768w, https:\/\/www.askpython.com\/wp-content\/uploads\/2023\/07\/bias-and-variance-example-for-imbalanced-class-1536x711.png 1536w, https:\/\/www.askpython.com\/wp-content\/uploads\/2023\/07\/bias-and-variance-example-for-imbalanced-class.png 1597w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Bias And Variance Example For Imbalanced Class<\/figcaption><\/figure>\n\n\n\n<p>Keep in mind that bias can arise in far more diverse ways than this simple class-imbalance example suggests. Real-world scenarios are more complex because biases stem from many different sources, as discussed earlier. To ensure fair and accurate predictions, it is crucial to address bias in machine learning models.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<p>This article explained the key concepts related to bias and variance. What matters most in a machine learning model is the accuracy of its predictions. 
Maintaining that accuracy depends on the bias-variance tradeoff, and even a small shift in the balance between bias and variance can make a big difference. We also covered the different forms of these errors: underfitting (high bias), overfitting (high variance), and low-variance models. Finally, we walked through two examples, one with the original balanced dataset and one with an artificially imbalanced dataset, to show how bias can affect a model&#8217;s results. We hope you enjoyed this article.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">References <\/h2>\n\n\n\n<p>To compute variance in Python, see the <a href=\"https:\/\/docs.python.org\/3\/library\/statistics.html\" data-type=\"URL\" data-id=\"https:\/\/docs.python.org\/3\/library\/statistics.html\" target=\"_blank\" rel=\"noopener\">official documentation<\/a> of Python&#8217;s statistics module, which provides pvariance() for the population variance (the formula above) and variance() for the sample variance. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Bias and variance represent distinct concepts in the fields of Machine Learning and Deep Learning. The primary objective when working with any machine learning model is to achieve accuracy. By striking a balance between these two sources of error (bias and variance), commonly known as the Bias-Variance tradeoff, we can enhance prediction accuracy. This article explores 
This article e\u00adxplores [&hellip;]<\/p>\n","protected":false},"author":63,"featured_media":54368,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[93],"tags":[],"class_list":["post-54196","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-numpy"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/54196","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/63"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=54196"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/54196\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/54368"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=54196"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=54196"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=54196"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}