{"id":16300,"date":"2021-05-19T17:16:41","date_gmt":"2021-05-19T17:16:41","guid":{"rendered":"https:\/\/www.askpython.com\/?p=16300"},"modified":"2021-05-19T17:16:44","modified_gmt":"2021-05-19T17:16:44","slug":"catboost-module","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python-modules\/catboost-module","title":{"rendered":"Python catboost module: A Brief Introduction to CatBoost Classifier"},"content":{"rendered":"\n<p>Hello learner! In this tutorial, we will learn about the catboost module and a slightly more complex concept known as <code>CatBoostClassifier<\/code>. So let&#8217;s begin!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What is the catboost module?<\/h2>\n\n\n\n<p>CatBoost is a fast, scalable, high-performance open-source library for <a href=\"https:\/\/www.askpython.com\/python\/examples\/gradient-boosting-model-in-python\" class=\"rank-math-link\">gradient boosting<\/a> on decision trees, used for classification, regression, and other machine learning tasks. It also offers GPU support to speed up training.<\/p>\n\n\n\n<p>CatBoost can be used for a range of regression and classification problems, including those available on Kaggle.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Implementing the CatBoost Classifier<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">1. Importing Modules<\/h3>\n\n\n\n<p>For a simple implementation of the catboost module, we will be importing three modules: the <code>catboost<\/code> module itself, <a href=\"https:\/\/www.askpython.com\/python-modules\/matplotlib\/python-matplotlib\" class=\"rank-math-link\">matplotlib<\/a> for data visualization, and the <code>numpy<\/code> module to generate datasets.<\/p>\n\n\n\n<p>If any of the imports raises an error, make sure you install the missing module using the <code>pip<\/code> command. 
The code to import the required modules and function is shown below.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\nfrom catboost import CatBoostClassifier\nimport matplotlib.pyplot as plt\nimport numpy as np\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\">2. Training and Testing Data Preparation<\/h3>\n\n\n\n<p>The next step is to create <a href=\"https:\/\/www.askpython.com\/python\/examples\/split-data-training-and-testing-set\" class=\"rank-math-link\">training data<\/a> for the catboost model, and then testing data made up of random points to check the trained model against.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Training Data<\/h4>\n\n\n\n<p>To create sample training data we need a mean vector and a covariance matrix for each class, where the mean describes the center of the points and the covariance describes their spread.<\/p>\n\n\n\n<p>We then draw samples from a multivariate normal distribution, passing the <a href=\"https:\/\/www.askpython.com\/python\/examples\/mean-and-standard-deviation-python\" class=\"rank-math-link\">mean<\/a> and <a href=\"https:\/\/www.askpython.com\/python\/examples\/principal-component-analysis\" class=\"rank-math-link\">covariance matrix<\/a> along with the number of points.<\/p>\n\n\n\n<p>The code to create data for two different classes is shown below.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\n# First class: 200 points centered at (8, 8)\nmean1=&#x5B;8,8]\ncovar1=&#x5B;&#x5B;2,0.7],&#x5B;0.7,1]]\nd1=np.random.multivariate_normal(mean1,covar1,200)\n\n# Second class: 200 points centered at (1, 1)\nmean2=&#x5B;1,1]\ncovar2=&#x5B;&#x5B;2,0.7],&#x5B;0.7,1]]\nd2=np.random.multivariate_normal(mean2,covar2,200)\n<\/pre><\/div>\n\n\n<h4 class=\"wp-block-heading\">Testing Data<\/h4>\n\n\n\n<p>To get testing points we will be importing the <a href=\"https:\/\/www.askpython.com\/python-modules\/python-random-module-generate-random-numbers-sequences\" 
class=\"rank-math-link\">random module<\/a> and generate 5 random x and y coordinates to pass to the trained model later on. The next step is to put the x and y coordinates together in a list using a <a href=\"https:\/\/www.askpython.com\/python\/python-for-loop\" class=\"rank-math-link\">for loop<\/a>.<\/p>\n\n\n\n<p>The code for this is shown below.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\nimport random\nx_cord_test = &#x5B;random.randint(-2,10) for i in range(5)]\ny_cord_test = &#x5B;random.randint(-2,10) for i in range(5)]\ntest_data = &#x5B;]\nfor i in range(len(x_cord_test)):\n    test_data.append(&#x5B;x_cord_test&#x5B;i],y_cord_test&#x5B;i]])\n<\/pre><\/div>\n\n\n<h4 class=\"wp-block-heading\">Data Visualization &#8211; 1<\/h4>\n\n\n\n<p>We will visualize the data using the matplotlib library, plotting the training data along with the testing points.<\/p>\n\n\n\n<p>The code for this is shown below.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\n# On matplotlib 3.6+, this style is named &#039;seaborn-v0_8&#039;\nplt.style.use(&#039;seaborn&#039;)\nplt.scatter(d1&#x5B;:,0],d1&#x5B;:,1],color=&quot;Red&quot;,s=20)\nplt.scatter(d2&#x5B;:,0],d2&#x5B;:,1],color=&quot;Blue&quot;,s=20)\nfor i in test_data:\n    plt.scatter(i&#x5B;0],i&#x5B;1],marker=&quot;*&quot;,s=200,color=&quot;black&quot;)\nplt.show()\n<\/pre><\/div>\n\n\n<p>The resulting graph is shown below.<\/p>\n\n\n\n<div class=\"wp-block-image is-style-default\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"450\" height=\"306\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2021\/05\/binary_data_plot_catboost.png\" alt=\"Binary Data Plot Catboost\" class=\"wp-image-16306\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2021\/05\/binary_data_plot_catboost.png 450w, 
https:\/\/www.askpython.com\/wp-content\/uploads\/2021\/05\/binary_data_plot_catboost-300x204.png 300w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><figcaption>Binary Data Plot Catboost<\/figcaption><\/figure><\/div>\n\n\n\n<h4 class=\"wp-block-heading\">Preparing the final training data for the model<\/h4>\n\n\n\n<p>The final step is to create the final training data by combining the data for the two classes into a single NumPy array.<\/p>\n\n\n\n<p>The number of rows in the resulting array equals the total number of data points across both classes. There are 3 columns, which store the x coordinate, the y coordinate, and the label of each point.<\/p>\n\n\n\n<p>We first create a placeholder array with all values set to 0. Then we copy the data for the two classes, along with their labels, into the correct positions in the array. The last step is to shuffle the rows.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\ndf_rows=d1.shape&#x5B;0]+d2.shape&#x5B;0]\ndf_columns=d1.shape&#x5B;1]+1\n\ndf=np.zeros((df_rows,df_columns))\n\ndf&#x5B;0:d1.shape&#x5B;0],0:2]=d1\ndf&#x5B;d1.shape&#x5B;0]:,0:2]=d2\ndf&#x5B;0:d1.shape&#x5B;0],2]=0\ndf&#x5B;d1.shape&#x5B;0]:,2]=1\n\nnp.random.shuffle(df)\n<\/pre><\/div>\n\n\n<h4 class=\"wp-block-heading\">Data Visualization &#8211; 2<\/h4>\n\n\n\n<p>Now let&#8217;s visualize our final data using the code below.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\nplt.scatter(df&#x5B;:,0],df&#x5B;:,1],color=&quot;Green&quot;)\nfor i in test_data:\n    plt.scatter(i&#x5B;0],i&#x5B;1],marker=&quot;*&quot;,s=200,color=&quot;black&quot;)\nplt.show()\n<\/pre><\/div>\n\n\n<p>The final graph is shown below. 
Now the data is ready to go into the <code>CatBoostClassifier<\/code>.<\/p>\n\n\n\n<div class=\"wp-block-image is-style-default\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"451\" height=\"301\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2021\/05\/train_test_data_plot_catboost.png\" alt=\"Train Test Data Plot Catboost\" class=\"wp-image-16310\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2021\/05\/train_test_data_plot_catboost.png 451w, https:\/\/www.askpython.com\/wp-content\/uploads\/2021\/05\/train_test_data_plot_catboost-300x200.png 300w\" sizes=\"auto, (max-width: 451px) 100vw, 451px\" \/><figcaption>Train Test Data Plot Catboost<\/figcaption><\/figure><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">3. Using the catboost module &#8211; CatBoostClassifier<\/h3>\n\n\n\n<p>To use the CatBoostClassifier, we create a model object that takes the number of iterations as a parameter. We will also be using the <code>GPU<\/code> for the model, so we pass <code>task_type<\/code> as a parameter. If you do not have a CUDA-capable GPU, omit this parameter to train on the CPU.<\/p>\n\n\n\n<p>The next step is to train the model by passing the training data points and labels to the <code>fit<\/code> function. We will also pass each testing point into the <code>predict<\/code> function and get the results.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; gutter: true; title: ; notranslate\" title=\"\">\nmodel = CatBoostClassifier(iterations=100,task_type=&quot;GPU&quot;)\nmodel.fit(df&#x5B;:,0:2],df&#x5B;:,2],verbose=False)\n<\/pre><\/div>\n\n\n<p>The results are as follows. 
You can cross check from the graph that the results are pretty accurate.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n(6,3) ==&gt; 0.0\n(10,4) ==&gt; 0.0\n(6,-2) ==&gt; 0.0\n(1,7) ==&gt; 1.0\n(3,0) ==&gt; 1.0\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Congratulations! Today you successfully learned about a fast and amazing Classifier known as CatBoost. You can try out the same on various datasets of your own! Happy Coding!<\/p>\n\n\n\n<p>Thank you for reading!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello learner! In this tutorial, we will be learning about the catboost module and a little more complex concept known as CatboostClassifier. So let&#8217;s begin! What is the catboost module? CatBoost module is an open-source library that is fast, scalable, a very high-performance gradient boosting system on decision trees and other Machine Learning tasks. It [&hellip;]<\/p>\n","protected":false},"author":28,"featured_media":16316,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-16300","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python-modules"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/16300","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/28"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=16300"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/16300\/revisions"}],"wp:featuredmedia":[{"embeddable":tru
e,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/16316"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=16300"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=16300"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=16300"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}