{"id":9117,"date":"2020-10-07T09:50:20","date_gmt":"2020-10-07T09:50:20","guid":{"rendered":"https:\/\/www.askpython.com\/?p=9117"},"modified":"2024-10-31T09:17:07","modified_gmt":"2024-10-31T09:17:07","slug":"subset-a-dataframe","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python\/examples\/subset-a-dataframe","title":{"rendered":"How to Subset a DataFrame in Python?"},"content":{"rendered":"\n<p>In this tutorial, we will go over several ways to subset a DataFrame. If you import data into Python with pandas, you will be working with DataFrames. A DataFrame is a <strong>two-dimensional data structure<\/strong>, i.e., data is arranged in a table of rows and columns.<\/p>\n\n\n\n<p>Subsetting a data frame is the process of <strong>selecting a set of desired rows and columns from the data frame.<\/strong><\/p>\n\n\n\n<p>You can select:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>all rows and limited columns<\/li><li>all columns and limited rows<\/li><li>limited rows and limited columns<\/li><\/ul>\n\n\n\n<p>Subsetting a data frame is important because it lets you work with only the part of the data that you need. This comes in handy when you want to reduce the number of columns or rows you are working with.<\/p>\n\n\n\n<p>Let&#8217;s start by importing a dataset to work on.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Importing the Data to Build the Dataframe<\/h2>\n\n\n\n<p>In this tutorial, we are using the California Housing dataset.<\/p>\n\n\n\n<p>Let&#8217;s load the data into a data frame using <a href=\"https:\/\/www.askpython.com\/python-modules\/pandas\/python-pandas-module-tutorial\" class=\"rank-math-link\">pandas<\/a>. 
<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\n# Load the California Housing CSV file into a DataFrame\nhousing = pd.read_csv(&quot;\/sample_data\/california_housing.csv&quot;)\n# Preview the first five rows\nhousing.head()\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"921\" height=\"196\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/housing-dataframe.png\" alt=\"Housing Dataframe\" class=\"wp-image-9118\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/housing-dataframe.png 921w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/housing-dataframe-300x64.png 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/housing-dataframe-768x163.png 768w\" sizes=\"auto, (max-width: 921px) 100vw, 921px\" \/><figcaption>Housing Dataframe<\/figcaption><\/figure><\/div>\n\n\n\n<p>Our <a href=\"https:\/\/www.askpython.com\/python-modules\/pandas\/save-dataframe-as-csv-file\" class=\"rank-math-link\">csv file<\/a> is now stored in the housing variable as a pandas data frame.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Select a Subset of a Dataframe using the Indexing Operator<\/h2>\n\n\n\n<p>The indexing operator is just a fancy name for <strong>square brackets.<\/strong> You can select columns, rows, and a combination of rows and columns using just the square brackets. Let&#8217;s see this in action.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. Selecting Only Columns<\/h3>\n\n\n\n<p>To select a column using the indexing operator, use the following line of code. 
<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nhousing&#x5B;&#039;population&#039;]\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"422\" height=\"219\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/population.png\" alt=\"Population\" class=\"wp-image-9119\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/population.png 422w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/population-300x156.png 300w\" sizes=\"auto, (max-width: 422px) 100vw, 422px\" \/><figcaption>Population<\/figcaption><\/figure><\/div>\n\n\n\n<p>This line of code selects the column labelled &#8216;population&#8217; and displays its value for every row.<\/p>\n\n\n\n<p><strong>You can also select multiple columns by passing a list of column labels to the indexing operator.<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nhousing&#x5B;&#x5B;&#039;population&#039;, &#039;households&#039;]]\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"283\" height=\"419\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/population-and-household.png\" alt=\"Population And Household\" class=\"wp-image-9120\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/population-and-household.png 283w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/population-and-household-203x300.png 203w\" sizes=\"auto, (max-width: 283px) 100vw, 283px\" \/><figcaption>Population And Household<\/figcaption><\/figure><\/div>\n\n\n\n<p>To subset a data frame and store the result, use the following lines of code:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" 
title=\"\">\nhousing_subset = housing&#x5B;&#x5B;&#039;population&#039;, &#039;households&#039;]]\nhousing_subset.head()\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"236\" height=\"207\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/pop-and-household.png\" alt=\"Pop And Household\" class=\"wp-image-9121\"\/><\/figure><\/div>\n\n\n\n<p>This creates a separate data frame that contains only the selected columns of the original one.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2. Selecting Rows<\/h3>\n\n\n\n<p>You can use the indexing operator to select specific rows based on a condition.<\/p>\n\n\n\n<p>For example, to select rows with a population greater than 500, you can use the following lines of code.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Keep only the rows where the condition is True\npopulation_500 = housing&#x5B;housing&#x5B;&#039;population&#039;] &gt; 500]\npopulation_500\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"835\" height=\"409\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/greater-than-500.png\" alt=\"Greater Than 500\" class=\"wp-image-9122\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/greater-than-500.png 835w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/greater-than-500-300x147.png 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/greater-than-500-768x376.png 768w\" sizes=\"auto, (max-width: 835px) 100vw, 835px\" \/><figcaption>Population Greater Than 500<\/figcaption><\/figure>\n\n\n\n<p>You can also further subset a data frame. 
For example, let&#8217;s filter rows from the housing_subset data frame that we created above.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Build the condition from the column of housing_subset itself\npopulation_500 = housing_subset&#x5B;housing_subset&#x5B;&#039;population&#039;] &gt; 500]\npopulation_500\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"264\" height=\"418\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/subset.png\" alt=\"Subset\" class=\"wp-image-9123\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/subset.png 264w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/subset-189x300.png 189w\" sizes=\"auto, (max-width: 264px) 100vw, 264px\" \/><figcaption>Subset<\/figcaption><\/figure><\/div>\n\n\n\n<p>Note that the two outputs above have the same number of rows, as they should, because both apply the same condition to the same population values.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Subset a Dataframe using Python .loc()<\/h2>\n\n\n\n<p>The <strong>.<\/strong><a href=\"https:\/\/www.askpython.com\/python-modules\/pandas\/python-loc-function\" class=\"rank-math-link\"><strong>loc<\/strong>\u00a0indexer<\/a> is an effective way to select rows and columns from the data frame. It can also be used to select rows and columns simultaneously. Although it is often written as .loc(), it is used with square brackets rather than called like a function.<\/p>\n\n\n\n<p>An important thing to remember is that<strong> .loc() works on the labels of rows and columns.<\/strong> After this, we will look at .iloc(), which is based on the integer positions of rows and columns.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. 
Selecting Rows with loc()<\/h3>\n\n\n\n<p>To select a single row using .loc(), pass its row label, as in the following line of code.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nhousing.loc&#x5B;1]\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"263\" height=\"190\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/loc-q.png\" alt=\"Loc\" class=\"wp-image-9124\"\/><figcaption>Loc<\/figcaption><\/figure><\/div>\n\n\n\n<p>To select multiple rows, pass a list of row labels:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nhousing.loc&#x5B;&#x5B;1, 5, 7]]\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"922\" height=\"138\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/loc.png\" alt=\"Loc\" class=\"wp-image-9125\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/loc.png 922w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/loc-300x45.png 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/loc-768x115.png 768w\" sizes=\"auto, (max-width: 922px) 100vw, 922px\" \/><figcaption>Loc<\/figcaption><\/figure><\/div>\n\n\n\n<p>You can also slice the rows between a starting label and an ending label. Unlike a regular Python slice, a .loc() slice includes the ending label.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nhousing.loc&#x5B;1:7]\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"919\" height=\"259\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/slicing.png\" alt=\"Slicing\" class=\"wp-image-9126\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/slicing.png 919w, 
https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/slicing-300x85.png 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/slicing-768x216.png 768w\" sizes=\"auto, (max-width: 919px) 100vw, 919px\" \/><figcaption>Slicing<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">2. Selecting Rows and Columns<\/h3>\n\n\n\n<p>To select specific rows and specific columns out of the data frame, use the following line of code:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nhousing.loc&#x5B;1:7, &#x5B;&#039;population&#039;, &#039;households&#039;]]\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"240\" height=\"270\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/Screenshot-2020-10-05-at-12.20.30-PM.png\" alt=\"Rows and columns\" class=\"wp-image-9127\"\/><\/figure><\/div>\n\n\n\n<p>This line of code selects the rows labelled 1 to 7 (both included) and the columns labelled &#8216;population&#8217; and &#8216;households&#8217;.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Subset a Dataframe using Python iloc()<\/h2>\n\n\n\n<p><strong><a href=\"https:\/\/www.askpython.com\/python\/built-in-methods\/python-iloc-function\" class=\"rank-math-link\">iloc<\/a><\/strong> is short for <strong>integer location<\/strong>. It works entirely on integer positions, counted from 0, for both rows and columns. 
<\/p>\n\n\n\n<p>To select a subset of rows and columns using iloc(), use the following line of code:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nhousing.iloc&#x5B;&#x5B;2, 3, 6], &#x5B;3, 5]]\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"244\" height=\"145\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2020\/10\/iloc.png\" alt=\"Iloc\" class=\"wp-image-9128\"\/><figcaption>Iloc<\/figcaption><\/figure><\/div>\n\n\n\n<p>This line of code selects the rows at positions <strong>2, 3 and 6<\/strong> along with the columns at positions <strong>3 and 5.<\/strong> Positions are counted from 0, so position 2 is the third row.<\/p>\n\n\n\n<p>Using iloc saves you from writing out the complete labels of rows and columns.<\/p>\n\n\n\n<p>You can also use iloc() to select only rows or only columns, just like loc(), by replacing the labels with integer positions.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>This tutorial was about subsetting a data frame in Python using square brackets, loc and iloc. We learnt how to import a dataset into a data frame and then how to filter rows and columns from the data frame.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we will go over several ways to subset a DataFrame. If you import data into Python with pandas, you will be working with DataFrames. A DataFrame is a two-dimensional data structure, i.e., data is arranged in a table of rows and columns. 
Subsetting a data frame [&hellip;]<\/p>\n","protected":false},"author":14,"featured_media":9129,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-9117","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-examples"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/9117","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/14"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=9117"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/9117\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/9129"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=9117"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=9117"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=9117"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}