{"id":27456,"date":"2022-02-27T15:05:47","date_gmt":"2022-02-27T15:05:47","guid":{"rendered":"https:\/\/www.askpython.com\/?p=27456"},"modified":"2023-02-18T15:50:26","modified_gmt":"2023-02-18T15:50:26","slug":"data-analysis","status":"publish","type":"post","link":"https:\/\/www.askpython.com\/python-modules\/pandas\/data-analysis","title":{"rendered":"Data Analysis in Python with Pandas"},"content":{"rendered":"\n<p>Data analysis is one of the most important tools in today\u2019s world. Data is present in every domain of life today, whether it is biological data or data from a tech company. No matter what kind of data you are working with, you must know how to filter and analyze it. Today we are going to work with one such data analysis tool in Python: Pandas.<\/p>\n\n\n\n<p>Let\u2019s get started by first learning about some of the major libraries used for data analysis in Python.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"major-libraries-for-data-analysis-in-python\">Major Libraries for Data Analysis in Python<\/h2>\n\n\n\n<p>Python has many robust libraries that give data analysts the functionality they need to analyze data.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong><a href=\"https:\/\/www.askpython.com\/python\/numpy-trigonometric-functions\" data-type=\"post\" data-id=\"14347\">NumPy<\/a> and <a href=\"https:\/\/www.askpython.com\/python-modules\/python-scipy\" data-type=\"post\" data-id=\"3248\">SciPy<\/a>:<\/strong> Both of these libraries are powerful and extensively used in scientific computing.<\/li><li><strong>Pandas:<\/strong> Pandas is a robust tool used for data manipulation. It is a relatively new addition to the data science toolkit.<\/li><li><strong>Matplotlib:<\/strong> Matplotlib is an excellent package and is mainly used for plotting and visualization. 
You can plot a variety of graphs using Matplotlib, such as histograms, line plots, and heatmaps.<\/li><li><strong>Scikit-Learn:<\/strong> Scikit-Learn is an excellent tool for machine learning. This library provides many of the tools needed for machine learning and statistical modeling.<\/li><li><strong>Statsmodels:<\/strong> Statsmodels is another excellent tool for statistical modeling. This library allows users to build statistical models and analyze them.<\/li><li><strong><a href=\"https:\/\/www.askpython.com\/python-modules\/python-seaborn-tutorial\" data-type=\"post\" data-id=\"4055\">Seaborn<\/a>: <\/strong>Seaborn is also extensively used for data visualization. It is based on Matplotlib and is used for building statistical graphics in Python.<\/li><\/ul>\n\n\n\n<p>Out of all these tools, we are going to learn about Pandas and work through a hands-on data analysis with it in this article.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-is-pandas-and-why-is-it-so-useful-in-data-analysis\">What Is Pandas and Why Is It So Useful in Data Analysis?<\/h2>\n\n\n\n<p>Pandas is an open-source Python library built on top of the NumPy package. It provides functions and methods that make the data analysis process faster and easier. Because of its flexibility and simple syntax, it is one of the most commonly used tools for data analysis. Pandas is really helpful when it comes to working with Excel spreadsheets, tabular data, or SQL.<\/p>\n\n\n\n<p>The two main data structures in Pandas are DataFrame and Series. A DataFrame is a two-dimensional data structure, and a Series is a one-dimensional labeled array, such as a single DataFrame column. In this article, we will be working with the Pandas DataFrame. 
Data can be imported in a variety of formats for data analysis in Python, such as CSV, JSON, and SQL.<\/p>\n\n\n\n<p>Now let&#8217;s get on to the data analysis part.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"installing-different-environments-and-importing-pandas\">Installing Different Environments and Importing Pandas<\/h2>\n\n\n\n<p>First, you need to install Pandas. You can use different environments for this. You can either use <a href=\"https:\/\/www.askpython.com\/python-modules\/python-anaconda-tutorial\" data-type=\"post\" data-id=\"10679\">Anaconda<\/a> to run Pandas directly on your computer, or use a <a href=\"https:\/\/www.askpython.com\/python\/jupyter-notebook-for-python\" data-type=\"post\" data-id=\"12648\">Jupyter Notebook<\/a> in your browser through a hosted service such as Google Colab. Anaconda comes with many pre-installed packages and can easily be downloaded on Mac, Windows, or Linux.<\/p>\n\n\n\n<p>Let\u2019s go through the steps to install and import Pandas. 
To install Pandas in your environment, use the <a href=\"https:\/\/www.askpython.com\/python-modules\/python-pip\" data-type=\"post\" data-id=\"3848\">pip command<\/a>.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\npip install pandas\n<\/pre><\/div>\n\n\n<p>Note: If you are using Google Colab, you do not need to run this command since Google Colab comes with Pandas pre-installed.<\/p>\n\n\n\n<p>Now, to import Pandas into your environment, type the following command.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\n<\/pre><\/div>\n\n\n<p>Now that we know how to install and import Pandas, let\u2019s understand more closely what a Pandas DataFrame is.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-pandas-dataframe\">The Pandas DataFrame<\/h2>\n\n\n\n<p>A Pandas DataFrame is a two-dimensional data structure, almost like a 2-D array. A DataFrame has labeled axes (rows and columns) and is mutable.<\/p>\n\n\n\n<p>Let\u2019s get on to the hands-on data analysis part.<\/p>\n\n\n\n<p>In this article, we are using a Kaggle dataset on the &#8220;height of male and female by country in 2022.&#8221;<\/p>\n\n\n\n<p><strong>Link to the dataset<\/strong>: https:\/\/www.kaggle.com\/majyhain\/height-of-male-and-female-by-country-2022<\/p>\n\n\n\n<p>Let\u2019s load the dataset now and read it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"reading-csv-files-and-loading-the-data\">Reading CSV Files and Loading the Data<\/h2>\n\n\n\n<p>To read the file into a DataFrame, pass the path of your file as an argument to the following function.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Replace this path with the location of the CSV file on your computer\ndf = pd.read_csv(&quot;C:\/Users\/Intel\/Documents\/Height of Male and Female by Country 2022.csv&quot;)\ndf.head()\n<\/pre><\/div>\n\n\n<p>Here we have used the read_csv function as we are reading a CSV file.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"717\" height=\"165\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-346.png\" alt=\"Screenshot 346\" class=\"wp-image-27459\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-346.png 717w, https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-346-300x69.png 300w\" sizes=\"auto, (max-width: 717px) 100vw, 717px\" \/><\/figure><\/div>\n\n\n\n<p>You can check the first n entries of your dataframe with the help of the head function. If you don\u2019t pass the number of entries, the first 5 rows will be displayed by default.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"evaluating-the-pandas-dataframe\">Evaluating the Pandas DataFrame<\/h2>\n\n\n\n<p>Now we will have a look at the dataframe that we are working with.<\/p>\n\n\n\n<p>Let\u2019s have a look at the dimensions of the data that we are using. For that, we use the following command.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.shape\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n(199, 6)\n<\/pre><\/div>\n\n\n<p>The <a href=\"https:\/\/www.askpython.com\/python-modules\/pandas\/pandas-shape-attribute\" data-type=\"post\" data-id=\"23655\">shape attribute<\/a> returns a tuple with the number of rows and columns. 
We can see that our dataframe has 199 rows and 6 columns, or features.<\/p>\n\n\n\n<p>Next, we will see a summary of our dataset with the help of the info method. Note that info must be called with parentheses.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.info()\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n&lt;class &#039;pandas.core.frame.DataFrame&#039;&gt;\nRangeIndex: 199 entries, 0 to 198\nData columns (total 6 columns):\n #   Column               Non-Null Count  Dtype  \n---  ------               --------------  -----  \n 0   Rank                 199 non-null    int64  \n 1   Country Name         199 non-null    object \n 2   Male Height in Cm    199 non-null    float64\n 3   Female Height in Cm  199 non-null    float64\n 4   Male Height in Ft    199 non-null    float64\n 5   Female Height in Ft  199 non-null    float64\ndtypes: float64(4), int64(1), object(1)\nmemory usage: 9.5+ KB\n<\/pre><\/div>\n\n\n<p>You can see that the output gives us some valuable information about the data frame. It shows dtypes, memory usage, non-null counts, and column names.<\/p>\n\n\n\n<p>Next, we will get an idea of the statistics of the dataset.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.describe()\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"636\" height=\"265\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-349.png\" alt=\"Screenshot 349\" class=\"wp-image-27467\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-349.png 636w, https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-349-300x125.png 300w\" sizes=\"auto, (max-width: 636px) 100vw, 636px\" \/><\/figure><\/div>\n\n\n\n<p>In the output, we can see the count, mean, standard deviation, minimum, quartiles (including the median, shown as 50%), and maximum for each numeric feature present in the dataset.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"data-manipulation-and-analysis\">Data Manipulation and Analysis<\/h2>\n\n\n\n<p>Let\u2019s first quickly look at the different features in the dataset to help you get a better understanding of it.<\/p>\n\n\n\n<p><strong>Country Name:<\/strong> Name of the country for which data has been collected.<\/p>\n\n\n\n<p><strong>Male Height in Cm:<\/strong> Height of the male population in centimeters.<\/p>\n\n\n\n<p><strong>Female Height in Cm:<\/strong> Height of the female population in centimeters.<\/p>\n\n\n\n<p><strong>Male Height in 
Ft:<\/strong> Height of the male population in feet.<\/p>\n\n\n\n<p><strong>Female Height in Ft:<\/strong> Height of the female population in feet.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"setting-the-dataframe-index\">Setting the DataFrame Index<\/h3>\n\n\n\n<p>Now, let\u2019s set the data frame index.<\/p>\n\n\n\n<p>We can see from our data that the first column, \u2018Rank\u2019, is unique for each country and starts from 1. We can make use of that and set the \u2018Rank\u2019 column as the index.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.set_index(&#039;Rank&#039;, inplace=True)\ndf.index\n<\/pre><\/div>\n\n\n<p>Alternatively, you can set the index while reading the file by passing the index_col parameter to read_csv. Let\u2019s see the dataframe once again.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf = pd.read_csv(&quot;C:\/Users\/Intel\/Documents\/Height of Male and Female by Country 2022.csv&quot;, index_col=&#039;Rank&#039;)\ndf.head()\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"697\" height=\"222\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-351.png\" alt=\"Screenshot 351\" class=\"wp-image-27468\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-351.png 697w, https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-351-300x96.png 300w\" sizes=\"auto, (max-width: 697px) 100vw, 697px\" \/><\/figure><\/div>\n\n\n\n<p>The dataset looks a bit more organized now.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"rows-and-columns\">Rows and Columns<\/h3>\n\n\n\n<p>You already know that dataframes have rows and columns. 
The columns in the dataframe can be easily accessed with the following commands:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.columns\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nIndex(&#x5B;&#039;Country Name&#039;, &#039;Male Height in Cm&#039;, &#039;Female Height in Cm&#039;,\n       &#039;Male Height in Ft&#039;, &#039;Female Height in Ft&#039;],\n      dtype=&#039;object&#039;)\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf&#x5B;&#039;Country Name&#039;].head()\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nRank\n1               Netherlands\n2                Montenegro\n3                   Estonia\n4    Bosnia and Herzegovina\n5                   Iceland\nName: Country Name, dtype: object\n<\/pre><\/div>\n\n\n<p>We can also rename our columns with the following command:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.rename(columns={&#039;Male Height in Cm&#039;: &#039;Male Height in Centimeter&#039;}, inplace=True)\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.head()\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"729\" height=\"224\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-353.png\" alt=\"Screenshot 353\" class=\"wp-image-27470\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-353.png 729w, https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-353-300x92.png 300w\" sizes=\"auto, (max-width: 
729px) 100vw, 729px\" \/><\/figure><\/div>\n\n\n\n<p>You can also add columns to your data frame. Let\u2019s take a look at how we can do that.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Work on a copy so that the original df stays unchanged\ndf_copy = df.copy()\ndf_copy&#x5B;&#039;Height Ratio&#039;] = &#039;N&#039;\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf_copy.head()\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"817\" height=\"214\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-355.png\" alt=\"Screenshot 355\" class=\"wp-image-27471\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-355.png 817w, https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-355-300x79.png 300w, https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-355-768x201.png 768w\" sizes=\"auto, (max-width: 817px) 100vw, 817px\" \/><\/figure><\/div>\n\n\n\n<p>We have assigned the value &#8220;N&#8221; to every row of the new column.<\/p>\n\n\n\n<p>Let\u2019s imagine you have another dataframe that you want to append or add to the existing DataFrame (df_copy). 
We can do that with the pd.concat function. Older tutorials use DataFrame.append, which was deprecated in pandas 1.4 and removed in pandas 2.0.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Heights are numeric so that they match the dtypes of the existing columns\ndata_to_append = {&#039;Country Name&#039;: &#x5B;&#039;X&#039;, &#039;Y&#039;],\n                  &#039;Male Height in Centimeter&#039;: &#x5B;172.43, 188.94],\n                  &#039;Female Height in Cm&#039;: &#x5B;150.99, 160.99],\n                  &#039;Male Height in Ft&#039;: &#x5B;6.09, 5.44],\n                  &#039;Female Height in Ft&#039;: &#x5B;5.66, 6.66],\n                  &#039;Height Ratio&#039;: &#x5B;&#039;Y&#039;, &#039;N&#039;]}\n\ndf_append = pd.DataFrame(data_to_append)\ndf_append\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"752\" height=\"119\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-357.png\" alt=\"Screenshot 357\" class=\"wp-image-27472\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-357.png 752w, https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-357-300x47.png 300w\" sizes=\"auto, (max-width: 752px) 100vw, 752px\" \/><\/figure><\/div>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# pd.concat stacks the rows; ignore_index=True renumbers them from 0\ndf_copy = pd.concat(&#x5B;df_copy, df_append], ignore_index=True)\ndf_copy.tail()\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"753\" height=\"194\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-360.png\" alt=\"Screenshot 360\" class=\"wp-image-27474\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-360.png 753w, 
https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-360-300x77.png 300w\" sizes=\"auto, (max-width: 753px) 100vw, 753px\" \/><\/figure><\/div>\n\n\n\n<p>We can use the drop method to remove rows and columns from our dataframe.<\/p>\n\n\n\n<p>To remove a row, pass its index label with axis=0:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf_copy.drop(labels=179, axis=0, inplace=True)\n<\/pre><\/div>\n\n\n<p>To remove a column, pass its name with axis=1:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf_copy.drop(labels=&#039;Height Ratio&#039;, axis=1, inplace=True)\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"filtering-the-data\">Filtering the Data<\/h3>\n\n\n\n<p>We can also select the specific data we need. We will use two of the simplest indexers, loc and iloc, to select the data.<\/p>\n\n\n\n<p><strong>For example:<\/strong><\/p>\n\n\n\n<p>We use loc to access rows and columns by their labels. Here, 193 is a label in the Rank index.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.loc&#x5B;193]\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nCountry Name                  Nepal\nMale Height in Centimeter    164.36\nFemale Height in Cm          152.39\nMale Height in Ft              5.39\nFemale Height in Ft               5\nName: 193, dtype: object\n<\/pre><\/div>\n\n\n<p>You can also select specific columns for a row using the following code.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.loc&#x5B;193, &#x5B;&#039;Country Name&#039;, &#039;Male Height in Centimeter&#039;, &#039;Female Height in Cm&#039;]]\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre 
class=\"brush: python; title: ; notranslate\" title=\"\">\nCountry Name                  Nepal\nMale Height in Centimeter    164.36\nFemale Height in Cm          152.39\nName: 193, dtype: object\n<\/pre><\/div>\n\n\n<p>Now, if you want to see only the countries where the male height is 170 cm or more, you can pass a condition to loc.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.loc&#x5B;df&#x5B;&#039;Male Height in Centimeter&#039;] &gt;= 170]\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"740\" height=\"414\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-362.png\" alt=\"Screenshot 362\" class=\"wp-image-27475\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-362.png 740w, https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-362-300x168.png 300w\" sizes=\"auto, (max-width: 740px) 100vw, 740px\" \/><\/figure><\/div>\n\n\n\n<p>If you want to select the value in the first row and first column only, you can use <a href=\"https:\/\/www.askpython.com\/python\/built-in-methods\/python-iloc-function\" data-type=\"post\" data-id=\"8662\">iloc<\/a>. iloc selects data by integer position or with a boolean array.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.iloc&#x5B;0, 0]\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n&#039;Netherlands&#039;\n<\/pre><\/div>\n\n\n<p>You can also select an entire row. In this case, we have accessed row no. 
10, which is the eleventh row because positions start at 0.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.iloc&#x5B;10, :]\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nCountry Name                 Ukraine\nMale Height in Centimeter     180.98\nFemale Height in Cm           166.62\nMale Height in Ft               5.94\nFemale Height in Ft             5.47\nName: 11, dtype: object\n<\/pre><\/div>\n\n\n<p>We can also select an entire column. In this case, we have selected the last column.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.iloc&#x5B;:, -1]\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nRank\n1      5.59\n2      5.58\n3      5.53\n4      5.49\n5      5.54\n       ... \n195    5.10\n196    5.15\n197    5.14\n198    5.02\n199    5.01\nName: Female Height in Ft, Length: 199, dtype: float64\n<\/pre><\/div>\n\n\n<p>You can also select multiple rows and columns by slicing. As with regular Python slices, the end position is excluded.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.iloc&#x5B;100:199, 2:5]\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"459\" height=\"418\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-364.png\" alt=\"Screenshot 364\" class=\"wp-image-27476\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-364.png 459w, https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-364-300x273.png 300w\" sizes=\"auto, (max-width: 459px) 100vw, 459px\" \/><\/figure>\n\n\n\n<p>In the next section, we will learn how to look for missing data.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"working-with-missing-values\">Working with Missing 
Values<\/h3>\n\n\n\n<p>The first step in identifying missing values in the dataframe is to use the isnull method.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.isnull()\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"684\" height=\"412\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-366.png\" alt=\"Screenshot 366\" class=\"wp-image-27477\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-366.png 684w, https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-366-300x181.png 300w\" sizes=\"auto, (max-width: 684px) 100vw, 684px\" \/><\/figure>\n\n\n\n<p>We can see that the output is a DataFrame with the same dimensions as the original, containing a boolean value for every element of the dataset.<\/p>\n\n\n\n<p>Missing values are marked True, and all other values are marked False. The rows displayed here contain no missing values. 
However, we will run another quality check on our data by counting the missing values in each column.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.isnull().sum()\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nCountry Name                 0\nMale Height in Centimeter    0\nFemale Height in Cm          0\nMale Height in Ft            0\nFemale Height in Ft          0\ndtype: int64\n<\/pre><\/div>\n\n\n<p>Let\u2019s check the proportion of missing values for each column.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.isnull().sum() \/ df.shape&#x5B;0]\n<\/pre><\/div>\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nCountry Name                 0.0\nMale Height in Centimeter    0.0\nFemale Height in Cm          0.0\nMale Height in Ft            0.0\nFemale Height in Ft          0.0\ndtype: float64\n<\/pre><\/div>\n\n\n<p>We can see that the proportion of missing values is zero for all the columns.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"plotting-the-data\">Plotting the Data<\/h3>\n\n\n\n<p>Visualization is one of the most important parts of a data analysis project. In this part, we will learn how we can use Pandas to visualize our data. We will use the plot method in Pandas to build the plots.<\/p>\n\n\n\n<p>Note: Other Python libraries offer more advanced data visualization. 
If you would like more detailed and elaborate plots, you can use the Matplotlib and Seaborn libraries.<\/p>\n\n\n\n<p><strong><a href=\"https:\/\/www.askpython.com\/python-modules\/matplotlib\/histogram-basic-to-advanced\" data-type=\"post\" data-id=\"27552\">Histograms<\/a><\/strong><\/p>\n\n\n\n<p>A histogram helps you to quickly understand and visualize the distribution of numerical variables within your dataset. A histogram divides the values of a numerical variable into bins and counts the number of observations that fall into each bin. Histograms show how the data is distributed and give you an immediate intuition about your data.<\/p>\n\n\n\n<p>In the following example, we have plotted a histogram for the feature &#8220;male height in centimeters.&#8221;<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf&#x5B;&#039;Male Height in Centimeter&#039;].plot(kind=&#039;hist&#039;)\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"395\" height=\"285\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-368.png\" alt=\"Screenshot 368\" class=\"wp-image-27478\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-368.png 395w, https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-368-300x216.png 300w\" sizes=\"auto, (max-width: 395px) 100vw, 395px\" \/><\/figure><\/div>\n\n\n\n<p>You can see from the histogram that most countries have male heights between 175 cm and 180 cm.<\/p>\n\n\n\n<p><strong><a href=\"https:\/\/www.askpython.com\/python-modules\/matplotlib\/matplotlib-subplots\" data-type=\"post\" data-id=\"9941\">Scatter Plots<\/a><\/strong><\/p>\n\n\n\n<p>Scatter plots help you to visualize the relationship between two variables. The plot is built on Cartesian coordinates. 
Scatter plots display the values as a collection of points. For each point, the value of one variable determines the position on the X-axis and the value of the other variable determines the position on the Y-axis.<\/p>\n\n\n\n<p>In the following example, we have built a scatter plot to understand the relationship between the two variables, i.e., male height and female height.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndf.plot(x=&#039;Male Height in Centimeter&#039;, y=&#039;Female Height in Cm&#039;, kind=&#039;scatter&#039;)\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"588\" height=\"306\" src=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-370.png\" alt=\"Screenshot 370\" class=\"wp-image-27479\" srcset=\"https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-370.png 588w, https:\/\/www.askpython.com\/wp-content\/uploads\/2022\/02\/Screenshot-370-300x156.png 300w\" sizes=\"auto, (max-width: 588px) 100vw, 588px\" \/><\/figure><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"conclusion\">Conclusion<\/h2>\n\n\n\n<p>In this article, we worked through a hands-on data analysis in Python using Pandas, and I think that will help you understand what you can do with Pandas. Nowadays, Pandas is a widely used tool in data science and has replaced Excel in many workflows. Pandas makes data analysis a lot easier with its simple syntax and flexibility. Hope you had fun with Pandas!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data Analysis is one of the most important tools in today\u2019s world. Data is present in every domain of life today whether it is biological data or data from a tech company. No matter what kind of data you are working with, you must know how to filter and analyze your data. 
Today we are [&hellip;]<\/p>\n","protected":false},"author":39,"featured_media":27481,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[94],"tags":[],"class_list":["post-27456","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-pandas"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/27456","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/comments?post=27456"}],"version-history":[{"count":0,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/posts\/27456\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media\/27481"}],"wp:attachment":[{"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/media?parent=27456"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/categories?post=27456"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.askpython.com\/wp-json\/wp\/v2\/tags?post=27456"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}