www.edureka.co/pythonEDUREKA PYTHON CERTIFICATION TRAINING
Data Analysis With Python
Agenda
Python Applications
Data Life-cycle
Python For Data Analysis
What is Pandas? – NumPy, SciPy
Pandas Operations
Python for Statistics
Python for Hadoop
Python Applications
Web Scraping
Testing
Web Development
Data Analysis
Data Life-Cycle
[Figure: data life-cycle diagram with the stages Data, Data Warehousing, Data Analysis and Data Visualization]
What is Data Analysis?
[Figure: data of unemployed youth across the globe from 2010-2014 is analyzed to find the percentage increase in unemployed youth in Afghanistan between 2010 and 2011]
Data Analysis Using Python
Pandas is a software library written for the Python programming language for data manipulation and analysis.
Pandas is built on NumPy and is commonly used together with SciPy and Matplotlib.
Pandas is well suited for many different kinds of data:
• Tabular data with heterogeneously-typed columns.
• Ordered and unordered time series data.
• Arbitrary matrix data with row and column labels.
• Any other form of observational or statistical data set. The data need not be labeled at all to be placed into a pandas data structure.
Pandas Operations
Slicing the DataFrame
Changing the Index
Changing the column headers
Joining and Merging
Concatenation
Data conversion
Slicing
Original DataFrame:
Index  Int rate  US GDP Thousands
2001   2         50
2002   3         55
2003   2         65
2004   2         55

Slicing the first 2 rows:
Index  Int rate  US GDP Thousands
2001   2         50
2002   3         55

Slicing the last 2 rows:
Index  Int rate  US GDP Thousands
2003   2         65
2004   2         55
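The slides show only the resulting tables, not the code. A minimal sketch that reproduces both slices, assuming the year is stored as the DataFrame index, uses positional slicing with iloc:

```python
import pandas as pd

# Same values as the table above; the year is used as the index.
df = pd.DataFrame(
    {"Int rate": [2, 3, 2, 2], "US GDP Thousands": [50, 55, 65, 55]},
    index=[2001, 2002, 2003, 2004],
)

first_two = df.iloc[:2]   # positions 0 and 1: the 2001 and 2002 rows
last_two = df.iloc[-2:]   # the last two positions: the 2003 and 2004 rows

print(first_two)
print(last_two)
```

df.head(2) and df.tail(2) return the same rows and may read more clearly when only the ends of a DataFrame are needed.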
Merging
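Index  HPI  Int rate  US GDP Thousands
2001   80   2         50
2002   85   3         55
2003   88   2         65
2004   85   2         55

Index  HPI  Int rate  US GDP Thousands
2005   80   2         50
2006   85   3         55
2007   88   2         65
2008   85   2         55

Merged on the shared HPI and Int rate columns (the original indexes are discarded and replaced by 0-3, and the overlapping US GDP Thousands column gets the default _x and _y suffixes):
Index  HPI  Int rate  US GDP Thousands_x  US GDP Thousands_y
0      80   2         50                  50
1      85   3         55                  55
2      88   2         65                  65
3      85   2         55                  55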
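The slide does not show the call that produced this result. A minimal sketch that reproduces it, assuming the merge keys are the HPI and Int rate columns, is:

```python
import pandas as pd

# The two tables above: identical values, different year indexes.
df1 = pd.DataFrame(
    {"HPI": [80, 85, 88, 85], "Int rate": [2, 3, 2, 2], "US GDP Thousands": [50, 55, 65, 55]},
    index=[2001, 2002, 2003, 2004],
)
df2 = pd.DataFrame(
    {"HPI": [80, 85, 88, 85], "Int rate": [2, 3, 2, 2], "US GDP Thousands": [50, 55, 65, 55]},
    index=[2005, 2006, 2007, 2008],
)

# Inner merge on the key columns. The indexes are not used as keys, so the
# result gets a fresh 0..n-1 index; the non-key column present in both
# frames is disambiguated with the default suffixes "_x" and "_y".
merged = pd.merge(df1, df2, on=["HPI", "Int rate"])
print(merged)
```

The choice of keys matters: merging on HPI alone would pair each of the two 85 rows with both 85 rows on the other side, giving 6 rows instead of 4.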
Joining
Index  Int rate  US GDP Thousands
2001   2         50
2002   3         55
2003   2         65
2004   2         55

Index  Low tier HPI  Unemployment
2001   50            7
2003   52            8
2004   50            9
2005   43            6

Joined on the index as an outer join (years present in only one table get NaN, which also turns the integer columns into floats):
Index  Int rate  US GDP Thousands  Low tier HPI  Unemployment
2001   2.0       50.0              50.0          7.0
2002   3.0       55.0              NaN           NaN
2003   2.0       65.0              52.0          8.0
2004   2.0       55.0              50.0          9.0
2005   NaN       NaN               43.0          6.0
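A minimal sketch of this join, assuming the result was produced with DataFrame.join and how="outer" (the slide does not show the call):

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Int rate": [2, 3, 2, 2], "US GDP Thousands": [50, 55, 65, 55]},
    index=[2001, 2002, 2003, 2004],
)
df3 = pd.DataFrame(
    {"Low tier HPI": [50, 52, 50, 43], "Unemployment": [7, 8, 9, 6]},
    index=[2001, 2003, 2004, 2005],
)

# join() aligns rows on the index; how="outer" keeps every year found in either table
# and fills the missing cells with NaN.
joined = df1.join(df3, how="outer")
print(joined)
```

With the default how="left", only the years in df1 are kept, so the 2005 row would be dropped.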
Changing the Index and Column Headers
Index  Int rate  US GDP Thousands
2001   2         50
2002   3         55
2003   2         65
2004   2         55

Index  US GDP Thousands
2001   50
2002   55
2003   65
2004   55

Changing the Index (the Int rate values become the index):
Index  US GDP Thousands
2      50
3      55
2      65
2      55

Changing the column headers (US GDP Thousands renamed to GDP):
Index  GDP
2001   50
2002   55
2003   65
2004   55
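A minimal sketch of both operations, assuming they were done with set_index and rename (the slides show only the tables):

```python
import pandas as pd

df = pd.DataFrame(
    {"Int rate": [2, 3, 2, 2], "US GDP Thousands": [50, 55, 65, 55]},
    index=[2001, 2002, 2003, 2004],
)

# Changing the index: the Int rate column becomes the row labels
# (set_index returns a new DataFrame; df itself is unchanged).
by_rate = df.set_index("Int rate")

# Changing a column header: rename maps old labels to new ones and also returns a new DataFrame.
renamed = df[["US GDP Thousands"]].rename(columns={"US GDP Thousands": "GDP"})

print(by_rate)
print(renamed)
```

Note that the new index contains repeated values (2 appears three times); pandas allows this, but lookups such as by_rate.loc[2] then return several rows.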
Concatenation
[Figure: Student Data: a Student record with Name, Age, Sex and Phone number is concatenated with an E-mail field, giving a Student record with Name, Age, Sex, Phone number and E-mail]
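A minimal sketch of the column-wise concatenation in the figure, using pd.concat with axis=1. The student names, ages, phone numbers and e-mail addresses are hypothetical values, not taken from the slides.

```python
import pandas as pd

# Hypothetical student records.
students = pd.DataFrame(
    {
        "Name": ["Asha", "Ravi"],
        "Age": [21, 22],
        "Sex": ["F", "M"],
        "Phone number": ["555-0101", "555-0102"],
    }
)
emails = pd.DataFrame({"E-mail": ["asha@example.com", "ravi@example.com"]})

# axis=1 places the frames side by side, aligning rows on their shared index (0, 1).
student_data = pd.concat([students, emails], axis=1)
print(student_data)
```

With the default axis=0, pd.concat stacks frames vertically instead, appending rows rather than columns.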
Data Munging
Use-Case
Example: Youth Unemployment Data
Problem Statement
Find the change in the percentage of unemployed youth for every country from 2010 to 2011.
For example, there is an approximately 3.1% increase in unemployed youth in the ‘Arab World’.
Example: Youth Unemployment Data
Column 1 – Country Name
Column 2 – Country Code
Column 3 – 2010
Column 4 – 2011
Column 5 – 2012
Column 6 – 2013
Column 7 – 2014
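A minimal sketch of the computation, under stated assumptions: the rows below are hypothetical (the slides do not name the data file or show its values), the year headers are strings as pd.read_csv would produce them, and "change" is taken as the relative change from 2010 to 2011 in percent.

```python
import pandas as pd

# Hypothetical rows in the column layout above. The real dataset would be loaded
# with pd.read_csv; the 2012-2014 columns are omitted because they are not needed here.
df = pd.DataFrame(
    {
        "Country Name": ["Country A", "Country B"],
        "Country Code": ["AAA", "BBB"],
        "2010": [20.0, 10.0],
        "2011": [22.0, 9.5],
    }
)

# Relative change from 2010 to 2011, in percent, computed for every row at once.
df["Change 2010-2011 (%)"] = (df["2011"] - df["2010"]) / df["2010"] * 100

print(df[["Country Name", "Change 2010-2011 (%)"]])
```

If the intended measure is the difference in percentage points rather than the relative change, df["2011"] - df["2010"] gives that directly.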
Python For Statistics
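Mean
from statistics import mean
print(mean([1,1,1,1,3,4,4,4,5,2]))  # prints 2.6

Median
from statistics import median
print(median([1,1,1,1,3,4,4,4,5,2]))  # prints 2.5, the average of the two middle values 2 and 3

High Median
Low Median
For an even number of values, median_high() and median_low() return the larger or the smaller of the two middle values instead of their average.

Mode
from statistics import mode
print(mode([1,1,1,1,3,4,4,4,5,2]))  # prints 1, the most frequent value

Variance
from statistics import variance
print(variance([1,1,1,1,3,4,4,4,5,2]))  # sample variance, about 2.49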
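A short sketch of the high/low median functions and of the sample versus population variance, using the same list as the examples above:

```python
from statistics import median_high, median_low, pvariance, variance

data = [1, 1, 1, 1, 3, 4, 4, 4, 5, 2]

# The list has an even number of values, so after sorting there are two middle values: 2 and 3.
print(median_low(data))   # 2, the lower middle value
print(median_high(data))  # 3, the higher middle value

# variance() is the sample variance (divides by n - 1);
# pvariance() is the population variance (divides by n).
print(variance(data))
print(pvariance(data))
```

Unlike median(), median_low() and median_high() always return a value that actually occurs in the data.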
Python For Hadoop: Pydoop
Pydoop is a Python interface to Hadoop that allows you to write MapReduce applications and interact with HDFS in pure Python.
Summary
Python Applications
What Is Data Analysis
What Is Pandas
Pandas Operations
Data Analysis Use-Case
Python For Statistics and Python For Hadoop

Python For Data Analysis | Python Pandas Tutorial | Learn Python | Python Training | Edureka