# Pandas Basics Cheat Sheet 

## Pandas Data Structures
The Pandas library is centered on two data structures: the Series, a one-dimensional labeled array, and the Data Frame, a two-dimensional labeled table.

In [3]:
import pandas as pd

## Series: a one-dimensional labeled array

s = pd.Series([3, -5, 7, 4], index = ['a','b','c','d'])
s

a    3
b   -5
c    7
d    4
dtype: int64

In [4]:
## Data Frame: a two-dimensional labeled data structure

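# Population is stored as strings here, so the column has dtype object until it is converted later
data = {'Country': ['Belgium','India','Brazil'],
        'Capital': ['Brussels','New Delhi','Brasilia'],
        'Population': ['111907','1303021','208476']}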

In [5]:
df = pd.DataFrame(data, columns = ['Country','Capital','Population']) 

In [6]:
df

|   | Country | Capital | Population |
|---|---|---|---|
| 0 | Belgium | Brussels | 111907 |
| 1 | India | New Delhi | 1303021 |
| 2 | Brazil | Brasilia | 208476 |


In [21]:
data2 = {'Even':[2,4,6], 'Odd':[1,3,5]}

In [22]:
df2 = pd.DataFrame(data2, columns = ['Even','Odd']) 

In [23]:
df2

|   | Even | Odd |
|---|---|---|
| 0 | 2 | 1 |
| 1 | 4 | 3 |
| 2 | 6 | 5 |


## Dropping
In this section, you’ll learn how to remove specific values from a Series, and how to remove columns or rows from a Data Frame.

__s__ and __df__ in the code below are used as examples of a Series and Data Frame throughout this section.

In [7]:
## Drop values by index label (axis = 0); a new Series is returned
s.drop(['a','c']) 

b   -5
d    4
dtype: int64

In [8]:
## The original Series is unchanged
s

a    3
b   -5
c    7
d    4
dtype: int64

In [9]:
## Drop a column (axis = 1)
df.drop('Country', axis = 1)

|   | Capital | Population |
|---|---|---|
| 0 | Brussels | 111907 |
| 1 | New Delhi | 1303021 |
| 2 | Brasilia | 208476 |
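
The section intro also mentions removing rows from a Data Frame, which the cells above do not show. A minimal sketch, rebuilding the same example frame so that it runs on its own; the names `rows_dropped` and `cols_dropped` are illustrative and not part of the original notebook:

```python
import pandas as pd

# Same example frame as above (Population kept as strings, as in the source).
df = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
    'Population': ['111907', '1303021', '208476'],
})

# Drop rows by index label (axis=0 is the default).
rows_dropped = df.drop([0, 2])

# Keyword form: name the axis instead of passing axis=0 or axis=1.
cols_dropped = df.drop(columns=['Capital', 'Population'])

# drop() returns a new object; df keeps all rows and columns.
print(rows_dropped)
print(cols_dropped)
print(df.shape)
```

To keep the result, assign it back (for example `df = df.drop(...)`); otherwise the original frame is left as it was.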


## Sort & Rank
In this section, you’ll learn how to sort a Data Frame by its index or by a column, and how to rank the values in each column.

df in the code below is used as an example Data Frame throughout this section.

In [10]:
df

|   | Country | Capital | Population |
|---|---|---|---|
| 0 | Belgium | Brussels | 111907 |
| 1 | India | New Delhi | 1303021 |
| 2 | Brazil | Brasilia | 208476 |


In [11]:
## Sort by labels along an axis

df.sort_index()

|   | Country | Capital | Population |
|---|---|---|---|
| 0 | Belgium | Brussels | 111907 |
| 1 | India | New Delhi | 1303021 |
| 2 | Brazil | Brasilia | 208476 |


In [12]:
## Sort by values along an axis

df.sort_values(by = 'Country')

|   | Country | Capital | Population |
|---|---|---|---|
| 0 | Belgium | Brussels | 111907 |
| 2 | Brazil | Brasilia | 208476 |
| 1 | India | New Delhi | 1303021 |


In [13]:
## Assign ranks to entries
df.rank()

|   | Country | Capital | Population |
|---|---|---|---|
| 0 | 1.0 | 2.0 | 1.0 |
| 1 | 3.0 | 3.0 | 2.0 |
| 2 | 2.0 | 1.0 | 3.0 |
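
In the `df.rank()` output above, Population is still stored as strings, so its ranks follow string (lexicographic) order rather than numeric order. The sketch below assumes an integer Population column instead, and shows descending sorts and a descending rank; the variable names are illustrative only:

```python
import pandas as pd

# Integer populations here, unlike the string column used in the cells above.
df = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Population': [111907, 1303021, 208476],
})

# Sort rows by a column, largest value first.
by_pop_desc = df.sort_values(by='Population', ascending=False)

# Sort rows by index label in descending order.
index_desc = df.sort_index(ascending=False)

# Rank one column so that the largest value gets rank 1.
pop_rank = df['Population'].rank(ascending=False)

print(by_pop_desc)
print(index_desc)
print(pop_rank)
```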


## Retrieving Series/DataFrame Information
In this section, you’ll learn how to retrieve information about a Data Frame, including its dimensions, column names, column types, and index range.

df in the code below is used as an example Data Frame throughout this section.

In [14]:
df

|   | Country | Capital | Population |
|---|---|---|---|
| 0 | Belgium | Brussels | 111907 |
| 1 | India | New Delhi | 1303021 |
| 2 | Brazil | Brasilia | 208476 |


In [15]:
## (rows, columns)
df.shape

(3, 3)

In [16]:
## Describe index
df.index

RangeIndex(start=0, stop=3, step=1)

In [17]:
## Describe DataFrame columns
df.columns

Index(['Country', 'Capital', 'Population'], dtype='object')

In [18]:
## Info on DataFrame
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Country     3 non-null      object
 1   Capital     3 non-null      object
 2   Population  3 non-null      object
dtypes: object(3)
memory usage: 200.0+ bytes


In [19]:
## Number of non-NA values
df.count()

Country       3
Capital       3
Population    3
dtype: int64

## DataFrame Summary
In this section, you’ll learn how to retrieve summary statistics of a Data Frame, including the sum, minimum, maximum, mean, and median of each column.

df2 in the code below is used as an example of a Data Frame throughout this section.

In [24]:
df2

|   | Even | Odd |
|---|---|---|
| 0 | 2 | 1 |
| 1 | 4 | 3 |
| 2 | 6 | 5 |


In [25]:
# Sum of values
df2.sum()

Even    12
Odd      9
dtype: int64

In [26]:
# Cumulative sum of values
df2.cumsum()

|   | Even | Odd |
|---|---|---|
| 0 | 2 | 1 |
| 1 | 6 | 4 |
| 2 | 12 | 9 |


In [27]:
# Minimum value
df2.min()

Even    2
Odd     1
dtype: int64

In [28]:
# Maximum value
df2.max()

Even    6
Odd     5
dtype: int64

In [29]:
# Summary statistics
df2.describe()

|   | Even | Odd |
|---|---|---|
| count | 3.0 | 3.0 |
| mean | 4.0 | 3.0 |
| std | 2.0 | 2.0 |
| min | 2.0 | 1.0 |
| 25% | 3.0 | 2.0 |
| 50% | 4.0 | 3.0 |
| 75% | 5.0 | 4.0 |
| max | 6.0 | 5.0 |


In [30]:
# Mean of values
df2.mean()

Even    4.0
Odd     3.0
dtype: float64

In [31]:
# Median of values
df2.median()

Even    4.0
Odd     3.0
dtype: float64
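
All of the reductions above work column by column, which is the default. As a small sketch on the same `df2` data, the block below shows a row-wise reduction with `axis=1` and the index labels of the extreme values via `idxmax()` and `idxmin()`; the variable names are illustrative only:

```python
import pandas as pd

df2 = pd.DataFrame({'Even': [2, 4, 6], 'Odd': [1, 3, 5]})

# Reduce across columns instead of down them: one sum per row.
row_sums = df2.sum(axis=1)

# Index label of the largest and smallest value in each column.
max_labels = df2.idxmax()
min_labels = df2.idxmin()

print(row_sums)
print(max_labels)
print(min_labels)
```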

## Selection
In this section, you’ll learn how to retrieve specific values from a Series and Data Frame.

s and df in the code below are used as examples of a Series and Data Frame throughout this section.

In [32]:
s

a    3
b   -5
c    7
d    4
dtype: int64

In [33]:
df

|   | Country | Capital | Population |
|---|---|---|---|
| 0 | Belgium | Brussels | 111907 |
| 1 | India | New Delhi | 1303021 |
| 2 | Brazil | Brasilia | 208476 |


In [34]:
# Get one element
s['b']

-5

In [35]:
# Get a subset of rows by slicing positions
df[1:]

|   | Country | Capital | Population |
|---|---|---|---|
| 1 | India | New Delhi | 1303021 |
| 2 | Brazil | Brasilia | 208476 |


In [36]:
# Select a single value by row and column position
df.iloc[0,0]

'Belgium'

In [37]:
# Select a single value by row and column labels
df.loc[0,'Country']

'Belgium'

In [39]:
# Select a single row (or a subset of rows) by label
df.loc[2]

Country         Brazil
Capital       Brasilia
Population      208476
Name: 2, dtype: object

In [40]:
# Select a single column (or a subset of columns) by label
df.loc[:,'Capital']

0     Brussels
1    New Delhi
2     Brasilia
Name: Capital, dtype: object

In [41]:
# Select rows and columns
df.loc[1,'Capital']

'New Delhi'
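
The cells above each pick a single row, column, or value. The following sketch selects subsets instead, and highlights one difference that is easy to miss: `.loc` slices by label and includes the end label, while `.iloc` slices by position and excludes the stop. The frame is rebuilt so the block runs on its own, and the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
    'Population': ['111907', '1303021', '208476'],
})

# .loc slices by label and includes BOTH endpoints: rows 0 and 1.
by_label = df.loc[0:1, ['Country', 'Capital']]

# .iloc slices by position and excludes the stop: row 0 only.
by_position = df.iloc[0:1, 0:2]

# A list of column names in plain brackets returns a Data Frame subset.
two_columns = df[['Country', 'Population']]

print(by_label)
print(by_position)
print(two_columns)
```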

In [48]:
df.dtypes

Country       object
Capital       object
Population    object
dtype: object

In [49]:
# Convert the 'Population' column from strings to integers
df['Population'] = df['Population'].astype(int)

In [50]:
df.dtypes

Country       object
Capital       object
Population     int64
dtype: object
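
`astype(int)` raises a `ValueError` if any value cannot be parsed as an integer. When a column may contain such values, `pd.to_numeric` with `errors='coerce'` is one alternative: unparseable entries become `NaN`. The sketch below uses a made-up Series (`raw`, with an invented `'n/a'` entry) purely for illustration:

```python
import pandas as pd

# Hypothetical input: two numeric strings and one value that is not a number.
raw = pd.Series(['111907', '1303021', 'n/a'])

# Unparseable entries are replaced by NaN instead of raising an error.
cleaned = pd.to_numeric(raw, errors='coerce')

print(cleaned)
print(cleaned.dtype)
```

Because `NaN` is a floating-point value, the result here is a float column rather than an integer one.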

In [52]:
# Keep only the rows where a condition holds (boolean mask)
df[df['Population'] > 120000]

|   | Country | Capital | Population |
|---|---|---|---|
| 1 | India | New Delhi | 1303021 |
| 2 | Brazil | Brasilia | 208476 |
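
Boolean masks can also be combined. A short sketch, assuming the integer Population column from the conversion step; conditions are joined with `&` (and), `|` (or), and `~` (not), and each condition needs its own parentheses. The variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
    'Population': [111907, 1303021, 208476],
})

# Both conditions must hold: large population AND not India.
mask = (df['Population'] > 120000) & (df['Country'] != 'India')
large_not_india = df[mask]

# Negate a condition with ~ to keep the rows that fail it.
small = df[~(df['Population'] > 120000)]

# query() expresses the same kind of filter as a string.
queried = df.query('Population > 120000')

print(large_not_india)
print(small)
print(queried)
```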


## Applying Functions
In this section, you’ll learn how to apply a function to all values of a Data Frame or a specific column.

df2 in the code below is used as an example of a Data Frame throughout this section.


In [53]:
df2

|   | Even | Odd |
|---|---|---|
| 0 | 2 | 1 |
| 1 | 4 | 3 |
| 2 | 6 | 5 |


In [55]:
# Apply a function to each column (here, doubling every value)
df2.apply(lambda x: x*2)

|   | Even | Odd |
|---|---|---|
| 0 | 4 | 2 |
| 1 | 8 | 6 |
| 2 | 12 | 10 |
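
`DataFrame.apply` passes each column to the function as a Series, which is why `x*2` above doubles every value. The sketch below covers the other cases the section intro mentions: applying a function to one specific column, and applying it row by row with `axis=1`. The lambdas and variable names are illustrative only:

```python
import pandas as pd

df2 = pd.DataFrame({'Even': [2, 4, 6], 'Odd': [1, 3, 5]})

# Apply a function to every value of a single column.
even_squared = df2['Even'].apply(lambda v: v ** 2)

# Apply a function to each row: axis=1 passes one row at a time.
row_totals = df2.apply(lambda row: row['Even'] + row['Odd'], axis=1)

# Default axis=0: the function receives a whole column and may reduce it.
col_ranges = df2.apply(lambda col: col.max() - col.min())

print(even_squared)
print(row_totals)
print(col_ranges)
```

For simple arithmetic such as these, vectorized expressions like `df2 * 2` or `df2['Even'] ** 2` give the same results without a Python-level function call.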


In [62]:
# Set the value at index label 'a' of Series s to 6
s['a'] = 6

## Data Alignment
In this section, you’ll learn how to add, subtract, and divide two Series whose indexes differ.

s and s3 in the code below are used as examples of Series throughout this section.

In [63]:
s

a    6
b   -5
c    7
d    4
dtype: int64

In [66]:
s3 = pd.Series([7, -2, 3], index = ['a','c','d'])

In [67]:
s3

a    7
c   -2
d    3
dtype: int64

In [68]:
# Internal data alignment: a label missing from either Series gives NaN
s + s3

a    13.0
b     NaN
c     5.0
d     7.0
dtype: float64

In [69]:
# Arithmetic with a fill value: a label missing from one Series is treated as fill_value
s.add(s3, fill_value = 0)

a    13.0
b    -5.0
c     5.0
d     7.0
dtype: float64

In [70]:
s.sub(s3, fill_value = 2)

a   -1.0
b   -7.0
c    9.0
d    1.0
dtype: float64

In [71]:
s.div(s3, fill_value = 4)

a    0.857143
b   -1.250000
c   -3.500000
d    1.333333
dtype: float64
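
One way to picture what `fill_value` does: both Series are aligned on the union of their index labels, and a label present in only one of them is given `fill_value` in the other before the operation runs. The sketch below reproduces `s.add(s3, fill_value = 0)` by hand with `reindex`, and adds a `mul` example; `union`, `manual`, `builtin`, and `product` are illustrative names:

```python
import pandas as pd

s = pd.Series([6, -5, 7, 4], index=['a', 'b', 'c', 'd'])
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

# Align both Series on the union of their labels, filling gaps with 0.
union = s.index.union(s3.index)
manual = s.reindex(union, fill_value=0) + s3.reindex(union, fill_value=0)

# The built-in method gives the same values (as floats).
builtin = s.add(s3, fill_value=0)

# For multiplication, 1 is the neutral fill value.
product = s.mul(s3, fill_value=1)

print(manual)
print(builtin)
print(product)
```

The manual version matches here because every label exists in at least one Series. If a label were missing from both operands, the built-in methods would still return `NaN` for it.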