# Python Pandas Tutorial

A DataFrame is a two-dimensional, labeled data structure, i.e., data is aligned in a tabular fashion in rows and columns.

<img src="assets/series-and-dataframe.png" width=600px />

You can think of it as a spreadsheet or a SQL table.
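As a quick illustration of the "rows and columns" description above, the sketch below builds a small DataFrame and inspects its dimensions and labels. The sample names and marks are hypothetical values chosen only for this example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    'Name': ['Anand', 'Mohan', 'Kumaran'],
    'Marks': [65, 90, 85],
})

print(df.ndim)           # number of axes (rows and columns)
print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels
print(list(df.index))    # row labels
```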

## DataFrame Creation

In [1]:
# Import the pandas library under the alias pd
import pandas as pd

### Create an Empty DataFrame

#### Example 1: 

In [2]:
df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []


### Create a DataFrame from Lists

#### Example 1:

In [3]:
# Students' marks stored in a list
marks = [35,67,34,89,12,55,83,56,90,99]

In [4]:
df = pd.DataFrame(marks)
print(df)

    0
0  35
1  67
2  34
3  89
4  12
5  55
6  83
7  56
8  90
9  99


#### Example 2:

In [5]:
students = [['Anand',65],['Mohan',90],['Kumaran',85],['Vinoth',50],['Pranav',93]]

In [6]:
df = pd.DataFrame(students,columns=['Name','Marks'])
print(df)

      Name  Marks
0    Anand     65
1    Mohan     90
2  Kumaran     85
3   Vinoth     50
4   Pranav     93


#### Example 3:

In [7]:
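# Cast only the numeric 'Marks' column to float; passing dtype=float to the
# constructor would also try to convert 'Name' and fails on recent pandas versions
df = pd.DataFrame(students,columns=['Name','Marks']).astype({'Marks': float})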
print(df)

      Name  Marks
0    Anand   65.0
1    Mohan   90.0
2  Kumaran   85.0
3   Vinoth   50.0
4   Pranav   93.0


### Creating DataFrames from NumPy

#### Example 1:

In [8]:
import numpy as np
df = pd.DataFrame(np.array([85,50,75,80,78]),columns=['Maths Marks'])
print(df)

   Maths Marks
0           85
1           50
2           75
3           80
4           78
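The example above uses a one-dimensional array, which yields a single column. A two-dimensional array works the same way, with one column label per array column. The sketch below assumes hypothetical marks for three students in two subjects.

```python
import numpy as np
import pandas as pd

# Hypothetical marks: one row per student, one column per subject
marks = np.array([[85, 75],
                  [50, 88],
                  [75, 65]])

df = pd.DataFrame(marks, columns=['Maths Marks', 'Physics Marks'])
print(df)
print(df.shape)
```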


### Creating DataFrames from dictionary

There are *many* ways to create a DataFrame from scratch, but a great option is to just use a simple `dict`. 

#### Example 1:

In [9]:
students = {
    'Name'  : ['Anand','Mohan','Kumaran','Vinoth','Pranav'],
    'Maths Marks' : [65, 90, 85, 50, 93]    
}

In [10]:
df = pd.DataFrame(students)
print(df)

      Name  Maths Marks
0    Anand           65
1    Mohan           90
2  Kumaran           85
3   Vinoth           50
4   Pranav           93


#### **How did that work?**

Each *(key, value)* item in the `students` dictionary corresponds to a *column* in the resulting DataFrame: the key becomes the column label and the list becomes the column's values.

The **Index** of this DataFrame was given to us on creation as the numbers 0-4, but we could also create our own when we initialize the DataFrame. 

#### Example 2:

Let's use roll numbers as our index: 

In [11]:
# Note: the index labels can be integers (roll numbers) or strings (e.g. surnames)
df = pd.DataFrame(students,index=[100,101,102,103,104])
print(df)

        Name  Maths Marks
100    Anand           65
101    Mohan           90
102  Kumaran           85
103   Vinoth           50
104   Pranav           93


### Column Selection

In [12]:
df['Name']

100      Anand
101      Mohan
102    Kumaran
103     Vinoth
104     Pranav
Name: Name, dtype: object
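Selecting a single label returns a Series, as the output above shows. Passing a list of labels instead returns a DataFrame. The following sketch rebuilds the same `students` data so that it runs on its own; the variable names `name_series` and `subset` are introduced here only for illustration.

```python
import pandas as pd

students = {
    'Name': ['Anand', 'Mohan', 'Kumaran', 'Vinoth', 'Pranav'],
    'Maths Marks': [65, 90, 85, 50, 93],
}
df = pd.DataFrame(students, index=[100, 101, 102, 103, 104])

name_series = df['Name']              # single label -> Series
subset = df[['Name', 'Maths Marks']]  # list of labels -> DataFrame

print(type(name_series).__name__)
print(type(subset).__name__)
```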

### Column Addition

#### Example 1: Using Series

In [13]:
# Add a new column to an existing DataFrame by assigning a Series to a new column label

print("Adding a new column by passing a Series:")
df['Physics Marks']=pd.Series([75,88,65,90,50],index=[100,101,102,103,104])
print(df)

Adding a new column by passing a Series:
        Name  Maths Marks  Physics Marks
100    Anand           65             75
101    Mohan           90             88
102  Kumaran           85             65
103   Vinoth           50             90
104   Pranav           93             50
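The Series in this example carries the same index as the DataFrame, and that matters: when a Series is assigned as a column, pandas matches values by index label rather than by position. The sketch below uses a smaller, hypothetical DataFrame to show that a reordered Series still lands on the right rows and that a missing label produces `NaN`.

```python
import pandas as pd

# Hypothetical three-student DataFrame indexed by roll number
df = pd.DataFrame({'Name': ['Anand', 'Mohan', 'Kumaran']},
                  index=[100, 101, 102])

# The Series lists roll number 101 first and has no entry for 102
df['Physics Marks'] = pd.Series([88, 75], index=[101, 100])

# Values are aligned on the index labels; roll number 102 gets NaN
print(df)
```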


#### Example 2: Using NumPy

In [14]:
# Add a new column to an existing DataFrame by assigning a NumPy array;
# the array is matched by position, so its length must equal the number of rows

df['Chemistry Marks'] = np.array([85,50,75,80,78])
print(df)

        Name  Maths Marks  Physics Marks  Chemistry Marks
100    Anand           65             75               85
101    Mohan           90             88               50
102  Kumaran           85             65               75
103   Vinoth           50             90               80
104   Pranav           93             50               78


#### Example 3: Combining Columns

In [15]:
print ("Calculating the total and displaying in new column")
df['Total']= df['Maths Marks'] + df['Physics Marks'] + df['Chemistry Marks']
print(df)

Calculating the total and displaying in new column
        Name  Maths Marks  Physics Marks  Chemistry Marks  Total
100    Anand           65             75               85    225
101    Mohan           90             88               50    228
102  Kumaran           85             65               75    225
103   Vinoth           50             90               80    220
104   Pranav           93             50               78    221
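When there are many subject columns, chaining `+` becomes unwieldy. One alternative, sketched here on a hypothetical two-student subset of the data, is to select the columns and sum across each row with `sum(axis=1)`. The list name `subject_cols` is an assumption made for this example.

```python
import pandas as pd

# Hypothetical subset of the marks table
df = pd.DataFrame({
    'Maths Marks': [65, 90],
    'Physics Marks': [75, 88],
    'Chemistry Marks': [85, 50],
}, index=[100, 101])

subject_cols = ['Maths Marks', 'Physics Marks', 'Chemistry Marks']

# axis=1 sums across the columns of each row
df['Total'] = df[subject_cols].sum(axis=1)
print(df)
```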


### Column Deletion

#### Example 1:

In [16]:
# using the del statement
print("Deleting the Total column using the del statement:")
del df['Total']
print(df)

Deleting the Total column using the del statement:
        Name  Maths Marks  Physics Marks  Chemistry Marks
100    Anand           65             75               85
101    Mohan           90             88               50
102  Kumaran           85             65               75
103   Vinoth           50             90               80
104   Pranav           93             50               78


#### Example 2:

In [17]:
# using the pop method, which removes the column and returns it as a Series
print ("Deleting another column using POP function:")
df.pop('Chemistry Marks')
print(df)

Deleting another column using POP function:
        Name  Maths Marks  Physics Marks
100    Anand           65             75
101    Mohan           90             88
102  Kumaran           85             65
103   Vinoth           50             90
104   Pranav           93             50
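Both `del` and `pop` modify the DataFrame in place. The sketch below, using hypothetical data, shows two related points: `pop` hands back the removed column, and `drop(columns=...)` returns a new DataFrame while leaving the original untouched. The names `physics` and `trimmed` are introduced only for this example.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    'Name': ['Anand', 'Mohan'],
    'Maths Marks': [65, 90],
    'Physics Marks': [75, 88],
})

# pop removes the column from df and returns it as a Series
physics = df.pop('Physics Marks')
print(physics)

# drop returns a new DataFrame; df itself keeps 'Maths Marks'
trimmed = df.drop(columns=['Maths Marks'])
print(trimmed)
print(df)
```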


### Row Selection

In [18]:
print(df.loc[103])

Name             Vinoth
Maths Marks          50
Physics Marks        90
Name: 103, dtype: object
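`loc` selects by index label, which is why `103` (a roll number) works above. To select by integer position instead, `iloc` can be used. A minimal sketch, rebuilding the same data so that it runs on its own:

```python
import pandas as pd

students = {
    'Name': ['Anand', 'Mohan', 'Kumaran', 'Vinoth', 'Pranav'],
    'Maths Marks': [65, 90, 85, 50, 93],
}
df = pd.DataFrame(students, index=[100, 101, 102, 103, 104])

by_label = df.loc[103]    # row whose index label is 103
by_position = df.iloc[3]  # fourth row, counting from 0

print(by_label)
print(by_position)
```

Both selections return the same row here, because label `103` happens to sit at position 3.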


### Addition of Rows
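The `DataFrame.append()` method appends the rows of another DataFrame to the end of the given DataFrame and returns a new DataFrame object. Note that `append()` was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, `pd.concat([df1, df2])` gives the same result.

Columns that are missing from one of the DataFrames are added as new columns, and the cells without data are filled with NaN.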

#### Example 1: DataFrames with the same columns

In [19]:
df1 = pd.DataFrame({
    'Name'  : ['Anand','Mohan','Kumaran','Vinoth','Pranav'],
    'Maths Marks' : [65, 90, 85, 50, 93]    
})
print(df1)
print(df1.shape)

      Name  Maths Marks
0    Anand           65
1    Mohan           90
2  Kumaran           85
3   Vinoth           50
4   Pranav           93
(5, 2)


In [20]:
df2 = pd.DataFrame({
    'Name'  : ['Sathish'],
    'Maths Marks' : [85]    
})
print(df2)
print(df2.shape)

      Name  Maths Marks
0  Sathish           85
(1, 2)


In [21]:
# Appending the DataFrames (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
pd.concat([df1, df2])

      Name  Maths Marks
0    Anand           65
1    Mohan           90
2  Kumaran           85
3   Vinoth           50
4   Pranav           93
0  Sathish           85


In [22]:
# Ignoring the original index so that the rows are renumbered 0-5
pd.concat([df1, df2], ignore_index=True)

      Name  Maths Marks
0    Anand           65
1    Mohan           90
2  Kumaran           85
3   Vinoth           50
4   Pranav           93
5  Sathish           85


#### Example 2: DataFrames with different columns
When the DataFrames do not have the same columns, the values missing from one of them are filled with NaN.

In [23]:
df1 = pd.DataFrame({
    'Name'  : ['Anand','Mohan','Kumaran','Vinoth','Pranav'],
    'Maths Marks' : [65, 90, 85, 50, 93]    
})
print(df1)
print(df1.shape)

      Name  Maths Marks
0    Anand           65
1    Mohan           90
2  Kumaran           85
3   Vinoth           50
4   Pranav           93
(5, 2)


In [24]:
df2 = pd.DataFrame({
    'Name'  : ['Sathish'],
    'Maths Marks' : [85],
    'Physics Marks': [90]
})
print(df2)
print(df2.shape)

      Name  Maths Marks  Physics Marks
0  Sathish           85             90
(1, 3)


In [25]:
# Appending the DataFrames and ignoring the index; sort=False keeps the columns in their original order
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
pd.concat([df1, df2], ignore_index=True, sort=False)

      Name  Maths Marks  Physics Marks
0    Anand           65            NaN
1    Mohan           90            NaN
2  Kumaran           85            NaN
3   Vinoth           50            NaN
4   Pranav           93            NaN
5  Sathish           85           90.0


### Deletion of Rows

In [26]:
print(df1)

      Name  Maths Marks
0    Anand           65
1    Mohan           90
2  Kumaran           85
3   Vinoth           50
4   Pranav           93


In [27]:
# Drop a row by its index label
df1 = df1.drop(3)
print(df1)

      Name  Maths Marks
0    Anand           65
1    Mohan           90
2  Kumaran           85
4   Pranav           93
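After a drop, the index keeps its gap (label 3 is missing above). The sketch below shows one way to drop several rows at once by passing a list of labels, and then renumber the remaining rows with `reset_index(drop=True)`. It rebuilds `df1` so that it runs on its own.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Anand', 'Mohan', 'Kumaran', 'Vinoth', 'Pranav'],
    'Maths Marks': [65, 90, 85, 50, 93],
})

# Drop the rows labelled 1 and 3 in a single call
df1 = df1.drop([1, 3])
print(df1)

# Renumber the remaining rows from 0; drop=True discards the old labels
df1 = df1.reset_index(drop=True)
print(df1)
```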
