# Python: Pandas Tutorial

Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures. The name Pandas is derived from the term "panel data", an econometrics term for multidimensional data sets.

Using Pandas, we can accomplish five typical steps in processing and analyzing data, regardless of where the data comes from: load, prepare, manipulate, model, and analyze.

Let's learn some basic concepts of Pandas.

# Series

A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.

Let's create a basic Series to understand this.

If data is an ndarray, any index passed must have the same length. If no index is passed, the default index is range(n), where n is the array length, i.e., [0, 1, 2, …, n-1].

```python
# Import the pandas library, aliased as pd
import pandas as pd
import numpy as np

data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)
s
```

```
0    a
1    b
2    c
3    d
dtype: object
```

We did not pass an index here, so pandas assigned the default index, 0 to len(data)-1 (here 0 to 3), as shown in the output.
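To pass the index explicitly instead, supply it through the index parameter. The sketch below reuses the same array; the labels 100 to 103 are arbitrary illustrations. It also shows that an index whose length differs from the data is rejected with a ValueError.

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd'])

# Explicit index: one label per element (labels chosen only for illustration)
s = pd.Series(data, index=[100, 101, 102, 103])
print(s)

# An index of the wrong length (3 labels for 4 values) raises ValueError
raised = False
try:
    pd.Series(data, index=[100, 101, 102])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```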

# Create a Series from dict

A dict can be passed as input. If no index is specified, the dictionary keys are used to construct the index (recent pandas versions keep the keys in insertion order; older versions sorted them). If an index is passed, the values in data corresponding to the labels in the index are pulled out.

```python
# Import the pandas library, aliased as pd
import pandas as pd

data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data, index=['b', 'c', 'd', 'a'])
s
```

```
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
```

Note: the order of the passed index is preserved, and the label 'd', which is missing from the dict, is filled with NaN (Not a Number).
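For comparison, here is a minimal sketch of the same dict with and without an index. Without an index, the keys themselves become the labels; with one, isna() shows which requested labels had no matching key.

```python
import pandas as pd

data = {'a': 0., 'b': 1., 'c': 2.}

# No index: the dict keys become the index labels
s1 = pd.Series(data)
print(s1)

# Explicit index: 'd' is not a key, so its value is NaN
s2 = pd.Series(data, index=['b', 'c', 'd', 'a'])
print(s2.isna())
```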

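# Accessing Data from a Series by Position

Data in a Series can be accessed similarly to data in an ndarray.

For example, let's retrieve the first element. Counting starts from zero, so the first element is stored at position 0, the second at position 1, and so on. Because this Series has string labels, we use .iloc to make it explicit that 0 is a position rather than a label; recent pandas versions warn that treating an integer key as a position in plain s[0] is deprecated.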

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Retrieve the first element by position
s.iloc[0]
```

```
1
```
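The same Series can also be read by label. A short sketch contrasting label-based access (s['a'], s.loc) with position-based access (s.iloc), plus selecting several labels at once:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Label-based access
print(s['a'])        # 1
print(s.loc['a'])    # 1

# Position-based access; negative positions count from the end
print(s.iloc[0])     # 1
print(s.iloc[-1])    # 5

# A list of labels returns a new Series with those items
print(s[['a', 'c', 'd']])
```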

We can also retrieve the first three elements of the Series with a slice. A slice with a leading colon, such as s[:3], returns every item before the stop position; a trailing colon, such as s[2:], returns every item from that position onwards; and with two positions, such as s[1:3], the items between them are returned, excluding the stop position.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Retrieve the first three elements
s[:3]
```

```
a    1
b    2
c    3
dtype: int64
```

Similarly, we can retrieve the last three elements.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Retrieve the last three elements
s[-3:]
```

```
c    3
d    4
e    5
dtype: int64
```
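A slice with two positions, and its label-based counterpart, behave differently at the stop end. In this sketch, .iloc excludes the stop position as in plain Python slicing, while .loc includes both endpoint labels:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Positional slice: start included, stop excluded -> b, c
middle = s.iloc[1:3]
print(middle)

# Label slice: both endpoints included -> b, c, d
by_label = s.loc['b':'d']
print(by_label)
```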

# DataFrame

A DataFrame is a two-dimensional data structure: data is aligned in a tabular fashion in rows and columns.

<b>Creating a DataFrame</b>

A pandas DataFrame can be created from various inputs, such as:

1. Lists
2. dict
3. Series
4. NumPy ndarrays
5. Another DataFrame

Let's see how to create a DataFrame from some of these inputs.

The DataFrame can be created using a single list or a list of lists.

```python
import pandas as pd

data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)
df
```

```
   0
0  1
1  2
2  3
3  4
4  5
```


Let's look at another example.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
# Cast only the numeric Age column to float; Name stays text
df = pd.DataFrame(data, columns=['Name', 'Age']).astype({'Age': float})
df
```

```
     Name   Age
0    Alex  10.0
1     Bob  12.0
2  Clarke  13.0
```


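Observe that astype({'Age': float}) changes the type of the Age column to floating point. Passing dtype=float to the constructor instead would try to convert every column, including Name, which raises an error in recent pandas versions.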
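The same kind of table can also be built from other inputs in the list above. A minimal sketch, reusing the Name/Age values; the column labels x and y in the ndarray case are arbitrary:

```python
import numpy as np
import pandas as pd

# From a dict of lists: the keys become the column labels
df_dict = pd.DataFrame({'Name': ['Alex', 'Bob', 'Clarke'], 'Age': [10, 12, 13]})
print(df_dict)

# From a 2-D NumPy ndarray, with column labels supplied explicitly
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_arr = pd.DataFrame(arr, columns=['x', 'y'])
print(df_arr)

# From another DataFrame (pass copy=True to guarantee independent data)
df_from_df = pd.DataFrame(df_dict, copy=True)
print(df_from_df)
```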

# Column Selection

We will understand this by selecting a column from the DataFrame.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
df['one']
```

```
a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64
```

Here, we selected the column named 'one' from the DataFrame. Because the Series for 'one' has no value for label 'd', that entry is NaN, and the column is stored as float64.
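Passing a list of labels instead of a single label selects several columns at once. In this sketch, a single label returns a Series, while a list returns a DataFrame with the columns in the order listed:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# A list of labels selects several columns, in the given order
subset = df[['two', 'one']]
print(subset)

print(type(df['one']).__name__)  # Series
print(type(subset).__name__)     # DataFrame
```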

# Column Addition, Deletion

We will now learn how to add and delete a column in pandas through the following examples.

```python
# Column addition
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

# Add a new column to an existing DataFrame by assigning a Series to a new label
print("Adding a new column by passing as Series:")
df['three'] = pd.Series([20, 30, 50], index=['a', 'b', 'c'])

# Add a new column computed from existing columns
print("Adding a new column using the existing columns in DataFrame:")
df['four'] = df['one'] + df['three']

df
```

```
Adding a new column by passing as Series:
Adding a new column using the existing columns in DataFrame:

   one  two  three  four
a  1.0    1   20.0  21.0
b  2.0    2   30.0  32.0
c  3.0    3   50.0  53.0
d  NaN    4    NaN   NaN
```
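Columns can be deleted in a few ways. A minimal sketch using the del statement, DataFrame.pop, and DataFrame.drop; the values in the 'three' column are illustrative only. del and pop modify the DataFrame in place (pop also returns the removed column), while drop returns a new DataFrame and leaves the original unchanged:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)

# del removes a column in place
del df['one']

# pop removes a column in place and returns it as a Series
removed = df.pop('two')

# drop returns a new DataFrame; df itself keeps 'three'
smaller = df.drop(columns=['three'])

print(df.columns.tolist())       # ['three']
print(removed.tolist())          # [1, 2, 3, 4]
print(smaller.columns.tolist())  # []
```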


# Row Selection, Addition, and Deletion

We will now understand row selection, addition and deletion through examples. Let us begin with the concept of selection.

```python
# Selection
# Rows can be selected by passing a row label to the .loc indexer.
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
df.loc['b']
```

```
one    2.0
two    2.0
Name: b, dtype: float64
```

The result is a Series whose labels are the column names of the DataFrame, and whose name is the label of the retrieved row.
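.loc also accepts a list of row labels, or a row label and a column label together. A short sketch on the same DataFrame:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# A list of row labels returns a DataFrame with just those rows
rows = df.loc[['a', 'c']]
print(rows)

# A (row, column) pair returns a single value
value = df.loc['b', 'two']
print(value)  # 2
```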

```python
# Rows can be selected by passing an integer position to the .iloc indexer.
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
df.iloc[2]
```

```
one    3.0
two    3.0
Name: c, dtype: float64
```

<b>Slicing Rows</b>

Multiple rows can be selected using the : operator. Let's understand this with an example.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
df[2:4]
```

```
   one  two
c  3.0    3
d  NaN    4
```
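Rows can also be sliced by label. In this sketch, .iloc[2:4] matches the positional df[2:4] (stop excluded), whereas .loc['b':'d'] includes both endpoint labels:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# Positional slice: positions 2 and 3
by_position = df.iloc[2:4]

# Label slice: labels 'b' through 'd', inclusive
by_label = df.loc['b':'d']

print(by_position.index.tolist())  # ['c', 'd']
print(by_label.index.tolist())     # ['b', 'c', 'd']
```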


```python
# Row addition
# pd.concat appends the rows of df2 at the end of df.
# (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.)
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

df = pd.concat([df, df2])
df
```

```
   a  b
0  1  2
1  3  4
0  5  6
1  7  8
```
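Stacking the two frames keeps each frame's original row labels, so labels 0 and 1 appear twice. When those labels carry no meaning, pd.concat can renumber the rows instead. A minimal sketch using its ignore_index parameter:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# ignore_index=True discards the original labels and numbers rows 0..n-1
combined = pd.concat([df, df2], ignore_index=True)
print(combined)
print(combined.index.tolist())  # [0, 1, 2, 3]
```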


```python
# Deletion of rows
# An index label can be used to drop rows from a DataFrame. If the label is
# duplicated, every row with that label is dropped. The concatenated DataFrame
# below has duplicate labels (0, 1, 0, 1), so let's drop label 0 and see how
# many rows are removed.
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

df = pd.concat([df, df2])

# Drop rows with label 0
df = df.drop(0)
df
```

```
   a  b
1  3  4
1  7  8
```


In the above example, two rows were dropped because both had the label 0.
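To drop just one of several rows that share a label, one option is to renumber the rows first so that every label is unique. A minimal sketch using reset_index(drop=True), which replaces the old labels with 0..n-1 instead of keeping them as a column:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])            # index labels: 0, 1, 0, 1

# Renumber rows; drop=True discards the old labels
unique = df.reset_index(drop=True)   # index labels: 0, 1, 2, 3

# Dropping label 0 now removes only the first row
result = unique.drop(0)
print(result)
```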

That's all for now in this tutorial. I hope the concepts were easy to understand.

Reference: https://www.tutorialspoint.com https://www.google.com