# Introduction to Pandas
* Pandas is an open-source Python library for data manipulation and analysis.
* It provides fast, flexible, and expressive data structures designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data easy and intuitive.
* It is built on top of NumPy and provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language.


# Installation
```
pip install pandas
```


In [1]:
import pandas as pd

# Pandas Components
* One Dimensional
  * Series
* Two Dimensional
  * DataFrame



# Pandas Series
* A Pandas Series is a one-dimensional array of indexed data.
* A Pandas Series can be thought of as a single column in an Excel spreadsheet.
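```
pandas.Series(data, index, dtype, name, copy)
```

* data : array-like (for example an ndarray or a list), an iterable, a dict, or a scalar value
* index : labels for the values; they must be hashable and have the same length as data
* dtype : data type for the Series (inferred from data if not given)
* name : optional name for the Series
* copy : whether to copy the input data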
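A minimal sketch of the constructor called with keyword arguments. The labels, values and the name `scores` are made up for illustration; the call also sets the optional `name` parameter, which labels the Series itself.

```python
import pandas as pd

# Build a Series with explicit labels, an explicit dtype and an optional name
scores = pd.Series(
    data=[90, 85, 77],
    index=["alice", "bob", "carol"],  # hashable labels, same length as data
    dtype="float64",                  # store the integers as floats
    name="scores",                    # optional label for the Series itself
)
print(scores)
```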

# Creating a Pandas Series from a list
NOTE : By default, the index ranges from 0 to (n-1) for a series of length ‘n’


In [2]:
list_1 = ['a', 'b', 'c', 'd']
ser_1 = pd.Series(data=list_1)
print(ser_1)

0    a
1    b
2    c
3    d
dtype: object


# Creating a Pandas Series from a NumPy array

In [3]:
import numpy as np
arr_1 = np.array([1, 2, 3, 4])
ser_2 = pd.Series(arr_1)
print(ser_2)

0    1
1    2
2    3
3    4
dtype: int32
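The `int32` above most likely reflects NumPy's platform-dependent default integer type (32-bit on Windows before NumPy 2.0), which is why other cells in this notebook print `int64`. A minimal sketch that pins the dtype explicitly so the result does not depend on the platform:

```python
import numpy as np
import pandas as pd

arr_1 = np.array([1, 2, 3, 4])
# Request a fixed-width dtype so the result does not depend on the platform default
ser_2 = pd.Series(arr_1, dtype="int64")
print(ser_2.dtype)  # int64
```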


# Setting Index to Series
NOTE : Index labels can also be strings (any hashable values work)

In [4]:
labels = ['a', 'b', 'c', 'd']
list_1 = [1, 2, 3, 4]
ser_1 = pd.Series(data=list_1, index=labels)
print(ser_1)

a    1
b    2
c    3
d    4
dtype: int64


# Creating a Pandas Series from a dictionary
NOTE : The keys become the index labels and the values become the Series values

In [5]:
dict_1 = {"f_name": "Derek", 
              "l_name": "Banas", 
              "age": 44}
ser_3 = pd.Series(dict_1)
print(ser_3)

f_name    Derek
l_name    Banas
age          44
dtype: object


## Exercise
Write a Pandas program to convert a dictionary to a Pandas series. 

Original dictionary:
{'a': 100, 'b': 200, 'c': 300, 'd': 400, 'e': 800}
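One possible solution, assuming the dictionary shown above as input:

```python
import pandas as pd

d1 = {'a': 100, 'b': 200, 'c': 300, 'd': 400, 'e': 800}
# Dictionary keys become the index labels, values become the Series values
new_series = pd.Series(d1)
print("Original dictionary:")
print(d1)
print("Converted series:")
print(new_series)
```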

# Accessing Series Index and Values

In [6]:
dict_1 = {"f_name": "Derek", 
              "l_name": "Banas", 
              "age": 44}
ser_3 = pd.Series(dict_1)
# getting index names
print(ser_3.index)
# getting values
print(ser_3.values)

Index(['f_name', 'l_name', 'age'], dtype='object')
['Derek' 'Banas' 44]
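When a plain NumPy array is needed, the pandas documentation recommends `Series.to_numpy()` over `.values`; `index.tolist()` gives the labels as a Python list. A short sketch reusing the same dictionary:

```python
import pandas as pd

dict_1 = {"f_name": "Derek", "l_name": "Banas", "age": 44}
ser_3 = pd.Series(dict_1)
labels = ser_3.index.tolist()   # ['f_name', 'l_name', 'age']
values = ser_3.to_numpy()       # object array: ['Derek' 'Banas' 44]
print(labels)
print(values)
```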


# Accessing Series Elements
NOTE : Access the elements of a series using the indexing operator ‘[]’, which also accepts slices

In [7]:
import numpy as np
arr_1 = np.array([1, 2, 3, 4,5,6,7,8,9])
ser_2 = pd.Series(arr_1)
# Retrieve first five elements
print(ser_2[:5])
# Retrieve last five elements
print(ser_2[-5:])



0    1
1    2
2    3
3    4
4    5
dtype: int32
4    5
5    6
6    7
7    8
8    9
dtype: int32


## Use an index label to access an element

In [8]:
dict_1 = {"f_name": "Derek", 
              "l_name": "Banas", 
              "age": 44}

ser_3 = pd.Series(dict_1)
# getting the value stored under the label 'age'
print(ser_3['age'])

44


## Retrieve multiple elements using a list of index labels

In [9]:
dict_1 = {"f_name": "Derek", 
              "l_name": "Banas", 
              "age": 44}

ser_3 = pd.Series(dict_1)
# getting several values by passing a list of labels
print(ser_3[['f_name','age']])

f_name    Derek
age          44
dtype: object
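The `[]` operator accepts labels, and in older pandas it may silently fall back to integer positions; recent pandas releases deprecate that fallback for non-integer indexes. A sketch of the explicit accessors: `.loc` selects by label and `.iloc` by 0-based position.

```python
import pandas as pd

dict_1 = {"f_name": "Derek", "l_name": "Banas", "age": 44}
ser_3 = pd.Series(dict_1)

# .loc selects by index label
print(ser_3.loc["age"])               # 44
print(ser_3.loc[["f_name", "age"]])

# .iloc selects by integer position (0-based)
print(ser_3.iloc[2])                  # 44
print(ser_3.iloc[[0, 2]])
```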


# Filtering a Series

## Filter the Values

In [10]:
import numpy as np
data=np.array([59, 68, 42, 53, 80])
s = pd.Series(data, index=["a", "b", "c", "d", "e"])
print(s)
print('-'*40)
print(s>50)
print('-'*40)
print(s[s>50])

a    59
b    68
c    42
d    53
e    80
dtype: int32
----------------------------------------
a     True
b     True
c    False
d     True
e     True
dtype: bool
----------------------------------------
a    59
b    68
d    53
e    80
dtype: int32
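Conditions can be combined element-wise with `&` (and), `|` (or) and `~` (not). Python's `and`/`or` keywords do not work here because a Series has no single truth value, and each condition needs its own parentheses because `&` and `|` bind more tightly than `>` and `<`. A sketch on the same data:

```python
import numpy as np
import pandas as pd

data = np.array([59, 68, 42, 53, 80])
s = pd.Series(data, index=["a", "b", "c", "d", "e"])

# Combine conditions with & (and), | (or), ~ (not); each condition needs parentheses
print(s[(s > 50) & (s < 70)])   # a 59, b 68, d 53
print(s[(s < 45) | (s > 75)])   # c 42, e 80
print(s[~(s > 50)])             # c 42
```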


## Exercise
Write a Pandas program to create a subset of a given series based on value (n) and condition (<).

[0, 1,2,3,4,5,6,7,8,9,10]

n=6

In [11]:
import pandas as pd
s = pd.Series([0, 1,2,3,4,5,6,7,8,9,10])
print("Original Data Series:")
print(s)
print("\nSubset of the above Data Series:")
n = 6
new_s = s[s < n]
print(new_s)


Original Data Series:
0      0
1      1
2      2
3      3
4      4
5      5
6      6
7      7
8      8
9      9
10    10
dtype: int64

Subset of the above Data Series:
0    0
1    1
2    2
3    3
4    4
5    5
dtype: int64


# Arithmetic Operations


## Series and a single value (scalar)

In [12]:
import numpy as np
data=np.array([59, 68, 42, 53, 80])
s = pd.Series(data, index=["a", "b", "c", "d", "e"])
# multiplication
print(s*2)
print('-'*40)
# addition
print(s+2)
print('-'*40)
# subtraction
print(s-2)
print('-'*40)
# division
print(s/2)
print('-'*40)

a    118
b    136
c     84
d    106
e    160
dtype: int32
----------------------------------------
a    61
b    70
c    44
d    55
e    82
dtype: int32
----------------------------------------
a    57
b    66
c    40
d    51
e    78
dtype: int32
----------------------------------------
a    29.5
b    34.0
c    21.0
d    26.5
e    40.0
dtype: float64
----------------------------------------
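The other Python arithmetic operators also apply element-wise. A sketch on the same Series, with the expected values worked out by hand in the comments:

```python
import numpy as np
import pandas as pd

data = np.array([59, 68, 42, 53, 80])
s = pd.Series(data, index=["a", "b", "c", "d", "e"])

print(s // 2)   # floor division: 29, 34, 21, 26, 40
print(s % 2)    # remainder:       1,  0,  0,  1,  0
print(s ** 2)   # power:        3481, 4624, 1764, 2809, 6400
```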


## Series and Series

In [13]:
import numpy as np
s1 = pd.Series(np.array([59, 68, 42, 53, 80]), index=["a", "b", "c", "d", "e"])
s2 = pd.Series(np.array([49, 58, 32, 43, 70]), index=["a", "b", "c", "d", "e"])

print(s1)
print('-'*40)
print(s2)
print('-'*40)
# multiplication
print(s1*s2)
print('-'*40)
# addition
print(s1+s2)
print('-'*40)
# subtraction
print(s1-s2)
print('-'*40)
# division
print(s1/s2)


a    59
b    68
c    42
d    53
e    80
dtype: int32
----------------------------------------
a    49
b    58
c    32
d    43
e    70
dtype: int32
----------------------------------------
a    2891
b    3944
c    1344
d    2279
e    5600
dtype: int32
----------------------------------------
a    108
b    126
c     74
d     96
e    150
dtype: int32
----------------------------------------
a    10
b    10
c    10
d    10
e    10
dtype: int32
----------------------------------------
a    1.204082
b    1.172414
c    1.312500
d    1.232558
e    1.142857
dtype: float64


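## Series with different indexes
NOTE : Values are aligned by index label; labels found in only one Series (here 'd' and 'e') produce NaN, so the result dtype becomes float64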

In [14]:
import numpy as np
s1 = pd.Series(np.array([59, 68, 42]), index=["a", "b", "c"])
s2 = pd.Series(np.array([49, 58, 32, 43, 70]), index=["a", "b", "c", "d", "e"])
print(s1)
print('-'*40)
print(s2)
print('-'*40)
# multiplication
print(s1*s2)
print('-'*40)
# addition
print(s1+s2)
print('-'*40)
# subtraction
print(s1-s2)
print('-'*40)
# division
print(s1/s2)

a    59
b    68
c    42
dtype: int32
----------------------------------------
a    49
b    58
c    32
d    43
e    70
dtype: int32
----------------------------------------
a    2891.0
b    3944.0
c    1344.0
d       NaN
e       NaN
dtype: float64
----------------------------------------
a    108.0
b    126.0
c     74.0
d      NaN
e      NaN
dtype: float64
----------------------------------------
a    10.0
b    10.0
c    10.0
d     NaN
e     NaN
dtype: float64
----------------------------------------
a    1.204082
b    1.172414
c    1.312500
d         NaN
e         NaN
dtype: float64
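The method forms of the operators (`add`, `sub`, `mul`, `div`) accept a `fill_value` argument that stands in for a label missing from one side, instead of producing NaN. A sketch assuming 0 is an appropriate fill value for this data; the result is still float64 because alignment introduces NaN before the fill.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.array([59, 68, 42]), index=["a", "b", "c"])
s2 = pd.Series(np.array([49, 58, 32, 43, 70]), index=["a", "b", "c", "d", "e"])

# Treat labels missing from s1 as 0 instead of producing NaN
print(s1.add(s2, fill_value=0))   # a 108.0, b 126.0, c 74.0, d 43.0, e 70.0
print(s1.sub(s2, fill_value=0))   # a 10.0, b 10.0, c 10.0, d -43.0, e -70.0
```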


## Exercise
Write a Pandas program to add, subtract, multiply and divide two Pandas Series

Sample Series: [2, 4, 6, 8, 10], [1, 3, 5, 7, 9]

In [28]:
import numpy as np
s1 = pd.Series(np.array([2, 4, 6, 8, 10]), index=["a", "b", "c", "d", "e"])
s2 = pd.Series(np.array([1, 3, 5, 7, 9]), index=["a", "b", "c", "d", "e"])

print(s1)
print('-'*40)
print(s2)
print('-'*40)
# multiplication
print(s1*s2)
print('-'*40)
# addition
print(s1+s2)
print('-'*40)
# subtraction
print(s1-s2)
print('-'*40)
# division
print(s1/s2)


a     2
b     4
c     6
d     8
e    10
dtype: int32
----------------------------------------
a    1
b    3
c    5
d    7
e    9
dtype: int32
----------------------------------------
a     2
b    12
c    30
d    56
e    90
dtype: int32
----------------------------------------
a     3
b     7
c    11
d    15
e    19
dtype: int32
----------------------------------------
a    1
b    1
c    1
d    1
e    1
dtype: int32
----------------------------------------
a    2.000000
b    1.333333
c    1.200000
d    1.142857
e    1.111111
dtype: float64


## Exercise
Write a Pandas program to compare the elements of the two Pandas Series. 

Sample Series: [2, 4, 6, 8, 10], [1, 3, 5, 7, 10]

In [15]:
import pandas as pd
ds1 = pd.Series([2, 4, 6, 8, 10])
ds2 = pd.Series([1, 3, 5, 7, 10])
print("Series1:")
print(ds1)
print("Series2:")
print(ds2)
print("Compare the elements of the said Series:")
print("Equals:")
print(ds1 == ds2)
print("Greater than:")
print(ds1 > ds2)
print("Less than:")
print(ds1 < ds2)


Series1:
0     2
1     4
2     6
3     8
4    10
dtype: int64
Series2:
0     1
1     3
2     5
3     7
4    10
dtype: int64
Compare the elements of the said Series:
Equals:
0    False
1    False
2    False
3    False
4     True
dtype: bool
Greater than:
0     True
1     True
2     True
3     True
4    False
dtype: bool
Less than:
0    False
1    False
2    False
3    False
4    False
dtype: bool
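The element-wise comparisons above can be reduced to a single answer with `.any()` or `.all()`, and `Series.equals()` checks whether two Series match completely. Note that `==`, `>` and `<` require both Series to have identical index labels; otherwise pandas raises a `ValueError`. A sketch reusing the sample data:

```python
import pandas as pd

ds1 = pd.Series([2, 4, 6, 8, 10])
ds2 = pd.Series([1, 3, 5, 7, 10])

# Reduce the element-wise result to a single answer
print((ds1 == ds2).any())   # True: at least one position matches
print((ds1 == ds2).all())   # False: not every position matches
# equals() compares the whole Series (values, index, dtype) at once
print(ds1.equals(ds2))      # False
print(ds1.equals(pd.Series([2, 4, 6, 8, 10])))  # True
```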


# Series Ranking and Sorting
* The rank() method ranks values in ascending order by default: the smallest value gets rank 1.0, and tied values receive the average of their ranks.

In [16]:
s2 = pd.Series(np.array([49, 58, 32, 43, 70]), index=["a", "b", "c", "d", "e"])
print(s2.rank())

a    3.0
b    4.0
c    1.0
d    2.0
e    5.0
dtype: float64
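How ties are ranked is controlled by the `method` parameter (the default is `'average'`), and `ascending=False` reverses the order. A sketch with a hypothetical `scores` Series containing a tie:

```python
import pandas as pd

scores = pd.Series([10, 20, 20, 30], index=["w", "x", "y", "z"])

print(scores.rank())                  # ties share the average: 1.0, 2.5, 2.5, 4.0
print(scores.rank(method="min"))      # ties take the lowest rank: 1.0, 2.0, 2.0, 4.0
print(scores.rank(ascending=False))   # largest value gets rank 1: 4.0, 2.5, 2.5, 1.0
```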


### sort_values()

In [17]:
s2 = pd.Series(np.array([49, 58, 32, 43, 70]), index=["a", "b", "c", "d", "e"])
print(s2.sort_values(ascending=True, na_position='last'))

c    32
d    43
a    49
b    58
e    70
dtype: int32


## descending order

In [18]:
s2 = pd.Series(np.array([49, 58, 32, 43, 70]), index=["a", "b", "c", "d", "e"])
print(s2.sort_values(ascending=False, na_position='first'))

e    70
b    58
a    49
d    43
c    32
dtype: int32


## with NaN values

In [19]:
s2 = pd.Series(np.array([49, 58, 32, np.nan, np.nan]), index=["a", "b", "c", "d", "e"])
print(s2.sort_values(ascending=True, na_position='first'))

d     NaN
e     NaN
c    32.0
a    49.0
b    58.0
dtype: float64


## sorting index

In [20]:
s2 = pd.Series(np.array([49, 58, 32, 43, 70]), index=["a", "b", "c", "d", "e"])
print(s2.sort_index(ascending=False))

e    70
d    43
c    32
b    58
a    49
dtype: int32


# Checking Null Values
* The isnull() method returns a boolean Series indicating which values are missing (NaN or None); isna() is an alias
* ‘True’ indicates that the corresponding value is null

In [21]:
s2 = pd.Series(np.array([49, 58, 32, np.nan, np.nan]), index=["a", "b", "c", "d", "e"])
print(s2.isnull())

a    False
b    False
c    False
d     True
e     True
dtype: bool


* The notnull() method is the inverse of isnull(): it returns ‘True’ for every non-null value (notna() is an alias)
* ‘False’ in the output indicates that the corresponding value is null

In [22]:
s2 = pd.Series(np.array([49, 58, 32, np.nan, np.nan]), index=["a", "b", "c", "d", "e"])
print(s2.notnull())

a     True
b     True
c     True
d    False
e    False
dtype: bool


## Exercise
Change the null values in a series to zero.

Input: [49, 58, 32, np.nan, np.nan]

Expected output: [49, 58, 32, 0, 0]

In [23]:
s2 = pd.Series(np.array([49, 58, 32, np.nan, np.nan]))
s2[s2.isnull()]=0
print(s2)

0    49.0
1    58.0
2    32.0
3     0.0
4     0.0
dtype: float64
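An alternative to the boolean-mask assignment above is `fillna()`, which returns a new Series rather than modifying the original in place. A sketch on the same data:

```python
import numpy as np
import pandas as pd

s2 = pd.Series(np.array([49, 58, 32, np.nan, np.nan]))
# fillna returns a new Series with every NaN replaced; s2 itself is unchanged
filled = s2.fillna(0)
print(filled)
print(s2.isnull().sum())   # 2: the original still holds two NaN values
```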


# Append a Series
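* Series.append() creates a new Series by appending one Series to another; the original Series are left unchanged
* append() was deprecated in pandas 1.4 and removed in pandas 2.0, so new code should use pd.concat() instead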

### append()

In [30]:
s1 = pd.Series(np.array([59, 68, 42, 53, 80]), index=["a", "b", "c", "d", "e"])
s2 = pd.Series(np.array([49, 58, 32, 43, 70]), index=["a", "b", "c", "d", "e"])

# Only runs on pandas < 2.0; Series.append was removed in pandas 2.0
print(s1.append(s2))
print(s1)
print(s2)

a    59
b    68
c    42
d    53
e    80
a    49
b    58
c    32
d    43
e    70
dtype: int32
a    59
b    68
c    42
d    53
e    80
dtype: int32
a    49
b    58
c    32
d    43
e    70
dtype: int32



## Concat

In [32]:
s1 = pd.Series(np.array([59, 68, 42, 53, 80]), index=["a", "b", "c", "d", "e"])
s2 = pd.Series(np.array([49, 58, 32, 43, 70]), index=["a", "b", "c", "d", "e"])

print(pd.concat([s1, s2]))
print(s1)
print(s2)

a    59
b    68
c    42
d    53
e    80
a    49
b    58
c    32
d    43
e    70
dtype: int32
a    59
b    68
c    42
d    53
e    80
dtype: int32
a    49
b    58
c    32
d    43
e    70
dtype: int32


### ignore index
* Discards the index labels of the original Series and numbers the result from 0 to (n-1)

In [33]:
s1 = pd.Series(np.array([59, 68, 42, 53, 80]), index=["a", "b", "c", "d", "e"])
s2 = pd.Series(np.array([49, 58, 32, 43, 70]), index=["a", "b", "c", "d", "e"])

pd.concat([s1,s2],ignore_index=True)

0    59
1    68
2    42
3    53
4    80
5    49
6    58
7    32
8    43
9    70
dtype: int32
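Without `ignore_index`, the concatenated result repeats the labels a to e, so a label such as `'a'` selects two rows. Passing `keys` is another option: it adds an outer index level that records which input each row came from. A sketch with the two Series used above; the key names `'s1'` and `'s2'` are arbitrary.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.array([59, 68, 42, 53, 80]), index=["a", "b", "c", "d", "e"])
s2 = pd.Series(np.array([49, 58, 32, 43, 70]), index=["a", "b", "c", "d", "e"])

# keys= adds an outer index level recording which input each row came from
combined = pd.concat([s1, s2], keys=["s1", "s2"])
print(combined)
print(combined.loc["s2"])         # the rows that came from s2, labelled a..e
print(combined.loc[("s1", "c")])  # 42
```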

## Exercise
Write a Pandas program to add some data to an existing Series.

original series : ['100', '200', 'python', '300.12', '400']

append : ['500', 'php']


In [42]:
import pandas as pd
s = pd.Series(['100', '200', 'python', '300.12', '400'])
print("Original Data Series:")
print(s)
t = pd.Series(['500', 'php'])
print("\nAdditional Data Series:")
print(t)
new_s = pd.concat([s,t])
print("\nData Series after concatenation:")
print(new_s)


Original Data Series:
0       100
1       200
2    python
3    300.12
4       400
dtype: object

Additional Data Series:
0    500
1    php
dtype: object

Data Series after concatenation:
0       100
1       200
2    python
3    300.12
4       400
0       500
1       php
dtype: object
