# Introduction to pandas Data Structures

## Series

**A Series is a one-dimensional array-like object containing a sequence of values and an associated array of data labels, called its index.**

In [301]:
import pandas as pd
import numpy as np

In [302]:
obj = pd.Series([1,-2,0,-5])
#obj = pd.Series([1,-2,0,-5],index = ['a','b','c','d']) # alternative: set the index labels at construction

In [303]:
obj.values,obj.index

(array([ 1, -2,  0, -5], dtype=int64), RangeIndex(start=0, stop=4, step=1))

In [304]:
obj

0    1
1   -2
2    0
3   -5
dtype: int64

In [305]:
obj[0]

1

In [306]:
obj.index = ['a','b','c','d']

In [307]:
obj.values

array([ 1, -2,  0, -5], dtype=int64)

In [308]:
obj.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [309]:
obj

a    1
b   -2
c    0
d   -5
dtype: int64
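Assigning to `obj.index` replaces the labels in place, so the new list has to match the length of the Series. A minimal sketch, reusing the same values; `mismatch_rejected` and `obj_b` are illustrative names only:

```python
import pandas as pd

obj = pd.Series([1, -2, 0, -5])

# Replace the default RangeIndex in place; the label count must equal len(obj)
obj.index = ['a', 'b', 'c', 'd']

# A label list of the wrong length is rejected and the old index is kept
try:
    obj.index = ['a', 'b', 'c']
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

# Same result as passing the labels at construction time
obj_b = pd.Series([1, -2, 0, -5], index=['a', 'b', 'c', 'd'])

print(obj.index.tolist())  # ['a', 'b', 'c', 'd']
print(mismatch_rejected)   # True
print(obj.equals(obj_b))   # True
```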

In [310]:
obj['a'],obj.a

(1, 1)

In [311]:
obj['a':'c']

a    1
b   -2
c    0
dtype: int64
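Note that `obj['a':'c']` returned three values: slicing by label includes the end label. Slicing by position with `.iloc` follows normal Python rules and excludes the stop position. A short sketch contrasting the two:

```python
import pandas as pd

obj = pd.Series([1, -2, 0, -5], index=['a', 'b', 'c', 'd'])

# Label-based slicing includes BOTH endpoints ('a' through 'c')
by_label = obj.loc['a':'c']

# Position-based slicing excludes the stop position, like a Python list
by_position = obj.iloc[0:2]

print(by_label.tolist())     # [1, -2, 0]
print(by_position.tolist())  # [1, -2]
print(obj.iloc[0])           # 1
```

Using `.loc` and `.iloc` explicitly avoids any ambiguity about whether an integer in `obj[...]` means a label or a position.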

Using NumPy functions or NumPy-like operations, such as filtering with a boolean
array, scalar multiplication, or applying math functions, preserves the link between each index label and its value.

In [312]:
obj.values

array([ 1, -2,  0, -5], dtype=int64)

In [313]:
obj2 = pd.Series([1,2,-2,0],index=['c','a','b','d'])

In [314]:
obj[ obj2 > 0 ]

a    1
c    0
dtype: int64

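The boolean mask is aligned on index labels before filtering, so `obj2` may list its labels in a different order than `obj`. It must, however, contain every label of `obj`; otherwise pandas raises an error because the mask cannot be aligned.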

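To see the alignment at work, the sketch below prints the mask's own label order and then the labels that survive the filter. It uses the same `obj` and `obj2` as the cells above:

```python
import pandas as pd

obj = pd.Series([1, -2, 0, -5], index=['a', 'b', 'c', 'd'])
obj2 = pd.Series([1, 2, -2, 0], index=['c', 'a', 'b', 'd'])

mask = obj2 > 0              # True/False per label, in obj2's label order
print(mask.index.tolist())   # ['c', 'a', 'b', 'd']

# pandas reindexes the mask to obj's index before filtering,
# so rows are matched by label, not by position
filtered = obj[mask]
print(filtered.index.tolist())  # ['a', 'c']
print(filtered.tolist())        # [1, 0]
```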
In [315]:
obj2 * 2

c    2
a    4
b   -4
d    0
dtype: int64

In [316]:
'd' in obj2

True
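The `in` operator on a Series checks the index labels, much like `in` on a dict checks its keys. Testing the values needs the underlying array or `isin`. A small sketch:

```python
import pandas as pd

obj2 = pd.Series([1, 2, -2, 0], index=['c', 'a', 'b', 'd'])

# `in` tests the index labels
print('d' in obj2)          # True
print(2 in obj2)            # False: 2 is a value, not a label

# To test the values instead, use the underlying array or isin
print(2 in obj2.values)             # True
print(obj2.isin([2, 5]).tolist())   # [False, True, False, False]
```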

In [317]:
np.exp(obj2)

c    2.718282
a    7.389056
b    0.135335
d    1.000000
dtype: float64
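The claim that these operations keep the index can be checked directly: both the scalar product and the NumPy ufunc return Series whose index equals `obj2.index`. A quick sketch:

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([1, 2, -2, 0], index=['c', 'a', 'b', 'd'])

doubled = obj2 * 2          # scalar multiplication
exp_vals = np.exp(obj2)     # NumPy ufunc applied element-wise

# Both results keep obj2's labels, in the same order
print(doubled.index.equals(obj2.index))   # True
print(exp_vals.index.equals(obj2.index))  # True
print(round(exp_vals['a'], 6))            # 7.389056
```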

- **A Python dict can be passed to the Series constructor; its keys become the index**

In [318]:
dic = {'Name':'Towhid','ID':17101135,'Roll':135}

In [319]:
Sdata1 = pd.Series(dic)
Sdata1

Name      Towhid
ID      17101135
Roll         135
dtype: object

In [320]:
Sdata1.Name,Sdata1['ID']

('Towhid', 17101135)

- When a dict is passed to a Series, its keys become the index. In current pandas they keep the dict's insertion order (as the output above shows: `Name`, `ID`, `Roll`); older pandas versions sorted them. Passing an `index` does not rename anything: pandas looks up each listed label among the dict keys, keeps the value when the key exists, and sets NaN when it does not. Below, only `'Roll'` is a key of `dic`, so `'firstName'` and `'Reg'` get NaN.

In [321]:
states = ['firstName','Reg','Roll']
Sdata = pd.Series(dic,index=states)
Sdata

firstName    NaN
Reg          NaN
Roll         135
dtype: object
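A sketch of the three behaviours described above: insertion order without an index, lookup-or-NaN with an index, and actual renaming, which is a separate step. The use of `Series.rename` with a mapping is an addition for illustration; `labels` and `renamed` are hypothetical names:

```python
import pandas as pd

dic = {'Name': 'Towhid', 'ID': 17101135, 'Roll': 135}

# Without an index, the labels follow the dict's insertion order
s = pd.Series(dic)
print(s.index.tolist())        # ['Name', 'ID', 'Roll']

# With an index, each label is looked up among the dict keys;
# matching keys keep their values, missing ones become NaN
labels = ['firstName', 'Reg', 'Roll']
s2 = pd.Series(dic, index=labels)
print(s2.isna().tolist())      # [True, True, False]
print(s2['Roll'])              # 135

# Renaming labels is a separate operation, e.g. rename with a mapping
renamed = s.rename({'Name': 'firstName'})
print(renamed.index.tolist())  # ['firstName', 'ID', 'Roll']
```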

In [322]:
Sdata.isnull().value_counts()

True     2
False    1
dtype: int64

- **Arithmetic between Series automatically aligns on index labels.** The result's index is the union of both indexes. Only the label `'Roll'` is present in both Series, so only its values are added (135 + 135 = 270); every other label gets NaN.

In [323]:
Sdata1 + Sdata

ID           NaN
Name         NaN
Reg          NaN
Roll         270
firstName    NaN
dtype: object
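When the NaN results are unwanted, `Series.add` accepts a `fill_value` that stands in for a label missing from one side. The example below uses two small hypothetical numeric Series, because filling the string entries of `Sdata1` with 0 would fail:

```python
import pandas as pd

s1 = pd.Series({'a': 1, 'b': 2})
s2 = pd.Series({'b': 10, 'c': 20})

# Plain `+` aligns on labels; labels present in only one Series give NaN
print((s1 + s2).tolist())                  # [nan, 12.0, nan]

# fill_value treats a label missing from one side as 0 before adding
print(s1.add(s2, fill_value=0).tolist())   # [1.0, 12.0, 20.0]
```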

# DataFrame

## DataFrame creation

A common way to construct a DataFrame is from a dict of equal-length lists or NumPy arrays; the dict keys become the column names. A list of row lists also works, as the second example below shows.

In [324]:
data = {'Name': ['Towhid', 'Hasib', 'Tamjid', 'Sadi', 'Nahian'],
        'ID': [17101135, 17101118, 17101152, 17101150, 17101114],
        'Roll': [135, 118, 152, 150, 114],
        'Year': ['4th', '4th', '4th', '4th', '4th'],
        'Sex': ['Male', 'Male', 'Male', 'Male', 'Male']}

In [325]:
df1 = pd.DataFrame(data)
df1

| | Name | ID | Roll | Year | Sex |
|---|---|---|---|---|---|
| 0 | Towhid | 17101135 | 135 | 4th | Male |
| 1 | Hasib | 17101118 | 118 | 4th | Male |
| 2 | Tamjid | 17101152 | 152 | 4th | Male |
| 3 | Sadi | 17101150 | 150 | 4th | Male |
| 4 | Nahian | 17101114 | 114 | 4th | Male |

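The constructor's `columns` argument picks and orders the columns taken from the dict, and the lists must really be of equal length. A sketch with a trimmed copy of the student data; the `'Debt'` column is a hypothetical name that is deliberately absent from the dict:

```python
import pandas as pd

data = {'Name': ['Towhid', 'Hasib'], 'Roll': [135, 118]}

# `columns` selects and orders the columns; a name missing from the dict
# becomes an all-NaN column
df_cols = pd.DataFrame(data, columns=['Roll', 'Name', 'Debt'])
print(df_cols.columns.tolist())      # ['Roll', 'Name', 'Debt']
print(df_cols['Debt'].isna().all())  # True

# Lists of different lengths cannot form a rectangular table
try:
    pd.DataFrame({'Name': ['Towhid', 'Hasib'], 'Roll': [135]})
    unequal_rejected = False
except ValueError:
    unequal_rejected = True
print(unequal_rejected)              # True
```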

In [326]:
df = pd.DataFrame( [ ['Towhid',17101135,135,'4th','Male'], ['Hasib',17101118,118,'4th','Male'],
                     ['Tamjid',171011152,152,'4th','Male'], ['Sadi',17101150,150,'4th','Male'],
                     ['Nahian',17101114,114,'4th','Male'],['Sabuj',17101139,139,'4th','Male']
                      ])
df.columns = ['Name','ID','Roll','Year','Sex']
df.index = ['Student1','Student2','Student3','Student4','Student5','Student6']
df

| | Name | ID | Roll | Year | Sex |
|---|---|---|---|---|---|
| Student1 | Towhid | 17101135 | 135 | 4th | Male |
| Student2 | Hasib | 17101118 | 118 | 4th | Male |
| Student3 | Tamjid | 171011152 | 152 | 4th | Male |
| Student4 | Sadi | 17101150 | 150 | 4th | Male |
| Student5 | Nahian | 17101114 | 114 | 4th | Male |
| Student6 | Sabuj | 17101139 | 139 | 4th | Male |

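Instead of assigning `df.columns` and `df.index` after construction, both can be passed to the constructor directly. A minimal sketch with two of the rows above:

```python
import pandas as pd

rows = [['Towhid', 17101135, 135], ['Hasib', 17101118, 118]]

# Column names and row labels supplied up front
df = pd.DataFrame(rows,
                  columns=['Name', 'ID', 'Roll'],
                  index=['Student1', 'Student2'])

print(df.loc['Student2', 'Name'])  # Hasib
print(df.shape)                    # (2, 3)
```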

In [327]:
df.head() # first five rows by default

| | Name | ID | Roll | Year | Sex |
|---|---|---|---|---|---|
| Student1 | Towhid | 17101135 | 135 | 4th | Male |
| Student2 | Hasib | 17101118 | 118 | 4th | Male |
| Student3 | Tamjid | 171011152 | 152 | 4th | Male |
| Student4 | Sadi | 17101150 | 150 | 4th | Male |
| Student5 | Nahian | 17101114 | 114 | 4th | Male |


In [328]:
df.tail() # last five rows by default

| | Name | ID | Roll | Year | Sex |
|---|---|---|---|---|---|
| Student2 | Hasib | 17101118 | 118 | 4th | Male |
| Student3 | Tamjid | 171011152 | 152 | 4th | Male |
| Student4 | Sadi | 17101150 | 150 | 4th | Male |
| Student5 | Nahian | 17101114 | 114 | 4th | Male |
| Student6 | Sabuj | 17101139 | 139 | 4th | Male |

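Both `head` and `tail` take an optional row count `n` (default 5). A sketch on a hypothetical one-column frame with the same six student labels:

```python
import pandas as pd

df = pd.DataFrame({'Roll': [135, 118, 152, 150, 114, 139]},
                  index=[f'Student{i}' for i in range(1, 7)])

# Ask for a specific number of rows from either end
print(df.head(2).index.tolist())  # ['Student1', 'Student2']
print(df.tail(2).index.tolist())  # ['Student5', 'Student6']
```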

## Accessing (row,col) values

In [329]:
df.Name

Student1    Towhid
Student2     Hasib
Student3    Tamjid
Student4      Sadi
Student5    Nahian
Student6     Sabuj
Name: Name, dtype: object

In [330]:
df['Roll']

Student1    135
Student2    118
Student3    152
Student4    150
Student5    114
Student6    139
Name: Roll, dtype: int64
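Attribute access (`df.Name`) and bracket access (`df['Roll']`) return the same column for simple names, but attribute access loses when a column name clashes with an existing DataFrame attribute. A sketch with a hypothetical column deliberately named `size`:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Towhid', 'Hasib', 'Sadi'],
                   'size': [10, 20, 30]})

# Attribute access works for simple column names ...
print(df.Name.tolist())     # ['Towhid', 'Hasib', 'Sadi']

# ... but df.size is the DataFrame attribute (number of cells: 3 rows x 2 columns),
# not the column called 'size'
print(df.size)              # 6
print(df['size'].tolist())  # [10, 20, 30]
```

Bracket access is therefore the safer default, and it is also required for column names containing spaces.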

In [331]:
df[ df.ID >= 17101150 ]

| | Name | ID | Roll | Year | Sex |
|---|---|---|---|---|---|
| Student3 | Tamjid | 171011152 | 152 | 4th | Male |
| Student4 | Sadi | 17101150 | 150 | 4th | Male |


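Several conditions can be combined in one filter with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses, because `&` binds more tightly than `>=`. A sketch on a trimmed copy of the student data, with illustrative thresholds:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Towhid', 'Hasib', 'Sadi', 'Sabuj'],
                   'ID': [17101135, 17101118, 17101150, 17101139],
                   'Roll': [135, 118, 150, 139]})

# Rows whose ID is at least 17101130 AND whose Roll is below 145
selected = df[(df.ID >= 17101130) & (df.Roll < 145)]
print(selected.Name.tolist())  # ['Towhid', 'Sabuj']
```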
In [332]:
df['Sex'] = 'male'
df.head(2)

| | Name | ID | Roll | Year | Sex |
|---|---|---|---|---|---|
| Student1 | Towhid | 17101135 | 135 | 4th | male |
| Student2 | Hasib | 17101118 | 118 | 4th | male |


In [333]:
df[ ['Name','Roll'] ] # selects these columns for every row; use df.loc to restrict to one row

| | Name | Roll |
|---|---|---|
| Student1 | Towhid | 135 |
| Student2 | Hasib | 118 |
| Student3 | Tamjid | 152 |
| Student4 | Sadi | 150 |
| Student5 | Nahian | 114 |
| Student6 | Sabuj | 139 |

### df.loc (label-based (row, col) access and value assignment)

In [334]:
df.loc['Student1'][ ['Name','Roll'] ]

Name    Towhid
Roll       135
Name: Student1, dtype: object

In [335]:
df.loc['Student1',['Name','Roll'] ]

Name    Towhid
Roll       135
Name: Student1, dtype: object

In [336]:
# Create a new column 'Greater' and set it to 1 where the condition holds; the other rows get NaN
df.loc[ df.ID >= 17101150,'Greater' ] = 1 

In [337]:
df

| | Name | ID | Roll | Year | Sex | Greater |
|---|---|---|---|---|---|---|
| Student1 | Towhid | 17101135 | 135 | 4th | male | NaN |
| Student2 | Hasib | 17101118 | 118 | 4th | male | NaN |
| Student3 | Tamjid | 171011152 | 152 | 4th | male | 1.0 |
| Student4 | Sadi | 17101150 | 150 | 4th | male | 1.0 |
| Student5 | Nahian | 17101114 | 114 | 4th | male | NaN |
| Student6 | Sabuj | 17101139 | 139 | 4th | male | NaN |

In [338]:
df.loc[df.Greater.isnull(),'Greater'] = 0
df

| | Name | ID | Roll | Year | Sex | Greater |
|---|---|---|---|---|---|---|
| Student1 | Towhid | 17101135 | 135 | 4th | male | 0.0 |
| Student2 | Hasib | 17101118 | 118 | 4th | male | 0.0 |
| Student3 | Tamjid | 171011152 | 152 | 4th | male | 1.0 |
| Student4 | Sadi | 17101150 | 150 | 4th | male | 1.0 |
| Student5 | Nahian | 17101114 | 114 | 4th | male | 0.0 |
| Student6 | Sabuj | 17101139 | 139 | 4th | male | 0.0 |

### Assigning a NumPy array to a column

The array's length must match the number of rows in the DataFrame (6 here); otherwise pandas raises a ValueError.

In [340]:
df.Roll = np.arange(6)
df

| | Name | ID | Roll | Year | Sex | Greater |
|---|---|---|---|---|---|---|
| Student1 | Towhid | 17101135 | 0 | 4th | male | 0.0 |
| Student2 | Hasib | 17101118 | 1 | 4th | male | 0.0 |
| Student3 | Tamjid | 171011152 | 2 | 4th | male | 1.0 |
| Student4 | Sadi | 17101150 | 3 | 4th | male | 1.0 |
| Student5 | Nahian | 17101114 | 4 | 4th | male | 0.0 |
| Student6 | Sabuj | 17101139 | 5 | 4th | male | 0.0 |

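The two-step pattern used for the `Greater` column (assign 1 with `.loc`, then replace the NaN rows with 0) can also be written as one vectorized assignment, and the NaN replacement on its own is what `fillna` does. A sketch on a hypothetical four-row frame; `partial` is an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [17101135, 17101118, 171011152, 17101150]},
                  index=['Student1', 'Student2', 'Student3', 'Student4'])

# One step: turn the boolean condition straight into 0/1 integers
df['Greater'] = (df.ID >= 17101150).astype(int)
print(df['Greater'].tolist())      # [0, 0, 1, 1]

# If a column already holds NaN (as after the .loc assignment above),
# fillna replaces them without building a boolean mask
partial = pd.Series([np.nan, np.nan, 1.0, 1.0], index=df.index)
print(partial.fillna(0).tolist())  # [0.0, 0.0, 1.0, 1.0]
```

The `astype(int)` version also keeps the column as integers, whereas the NaN-then-fill route leaves it as floats (`0.0`, `1.0`).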
### Deleting a column

In [341]:
del df['Greater']
df.head(1)

| | Name | ID | Roll | Year | Sex |
|---|---|---|---|---|---|
| Student1 | Towhid | 17101135 | 0 | 4th | male |

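Besides `del`, pandas offers `drop`, which returns a new frame and leaves the original alone, and `pop`, which removes a column in place and returns it as a Series. A sketch on a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Towhid', 'Hasib'],
                   'Roll': [0, 1],
                   'Sex': ['male', 'male']})

# drop returns a NEW frame; df itself is unchanged
slim = df.drop(columns=['Sex'])
print(slim.columns.tolist())  # ['Name', 'Roll']
print(df.columns.tolist())    # ['Name', 'Roll', 'Sex']

# pop removes the column from df and hands it back as a Series
roll = df.pop('Roll')
print(roll.tolist())          # [0, 1]
print(df.columns.tolist())    # ['Name', 'Sex']
```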

### DataFrame from a nested dict

**When a nested dict is passed to DataFrame, pandas interprets the outer dict keys
as the columns and the inner keys as the row indices.**

In [343]:
pop = {'Nevada': {2001: 2.4, 2002: 2.9},
 'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

In [345]:
popdf = pd.DataFrame(pop)
popdf

| | Nevada | Ohio |
|---|---|---|
| 2000 | NaN | 1.5 |
| 2001 | 2.4 | 1.7 |
| 2002 | 2.9 | 3.6 |

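Passing an explicit `index` controls which inner keys become rows: listed years missing from the inner dicts become NaN rows, and unlisted years (2000 here) are dropped. Transposing with `.T` swaps the roles of the outer and inner keys. A sketch reusing `pop`; the year 2003 is an illustrative label with no data:

```python
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

# An explicit index fixes which rows appear and in what order
popdf = pd.DataFrame(pop, index=[2001, 2002, 2003])
print(popdf.loc[2001, 'Nevada'])     # 2.4
print(popdf.loc[2003].isna().all())  # True
print(2000 in popdf.index)           # False

# Transposing: states become rows, years become columns
print(pd.DataFrame(pop).T.index.tolist())  # ['Nevada', 'Ohio']
```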