### Working with Pandas

The contents of the CSV file (`file.csv`) are as follows:
```
name,city,happiness(0-10),height(cm),weight(kg)
John,Kolkata,7,182,72.2
Michael,Delhi,6,168,65.8
David,Mumbai,8,163,59.9
Sarah,Chennai,9,155,52.7
Daniel,Kolkata,7,179,71.1
Emily,Delhi,6,172,66.3
Olivia,Mumbai,8,158,61.7
Ethan,Chennai,9,147,48.9
Sophia,Kolkata,7,183,73.5
Matthew,Delhi,6,169,66.9
Karen,Mumbai,8,161,59.1
James,Chennai,9,152,51.3
Zoe,Kolkata,7,177,70.8
Logan,Delhi,6,173,65.1
Hannah,Mumbai,8,159,61.3
Liam,Chennai,9,148,49.8
Emma,Kolkata,7,181,72.7
Ava,Delhi,6,170,66.1
Noah,Mumbai,8,162,60.3
Mia,Chennai,9,153,52.5
Benjamin,Kolkata,7,178,70.1
Aria,Delhi,6,171,65.5
William,Mumbai,8,160,59.7
Grace,Chennai,9,150,50.1
```

```python
# read the CSV file into a DataFrame
import pandas as pd

df = pd.read_csv('file.csv')
```
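Before selecting anything, it can help to confirm what `read_csv` inferred for each column. The sketch below is illustrative only. It inlines the first three rows of `file.csv` through `io.StringIO` so it runs without the file, and then inspects the result.

```python
import io

import pandas as pd

# First rows of file.csv, inlined (assumption) so the example is self-contained
csv_text = """name,city,happiness(0-10),height(cm),weight(kg)
John,Kolkata,7,182,72.2
Michael,Delhi,6,168,65.8
David,Mumbai,8,163,59.9
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (3, 5): (number of rows, number of columns)
print(list(df.columns))  # column names exactly as written in the header
print(df.dtypes)         # dtype inferred for each column
```

Integer-looking columns such as `height(cm)` are parsed as integers and `weight(kg)` as floats. This is what lets the numeric comparisons later in this notebook work.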

```python
# access a single column; the result is a pandas Series
df['city'].head()  # head() returns the first 5 rows by default
```

```
0    Kolkata
1      Delhi
2     Mumbai
3    Chennai
4    Kolkata
Name: city, dtype: object
```
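You can also read a column as an attribute (`df.city`), but only when the column name is a valid Python identifier. The minimal sketch below uses a few inlined rows (an assumption) in place of `file.csv`.

```python
import io

import pandas as pd

csv_text = """name,city,height(cm)
John,Kolkata,182
Michael,Delhi,168
"""
df = pd.read_csv(io.StringIO(csv_text))

# 'city' is a valid identifier, so attribute access and bracket access give the same Series
same = df.city.equals(df['city'])
print(same)  # True

# 'height(cm)' contains parentheses: df.height(cm) would fail because Python
# looks up an attribute named height, which does not exist, so brackets are required
heights = df['height(cm)'].tolist()
print(heights)  # [182, 168]
```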

```python
# access multiple columns by passing a list of column names
df[['name', 'city']].head()
```

|   | name | city |
|---|------|------|
| 0 | John | Kolkata |
| 1 | Michael | Delhi |
| 2 | David | Mumbai |
| 3 | Sarah | Chennai |
| 4 | Daniel | Kolkata |
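The shape of the result depends on what goes inside the brackets. A single label returns a Series, while a list of labels returns a DataFrame, even when the list holds only one name. The sketch below uses two inlined rows as an illustrative stand-in for `file.csv`.

```python
import io

import pandas as pd

csv_text = """name,city
John,Kolkata
Michael,Delhi
"""
df = pd.read_csv(io.StringIO(csv_text))

single = df['city']   # a single label returns a Series (1-dimensional)
frame = df[['city']]  # a list of labels returns a DataFrame (2-dimensional), even with one column

print(type(single).__name__, single.ndim)  # Series 1
print(type(frame).__name__, frame.shape)   # DataFrame (2, 1)
```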


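**iloc vs loc**
- `iloc` selects rows and columns by integer position, e.g. `0` for the first row.
- `loc` selects rows and columns by label, e.g. `'city'`. With the default index created by `read_csv`, the row labels are `0, 1, 2, ...`, so they coincide with the positions.
- Slicing differs: `iloc[0:2]` excludes the end position (2 rows), while `loc[0:2]` includes the end label (3 rows).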
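The difference is easiest to see when the row labels are not the default `0, 1, 2, ...`. In the sketch below, inlining a few rows and calling `set_index('name')` are both illustrative choices, not part of the original notebook.

```python
import io

import pandas as pd

csv_text = """name,city,height(cm)
John,Kolkata,182
Michael,Delhi,168
David,Mumbai,163
"""
# Use the name column as the row labels (illustrative choice)
df = pd.read_csv(io.StringIO(csv_text)).set_index('name')

by_label = df.loc['Michael', 'city']  # row label 'Michael', column label 'city'
by_position = df.iloc[1, 0]           # second row, first remaining column ('city')
print(by_label, by_position)          # Delhi Delhi

# df.loc[1, 'city'] or df.iloc['Michael'] would raise an error here,
# because each accessor only understands its own kind of key
```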

```python
# access a row of data
df.iloc[0]  # access the first row
```

```
name                  John
city               Kolkata
happiness(0-10)          7
height(cm)             182
weight(kg)            72.2
Name: 0, dtype: object
```
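A few variations of positional row access are often useful. Negative positions count from the end, and passing a list of positions keeps the result as a DataFrame instead of a Series. The sketch below uses a few inlined rows as an assumption.

```python
import io

import pandas as pd

csv_text = """name,city,height(cm)
John,Kolkata,182
Michael,Delhi,168
David,Mumbai,163
"""
df = pd.read_csv(io.StringIO(csv_text))

first_row = df.iloc[0]         # a Series indexed by the column names
last_row = df.iloc[-1]         # negative positions count from the end
first_as_frame = df.iloc[[0]]  # a list of positions keeps the result a DataFrame

print(first_row['name'])       # John
print(last_row['name'])        # David
print(first_as_frame.shape)    # (1, 3)
```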

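```python
# access multiple rows by position
df.iloc[0:2]  # the first 2 rows (positions 0 and 1; the end position is excluded)
```

|   | name | city | happiness(0-10) | height(cm) | weight(kg) |
|---|------|------|-----------------|------------|------------|
| 0 | John | Kolkata | 7 | 182 | 72.2 |
| 1 | Michael | Delhi | 6 | 168 | 65.8 |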


```python
# access a specific cell by row label and column label
df.loc[0, 'city']  # the city of the first row
```

```
'Kolkata'
```
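For reading a single value, pandas also provides `.at` (by label) and `.iat` (by position). Both accept only scalar keys. The short sketch below assumes a couple of inlined rows in place of `file.csv`.

```python
import io

import pandas as pd

csv_text = """name,city
John,Kolkata
Michael,Delhi
"""
df = pd.read_csv(io.StringIO(csv_text))

city_by_label = df.at[0, 'city']  # row label 0, column label 'city'
city_by_position = df.iat[0, 1]   # row position 0, column position 1 ('city')
print(city_by_label, city_by_position)  # Kolkata Kolkata
```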

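```python
# access multiple cells: rows by label, columns by a list of labels
df.loc[0:2, ['name', 'city']]  # the name and city of the first 3 rows (loc includes the end label 2)
```

|   | name | city |
|---|------|------|
| 0 | John | Kolkata |
| 1 | Michael | Delhi |
| 2 | David | Mumbai |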
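Because `loc` includes the end label and `iloc` excludes the end position, you can write the same selection with either accessor, provided you adjust the slice bounds. The sketch below uses a few inlined rows (an assumption) with the default integer index.

```python
import io

import pandas as pd

csv_text = """name,city,height(cm)
John,Kolkata,182
Michael,Delhi,168
David,Mumbai,163
Sarah,Chennai,155
"""
df = pd.read_csv(io.StringIO(csv_text))

by_label = df.loc[0:2, ['name', 'city']]  # labels 0..2, end label included
by_position = df.iloc[0:3, [0, 1]]        # positions 0..2, end position excluded
print(by_label.equals(by_position))       # True
```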


```python
# select rows based on a condition
new_df = df[df['height(cm)'] > 155].head()  # rows where height is greater than 155 cm
new_df_2 = df[df['city'] == 'Kolkata'].head()  # rows where city is Kolkata
new_df, new_df_2
```

```
(      name     city  happiness(0-10)  height(cm)  weight(kg)
 0     John  Kolkata                7         182        72.2
 1  Michael    Delhi                6         168        65.8
 2    David   Mumbai                8         163        59.9
 4   Daniel  Kolkata                7         179        71.1
 5    Emily    Delhi                6         172        66.3,
       name     city  happiness(0-10)  height(cm)  weight(kg)
 0     John  Kolkata                7         182        72.2
 4   Daniel  Kolkata                7         179        71.1
 8   Sophia  Kolkata                7         183        73.5
 12     Zoe  Kolkata                7         177        70.8
 16    Emma  Kolkata                7         181        72.7)
```
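The condition inside the brackets is itself a boolean Series with one `True`/`False` per row. Storing it in a variable and passing it to `loc` lets you filter rows and pick a column in one step. The sketch below uses four inlined rows as an assumption.

```python
import io

import pandas as pd

csv_text = """name,city,height(cm)
John,Kolkata,182
Michael,Delhi,168
David,Mumbai,163
Sarah,Chennai,155
"""
df = pd.read_csv(io.StringIO(csv_text))

mask = df['height(cm)'] > 155     # boolean Series: one True/False per row
print(mask.tolist())              # [True, True, True, False]

tall_names = df.loc[mask, 'name']  # keep matching rows, return only the name column
print(tall_names.tolist())         # ['John', 'Michael', 'David']
```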

```python
# select rows based on multiple conditions; wrap each condition in parentheses,
# because & and | bind more tightly than comparison operators
new_df_3 = df[(df['height(cm)'] >= 155) & (df['height(cm)'] < 175)].head()  # a range check combines two conditions with & (and)
new_df_4 = df[(df['city'] == 'Kolkata') | (df['city'] == 'Mumbai')].head()  # | is the element-wise or operator
new_df_3, new_df_4
```

```
(      name     city  happiness(0-10)  height(cm)  weight(kg)
 1  Michael    Delhi                6         168        65.8
 2    David   Mumbai                8         163        59.9
 3    Sarah  Chennai                9         155        52.7
 5    Emily    Delhi                6         172        66.3
 6   Olivia   Mumbai                8         158        61.7,
      name     city  happiness(0-10)  height(cm)  weight(kg)
 0    John  Kolkata                7         182        72.2
 2   David   Mumbai                8         163        59.9
 4  Daniel  Kolkata                7         179        71.1
 6  Olivia   Mumbai                8         158        61.7
 8  Sophia  Kolkata                7         183        73.5)
```
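pandas also has helpers for these two patterns: `Series.between` for a range check and `Series.isin` for matching any of several values. The sketch below uses a few inlined rows. It also assumes pandas 1.3 or later, where `inclusive` takes a string such as `'left'`.

```python
import io

import pandas as pd

csv_text = """name,city,height(cm)
John,Kolkata,182
Michael,Delhi,168
David,Mumbai,163
Sarah,Chennai,155
Daniel,Kolkata,179
"""
df = pd.read_csv(io.StringIO(csv_text))

# Same range as (height >= 155) & (height < 175): include the left edge only
in_range = df[df['height(cm)'].between(155, 175, inclusive='left')]
# Same as (city == 'Kolkata') | (city == 'Mumbai')
in_cities = df[df['city'].isin(['Kolkata', 'Mumbai'])]

print(in_range['name'].tolist())   # ['Michael', 'David', 'Sarah']
print(in_cities['name'].tolist())  # ['John', 'David', 'Daniel']
```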

```python
# get the unique values of a column, in order of first appearance
df['city'].unique()
```

```
array(['Kolkata', 'Delhi', 'Mumbai', 'Chennai'], dtype=object)
```
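Two related methods are `nunique`, which counts the distinct values, and `value_counts`, which reports how many rows hold each value, most frequent first. The sketch below uses four inlined rows as an assumption.

```python
import io

import pandas as pd

csv_text = """name,city
John,Kolkata
Michael,Delhi
David,Mumbai
Daniel,Kolkata
"""
df = pd.read_csv(io.StringIO(csv_text))

n_cities = df['city'].nunique()     # number of distinct cities
counts = df['city'].value_counts()  # rows per city, most frequent first

print(n_cities)           # 3
print(counts['Kolkata'])  # 2
```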

```python
# a DataFrame can be converted into a Python dictionary that maps each column name to its column (a Series)
head_dict = dict(df.head())
head_dict['name']
```

```
0       John
1    Michael
2      David
3      Sarah
4     Daniel
Name: name, dtype: object
```
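`dict(df)` keeps each column as a Series. `DataFrame.to_dict` instead returns plain nested Python containers, and its `orient` parameter controls their shape. The sketch below uses two inlined rows as an assumption.

```python
import io

import pandas as pd

csv_text = """name,city
John,Kolkata
Michael,Delhi
"""
df = pd.read_csv(io.StringIO(csv_text))

by_column = df.to_dict()                # {column: {row label: value}}
as_lists = df.to_dict(orient='list')    # {column: [values]}
records = df.to_dict(orient='records')  # [{column: value}, ...], one dict per row

print(records)  # [{'name': 'John', 'city': 'Kolkata'}, {'name': 'Michael', 'city': 'Delhi'}]
```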

```python
# a one-dimensional Series (e.g. a column or a row) can be converted into a Python list or dictionary
list(df['name'].head())  # column values as a list
list(df.iloc[0])         # row values as a list
dict(df.iloc[0])         # row as a {column name: value} dictionary (only this last result is displayed)
```

```
{'name': 'John',
 'city': 'Kolkata',
 'happiness(0-10)': 7,
 'height(cm)': 182,
 'weight(kg)': 72.2}
```
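The Series methods `tolist` and `to_dict` do the same conversions as `list(...)` and `dict(...)`. Zipping two columns together is a common way to build a lookup table from one column to another. The name-to-height mapping below is only an illustration, using two inlined rows as an assumption.

```python
import io

import pandas as pd

csv_text = """name,city,height(cm)
John,Kolkata,182
Michael,Delhi,168
"""
df = pd.read_csv(io.StringIO(csv_text))

names = df['name'].tolist()  # column -> list
row = df.iloc[0].to_dict()   # row -> {column: value}

# Map one column onto another, e.g. name -> height (hypothetical lookup)
height_by_name = dict(zip(df['name'], df['height(cm)']))
```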