### What is a dataframe?
The DataFrame is the central data structure in pandas. It is a tabular representation of data, organized into rows and columns.

### Topics Covered
1. Creating Dataframe
2. Dealing with rows and columns
3. Operations: min, max, std, describe
4. Conditional selection
5. set_index

### Creating Dataframe

In [63]:
import pandas as pd
df = pd.read_csv("data/weather_data.csv")
df # or df[:] to print everything

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


In [64]:
weather_data = {
    "day": ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"]
}

new_df = pd.DataFrame(weather_data)
new_df

Unnamed: 0,day,event,temperature,windspeed
0,1/1/2017,Rain,32,6
1,1/2/2017,Sunny,35,7
2,1/3/2017,Snow,28,2
3,1/4/2017,Snow,24,7
4,1/5/2017,Rain,32,4
5,1/6/2017,Sunny,31,2
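
The output above lists the columns alphabetically rather than in the order the dictionary defines them. This is most likely because older pandas releases sorted dictionary keys, while recent releases keep insertion order. A minimal sketch of making the order explicit with the `columns` argument, using only the first three rows for brevity (the name `ordered_df` is illustrative):

```python
import pandas as pd

weather_data = {
    "day": ["1/1/2017", "1/2/2017", "1/3/2017"],
    "temperature": [32, 35, 28],
    "windspeed": [6, 7, 2],
    "event": ["Rain", "Sunny", "Snow"],
}

# Passing columns= fixes the column order regardless of the pandas version.
ordered_df = pd.DataFrame(
    weather_data, columns=["day", "temperature", "windspeed", "event"]
)
print(ordered_df.columns.tolist())
```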


In [65]:
df.shape # Gives number of rows and columns

(6, 4)

In [66]:
rows, cols = df.shape

In [67]:
rows

6

In [68]:
cols

4

In [69]:
# To print only a few rows
df.head() # Prints the first 5 rows

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain


In [70]:
df.head(2)

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny


In [71]:
df.tail() # Prints the last 5 rows

Unnamed: 0,day,temperature,windspeed,event
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


In [72]:
df[2:5] # Rows at positions 2 to 4 (the stop position 5 is excluded)

Unnamed: 0,day,temperature,windspeed,event
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
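
The slice `df[2:5]` selects rows by position. The same selection can be written with `iloc`, which states that intent explicitly. A small sketch, rebuilding the weather data inline so it runs on its own (the name `subset` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# iloc slices by integer position; like df[2:5], the stop position is excluded.
subset = df.iloc[2:5]
print(subset)
```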


In [73]:
df.columns

Index(['day', 'temperature', 'windspeed', 'event'], dtype='object')

In [74]:
df.day

0    1/1/2017
1    1/2/2017
2    1/3/2017
3    1/4/2017
4    1/5/2017
5    1/6/2017
Name: day, dtype: object

In [75]:
df.temperature

0    32
1    35
2    28
3    24
4    32
5    31
Name: temperature, dtype: int64

In [76]:
new_df.event

0     Rain
1    Sunny
2     Snow
3     Snow
4     Rain
5    Sunny
Name: event, dtype: object

In [77]:
type(df["event"])

pandas.core.series.Series

In [78]:
type(new_df["event"])

pandas.core.series.Series

In [79]:
df[["event", "day", "temperature"]]

Unnamed: 0,event,day,temperature
0,Rain,1/1/2017,32
1,Sunny,1/2/2017,35
2,Snow,1/3/2017,28
3,Snow,1/4/2017,24
4,Rain,1/5/2017,32
5,Sunny,1/6/2017,31


In [80]:
df["temperature"].mean()

30.333333333333332

In [81]:
df.describe() # Shows the statistics of the numerical data

Unnamed: 0,temperature,windspeed
count,6.0,6.0
mean,30.333333,4.666667
std,3.829708,2.33809
min,24.0,2.0
25%,28.75,2.5
50%,31.5,5.0
75%,32.0,6.75
max,35.0,7.0
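
The individual statistics reported by `describe()` are also available as separate methods. A short sketch using the same temperature values (the `temps` variable is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
})

temps = df["temperature"]
print(temps.min())   # smallest value in the column
print(temps.max())   # largest value in the column
print(temps.std())   # sample standard deviation, the same figure describe() reports
```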


### Conditional Selection

In [82]:
df[df["temperature"] >= 32] # Condition comes inside the square brackets

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
4,1/5/2017,32,4,Rain


In [83]:
df[df.temperature == df.temperature.max()]

Unnamed: 0,day,temperature,windspeed,event
1,1/2/2017,35,7,Sunny


In [84]:
df["day"][df.temperature == df.temperature.max()]

1    1/2/2017
Name: day, dtype: object

In [85]:
df[["day", "temperature"]][df.temperature == df.temperature.max()]

Unnamed: 0,day,temperature
1,1/2/2017,35
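
The chained form `df[["day", "temperature"]][condition]` works here, but `df.loc[rows, columns]` expresses the same selection in a single step. Several conditions can be combined with `&`, each wrapped in parentheses. A minimal sketch (the names `mask`, `hottest`, and `warm_rain` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# Boolean mask: True for the rows holding the maximum temperature.
mask = df["temperature"] == df["temperature"].max()

# Row filter and column selection in one loc call.
hottest = df.loc[mask, ["day", "temperature"]]
print(hottest)

# Combine two conditions with & (element-wise "and").
warm_rain = df.loc[(df["temperature"] >= 32) & (df["event"] == "Rain")]
print(warm_rain)
```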


### Set Index

In [86]:
df.index # stop index is not included

RangeIndex(start=0, stop=6, step=1)

In [90]:
df.set_index("day", inplace=True)
df

day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,35,7,Sunny
1/3/2017,28,2,Snow
1/4/2017,24,7,Snow
1/5/2017,32,4,Rain
1/6/2017,31,2,Sunny


In [91]:
df.loc["1/3/2017"]

temperature      28
windspeed         2
event          Snow
Name: 1/3/2017, dtype: object

In [88]:
df.reset_index(inplace=True)
df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


In [92]:
df.set_index("event", inplace=True)
df

event,temperature,windspeed
Rain,32,6
Sunny,35,7
Snow,28,2
Snow,24,7
Rain,32,4
Sunny,31,2


In [93]:
df.loc["Snow"]

event,temperature,windspeed
Snow,28,2
Snow,24,7
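
The two `loc` lookups above return different shapes: a unique label such as a single day yields one row as a `Series`, while a repeated label such as `"Snow"` yields a `DataFrame`. The sketch below shows this without `inplace=True`, so the original frame stays unchanged and no `reset_index` is needed (the names `by_day` and `by_event` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017", "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# Without inplace=True, set_index returns a new DataFrame and leaves df as it was.
by_day = df.set_index("day")
by_event = df.set_index("event")

one_day = by_day.loc["1/3/2017"]   # unique label: a single row as a Series
snow_days = by_event.loc["Snow"]   # repeated label: all matching rows as a DataFrame

print(by_day.index.is_unique, type(one_day).__name__)
print(by_event.index.is_unique, type(snow_days).__name__)
```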
