# Pandas Tutorial 2: DataFrame Basics

This tutorial introduces the basics of DataFrames in Pandas. A DataFrame is the primary data structure in Pandas, used to represent tabular data with rows and columns. 

**Topics covered:**
- Creating a DataFrame
- Manipulating Rows and Columns
- Performing Operations: `min`, `max`, `std`, `describe`
- Conditional Selection
- Using `set_index`

In [1]:
import pandas as pd

In [2]:
# Read data from a CSV file into a DataFrame
df_1 = pd.read_csv("weather_data.csv")
df_1

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny

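`pd.read_csv` accepts a file-like object as well as a path, which makes it easy to try out without a file on disk. The sketch below is illustrative only: `csv_text` is a hypothetical stand-in for the first rows of `weather_data.csv`, not the actual file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of weather_data.csv
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,35,7,Sunny
1/3/2017,28,2,Snow
"""

# read_csv takes a path or any file-like object; the first line becomes the header
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)    # (rows, columns)
print(df_csv.dtypes)   # numeric columns are inferred as integers
```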

In [99]:
weather_data = {
    'day': ['1/1/2017','1/2/2017','1/3/2017','1/4/2017','1/5/2017','1/6/2017'],
    'temperature': [32,35,28,24,32,31],
    'windspeed': [6,7,2,7,4,2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny']
}
# Create a DataFrame from a dictionary (keys become column labels)
df = pd.DataFrame(weather_data)
df

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny

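A dictionary of columns is one way to build a DataFrame; row-oriented data works too. The following is a minimal sketch using a list of tuples, with the column labels passed explicitly. The name `weather_rows` is introduced here for illustration and holds only the first three rows of the data above.

```python
import pandas as pd

# Each tuple is one row: (day, temperature, windspeed, event)
weather_rows = [
    ('1/1/2017', 32, 6, 'Rain'),
    ('1/2/2017', 35, 7, 'Sunny'),
    ('1/3/2017', 28, 2, 'Snow'),
]

# Column labels must be supplied, since tuples carry no names
df_rows = pd.DataFrame(
    weather_rows,
    columns=['day', 'temperature', 'windspeed', 'event'],
)
print(df_rows)
```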

In [100]:
# Returns a (rows, columns) tuple giving the dimensions of the DataFrame
df.shape

(6, 4)

In [101]:
# Unpacks the tuple into the variables `rows` and `columns`
rows, columns = df.shape

In [102]:
columns

4

In [103]:
df.head()

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain


In [104]:
# Selects the rows at positions 2, 3, and 4 (the end position 5 is excluded)
df[2:5]

,day,temperature,windspeed,event
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain

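Slicing with `df[2:5]` selects rows by position. The same selection can be written more explicitly with `iloc`, which also allows picking a single row. This is a small sketch on a one-column frame named `df_demo` (a name introduced here for illustration), using the temperatures from the data above.

```python
import pandas as pd

df_demo = pd.DataFrame({'temperature': [32, 35, 28, 24, 32, 31]})

by_slice = df_demo[2:5]       # plain slice: positions 2, 3, 4
by_iloc = df_demo.iloc[2:5]   # same rows, position-based indexer

print(by_slice.equals(by_iloc))

# A single integer with iloc returns that row as a Series
single_row = df_demo.iloc[3]
print(single_row['temperature'])
```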

In [107]:
# First 2 rows of df
df.head(2)

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny


In [108]:
# Last 2 rows of df
df.tail(2)

,day,temperature,windspeed,event
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


In [109]:
# Column labels of df
df.columns

Index(['day', 'temperature', 'windspeed', 'event'], dtype='object')

In [110]:
df['event']  # Access the 'event' column of df (returns a Series)
# df.event   # Attribute access also works when the column name is a valid Python identifier

0     Rain
1    Sunny
2     Snow
3     Snow
4     Rain
5    Sunny
Name: event, dtype: object

In [113]:
# Return a new DataFrame containing only `event`, `day`, and `temperature` columns from df
df[['event','day','temperature']]

,event,day,temperature
0,Rain,1/1/2017,32
1,Sunny,1/2/2017,35
2,Snow,1/3/2017,28
3,Snow,1/4/2017,24
4,Rain,1/5/2017,32
5,Sunny,1/6/2017,31

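The bracket style determines the return type: a single label gives a Series, while a list of labels gives a DataFrame, even when the list holds only one column. A minimal sketch, with `df_demo` as an illustrative frame holding a subset of the data above:

```python
import pandas as pd

df_demo = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017'],
    'event': ['Rain', 'Sunny', 'Snow'],
})

col_series = df_demo['event']     # single label -> Series
col_frame = df_demo[['event']]    # list of labels -> DataFrame

print(type(col_series).__name__)
print(type(col_frame).__name__)
```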

In [114]:
# Maximum value from `temperature` column of df
df['temperature'].max()

35

In [115]:
# Minimum value from `temperature` column of df
df['temperature'].min()

24

In [116]:
# Sample standard deviation (ddof=1 by default) of the `temperature` column of df
df['temperature'].std()

3.8297084310253524

In [117]:
# Generates descriptive statistics for the numerical columns in df
df.describe()

,temperature,windspeed
count,6.0,6.0
mean,30.333333,4.666667
std,3.829708,2.33809
min,24.0,2.0
25%,28.75,2.5
50%,31.5,5.0
75%,32.0,6.75
max,35.0,7.0

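The `std` reported by `.std()` and `describe()` is the sample standard deviation, which divides by `n - 1`. Passing `ddof=0` gives the population version instead. A short sketch on the temperature values from above:

```python
import pandas as pd

temperature = pd.Series([32, 35, 28, 24, 32, 31], name='temperature')

sample_std = temperature.std()             # default ddof=1: divides by n - 1
population_std = temperature.std(ddof=0)   # divides by n

print(sample_std)
print(population_std)
```

The population value is always the smaller of the two, and the gap shrinks as the number of rows grows.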

In [118]:
# Conditional Selection: `temperature` has values >= 32 (Returns new df) 
df[df.temperature>=32]

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
4,1/5/2017,32,4,Rain


In [120]:
# Conditional Selection: `temperature` has maximum value (Returns new df) 
df[df.temperature==df['temperature'].max()]

,day,temperature,windspeed,event
1,1/2/2017,35,7,Sunny


In [121]:
# Conditional Selection: `temperature` has maximum value (Returns new df)
# Keep only the `day` and `temperature` columns for the matching rows
df.loc[df['temperature'] == df['temperature'].max(), ['day', 'temperature']]

,day,temperature
1,1/2/2017,35

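Conditions can be combined with `&` (and) and `|` (or); each condition needs its own parentheses because these operators bind more tightly than comparisons. `isin` is a convenient shorthand for matching against several values. A minimal sketch, where `df_demo` is an illustrative copy of three columns from the data above:

```python
import pandas as pd

df_demo = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 31],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
})

# Rows that are at least 30 degrees AND not rainy
warm_and_dry = df_demo[(df_demo['temperature'] >= 30) & (df_demo['event'] != 'Rain')]

# Rows whose event is either Rain or Snow
rain_or_snow = df_demo[df_demo['event'].isin(['Rain', 'Snow'])]

print(warm_and_dry)
print(rain_or_snow)
```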

In [122]:
# Returns index (row labels) of df
df.index

RangeIndex(start=0, stop=6, step=1)

In [124]:
df

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


In [126]:
# Sets the `day` column as the index of the DataFrame (modifies the original df)
df.set_index('day', inplace=True)
df

day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,35,7,Sunny
1/3/2017,28,2,Snow
1/4/2017,24,7,Snow
1/5/2017,32,4,Rain
1/6/2017,31,2,Sunny


Now the dates (e.g., **1/1/2017**, **1/2/2017**) are used as the index of the DataFrame `df`.

**Note:** By default, the `set_index` method returns a new DataFrame and leaves the original unchanged; passing `inplace=True` modifies the original DataFrame directly and returns `None`.

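As an alternative to `inplace=True`, the result of `set_index` can be assigned to a new name, which keeps the original frame available. A small sketch, with `df_demo` and `indexed` as illustrative names and a subset of the data above:

```python
import pandas as pd

df_demo = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017'],
    'temperature': [32, 35, 28],
})

# Without inplace=True, set_index returns a new DataFrame
indexed = df_demo.set_index('day')

print(indexed.index.name)        # the new frame is indexed by day
print(list(df_demo.columns))     # the original still has `day` as a column
```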
In [127]:
df

day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,35,7,Sunny
1/3/2017,28,2,Snow
1/4/2017,24,7,Snow
1/5/2017,32,4,Rain
1/6/2017,31,2,Sunny


In [None]:
# The row of df where the index is '1/4/2017'
df.loc['1/4/2017']

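`loc` selects by index label, while `iloc` selects by integer position. With `day` as the index, both can reach the same row. The sketch below rebuilds a day-indexed frame under the illustrative name `df_by_day`:

```python
import pandas as pd

df_by_day = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 31],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
}).set_index('day')

row_by_label = df_by_day.loc['1/4/2017']   # label-based lookup
row_by_position = df_by_day.iloc[3]        # fourth row by position

print(row_by_label.equals(row_by_position))

# loc also accepts a column label to pick out a single value
temp_on_day = df_by_day.loc['1/4/2017', 'temperature']
print(temp_on_day)
```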
In [128]:
# Restores the default integer index; `day` becomes a regular column again
df.reset_index(inplace=True)
df

,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


In [129]:
# Set event (Rain, Sunny, Snow) as the index
df.set_index('event', inplace=True)
df

event,day,temperature,windspeed
Rain,1/1/2017,32,6
Sunny,1/2/2017,35,7
Snow,1/3/2017,28,2
Snow,1/4/2017,24,7
Rain,1/5/2017,32,4
Sunny,1/6/2017,31,2


In [130]:
# The rows of df where the index is 'Snow'
df.loc['Snow']

event,day,temperature,windspeed
Snow,1/3/2017,28,2
Snow,1/4/2017,24,7

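An index does not have to be unique, and that affects what `loc` returns: a label that occurs once yields a Series, while a repeated label yields a DataFrame. Passing a list of labels returns a DataFrame in both cases. The sketch below uses a hypothetical four-row subset (`df_demo`) in which `Rain` appears once and `Snow` twice:

```python
import pandas as pd

# Hypothetical subset: 'Rain' is unique here, 'Snow' is repeated
df_demo = pd.DataFrame({
    'event': ['Rain', 'Sunny', 'Snow', 'Snow'],
    'temperature': [32, 35, 28, 24],
}).set_index('event')

rain = df_demo.loc['Rain']          # unique label -> Series
snow = df_demo.loc['Snow']          # repeated label -> DataFrame
rain_frame = df_demo.loc[['Rain']]  # list of labels -> DataFrame

print(type(rain).__name__)
print(type(snow).__name__)
print(type(rain_frame).__name__)
```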