# Exploring the Bike Sharing Dataset

In [None]:
import matplotlib.pyplot as plt
# Render plots inline in the notebook; otherwise they open in a separate window
%matplotlib inline

import numpy as np
import pandas as pd
import seaborn as sns

In [None]:
train_data = pd.read_csv('train.csv')
train_data.head()

In [None]:
train_data.shape

# Attribute Information:

- instant: record index
- season : season (1:spring, 2:summer, 3:fall, 4:winter)
- yr : year (0: 2011, 1: 2012)
- mnth : month (1 to 12)
- hr : hour (0 to 23)
- holiday : whether the day is a holiday or not (extracted from [Web Link])
- weekday : day of the week
- workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0
- weathersit :
    - 1: Clear, Few clouds, Partly cloudy
    - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp : Normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 (only in hourly scale)
- atemp: Normalized feeling temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (only in hourly scale)
- hum: Normalized humidity. The values are divided by 100 (max)
- windspeed: Normalized wind speed. The values are divided by 67 (max)
- casual: count of casual users
- registered: count of registered users
- cnt: count of total rental bikes including both casual and registered
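
The normalization formulas in the list above can be inverted to recover physical units. The following is a minimal sketch under the stated ranges; the helper name `denormalize` and the toy rows are assumptions made for illustration and are not part of the dataset or the notebook.

```python
import pandas as pd

# Ranges quoted in the attribute list.
T_MIN, T_MAX = -8, 39        # temp, degrees Celsius
AT_MIN, AT_MAX = -16, 50     # atemp, degrees Celsius


def denormalize(values, lo, hi):
    """Invert min-max scaling: x = v * (hi - lo) + lo (hypothetical helper)."""
    return values * (hi - lo) + lo


# Toy rows for illustration only; with the real data, pass train_data columns.
sample = pd.DataFrame({"temp": [0.0, 0.5, 1.0], "hum": [0.25, 0.5, 1.0]})
sample["temp_c"] = denormalize(sample["temp"], T_MIN, T_MAX)
sample["hum_pct"] = sample["hum"] * 100   # hum was divided by 100
print(sample)
```

The same helper would apply to `atemp` with `AT_MIN` and `AT_MAX`, and `windspeed` can be rescaled by multiplying by 67.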

In [None]:
train_data.info()

In [None]:
train_data.describe().T

# Univariate Analysis

## Continuous Data

In [None]:
sns.displot(train_data['temp'], bins = 50, kde = True).set(title = "Distribution of Temperature");

Each row is one hourly record, so a temperature reading is noted for every hour in which bikes are shared.


##### Most people won't rent a bike when it is too hot or too cold. More bikes are rented when the temperature is pleasant.
##### A high number of bikes are rented when the normalized temp is between about 0.35 and 0.60 (roughly 8 °C to 20 °C, given t_min=-8 and t_max=+39).
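
The histogram counts hourly records per temperature bin, so one way to check the rental claim more directly is to sum `cnt` within temperature bins. This is a rough sketch only; the function name `rentals_by_temp_bin` and the toy rows are assumptions for illustration, not values from `train.csv`.

```python
import pandas as pd


def rentals_by_temp_bin(df, n_bins=10):
    """Sum cnt within equal-width bins of normalized temp on [0, 1]."""
    edges = [i / n_bins for i in range(n_bins + 1)]
    bins = pd.cut(df["temp"], bins=edges, include_lowest=True)
    # observed=False keeps empty bins so gaps show up as zeros
    return df.groupby(bins, observed=False)["cnt"].sum()


# Made-up rows for illustration only; use train_data for the real check.
toy = pd.DataFrame({
    "temp": [0.10, 0.35, 0.45, 0.55, 0.90],
    "cnt":  [5,    120,  150,  200,  30],
})
totals = rentals_by_temp_bin(toy, n_bins=5)
print(totals)
```

Calling `rentals_by_temp_bin(train_data)` and plotting the result with `.plot(kind='bar')` would show total rentals per temperature range rather than the number of records.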

## Categorical Variable

In [None]:
train_data['season'].unique()

In [None]:
train_data['season'].value_counts() 

# 1:spring, 2:summer, 3:fall, 4:winter

In [None]:
print(3980 + 4409)  # seasons 1 and 2, counts taken from value_counts() above
print(2512 + 2134)  # seasons 3 and 4

# 1:spring, 2:summer, 3:fall, 4:winter

In [None]:
sns.countplot(y = train_data.season, palette = 'Set2').set(title = "Seasons");
# passing the column as y draws horizontal bars

#### Seasons 1 and 2 together have nearly twice as many bike-sharing records (8,389) as seasons 3 and 4 (4,646). This makes intuitive sense: seasons 1 and 2 are spring and summer, which tend to be pleasant, while seasons 3 and 4 are fall and winter, which tend to be cold.
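
The comparison between the two pairs of seasons can also be computed from the counts instead of being typed in by hand. A small sketch follows; `season_group_ratio` is a hypothetical helper, and the Series is rebuilt from the four counts quoted in the cells above, with the assignment of each count to a specific season code within a pair assumed.

```python
import pandas as pd


def season_group_ratio(seasons, group_a=(1, 2), group_b=(3, 4)):
    """Ratio of record counts in group_a seasons to group_b seasons."""
    counts = seasons.value_counts()
    a = counts.reindex(list(group_a), fill_value=0).sum()
    b = counts.reindex(list(group_b), fill_value=0).sum()
    return a / b


# Rebuilt from the quoted counts for illustration; which count belongs to
# which season code inside each pair is an assumption.
seasons = pd.Series([1] * 3980 + [2] * 4409 + [3] * 2512 + [4] * 2134)
print(round(season_group_ratio(seasons), 2))
```

With the notebook's data, `season_group_ratio(train_data['season'])` would give the same ratio without hard-coded numbers.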

# Bivariate Analysis

## Continuous & Continuous

### Scatter Plot

In [None]:
plot = sns.jointplot(x = train_data.temp, y = train_data.atemp, kind = 'scatter');
plot.fig.suptitle("Temp Vs Atemp");

#### Temp and atemp have a positive linear relationship

### Correlation

In [None]:
# Pearson correlation between temp and atemp, rounded to 2 decimal places
np.round(train_data.temp.corr(train_data.atemp), 2)
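
For reference, `Series.corr` computes the Pearson coefficient by default. The sketch below spells out that formula and compares it with pandas on made-up values; `pearson_r` is a hypothetical helper and the numbers are not taken from `train.csv`.

```python
import numpy as np
import pandas as pd


def pearson_r(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


# Made-up values for illustration only.
temp = pd.Series([0.20, 0.35, 0.50, 0.65, 0.80])
atemp = pd.Series([0.22, 0.34, 0.48, 0.61, 0.75])
print(np.isclose(pearson_r(temp, atemp), temp.corr(atemp)))
```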

## Categorical & Continuous

### Boxplots of Continuous Variable over the Categories of Categorical Variable

In [None]:
sns.boxplot(y = train_data.cnt, x = train_data.season, palette ='Set2').set(title = "Total # Bikes Rented By Seasons");

# 1:spring, 2:summer, 3:fall, 4:winter

**The cnt column is an important column for analysis and the outliers are good for the business because we are working with Bike Sharing Data. So, we will not treat them.**
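
By default, a seaborn boxplot draws points lying more than 1.5 times the interquartile range beyond the quartiles as individual markers. To see how many such points there are before deciding to leave them in, something like the following sketch could be used; `iqr_outlier_mask` is a hypothetical helper and the counts are made up for illustration.

```python
import pandas as pd


def iqr_outlier_mask(values, whis=1.5):
    """Flag values lying more than whis * IQR beyond the quartiles."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - whis * iqr) | (values > q3 + whis * iqr)


# Made-up counts for illustration only; use train_data['cnt'] for the real data.
cnt = pd.Series([10, 12, 14, 15, 16, 18, 20, 200])
mask = iqr_outlier_mask(cnt)
print(cnt[mask].tolist())
```

Because the boxplot is drawn per season, the mask would need to be applied within each season group (for example via `groupby('season')`) to match the plot exactly.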

## Categorical & Categorical

### Pivot Tables And Stacked Bar Chart

In [None]:
train_data.head()

In [None]:
# Average number of bikes rented per hour on working and non-working days

tbl = train_data.pivot_table(columns = "workingday", index = "hr", values = "cnt", aggfunc = "mean")
tbl = tbl.round()
tbl

# columns of the table = the categories of workingday
# rows of the table = index = hr
# each cell = aggfunc applied to the 'cnt' column, i.e. mean(cnt)

In [None]:
tbl.plot(kind = 'bar', stacked = True);

##### The chart shows the hourly distribution of the average number of rented bikes.
- During hours 0-6, when most people are asleep, few bikes are rented.
- Around hours 7-9 and 17-19, the number of bikes rented spikes. These are the hours when people commute to and from work on a working day.
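
The peak hours read off the chart can also be pulled from the pivot table directly. This is a minimal sketch, assuming the same `hr`, `workingday`, and `cnt` columns; `peak_hours` is a hypothetical helper and the rows are made up for illustration.

```python
import pandas as pd


def peak_hours(df):
    """Hour with the highest mean cnt for each workingday category."""
    tbl = df.pivot_table(index="hr", columns="workingday",
                         values="cnt", aggfunc="mean")
    return tbl.idxmax()


# Made-up rows for illustration only; use train_data for the real answer.
toy = pd.DataFrame({
    "hr":         [3, 8, 13, 17, 3, 8, 13, 17],
    "workingday": [1, 1, 1, 1, 0, 0, 0, 0],
    "cnt":        [5, 300, 150, 400, 10, 80, 250, 200],
})
print(peak_hours(toy))
```

`idxmax` returns only the single busiest hour per column; `tbl[1].nlargest(3)` would list the top three working-day hours instead.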