### About the Dataset

Bike sharing systems are a means of renting bicycles in which obtaining a membership, renting a bike, and returning it are all automated through a network of kiosks located throughout a city. With these systems, people can rent a bike at any kiosk and return it to a different one.

#### Data Attributes
<ul>
<li>datetime - hourly timestamp (date + time)</li>
<li>season - 1 = spring, 2 = summer, 3 = fall, 4 = winter</li>
<li>holiday - whether the day is considered a holiday</li>
<li>workingday - whether the day is neither a weekend nor holiday </li>
<li>weather -  
<ul><li>1: clear, few clouds, partly cloudy</li>
<li>2: Mist + cloudy, Mist + broken clouds, Mist + few clouds, Mist</li>
<li>3: light snow, light rain, thunderstorm, scattered clouds</li>
<li>4: heavy rain, ice pellets, snow + fog</li>
</ul>
</li>
<li>temp - temperature in Celsius</li>
<li>atemp - "feels like" temperature in Celsius</li>
<li>humidity - relative humidity</li>
<li>windspeed - wind speed</li>
<li>casual - number of non-registered user rentals initiated</li>
<li>registered - number of registered user rentals initiated</li>
<li>count - number of total rentals </li>

</ul>
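
The attribute definitions suggest a simple consistency check: if `count` is the total number of rentals, it should equal `casual + registered`. The sketch below tests that assumption on the first five rows of the dataset, typed in by hand so the block runs without `train.csv`. The names `sample` and `mismatches` are illustrative only.

```python
import pandas as pd

# First five rows of the rental columns, copied from the notebook's data sample.
sample = pd.DataFrame({
    "casual": [3, 8, 5, 3, 0],
    "registered": [13, 32, 27, 10, 1],
    "count": [16, 40, 32, 13, 1],
})

# Use sample["count"] rather than sample.count, because DataFrame.count is a method.
# Keep only the rows where the total disagrees with the sum of its two parts.
mismatches = sample[sample["casual"] + sample["registered"] != sample["count"]]
```

An empty `mismatches` frame means the assumption holds for these rows. The same expression can be applied to the full DataFrame once it is loaded.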

In [9]:
import pylab
import calendar
import numpy as np
import pandas as pd
import seaborn as sn
from scipy import stats
import missingno as msno
from datetime import datetime
import matplotlib.pyplot as plt
import warnings
pd.options.mode.chained_assignment = None
warnings.filterwarnings("ignore", category = DeprecationWarning)
%matplotlib inline

In [10]:
#get the data
dailyData = pd.read_csv("train.csv")

In [11]:
#shape of the data
dailyData.shape

(10886, 12)

In [13]:
#data sample
dailyData.head(5)

              datetime  season  holiday  workingday  weather  temp   atemp  humidity  windspeed  casual  registered  count
0  2011-01-01 00:00:00       1        0           0        1  9.84  14.395        81        0.0       3          13     16
1  2011-01-01 01:00:00       1        0           0        1  9.02  13.635        80        0.0       8          32     40
2  2011-01-01 02:00:00       1        0           0        1  9.02  13.635        80        0.0       5          27     32
3  2011-01-01 03:00:00       1        0           0        1  9.84  14.395        75        0.0       3          10     13
4  2011-01-01 04:00:00       1        0           0        1  9.84  14.395        75        0.0       0           1      1


In [14]:
#data types
dailyData.dtypes

datetime       object
season          int64
holiday         int64
workingday      int64
weather         int64
temp          float64
atemp         float64
humidity        int64
windspeed     float64
casual          int64
registered      int64
count           int64
dtype: object

As the output above shows, the columns "season", "holiday", "workingday" and "weather" are stored as "int64", although they are really categorical variables. Let us transform the dataset in the following ways before starting the EDA:

<ul>
<li>Create new columns "date", "hour", "weekday" and "month" from the "datetime" column.</li>
<li>Coerce the data type of "season", "holiday", "workingday" and "weather" to category.</li>
<li>Drop the "datetime" column, since the useful features have already been extracted from it.</li>
</ul>
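
As an alternative to string splitting, the first and last steps above can also be sketched with pandas' own datetime parsing. This is a minimal illustration on two hand-typed timestamps, not the approach this notebook uses. The `demo` frame and the `parsed` series are hypothetical names.

```python
import pandas as pd

# Two timestamps in the same "YYYY-MM-DD HH:MM:SS" layout as the dataset.
demo = pd.DataFrame({"datetime": ["2011-01-01 00:00:00", "2011-01-01 01:00:00"]})

# Parse the strings once, then read the parts through the .dt accessor.
parsed = pd.to_datetime(demo["datetime"], format="%Y-%m-%d %H:%M:%S")

demo["date"] = parsed.dt.strftime("%Y-%m-%d")  # date as a string
demo["hour"] = parsed.dt.hour                  # hour as an integer, 0 to 23
demo["weekday"] = parsed.dt.day_name()         # e.g. "Saturday"
demo["month"] = parsed.dt.month_name()         # e.g. "January"

# The raw column is no longer needed once the features are extracted.
demo = demo.drop(columns=["datetime"])
```

One difference to keep in mind: this variant yields integer hours, whereas splitting the string yields zero-padded strings such as "00".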

#### Creating New Columns from the "datetime" Column

In [22]:
# Split "YYYY-MM-DD HH:MM:SS" into its date part and its zero-padded hour part (both strings)
dailyData["date"] = dailyData.datetime.apply(lambda x: x.split()[0])

dailyData["hour"] = dailyData.datetime.apply(lambda x: x.split()[1].split(":")[0])

# Derive the weekday and month names from the date string
dailyData["weekday"] = dailyData.date.apply(lambda dateString: calendar.day_name[datetime.strptime(dateString, "%Y-%m-%d").weekday()])

dailyData["month"] = dailyData.date.apply(lambda dateString: calendar.month_name[datetime.strptime(dateString, "%Y-%m-%d").month])

# Replace the integer codes with readable labels
dailyData["season"] = dailyData.season.map({1: "Spring", 2: "Summer", 3: "Fall", 4: "Winter"})

dailyData["weather"] = dailyData.weather.map({
    1: "Clear + Few clouds + Partly cloudy",
    2: "Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist",
    3: "Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds",
    4: "Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog",
})
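
One property of the `map` calls above is worth knowing: when `Series.map` is given a dictionary, any value without a matching key becomes `NaN`. The sketch below shows this on a small hand-made series. The names `season_labels`, `codes`, `mapped` and `unmapped` are illustrative, and the code `5` is a deliberately invalid value that does not occur in the dataset description.

```python
import pandas as pd

season_labels = {1: "Spring", 2: "Summer", 3: "Fall", 4: "Winter"}

# 5 is not a documented season code; it is included to show the failure mode.
codes = pd.Series([1, 4, 5])

# Codes missing from the dictionary are mapped to NaN instead of raising an error.
mapped = codes.map(season_labels)

# Recover the original codes that had no label.
unmapped = codes[mapped.isna()]
```

Checking that `mapped.isna().sum()` is zero after a mapping step is a cheap way to catch unexpected codes.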

In [23]:
### Coercing to Category Type

categoryVariableList = ["hour", "weekday", "month", "season", "weather", "holiday", "workingday"]

for var in categoryVariableList:
    dailyData[var] = dailyData[var].astype("category")
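
To see the effect of the loop above, the following sketch repeats the same conversion on a small hand-made frame and inspects the result. The `demo` frame and its values are hypothetical.

```python
import pandas as pd

demo = pd.DataFrame({"season": [1, 1, 2, 4], "holiday": [0, 0, 1, 0]})

# Same conversion as in the notebook, applied column by column.
for col in ["season", "holiday"]:
    demo[col] = demo[col].astype("category")

# Every converted column now reports the "category" dtype.
dtype_names = [str(t) for t in demo.dtypes]

# The distinct levels found in the data are stored once, in sorted order.
season_levels = list(demo["season"].cat.categories)
```

After the conversion, pandas treats these columns as discrete labels. For example, `describe()` reports counts and the most frequent level rather than a mean.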


In [24]:
### Dropping unnecessary columns

dailyData = dailyData.drop(["datetime"], axis = 1)