# Seoul Bike Sharing Demand Dataset (by [UCI](https://archive.ics.uci.edu/ml/index.php))

The original dataset is taken from the project's page on the UCI website, which can be found here: <https://archive.ics.uci.edu/ml/datasets/Seoul+Bike+Sharing+Demand/>

The description of the dataset's fields can be found on the dataset's page.

## Importing Packages

In [None]:
import numpy as np
import pandas as pd

## Downloading Dataset

In [None]:
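import math
import os
import requests
import tqdm.notebook as tqdm

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00560/SeoulBikeData.csv'
original_dataset_file = '../../static/datasets/original/bike_demand.csv'
dataset_file = '../../static/datasets/bike_demand.csv'
chunk_size = 1024

# Download the raw CSV only if it is not already cached locally
if not os.path.isfile(original_dataset_file):
    os.makedirs(os.path.dirname(original_dataset_file), exist_ok=True)
    response = requests.get(url, stream=True)
    response.raise_for_status()  # Fail early on HTTP errors instead of saving an error page
    # The content-length header may be missing; the progress bar then has no total
    content_length = response.headers.get('content-length')
    total_chunks = math.ceil(int(content_length) / chunk_size) if content_length else None
    with open(original_dataset_file, 'wb') as fid:
        for chunk in tqdm.tqdm(response.iter_content(chunk_size=chunk_size), desc='Downloading', total=total_chunks):
            if chunk:
                fid.write(chunk)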

## Loading the Dataset

In [None]:
full_dataset = pd.read_csv(original_dataset_file, encoding='cp1252')
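
The `encoding='cp1252'` argument is presumably needed because some column headers contain the degree sign. The following is a minimal sketch, assuming the file stores that character as the single cp1252 byte `0xB0`, of why decoding such bytes as UTF-8 would fail. The `header` string is copied from the column names; the rest is illustrative only.

```python
header = 'Temperature(°C)'

# Encode the header the way a cp1252-encoded file would store it (assumption)
raw = header.encode('cp1252')

# A lone 0xB0 byte is not a valid UTF-8 sequence, so UTF-8 decoding fails
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with cp1252 recovers the original header
decoded = raw.decode('cp1252')

print(raw)
print('Decodes as UTF-8:', utf8_ok)
print(decoded)
```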

## Displaying the First 10 Rows of the Dataset

In [None]:
print('Dataset size: {}'.format(len(full_dataset)))
full_dataset.head(10)

Dataset size: 8760


| | Date | Rented Bike Count | Hour | Temperature(°C) | Humidity(%) | Wind speed (m/s) | Visibility (10m) | Dew point temperature(°C) | Solar Radiation (MJ/m2) | Rainfall(mm) | Snowfall (cm) | Seasons | Holiday | Functioning Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01/12/2017 | 254 | 0 | -5.2 | 37 | 2.2 | 2000 | -17.6 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 1 | 01/12/2017 | 204 | 1 | -5.5 | 38 | 0.8 | 2000 | -17.6 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 2 | 01/12/2017 | 173 | 2 | -6.0 | 39 | 1.0 | 2000 | -17.7 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 3 | 01/12/2017 | 107 | 3 | -6.2 | 40 | 0.9 | 2000 | -17.6 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 4 | 01/12/2017 | 78 | 4 | -6.0 | 36 | 2.3 | 2000 | -18.6 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 5 | 01/12/2017 | 100 | 5 | -6.4 | 37 | 1.5 | 2000 | -18.7 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 6 | 01/12/2017 | 181 | 6 | -6.6 | 35 | 1.3 | 2000 | -19.5 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 7 | 01/12/2017 | 460 | 7 | -7.4 | 38 | 0.9 | 2000 | -19.3 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 8 | 01/12/2017 | 930 | 8 | -7.6 | 37 | 1.1 | 2000 | -19.8 | 0.01 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 9 | 01/12/2017 | 490 | 9 | -6.5 | 27 | 0.5 | 1928 | -22.4 | 0.23 | 0.0 | 0.0 | Winter | No Holiday | Yes |
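
The reported size of 8760 rows would correspond to one row per hour over 365 days (365 × 24 = 8760). Below is a minimal sketch of how that per-date coverage could be checked. `rows_per_date` is a hypothetical helper, and the small `toy` frame is made-up illustration data rather than part of the dataset.

```python
import pandas as pd


def rows_per_date(df, date_column='Date'):
    """Return the number of rows recorded for each date (hypothetical helper)."""
    return df.groupby(date_column).size()


# Toy frame for illustration only: two dates with 24 hourly rows each
toy = pd.DataFrame({
    'Date': ['01/12/2017'] * 24 + ['02/12/2017'] * 24,
    'Hour': list(range(24)) * 2,
})

counts = rows_per_date(toy)
all_complete = bool((counts == 24).all())

print(counts.to_dict())
print('Every date has 24 rows:', all_complete)
```

On the real data, the same check could be run as `rows_per_date(full_dataset)`.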


## Cleaning Up the Data

In [None]:
dataset = full_dataset.copy()  # Create a copy of the data

# Keep only functioning days, then drop the now-constant column
dataset = dataset.loc[dataset['Functioning Day'] == 'Yes'].drop('Functioning Day', axis=1)

# Keep only non-holidays, then drop the now-constant column
dataset = dataset.loc[dataset['Holiday'] == 'No Holiday'].drop('Holiday', axis=1)

# Remove the dew point temperature field
dataset = dataset.drop('Dew point temperature(°C)', axis=1)

# Remove the Seasons field
dataset = dataset.drop('Seasons', axis=1)

# Add the day of the week (Monday=0 ... Sunday=6); dates are in DD/MM/YYYY format
dataset['Day of the week'] = pd.DatetimeIndex(dataset['Date'], dayfirst=True).dayofweek

print('Dataset size: {}'.format(len(dataset)))
dataset.head(10)

Dataset size: 8057


| | Date | Rented Bike Count | Hour | Temperature(°C) | Humidity(%) | Wind speed (m/s) | Visibility (10m) | Solar Radiation (MJ/m2) | Rainfall(mm) | Snowfall (cm) | Day of the week |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01/12/2017 | 254 | 0 | -5.2 | 37 | 2.2 | 2000 | 0.0 | 0.0 | 0.0 | 4 |
| 1 | 01/12/2017 | 204 | 1 | -5.5 | 38 | 0.8 | 2000 | 0.0 | 0.0 | 0.0 | 4 |
| 2 | 01/12/2017 | 173 | 2 | -6.0 | 39 | 1.0 | 2000 | 0.0 | 0.0 | 0.0 | 4 |
| 3 | 01/12/2017 | 107 | 3 | -6.2 | 40 | 0.9 | 2000 | 0.0 | 0.0 | 0.0 | 4 |
| 4 | 01/12/2017 | 78 | 4 | -6.0 | 36 | 2.3 | 2000 | 0.0 | 0.0 | 0.0 | 4 |
| 5 | 01/12/2017 | 100 | 5 | -6.4 | 37 | 1.5 | 2000 | 0.0 | 0.0 | 0.0 | 4 |
| 6 | 01/12/2017 | 181 | 6 | -6.6 | 35 | 1.3 | 2000 | 0.0 | 0.0 | 0.0 | 4 |
| 7 | 01/12/2017 | 460 | 7 | -7.4 | 38 | 0.9 | 2000 | 0.0 | 0.0 | 0.0 | 4 |
| 8 | 01/12/2017 | 930 | 8 | -7.6 | 37 | 1.1 | 2000 | 0.01 | 0.0 | 0.0 | 4 |
| 9 | 01/12/2017 | 490 | 9 | -6.5 | 27 | 0.5 | 1928 | 0.23 | 0.0 | 0.0 | 4 |
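
The `Day of the week` value of 4 depends on reading `01/12/2017` day first. The sketch below contrasts the two possible readings of that string using explicit format strings; the variable names are illustrative and not part of the notebook.

```python
import pandas as pd

date_string = '01/12/2017'  # First date shown in the table above

# Day-first reading: 1 December 2017
day_first = pd.to_datetime(date_string, format='%d/%m/%Y')

# Month-first reading: 12 January 2017
month_first = pd.to_datetime(date_string, format='%m/%d/%Y')

# dayofweek counts from Monday=0 to Sunday=6
print(day_first.date(), day_first.dayofweek)      # 2017-12-01 4 (Friday)
print(month_first.date(), month_first.dayofweek)  # 2017-01-12 3 (Thursday)
```

Only the day-first reading reproduces the value 4 seen in the output, which is consistent with the `dayfirst=True` argument used in the cleaning cell.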


## Save the Clean Dataset

In [None]:
dataset.to_csv(dataset_file, index=False)
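
Passing `index=False` keeps the DataFrame's row index out of the saved file. The following sketch, which uses a small toy frame and a temporary directory rather than the notebook's paths, shows what a reload would look like with and without it.

```python
import os
import tempfile

import pandas as pd

# Toy frame for illustration; the gap in the index mimics rows removed by filtering
toy = pd.DataFrame(
    {'Hour': [0, 1, 2], 'Rented Bike Count': [254, 204, 173]},
    index=[0, 1, 5],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    with_index = os.path.join(tmp_dir, 'with_index.csv')
    without_index = os.path.join(tmp_dir, 'without_index.csv')

    toy.to_csv(with_index)                   # Index written as an unnamed first column
    toy.to_csv(without_index, index=False)   # Index omitted

    columns_with_index = list(pd.read_csv(with_index).columns)
    columns_without_index = list(pd.read_csv(without_index).columns)

print(columns_with_index)     # ['Unnamed: 0', 'Hour', 'Rented Bike Count']
print(columns_without_index)  # ['Hour', 'Rented Bike Count']
```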