# Data Analysis Project: Bike Sharing
- **Name:** Walker Valentinus Simanjuntak
- **Email:** iss21012@students.del.ac.id
- **Dicoding ID:** walkersimanjuntak

## Defining the Business Questions

- Question 1
- Question 2

## Import All Packages/Libraries Used

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Data Wrangling

### Gathering Data

In [2]:
day_df = pd.read_csv('Data/day.csv')
hour_df = pd.read_csv('Data/hour.csv')

### Assessing Data

In [39]:
pd.set_option('display.max_columns', None)

def data_overview(df, head=3):
    """Print shape, missing values, duplicates, first rows, dtypes, and summary statistics."""
    print(" SHAPE ")
    print('Rows:{}'.format(df.shape[0]))
    print('Columns:{}'.format(df.shape[1]))
    print("\n")
    print(" MISSING VALUES ")
    print(df.isnull().sum())
    print("\n")
    print(" DUPLICATED VALUES ")
    print(df.duplicated().sum())
    print("\n")
    print(" HEAD ")
    # Show the first `head` rows
    print(df.head(head))
    print("\n")
    print(" DATA TYPES ")
    print(df.dtypes)
    print("\n")
    print("DATA DAY SUMMARY")
    print(df.describe())

data_overview(day_df)

 SHAPE 
Rows:731
Columns:16


 MISSING VALUES 
instant       0
dteday        0
season        0
yr            0
mnth          0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64


 DUPLICATED VALUES 
0


 HEAD 
   instant      dteday  season  yr  mnth  holiday  weekday  workingday  \
0        1  2011-01-01       1   0     1        0        6           0   
1        2  2011-01-02       1   0     1        0        0           0   
2        3  2011-01-03       1   0     1        0        1           1   

   weathersit      temp     atemp       hum  windspeed  casual  registered  \
0           2  0.344167  0.363625  0.805833   0.160446     331         654   
1           2  0.363478  0.353739  0.696087   0.248539     131         670   
2           1  0.196364  0.189405  0.437273   0.248309     120        1229   

    cnt  
0   985  
1   801  
2  1349  



In [38]:
pd.set_option('display.max_columns', None)

def data_overview(df, head=3):
    """Print shape, missing values, duplicates, first rows, dtypes, and summary statistics."""
    print(" SHAPE ")
    print('Rows:{}'.format(df.shape[0]))
    print('Columns:{}'.format(df.shape[1]))
    print("\n")
    print(" MISSING VALUES ")
    print(df.isnull().sum())
    print("\n")
    print(" DUPLICATED VALUES ")
    print(df.duplicated().sum())
    print("\n")
    print(" HEAD ")
    # Show the first `head` rows
    print(df.head(head))
    print("\n")
    print(" DATA TYPES ")
    print(df.dtypes)
    print("\n")
    print("DATA HOUR SUMMARY")
    print(df.describe())

data_overview(hour_df)

 SHAPE 
Rows:17379
Columns:17


 MISSING VALUES 
instant       0
dteday        0
season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64


 DUPLICATED VALUES 
0


 HEAD 
   instant      dteday  season  yr  mnth  hr  holiday  weekday  workingday  \
0        1  2011-01-01       1   0     1   0        0        6           0   
1        2  2011-01-01       1   0     1   1        0        6           0   
2        3  2011-01-01       1   0     1   2        0        6           0   

   weathersit  temp   atemp   hum  windspeed  casual  registered  cnt  
0           1  0.24  0.2879  0.81        0.0       3          13   16  
1           1  0.22  0.2727  0.80        0.0       8          32   40  
2           1  0.22  0.2727  0.80        0.0       5          27   32  


 DATA TYPES 
instant         i
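The two overview cells above run the same checks on each frame. As a minimal sketch, the checks could also be collected into one small table for side-by-side comparison. The helper name `overview`, the dictionary `toy_frames`, and all values below are assumptions made for illustration. They are not part of the notebook or of the real `day.csv` / `hour.csv` data.

```python
import pandas as pd

def overview(df):
    """Collect basic quality checks into a dict instead of printing them."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing": int(df.isnull().sum().sum()),   # total missing cells
        "duplicated": int(df.duplicated().sum()),  # fully duplicated rows
    }

# Toy stand-ins for day_df and hour_df (made-up values).
toy_frames = {
    "day": pd.DataFrame({"instant": [1, 2, 3], "cnt": [10, 20, 30]}),
    "hour": pd.DataFrame({"instant": [1, 2, 2], "cnt": [5, 7, 7]}),
}

# One row per frame, one column per check.
report = pd.DataFrame({name: overview(df) for name, df in toy_frames.items()}).T
print(report)
```

In the notebook, the same call would presumably take `{"day": day_df, "hour": hour_df}` instead of the toy frames.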

### Cleaning Data

#### Because there are no missing values or duplicated rows, we check for outliers

In [36]:
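# Quartiles of the hourly rental count
Q1 = hour_df['cnt'].quantile(0.25)
Q3 = hour_df['cnt'].quantile(0.75)

# Interquartile range
IQR = Q3 - Q1

# Fences: values more than 1.5 * IQR beyond the quartiles count as outliers
lower_bound = Q1 - 1.5*IQR
upper_bound = Q3 + 1.5*IQR

# Rows whose cnt falls outside the fences
outliers = hour_df[(hour_df['cnt'] < lower_bound) | (hour_df['cnt'] > upper_bound)]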
# outliers.style.background_gradient(cmap='BrBG')

outliers.shape

(505, 17)

In [37]:
# Remove outliers: keep only rows whose cnt lies within the bounds (hour_df itself is unchanged)
cleaned_outliers = hour_df[~((hour_df['cnt'] < lower_bound) | (hour_df['cnt'] > upper_bound))]

cleaned_outliers.shape

(16874, 17)
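The shapes above are consistent: 505 outlier rows plus 16874 kept rows give the 17379 rows of `hour_df`. The 1.5 × IQR rule used here can be sketched on a tiny example where the bounds are easy to check by hand. The helper `iqr_bounds` and the toy values are assumptions for illustration, not part of the notebook.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy data standing in for hour_df['cnt'] (made-up values).
toy = pd.DataFrame({"cnt": [10, 12, 14, 16, 18, 20, 22, 24, 500]})

lower, upper = iqr_bounds(toy["cnt"])
is_outlier = (toy["cnt"] < lower) | (toy["cnt"] > upper)

outliers_toy = toy[is_outlier]    # rows outside the fences
cleaned_toy = toy[~is_outlier]    # rows inside the fences

# Q1 = 14, Q3 = 22, IQR = 8, so the fences are 2 and 34.
print(lower, upper)
print(len(outliers_toy), len(cleaned_toy))
```

Only the value 500 falls outside the fences here. Whether such rows should be dropped is a separate judgment call: very high hourly counts may be real peak demand rather than errors.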

#### Convert dteday to datetime

In [40]:
day_df['dteday'] = pd.to_datetime(day_df['dteday'])
hour_df['dteday'] = pd.to_datetime(hour_df['dteday'])
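As a small sketch of what the conversion enables: once `dteday` has a datetime dtype, calendar parts are available through the `.dt` accessor. The toy frame and the derived column names (`year`, `month`, `day_name`) are assumptions for illustration. The explicit `format` argument is optional and assumes the `YYYY-MM-DD` layout seen in the printed head.

```python
import pandas as pd

# Toy stand-in for the dteday column (dates chosen for illustration).
toy = pd.DataFrame({"dteday": ["2011-01-01", "2011-01-02", "2012-12-31"]})

# Parse the strings; an explicit format makes malformed dates fail loudly.
toy["dteday"] = pd.to_datetime(toy["dteday"], format="%Y-%m-%d")

# Confirm the dtype actually changed.
is_datetime = pd.api.types.is_datetime64_any_dtype(toy["dteday"])

# Derive calendar features from the parsed dates.
toy["year"] = toy["dteday"].dt.year
toy["month"] = toy["dteday"].dt.month
toy["day_name"] = toy["dteday"].dt.day_name()

print(is_datetime)
print(toy)
```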

#### Merge Dataset

In [44]:
all_df = pd.merge(
    left = day_df,
    right = hour_df,
    how = 'left',
    left_on='instant',
    right_on='instant',
    suffixes=('_day', '_hour')
)

In [45]:
all_df.head()

Unnamed: 0,instant,dteday_day,season_day,yr_day,mnth_day,holiday_day,weekday_day,workingday_day,weathersit_day,temp_day,atemp_day,hum_day,windspeed_day,casual_day,registered_day,cnt_day,dteday_hour,season_hour,yr_hour,mnth_hour,hr,holiday_hour,weekday_hour,workingday_hour,weathersit_hour,temp_hour,atemp_hour,hum_hour,windspeed_hour,casual_hour,registered_hour,cnt_hour
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0,3,13,16
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801,2011-01-01,1,0,1,1,0,6,0,1,0.22,0.2727,0.8,0.0,8,32,40
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349,2011-01-01,1,0,1,2,0,6,0,1,0.22,0.2727,0.8,0.0,5,27,32
3,4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562,2011-01-01,1,0,1,3,0,6,0,1,0.24,0.2879,0.75,0.0,3,10,13
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600,2011-01-01,1,0,1,4,0,6,0,1,0.24,0.2879,0.75,0.0,0,1,1
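In the output above, `dteday_day` and `dteday_hour` disagree from the second row on: the daily record for 2011-01-02 is paired with an hourly record from 2011-01-01. This suggests that `instant` is a per-file row counter rather than a key shared by the two files. If the goal is to attach daily values to each hourly record, joining on `dteday` may be closer to the intent. The sketch below uses made-up toy frames; only the column names mirror the datasets.

```python
import pandas as pd

# Toy frames (made-up values); each day's hourly cnt sums to its daily cnt.
day_toy = pd.DataFrame({
    "instant": [1, 2],
    "dteday": pd.to_datetime(["2011-01-01", "2011-01-02"]),
    "cnt": [50, 30],
})
hour_toy = pd.DataFrame({
    "instant": [1, 2, 3, 4],
    "dteday": pd.to_datetime(
        ["2011-01-01", "2011-01-01", "2011-01-02", "2011-01-02"]
    ),
    "hr": [0, 1, 0, 1],
    "cnt": [20, 30, 10, 20],
})

# Many hourly rows map to one daily row; validate raises if that is violated.
merged = pd.merge(
    left=hour_toy,
    right=day_toy,
    how="left",
    on="dteday",
    suffixes=("_hour", "_day"),
    validate="many_to_one",
)

# Sanity check: hourly counts summed per day against the daily count.
hourly_sum = merged.groupby("dteday")["cnt_hour"].sum()
print(merged)
print(hourly_sum)
```

With this join, the result has one row per hourly record, and the `_day` columns repeat within each date.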


## Exploratory Data Analysis (EDA)

### Explore Dataset

## Visualization & Explanatory Analysis

### Question 1:

### Question 2:

## Conclusion

- Conclusion for question 1
- Conclusion for question 2