# A Study on Bike Rentals in Seoul
### Group name: Bike Experts
#### Team members: Yuyang Wang, Muhan Yang, Zijian Cheng


## Overall Task Analysis

The visualization platform should feature three main tabs, "Overview", "Trending" and "Clustering", each with four sub-tabs corresponding to the four seasons.

In the "Overview" tab, an area plot will show how the total number of bike rentals per day is distributed across the year (Characterize Distribution), with the date encoded on the x-axis and the daily total on the y-axis. A tooltip for each day should display the date, the total number of rentals, and whether the day falls on a weekend or a holiday. Clicking on a specific day should reveal a detailed distribution of rental numbers for each hour of that day.
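The daily totals behind the Overview area plot can be derived by summing the hourly records per date. The following is a minimal pandas sketch, assuming the column names of `SeoulBikeData.csv` (`Date`, `Hour`, `Rented Bike Count`, `Holiday`) and using a few hypothetical toy rows in place of the real data. It also assumes the holiday flag is constant within a day.

```python
import pandas as pd

# Hypothetical toy rows standing in for the hourly dataset (not real data)
hourly = pd.DataFrame({
    "Date": pd.to_datetime(["2017-12-01", "2017-12-01", "2017-12-02", "2017-12-02"]),
    "Hour": [0, 1, 0, 1],
    "Rented Bike Count": [254, 204, 100, 150],
    "Holiday": ["No Holiday"] * 4,
})

# Sum the hourly counts into one total per day; assume Holiday is constant per day
daily = (
    hourly.groupby("Date", as_index=False)
    .agg(total=("Rented Bike Count", "sum"), holiday=("Holiday", "first"))
)

# Flag weekends for the tooltip (dayofweek: Monday = 0, Saturday = 5, Sunday = 6)
daily["weekend"] = daily["Date"].dt.dayofweek >= 5
print(daily)
```

Clicking a day would then amount to filtering the hourly rows on that `Date` to draw its hourly distribution.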

For the second tab, "Trending", the goal is to find the trend between the number of bike rentals and either the temperature or the time of day, so a line plot is used to show the trend (Correlate). The average number of rentals is encoded on the y-axis, and the temperature or the hour of the day is encoded on the x-axis. A dropdown menu lets the audience select which trend to view, and four tabs let them switch between the four seasons (Filter). A tooltip should show the average number of bike rentals (Retrieve Value) together with the corresponding temperature or time of day.
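Because temperature is continuous, averaging rentals "per temperature" needs some grouping first. One option, an assumption rather than the authors' stated method, is to bin temperature into fixed-width intervals (5 °C here, chosen arbitrarily) and average within each season and bin. The sketch uses hypothetical toy rows; the hour-of-day trend needs no binning, since `Hour` is already discrete.

```python
import pandas as pd

# Hypothetical toy rows (not real data); column names follow SeoulBikeData.csv
df = pd.DataFrame({
    "Seasons": ["Summer", "Summer", "Summer", "Winter", "Winter"],
    "Temperature(°C)": [21.0, 23.5, 27.0, -5.0, -2.0],
    "Rented Bike Count": [900, 1100, 1300, 200, 300],
})

# Assumed 5 °C-wide, left-closed bins: [-20, -15), ..., [35, 40)
bins = list(range(-20, 45, 5))
df["TempBin"] = pd.cut(df["Temperature(°C)"], bins=bins, right=False)

# Average rentals per season and temperature bin; observed=True drops empty bins
trend = (
    df.groupby(["Seasons", "TempBin"], observed=True)["Rented Bike Count"]
    .mean()
    .reset_index()
)
print(trend)
```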

Under the "Clustering" tab, a scatter plot shows how the data points cluster. The number of rentals is encoded on the y-axis, and temperature or humidity is encoded on the x-axis. A dropdown menu lets the audience switch between temperature and humidity to study different clusters. A tooltip displays the number of rentals and the corresponding temperature or humidity.
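One way to drive the temperature/humidity dropdown from a single chart is to reshape the data into long format, so that choosing a variable becomes a filter on one column. The sketch below is an assumed approach using `pandas.melt` on hypothetical toy rows; in Altair, the resulting `Variable` column could then be filtered by a parameter bound to a dropdown (`alt.binding_select`).

```python
import pandas as pd

# Hypothetical toy rows (not real data)
df = pd.DataFrame({
    "Rented Bike Count": [254, 930, 1500],
    "Temperature(°C)": [-5.2, -7.6, 24.1],
    "Humidity(%)": [37, 37, 55],
})

# Long format: one row per (observation, variable) pair
long_df = df.melt(
    id_vars="Rented Bike Count",
    value_vars=["Temperature(°C)", "Humidity(%)"],
    var_name="Variable",
    value_name="Value",
)

# Picking a dropdown option amounts to filtering on the Variable column
humidity = long_df[long_df["Variable"] == "Humidity(%)"]
print(humidity)
```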

These structured visualizations aim to provide comprehensive insight into bike rental trends in Seoul for both citizens and rental companies, helping them make informed decisions.

### Five distinct tasks corresponding to low-fidelity sketches:

1. Correlate - What is the correlation between the number of rentals and time on weekends and workdays throughout the year?
2. Sort & Filter - Rank months by the highest number of bike rentals on holidays and non-holidays.
3. Characterize distribution - What is the daily bike rental distribution for each season?
4. Clustering - Are there clusters in the number of rentals with respect to temperature or humidity?
5. Retrieve value - What are the detailed statistics (mean, min, quartiles, max) of the rental numbers for each hour?
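Task 2 (Sort & Filter) boils down to aggregating by month and holiday status and then ranking within each group. A minimal pandas sketch, assuming summed rentals as the ranking measure and using hypothetical toy rows (the average per day would be an alternative, since months contain different numbers of holidays):

```python
import pandas as pd

# Hypothetical toy rows (not real data)
df = pd.DataFrame({
    "Month": [1, 1, 2, 2, 3, 3],
    "Holiday": ["Holiday", "No Holiday"] * 3,
    "Rented Bike Count": [50, 400, 80, 300, 20, 500],
})

# Total rentals per month, split by holiday status
monthly = df.groupby(["Holiday", "Month"], as_index=False)["Rented Bike Count"].sum()

# Rank months within each holiday group (1 = most rentals)
monthly["Rank"] = (
    monthly.groupby("Holiday")["Rented Bike Count"]
    .rank(ascending=False, method="min")
    .astype(int)
)

# Sort so each group lists its busiest month first
ranked = monthly.sort_values(["Holiday", "Rank"]).reset_index(drop=True)
print(ranked)
```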

In [1]:
import pandas as pd
import altair as alt

# Read the dataset
url = 'https://raw.githubusercontent.com/yuwangy/Seoul_bike_rental_viz/main/SeoulBikeData.csv'
data = pd.read_csv(url, encoding='latin-1')

# Parse the day/month/year 'Date' strings into datetime values
data['Date'] = pd.to_datetime(data['Date'], format='%d/%m/%Y')

# Add 'Month' as an ordinal attribute derived from the date
data['Month'] = data['Date'].dt.month

# Add 'Weekend' as a nominal attribute: 'Yes' for Saturday (5) or Sunday (6)
data['Weekend'] = data['Date'].dt.dayofweek.apply(lambda x: 'Yes' if x >= 5 else 'No')

data.head(10)

| | Date | Rented Bike Count | Hour | Temperature(°C) | Humidity(%) | Wind speed (m/s) | Visibility (10m) | Dew point temperature(°C) | Solar Radiation (MJ/m2) | Rainfall(mm) | Snowfall (cm) | Seasons | Holiday | Functioning Day | Month | Weekend |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-12-01 | 254 | 0 | -5.2 | 37 | 2.2 | 2000 | -17.6 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes | 12 | No |
| 1 | 2017-12-01 | 204 | 1 | -5.5 | 38 | 0.8 | 2000 | -17.6 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes | 12 | No |
| 2 | 2017-12-01 | 173 | 2 | -6.0 | 39 | 1.0 | 2000 | -17.7 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes | 12 | No |
| 3 | 2017-12-01 | 107 | 3 | -6.2 | 40 | 0.9 | 2000 | -17.6 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes | 12 | No |
| 4 | 2017-12-01 | 78 | 4 | -6.0 | 36 | 2.3 | 2000 | -18.6 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes | 12 | No |
| 5 | 2017-12-01 | 100 | 5 | -6.4 | 37 | 1.5 | 2000 | -18.7 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes | 12 | No |
| 6 | 2017-12-01 | 181 | 6 | -6.6 | 35 | 1.3 | 2000 | -19.5 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes | 12 | No |
| 7 | 2017-12-01 | 460 | 7 | -7.4 | 38 | 0.9 | 2000 | -19.3 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes | 12 | No |
| 8 | 2017-12-01 | 930 | 8 | -7.6 | 37 | 1.1 | 2000 | -19.8 | 0.01 | 0.0 | 0.0 | Winter | No Holiday | Yes | 12 | No |
| 9 | 2017-12-01 | 490 | 9 | -6.5 | 27 | 0.5 | 1928 | -22.4 | 0.23 | 0.0 | 0.0 | Winter | No Holiday | Yes | 12 | No |
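The explicit `format='%d/%m/%Y'` matters because the `Date` strings are day-first: without it, pandas would by default read a string such as `'01/12/2017'` month-first, as 12 January. A small check on two hypothetical strings confirms the parsing and the weekend rule used above (`dayofweek >= 5`, where Monday is 0):

```python
import pandas as pd

# Hypothetical day/month/year strings in the same form as the 'Date' column
raw = pd.Series(["01/12/2017", "02/12/2017"])

# Explicit day-first format: 1 and 2 December 2017
parsed = pd.to_datetime(raw, format="%d/%m/%Y")

# Same weekend rule as above: Saturday (5) and Sunday (6) count as weekend
weekend = parsed.dt.dayofweek.apply(lambda x: "Yes" if x >= 5 else "No")
print(parsed.dt.strftime("%Y-%m-%d").tolist(), weekend.tolist())
```

1 December 2017 was a Friday, which matches the `Weekend = No` shown for the first rows above.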


### Interactive Visualization

In [2]:
# Activate the VegaFusion data transformer
import altair as alt
alt.data_transformers.enable("vegafusion")


In [3]:
# Monthly bike rental trends, split into weekends and weekdays

bike_month_linechart = alt.Chart(data).mark_line(
    point = alt.OverlayMarkDef(shape = 'square', size = 60, color = 'red')
).encode(
    alt.X('Month:O', title = 'Month'),
    # Each row is one hour, so this is the average hourly rental count per month
    alt.Y('mean(Rented Bike Count):Q', title = 'Average hourly rentals'),
    alt.Color('Weekend:N'),
    tooltip=[alt.Tooltip('mean(Rented Bike Count):Q', title='Average Count', format='.1f'), 'Month:O', 'Weekend:N'],
).properties(
    width = 500,
    height = 300,
    title = 'Monthly Bike Rentals on Weekends & Weekdays',
)

bike_month_linechart
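The `mean(Rented Bike Count)` encoding aggregates inside the chart. The same numbers can be cross-checked in pandas with a pivot table; a sketch on hypothetical toy rows (because each row is one hour, the values are mean hourly counts, not mean daily totals):

```python
import pandas as pd

# Hypothetical toy rows (not real data)
df = pd.DataFrame({
    "Month": [12, 12, 12, 1],
    "Weekend": ["No", "No", "Yes", "Yes"],
    "Rented Bike Count": [254, 204, 100, 300],
})

# Rows: month; columns: weekend flag; cells: mean hourly rentals
monthly_mean = df.pivot_table(
    index="Month", columns="Weekend", values="Rented Bike Count", aggfunc="mean"
)
print(monthly_mean)
```

Combinations with no rows (here, January weekdays) come out as `NaN`, just as the line chart would show no point there.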

In [4]:
# Hourly bike rentals bar chart; hovering over an hour reveals that hour's boxplot

# Point selection on 'Hour' triggered by hovering (Altair 5 API;
# selection_single/add_selection are deprecated there)
selection = alt.selection_point(
    fields = ['Hour'], nearest = True, on = 'mouseover', empty = False,
)

# Create the bar chart
bike_hour_barchart = alt.Chart(data).mark_bar().encode(
    alt.X('Hour:O'),
    alt.Y('mean(Rented Bike Count):Q'),
    alt.Color('mean(Rented Bike Count):Q'),
    tooltip = [alt.Tooltip('mean(Rented Bike Count):Q', title = 'Mean'),
              alt.Tooltip('min(Rented Bike Count):Q', title = 'Min'),
              alt.Tooltip('q1(Rented Bike Count):Q', title = 'Q1'),
              alt.Tooltip('median(Rented Bike Count):Q', title = 'Median'),
              alt.Tooltip('q3(Rented Bike Count):Q', title = 'Q3'),
              alt.Tooltip('max(Rented Bike Count):Q', title = 'Max')]
).properties(
    width = 500,
    height = 300,
    title = 'Hourly Bike Rentals Barchart',
).add_params(
    selection
)

# Create the box plot
bike_hour_boxplot = alt.Chart(data).mark_boxplot().encode(
    x = 'Hour:O',
    y = 'Rented Bike Count:Q'
).transform_filter(
    selection
).properties(
    width = 100,
    height = 300,
    title = 'Hourly Bike Rentals Boxplot',
)

# Combine both charts, concatenating horizontally
combined_chart = alt.hconcat(bike_hour_barchart, bike_hour_boxplot, spacing = 25).properties(
    title = 'Hourly Bike Rentals with Detailed Boxplot'
)

display(combined_chart)
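The six tooltip statistics can be cross-checked in pandas by grouping on `Hour`. Below is a sketch on hypothetical toy rows. The `q1`/`q3` helpers use pandas' default linear interpolation, which may differ slightly from how Vega-Lite computes its quartiles.

```python
import pandas as pd

# Hypothetical toy rows (not real data): five counts each at hours 8 and 9
df = pd.DataFrame({
    "Hour": [8] * 5 + [9] * 5,
    "Rented Bike Count": [930, 800, 1200, 1000, 900, 490, 500, 520, 480, 510],
})

def q1(s):
    """First quartile (pandas' default linear interpolation)."""
    return s.quantile(0.25)

def q3(s):
    """Third quartile (pandas' default linear interpolation)."""
    return s.quantile(0.75)

# Same summaries as the tooltip: mean, min, Q1, median, Q3, max
stats = df.groupby("Hour")["Rented Bike Count"].agg(["mean", "min", q1, "median", q3, "max"])
print(stats)
```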

