## Team Name: <3
## Veronica Alejandro
## Lily Hoefner
## Title: 2017 Mini-Challenge 1

A nature preserve used by local residents and tourists for day trips, overnight camping, or simply passing through is seeing a decrease in nesting pairs of a popular bird. Because the park collects data on the vehicles that move through it, we were tasked with analyzing traffic patterns, identifying any unusual behavior that may require further investigation, and helping find out why the number of nesting pairs is decreasing.

In [2]:
# Import the necessary libraries and data
import altair as alt
import pandas as pd
import numpy as np
from datetime import datetime as dt
from dateutil.parser import parse

df = pd.read_csv('Lekagul Sensor Data.csv', parse_dates = ['Timestamp'])
alt.data_transformers.disable_max_rows()


### Car types
- 1: 2-axle car or motorcycle
- 2: 2-axle truck
- 3: 3-axle truck
- 4: 4+ axle truck
- 5: 2-axle bus
- 6: 3-axle bus
- 2P: Park service vehicle
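The vehicle-type codes above can be kept as a small lookup table so charts and tables can show readable names instead of codes. This is a minimal sketch: `CAR_TYPES`, `label_car_types`, and the `car-label` column are hypothetical names introduced here, and it assumes the `car-type` column holds the codes as strings (which pandas does when `2P` is present).

```python
import pandas as pd

# Lookup table for the car-type codes listed above. Codes are strings
# because '2P' is not numeric.
CAR_TYPES = {
    '1': '2-axle car or motorcycle',
    '2': '2-axle truck',
    '3': '3-axle truck',
    '4': '4+ axle truck',
    '5': '2-axle bus',
    '6': '3-axle bus',
    '2P': 'Park service vehicle',
}


def label_car_types(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame with a readable (hypothetical) 'car-label' column."""
    out = frame.copy()
    # astype(str) guards against codes that were parsed as integers
    out['car-label'] = out['car-type'].astype(str).map(CAR_TYPES)
    return out
```

Calling `label_car_types(df)` leaves `df` unchanged and returns a labeled copy.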

### Sensor types
- Entrances: all vehicles may pass
- General gates: all vehicles may pass
- Gates: only park service vehicles may pass
- Ranger stops: record all traffic passing by (any vehicle type), but are used by park service workers
- Camping: record all vehicles passing by
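The sensor categories above can be recovered from each reading's `gate-name` by its prefix. A minimal sketch: the `entrance`, `gate`, and `camping` prefixes match the gate names used later in this notebook, but `general-gate` and `ranger-stop` are assumed prefixes, so check `df['gate-name'].unique()` before relying on them. `sensor_category` and `readings_per_category` are hypothetical helper names.

```python
import pandas as pd

# Prefix -> sensor category. 'general-gate' and 'ranger-stop' are ASSUMED
# naming conventions; verify them against df['gate-name'].unique().
SENSOR_PREFIXES = [
    ('entrance', 'Entrance'),
    ('general-gate', 'General gate'),
    ('gate', 'Gate (restricted)'),
    ('ranger-stop', 'Ranger stop'),
    ('camping', 'Camping'),
]


def sensor_category(gate_name: str) -> str:
    """Map a gate-name such as 'camping3' to its sensor category."""
    for prefix, category in SENSOR_PREFIXES:
        if gate_name.startswith(prefix):
            return category
    # anything that matches no known prefix
    return 'Other'


def readings_per_category(frame: pd.DataFrame) -> pd.Series:
    """Count sensor readings in each category, largest first."""
    return frame['gate-name'].map(sensor_category).value_counts()
```

Because `startswith` checks the beginning of the name, `general-gate1` does not match the `gate` prefix, so restricted gates stay separate from general gates.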
 


## Visualization 1

How does vehicular traffic change throughout the day for each day of the week?

In [None]:
# create a column for day of the month
df['day'] = df['Timestamp'].dt.day

# create a column for the month number
df['month'] = df['Timestamp'].dt.month

# create a column for ISO week of the year (.dt.week was removed in pandas 2.0)
df['week'] = df['Timestamp'].dt.isocalendar().week

# create a column for day_name
df['day_name'] = df['Timestamp'].dt.day_name()

# create a column for hour of the day
df['hour'] = df['Timestamp'].dt.hour

# create a column for the year
df['year'] = df['Timestamp'].dt.year

df.dtypes


In [None]:
# Visualization 1
# count readings per hour, weekday and vehicle type (summed over the whole record)
df2 = df.groupby(['hour', 'day_name', 'car-type']).size().reset_index(name='counts')

# dropdown that filters to one vehicle type; the extra None option is labeled 'All'
options = list(df2['car-type'].unique())
labels = [str(option) for option in options]

dropdown_category = alt.binding_select(options=options + [None], name='Select Vehicle Type', labels=labels + ['All'])
selection_category = alt.selection_single(fields=['car-type'], bind=dropdown_category)

base = alt.Chart(df2).mark_line(strokeWidth=2).encode(
    x = alt.X('hour'), 
    y = alt.Y('counts', axis = alt.Axis(title = 'Counts')),
    facet = alt.Facet('day_name:N', columns = 1, align = 'each',
                      title = 'Total Vehicle Activity per Hour',
                      sort = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']),
    tooltip = 'counts',
    color = alt.Color(
        'car-type:N',
        scale=alt.Scale(domain=('1', '2', '3', '4', '5', '6', '2P'), 
                        range=['#ea5545', '#ef9b20', '#ede15b', '#bdcf32', 
                               '#87bc45', "#27aeef", "#b33dc6"]),
        title = 'Vehicle Type')
    ).properties(width=400, height=60)
  
filter_vehicle = base.add_selection(
    selection_category
).transform_filter(
    selection_category
)

filter_vehicle
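Visualization 1 sums every reading in the record for each weekday and hour. Dividing those sums by the number of distinct calendar dates for each weekday gives an average hourly count, which can make weekdays easier to compare in a table. A minimal sketch, assuming `Timestamp` is already parsed as datetimes; `average_hourly_traffic` is a hypothetical helper name.

```python
import pandas as pd

WEEKDAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
            'Friday', 'Saturday', 'Sunday']


def average_hourly_traffic(frame: pd.DataFrame) -> pd.DataFrame:
    """Average readings per hour (columns 0-23) for each weekday (rows).

    Totals are divided by the number of distinct calendar dates observed
    for that weekday, so each cell is a per-day average.
    """
    ts = frame['Timestamp']
    # distinct names avoid duplicate level names in the MultiIndex
    day = ts.dt.day_name().rename('day_name')
    hour = ts.dt.hour.rename('hour')
    totals = (frame.groupby([day, hour]).size()
                   .unstack('hour', fill_value=0)
                   .reindex(columns=range(24), fill_value=0))
    # how many different dates fall on each weekday
    n_dates = ts.dt.normalize().groupby(day).nunique()
    return totals.div(n_dates, axis=0).reindex(WEEKDAYS)
```

For example, `average_hourly_traffic(df).idxmax(axis=1)` would list the busiest hour for each weekday.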

## Visualization 2

How long does each vehicle spend in the nature preserve? How are vehicle types distributed across the number of days spent?

In [None]:
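# Time each vehicle spent in the nature preserve:
# first and last sensor reading per car-id (sorted by time first)
ordered = df.sort_values('Timestamp')
first_seen = ordered.drop_duplicates(subset='car-id', keep='first')
last_seen = ordered.drop_duplicates(subset='car-id', keep='last')

merged = pd.merge(first_seen, last_seen, on='car-id')

# whole days between first and last reading
# (astype('timedelta64[D]') raises an error in pandas 2.0)
merged['days'] = (merged['Timestamp_y'] - merged['Timestamp_x']).dt.days
merged = merged.sort_values('days')
# keep only vehicles that stayed at least one full day
merged = merged[merged['days'] >= 1]

merged.head()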
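The same stay lengths can also be computed in one pass with a grouped aggregation, taking the minimum and maximum `Timestamp` per vehicle instead of its first and last rows. This is an alternative sketch, not the approach used above; it does not depend on the file's row order. `stay_lengths` and its output column names are hypothetical.

```python
import pandas as pd


def stay_lengths(frame: pd.DataFrame, min_days: int = 1) -> pd.DataFrame:
    """One row per car-id with first/last reading and whole days in the park."""
    stays = frame.groupby('car-id').agg(
        car_type=('car-type', 'first'),
        first_seen=('Timestamp', 'min'),
        last_seen=('Timestamp', 'max'),
    ).reset_index()
    # .dt.days keeps only the whole-day part of the difference
    stays['days'] = (stays['last_seen'] - stays['first_seen']).dt.days
    return stays[stays['days'] >= min_days].sort_values('days')
```

`stay_lengths(df)` should give the same day counts as `merged['days']`, with the vehicle type in `car_type` instead of `car-type_x`.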

In [None]:
# Visualization 2

interval = alt.selection_interval()

base = alt.Chart(merged).mark_point().encode(
    x = 'days', 
    color = alt.Color(
        'car-type_x:N',
        scale = alt.Scale(domain=('1', '2', '3', '4', '5', '6', '2P'), 
                        range=['#ea5545', '#ef9b20', '#ede15b', '#bdcf32', 
                               '#87bc45', "#27aeef", "#b33dc6"]),
        title = 'Vehicle Type'),
    tooltip = ['car-id', 'days', 'car-type_x']
).properties(
    width = 800,
    title = 'Days Spent in the Park by Vehicle Type'
).add_selection(
    interval
)

hist = alt.Chart(merged).mark_bar().encode(
    x='count()',
    y=alt.Y('car-type_x', title='Vehicle Type'),
    color=alt.Color(
        'car-type_x:N',
        scale=alt.Scale(domain=('1', '2', '3', '4', '5', '6', '2P'),
                        range=['#ea5545', '#ef9b20', '#ede15b', '#bdcf32',
                               '#87bc45', "#27aeef", "#b33dc6"]),
        title='Vehicle Type')
).properties(
    width=800,
    height=80
).transform_filter(
    interval
)

base & hist
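To list the unusually long stays behind the scatter plot, one option is a per-type interquartile-range (IQR) rule: flag a stay if it exceeds Q3 + 1.5 × IQR for its vehicle type. This rule is a common outlier heuristic swapped in here, not part of the original analysis; `flag_long_stays` is a hypothetical helper that expects a frame shaped like `merged`.

```python
import pandas as pd


def flag_long_stays(stays: pd.DataFrame, type_col: str = 'car-type_x',
                    k: float = 1.5) -> pd.DataFrame:
    """Return rows whose 'days' exceeds Q3 + k * IQR within their vehicle type."""
    grouped = stays.groupby(type_col)['days']
    # per-row quartiles of that row's vehicle type
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    upper = q3 + k * (q3 - q1)
    return stays[stays['days'] > upper]
```

`flag_long_stays(merged)[['car-id', 'car-type_x', 'days']]` would then list candidate vehicles for closer inspection.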

## Visualization 3

Is there a particular time of year that people prefer to camp? Which types of vehicles visit the campsites most often? 


In [None]:
# Visualization 3
interval = alt.selection_interval(encodings=['x'])

# add the calendar-date column before subsetting so that camping has it too
df['Date'] = df['Timestamp'].dt.strftime('%Y-%m-%d')

# keep only readings from the nine camping sensors (camping0 ... camping8)
camping_gates = [f'camping{i}' for i in range(9)]
camping = df[df['gate-name'].isin(camping_gates)]

base = alt.Chart(camping).mark_line(size=2).encode(
    x = alt.X('Date:T', axis=alt.Axis(tickSize=0)), 
    y = alt.Y('count()', axis=alt.Axis(title='Counts')),
    color=alt.Color(
        'car-type:N',
        scale=alt.Scale(domain=('1', '2', '3', '4', '5', '6', '2P'), 
                        range=['#ea5545', '#ef9b20', '#ede15b', 
                               '#bdcf32', '#87bc45', "#27aeef", "#b33dc6"]),
        title='Vehicle Type')
    )

chart = base.encode(
    x = alt.X('Date:T', scale=alt.Scale(domain=interval.ref()))
).properties(
    width=750,
    height=300,
    title='Camping Gate Activity Over Time'
)

view = base.add_selection(
    interval
).properties(
    width=750,
    height=50,
)

chart & view
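The time-of-year question can also be checked numerically with a month-by-vehicle-type table of camping readings. A minimal sketch: `monthly_camping_counts` is a hypothetical name, and it assumes camping sensors are the gate names starting with `camping`. Months from different years are combined.

```python
import pandas as pd


def monthly_camping_counts(frame: pd.DataFrame) -> pd.DataFrame:
    """Camping-sensor readings per calendar month (rows) and car-type (columns)."""
    camp = frame[frame['gate-name'].str.startswith('camping')]
    month = camp['Timestamp'].dt.month.rename('month')
    return (camp.groupby([month, 'car-type']).size()
                .unstack('car-type', fill_value=0)
                # include months with no camping readings as zeros
                .reindex(range(1, 13), fill_value=0))
```

`monthly_camping_counts(df).sum(axis=1).idxmax()` would give the month with the most camping readings.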

## Visualization 4

What is the distribution of stays in campgrounds? What is the most popular campground?

In [None]:
import plotly.express as px

In [None]:
campings = ['camping0', 'camping1', 'camping2', 'camping3', 'camping4', 'camping5', 'camping6', 'camping7', 'camping8']
df_camp = df[df['gate-name'].isin(campings)]['gate-name'].value_counts().rename_axis('campground').reset_index(name = 'counts')
df_camp

In [None]:
# Visualization 4
colormap = {'camping0': '#797D62', 'camping1': '#9B9B7A',
            'camping2': '#D9AE94', 'camping3': '#E5C59E',
            'camping4': '#F1DCA7', 'camping5': '#F8D488',
            'camping6': '#E4B074', 'camping7': '#D08C60',
            'camping8': '#997B66'}
fig = px.bar(df_camp, x = 'campground', y = 'counts', title = 'Distribution of Campground Stays', hover_name = 'counts',
             color = 'campground', color_discrete_map = colormap)
fig.show()
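The counts above are sensor readings, so a vehicle that passes one campground sensor several times is counted several times. Counting distinct `car-id`s per campground is one way to compare readings against individual visitors. A minimal sketch; `campground_usage` is a hypothetical helper name.

```python
import pandas as pd


def campground_usage(frame: pd.DataFrame) -> pd.DataFrame:
    """Readings and distinct vehicles per camping sensor, most vehicles first."""
    camp = frame[frame['gate-name'].str.startswith('camping')]
    usage = camp.groupby('gate-name').agg(
        readings=('car-id', 'count'),    # every sensor reading
        vehicles=('car-id', 'nunique'),  # each car-id counted once
    )
    return usage.sort_values('vehicles', ascending=False)
```

A campground with many readings but few vehicles would point to repeated visits by the same vehicles.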

## Visualization 5

What entrance is most utilized? What entrance does each type of vehicle utilize the most?

In [None]:
# Visualization 5
entrance_gates = ['entrance0', 'entrance1', 'entrance2', 'entrance3', 'entrance4']
entrances = df[df['gate-name'].isin(entrance_gates)]

# dropdown that filters to one vehicle type; the extra None option is labeled 'All'
options = list(entrances['car-type'].unique())
labels = [str(option) for option in options]

dropdown_category = alt.binding_select(options=options + [None], name='Select Vehicle Type', labels=labels + ['All'])
selection_category = alt.selection_single(fields=['car-type'], bind=dropdown_category)

base = alt.Chart(entrances).mark_bar().encode(
    x=alt.X('gate-name', title='Entrance', scale=alt.Scale(domain=entrance_gates)),
    # y-axis fixed at 0 to 9000 visits
    y=alt.Y('count()', title='Visits', scale=alt.Scale(domain=[0, 9000])),
    color=alt.Color(
        'car-type:N',
        scale=alt.Scale(domain=('1', '2', '3', '4', '5', '6', '2P'),
                        range=['#ea5545', '#ef9b20', '#ede15b', '#bdcf32',
                               '#87bc45', "#27aeef", "#b33dc6"]),
        title='Vehicle Type'),
    tooltip='count()'
).properties(
    width=500,
    title="Utilization of Each Entrance by Vehicle Type"
).add_selection(
    selection_category
).transform_filter(
    selection_category
).interactive()

base
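Both questions for this visualization can also be answered from a cross-tabulation of vehicle type against entrance. A minimal sketch, assuming entrance sensors are the gate names starting with `entrance`; `preferred_entrance` is a hypothetical helper name.

```python
import pandas as pd


def preferred_entrance(frame: pd.DataFrame) -> pd.Series:
    """Entrance with the most readings for each car-type."""
    ent = frame[frame['gate-name'].str.startswith('entrance')]
    # rows: car-type, columns: entrance, values: number of readings
    table = pd.crosstab(ent['car-type'], ent['gate-name'])
    return table.idxmax(axis=1)
```

For the overall busiest entrance, `entrances['gate-name'].value_counts().idxmax()` gives the single most-used entrance.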

## Visualization 6

Which restricted gates are utilized the most? Are vehicles other than park service vehicles passing through the restricted gates? How does this activity change over time?

In [None]:
# Visualization 6
# restricted gates gate0 ... gate8 (only park service vehicles, car-type 2P, may pass)
gate_names = [f'gate{i}' for i in range(9)]
gates = df[df['gate-name'].isin(gate_names)]

# slider that selects a single month (1-12) to display
select_month = alt.selection_single(name="SelectorName",
                                    fields=['month'],
                                    bind=alt.binding_range(min=1, max=12, step=1, name='Select Month'),
                                    init={'month': 1})

alt.Chart(gates).mark_bar().encode(
    x=alt.X('gate-name', title='Gate',
            sort=alt.EncodingSortField(field="gate-name", op="count", order='descending')),
    y=alt.Y('count()', title='Visits'),
    color=alt.Color(
        'car-type:N',
        scale=alt.Scale(domain=('1', '2', '3', '4', '5', '6', '2P'),
                        range=['#ea5545', '#ef9b20', '#ede15b', '#bdcf32',
                               '#87bc45', "#27aeef", "#b33dc6"]),
        title='Vehicle Type')
).properties(
    width=500,
    title="Restricted Gate Visits by Month"
).add_selection(
    select_month
).transform_filter(
    select_month
)
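Since only park service vehicles (`2P`) may pass the restricted gates, any other vehicle type recorded there is worth investigating. A minimal sketch that counts those readings by month and gate: `restricted_gate_violations` is a hypothetical helper, it assumes restricted gates are named `gate` followed by digits, and, like the month slider above, it combines the same month from different years.

```python
import pandas as pd


def restricted_gate_violations(frame: pd.DataFrame) -> pd.DataFrame:
    """Non-park-service readings at restricted gates, by month (rows) and gate (columns)."""
    # 'gate' followed only by digits, so names like 'general-gate1' are excluded
    at_gate = frame['gate-name'].str.fullmatch(r'gate\d+')
    suspicious = frame[at_gate & (frame['car-type'] != '2P')]
    month = suspicious['Timestamp'].dt.month.rename('month')
    return (suspicious.groupby([month, 'gate-name']).size()
                      .unstack('gate-name', fill_value=0))
```

Rows of `restricted_gate_violations(df)` with large counts point to the months in which restricted gates saw the most non-park-service traffic.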

### Conclusion

- The vehicle sensor data alone is not enough to determine what is causing the nesting pairs to decline
- It does reveal unusual patterns and outliers that are worth a closer look
- Investigating this suspicious activity further may uncover what is happening in the preserve
