# Insights from City Supply and Demand Data

## Contents

1. [Assignment](#Assignment)
2. [Data Exploration](#Data-Exploration)

### Assignment

Using the provided dataset, answer the following questions:

1. Which date had the most completed trips during the two-week period?

2. What was the highest number of completed trips within a 24-hour period?

3. Which hour of the day had the most requests during the two-week period?

4. What percentage of all zeroes during the two-week period occurred on a weekend (Friday at 5 pm to Sunday at 3 am)? Tip: The local time value is the start of the hour (e.g. 15 is the hour from 3:00 pm to 4:00 pm).

5. What is the weighted average ratio of completed trips per driver during the two-week period? Tip: "Weighted average" means your answer should account for the total trip volume in each hour to determine the most accurate number for the whole period.

6. In drafting a driver schedule in terms of 8-hour shifts, when are the busiest 8 consecutive hours over the two-week period in terms of unique requests? A new shift starts every 8 hours. Assume that a driver will work the same shift each day.

7. True or False: Driver supply always increases when demand increases during the two-week period. Tip: Visualize the data to confirm your answer if needed.

8. In which 72-hour period is the ratio of Zeroes to Eyeballs the highest?

9. If you could add 5 drivers to any single hour of every day during the two-week period, which hour should you add them to? Hint: Consider both rider eyeballs and driver supply when choosing.

10. True or False: There are exactly two weeks of data in this analysis.

11. Looking at the data from all two weeks, which time might make the most sense to consider a true "end of day" instead of midnight (i.e. when are supply and demand both at their natural minimums)? Tip: Visualize the data to confirm your answer if needed.
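
The tip in question 5 can be illustrated on a few made-up numbers. The sketch below is one possible reading of "weighted average", assuming each hour's trips-per-driver ratio is weighted by that hour's share of all completed trips. The `toy` frame and its values are hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data (three hours), NOT values from the dataset.
toy = pd.DataFrame({
    "Completed Trips": [2, 6, 12],
    "Unique Drivers": [4, 6, 8],
})

# Per-hour ratio of completed trips to drivers.
ratio = toy["Completed Trips"] / toy["Unique Drivers"]

# Weight of each hour = its share of the total trip volume (assumed weighting).
weights = toy["Completed Trips"] / toy["Completed Trips"].sum()

# Weighted average: busy hours count for more than quiet ones.
weighted_avg = (ratio * weights).sum()

# Unweighted mean of the hourly ratios, for comparison.
simple_avg = ratio.mean()

print(f"weighted: {weighted_avg:.2f}, unweighted: {simple_avg:.2f}")
```

On the real data, hours with zero unique drivers would need separate handling before dividing, since the ratio is undefined there.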


### Data Exploration

In [1]:
import pandas as pd

In [31]:
data = pd.read_csv('uber_dataset.csv')
data.head(20)

Unnamed: 0,Date,Time (Local),Eyeballs,Zeroes,Completed Trips,Requests,Unique Drivers
0,10-Sep-12,7,5,0,2,2,9
1,,8,6,0,2,2,14
2,,9,8,3,0,0,14
3,,10,9,2,0,1,14
4,,11,11,1,4,4,11
5,,12,12,0,2,2,11
6,,13,9,1,0,0,9
7,,14,12,1,0,0,9
8,,15,11,2,1,2,7
9,,16,11,2,3,4,6


In [20]:
data.iloc[:10]

Unnamed: 0,Date,Time (Local),Eyeballs,Zeroes,Completed Trips,Requests,Unique Drivers
0,10-Sep-12,7,5,0,2,2,9
1,,8,6,0,2,2,14
2,,9,8,3,0,0,14
3,,10,9,2,0,1,14
4,,11,11,1,4,4,11
5,,12,12,0,2,2,11
6,,13,9,1,0,0,9
7,,14,12,1,0,0,9
8,,15,11,2,1,2,7
9,,16,11,2,3,4,6


Taking the last row above as an example: during the hour beginning at 4 pm (hour 16) on September 10th, 2012, 11 people opened the Uber app (Eyeballs). Two of them did not see any car (Zeroes), and 4 of them requested a car (Requests). Of those 4 requests, only 3 resulted in completed trips (Completed Trips). During this hour, a total of 6 drivers were logged in (Unique Drivers).
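
The same reading can be turned into two simple per-hour rates. This is a minimal sketch using only the values from the hour-16 row described above. The names `zero_rate` and `completion_rate` are illustrative and do not come from the dataset.

```python
import pandas as pd

# Values copied from the row for hour 16 on 10-Sep-12.
row = pd.Series({
    "Time (Local)": 16,
    "Eyeballs": 11,
    "Zeroes": 2,
    "Completed Trips": 3,
    "Requests": 4,
    "Unique Drivers": 6,
})

# Share of app opens where no car was visible.
zero_rate = row["Zeroes"] / row["Eyeballs"]

# Share of requests that ended in a completed trip.
completion_rate = row["Completed Trips"] / row["Requests"]

print(f"zero rate: {zero_rate:.1%}, completion rate: {completion_rate:.1%}")
```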

In [22]:
data.shape

(336, 7)

In [23]:
data.describe()

Unnamed: 0,Time (Local),Eyeballs,Zeroes,Completed Trips,Requests,Unique Drivers
count,336.0,336.0,336.0,336.0,336.0,336.0
mean,11.5,19.901786,4.252976,4.0625,5.529762,7.895833
std,6.93251,16.902862,5.795391,5.672581,7.399416,5.884296
min,0.0,0.0,0.0,0.0,0.0,0.0
25%,5.75,9.0,1.0,0.0,1.0,3.0
50%,11.5,17.0,3.0,2.0,3.0,8.0
75%,17.25,25.0,5.0,5.0,6.25,11.0
max,23.0,99.0,59.0,36.0,46.0,30.0


In [27]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 336 entries, 0 to 335
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Date              15 non-null     object
 1   Time (Local)      336 non-null    int64 
 2   Eyeballs          336 non-null    int64 
 3   Zeroes            336 non-null    int64 
 4   Completed Trips   336 non-null    int64 
 5   Requests          336 non-null    int64 
 6   Unique Drivers    336 non-null    int64 
dtypes: int64(6), object(1)
memory usage: 18.5+ KB


In [28]:
data.isnull().sum()

Date                321
Time (Local)          0
Eyeballs              0
Zeroes                0
Completed Trips       0
Requests              0
Unique Drivers        0
dtype: int64

As the counts show, only the "Date" column has null values. The date is recorded only on the first row of each day and is left empty until the next day starts. We will therefore fill the missing values using the forward-fill method, which carries the last non-null date down to the rows that follow it.

In [32]:
# fillna(method='ffill') is deprecated in recent pandas releases; ffill() is the equivalent call
data = data.ffill()
data.head(20)

Unnamed: 0,Date,Time (Local),Eyeballs,Zeroes,Completed Trips,Requests,Unique Drivers
0,10-Sep-12,7,5,0,2,2,9
1,10-Sep-12,8,6,0,2,2,14
2,10-Sep-12,9,8,3,0,0,14
3,10-Sep-12,10,9,2,0,1,14
4,10-Sep-12,11,11,1,4,4,11
5,10-Sep-12,12,12,0,2,2,11
6,10-Sep-12,13,9,1,0,0,9
7,10-Sep-12,14,12,1,0,0,9
8,10-Sep-12,15,11,2,1,2,7
9,10-Sep-12,16,11,2,3,4,6
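
Once every row carries a date, "Date" and "Time (Local)" can be combined into a single timestamp, which makes weekday-based questions easier to answer. The following is a minimal sketch on a hypothetical toy frame laid out like the dataset. The column names `Timestamp` and `Weekday` are assumptions introduced here, and the `%d-%b-%y` format assumes English month abbreviations such as "Sep".

```python
import pandas as pd

# Hypothetical toy frame in the dataset's layout:
# the date appears only on the first row of each day.
toy = pd.DataFrame({
    "Date": ["10-Sep-12", None, "11-Sep-12", None],
    "Time (Local)": [22, 23, 0, 1],
})

# Forward fill: carry the last seen date down to the empty rows.
toy["Date"] = toy["Date"].ffill()

# Parse the date and add the hour offset to get one timestamp per row.
toy["Timestamp"] = (
    pd.to_datetime(toy["Date"], format="%d-%b-%y")
    + pd.to_timedelta(toy["Time (Local)"], unit="h")
)

# Weekday name, useful for weekend filters.
toy["Weekday"] = toy["Timestamp"].dt.day_name()

print(toy)
```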
