# Overview of the Project
- Import your data into a Pandas DataFrame.
- Merge your DataFrames.
- Create a bubble chart that showcases the average fare versus the total number of rides with bubble size based on the total number of drivers for each city type, including urban, suburban, and rural.
- Determine the mean, median, and mode for the following:
   - The total number of rides for each city type.
   - The average fares for each city type.
   - The total number of drivers for each city type.
- Create box-and-whisker plots that visualize each of the following to determine if there are any outliers:
   - The number of rides for each city type.
   - The fares for each city type.
   - The number of drivers for each city type.
- Create a pie chart that visualizes each of the following data for each city type:
   - The percent of total fares.
   - The percent of total rides.
   - The percent of total drivers.
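
As a rough illustration of the summary-statistics step in the list above, the sketch below computes the mean, median, and mode of ride counts per city type. The `toy_df` frame, its `ride_count` column, and all of its values are hypothetical stand-ins, not the project data.

```python
import pandas as pd

# Hypothetical ride counts per city; the real project derives these
# from the merged ride and city data.
toy_df = pd.DataFrame({
    "type": ["Urban", "Urban", "Urban", "Rural", "Rural"],
    "ride_count": [20, 24, 24, 5, 7],
})

# Mean and median of the ride counts for each city type.
summary = toy_df.groupby("type")["ride_count"].agg(["mean", "median"])
print(summary)

# Series.mode() returns every value tied for most frequent, so it is a list.
urban_mode = toy_df.loc[toy_df["type"] == "Urban", "ride_count"].mode().tolist()
print(urban_mode)
```

The mode is computed separately because a group can have more than one mode, which does not fit in a single table cell.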

In [1]:
# Add Matplotlib inline magic command
%matplotlib inline
# Dependencies and Setup
import matplotlib.pyplot as plt
import pandas as pd

import numpy as np
import statistics

In [2]:
# Files to load
city_data_to_load = "Resources/city_data.csv"
ride_data_to_load = "Resources/ride_data.csv"

In [3]:
# Read the city data file and store it in a pandas DataFrame.
city_data_df = pd.read_csv(city_data_to_load)
city_data_df.head(10)

Unnamed: 0,city,driver_count,type
0,Richardfort,38,Urban
1,Williamsstad,59,Urban
2,Port Angela,67,Urban
3,Rodneyfort,34,Urban
4,West Robert,39,Urban
5,West Anthony,70,Urban
6,West Angela,48,Urban
7,Martinezhaven,25,Urban
8,Karenberg,22,Urban
9,Barajasview,26,Urban


In [4]:
# Read the ride data file and store it in a pandas DataFrame.
ride_data_df = pd.read_csv(ride_data_to_load)
ride_data_df.head(10)

Unnamed: 0,city,date,fare,ride_id
0,Lake Jonathanshire,2019-01-14 10:14:22,13.83,5739410935873
1,South Michelleport,2019-03-04 18:24:09,30.24,2343912425577
2,Port Samanthamouth,2019-02-24 04:29:00,33.44,2005065760003
3,Rodneyfort,2019-02-10 23:22:03,23.44,5149245426178
4,South Jack,2019-03-06 04:28:35,34.58,3908451377344
5,South Latoya,2019-03-11 12:26:48,9.52,1994999424437
6,New Paulville,2019-02-27 11:17:56,43.25,793208410091
7,Simpsonburgh,2019-04-26 00:43:24,35.98,111953927754
8,South Karenland,2019-01-08 03:28:48,35.09,7995623208694
9,North Jasmine,2019-03-09 06:26:29,42.81,5327642267789


### Inspect the City Data DataFrame
1. Get all the rows that contain null values.
2. Make sure the driver_count column has an integer data type.
3. Find out how many data points there are for each type of city.
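
The first two checks in the list above can also be done directly, without reading the counts by eye. This is a minimal sketch on a hypothetical `toy_city_df` (the names and values are made up, and one null is planted on purpose); in the notebook the same calls would apply to `city_data_df`.

```python
import pandas as pd

# Hypothetical toy data with one deliberately missing city type.
toy_city_df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "driver_count": [38, 59, 7, 12],
    "type": ["Urban", "Urban", None, "Rural"],
})

# Check 1: select the rows that contain at least one null value.
rows_with_nulls = toy_city_df[toy_city_df.isnull().any(axis=1)]
print(rows_with_nulls)

# Check 2: True only if driver_count is stored with an integer dtype.
driver_count_is_int = pd.api.types.is_integer_dtype(toy_city_df["driver_count"])
print(driver_count_is_int)
```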

In [5]:
# Count the non-null values in each column.
city_data_df.count()

city            120
driver_count    120
type            120
dtype: int64

In [7]:
# Count the null values in each column.
city_data_df.isnull().sum()

city            0
driver_count    0
type            0
dtype: int64

In [8]:
# Check whether the driver_count column has a numerical data type.
# Get the data types of each column.
city_data_df.dtypes

city            object
driver_count     int64
type            object
dtype: object

In [9]:
# Check how many types of city there are using the unique() method.
# Get the unique values of the "type" column.
city_data_df["type"].unique()

array(['Urban', 'Suburban', 'Rural'], dtype=object)

In [10]:
# Get the number of data points from the Urban cities
sum(city_data_df['type']=='Urban')

66

In [14]:
# Get the number of data points from the Suburban cities
sum(city_data_df['type']=='Suburban')

36

In [15]:
# Get the number of data points from the Rural cities
sum(city_data_df['type']=='Rural')

18
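
The three `sum(...)` comparisons above can likely be collapsed into one call with `value_counts()`. The sketch below assumes a hypothetical `toy_city_df` with made-up cities; only the `type` column name matches the notebook.

```python
import pandas as pd

# Hypothetical stand-in for city_data_df with the same "type" column.
toy_city_df = pd.DataFrame({
    "city": ["A", "B", "C", "D", "E", "F"],
    "type": ["Urban", "Urban", "Urban", "Suburban", "Suburban", "Rural"],
})

# One call returns the number of data points for every city type.
type_counts = toy_city_df["type"].value_counts()
print(type_counts)

# Individual counts are still available by label.
urban_count = type_counts["Urban"]
```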

### Inspect Ride Data DataFrame

In [16]:
# Count the non-null values in each column.
ride_data_df.count()

city       2375
date       2375
fare       2375
ride_id    2375
dtype: int64

In [17]:
# Count the null values in each column.
ride_data_df.isnull().sum()

city       0
date       0
fare       0
ride_id    0
dtype: int64

In [18]:
# Get the data types of each column.
ride_data_df.dtypes

city        object
date        object
fare       float64
ride_id      int64
dtype: object
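
The output above shows `date` stored as `object`, meaning plain strings. If later steps need date arithmetic or grouping by month, one option is to convert the column with `pd.to_datetime`. This is a sketch on a hypothetical `toy_ride_df` built from two of the rows shown earlier, not a step the notebook itself performs.

```python
import pandas as pd

# Two rows copied from the ride data preview, used as a toy frame.
toy_ride_df = pd.DataFrame({
    "city": ["Rodneyfort", "South Jack"],
    "date": ["2019-02-10 23:22:03", "2019-03-06 04:28:35"],
    "fare": [23.44, 34.58],
})

# Convert the string column to a datetime dtype.
toy_ride_df["date"] = pd.to_datetime(toy_ride_df["date"])
print(toy_ride_df.dtypes)

# Datetime columns expose components such as the month through .dt.
months = toy_ride_df["date"].dt.month.tolist()
```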

### Merge DataFrames

In [19]:
#  new_df = pd.merge(leftdf, rightdf,
#                    left_on="column_leftdf", right_on="column_rightdf")
# When the key column has the same name in both DataFrames, use on="column" instead.

# We may have to merge the DataFrames using the 'how=' parameter: left, right, inner, or outer.
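
To see what the `how=` choice changes, the sketch below merges two tiny frames four ways and compares the row counts. The frames `rides` and `cities` and the city names in them are hypothetical, chosen so that each side has one key the other lacks.

```python
import pandas as pd

# Hypothetical toy frames: "Ghosttown" has a ride but no city record,
# and "Emptyville" has a city record but no rides.
rides = pd.DataFrame({
    "city": ["Alpha", "Alpha", "Ghosttown"],
    "fare": [10.0, 12.5, 30.0],
})
cities = pd.DataFrame({
    "city": ["Alpha", "Emptyville"],
    "driver_count": [5, 2],
})

# Merge with each join type and record how many rows come back.
row_counts = {
    how: len(pd.merge(rides, cities, how=how, on="city"))
    for how in ["left", "right", "inner", "outer"]
}
print(row_counts)
```

A left merge keeps every ride, a right merge keeps every city, an inner merge keeps only the keys present in both frames, and an outer merge keeps everything, filling the gaps with `NaN`.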

In [21]:
# Combine the data into a single dataset on the "city" column.
pyber_data_df = pd.merge(ride_data_df, city_data_df,
                         how="left", on="city")
# Display the DataFrame
pyber_data_df.head(10)

Unnamed: 0,city,date,fare,ride_id,driver_count,type
0,Lake Jonathanshire,2019-01-14 10:14:22,13.83,5739410935873,5,Urban
1,South Michelleport,2019-03-04 18:24:09,30.24,2343912425577,72,Urban
2,Port Samanthamouth,2019-02-24 04:29:00,33.44,2005065760003,57,Urban
3,Rodneyfort,2019-02-10 23:22:03,23.44,5149245426178,34,Urban
4,South Jack,2019-03-06 04:28:35,34.58,3908451377344,46,Urban
5,South Latoya,2019-03-11 12:26:48,9.52,1994999424437,10,Urban
6,New Paulville,2019-02-27 11:17:56,43.25,793208410091,44,Urban
7,Simpsonburgh,2019-04-26 00:43:24,35.98,111953927754,21,Urban
8,South Karenland,2019-01-08 03:28:48,35.09,7995623208694,4,Urban
9,North Jasmine,2019-03-09 06:26:29,42.81,5327642267789,33,Urban
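
A left merge like the one above should return exactly one row per ride, with no ride left without a city type. The sketch below shows one way to check that on hypothetical toy frames (`toy_rides`, `toy_cities`; the second Rodneyfort fare is made up). The `validate="many_to_one"` argument makes pandas raise an error if a city appears more than once in the right-hand frame.

```python
import pandas as pd

# Toy frames modeled on the notebook's columns; values are illustrative.
toy_rides = pd.DataFrame({
    "city": ["Rodneyfort", "Rodneyfort", "South Jack"],
    "fare": [23.44, 10.00, 34.58],
})
toy_cities = pd.DataFrame({
    "city": ["Rodneyfort", "South Jack"],
    "driver_count": [34, 46],
    "type": ["Urban", "Urban"],
})

# validate="many_to_one" checks that each city is unique in toy_cities.
merged = pd.merge(toy_rides, toy_cities, how="left", on="city",
                  validate="many_to_one")

# The merge should neither add nor drop rides.
same_row_count = len(merged) == len(toy_rides)

# Rides whose city had no match would show a null type.
unmatched_rides = int(merged["type"].isnull().sum())
print(same_row_count, unmatched_rides)
```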


In [26]:
# Test a right merge, which keeps every row of city_data_df.
test_pyber_data_df = pd.merge(ride_data_df, city_data_df,
                         how="right", on="city")
# Display the DataFrame
test_pyber_data_df.head(25)

Unnamed: 0,city,date,fare,ride_id,driver_count,type
0,Richardfort,2019-02-24 08:40:38,13.93,5628545007794,38,Urban
1,Richardfort,2019-02-13 12:46:07,14.0,910050116494,38,Urban
2,Richardfort,2019-02-16 13:52:19,17.92,820639054416,38,Urban
3,Richardfort,2019-02-01 20:18:28,10.26,9554935945413,38,Urban
4,Richardfort,2019-04-17 02:26:37,23.0,720020655850,38,Urban
5,Richardfort,2019-04-21 03:44:04,9.54,3698147103219,38,Urban
6,Richardfort,2019-02-03 00:14:26,29.04,4982665519010,38,Urban
7,Richardfort,2019-02-08 15:50:12,16.55,2270463070874,38,Urban
8,Richardfort,2019-04-03 15:07:34,40.77,9496210735824,38,Urban
9,Richardfort,2019-02-19 14:09:20,27.11,8690324801449,38,Urban
