# Tourist or Resident Perspective

In this notebook, we will analyze the taxi data from the perspective of a tourist or resident who wants to explore interesting parts of the city.

Taxi dropoffs are used as a proxy for the "popularity" of a given location. We will rank-order the taxi zones contained in the data by popularity at various times of the day and week.
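The ranking described above can be sketched as a grouped count. This is a minimal illustration, not the notebook's actual query: the rows are invented, `cn_demo` and the `trips` table are hypothetical names, and only the column names (`dropoff_zone`, `do_weekday`, `do_hour`) are taken from the `taxiJulEnrich` view.

```python
import sqlite3

# Toy stand-in for the trip data; these rows are invented purely for illustration.
# Column names follow the taxiJulEnrich view: dropoff_zone, do_weekday, do_hour.
trips = [
    ("Midtown Center", 5, 18),
    ("Midtown Center", 5, 18),
    ("Midtown Center", 6, 23),
    ("East Village", 6, 23),
    ("East Village", 6, 23),
    ("East Village", 6, 23),
    ("JFK Airport", 5, 18),
]

cn_demo = sqlite3.connect(":memory:")  # hypothetical in-memory database
cn_demo.execute(
    "CREATE TABLE trips (dropoff_zone TEXT, do_weekday INTEGER, do_hour INTEGER)"
)
cn_demo.executemany("INSERT INTO trips VALUES (?, ?, ?)", trips)

# Count dropoffs per zone within each (weekday, hour) slot, most popular first.
ranking = cn_demo.execute(
    """
    SELECT do_weekday, do_hour, dropoff_zone, COUNT(*) AS dropoffs
    FROM trips
    GROUP BY do_weekday, do_hour, dropoff_zone
    ORDER BY do_weekday, do_hour, dropoffs DESC, dropoff_zone
    """
).fetchall()

for row in ranking:
    print(row)
```

Against the real database, the same `SELECT` could be passed to `pd.read_sql_query` with the notebook's connection to get the ranking as a DataFrame.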

Then, we will use machine learning to discover hotspots of activity. This will be achieved by applying a clustering algorithm (unsupervised learning) to the latitude and longitude coordinates contained in the data.
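To make the clustering idea concrete, here is a small sketch using plain k-means. The text does not say which clustering algorithm is used, so k-means is an assumption chosen for simplicity. The coordinates are invented, the function `kmeans` is a hypothetical helper, and treating degrees as flat Euclidean coordinates is only a rough approximation that is tolerable over a single city.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [
            (sum(lat for lat, _ in c) / len(c), sum(lon for _, lon in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Invented (latitude, longitude) dropoff points forming two loose groups.
dropoffs = [
    (40.758, -73.985), (40.759, -73.986), (40.757, -73.984),
    (40.641, -73.778), (40.642, -73.779), (40.640, -73.777),
]

# Seed with the first and last point so the run is deterministic.
centers, groups = kmeans(dropoffs, [dropoffs[0], dropoffs[-1]])

for center, members in zip(centers, groups):
    print(center, len(members))
```

Each resulting centroid can be read as a candidate hotspot, and the size of its group as a rough measure of activity. On the full dataset a library implementation would be a more practical choice than this hand-written loop.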

In [1]:
%%time
#Import the necessary Python libraries. The %%time cell magic reports the execution time of the cell
import sqlite3         # Provides powerful relational database query capabilities using the SQL language
import pandas as pd    # Pandas provides a powerful DataFrame to manipulate and analyze tabular data

Wall time: 1.22 s


In [12]:
%%time
#We connect to a SQLite database. This database was prepared using the notebook "00 Prepare Taxi Trip Data"
#Examine the contents of this database by looking at the sqlite_master table
cn = sqlite3.connect('taxiJul.db')
master = pd.read_sql_query("SELECT * from sqlite_master;",cn)
master

Wall time: 3.01 ms
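As a small illustration of what `sqlite_master` holds, the sketch below builds a throwaway in-memory database and lists its tables and views. The names `cn_demo`, `trips`, and `evening_trips` are hypothetical and unrelated to the contents of `taxiJul.db`.

```python
import sqlite3

cn_demo = sqlite3.connect(":memory:")  # hypothetical in-memory database
cn_demo.execute("CREATE TABLE trips (dropoff_zone TEXT, do_hour INTEGER)")
cn_demo.execute("CREATE VIEW evening_trips AS SELECT * FROM trips WHERE do_hour >= 18")

# Each row of sqlite_master describes one schema object;
# the `type` column distinguishes tables from views.
objects = cn_demo.execute(
    "SELECT type, name FROM sqlite_master WHERE type IN ('table', 'view') ORDER BY name"
).fetchall()
print(objects)
```

Filtering on `type` and selecting only `name` keeps the listing short when a database contains many indexes or long `CREATE` statements.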


In [10]:
#We are interested in the taxiJulEnrich view
#Read the first row of this view to examine the columns available
sample = pd.read_sql_query("SELECT * from taxiJulEnrich LIMIT 1;",cn)
sample.columns

Index(['VendorID', 'tpep_pickup_datetime', 'tpep_dropoff_datetime',
       'passenger_count', 'trip_distance', 'pickup_longitude',
       'pickup_latitude', 'RatecodeID', 'store_and_fwd_flag',
       'dropoff_longitude', 'dropoff_latitude', 'payment_type', 'fare_amount',
       'extra', 'mta_tax', 'tip_amount', 'tolls_amount',
       'improvement_surcharge', 'total_amount', 'pickup_OBJECTID',
       'pickup_Shape_Leng', 'pickup_Shape_Area', 'pickup_zone',
       'pickup_LocationID', 'pickup_borough', 'dropoff_OBJECTID',
       'dropoff_Shape_Leng', 'dropoff_Shape_Area', 'dropoff_zone',
       'dropoff_Location', 'dropoff_borough', 'count', 'pu_year', 'pu_month',
       'pu_day', 'pu_hour', 'pu_minute', 'pu_second', 'pu_weekday', 'do_year',
       'do_month', 'do_day', 'do_hour', 'do_minute', 'do_second', 'do_weekday',
       'pu_latlong'],
      dtype='object')

In [None]:
%%time
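#In this step, we read `01_tourist_resident` from the SQLite database into a DataFrame
#do_weekday is also cast to an integer (ndo_weekday) so it can be matched against numeric weekday codes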
df = pd.read_sql_query("SELECT *,CAST(`do_weekday` AS INTEGER) AS ndo_weekday FROM `01_tourist_resident`;", cn)
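The `CAST` in this query suggests that `do_weekday` is stored as text. One plausible reason, which this notebook does not confirm, is that the column was derived with SQLite's `strftime('%w', ...)`, which returns the weekday as a text value with Sunday as 0. The sketch below shows that behaviour on an arbitrary example date; `cn_demo` is a hypothetical connection.

```python
import sqlite3
from datetime import date

cn_demo = sqlite3.connect(":memory:")  # hypothetical in-memory database

# 5 July 2015 fell on a Sunday; the date is only an example.
raw, as_int = cn_demo.execute(
    "SELECT strftime('%w', '2015-07-05'), "
    "CAST(strftime('%w', '2015-07-05') AS INTEGER)"
).fetchone()

print(repr(raw), repr(as_int))     # strftime yields text, CAST yields an integer
print(date(2015, 7, 5).weekday())  # Python's own convention differs: Monday is 0
```

If that assumption holds, the integer form is what allows the column to be matched against a lookup keyed by the numbers 0 to 6, and any weekday computed in Python with `date.weekday()` would need converting before it could be compared.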

In [None]:
df.dtypes

In [None]:
df.columns

In [None]:
#Lookup table from weekday number (0 = Sunday) to weekday name; the integer index 0-6 is the merge key
weekdays = pd.DataFrame({'weekday': ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']})

In [None]:
weekdays

In [None]:
merged = df.merge(weekdays,how='left',left_on='ndo_weekday',right_index=True)

In [None]:
merged.dtypes

In [None]:
%%time
merged.to_csv('01_Tourist_Resident.csv', index_label='ROWID')
#merged.to_excel('01_Tourist_Resident.xlsx',sheet_name='Data',index_label='ROWID')
#weekdays.to_excel('01_Tourist_Resident.xlsx',sheet_name='Weekdays')

In [None]:
?merged.to_csv

In [None]:
%%time
sample = pd.read_sql_query("SELECT * FROM taxiJulEnrich LIMIT 1000;", cn)

In [None]:
sample.columns

In [None]:
minutes = sample[['pu_minute','tip_amount']]

In [None]:
minutes

In [None]:
counts

In [None]:
n = counts[0]