# 0 Download And Prepare Taxi Trip Data

We will start by downloading the data from the New York City Taxi and Limousine Commission's website. For our analysis, we picked the yellow taxi trip data for July 2015. We chose a 2015 data set because it records individual pickups and dropoffs at the latitude/longitude level, which will come in handy in Step 3. Beginning in 2016, the data is no longer provided at the latitude/longitude level.

In [1]:
#The first step is to download and install Wget, a tool you can use to download the data files in the next cell
#If you are running Windows, you can get Wget for Windows (wget.exe) here: http://gnuwin32.sourceforge.net/packages/wget.htm
#On Linux, wget is usually already installed; on macOS you may need to install it separately (for example with a package manager)

In [None]:
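#The next step is to download the taxi trip data using wget (installed in the previous step)
#--continue resumes a partially downloaded file instead of starting over
!wget --continue https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2015-07.csv
#ren is the Windows rename command; on macOS or Linux use mv instead
!ren yellow_tripdata_2015-07.csv taxiJul.csv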
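Because `ren` only exists on Windows, the rename step can also be done in Python so the cell behaves the same on every platform. The following is a minimal sketch; the helper name `rename_download` is hypothetical and not part of the original notebook.

```python
from pathlib import Path


def rename_download(src, dst):
    """Rename src to dst, overwriting dst if it already exists."""
    src_path = Path(src)
    if not src_path.exists():
        # Fail early with a readable message if the download did not happen
        raise FileNotFoundError(f"{src} not found; download it first")
    # Path.replace overwrites an existing destination on Windows, macOS and Linux
    src_path.replace(dst)
    return Path(dst)
```

In the notebook this would be called as `rename_download('yellow_tripdata_2015-07.csv', 'taxiJul.csv')` after the download has finished.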

In [3]:
%%time
import pandas as pd
sample = pd.read_csv('taxiJul.csv',nrows=10)
sample.columns

Wall time: 5.01 ms


In [None]:
%%time
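#Create the table, then import the CSV file into it
#Note: when the target table already exists, .import treats every line of the CSV as data,
#so if taxiJul_enh2.csv starts with a header line, that line is loaded as a row
#(sqlite3 3.32 and later accept ".import --skip 1" to skip it)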
commands=""".mode csv
.headers on
DROP TABLE IF EXISTS taxiJul;
CREATE TABLE taxiJul(
  "VendorID" TEXT,
  "tpep_pickup_datetime" TEXT,
  "tpep_dropoff_datetime" TEXT,
  "passenger_count" NUMERIC,
  "trip_distance" NUMERIC,
  "pickup_longitude" TEXT,
  "pickup_latitude" TEXT,
  "RatecodeID" TEXT,
  "store_and_fwd_flag" TEXT,
  "dropoff_longitude" TEXT,
  "dropoff_latitude" TEXT,
  "payment_type" TEXT,
  "fare_amount" NUMERIC,
  "extra" NUMERIC,
  "mta_tax" NUMERIC,
  "tip_amount" NUMERIC,
  "tolls_amount" NUMERIC,
  "improvement_surcharge" NUMERIC,
  "total_amount" NUMERIC,
  "pickup_OBJECTID" TEXT,
  "pickup_Shape_Leng" NUMERIC,
  "pickup_Shape_Area" NUMERIC,
  "pickup_zone" TEXT,
  "pickup_LocationID" TEXT,
  "pickup_borough" TEXT,
  "dropoff_OBJECTID" TEXT,
  "dropoff_Shape_Leng" NUMERIC,
  "dropoff_Shape_Area" NUMERIC,
  "dropoff_zone" TEXT,
  "dropoff_Location" TEXT,
  "dropoff_borough" TEXT
);
.import taxiJul_enh2.csv taxiJul
"""
print(commands)
with open('CreateTableTaxiJul.sql', 'w') as f:
  f.write(commands)
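As an alternative to the `sqlite3` command-line shell, a CSV file can be loaded with Python's built-in `sqlite3` and `csv` modules. The sketch below is an assumption about how that could look, not the notebook's method: the helper `load_csv` is hypothetical, it takes the column names from the CSV header, and it declares no column types, whereas the script above declares `TEXT` and `NUMERIC` explicitly.

```python
import csv
import sqlite3


def load_csv(conn, table, csv_path):
    """Create `table` from the CSV header and insert every data row."""
    with open(csv_path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)  # the first line holds the column names
        # Quote identifiers so names with spaces or quotes stay valid
        cols = ', '.join('"{}"'.format(c.replace('"', '""')) for c in header)
        marks = ', '.join('?' for _ in header)
        conn.execute('DROP TABLE IF EXISTS "{}"'.format(table))
        conn.execute('CREATE TABLE "{}" ({})'.format(table, cols))
        # The reader now yields only data rows, so the header is not imported
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, marks), reader)
    conn.commit()
    return len(header)
```

For a file as large as a full month of trips, the command-line `.import` is likely to be faster; the Python version is mainly useful for small files and for testing.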


In [None]:
%%time
!sqlite3 taxiJul.db <CreateTableTaxiJul.sql

In [None]:
%%time
#Define the view
commands=""".mode csv
.headers on
DROP VIEW IF EXISTS taxiJulEnrich;
CREATE VIEW taxiJulEnrich AS SELECT *,
1 AS `count`,
strftime('%Y',`tpep_pickup_datetime`) AS `pu_year`, 
strftime('%m',`tpep_pickup_datetime`) AS `pu_month`, 
strftime('%d',`tpep_pickup_datetime`) AS `pu_day`, 
strftime('%H',`tpep_pickup_datetime`) AS `pu_hour`, 
strftime('%M',`tpep_pickup_datetime`) AS `pu_minute`, 
strftime('%S',`tpep_pickup_datetime`) AS `pu_second`, 
strftime('%w',`tpep_pickup_datetime`) AS `pu_weekday`,
strftime('%Y',`tpep_dropoff_datetime`) AS `do_year`, 
strftime('%m',`tpep_dropoff_datetime`) AS `do_month`, 
strftime('%d',`tpep_dropoff_datetime`) AS `do_day`, 
strftime('%H',`tpep_dropoff_datetime`) AS `do_hour`, 
strftime('%M',`tpep_dropoff_datetime`) AS `do_minute`, 
strftime('%S',`tpep_dropoff_datetime`) AS `do_second`, 
strftime('%w',`tpep_dropoff_datetime`) AS `do_weekday`,
`pickup_latitude`||','||`pickup_longitude` AS `pu_latlong`
FROM taxiJul; 
SELECT * from taxiJulEnrich LIMIT 10;"""
print(commands)
with open('CreateViewTaxiJulEnrich.sql', 'w') as f:
  f.write(commands)
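To make the derived columns of the view concrete, the sketch below runs the same `strftime` format codes on a single timestamp using Python's built-in `sqlite3` module. The helper `split_timestamp` and its key names are hypothetical and chosen only for illustration.

```python
import sqlite3

# Format codes used by the view, paired with illustrative key names
PARTS = [('year', '%Y'), ('month', '%m'), ('day', '%d'), ('hour', '%H'),
         ('minute', '%M'), ('second', '%S'), ('weekday', '%w')]


def split_timestamp(ts):
    """Return the strftime parts of ts as computed by SQLite."""
    conn = sqlite3.connect(':memory:')
    select_list = ', '.join("strftime('{}', ?)".format(code) for _, code in PARTS)
    row = conn.execute('SELECT ' + select_list, (ts,) * len(PARTS)).fetchone()
    conn.close()
    return {name: value for (name, _), value in zip(PARTS, row)}


print(split_timestamp('2015-07-04 13:05:09'))
```

Note that every part comes back as zero-padded text (for example `'07'`, not `7`), and that `%w` numbers the weekday from 0 for Sunday to 6 for Saturday. A string SQLite cannot parse as a date yields NULL for every part.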


In [None]:
%%time
!sqlite3 taxiJul.db <CreateViewTaxiJulEnrich.sql

In [None]:
%%time
!sqlite3 taxiJul.db "DROP TABLE IF EXISTS `zones`;"
!ECHO .import taxi+_zone_lookup.csv zones>script.txt & sqlite3 -csv -header taxiJul.db <script.txt
!sqlite3 -header taxiJul.db "SELECT * FROM `zones`;"

# Prepare data for 01 Tourist/Resident Perspective

In [None]:
!sqlite3 -header taxiJul.db "EXPLAIN QUERY PLAN SELECT `dropoff_zone`,`do_day`,`do_weekday`,`do_hour`,sum(`count`) AS `ridecount`,avg(`tip_amount`),max(`tip_amount`),sum(`tip_amount`),avg(`total_amount`),max(`total_amount`),sum(`total_amount`) FROM taxiJulEnrich GROUP BY `dropoff_zone`,`do_day`,`do_weekday`,`do_hour`;"

In [None]:
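#Note: do_day, do_weekday and do_hour are columns of the taxiJulEnrich view, not of the taxiJul table,
#so SQLite is likely to reject this index definition with a "no such column" error
!sqlite3 taxiJul.db "EXPLAIN QUERY PLAN CREATE INDEX `tourist` ON `taxiJul` (`dropoff_zone`,`do_day`,`do_weekday`,`do_hour`);"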
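To see whether an index changes how SQLite plans a grouped query, the plan can be compared before and after the index is created. The following is a minimal sketch on a toy in-memory table; the table `rides`, the index `rides_zone`, and the helper `query_plan` are hypothetical and not part of the notebook's database. The exact wording of the plan depends on the SQLite version, so the sketch only prints it.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    # The detail text is the last column of every plan row
    return [row[-1] for row in conn.execute('EXPLAIN QUERY PLAN ' + sql)]


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE rides (dropoff_zone TEXT, tip_amount NUMERIC)')
query = 'SELECT dropoff_zone, count(*) FROM rides GROUP BY dropoff_zone'

plan_before = query_plan(conn, query)  # no index available yet
# Index on a column that really exists in the table
conn.execute('CREATE INDEX rides_zone ON rides (dropoff_zone)')
plan_after = query_plan(conn, query)

print(plan_before)
print(plan_after)
```

If the second plan mentions the index name, SQLite has chosen to use the index for the grouping.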

In [None]:
!sqlite3 -header taxiJul.db "DROP TABLE IF EXISTS `01_Tourist_Resident`; CREATE TABLE `01_Tourist_Resident` AS SELECT `dropoff_zone`,`dropoff_borough`,`do_day`,`do_weekday`,`do_hour`,sum(`count`) AS `ridecount`,avg(`tip_amount`),max(`tip_amount`),sum(`tip_amount`),avg(`total_amount`),max(`total_amount`),sum(`total_amount`) FROM taxiJulEnrich GROUP BY `dropoff_zone`,`dropoff_borough`,`do_day`,`do_weekday`,`do_hour`;"

In [None]:
!sqlite3 -header taxiJul.db "SELECT COUNT(*) FROM `01_Tourist_Resident`;"

In [None]:
!sqlite3 -header taxiJul.db "SELECT `dropoff_borough`,sum(`ridecount`) FROM `01_Tourist_Resident` GROUP BY `dropoff_borough`;"
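A useful sanity check on a summary table of this kind is that the ride counts add up to the number of rows in the source table. The sketch below illustrates the idea on three made-up trips in an in-memory database; the table and column names (`trips`, `trips_enrich`, `summary`, `ride`) are hypothetical stand-ins for the notebook's `taxiJul`, `taxiJulEnrich`, `01_Tourist_Resident`, and `count`.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE trips (dropoff_borough TEXT, tpep_dropoff_datetime TEXT, tip_amount NUMERIC);
INSERT INTO trips VALUES
  ('Manhattan', '2015-07-01 08:15:00', 2.0),
  ('Manhattan', '2015-07-01 08:45:00', 4.0),
  ('Queens',    '2015-07-01 09:05:00', 1.0);
-- Same pattern as the notebook's view: a constant 1 per ride plus a derived hour
CREATE VIEW trips_enrich AS SELECT *, 1 AS ride,
  strftime('%H', tpep_dropoff_datetime) AS do_hour FROM trips;
-- Aggregate per borough and hour
CREATE TABLE summary AS SELECT dropoff_borough, do_hour,
  sum(ride) AS ridecount, avg(tip_amount) AS avg_tip
  FROM trips_enrich GROUP BY dropoff_borough, do_hour;
""")

total_rides = conn.execute('SELECT sum(ridecount) FROM summary').fetchone()[0]
source_rows = conn.execute('SELECT count(*) FROM trips').fetchone()[0]
by_borough = conn.execute(
    'SELECT dropoff_borough, sum(ridecount) FROM summary '
    'GROUP BY dropoff_borough ORDER BY dropoff_borough').fetchall()

# Every source row should be counted exactly once in the summary
assert total_rides == source_rows
print(by_borough)
```

Giving the aggregates explicit aliases, as `avg_tip` does here, also makes the summary columns easier to reference later than the default names such as `avg(tip_amount)`.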

In [None]:
!echo .schema 01_Tourist_Resident >script.txt & sqlite3 taxiJul.db <script.txt