### Taxi Ride Length Prediction 

#### Scenario:

LiftOff, a taxi company, wants its drivers to target longer rides, because the longer the ride, the more money it makes. LiftOff has the following theory:

* the `pickup location` of a taxi ride can help `predict the length of the ride`.  

LiftOff asks us to do some analysis and write a function that will allow it to **predict the length of a taxi ride for any given location**.

Our technique will be the following:
  * **Collect**: Obtain the data containing all of the taxi information, and select only the attributes of taxi trips that we need
  * **Explore**: Examine the attributes of our data, and plot some of our data on a map
  * **Train**: Write our nearest neighbors formula, and vary the number of nearby trips used to predict the length of a new trip
  * **Predict**: Use our function to predict trip lengths for new locations

## Data: Collect, Explore 

- Open Source: [NYC Open Data](https://opendata.cityofnewyork.us/) collects information about NYC taxi trips and provides this data on [its website](https://data.cityofnewyork.us/Transportation/2014-Yellow-Taxi-Trip-Data/gn7m-em8n).

- A subset, 'trips.json' (1,000 observations out of more than 1 million observations, about 5 GB of data; see the link above)
    - downloaded from the [learn-co-curriculum repository on GitHub](https://github.com/learn-co-curriculum/nearest-neighbors-lab)

In [3]:
import json
# load the JSON file and parse it into a list of dicts: [{...}, ...]
# (the with-block closes the file once it has been read)
with open('data_trips.json') as trips_f:
    trips = json.load(trips_f)

In [4]:
len(trips)

1000

 - Description of the keys: see the PDF file or the dataset website

In [5]:
trips[0:2]

[{'dropoff_datetime': '2014-11-26T22:31:00.000',
  'dropoff_latitude': '40.746769999999998',
  'dropoff_longitude': '-73.997450000000001',
  'fare_amount': '52',
  'imp_surcharge': '0',
  'mta_tax': '0.5',
  'passenger_count': '1',
  'payment_type': 'CSH',
  'pickup_datetime': '2014-11-26T21:59:00.000',
  'pickup_latitude': '40.64499',
  'pickup_longitude': '-73.781149999999997',
  'rate_code': '2',
  'tip_amount': '0',
  'tolls_amount': '5.3300000000000001',
  'total_amount': '57.829999999999998',
  'trip_distance': '18.379999999999999',
  'vendor_id': 'VTS'},
 {'dropoff_datetime': '2014-02-22T17:54:37.000',
  'dropoff_latitude': '40.781844999999997',
  'dropoff_longitude': '-73.979073',
  'fare_amount': '7.5',
  'imp_surcharge': '0',
  'mta_tax': '0.5',
  'passenger_count': '1',
  'payment_type': 'CSH',
  'pickup_datetime': '2014-02-22T17:47:23.000',
  'pickup_latitude': '40.766931',
  'pickup_longitude': '-73.982097999999993',
  'rate_code': '1',
  'store_and_fwd_flag': 'N',
  'tip_

In [12]:
# select data {distance, pickup_latitude, pickup_longitude}
def parse_trips(trips):
    keys = ['pickup_latitude','pickup_longitude','trip_distance']
    trips_parsed=[]
    for trip in trips:
        trip_p={}
        for key in keys:
            trip_p[key]=trip[key]
        trips_parsed.append(trip_p)
    return trips_parsed 

In [17]:
parsed_trips = parse_trips(trips)
parsed_trips[0:2]

[{'pickup_latitude': '40.64499',
  'pickup_longitude': '-73.781149999999997',
  'trip_distance': '18.379999999999999'},
 {'pickup_latitude': '40.766931',
  'pickup_longitude': '-73.982097999999993',
  'trip_distance': '1.3'}]

Note: `dict comprehension` with `d.items()` (Python 3; `d.iteritems()` is the Python 2 equivalent)

```python
dict_int = {k: int(v) for k, v in d.items()}
# or
dict_int = dict((k, int(v)) for k, v in d.items())
```


In [19]:
# convert every value of each trip from str to float
def float_values(trips):
    trips_float = []
    for trip in trips:
        trip_float = {k:float(v) for k,v in trip.items()}
        trips_float.append(trip_float)
    return trips_float    

In [22]:
cleaned_trips = float_values(parsed_trips)
cleaned_trips[0:2]

[{'pickup_latitude': 40.64499,
  'pickup_longitude': -73.78115,
  'trip_distance': 18.38},
 {'pickup_latitude': 40.766931,
  'pickup_longitude': -73.982098,
  'trip_distance': 1.3}]
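
The selection step (`parse_trips`) and the conversion step (`float_values`) could also be combined into a single pass with a dict comprehension. The following is only a sketch: the name `clean_trips` and the `sample` record are illustrative and are not part of the notebook.

```python
def clean_trips(trips, keys=('pickup_latitude', 'pickup_longitude', 'trip_distance')):
    """Keep only `keys` from each trip and convert their string values to float."""
    return [{key: float(trip[key]) for key in keys} for trip in trips]


# Hypothetical record in the same shape as the raw data
sample = [{'pickup_latitude': '40.64499',
           'pickup_longitude': '-73.781149999999997',
           'trip_distance': '18.379999999999999',
           'vendor_id': 'VTS'}]

cleaned_sample = clean_trips(sample)
```

Keeping the two functions separate, as the notebook does, makes each step easier to inspect; the combined version simply avoids building the intermediate list.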

In [23]:
# folium map - manhattan 
import folium

In [26]:
manhattan_map = folium.Map(location=[40.7589, -73.9851], zoom_start=11)
manhattan_map

In [28]:
# create a circle marker at Times Square and add it to the map
marker = folium.CircleMarker(location=[40.7589, -73.9851], radius=10)
marker.add_to(manhattan_map)
manhattan_map

In [29]:
first_trip = {'pickup_latitude': 40.64499, 'pickup_longitude': -73.78115,  'trip_distance': 18.38}
first_trip

{'pickup_latitude': 40.64499,
 'pickup_longitude': -73.78115,
 'trip_distance': 18.38}

In [30]:
# trip location (function) 
# [lat, lon]
def location(trip):
    return [trip['pickup_latitude'],trip['pickup_longitude']]

In [32]:
first_location = location(first_trip)
first_location

[40.64499, -73.78115]

In [33]:
# create a circle marker for a single location
def to_marker(location):
    marker = folium.CircleMarker(location, radius=6)
    return marker

In [45]:
# convert list of trips to list of trip_markers
def markers_from_trips(trips):
    # turn list of trips to list of locations
    locs = [location(trip) for trip in trips]
    # turn list of locations to list of trip markers
    markers = [to_marker(loc) for loc in locs]
    return markers

In [34]:
# test marker for one location
time_square_marker=to_marker([40.7589, -73.9851])

In [36]:
time_square_marker

<folium.vector_layers.CircleMarker at 0x1064bc7b8>

In [38]:
time_square_marker and time_square_marker.location

[40.7589, -73.9851]

In [37]:
time_square_marker.options

'{\n  "bubblingMouseEvents": true,\n  "color": "#3388ff",\n  "dashArray": null,\n  "dashOffset": null,\n  "fill": false,\n  "fillColor": "#3388ff",\n  "fillOpacity": 0.2,\n  "fillRule": "evenodd",\n  "lineCap": "round",\n  "lineJoin": "round",\n  "opacity": 1.0,\n  "radius": 6,\n  "stroke": true,\n  "weight": 3\n}'

In [43]:
json.loads(time_square_marker.options)['radius']

6

```python
json.load?
Signature: json.load(fp, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
Docstring:
Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
a JSON document) to a Python object.

json.loads?
Signature: json.loads(s, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
Docstring:
Deserialize ``s`` (a ``str`` instance containing a JSON
document) to a Python object.
```
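
To make the difference between the two functions concrete, here is a small sketch. The `text` string is made up for illustration, and `io.StringIO` stands in for an open file.

```python
import io
import json

# A made-up JSON document held in a string
text = '{"radius": 6, "fill": false}'

# json.loads parses a JSON document that is already a str
from_string = json.loads(text)

# json.load reads from a file-like object (anything with .read())
from_file = json.load(io.StringIO(text))
```

Both calls produce the same Python dict; the only difference is where the JSON text comes from.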

In [51]:
# check trip markers

In [46]:
trip_markers = markers_from_trips(cleaned_trips)

In [47]:
cleaned_trips[0:4]

[{'pickup_latitude': 40.64499,
  'pickup_longitude': -73.78115,
  'trip_distance': 18.38},
 {'pickup_latitude': 40.766931,
  'pickup_longitude': -73.982098,
  'trip_distance': 1.3},
 {'pickup_latitude': 40.77773,
  'pickup_longitude': -73.951902,
  'trip_distance': 4.5},
 {'pickup_latitude': 40.795678,
  'pickup_longitude': -73.971049,
  'trip_distance': 2.4}]

In [50]:
# check
[trip_marker.location for trip_marker in trip_markers[0:4]]

[[40.64499, -73.78115],
 [40.766931, -73.982098],
 [40.77773, -73.951902],
 [40.795678, -73.971049]]

In [53]:
# func: create a folium map centered on a location
def map_from(location, zoom_amount):
    return folium.Map(location=location, zoom_start=zoom_amount)

In [54]:
time_square_map = map_from([40.7589, -73.9851],15)
time_square_map

In [55]:
time_square_map.location

[40.7589, -73.9851]

In [56]:
time_square_map.zoom_start

15

In [57]:
manhattan_map=map_from([40.7589, -73.9851], 13)
manhattan_map

In [75]:
# add list of trip_markers to a map_obj
def add_markers(markers, map_obj):
    for marker in markers:
        marker.add_to(map_obj)
    return map_obj  

In [76]:
map_with_markers=add_markers(trip_markers, manhattan_map)
map_with_markers

In [72]:
manhattan_map

In [68]:
map_with_markers

In [None]:
# Find nearest neighbors given location (latitude, longitude)
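
One way this step might look, as a rough sketch rather than a finished solution. The function names `distance`, `nearest_neighbors`, and `predict_trip_length` are assumptions, as is the choice of plain straight-line distance in degrees, which ignores that a degree of longitude is shorter than a degree of latitude at New York's latitude. The `sample_trips` list reuses the first four cleaned trips shown earlier.

```python
import math


def distance(location, trip):
    """Straight-line distance (in degrees) from [lat, lon] to a trip's pickup point."""
    lat_diff = location[0] - trip['pickup_latitude']
    lon_diff = location[1] - trip['pickup_longitude']
    return math.sqrt(lat_diff ** 2 + lon_diff ** 2)


def nearest_neighbors(location, trips, k=3):
    """Return the k trips whose pickup points are closest to location."""
    return sorted(trips, key=lambda trip: distance(location, trip))[:k]


def predict_trip_length(location, trips, k=3):
    """Predict trip length as the mean trip_distance of the k nearest trips."""
    neighbors = nearest_neighbors(location, trips, k)
    if not neighbors:
        raise ValueError('at least one trip is needed to make a prediction')
    return sum(trip['trip_distance'] for trip in neighbors) / len(neighbors)


sample_trips = [
    {'pickup_latitude': 40.64499, 'pickup_longitude': -73.78115, 'trip_distance': 18.38},
    {'pickup_latitude': 40.766931, 'pickup_longitude': -73.982098, 'trip_distance': 1.3},
    {'pickup_latitude': 40.77773, 'pickup_longitude': -73.951902, 'trip_distance': 4.5},
    {'pickup_latitude': 40.795678, 'pickup_longitude': -73.971049, 'trip_distance': 2.4},
]

# Predict from the two trips nearest to Times Square
predicted = predict_trip_length([40.7589, -73.9851], sample_trips, k=2)
```

With `k=2`, the prediction averages the two trips picked up closest to Times Square; changing `k` changes how many nearby trips contribute.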


# Qs
1. How can output values be limited to two decimal places?
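
A possible answer to question 1, as a sketch: the built-in `round` changes the stored value, while a format specifier such as `.2f` only changes how it is displayed. The helper name `round_values` is illustrative.

```python
def round_values(trip, digits=2):
    """Return a copy of trip with every float value rounded to `digits` places."""
    return {key: round(value, digits) for key, value in trip.items()}


trip = {'pickup_latitude': 40.766931,
        'pickup_longitude': -73.982098,
        'trip_distance': 1.3}

# Rounds the stored values themselves
rounded = round_values(trip)

# Leaves the value untouched and formats it only for display
label = f"{trip['pickup_latitude']:.2f}"
```

For coordinates, formatting for display is usually the safer choice, since rounding a latitude or longitude to two decimal places moves the point by up to several hundred meters.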