## Documentation:
- Mapbox Mapmatching general doc: https://www.mapbox.com/api-documentation/?language=Python#match-object
- Mapmatching Python doc: https://github.com/mapbox/mapbox-sdk-py/blob/master/docs/mapmatching.md#map-matching
- GeoJSON doc: http://python-geojson.readthedocs.io/en/latest/#geojson-objects

## Limitations:
- At most 100 coordinates per request, so longer trips have to be split across several requests.
- At most 60 requests per minute.
- "works best with a sample rate of 5 seconds between points", so the recorded points should be thinned before matching rather than all sent. [Not implemented yet]
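The first two limits can be handled with a small chunking helper and a throttle. The sketch below is only an illustration of that idea, not code from this notebook: `chunk_points` and `Throttle` are hypothetical names, and the injectable `clock` and `sleep` arguments exist only so the behaviour can be checked without actually waiting.

```python
import time

MAX_COORDS_PER_REQUEST = 100   # limit quoted in the list above
MAX_REQUESTS_PER_MINUTE = 60   # limit quoted in the list above


def chunk_points(points, size=MAX_COORDS_PER_REQUEST):
    """Split a sequence of points into consecutive chunks of at most `size`."""
    return [points[i:i + size] for i in range(0, len(points), size)]


class Throttle:
    """Hypothetical helper: space calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute=MAX_REQUESTS_PER_MINUTE,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


# A trip with 250 points needs three requests: 100 + 100 + 50 points.
chunks = chunk_points(list(range(250)))
print([len(chunk) for chunk in chunks])
```

Calling `Throttle().wait()` immediately before each request would keep the request rate at or below the limit. Spacing requests evenly is a simplification; it is stricter than the limit strictly requires.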

### GeoJSON Object Requirements:
- type: `Feature`
- properties: the timestamp of each coordinate
- geometry: a `LineString` (each coordinate lists longitude first, then latitude)
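As a rough illustration of these requirements, the sketch below builds such a Feature from plain Python lists using only the standard library. The helper name `trip_to_feature` is hypothetical, the coordinates and timestamps are made up, and the `coordTimes` property name is taken from the Mapbox Python SDK doc linked above; treat it as an assumption and check it against that doc.

```python
import json


def trip_to_feature(lons, lats, times):
    """Build a GeoJSON Feature holding one trip as a LineString."""
    if not (len(lons) == len(lats) == len(times)):
        raise ValueError("lons, lats and times must have the same length")
    return {
        "type": "Feature",
        # Assumed property name for the per-coordinate timestamps
        "properties": {"coordTimes": list(times)},
        "geometry": {
            "type": "LineString",
            # GeoJSON order: longitude first, then latitude
            "coordinates": [[lon, lat] for lon, lat in zip(lons, lats)],
        },
    }


# Made-up two-point trip, for illustration only
feature = trip_to_feature(
    lons=[-122.4194, -122.4190],
    lats=[37.7749, 37.7752],
    times=["2017-06-01T12:00:00Z", "2017-06-01T12:00:05Z"],
)
print(json.dumps(feature, indent=2))
```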

### Mapbox Mapmatching API response
The API returns a response object that can be converted to JSON. If the data are not clean, the response contains several sub-matches instead of a single match. Each sub-match carries a confidence between 0 and 1, where values closer to 1 mean higher confidence. The response also includes tracepoints, which show which input points were matched and give the name of the route each point was matched to.
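To make that structure concrete, here is a small sketch that summarizes a response dictionary. The `sample_response` below is hand-written for illustration and is not real API output; it only mirrors the fields described above (`matchings` with a `confidence`, and `tracepoints` that are either `None` for unmatched points or carry a `name` and `location`). `summarize_response` is a hypothetical helper name.

```python
# Hand-written stand-in for a parsed response; values are illustrative only
sample_response = {
    "code": "Ok",
    "matchings": [
        {"confidence": 0.93,
         "geometry": {"type": "LineString",
                      "coordinates": [[-122.42, 37.77], [-122.41, 37.78]]}},
        {"confidence": 0.41,
         "geometry": {"type": "LineString",
                      "coordinates": [[-122.40, 37.79], [-122.39, 37.80]]}},
    ],
    "tracepoints": [
        {"matchings_index": 0, "name": "Market Street", "location": [-122.42, 37.77]},
        None,  # an input point that could not be matched
        {"matchings_index": 1, "name": "", "location": [-122.40, 37.79]},
    ],
}


def summarize_response(response):
    """Collect confidences, match counts and route names from one response."""
    matchings = response.get("matchings", [])
    tracepoints = response.get("tracepoints", [])
    matched = [tp for tp in tracepoints if tp is not None]
    return {
        "n_matchings": len(matchings),
        "confidences": [m["confidence"] for m in matchings],
        "n_input_points": len(tracepoints),
        "n_matched_points": len(matched),
        # Skip empty names, which unnamed roads may produce
        "route_names": sorted({tp["name"] for tp in matched if tp.get("name")}),
    }


summary = summarize_response(sample_response)
print(summary)
```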

### Steps:
1. Select samples that are 5 seconds apart [not done yet]
2. Convert the CSV to GeoJSON (see the requirements above; each trip yields one GeoJSON object)
3. Make the API requests
4. Select the best match (if its confidence is not too low) and store the routes for that trip
5. Handle special cases:
    - more than 100 coordinates in one trip (the trip is split into chunks and the results are stacked together)
    - too many sub-matches, or sub-matches with very low confidence
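The selection step and the second special case could look roughly like the sketch below. It is one possible approach under stated assumptions, not what this notebook implements: the function name `select_best_matching` and both cutoffs (`MIN_CONFIDENCE`, `MAX_SUBMATCHES`) are hypothetical and would need tuning against real trips.

```python
MIN_CONFIDENCE = 0.5  # hypothetical cutoff, not taken from the Mapbox docs
MAX_SUBMATCHES = 3    # hypothetical cutoff, not taken from the Mapbox docs


def select_best_matching(matchings, min_confidence=MIN_CONFIDENCE,
                         max_submatches=MAX_SUBMATCHES):
    """Return (best_matching, reason); best_matching is None when the trip is rejected."""
    if not matchings:
        return None, "no matchings"
    if len(matchings) > max_submatches:
        return None, "too many sub-matches"
    # Pick the sub-match the API is most confident about
    best = max(matchings, key=lambda m: m["confidence"])
    if best["confidence"] < min_confidence:
        return None, "confidence too low"
    return best, "ok"


# Illustrative inputs only
good = [{"confidence": 0.9}, {"confidence": 0.3}]
poor = [{"confidence": 0.2}]
print(select_best_matching(good))
print(select_best_matching(poor))
```

Returning a reason alongside the result makes it easy to count how many trips fall into each special case before deciding how to treat them.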

In [1]:
import pandas as pd
from mapbox import MapMatcher
import json
import requests
from geojson import Point, Feature, LineString

import numpy as np
from scipy import stats, integrate
import matplotlib.pyplot as plt
import seaborn as sns

import ciso8601
import time
import datetime

In [2]:
MAPBOX_ACCESS_KEY = ''
service = MapMatcher(access_token=MAPBOX_ACCESS_KEY)

In [3]:
ROUTE_URL = "https://api.mapbox.com/matching/v5/mapbox/cycling/{0}.json?access_token={1}&overview=full&geometries=geojson&timestamps={2}"


In [4]:
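# DataFrame.from_csv no longer exists in current pandas; read_csv with
# index_col=0 and parse_dates=True reproduces its defaults
all_trips = pd.read_csv("trips_first_1000.csv", index_col=0, parse_dates=True)
all_trips = all_trips.drop(columns=['haccuracy', 'vaccuracy', 'dat', 'tim', 'dist.tostart', 'dist.toend', 'partial.dist'])
all_trips['recorded'] = pd.to_datetime(all_trips['recorded'])
# Note: time.mktime interprets each timestamp in the machine's local time zone
all_trips['UNIXtime'] = all_trips['recorded'].apply(lambda x: int(time.mktime(x.timetuple())))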

In [42]:
count_df = all_trips.groupby('trip_id').count()

In [43]:
trips_less100 = count_df[count_df['recorded'] <= 100].index  # an Index of trip_ids, not the trips themselves
trips_more100 = count_df[count_df['recorded'] > 100].index
print('trips with 100 or less coordinates', len(trips_less100))
print('trips with more than 100 coordinates', len(trips_more100))

trips with 100 or less coordinates 63
trips with more than 100 coordinates 775



In [136]:
def process_one_trip(df, trip_id=0):
    """Match one trip in chunks of at most 100 points and stack the results."""
    sum_list = [[], []]  # [matched route coordinates, tracepoint locations]
    while df.shape[0] != 0:
        # Slicing past the end is safe, so this also covers the final short chunk
        new_response = get_route_data(df.iloc[:100])
        new_list = process_response(new_response, trip_id)
        # extend() accepts any iterable, so lists and sets both work here
        sum_list[0].extend(new_list[0])
        sum_list[1].extend(new_list[1])
        df = df.iloc[100:]
    return sum_list

In [24]:
def create_route_url(list_coor, list_time):
    # Each point is (longitude, latitude), joined as "lon,lat;lon,lat;..."
    coordinates = ";".join(["{0},{1}".format(point[0], point[1]) for point in list_coor])
    timestamps = ";".join([str(t) for t in list_time])
    url = ROUTE_URL.format(coordinates, MAPBOX_ACCESS_KEY, timestamps)
    return url

In [72]:
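def get_route_data(df):
    """Send one chunk of a trip (at most 100 rows) to the Map Matching API."""
    list_coor = list(zip(df['longitude'], df['latitude']))
    list_time = list(df['UNIXtime'])
    route_url = create_route_url(list_coor, list_time)
    result = requests.get(route_url)
    result.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    return result.json()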

In [80]:
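def process_response(response, trip_id=0):
    """Return [matched route coordinates, tracepoint locations] without duplicates."""
    # Coordinates arrive as [lon, lat] lists, which are unhashable, so convert them to tuples
    trip_coor = [tuple(coor) for element in response['matchings']
                 for coor in element['geometry']['coordinates']]
    trip_tracepoints = [tuple(element['location']) for element in response['tracepoints']
                        if element is not None]
    # dict.fromkeys drops duplicates while keeping the original order
    return [list(dict.fromkeys(trip_coor)), list(dict.fromkeys(trip_tracepoints))]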

In [16]:
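def select_5secs_apart(df):
    """Thin a trip so kept points are about 5 seconds apart (first and last are always kept)."""
    df = df.set_index('recorded')
    # total_seconds() measures the whole gap; .dt.seconds would ignore any full days
    df['deltaT'] = df.index.to_series().diff().dt.total_seconds().fillna(0)
    df = df.reset_index()
    checker = 0  # seconds accumulated since the last kept point
    delete_list = []
    for i in df.index:
        if i == df.index[-1] or i == df.index[0]:
            continue
        checker = checker + df.iloc[i]['deltaT']
        if checker < 5:
            delete_list.append(i)
        else:
            checker = 0
    df = df.drop(delete_list)
    return df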

In [141]:
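# trip_id is passed in explicitly so the function does not depend on globals
def export_to_csv(matchpoints_tracepoints_list, trip_id):
    """Write the matched route coordinates of one trip to trip_<trip_id>_coor.csv."""
    # Element 0 holds the matched route coordinates as (longitude, latitude) pairs
    trip_coor = list(matchpoints_tracepoints_list[0])
    new_df = pd.DataFrame(trip_coor)
    new_df = new_df.rename(columns={0: "longitude", 1: "latitude"})
    new_df = new_df.reindex(columns=["latitude", "longitude"])
    file_name = 'trip_' + str(trip_id) + "_coor.csv"
    new_df.to_csv(file_name)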
    