# CITY LIFE. Geospatial analysis

### Task

In this project, you will try to get some valuable insights about customers’
and taxi drivers’ behavior. Maybe it will help a taxi company optimize its
business.
Here are the tasks that you need to do:
1. find out and visualize on a map the most popular areas where people ordered a taxi as
well as where they headed to,
2. find out and visualize the most popular routes in different time intervals,
3. find the locations of city infrastructure in the dataset and visualize how customers arrived at one of them using an animated map,
4. visualize one day of a taxi driver and how much money he or she earned using an
animated map,
5. visualize one day of the city (working day and weekend day) using an animated
map.

### Imports

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
from IPython.utils import io

# with io.capture_output() as captured:
#     foo()

In [None]:
import geopandas as gpd
# import geoplot
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import shapely
import folium
import seaborn as sns
import osmnx as ox
import networkx as nx
import time
import geopy
import taxicab as tc
import warnings

from tqdm import tqdm

import plotly.graph_objects as go


from mpl_toolkits.axes_grid1 import make_axes_locatable

from shapely.geometry import Polygon, LineString, Point, MultiPoint
from shapely.ops import unary_union

from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

from scipy.spatial import Voronoi, voronoi_plot_2d
from scipy.spatial import ConvexHull

from datetime import datetime
from datetime import timedelta

from geovoronoi import voronoi_regions_from_coords
from geovoronoi.plotting import subplot_for_map, plot_voronoi_polys_with_points_in_area

from kneed import KneeLocator

import plotly.express as px
from jupyter_dash import JupyterDash
from dash import dcc, html
# import dash_html_components as html
from dash.dependencies import Input, Output
from datetime import timezone
import pickle

from utils import *

### Load data

In [None]:
df_map = gpd.read_file('data/chicago_map.shx')

In [None]:
df_map.info()

In [None]:
rush = pd.read_csv('data/rush_hours_empty.csv',
                   parse_dates=['Trip End Timestamp'],
                   dtype={
                       'name': str,
                       'longitude': float,
                       'latitude': float,
                       'num_of_rides': int,
                         })

In [None]:
rush.info()

In [None]:
taxi_loc = pd.read_csv('data/taxi_locations.csv',
                       parse_dates=['Trip Start Timestamp', 
                                    'Trip End Timestamp'],
                       index_col='Trip ID',
                       dtype={
                           'Pickup Centroid Latitude': float,
                           'Pickup Centroid Longitude': float,
                           'Dropoff Centroid Latitude': float,
                           'Dropoff Centroid Longitude': float,
                           'Taxi ID': str,
                           'Fare': float  
                       }
                      )

In [None]:
taxi_loc['Taxi ID'] = taxi_loc['Taxi ID'].astype("string")

In [None]:
taxi_loc.info()

### Data preparation

Set crs to be able to combine dataframes

In [None]:
df_map.set_crs(epsg=4326, inplace=True)
crs = 'EPSG:4326'

Create folium map to visualize data in the shapefile (Chicago map)

In [None]:
m = folium.Map(location=[41.85, -87.65], zoom_start=10, tiles='CartoDB positron')

Add df_map data to folium map to visualize polygons (districts) on the real map

In [None]:
for _, r in df_map.iterrows():
    sim_geo = gpd.GeoSeries(r['geometry']).simplify(tolerance=0.001)
    geo_j = sim_geo.to_json()
    geo_j = folium.GeoJson(data=geo_j,
                           style_function=lambda x: {'fillColor': 'orange'})
    geo_j.add_to(m)

In [None]:
m

### Dropoff and Pickup columns to geometry type

In [None]:
taxi_loc['Pickup Centroid Location'] = gpd.GeoSeries.from_wkt(taxi_loc['Pickup Centroid Location'])
taxi_loc['Dropoff Centroid  Location'] = gpd.GeoSeries.from_wkt(taxi_loc['Dropoff Centroid  Location'])

In [None]:
taxi_loc.info()

### Check for NaN values and duplicates

In [None]:
taxi_loc.isna().sum()

In [None]:
taxi_loc_dropna = taxi_loc.drop_duplicates()

In [None]:
taxi_loc_no_zero_trip = taxi_loc_dropna.drop(index=taxi_loc_dropna[taxi_loc_dropna['Pickup Centroid Location']==taxi_loc_dropna['Dropoff Centroid  Location']].index)

In [None]:
taxi_loc_no_zero_trip.info()

## 1. Most popular areas

### 1.1 Clustering analysis

Conduct clustering analysis of pick point and drop off locations based on their coordinates. Clusters might be different for each of the categories (pickpoints and drop-offs).

### Pickpoints

In [None]:
pickups = taxi_loc_no_zero_trip[['Pickup Centroid Latitude', 'Pickup Centroid Longitude', 'Pickup Centroid Location']]

In [None]:
pickup_coords = pickups[['Pickup Centroid Longitude', 'Pickup Centroid Latitude']]

We need to choose the number of clusters. One way to do this is the elbow method.

#### K-means algorithm

##### Elbow curve

In [None]:
inertia=[]
for i in tqdm(range(1,20)):
    kmeans = KMeans(n_clusters=i,random_state=42, n_init='auto').fit(pickup_coords)
    inertia.append(kmeans.inertia_)
plt.plot(range(1,20),inertia)
plt.title('Elbow method')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.xticks(np.arange(0,20,1))
plt.show()

Roughly 4 clusters looks optimal

##### Silhouette value

In [None]:
# score=-1
# k=0
# for i in tqdm(range(2,20)):
#     kmeans = KMeans(n_clusters=i,random_state=0, n_init='auto').fit(pickup_coords.values)
#     score2 = silhouette_score(pickup_coords.values,kmeans.predict(pickup_coords.values))
#     if score2>score:
#         score=score2
#         k=i
# print('The best number of clusters based on silhouette score is {} with a score of {}.'.format(k,score))

##### Elbow kneed lib

In [None]:
elbow_point = KneeLocator(range(1,20),inertia,curve='convex',direction='decreasing')
print(elbow_point.knee)
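
To make the idea behind automatic elbow detection concrete, here is a minimal sketch that picks the point of the inertia curve lying farthest from the straight line joining its first and last points. It is a simplified stand-in written for illustration, not the `kneed` implementation; the function name `knee_by_chord_distance` and the inertia values are made up.

```python
def knee_by_chord_distance(xs, ys):
    """Return the x whose point lies farthest below the chord joining
    the first and last points of a decreasing, convex curve."""
    x0, x1 = xs[0], xs[-1]
    y0, y1 = ys[0], ys[-1]
    # Normalise both axes to [0, 1] so the result does not depend on units.
    nx = [(x - x0) / (x1 - x0) for x in xs]
    ny = [(y - y1) / (y0 - y1) for y in ys]
    # After normalisation the chord is y = 1 - x; the knee maximises
    # the vertical gap between the chord and the curve.
    gaps = [(1 - a) - b for a, b in zip(nx, ny)]
    return xs[gaps.index(max(gaps))]


ks = list(range(1, 11))
toy_inertia = [100, 55, 30, 14, 11, 9, 8, 7, 6.5, 6]  # made-up values
print(knee_by_chord_distance(ks, toy_inertia))
```

On real inertia curves the detected knee can shift with the range of k that is scanned, so it is worth reading it together with the plot.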

Fitting kmeans model for pickpoints

In [None]:
n_clusters = 10

In [None]:
kmeans_pick = KMeans(
    init="k-means++",
    n_clusters=n_clusters,
    n_init='auto',
    max_iter=300,
    random_state=42
)

In [None]:
with warnings.catch_warnings(record=True):

    # fit_predict fits the model once and returns the cluster label of each point.
    pickups['pickup_cluster_kmeans'] = kmeans_pick.fit_predict(pickups[pickups.columns[0:2]])
    pickups_centers = kmeans_pick.cluster_centers_ # Coordinates of cluster centers (lat, long).
    pickups_labels = kmeans_pick.labels_ # Labels of each point
pickups.head(5)

Creating a geo dataframe from cluster centers points

In [None]:
pickups_centers = pd.DataFrame(pickups_centers, columns=['lat', 'long'])

In [None]:
pickup_centers_coords = pickups_centers[['long', 'lat']].values

In [None]:
geometry = [Point(xy) for xy in zip(pickups_centers['long'], pickups_centers['lat'])]
geo_pickups_centers = gpd.GeoDataFrame(pickups_centers, 
                          crs = crs, 
                          geometry = geometry)

In [None]:
geo_pickups_centers

#### DBSCAN algorithm

Another clustering algorithm, but too slow for a dataset of this size

In [None]:
# pickup_coords_val = pickup_coords.values #Create a numpy array


In [None]:
# nearest_neighbors = NearestNeighbors(n_neighbors=6)
# neighbors = nearest_neighbors.fit(pickup_coords_val)
# distances, indices = neighbors.kneighbors(pickup_coords_val)
# distances = np.sort(distances[:,5], axis=0)
# i = np.arange(len(distances))
# knee = KneeLocator(i, distances, S=1, curve='convex', direction='increasing', interp_method='polynomial')
# # We use the polynomial interp_method in this one to get a smooth line
# fig = plt.figure(figsize=(5, 5))
# knee.plot_knee()
# plt.xlabel("Points")
# plt.ylabel("Distance")
# plt.show()
# print('The elbow point is at around {}'.format(round(distances[knee.knee],3)))

In [None]:
# dbscan = DBSCAN(eps=0.045)
# dbscan.fit(pickup_coords_val)
# dbscan_predictions = dbscan.labels_
# pickups['Cluster_DBSCAN'] = dbscan_predictions
# vectorizer = np.vectorize(lambda x: colors[x % len(colors)])
# # pickups['Colors_DBSCAN'] = vectorizer(dbscan_predictions)

Other clustering algorithms

Mean-shift, Expectation-Maximization (EM) clustering using Gaussian Mixture Models (GMM), agglomerative hierarchical clustering

### Drop-offs

In [None]:
dropoffs = taxi_loc_no_zero_trip[['Dropoff Centroid Latitude', 'Dropoff Centroid Longitude', 'Dropoff Centroid  Location']]
dropoff_coords = dropoffs[['Dropoff Centroid Latitude', 'Dropoff Centroid Longitude']]

#### Elbow curve

In [None]:
inertia=[]
for i in tqdm(range(1,20)):
    kmeans = KMeans(n_clusters=i,random_state=0, n_init='auto').fit(dropoff_coords)
    inertia.append(kmeans.inertia_)
plt.plot(range(1,20),inertia)
plt.title('Elbow method')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.xticks(np.arange(0,20,1))
plt.show()

4 clusters is optimal

##### Elbow kneed lib

In [None]:
elbow_point = KneeLocator(range(1,20),inertia,curve='convex',direction='decreasing')
print(elbow_point.knee)

In [None]:
n_clusters = 15

In [None]:
kmeans_drop = KMeans(
    init="random",
    n_clusters=n_clusters,
    n_init='auto',
    max_iter=300,
    random_state=42
)

In [None]:
with warnings.catch_warnings(record=True):

    # fit_predict fits the model once and returns the cluster label of each point.
    dropoffs['dropoff_cluster_kmeans'] = kmeans_drop.fit_predict(dropoffs[dropoffs.columns[0:2]])
    dropoffs_centers = kmeans_drop.cluster_centers_ # Coordinates of cluster centers (lat, long).
    dropoffs_labels = kmeans_drop.labels_ # Labels of each point
dropoffs.head(5)

In [None]:
dropoffs_centers = pd.DataFrame(dropoffs_centers, columns=['lat', 'long'])
dropoff_centers_coords = dropoffs_centers[['long', 'lat']].values

In [None]:
geometry = [Point(xy) for xy in zip(dropoffs_centers['long'], dropoffs_centers['lat'])]
geo_dropoffs_centers = gpd.GeoDataFrame(dropoffs_centers, 
                          crs = crs, 
                          geometry = geometry)

In [None]:
geo_dropoffs_centers

### 1.2 Cluster borders and centroids

Draw the borders and centroids of the clusters on a map. You will have two maps.

Prepare data and make polygons for pickup clusters (Voronoi diagram)

<!-- ### Pickups -->

In [None]:
geometry = pickups['Pickup Centroid Location']
geo_pickups = gpd.GeoDataFrame(pickups, 
                          crs = crs, 
                          geometry = geometry)

In [None]:
area = df_map.unary_union

In [None]:
geo_pickups_centers

In [None]:
with warnings.catch_warnings(record=True):
    pickup_polys, pickup_pts = voronoi_regions_from_coords(pickup_centers_coords, area)

In [None]:
pickup_polys_df = pd.DataFrame.from_dict(pickup_polys, orient='index')
pickup_polys_df = gpd.GeoDataFrame(pickup_polys_df, geometry=0)
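
Why a Voronoi diagram gives the cluster borders: k-means assigns every point to its nearest center, and the Voronoi cell of a center is exactly the set of locations closer to it than to any other center. The sketch below illustrates that nearest-center rule with plain Euclidean distance on raw coordinates, as the k-means fit above does; the coordinates and the helper name `nearest_center` are made up for the example.

```python
import math


def nearest_center(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


# Hypothetical cluster centers as (longitude, latitude) pairs.
centers = [(-87.63, 41.88), (-87.90, 41.98), (-87.75, 41.79)]
samples = [(-87.64, 41.89), (-87.88, 41.97), (-87.74, 41.80)]

# Each sample falls into the Voronoi cell of the center it is closest to.
cells = [nearest_center(p, centers) for p in samples]
print(cells)
```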


Prepare data and make polygons for dropoff clusters

In [None]:
geometry = dropoffs['Dropoff Centroid  Location']
geo_dropoffs = gpd.GeoDataFrame(dropoffs, 
                          crs = crs, 
                          geometry = geometry)

In [None]:
with warnings.catch_warnings(record=True):

    dropoff_polys, dropoff_pts = voronoi_regions_from_coords(dropoff_centers_coords, area)
    dropoff_polys_df = pd.DataFrame.from_dict(dropoff_polys, orient='index')
    dropoff_polys_df = gpd.GeoDataFrame(dropoff_polys_df, geometry=0)

In [None]:
# fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2,figsize = (10,10))
# df_map.plot(ax=ax1, color='#554d4d', legend=True, )
# geo_pickups.plot(ax=ax1, cmap='tab20', column='pickup_cluster_kmeans', markersize=5)
# pickup_polys_df.plot(ax=ax1, cmap='tab20', column=0, alpha=0.8)
# ax1.set_title('Chicago pickpoints')
# geo_pickups_centers.plot(ax=ax1, color='#ce79a5', markersize=30)

# df_map.plot(ax=ax2, color='#554d4d', legend=True, )
# dropoff_polys_df.plot(ax=ax2, cmap='tab20', column=0, alpha=0.8)
# geo_dropoffs.plot(ax=ax2, cmap='tab20', column='dropoff_cluster_kmeans', markersize=5)
# ax2.set_title('Chicago dropoffs')
# geo_dropoffs_centers.plot(ax=ax2, color='#ce79a5', markersize=30)


# plt.show()

### 1.3. Largest number of pick points and drop-offs

Use a color scale to show which clusters (i.e. areas) have the largest number of pick
points and drop-offs.

#### Pickups

In [None]:
pickups_clusters = pickups.groupby('pickup_cluster_kmeans')['Pickup Centroid Location'].count()
pickups_clusters.sort_values(ascending=False)
pickups_clusters_df = pd.DataFrame(pickups_clusters)
pickups_clusters_df = pickups_clusters_df.rename(columns={'Pickup Centroid Location': 'kmeans_cluster_observations'}).reset_index()

In [None]:
pickup_polys_df_ = pickup_polys_df.reset_index().rename(columns={'index': 'pickup_cluster_kmeans'})
pickup_polys_df_ = pickup_polys_df_.merge(pickups_clusters_df, on='pickup_cluster_kmeans')

#### Dropoffs

In [None]:
dropoffs_clusters = dropoffs.groupby('dropoff_cluster_kmeans')['Dropoff Centroid  Location'].count()
dropoffs_clusters.sort_values(ascending=False)
dropoffs_clusters_df = pd.DataFrame(dropoffs_clusters)
dropoffs_clusters_df = dropoffs_clusters_df.rename(columns={'Dropoff Centroid  Location': 'kmeans_cluster_observations'}).reset_index()

In [None]:
dropoff_polys_df_ = dropoff_polys_df.reset_index().rename(columns={'index': 'dropoff_cluster_kmeans'})
dropoff_polys_df_ = dropoff_polys_df_.merge(dropoffs_clusters_df, on='dropoff_cluster_kmeans')

In [None]:
dropoff_polys_df_
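
A color scale needs one number per polygon. The sketch below shows one simple way to get it: count the observations per cluster and divide by the largest count, so the busiest cluster maps to 1.0. This is an illustration with made-up labels and a hypothetical helper name; matplotlib's default normalisation scales between the minimum and maximum of the column instead.

```python
from collections import Counter


def normalised_counts(labels):
    """Map each cluster label to its share of the busiest cluster's count."""
    counts = Counter(labels)
    top = max(counts.values())
    return {label: n / top for label, n in counts.items()}


toy_labels = [0, 0, 0, 0, 1, 1, 2]  # made-up cluster labels
shades = normalised_counts(toy_labels)
print(shades)
```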

In [None]:
# fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize = (12,10))
# df_map.plot(ax=ax1, color='#554d4d')
# # geo_pickups.plot(ax=ax1, cmap='Set2', column='pickup_cluster_kmeans', markersize=5)
# divider = make_axes_locatable(ax1)
# cax = divider.append_axes("right", size="5%", pad=0.1)
# pickup_polys_df_.plot(ax=ax1, cax=cax, cmap='viridis', column='kmeans_cluster_observations', alpha=0.8, legend=True)
# ax1.set_title('Chicago pickpoints')
# # geo_pickups_centers.plot(ax=ax1, color='#e2c1d2', markersize=30)

# df_map.plot(ax=ax2, color='#554d4d')
# divider = make_axes_locatable(ax2)
# cax = divider.append_axes("right", size="5%", pad=0.1)
# dropoff_polys_df_.plot(ax=ax2, cax=cax, cmap='viridis', column='kmeans_cluster_observations', alpha=0.8, legend=True)
# # geo_dropoffs.plot(ax=ax2, cmap='Set2', column='dropoff_cluster_kmeans', markersize=5)
# ax2.set_title('Chicago dropoffs')
# # geo_dropoffs_centers.plot(ax=ax2, color='#e2c1d2', markersize=30)


# plt.show()

## 2. Most popular routes

### 2.1 Clustering analysis

Conduct a clustering analysis of taxi rides based on the coordinates of the pick
point location and drop off.

In [None]:
routes = taxi_loc_no_zero_trip[['Pickup Centroid Longitude', 'Pickup Centroid Latitude', 'Dropoff Centroid Longitude', 'Dropoff Centroid Latitude']]
routes_val = routes.values

In [None]:
inertia=[]
for i in tqdm(range(1,20)):
    kmeans = KMeans(n_clusters=i,random_state=42, n_init='auto').fit(routes_val)
    inertia.append(kmeans.inertia_)
plt.plot(range(1,20),inertia)
plt.title('Elbow method')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.xticks(np.arange(0,20,1))
plt.show()

In [None]:
elbow_point = KneeLocator(range(1,20),inertia,curve='convex',direction='decreasing')
print(elbow_point.knee)

In [None]:
n_clusters = 10

In [None]:
kmeans_routes = KMeans(
    init="random",
    n_clusters=n_clusters,
    n_init='auto',
    max_iter=300,
    random_state=42
)

In [None]:
with warnings.catch_warnings(record=True):

    # fit_predict fits the model once and returns the cluster label of each ride.
    routes['routes_cluster_kmeans'] = kmeans_routes.fit_predict(routes_val)
    routes_centers = kmeans_routes.cluster_centers_ # Coordinates of cluster centers.
    routes_labels = kmeans_routes.labels_ # Labels of each point
routes.head(5)

In [None]:
routes_centers_df = pd.DataFrame(routes_centers, columns=['Pickup Centroid Longitude', 'Pickup Centroid Latitude', 'Dropoff Centroid Longitude', 'Dropoff Centroid Latitude'])

In [None]:
geometry = [MultiPoint([(x,y),(j,k)]) for x,y,j,k in zip(routes_centers_df['Pickup Centroid Longitude'], 
                                           routes_centers_df['Pickup Centroid Latitude'],
                                          routes_centers_df['Dropoff Centroid Longitude'],
                                           routes_centers_df['Dropoff Centroid Latitude'])]
geo_routes_centers = gpd.GeoDataFrame(routes_centers_df, crs=crs, geometry=geometry)
geo_routes_centers = geo_routes_centers.reset_index()

In [None]:
routes_clusters = routes.groupby('routes_cluster_kmeans')['Pickup Centroid Longitude'].count()


In [None]:
top_clusters = routes_clusters.sort_values(ascending=False)[0:5].index

In [None]:
top_clusters
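
Because each ride is clustered as a 4-dimensional point (pickup longitude, pickup latitude, dropoff longitude, dropoff latitude), two rides between the same places in opposite directions land in different clusters. The following sketch, with made-up coordinates and hypothetical helper names, shows the assignment and the "most popular cluster" count on a toy example.

```python
import math
from collections import Counter


def assign(ride, centers):
    """Index of the 4-D center closest to the ride vector."""
    return min(range(len(centers)), key=lambda i: math.dist(ride, centers[i]))


# ride = (pickup_lon, pickup_lat, dropoff_lon, dropoff_lat); values are made up.
centers = [
    (-87.63, 41.88, -87.90, 41.98),  # downtown -> airport
    (-87.90, 41.98, -87.63, 41.88),  # airport -> downtown
]
rides = [
    (-87.62, 41.88, -87.91, 41.97),
    (-87.64, 41.89, -87.89, 41.98),
    (-87.90, 41.97, -87.63, 41.89),
]

labels = [assign(r, centers) for r in rides]
# most_common sorts clusters by number of rides, like sort_values above.
top = [label for label, _ in Counter(labels).most_common(1)]
print(labels, top)
```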

### 2.2 Centroids

Draw centroids of the top 5 popular clusters (routes).

In [None]:
geo_top_routes = geo_routes_centers.iloc[top_clusters]

In [None]:
fig, ax = plt.subplots(figsize = (12,10))
df_map.plot(ax=ax, color='#554d4d')
# divider = make_axes_locatable(ax1)
# cax = divider.append_axes("right", size="5%", pad=0.1)
geo_top_routes.plot(ax=ax, cmap='tab20', column='index',)
ax.set_title('Chicago routes')

plt.show()

### 2.3 Draw routes

Draw routes between centroids of a cluster – not a direct line from point A to point
B, but a path that takes into account the city streets.
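
A street-aware path is a shortest path on a graph whose nodes are intersections and whose edges are street segments. The helper `get_route` used in this notebook comes from `utils` and is not shown here, so the sketch below is only an assumption about the underlying idea: Dijkstra's algorithm on a tiny hand-made graph, with edge weights standing in for travel times in seconds.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative weights."""
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter way was already found
        for nxt, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(queue, (nd, nxt))
    if target not in dist:
        return None  # target is unreachable from source
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Made-up one-way street network; weights are travel times in seconds.
streets = {
    "A": {"B": 60, "C": 200},
    "B": {"C": 50, "D": 180},
    "C": {"D": 40},
    "D": {},
}
route = shortest_path(streets, "A", "D")
print(route)
```

Restricting the real graph to its largest strongly connected component, as done below, avoids the unreachable case for any pair of nodes.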

In [None]:
ox.settings.log_console=True
ox.settings.use_cache=True
place = 'Chicago, United States'
mode = 'drive' # 'drive', 'bike', 'walk'
# graph = ox.graph_from_place(place, network_type = mode)

In [None]:
G = ox.graph_from_polygon(
    polygon = area, 
    retain_all=False, 
    network_type = mode,
#     truncate_by_edge=True, 
#     simplify=False,
)
G = ox.add_edge_speeds(G)
G = ox.add_edge_travel_times(G)
G = ox.utils_graph.get_largest_component(G, strongly=True)

In [None]:
top_routes = [get_route(G, (r['Pickup Centroid Latitude'], r['Pickup Centroid Longitude']), (r['Dropoff Centroid Latitude'], r['Dropoff Centroid Longitude'])) for _, r in geo_top_routes.iterrows()]

In [None]:
top_routes_taxicab = [tc.distance.shortest_path(G, (r['Pickup Centroid Latitude'], r['Pickup Centroid Longitude']), (r['Dropoff Centroid Latitude'], r['Dropoff Centroid Longitude'])) for _, r in geo_top_routes.iterrows()]

In [None]:
rc = ['r', 'y', 'c', 'g', 'white']
fig, ax = ox.plot_graph_routes(G, top_routes, route_colors=rc, node_size=0)

In [None]:
palette = sns.color_palette("tab10", 10).as_hex()

In [None]:
geo_top_routes.reset_index(drop=True, inplace=True)

In [None]:
rc = palette[:5]
route_map = ox.plot_route_folium(G, top_routes_taxicab[0][1], tiles='openstreetmap', color=rc[0], opacity=0.5)

for i in range(len(top_routes[1:])):
    route_map = ox.plot_route_folium(G, top_routes_taxicab[i+1][1], route_map=route_map, color=rc[i+1], opacity=0.5)
    
for i, r in geo_top_routes.iterrows():
    folium.CircleMarker(
        location=[r['Pickup Centroid Latitude'], r['Pickup Centroid Longitude']],
        radius=10,
        popup="Centroid pickpoint coords",
        color=rc[i],
#         fill=True,
#         fill_color="#3186cc",
    ).add_to(route_map)
    folium.CircleMarker(
        location=[r['Dropoff Centroid Latitude'], r['Dropoff Centroid Longitude']],
        radius=10,
        popup="Centroid dropoff coords",
        color=rc[i],
        fill=True,
        fill_color=rc[i],
        fill_opacity=0.7,
    ).add_to(route_map)

In [None]:
route_map

## 3. City infrastructure

1. Find locations of the city infrastructure (airports, stadiums, parks, universities)
using the data and make up your own algorithm of how to find them. Find at least
6 of that kind of location.

In [None]:
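def get_location_by_address(address, max_retries=5):
    """Return the raw geocoding result for an address.

    Retries up to `max_retries` times, sleeping 1 second before each
    request; returns None if every attempt fails."""
    for _ in range(max_retries):
        time.sleep(1)
        try:
            return app.geocode(address).raw
        except Exception:
            continue
    return None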

In [None]:
from geopy.geocoders import Nominatim

In [None]:
app = Nominatim(user_agent="CityLife")


In [None]:
address = "10000 West O'Hare Ave Chicago IL 60666"

In [None]:
# location = get_location_by_address(address)

In [None]:
# location = app.geocode(address).raw
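
One possible way to link dropoffs to a geocoded landmark is to keep only the dropoff centroids within some radius of it. The sketch below uses the haversine great-circle distance; the radius, the coordinates (approximate, for illustration only) and the function name are assumptions, not part of the dataset.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


landmark = (41.98, -87.90)  # rough airport location, illustrative only
dropoffs = [(41.979, -87.903), (41.88, -87.63), (41.981, -87.899)]

# Keep dropoffs within 1 km of the landmark (the radius is an arbitrary choice).
near = [d for d in dropoffs if haversine_km(*landmark, *d) <= 1.0]
print(len(near))
```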

2. Find the rush hour for each of the locations, i.e. the timestamp when the location had the
largest number of dropoffs, and save the information into the file "rush_hours_empty.csv"
(in attachment).

3. Visualize one day of any of the locations including that rush hour showing how
people from different places were coming to the location and then were leaving it
on an animated map.

4. When the ride finishes, it should disappear on the map. In other words, it should
look like firing neurons in the brain.

5. The map should display time.
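
With the definition of rush hour given above (the timestamp with the most dropoffs at a location), the search reduces to a frequency count. A minimal sketch with made-up timestamps and a hypothetical function name:

```python
from collections import Counter
from datetime import datetime


def rush_hour(dropoff_times):
    """Return (timestamp, count) for the most frequent dropoff timestamp."""
    return Counter(dropoff_times).most_common(1)[0]


# Made-up dropoff timestamps for one location.
times = [
    datetime(2019, 5, 16, 8, 0),
    datetime(2019, 5, 16, 8, 15),
    datetime(2019, 5, 16, 8, 15),
    datetime(2019, 5, 16, 17, 30),
]
peak, n = rush_hour(times)
print(peak, n)
```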

## 4. One day of a taxi driver

1. For the taxi driver with ID (2ea4ad2950f3bbdfdcfa7adb48e0dcee49d8a714b7024342f0302
eeb9e891dfd55a6f35bb7bc7af06398fb4f55583e1659cb11b432848296bfd2b7d3084e7de1)
visualize his or her rides during the day (2019-05-31). Display the current amount
of money earned.

2. When the ride finishes, it should NOT disappear on the animated map

3. Every time the ride finishes the counter of money should be updated.

4. The map should display time.
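
The money counter described above can be sketched as a cumulative sum that adds a fare only once the ride's end time has passed. The example assumes 15-minute animation steps; the function name and the rides are made up.

```python
from datetime import datetime, timedelta


def earnings_by_slot(rides, day_start, step=timedelta(minutes=15), n_slots=96):
    """Cumulative fare at each time slot; a ride counts once it has ended."""
    totals = []
    for k in range(n_slots):
        now = day_start + k * step
        totals.append(sum(fare for end, fare in rides if end <= now))
    return totals


day = datetime(2019, 5, 31)
# (end time, fare) pairs, made up for the example.
toy_rides = [
    (day + timedelta(hours=1), 10.0),
    (day + timedelta(hours=2, minutes=15), 7.5),
]
totals = earnings_by_slot(toy_rides, day)
print(totals[4], totals[9])
```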

In [None]:
id_taxi = '2ea4ad2950f3bbdfdcfa7adb48e0dcee49d8a714b7024342f0302eeb9e891dfd55a6f35bb7bc7af06398fb4f55583e1659cb11b432848296bfd2b7d3084e7de1'

In [None]:
date = datetime.strptime('2019-05-31', '%Y-%m-%d')
end_date = date + timedelta(days=1)

In [None]:
date_range_taxi = pd.date_range(date, end_date, freq='15min', inclusive='left')

In [None]:
date_range_time = [x.strftime('%H:%M') for x in date_range_taxi]

Let's select trip data for a particular taxi ID and date

In [None]:
# Combine all conditions into one boolean mask instead of chaining two selections.
taxi_data = taxi_loc[
    (taxi_loc['Taxi ID'] == id_taxi)
    & (taxi_loc['Trip Start Timestamp'] >= date)
    & (taxi_loc['Trip Start Timestamp'] < end_date)
].copy()

In [None]:
taxi_data.sort_values('Trip Start Timestamp', inplace=True)

If we look at the data, we can see that multiple trips for one taxi driver start at the same time but have different IDs and sometimes different dropoffs. We assume that this happens when a driver takes several passengers at the same time and each passenger is assigned a distinct trip ID.
We need to aggregate these rows to handle this situation.

In [None]:
grouped1 = taxi_data.groupby(['Trip Start Timestamp',
                   'Trip End Timestamp',
                   'Pickup Centroid Latitude',
                  'Pickup Centroid Longitude',
                  'Dropoff Centroid Latitude',
                  'Dropoff Centroid Longitude'], as_index=False)['Fare'].sum()
grouped2 = grouped1.groupby('Trip Start Timestamp',  as_index=False).agg({
    'Trip End Timestamp': 'max',
    'Pickup Centroid Latitude': 'min',
    'Pickup Centroid Longitude': 'min',
    'Dropoff Centroid Latitude': 'max',
    'Dropoff Centroid Longitude': 'max',
    'Fare': 'sum'
})

Let's make a list of routes for all rides

In [None]:
taxi_routes = [get_route(G, (r['Pickup Centroid Latitude'], r['Pickup Centroid Longitude']), (r['Dropoff Centroid Latitude'], r['Dropoff Centroid Longitude'])) for _, r in grouped2.iterrows()]

Adding routes and each route's lat and long columns to a dataframe

In [None]:
grouped2['route'] = pd.Series(taxi_routes)
grouped2['lat'], grouped2['long'] = zip(*grouped2['route'].map(get_nodes_coords))

In [None]:
grouped2['new_time_start'] = grouped2['Trip Start Timestamp'].apply(lambda x: x.strftime('%H:%M'))
grouped2['new_time_end'] = grouped2['Trip End Timestamp'].apply(lambda x: x.strftime('%H:%M'))


In [None]:
frames = []
slider_steps = []

lats, lons = [], []
money_earned = 0
# The loop variable is named `time_label` so it does not shadow the imported `time` module.
for time_label in date_range_time:
    df = grouped2[grouped2.new_time_start == time_label]
#     data = []
    for i, r in df.iterrows():
        lat, lon = r['lat'], r['long']
        lats.extend(lat)
        lons.extend(lon)
        lats.append(None)
        lons.append(None)
        money_earned+=r['Fare']

    data = [go.Scattermapbox(
        name=f'Time: {time_label}',
#         mode="lines",
        lon=lons,
        lat=lats,
#         marker={'size': 10},
#         line=dict(width=4.5),
        connectgaps = False
        )
           ]
    annotation = dict(
        text=f"Money earned: {money_earned}",
        showarrow=False,
        ax=10,
        ay=-40,
        bordercolor="#c7c7c7",
        borderwidth=2,
        borderpad=4,
        bgcolor="#ff7f0e",
        opacity=0.8,
        font=dict(
            family="Courier New, monospace",
            size=16,
            color="#ffffff"
            ),
        align="center",
        )
    frames.append(go.Frame(
        data=data,
        name=time_label,
        layout=go.Layout(
            annotations=[annotation]
        )
    ))
    slider_step = {"args": [
        [time_label],
        {"frame": {"duration": 300, "redraw": True},
         "mode": "immediate",
         "transition": {
             "duration": 300,
                       }}
        ],
        "label": time_label,
        "method": "animate"}
    slider_steps.append(slider_step)
    
sliders_dict = {
    "active": 0,
    "yanchor": "top",
    "xanchor": "left",
    "currentvalue": {
        "font": {"size": 20},
        "prefix": "Time: ",
        "visible": True,
        "xanchor": "right"
    },
    "transition": {"duration": 300, "easing": "cubic-in-out"},
    "pad": {"b": 10, "t": 50},
    "x": 0,
    "y": 0,
    "steps": slider_steps
}
    
layout = go.Layout(
    mapbox_style="stamen-terrain",
    margin={"r":0,"t":0,"l":0,"b":0},
    mapbox = {
                'center': {'lat': 41.87, 
                'lon': -87.6},
                'zoom': 10
    },
    updatemenus=[
        dict(
            type="buttons",
            buttons=[
                dict(
                    label="Play",
                    method="animate",
                    args=[None]),
#                 dict(
#                     label="Pause",
#                     method="animate",
#                     args=[
#                         None,
#                         {"frame": {"duration": 0, "redraw": False},
#                          "mode": "immediate",
#                          "transition": {"duration": 0}}])
                    ]),

    ],
    sliders=[sliders_dict]
)                

fig = go.Figure(
    data=go.Scattermapbox({
     'connectgaps': False,
     'lat': [None],
     'line': {'width': 3.5},
     'lon': [None],
     'marker': {'size': 10},
     'mode': 'lines',
     'name': 'Time: 00:00'
    }),
    frames=frames,
    layout=layout
)

    
fig.show()
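
The frames above draw several rides with a single trace by placing a `None` between consecutive routes; with `connectgaps=False`, Plotly leaves a gap at each missing value, so separate rides are not joined by a line. A minimal sketch of that flattening step, with a hypothetical helper name and made-up coordinates:

```python
def join_paths(paths):
    """Flatten [(lats, lons), ...] into two lists with None between paths."""
    lats, lons = [], []
    for path_lats, path_lons in paths:
        lats.extend(path_lats)
        lons.extend(path_lons)
        # The None marks the end of a ride and breaks the drawn line here.
        lats.append(None)
        lons.append(None)
    return lats, lons


toy_paths = [
    ([41.88, 41.89], [-87.63, -87.64]),
    ([41.98, 41.97], [-87.90, -87.89]),
]
flat_lats, flat_lons = join_paths(toy_paths)
print(flat_lats)
```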

Plotting with Dash and plotly

In [None]:
# app = JupyterDash(__name__)
# app.layout = html.Div([
#     html.H4('One day of taxi driver'),
#     dcc.Graph(id="graph"),
#     html.Label(
#         'Time',
#         style={"margin-top": "30px"}
#     ),
#     dcc.RangeSlider(
#         id='time_start',
#         min=unixTimeMillis(date),
#         max=unixTimeMillis(end_date),
#         value = [unixTimeMillis(date), unixTimeMillis(end_date)],
#         marks = getMarks(date_range_taxi, 4)
#     ),
#     ])


# @app.callback(
#     Output("graph", "figure"),
#     [Input("time_start", "value")])
    
# def update_line_chart(time_start):
# #     time_start = time_start[0]
#     taxi_trips = grouped2[(grouped2['Trip Start Timestamp'].apply(unixTimeMillis) <= time_start[1]) 
#                           & (grouped2['Trip End Timestamp'].apply(unixTimeMillis) >= time_start[0])]
#     fig = go.Figure(go.Scattermapbox())
#     fig.update_layout(mapbox_style="stamen-terrain",
#                       mapbox_center_lat = 41.87,
#                       mapbox_center_lon=-87.6,
#                      )
#     fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0},
#                       mapbox = {
#                           'center': {'lat': 41.87, 
#                           'lon': -87.6},
#                           'zoom': 11})
#     money_earned = 0
#     for _, r in taxi_trips.iterrows():
#         lat, long = get_nodes_coords(r['route'])
#         plot_path(fig, lat,
#                         long,
#                         (r['Pickup Centroid Latitude'], r['Pickup Centroid Longitude']),
#                         (r['Dropoff Centroid Latitude'], r['Dropoff Centroid Longitude']))
#         money_earned += r['Fare']
#     fig.add_annotation(
#         text=f"Money earned: {money_earned}",
#         showarrow=False,
#         ax=10,
#         ay=-40,
#         bordercolor="#c7c7c7",
#         borderwidth=2,
#         borderpad=4,
#         bgcolor="#ff7f0e",
#         opacity=0.8,
#         font=dict(
#             family="Courier New, monospace",
#             size=16,
#             color="#ffffff"
#             ),
#         align="center",
#         )
#     return fig


# app.run_server(mode='inline', port=8049)


## 5. One day of the city

1. Visualize all the rides in the city during the day (2019-05-16) on an animated map.

2. When the ride finishes, it should disappear on the map. In other words, it should
look like firing neurons in the brain.

3. The map should display time
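
For a ride to disappear when it finishes, each frame should contain only the rides in progress at that moment, i.e. those with start <= t < end. The sketch below shows this interval filter on made-up rides; the function name is hypothetical, and whether the end bound is inclusive is a design choice.

```python
from datetime import datetime, timedelta


def active_rides(rides, now):
    """IDs of rides that have started but not yet finished at time `now`."""
    return [ride_id for ride_id, start, end in rides if start <= now < end]


day = datetime(2019, 5, 16)
# (ride id, start, end) tuples, made up for the example.
toy_rides = [
    ("a", day + timedelta(hours=8), day + timedelta(hours=8, minutes=30)),
    ("b", day + timedelta(hours=8, minutes=15), day + timedelta(hours=9)),
]
at_0815 = active_rides(toy_rides, day + timedelta(hours=8, minutes=15))
at_0830 = active_rides(toy_rides, day + timedelta(hours=8, minutes=30))
print(at_0815, at_0830)
```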

In [None]:
city_day = datetime.strptime('2019-05-16', '%Y-%m-%d')
city_end_day = city_day + timedelta(days=1)

In [None]:
date_range_city = pd.date_range(start=city_day,end=city_end_day,freq='15min')

1. DO NOT DELETE! Create a dataframe of one city day with routes and save it to a file. (The computation takes a long time, so keep these cells commented out unless the graph or the dataframe selection criteria change.)

In [None]:
# day_rides = taxi_loc_no_zero_trip[(taxi_loc_no_zero_trip['Trip Start Timestamp'] >= city_day) & (taxi_loc_no_zero_trip['Trip Start Timestamp'] < city_end_day )]
# day_rides = day_rides.sort_values('Trip Start Timestamp', ascending=True)
# day_rides.reset_index(inplace=True)

In [None]:
# city_routes = [get_route(G, (r['Pickup Centroid Latitude'], r['Pickup Centroid Longitude']), (r['Dropoff Centroid Latitude'], r['Dropoff Centroid Longitude'])) for _, r in day_rides.iterrows()]


In [None]:
# day_rides['route'] = pd.Series(city_routes)
# day_rides['lat'], day_rides['long'] = zip(*day_rides['route'].map(get_nodes_coords))

In [None]:
# day_rides.to_pickle('city_routes_df')

2. Or load a previously created dataframe from a file

In [None]:
day_rides = pd.read_pickle('city_routes_df')

In [None]:
day_rides.info()

Plot one day of a city with Dash plotly

In [None]:
day_rides['new_time'] = day_rides['Trip Start Timestamp'].apply(lambda x: x.strftime('%H:%M'))
time_range = day_rides['new_time'].unique()
frames = []
slider_steps = []

# The loop variable is named `time_label` so it does not shadow the imported `time` module.
for time_label in time_range:
    df = day_rides[day_rides.new_time == time_label]
#     data = []
    lats, lons = [], []
    for i, r in df.iterrows():
        lat, lon = r['lat'], r['long']
        lats.extend(lat)
        lons.extend(lon)
        lats.append(None)
        lons.append(None)

    data = [go.Scattermapbox(
        name=f'Time: {time_label}',
#         mode="lines",
        lon=lons,
        lat=lats,
#         marker={'size': 10},
#         line=dict(width=4.5),
        connectgaps = False
        )
           ]
    frames.append(go.Frame(
        data=data,
        name=time_label
    ))
    slider_step = {"args": [
        [time_label],
        {"frame": {"duration": 300, "redraw": True},
         "mode": "immediate",
         "transition": {
             "duration": 300,
                       }}
        ],
        "label": time_label,
        "method": "animate"}
    slider_steps.append(slider_step)
    
sliders_dict = {
    "active": 0,
    "yanchor": "top",
    "xanchor": "left",
    "currentvalue": {
        "font": {"size": 20},
        "prefix": "",
        "visible": True,
        "xanchor": "right"
    },
    "transition": {"duration": 300, "easing": "cubic-in-out"},
    "pad": {"b": 10, "t": 50},
    "x": 0,
    "y": 0,
    "steps": slider_steps
}
    
layout = go.Layout(
    # Stamen tiles are now hosted by Stadia Maps and may require an API key;
    # "open-street-map" is a token-free alternative.
    mapbox_style="stamen-terrain",
    margin={"r": 0, "t": 0, "l": 0, "b": 0},
    mapbox={
        'center': {'lat': 41.87, 'lon': -87.6},
        'zoom': 10
    },
    updatemenus=[
        dict(
            type="buttons",
            buttons=[
                dict(
                    label="Play",
                    method="animate",
                    args=[None]),
#                 dict(
#                     label="Pause",
#                     method="animate",
#                     args=[
#                         None,
#                         {"frame": {"duration": 0, "redraw": False},
#                          "mode": "immediate",
#                          "transition": {"duration": 0}}])
            ]),
    ],
    sliders=[sliders_dict]
)

fig = go.Figure(
    data=go.Scattermapbox({
     'connectgaps': False,
     'lat': [None],
     'line': {'width': 3.5},
     'lon': [None],
     'marker': {'size': 10},
     'mode': 'lines',
     'name': 'Time: 00:00'
    }),
    frames=frames,
    layout=layout
)

    
fig.show()
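
The frame-building loop above draws many routes with a single trace by concatenating their coordinates and putting a `None` between routes. A minimal sketch of that step as a standalone function follows; `flatten_routes` is a hypothetical name and the coordinates are made-up sample values.

```python
def flatten_routes(routes):
    """Concatenate (lats, lons) routes into two flat lists separated by None.

    A None entry marks a gap, so a line trace with connectgaps=False
    draws each route as its own segment.
    """
    flat_lats, flat_lons = [], []
    for lats, lons in routes:
        flat_lats.extend(lats)
        flat_lons.extend(lons)
        flat_lats.append(None)
        flat_lons.append(None)
    return flat_lats, flat_lons


# Two sample routes near the map center used above.
routes = [
    ([41.88, 41.89], [-87.63, -87.62]),
    ([41.90, 41.91, 41.92], [-87.65, -87.64, -87.63]),
]
lats, lons = flatten_routes(routes)
```

The resulting `lats` and `lons` can be passed to one `go.Scattermapbox` trace, which keeps each frame to a single trace regardless of how many rides it contains.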

In [None]:
# app = JupyterDash(__name__)
# app.layout = html.Div([
#     html.H4('One day of a city'),
#     dcc.Graph(id="graph"),
#     html.Label(
#         'Time',
#         style={"margin-top": "30px"}
#     ),
#     dcc.Slider(
#         id='time_start',
#         min=unixTimeMillis(city_day),
#         max=unixTimeMillis(city_end_day),
#         value = unixTimeMillis(city_day),
#         marks = getMarks(date_range_city, 4)
#     ),
#     ])


# @app.callback(
#     Output("graph", "figure"),
#     [Input("time_start", "value")])
    
# def update_line_chart(time_start):
#     taxi_trips = day_rides[(day_rides['Trip Start Timestamp'].apply(unixTimeMillis) <= time_start)
#                        & (day_rides['Trip End Timestamp'].apply(unixTimeMillis) >= time_start)]

#     fig = go.Figure(go.Scattermapbox())
#     fig.update_layout(mapbox_style="stamen-terrain",
#                      )
#     fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0},
#                       mapbox = {
#                           'center': {'lat': 41.87, 
#                           'lon': -87.6},
#                           'zoom': 11})

#     for _, r in taxi_trips.iterrows():
#         lat, long = r['lat'], r['long']
#         plot_path(fig, lat,
#                         long,
#                         (r['Pickup Centroid Latitude'], r['Pickup Centroid Longitude']),
#                         (r['Dropoff Centroid Latitude'], r['Dropoff Centroid Longitude']))

#     fig.add_annotation(
#         text=f"Time: {unixToDatetime(time_start)}",
#         showarrow=False,
#         ax=10,
#         ay=-40,
#         bordercolor="#c7c7c7",
#         borderwidth=2,
#         borderpad=4,
#         bgcolor="#ff7f0e",
#         opacity=0.8,
#         font=dict(
#             family="Courier New, monospace",
#             size=16,
#             color="#ffffff"
#             ),
#         align="center",
#         )
#     return fig


# app.run_server(mode='inline', port=8012)
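
The commented-out callback above selects the rides that are in progress at the slider time, that is, rides with start <= t <= end. A minimal sketch of that filter in plain Python follows; `rides_in_progress` is a hypothetical name and the rides are made-up sample data, not rows from the dataset.

```python
from datetime import datetime


def rides_in_progress(rides, moment):
    """Return the rides whose [start, end] interval contains `moment`."""
    return [r for r in rides if r['start'] <= moment <= r['end']]


# Made-up sample rides for illustration only.
rides = [
    {'id': 'a', 'start': datetime(2022, 1, 1, 8, 0), 'end': datetime(2022, 1, 1, 8, 30)},
    {'id': 'b', 'start': datetime(2022, 1, 1, 8, 20), 'end': datetime(2022, 1, 1, 8, 40)},
    {'id': 'c', 'start': datetime(2022, 1, 1, 8, 10), 'end': datetime(2022, 1, 1, 8, 15)},
]
active = rides_in_progress(rides, datetime(2022, 1, 1, 8, 15))
active_ids = [r['id'] for r in active]
```

On a dataframe with datetime columns, the same condition could likely be written as a boolean mask on `Trip Start Timestamp` and `Trip End Timestamp` directly, without converting each value through `.apply`.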


In [None]:
day_rides

## Bonus

- Design your visualizations as a website with interactive elements (for example, where you can choose a date or driver id, choose a location, etc).
- Try to achieve a better result with the intersection – at least 5 elements.

Save some data for the site

In [None]:
geo_pickups_centers.to_pickle('data/pickup_centers_df')

In [None]:
taxi_loc_dropna.to_pickle('data/taxi_loc_dropdup_df')

In [None]:
!ls -lh

## Links

- https://www.kaggle.com/code/jrw2200/nyc-taxi-trips-clustering/notebook
- https://www.kaggle.com/code/bryanlambo/clustering-geolocation-data-taxi-rank-dataset

- https://levelup.gitconnected.com/clustering-gps-co-ordinates-forming-regions-4f50caa7e4a1
- https://gis.stackexchange.com/questions/439823/draw-polygons-around-a-set-of-points-and-create-clusters-in-python
- https://www.youtube.com/watch?v=7m0Bq1EGPPg
- https://github.com/Coding-with-Adam/Dash-by-Plotly/tree/master/Good_to_Know/Dash2.0/Jupyter
- https://medium.com/analytics-vidhya/clustering-taxi-geolocation-data-to-predict-location-of-taxi-service-stations-pt-1-2471303e0965
- https://geographicdata.science/book/notebooks/10_clustering_and_regionalization.html
- https://medium.com/codex/clustering-geographic-data-on-an-interactive-map-in-python-60a5d13d6452
- https://medium.com/thelorry-product-tech-data/the-clustering-algorithm-with-geolocation-data-d6dd07ed36a

- https://www.youtube.com/watch?v=V9dk3EqaK3k&list=PLh3I780jNsiTnCs2LNt4ckbV-c2HatCFg&index=10
- https://docs.google.com/document/d/1X7TJUt3UI1ZBgvVqGMahUJ3REgrSZ92bHJgX6nIdABM/edit

- https://plotly.com/python/maps/

- https://colab.research.google.com/github/yennanliu/yennj12_blog_V2/blob/master/_notebooks/2020-04-03-nyc-taxi-eda.ipynb

- https://towardsdatascience.com/clustering-in-geospatial-applications-which-model-should-you-use-59a039332c45

- https://hal.science/hal-02947181/document
- https://medium.com/nuances-of-programming/python-django-и-osrm-маршрут-на-интерактивной-онлайн-карте-df30c6edfbc5
- https://github.com/IshanJainAI/Location-Intelligence-GeoSpatial/blob/main/30%20Python%20Libraries%20for%20Location%20Intelligence.pdf
- https://geoffboeing.com/2016/11/osmnx-python-street-networks/

- https://habr.com/ru/post/502958/

- https://github.com/nathanrooy/taxicab
- https://towardsdatascience.com/clustering-geospatial-data-f0584f0b04ec

- https://towardsdatascience.com/find-and-plot-your-optimal-path-using-plotly-and-networkx-in-python-17e75387b873
- https://medium.com/analytics-vidhya/interative-map-with-osm-directions-and-networkx-582c4f3435bc
- https://developers.arcgis.com/python/guide/part2-find-routes/