# Culture venues clustering in Toulouse

--- 

**Author**: Akim van Eersel  
**Date**: 2020-12-15

## Urban places with a high population density

In Toulouse, France, as in many other cities, some neighborhoods are well known for having many bars located very close to each other.  
This work also deals with clustering, but of cultural places open to the public rather than bars.

**Since there are clusters of bars, are there geographic groupings of cultural points?**  
**And if so, are these cultural places more or less grouped according to their category?**

This analysis presents the distribution of cultural places referenced on Foursquare, with a brief look at how they are spread geographically and by category.   
*With additional work, it could also serve as a basis for more in-depth conclusions.*

## Data collection stage

+ The main points, with their location and category, are retrieved from the Foursquare database through its web API.
    + However, Foursquare's data is biased, since it lists only a very small fraction of all the cultural places.

+ The `comptages-pietons` dataset from the [Data.toulouse-metropole](https://data.toulouse-metropole.fr/explore/dataset/comptages-pietons/information/?sort=annee&location=16,43.60208,1.44634&basemap=jawg.streets) webpage counts pedestrian flows in different streets of Toulouse. *It is published by Toulouse Métropole, was last updated on 2020-02-13, and is made available under the [Open Database License](http://opendatacommons.org/licenses/odbl/1.0/)*.
    + These points could help to cluster the cultural venues better.
    
---

+ From the Foursquare API, 29 sites were collected and grouped into 4 cultural category bins: `show`, `exposition-formation`, `play`, `monument`.
+ Pedestrian flow counts consist of 3136 measurements in 96 different streets over 5 years.
    + After removing irrelevant values and grouping by measurement address and year to take the median value, 79 data points remain.

## Foursquare cultural venues

*Note: Left-click on a point to get its name.*

In [15]:
# Latitude and longitude of Toulouse center
lat_tls = 43.6016
lng_tls = 1.4407

import folium
import numpy as np  # needed below for np.linspace

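import json, os, requests
url = 'https://api.foursquare.com/v2/venues/explore'

params = dict(
# Credentials are read from the environment instead of being hard-coded
# (the variable names below are placeholders, not a Foursquare convention)
client_id=os.environ['FOURSQUARE_CLIENT_ID'],
client_secret=os.environ['FOURSQUARE_CLIENT_SECRET'],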
v='20180323',
ll=str(lat_tls) + ',' + str(lng_tls),
    
query='loisirs', # Query for cultural venues: 'loisirs' means 'hobbies' in French,
                 # but it matches well enough and is the most appropriate Foursquare option
    
limit=200,
locale='fr',
radius=1800
)

resp = requests.get(url=url, params=params)
data = json.loads(resp.text)
items = data['response']['groups'][0]['items']

from pandas import json_normalize

dataframe = json_normalize(items) # flatten JSON

filtered_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['venue.categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean columns
dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]

# Drop useless columns
dataframe_filtered.drop(columns=['labeledLatLngs','distance','postalCode','cc','city','state','country','formattedAddress','crossStreet','id'],
                       inplace=True)

subcategorie_labels = {
    # Watch a show
    "spectacle": ["Opéra", "Théâtre", "Salle de spectacle", "Complexe des arts de la scène", "Complexe des arts de la\xa0scène"],
    "cinema": ["Multiplexe", "Cinéma indépendant", "Cinéma"],
    "music": ["Salle de concert", "Auditorium"],

    # Visit a place
    "monument": ["Hôtel de ville", "Base militaire", "Église", "Site historique"],

    # Play
    "sport": ["Stade de basketball", "Stade de football", "Café des sports", "Salle de danse", "Stade"],
    "game": ["Salle d'arcade", "Parc d'attractions"],

    # Interact with culture
    "exposition-formation": ["Musée d'art", "Musée des sciences", "Galerie d'art", "Musée d'histoire", "École"],

    # Enjoy green
    "parc": ["Parc", "Jardin"]
}

hicategorie_labels = {
    "show": ["spectacle", "cinema", "music"],
    "monument": ["monument"],
    "play": ["sport", "game", "parc"],
    "exposition-formation": ["exposition-formation"]
}

# All sub-categories names
for i, cat_val in enumerate(subcategorie_labels.values()):
    filt = list(map(lambda cat_venu: cat_venu in cat_val,
                    dataframe_filtered['categories']))
    dataframe_filtered.loc[filt,'categories'] = list(subcategorie_labels.keys())[i]

# All higher-categories names
for i, cat_val in enumerate(hicategorie_labels.values()):
    filt = list(map(lambda cat_venu: cat_venu in cat_val,
                    dataframe_filtered['categories']))
    dataframe_filtered.loc[filt,'hicategories'] = list(hicategorie_labels.keys())[i]
    
from matplotlib import cm

# Color array by categories
keys = list(dataframe_filtered['hicategories'].unique())
color_range = list(np.linspace(0, 1, len(keys), endpoint=False))

colors = [cm.colors.to_hex(cm.tab10(x)) for x in color_range]
color_dict = dict(zip(keys, colors)) 

# ------------------------------
# Create FeatureGroup to add a legend layer
categories = dataframe_filtered['hicategories'].unique()
folium_lgd = dict()
for col, cat in zip(colors, categories):
    folium_lgd[cat] = folium.FeatureGroup(name='<span style=\\"color: {col};\\">{cat}</span>'.format(col=col, cat=cat))

# ------------------------------
# Generate the map
toulouse_map = folium.Map(location=[lat_tls, lng_tls], zoom_start=14, tiles="cartoDB positron")

for lat, lng, name, cat in zip(dataframe_filtered['lat'], dataframe_filtered['lng'],
                               dataframe_filtered['name'], dataframe_filtered['hicategories']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        fill=True,
        opacity=0,
        fill_color=color_dict[cat],
        fill_opacity=0.8,
        parse_html=False).add_to(folium_lgd[cat])
    folium_lgd[cat].add_to(toulouse_map)

folium.map.LayerControl('topright', collapsed=False).add_to(toulouse_map)

<folium.map.LayerControl at 0x15ec9221f40>

In [16]:
toulouse_map

## Pedestrian flow counts

*Note: Left-click on a point to get its address and median value.*

In [23]:
## 

import geopandas
import pandas as pd

url = "https://data.toulouse-metropole.fr/explore/dataset/comptages-pietons/download/?format=geojson&timezone=Europe/Berlin&lang=fr"

tls_pedestrian = geopandas.read_file(url)

# Remove useless features
tls_pedestrian.drop(["transposee", "jours"], axis=1, inplace=True)


# Slice most recent data
filt = tls_pedestrian['annee'] >= 2015
tls_pedestrian = tls_pedestrian.loc[filt]


# Transform geometry feature type into a simple lat-lng list
tls_pedestrian['latitude'] = tls_pedestrian['geometry'].bounds['miny']
tls_pedestrian['longitude'] = tls_pedestrian['geometry'].bounds['minx']


# Change features types and names
tls_pedestrian['comptage'] = tls_pedestrian['comptage'].astype(int)
tls_pedestrian['annee'] = tls_pedestrian['annee'].astype(str)
tls_pedestrian['latitude'] = tls_pedestrian['latitude'].astype(str)
tls_pedestrian['longitude'] = tls_pedestrian['longitude'].astype(str)

tls_pedestrian.columns = ["day_time", "count", "address", "year", "geometry", "latitude", "longitude"]


# Format day_time feature labels
day_lab = ["journee", "midi", "matin"]
tls_pedestrian["day_time"] = list(map(lambda time: 'day' if time in day_lab else 'night',
                                      tls_pedestrian["day_time"]))


# ------------------------------
# Clean the index
tls_pedestrian.reset_index(drop=True, inplace=True)

# Grouping by addresses and years
tls_ped_grpby = tls_pedestrian.groupby(['address', 'year'], as_index=False).median()

# Remove null values
filt = tls_ped_grpby['count'] != 0
tls_ped_grpby = tls_ped_grpby[filt]

# Median values between years
tls_ped_grpby = tls_ped_grpby.groupby(['address'], as_index=False).median().round()


# ------------------------------
# Store and clean the final dataset 'tls_ped_grpby' without replacing the original 'tls_pedestrian'
tls_ped_grpby = tls_ped_grpby.join(
    tls_pedestrian[['address','latitude','longitude']].drop_duplicates().set_index('address'),
    on="address")

tls_ped_grpby.drop_duplicates('address', inplace=True)

import branca

# Hex color function mapping count values onto the chosen colormap, here Greys
colorscale = branca.colormap.linear.Greys_09.scale(tls_ped_grpby['count'].min(), tls_ped_grpby['count'].max())

# ------------------------------
# Create FeatureGroup to add a legend layer
folium_lgd = dict()
for col, cat in zip(colors, categories):
    folium_lgd[cat] = folium.FeatureGroup(name='<span style=\\"color: {col};\\">{cat}</span>'.format(col=col, cat=cat))

# ------------------------------

# Initial layout map
toulouse_map = folium.Map(location=[lat_tls, lng_tls], zoom_start=14, tiles="cartoDB positron")

# Adding counted pedestrian spots with circles
for lat, lng, address, count in zip(tls_ped_grpby['latitude'], tls_ped_grpby['longitude'], tls_ped_grpby['address'], tls_ped_grpby['count']):
    label = '{} | Count : {}'.format(address, int(count))
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [float(lat), float(lng)],
        radius=(count/300)+2,
        popup=label,
        color='#F5B254',
        opacity=0.5,
        fill=True,
        fill_color=colorscale(count),
        fill_opacity=0.8,
        parse_html=False).add_to(toulouse_map)

In [24]:
toulouse_map

## Proximity criteria

The two datasets are linked by selecting, for each Foursquare site, the pedestrian flow measurements nearest to it. Here, the criterion is a simple disk centered on each cultural place: a measurement point that falls inside a disk is considered to be near the site at its center.

*Note: Use down arrow key to see the second part of this dual slide.*
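
The disk criterion comes down to a distance test between two coordinates. The sketch below is only an illustration of that test: it swaps in the haversine formula (a spherical approximation) so that it runs on the standard library alone, whereas the notebook itself relies on `geopy.distance.geodesic`. The helper names `haversine_m` and `is_near` and the two test points are hypothetical; the 140 m default mirrors the `int_range_slider` value used in this notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def is_near(venue, point, radius_m=140):
    """True if `point` lies inside the disk of radius `radius_m` centred on `venue`."""
    return haversine_m(*venue, *point) <= radius_m


# Hypothetical coordinates around the Toulouse centre used in this notebook
venue = (43.6016, 1.4407)
close_point = (43.6022, 1.4412)  # a few dozen metres away
far_point = (43.6100, 1.4407)    # several hundred metres away

print(is_near(venue, close_point), is_near(venue, far_point))
```

At this scale the spherical approximation differs from an ellipsoidal geodesic by well under a metre, so either would select the same points for a 140 m radius in most cases.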

In [29]:
##

int_range_slider = 140
# ------------------------------
# Create FeatureGroup to add a legend layer
folium_lgd = dict()
for col, cat in zip(colors, categories):
    folium_lgd[cat] = folium.FeatureGroup(name='<span style=\\"color: {col};\\">{cat}</span>'.format(col=col, cat=cat))

# ------------------------------

# Initial layout map
toulouse_map = folium.Map(location=[lat_tls, lng_tls], zoom_start=14, tiles="cartoDB positron")

# Adding cultural venues radius
for lat, lng, name, cat in zip(dataframe_filtered['lat'], dataframe_filtered['lng'], dataframe_filtered['name'], dataframe_filtered['categories']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=int_range_slider,
        fill=True,
        opacity=0,
        fill_color='#F09122',
        fill_opacity=0.5,
        parse_html=False).add_to(toulouse_map)

# Adding counted pedestrian spots with circles
for lat, lng, address, count in zip(tls_ped_grpby['latitude'], tls_ped_grpby['longitude'], tls_ped_grpby['address'], tls_ped_grpby['count']):
    label = '{} | Count : {}'.format(address, int(count))
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [float(lat), float(lng)],
        radius=(count/300)+2,
        popup=label,
        color='#F5B254',
        opacity=0.4,
        fill=True,
        fill_color=colorscale(count),
        fill_opacity=0.8,
        parse_html=False).add_to(toulouse_map)

# Adding cultural venues
for lat, lng, name, cat in zip(dataframe_filtered['lat'], dataframe_filtered['lng'],
                               dataframe_filtered['name'], dataframe_filtered['hicategories']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        fill=True,
        opacity=0,
        fill_color=color_dict[cat],
        fill_opacity=0.8,
        parse_html=False).add_to(folium_lgd[cat])
    folium_lgd[cat].add_to(toulouse_map)

folium.map.LayerControl('topright', collapsed=False).add_to(toulouse_map)

<folium.map.LayerControl at 0x15efffd8b80>

In [30]:
toulouse_map

### Filter Foursquare sites by proximity to pedestrian flow counts

In [31]:
##
import geopy.distance

# Select the venue names and coordinates (copied to avoid a SettingWithCopyWarning)
dist_venu_ped = dataframe_filtered[['name', 'lat', 'lng']].copy()

# geopy.distance functions need a (lat, lng) tuple, so a dedicated column is created
dist_venu_ped['coord'] = list(zip(dist_venu_ped['lat'], dist_venu_ped['lng']))

# For each Foursquare venue, distances with all ped flows are calculated
tempo = pd.DataFrame()
i = 0
for lat, lng in zip(tls_ped_grpby['latitude'], tls_ped_grpby['longitude']):
    tempo[i] = list(map(lambda coord: geopy.distance.geodesic(coord, (lat,lng)).meters,
                        dist_venu_ped['coord']))
    i += 1

# Store in columns named as ped flow addresses, all the distances calculated
tempo.columns = tls_ped_grpby['address']
tempo = tempo.applymap(int)

# Merge initial Foursquare venues df with the above distance df
dist_venu_ped = pd.concat([dist_venu_ped, tempo], axis='columns')

# For all addresses (start col 4), 
# is the distance less than or equal to the radius of the proximity area ?
filt = dist_venu_ped.iloc[:,4:] <= int_range_slider

# Keep only the addresses that are within range of at least 1 venue
filt = filt.sum() >= 1
prox_venu_ped = tls_ped_grpby[filt.reset_index(drop=True)] # need to drop the addresses index so it can be passed to the df

# For all addresses (start col 4), 
# is the distance less than or equal to the radius of the proximity area ?
filt = dist_venu_ped.iloc[:,4:] <= int_range_slider

# Keep only the Foursquare venues that are within range of at least 1 address
filt = filt.sum(1) >= 1
venu_prox = dataframe_filtered[filt].reset_index(drop=True)

# ------------------------------
# Create FeatureGroup to add a legend layer
folium_lgd = dict()
for col, cat in zip(colors, categories):
    folium_lgd[cat] = folium.FeatureGroup(name='<span style=\\"color: {col};\\">{cat}</span>'.format(col=col, cat=cat))

# ------------------------------

# Initial layout map
toulouse_map = folium.Map(location=[lat_tls, lng_tls], zoom_start=14, tiles="cartoDB positron")

# Adding cultural venues radius
for lat, lng, name, cat in zip(venu_prox['lat'], venu_prox['lng'], venu_prox['name'], venu_prox['categories']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=int_range_slider,
        fill=True,
        opacity=0,
        fill_color='#F09122',
        fill_opacity=0.5,
        parse_html=False).add_to(toulouse_map)

# Adding counted pedestrian spots at Foursquare venues proximity
for lat, lng, address, count in zip(prox_venu_ped['latitude'], prox_venu_ped['longitude'], prox_venu_ped['address'], prox_venu_ped['count']):
    label = '{} | Count : {}'.format(address, int(count))
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [float(lat), float(lng)],
        radius=(count/300)+2,
        popup=label,
        color='#F5B254',
        opacity=0.4,
        fill=True,
        fill_color=colorscale(count),
        fill_opacity=0.8,
        parse_html=False).add_to(toulouse_map)

# Adding cultural venues
for lat, lng, name, cat in zip(venu_prox['lat'], venu_prox['lng'],
                               venu_prox['name'], venu_prox['hicategories']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        fill=True,
        opacity=0,
        fill_color=color_dict[cat],
        fill_opacity=0.8,
        parse_html=False).add_to(folium_lgd[cat])
    folium_lgd[cat].add_to(toulouse_map)

folium.map.LayerControl('topright', collapsed=False).add_to(toulouse_map)


<folium.map.LayerControl at 0x15ec9795880>

In [32]:
toulouse_map

## Machine learning clustering

### Without pedestrian flow counts

After selecting the best-fitting parameters for the hierarchical algorithm by inspecting dendrograms, the map of venues colored by cluster shows an interesting split.

*Note: Use down arrow key to see the second part of this dual slide.*
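
The dendrogram step itself is not shown in this notebook. As a rough illustration only, the sketch below reproduces what a dendrogram encodes, the cost of each successive Ward merge, and applies one common reading heuristic: cut the tree just before the largest jump in merge cost. This is not necessarily how the parameters were chosen here. The helper names (`ward_merge_costs`, `suggest_n_clusters`) and the toy points are hypothetical, and the code is a naive pure-Python version kept short for readability; in practice `scipy.cluster.hierarchy.linkage` and `dendrogram` would normally do this work.

```python
from itertools import combinations


def centroid(points):
    """Mean point of a cluster."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_merge_costs(points):
    """Greedy Ward agglomeration; returns the cost of each successive merge."""
    clusters = [[p] for p in points]
    costs = []
    while len(clusters) > 1:
        # Pick the pair of clusters whose merge is cheapest
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        costs.append(ward_cost(clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return costs


def suggest_n_clusters(costs):
    """Number of clusters left just before the largest jump in merge cost."""
    jumps = [later - earlier for earlier, later in zip(costs, costs[1:])]
    biggest = max(range(len(jumps)), key=jumps.__getitem__)
    # len(costs) + 1 points, and (biggest + 1) merges done before the jump
    return len(costs) - biggest


# Hypothetical 2D points forming three well-separated groups
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (10.0, 0.0), (10.1, 0.0), (10.0, 0.1)]

n_clusters = suggest_n_clusters(ward_merge_costs(points))
print(n_clusters)
```

With real venue coordinates the jumps are rarely this clear-cut, which is why the choice is usually made by eye on the plotted dendrogram.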

In [34]:
##
# Relabel the last remaining "monument" venue as "exposition-formation"
filt = venu_prox['hicategories'] == "monument"
venu_prox.loc[filt, 'hicategories'] = "exposition-formation"

# Filters to select proximity venues and flows counts addresses
filt = dist_venu_ped.iloc[:,4:] <= int_range_slider
filt_venu = filt.sum(1) >= 1

# ------------------------------
# Apply filters
dist_prox = dist_venu_ped
dist_prox.iloc[:,4:] = dist_venu_ped.iloc[:,4:][filt]

# ------------------------------
# Set flow count metrics per venue

# Closest location flow count
filt_min = dist_prox.iloc[:,4:].idxmin(1).dropna()
venu_prox['min_pedcount'] = prox_venu_ped.set_index('address').loc[filt_min, 'count'].reset_index(drop=True)

# Median flow count
# weird way, but my brain can't figure out different way for now
for i, row in enumerate(filt[filt_venu].T): #from proximate flow count (filt), get the rows (T) of proximate venues (filt_venu)
    filt_med = filt[filt_venu].columns[filt.iloc[row,:]]
    venu_prox.loc[i, 'median_pedcount'] = prox_venu_ped.set_index('address').loc[filt_med, 'count'].median()

# ------------------------------
# Adding a dummy variable, which points if 'hicategories' is a show (1) or not (0, ⇔ 'exposition-formation')
venu_prox['show_cat'] = (venu_prox['hicategories'] == "show")*1

from sklearn.cluster import AgglomerativeClustering

# ML algo
agglom = AgglomerativeClustering(n_clusters = 4, linkage = 'ward')
agglom.fit(dataframe_filtered[['lat','lng']])

# Allocate cluster labels
dataframe_filtered['hier_cluster_lab'] = agglom.labels_

# Sort df so that the map legend is ordered
dataframe_filtered.sort_values('hier_cluster_lab', inplace=True)

# ------------------------------
keys = list(dataframe_filtered['hier_cluster_lab'].unique())
color_range = list(np.linspace(0, 1, len(keys), endpoint=False))

colors = [cm.colors.to_hex(cm.tab10(x)) for x in color_range]
color_dict = dict(zip(keys, colors)) 

# ------------------------------
# Create FeatureGroup to add a legend layer
folium_lgd = dict()
for col, cat in zip(colors, keys):
    folium_lgd[cat] = folium.FeatureGroup(name='<span style=\\"color: {col};\\">Cluster {cat}</span>'.format(col=col, cat=cat))

# ------------------------------

# Initial layout map
toulouse_map = folium.Map(location=[lat_tls, lng_tls], zoom_start=14, tiles="cartoDB positron")

# Adding cultural venues
for lat, lng, name, cat, clust in zip(dataframe_filtered['lat'], dataframe_filtered['lng'],
                               dataframe_filtered['name'], dataframe_filtered['hicategories'],
                               dataframe_filtered['hier_cluster_lab']):
    label = '{} | {}'.format(name, cat)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        fill=True,
        opacity=0,
        fill_color=color_dict[clust],
        fill_opacity=0.8,
        parse_html=False).add_to(folium_lgd[clust])
    folium_lgd[clust].add_to(toulouse_map)

folium.map.LayerControl('topright', collapsed=False).add_to(toulouse_map)

<folium.map.LayerControl at 0x15ecc227b20>

In [35]:
toulouse_map

### With pedestrian flow counts

Among the venues kept by the proximity filter, only one belongs to `monument`, so it is better to relabel it with the more relevant of the two remaining categories, which in this case is `exposition-formation`.   
Only two categories then remain: `show` and `exposition-formation`, with 7 and 5 venues respectively.

In [37]:
## 
dum_dic = dict()
for hicat, dum in enumerate(dataframe_filtered['hicategories'].unique()):
    dum_dic[dum] = hicat

dataframe_filtered['dum_hicat'] = dataframe_filtered['hicategories'].apply(lambda hicat: dum_dic[hicat])

X1 = venu_prox[['lat','lng','show_cat','median_pedcount']]

agglom = AgglomerativeClustering(n_clusters = 4, linkage = 'ward')

agglom.fit(X1)

venu_prox['hier_cluster_lab'] = agglom.labels_

# Sort df so that the map legend is ordered
venu_prox.sort_values('hier_cluster_lab', inplace=True)

# ------------------------------
keys = list(venu_prox['hier_cluster_lab'].unique())
color_range = list(np.linspace(0, 1, len(keys), endpoint=False))

colors = [cm.colors.to_hex(cm.tab10(x)) for x in color_range]
color_dict = dict(zip(keys, colors)) 

# ------------------------------
# Create FeatureGroup to add a legend layer
folium_lgd = dict()
for col, cat in zip(colors, keys):
    folium_lgd[cat] = folium.FeatureGroup(name='<span style=\\"color: {col};\\">Cluster {cat}</span>'.format(col=col, cat=cat))

# ------------------------------

# Initial layout map
toulouse_map = folium.Map(location=[lat_tls, lng_tls], zoom_start=14, tiles="cartoDB positron")

# Adding cultural venues
for lat, lng, name, cat, met, clust in zip(venu_prox['lat'], venu_prox['lng'],
                               venu_prox['name'], venu_prox['hicategories'],
                               venu_prox['median_pedcount'], venu_prox['hier_cluster_lab']):
    label = '''{} | {} | {}'''.format(name, cat, met)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        fill=True,
        opacity=0,
        fill_color=color_dict[clust],
        fill_opacity=0.8,
        parse_html=False).add_to(folium_lgd[clust])
    folium_lgd[clust].add_to(toulouse_map)

folium.map.LayerControl('topright', collapsed=False).add_to(toulouse_map)

<folium.map.LayerControl at 0x15eced09fd0>

In [38]:
toulouse_map

## Discussion

+ Without pedestrian flow counts:
    - Considering only the geographical aspect, the results seem surprisingly adequate. Generally speaking, each category is distributed with a dominance in one specific cluster. 
    - However, because the categories are unevenly represented *(14-6-6-3)* and have few occurrences, clustering is necessarily less suited to certain categories (such as `monument`).
    - The upper right cluster, n°1, includes 8 *(57%)* of the `show` venues. This group is strongly linked both to the geographical layout and to `show` sites. 
    - For the other clusters, the geographical aspect seems to carry more weight than the distribution of categories.
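
The percentage quoted above is the share of one category's venues that fall in a given cluster. A minimal sketch of that computation follows, assuming a list of `(category, cluster)` pairs. The function name `category_share_by_cluster` is hypothetical, and so is the toy data: only the 14 `show` venues with 8 of them in cluster 1 follow the text, while every other assignment is invented for illustration.

```python
from collections import Counter


def category_share_by_cluster(pairs, category):
    """For one category, return {cluster: fraction of its venues in that cluster}."""
    clusters = Counter(cluster for cat, cluster in pairs if cat == category)
    total = sum(clusters.values())
    return {cluster: n / total for cluster, n in clusters.items()}


# Hypothetical (category, cluster) pairs. Only "14 show venues, 8 of them in
# cluster 1" comes from the text; the rest of the split is invented.
pairs = (
    [("show", 1)] * 8 + [("show", 0)] * 3 + [("show", 2)] * 2 + [("show", 3)] * 1
    + [("monument", 0)] * 2 + [("monument", 2)] * 1
)

shares = category_share_by_cluster(pairs, "show")
print({cluster: f"{share:.0%}" for cluster, share in sorted(shares.items())})
```

Running the same function for every category gives a small category-by-cluster table, which makes it easier to see whether a cluster is dominated by one category or simply by location.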
    
---

+ With pedestrian flow counts:
    - Among the 4 clusters, the two venue categories mainly fall into different clusters.  
    - However, with fewer data points, the usual upper right cluster, mostly made of `show` venues, vaguely remains but appears heavily split up.
    - The median of the pedestrian flow counts plays an important role in the clustering, probably more than the category feature.
        * **These data points and this method do not lead to relevant or different conclusions from before. For now, adding pedestrian flow counts affects the data too much by reducing it. Thus, unfortunately but as expected, this analysis is inconclusive.**
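
One plausible contributor to the weight of the pedestrian counts, offered here as an assumption rather than something this notebook verifies, is feature scale. Ward linkage works on Euclidean distances, so a column measured in hundreds of pedestrians outweighs latitude and longitude differences of a few thousandths of a degree and a 0/1 category flag. The sketch below uses invented rows and hypothetical helper names (`standardize`, `count_share`) to show how much of the squared distance between two venues comes from the count column before and after z-score standardization; `sklearn.preprocessing.StandardScaler` would be the usual tool for this.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale numbers to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


def count_share(a, b):
    """Fraction of the squared Euclidean distance contributed by the last feature."""
    parts = [(x - y) ** 2 for x, y in zip(a, b)]
    return parts[-1] / sum(parts)


# Hypothetical venues: (lat, lng, show_cat, median_pedcount)
rows = [
    (43.600, 1.440, 1, 500.0),
    (43.605, 1.445, 0, 520.0),
    (43.601, 1.441, 1, 3000.0),
]

# Standardize each column, then rebuild the rows
scaled = list(zip(*(standardize(list(col)) for col in zip(*rows))))

raw_share = count_share(rows[0], rows[1])
scaled_share = count_share(scaled[0], scaled[1])
print(f"raw: {raw_share:.1%}, standardized: {scaled_share:.1%}")
```

In this toy case the first two venues differ in position and category but have similar counts: on raw values the count column still accounts for nearly all of their distance, while after standardization it accounts for almost none of it.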

## Conclusion

> *So, returning to the original question,*  
>**since there are clusters of bars, are there geographic groupings of cultural points?**  
>**And if so, are these cultural places more or less grouped according to their category?**

In a sense, it is possible to say that Toulouse has at least one cluster of cultural places, and that it is strongly linked to one cultural category, namely `show` places.   
But these data points do not make it possible to assess clusters of cultural venues in the same way as clusters of bars and pubs, which are very dense and less dispersed.

**However, adding categories and/or pedestrian flow counts to the clustering algorithms does not add any value.**