## K-visitation

This is the code accompanying our paper 'Recurrent visitations expose the paradox of human mobility in the 15-Minute City vision'. 

In this paper, we introduce the novel *K-Visitation* framework, which quantifies the minimal number of visitations, K, that an individual needs in order to be exposed to every essential amenity category. The framework operates under two orderings: (i) K-Frequency ($K_{Freq}$), in which visitations are selected in order of empirical visitation frequency, characterising the individual's recurrent behaviour, and (ii) K-Distance ($K_{Dist}$), in which visitations are selected in order of proximity to home, representing the individual's proximate baseline scenario.
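
As a rough illustration of the two orderings, the sketch below computes K for a single toy user by sorting visitations, accumulating POI counts per category, and returning the first position at which every category meets its threshold. The function name `minimal_k`, the toy data, and the prefix-based stopping rule are assumptions made for illustration; the `calculate_both_k_places` function in `src` may handle ties, statuses, and edge cases differently.

```python
import numpy as np
import pandas as pd


def minimal_k(visits, amenities, thresholds, order_col, ascending):
    """Hypothetical helper: smallest K such that the first K visitations
    (in the given order) jointly meet every amenity threshold.
    Returns None if the thresholds are never met."""
    ordered = visits.sort_values(order_col, ascending=ascending, kind="stable")
    # Running total of POIs per category after each additional visitation
    cumulative = ordered[amenities].cumsum().to_numpy()
    satisfied = (cumulative >= np.asarray(thresholds)).all(axis=1)
    if not satisfied.any():
        return None
    return int(np.argmax(satisfied)) + 1  # argmax returns the first True


# Toy visitations for one hypothetical user (not taken from the sample data)
toy = pd.DataFrame({
    "visit_freq": [12, 5, 2, 1],
    "home_dist": [9000.0, 300.0, 1200.0, 50.0],
    "DINING": [1, 0, 0, 2],
    "GROCERIES": [0, 0, 1, 1],
    "HEALTHCARE": [0, 1, 0, 0],
})
amenities = ["DINING", "GROCERIES", "HEALTHCARE"]
thresholds = np.ones(len(amenities), dtype=int)

# K-Frequency: most frequently visited places first
k_freq = minimal_k(toy, amenities, thresholds, "visit_freq", ascending=False)
# K-Distance: places closest to home first
k_dist = minimal_k(toy, amenities, thresholds, "home_dist", ascending=True)
print(k_freq, k_dist)
```

In this toy case the frequency ordering needs three visitations while the distance ordering needs only two, which is the kind of gap between recurrent and proximate behaviour the framework is meant to expose.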

This notebook demonstrates the framework on an anonymised sample dataset, which does not contain any identifiable spatial or user information.

### Load libraries and data

In [1]:
import sys
import os
from pathlib import Path
import pandas as pd
import numpy as np
from tqdm import tqdm

# Setup project root path
project_root = Path.cwd()
if 'notebooks' in str(project_root):
    project_root = project_root.parent

# Add src to path
src_path = str(project_root / 'src')
if src_path not in sys.path:
    sys.path.insert(0, src_path)

# Set data path
data_dir = project_root / 'data'

In [2]:
# Load the anonymised stay location data sample

# For privacy reasons, spatial information has been excluded from this dataset and user_ids have been hashed.
# The visitation attributes the framework needs were added in earlier steps:
# 1. Euclidean distance from each visitation to the home location;
# 2. Frequency of visitation (within the same H3 hexagon);
# 3. Number of POIs in each category within a 400 m buffer of each visitation.
# It is therefore not possible to identify the exact locations of stay points from the sampled data.

data_path = data_dir / 'sample_anonymised_data.csv'
stay_locations = pd.read_csv(data_path)
stay_locations.head()

Unnamed: 0,user_id,visit_freq,home_dist,work_dist,CIVIC_RELIGION,CULTURE,DINING,EDUCATION,FITNESS,GROCERIES,HEALTHCARE,PARK,RETAIL,SERVICE,TOURISM,TRANSPORT
0,084cd8d4-91be-4239-a740-4d703da8277a,4,40988.573884,55.286772,0,0,0,0,0,0,0,0,0,0,0,1
1,084cd8d4-91be-4239-a740-4d703da8277a,12,13982.441687,31973.660751,0,0,1,4,0,0,0,0,0,0,0,0
2,084cd8d4-91be-4239-a740-4d703da8277a,2,4791.578636,29.420798,0,0,7,15,4,0,4,0,3,9,0,7
3,084cd8d4-91be-4239-a740-4d703da8277a,2,5618.226375,36452.489513,0,1,3,6,0,2,2,0,2,8,0,0
4,084cd8d4-91be-4239-a740-4d703da8277a,2,3283.091357,1586.24919,0,1,5,8,3,0,5,1,3,0,1,2


### Framework operation

In [3]:
from k_visitation import calculate_both_k_places

In [4]:
amenity_list = [
    'CIVIC_RELIGION', 'CULTURE', 'DINING', 'EDUCATION', 'FITNESS', 'GROCERIES', 'HEALTHCARE', 'RETAIL', 'SERVICE', 'TRANSPORT'
    ]

# One value per amenity category (all set to 1), passed to the K calculation below
smallest_values = np.ones(len(amenity_list), dtype=int)


# Apply the K-places calculation
places_k = calculate_both_k_places(stay_locations, amenity_list, smallest_values)

places_k.head()

Unnamed: 0,user_id,visit_freq,home_dist,work_dist,CIVIC_RELIGION,CULTURE,DINING,EDUCATION,FITNESS,GROCERIES,HEALTHCARE,PARK,RETAIL,SERVICE,TOURISM,TRANSPORT,k_dist,k_dist_status,k_freq,k_freq_status
0,084cd8d4-91be-4239-a740-4d703da8277a,21,54.015246,40978.83876,1,0,2,0,1,0,0,0,0,0,2,3,1,complete,1,complete
1,084cd8d4-91be-4239-a740-4d703da8277a,12,13982.441687,31973.660751,0,0,1,4,0,0,0,0,0,0,0,0,0,complete,1,complete
2,084cd8d4-91be-4239-a740-4d703da8277a,5,2194.577648,11775.266646,2,4,22,8,12,1,14,0,16,32,2,19,1,complete,1,complete
3,084cd8d4-91be-4239-a740-4d703da8277a,4,2269.91507,39707.062529,0,2,5,8,5,1,3,0,2,8,2,15,0,complete,0,complete
4,084cd8d4-91be-4239-a740-4d703da8277a,4,40988.573884,55.286772,0,0,0,0,0,0,0,0,0,0,0,1,0,complete,0,complete


In [5]:
# Count, per user, the places flagged (value 1) in the K-freq and K-dist sets
user_k_counts = (
    places_k.groupby("user_id")
    .agg(
        k_freq_places=("k_freq", lambda s: (s == 1).sum()),
        k_dist_places=("k_dist", lambda s: (s == 1).sum()),
    )
    .reset_index()
)

print(user_k_counts.head())

                                user_id  k_freq_places  k_dist_places
0  084cd8d4-91be-4239-a740-4d703da8277a              3              3
1  1b77f881-e3a5-4937-aa4b-bda8419cbb69              7              7
2  29a55739-f004-2791-5ec4-d0792d22f7b5              1              1
3  2ae0928e-22e9-40ca-ad99-d306f12c433c              6              8
4  31c74126-dbdc-4c0b-b896-d5467cb49c65              6              7


### Alignment between recurrent and proximate mobility

In [6]:
from k_visitation import calculate_qk_alignment

In [7]:
# Calculate qK alignment using the no-work variants
user_qk = calculate_qk_alignment(
    places_df=places_k,
    user_col='user_id', 
    k_freq_col='k_freq',  # Using no-work version
    k_dist_col='k_dist'   # Using no-work version
)

In [8]:
user_qk.head()

k_type,user_id,f0d0,f0d1,f1d0,f1d1,qk
0,084cd8d4-91be-4239-a740-4d703da8277a,13,1,1,2,0.5
1,1b77f881-e3a5-4937-aa4b-bda8419cbb69,0,0,0,7,1.0
2,29a55739-f004-2791-5ec4-d0792d22f7b5,11,0,0,1,1.0
3,2ae0928e-22e9-40ca-ad99-d306f12c433c,6,3,1,5,0.555556
4,31c74126-dbdc-4c0b-b896-d5467cb49c65,1,3,2,4,0.444444
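
The rows above are consistent with `qk` being the Jaccard overlap between a user's K-Freq and K-Dist place sets, i.e. `f1d1 / (f0d1 + f1d0 + f1d1)`; for the first user this gives 2 / (1 + 1 + 2) = 0.5. The sketch below reproduces that calculation on toy flags. It is inferred from the displayed output rather than from the source of `calculate_qk_alignment`, and the function name `qk_sketch` and the toy data are hypothetical.

```python
import pandas as pd


def qk_sketch(df, user_col="user_id", k_freq_col="k_freq", k_dist_col="k_dist"):
    """Hypothetical re-implementation: per-user contingency counts of the
    K-Freq / K-Dist flags and their Jaccard overlap (assumed meaning of qk)."""
    f = df[k_freq_col].eq(1)
    d = df[k_dist_col].eq(1)
    counts = (
        pd.DataFrame({
            user_col: df[user_col],
            "f0d0": (~f & ~d).astype(int),  # in neither set
            "f0d1": (~f & d).astype(int),   # K-Dist only
            "f1d0": (f & ~d).astype(int),   # K-Freq only
            "f1d1": (f & d).astype(int),    # in both sets
        })
        .groupby(user_col, as_index=False)
        .sum()
    )
    union = counts[["f0d1", "f1d0", "f1d1"]].sum(axis=1)
    # A user with no flagged places has an empty union; qk is then NaN
    counts["qk"] = counts["f1d1"] / union
    return counts


# Toy per-place flags for one hypothetical user (1 = place belongs to the set)
flags = pd.DataFrame({
    "user_id": ["u1"] * 6,
    "k_freq": [1, 1, 1, 0, 0, 0],
    "k_dist": [1, 1, 0, 1, 0, 0],
})
result = qk_sketch(flags)
print(result)
```

Under this reading, a value of 1.0 means the places a user visits most often are exactly the ones closest to home, while lower values indicate that recurrent and proximate place sets diverge.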
