## K-visitation

This is the code accompanying our paper 'Recurrent visitations expose the paradox of human mobility in the 15-Minute City vision'. 

In this paper, we introduce the novel *K-Visitation* framework, which quantifies the minimal number of visitations, K, that an individual needs in order to be exposed to every essential amenity category. The framework operates under two orderings: (i) K-Frequency ($K_{Freq}$), in which visitations are selected in descending order of empirical visitation frequency, characterising the individual's recurrent behaviour, and (ii) K-Distance ($K_{Dist}$), in which visitations are selected in ascending order of distance from home, representing a proximity-based baseline scenario.
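
The two orderings can be illustrated on a toy example. The sketch below is not the notebook's implementation: the helper `minimal_k`, the three amenity categories, and all visit values are hypothetical and chosen only to show how the same set of visits can yield different K values under each ordering.

```python
import numpy as np

def minimal_k(amenity_counts, required):
    """Smallest K such that the first K rows jointly meet `required`,
    or None if even all rows together fall short."""
    cumulative = np.cumsum(np.asarray(amenity_counts), axis=0)
    satisfied = np.all(cumulative >= np.asarray(required), axis=1)
    return int(np.argmax(satisfied)) + 1 if satisfied.any() else None

# Hypothetical visits for one person:
# (visit_freq, home_dist in metres, [DINING, GROCERIES, HEALTHCARE] counts)
visits = [
    (20, 50.0,   [1, 0, 0]),
    (9,  3000.0, [0, 1, 1]),
    (3,  400.0,  [0, 1, 0]),
    (1,  900.0,  [2, 0, 0]),
    (1,  1500.0, [0, 0, 1]),
]
required = [1, 1, 1]  # at least one amenity of each category

by_freq = sorted(visits, key=lambda v: -v[0])  # most frequent first
by_dist = sorted(visits, key=lambda v: v[1])   # closest to home first

k_freq = minimal_k([v[2] for v in by_freq], required)
k_dist = minimal_k([v[2] for v in by_dist], required)
print(k_freq, k_dist)
```

In this made-up case the frequent but distant second visit covers groceries and healthcare at once, so fewer visitations are needed under the frequency ordering than under the distance ordering.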

This notebook provides an operationalised sample of this framework using an anonymised dataset, which does not contain any identifiable spatial or user information.

### Load libraries and data

In [1]:
import pandas as pd
import numpy as np
from tqdm import tqdm

In [2]:
# Load the anonymised stay location data sample

# For privacy reasons, we excluded spatial information from this dataset and hashed the user_ids.
# The required visitation attributes were added in earlier steps and include:
# 1. Euclidean distance from visitation to home location;
# 2. Frequency of visitation (within the same H3 hexagon);
# 3. Number of POIs in each category within 400m buffer of each visitation.
# Therefore, it is not possible to identify the exact locations of stay points from the given sampled data.

data_path = 'sample-data'
stay_locations = pd.read_csv(f"{data_path}/sample_anonymised_data.csv")
stay_locations.head()

Unnamed: 0,user_id,visit_freq,home_dist,work_dist,CIVIC_RELIGION,CULTURE,DINING,EDUCATION,FITNESS,GROCERIES,HEALTHCARE,PARK,RETAIL,SERVICE,TOURISM,TRANSPORT
0,084cd8d4-91be-4239-a740-4d703da8277a,4,40988.573884,55.286772,0,0,0,0,0,0,0,0,0,0,0,1
1,084cd8d4-91be-4239-a740-4d703da8277a,12,13982.441687,31973.660751,0,0,1,4,0,0,0,0,0,0,0,0
2,084cd8d4-91be-4239-a740-4d703da8277a,2,4791.578636,29.420798,0,0,7,15,4,0,4,0,3,9,0,7
3,084cd8d4-91be-4239-a740-4d703da8277a,2,5618.226375,36452.489513,0,1,3,6,0,2,2,0,2,8,0,0
4,084cd8d4-91be-4239-a740-4d703da8277a,2,3283.091357,1586.24919,0,1,5,8,3,0,5,1,3,0,1,2


### Framework operation

In [3]:
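def calculate_k_places(places_df, amenity_list, smallest_values,
                       sort_column='home_dist', ascending=True,
                       k_type='k_dist', min_places_per_user=1):
    """
    Calculate K-places with a greedy cumulative scan.

    For each user, places are sorted by `sort_column` and their amenity counts
    are accumulated until every requirement in `smallest_values` is met.

    Scenarios:
    1. Complete: requirements satisfied - mark the places up to and including
       the completing one as 1, then stop
    2. Incomplete: requirements never satisfied - mark ALL places as 1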
    
    Parameters:
    -----------
    places_df : DataFrame
        Places dataframe with user_id and amenity columns
    amenity_list : list
        List of amenity column names
    smallest_values : Series or array
        Minimum required values for each amenity
    sort_column : str
        Column to sort by ('home_dist' for K-dist, 'visit_freq' for K-freq)
    ascending : bool
        Sort order (True for distance, False for frequency)
    k_type : str
        Type identifier for output column
    min_places_per_user : int
        Minimum number of places to select per user
        (currently unused in the function body)
    
    Returns:
    --------
    DataFrame : Original dataframe with K-place indicators added
    """
    
    print(f"Calculating {k_type} places (sorted by {sort_column})...")
    
    # Sort data
    places_sorted = places_df.sort_values(
        by=['user_id', sort_column], 
        ascending=[True, ascending]
    ).reset_index(drop=True)
    
    # Fill missing amenity values
    places_sorted[amenity_list] = places_sorted[amenity_list].fillna(0)
    
    # Convert to numpy for faster computation
    smallest_values_np = smallest_values.to_numpy() if hasattr(smallest_values, 'to_numpy') else np.array(smallest_values)
    
    # Initialize result arrays
    k_indicator = np.zeros(len(places_sorted), dtype=np.int8)
    k_status = np.full(len(places_sorted), 'unassigned', dtype=object)
    
    # Group by user for processing
    user_groups = places_sorted.groupby('user_id')
    total_users = len(user_groups)
    
    print(f"Processing {total_users} users...")
    
    for user_id, user_data in tqdm(user_groups, desc=f"Processing {k_type}", unit="users"):
        indices = user_data.index.tolist()
        user_poi = user_data[amenity_list].to_numpy()
        
        # Initialize tracking variables
        # Float accumulator so that `+=` also works when amenity columns are float
        total_poi_access = np.zeros(smallest_values_np.shape, dtype=np.float64)
        k_user = np.zeros(len(indices), dtype=np.int8)
        requirements_met = False
        
        # Process each place for this user to find completion point
        for idx, row_poi in enumerate(user_poi):
            # Add current place's amenities
            total_poi_access += row_poi
            
            # Check if requirements are met after adding this place
            is_complete = np.all(total_poi_access >= smallest_values_np)
            
            if is_complete:
                # SCENARIO 1: COMPLETE - Mark cumulative places (0 to idx) as K-places
                k_user[:idx+1] = 1
                requirements_met = True
                break
        
        # SCENARIO 2: INCOMPLETE - If requirements not met after all places
        if not requirements_met:
            # Mark ALL places as K-places
            k_user[:] = 1
        
        # Assign results back to the main arrays
        # (indices are positional because the index was reset after sorting)
        k_indicator[indices] = k_user
        
        # Record the completion status for every place of this user:
        # 'complete' = requirements met, 'incomplete' = all places selected
        status = 'complete' if requirements_met else 'incomplete'
        k_status[indices] = status
    
    # Add results to dataframe
    places_sorted[f'{k_type}'] = k_indicator
    places_sorted[f'{k_type}_status'] = k_status
    
    # Summary statistics
    total_k_places = k_indicator.sum()
    users_complete = (places_sorted.groupby('user_id')[f'{k_type}_status'].first() == 'complete').sum()
    users_incomplete = (places_sorted.groupby('user_id')[f'{k_type}_status'].first() == 'incomplete').sum()
    
    print(f"{k_type} calculation complete!")
    print(f"Total {k_type} places identified: {total_k_places}")
    print(f"Users with complete coverage: {users_complete}")
    print(f"Users with incomplete coverage: {users_incomplete}")
    
    return places_sorted
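
The per-user loop above can also be expressed with grouped cumulative sums. The following is a minimal sketch rather than a replacement for `calculate_k_places`: the function name `k_places_vectorised`, the output column names `k` and `k_status`, and the toy dataframe are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def k_places_vectorised(places_df, amenity_list, smallest_values,
                        sort_column="home_dist", ascending=True):
    """Hypothetical vectorised variant of the per-user greedy scan."""
    df = places_df.sort_values(
        by=["user_id", sort_column], ascending=[True, ascending]
    ).reset_index(drop=True)
    required = np.asarray(smallest_values)

    # Running amenity totals within each user, in visitation order
    running = df[amenity_list].fillna(0).groupby(df["user_id"]).cumsum()
    met = pd.Series((running.to_numpy() >= required).all(axis=1), index=df.index)

    # Number of earlier rows of the same user at which requirements were already met
    met_int = met.astype(int)
    met_before = met_int.groupby(df["user_id"]).cumsum() - met_int
    user_complete = met.groupby(df["user_id"]).transform("any")

    # A place is selected if requirements had not been met before it;
    # for users who never complete, this selects every place
    indicator = (met_before == 0).astype(np.int8)
    status = np.where(user_complete, "complete", "incomplete")
    return df.assign(k=indicator, k_status=status)

# Toy data: user "a" completes after two places, user "b" never does
toy = pd.DataFrame({
    "user_id":   ["a", "a", "a", "b", "b"],
    "home_dist": [30.0, 10.0, 20.0, 6.0, 5.0],
    "DINING":    [1, 1, 0, 1, 1],
    "GROCERIES": [1, 0, 1, 0, 0],
})
out = k_places_vectorised(toy, ["DINING", "GROCERIES"], [1, 1])
print(out[["user_id", "home_dist", "k", "k_status"]])
```

The explicit loop is easier to read and is fast enough for the 30-user sample; a grouped formulation like this one may be worth considering only for much larger datasets.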

In [4]:
# Wrapper function for calculating both K-dist and K-freq
def calculate_both_k_places(places_df, amenity_list, smallest_values):
    """Calculate both K-dist and K-freq places with corrected logic"""
    
    print("🔄 CALCULATING K-PLACES (CORRECTED)")
    print("=" * 50)
    
    # Calculate K-dist places (sorted by distance, ascending)
    places_with_kdist = calculate_k_places(
        places_df=places_df,
        amenity_list=amenity_list,
        smallest_values=smallest_values,
        sort_column='home_dist',
        ascending=True,
        k_type='k_dist',
        min_places_per_user=1
    )
    
    print("\n" + "-" * 50)
    
    # Calculate K-freq places (sorted by frequency, descending)
    places_with_both = calculate_k_places(
        places_df=places_with_kdist,
        amenity_list=amenity_list,
        smallest_values=smallest_values,
        sort_column='visit_freq',
        ascending=False,
        k_type='k_freq',
        min_places_per_user=1
    )
    
    return places_with_both

In [5]:
amenity_list = [
    'CIVIC_RELIGION', 'CULTURE', 'DINING', 'EDUCATION', 'FITNESS', 'GROCERIES', 'HEALTHCARE', 'RETAIL', 'SERVICE', 'TRANSPORT'
    ]

smallest_values = np.ones(len(amenity_list), dtype=int)


# Apply the K-places calculation
places_k = calculate_both_k_places(stay_locations, amenity_list, smallest_values)

places_k.head()

🔄 CALCULATING K-PLACES (CORRECTED)
Calculating k_dist places (sorted by home_dist)...
Processing 30 users...

Processing k_dist: 100%|██████████| 30/30 [00:00<00:00, 2141.95users/s]

k_dist calculation complete!
Total k_dist places identified: 125
Users with complete coverage: 28
Users with incomplete coverage: 2

--------------------------------------------------
Calculating k_freq places (sorted by visit_freq)...
Processing 30 users...

Processing k_freq: 100%|██████████| 30/30 [00:00<00:00, 3906.16users/s]

k_freq calculation complete!
Total k_freq places identified: 107
Users with complete coverage: 28
Users with incomplete coverage: 2

Unnamed: 0,user_id,visit_freq,home_dist,work_dist,CIVIC_RELIGION,CULTURE,DINING,EDUCATION,FITNESS,GROCERIES,HEALTHCARE,PARK,RETAIL,SERVICE,TOURISM,TRANSPORT,k_dist,k_dist_status,k_freq,k_freq_status
0,084cd8d4-91be-4239-a740-4d703da8277a,21,54.015246,40978.83876,1,0,2,0,1,0,0,0,0,0,2,3,1,complete,1,complete
1,084cd8d4-91be-4239-a740-4d703da8277a,12,13982.441687,31973.660751,0,0,1,4,0,0,0,0,0,0,0,0,0,complete,1,complete
2,084cd8d4-91be-4239-a740-4d703da8277a,5,2194.577648,11775.266646,2,4,22,8,12,1,14,0,16,32,2,19,1,complete,1,complete
3,084cd8d4-91be-4239-a740-4d703da8277a,4,2269.91507,39707.062529,0,2,5,8,5,1,3,0,2,8,2,15,0,complete,0,complete
4,084cd8d4-91be-4239-a740-4d703da8277a,4,40988.573884,55.286772,0,0,0,0,0,0,0,0,0,0,0,1,0,complete,0,complete


In [6]:
# Count the K-freq and K-dist places selected for each user
user_k_counts = (
    places_k.groupby("user_id")
    .agg(
        k_freq_places=("k_freq", lambda s: (s == 1).sum()),
        k_dist_places=("k_dist", lambda s: (s == 1).sum()),
    )
    .reset_index()
)

print(user_k_counts.head())

                                user_id  k_freq_places  k_dist_places
0  084cd8d4-91be-4239-a740-4d703da8277a              3              3
1  1b77f881-e3a5-4937-aa4b-bda8419cbb69              7              7
2  29a55739-f004-2791-5ec4-d0792d22f7b5              1              1
3  2ae0928e-22e9-40ca-ad99-d306f12c433c              6              8
4  31c74126-dbdc-4c0b-b896-d5467cb49c65              6              7


### Alignment between recurrent and proximate mobility

In [7]:
def calculate_qk_alignment(places_df, user_col='user_id', k_freq_col='k_freq', k_dist_col='k_dist'):
    """
    Calculate qK alignment as the Jaccard similarity between each user's K-freq and K-dist places
    
    This function:
    1. Classifies each place into categories based on K-freq and K-dist indicators
    2. Counts the places in each category for each user
    3. Calculates the Jaccard similarity (qK) for each user
    
    Parameters:
    -----------
    places_df : DataFrame
        Stay locations dataframe with user_id and K-place indicators
    user_col : str
        Column name for user identifier
    k_freq_col : str
        Column name for K-freq indicator (0 or 1)
    k_dist_col : str
        Column name for K-dist indicator (0 or 1)
    
    Returns:
    --------
    DataFrame : User-level table with one row per user, containing the number of
        places in each category (f1d1, f1d0, f0d1, f0d0) and the Jaccard
        similarity in the 'qk' column
    """
    
    print("🔄 CALCULATING qK ALIGNMENT METRICS")
    print("=" * 60)
    
    # Create a copy to avoid modifying original data
    places_analysis = places_df.copy()
    
    # Ensure K-place indicators are binary (0 or 1)
    places_analysis[k_freq_col] = places_analysis[k_freq_col].fillna(0).astype(int)
    places_analysis[k_dist_col] = places_analysis[k_dist_col].fillna(0).astype(int)
    
    print(f"Total places: {len(places_analysis)}")
    print(f"Users: {places_analysis[user_col].nunique()}")
    
    # Step 1: Assign place categories based on K-freq and K-dist values
    def get_place_category(row):
        k_freq = row[k_freq_col]
        k_dist = row[k_dist_col]
        
        if k_freq == 1 and k_dist == 1:
            return 'f1d1'  # Both methods identify this place
        elif k_freq == 1 and k_dist == 0:
            return 'f1d0'  # Only K-freq identifies this place
        elif k_freq == 0 and k_dist == 1:
            return 'f0d1'  # Only K-dist identifies this place
        elif k_freq == 0 and k_dist == 0:
            return 'f0d0'  # Neither method identifies this place
        else:
            return 'other'
    
    places_analysis['k_type'] = places_analysis.apply(get_place_category, axis=1)
    
    # Step 2: Aggregate by user to count places in each category
    user_place_counts = places_analysis.groupby(user_col)['k_type'].value_counts().unstack(fill_value=0)

    # Ensure all categories exist in the dataframe
    for category in ['f1d1', 'f1d0', 'f0d1', 'f0d0']:
        if category not in user_place_counts.columns:
            user_place_counts[category] = 0
    
    user_place_counts = user_place_counts.reset_index()
    
    print(f"Place category distribution:")
    for category in ['f1d1', 'f1d0', 'f0d1', 'f0d0']:
        total_places = user_place_counts[category].sum()
        print(f"  {category}: {total_places} places")
    
    # Step 3: Calculate Jaccard similarity for each user
    def calculate_jaccard(row):
        """Jaccard = |A ∩ B| / |A ∪ B| = f1d1 / (f1d1 + f1d0 + f0d1)"""
        numerator = row['f1d1']
        denominator = row['f1d1'] + row['f1d0'] + row['f0d1']
        return numerator / denominator if denominator > 0 else 0.0
    
    # Calculate alignment metrics
    user_place_counts['qk'] = user_place_counts.apply(calculate_jaccard, axis=1)
    
    # # Calculate additional metrics
    # user_place_counts['total_k_places'] = user_place_counts['f1d1'] + user_place_counts['f1d0'] + user_place_counts['f0d1']
    # user_place_counts['total_places'] = user_place_counts['f1d1'] + user_place_counts['f1d0'] + user_place_counts['f0d1'] + user_place_counts['f0d0']
    
    # # Calculate ratios
    # user_place_counts['f1d1_ratio'] = user_place_counts['f1d1'] / user_place_counts['total_places']
    # user_place_counts['f1d0_ratio'] = user_place_counts['f1d0'] / user_place_counts['total_places']
    # user_place_counts['f0d1_ratio'] = user_place_counts['f0d1'] / user_place_counts['total_places']
    # user_place_counts['f0d0_ratio'] = user_place_counts['f0d0'] / user_place_counts['total_places']
    
    # # Calculate K-method coverage
    # user_place_counts['k_freq_coverage'] = (user_place_counts['f1d1'] + user_place_counts['f1d0']) / user_place_counts['total_places']
    # user_place_counts['k_dist_coverage'] = (user_place_counts['f1d1'] + user_place_counts['f0d1']) / user_place_counts['total_places']
    
    print(f"\n ALIGNMENT SUMMARY STATISTICS")
    print("=" * 40)
    print(f"Users analyzed: {len(user_place_counts)}")
    print(f"qK - Mean: {user_place_counts['qk'].mean():.4f}, Median: {user_place_counts['qk'].median():.4f}")
    

    return user_place_counts

In [8]:
# Calculate qK alignment between the K-freq and K-dist places
user_qk = calculate_qk_alignment(
    places_df=places_k,
    user_col='user_id',
    k_freq_col='k_freq',
    k_dist_col='k_dist'
)

🔄 CALCULATING qK ALIGNMENT METRICS
Total places: 333
Users: 30
Place category distribution:
  f1d1: 85 places
  f1d0: 22 places
  f0d1: 40 places
  f0d0: 186 places

 ALIGNMENT SUMMARY STATISTICS
Users analyzed: 30
qK - Mean: 0.6019, Median: 0.5278
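
To make the metric concrete, the sketch below computes the Jaccard similarity for one made-up user directly from sets of places. The helper `jaccard` and the place labels are hypothetical; only the category totals 85, 22 and 40 are taken from the output above.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections; defined as 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical place labels for one user
k_freq_places = {"home", "office", "gym"}
k_dist_places = {"home", "gym", "corner_shop", "clinic"}

# f1d1 = 2 (home, gym), f1d0 = 1 (office), f0d1 = 2 (corner_shop, clinic)
qk = jaccard(k_freq_places, k_dist_places)
print(f"qK for the toy user: {qk:.2f}")

# Pooled ratio from the category totals printed above: f1d1 / (f1d1 + f1d0 + f0d1)
pooled = 85 / (85 + 22 + 40)
print(f"Pooled ratio over all places: {pooled:.4f}")
```

The pooled ratio (about 0.578) differs from the reported mean qK of 0.6019 because the mean weights every user equally, whereas the pooled ratio gives more weight to users with many K-places.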
