# Replication Code for "Predicting Human Mobility Using Dense Smartphone GPS Trajectories and Transformer Models"

## Disclaimer on Exact Reproducibility Across GPU Hardware

Even when every source of randomness is seeded (Python `random`, NumPy, PyTorch, and data-loader workers) and the exact same library binaries are installed (PyTorch 2.4.1+cu121, CUDA 12.1, cuDNN 9.1, NumPy 1.23.5, etc.), bit-for-bit identical results can only be guaranteed on the same GPU architecture. Our primary experiments were run on an NVIDIA RTX A5000 (driver 560.35.05, CUDA 12.6), and the deterministic cuDNN kernels selected on that card follow a specific floating-point rounding path. Colab typically provides T4, P100, or V100 GPUs. Even with deterministic cuDNN settings, these cards invoke different optimized kernels and can accumulate small floating-point differences over hundreds of weight updates. Anyone running this code on Colab GPUs should therefore expect functionally equivalent behavior (losses identical up to ≈1e-6). They will not see exactly the same final weights or epoch-by-epoch outputs unless they use an RTX A5000 or another card with the same compute capability and driver.
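The seeding described above can be sketched as follows. `seed_everything` and `seed_worker` are hypothetical helpers, not functions from the SpeedTransformer repository. The repository's scripts take a `--random_state` argument and may seed differently. The sketch uses the standard PyTorch reproducibility switches. If PyTorch is not installed, it seeds only `random` and NumPy.

```python
import os
import random

import numpy as np

try:
    import torch
except ImportError:  # lets the sketch run with only random/NumPy when PyTorch is absent
    torch = None


def seed_everything(seed: int) -> None:
    """Hypothetical helper: seed Python, NumPy and PyTorch and request deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    if torch is None:
        return
    # Needed for deterministic cuBLAS GEMMs on CUDA >= 10.2; must be set before the first cuBLAS call.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    torch.manual_seed(seed)                     # seeds the CPU generator and every CUDA device
    torch.backends.cudnn.deterministic = True   # choose deterministic cuDNN algorithms
    torch.backends.cudnn.benchmark = False      # no autotuning, which may pick different kernels per run
    torch.use_deterministic_algorithms(True)    # error on ops without a deterministic implementation


def seed_worker(worker_id: int) -> None:
    """DataLoader worker_init_fn: derive each worker's random/NumPy seed from PyTorch's seed."""
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)
```

When building a `DataLoader`, pass `worker_init_fn=seed_worker` together with `generator=g`, where `g = torch.Generator()` and `g.manual_seed(seed)`. `torch.use_deterministic_algorithms(True)` raises an error for operations that have no deterministic implementation. Passing `warn_only=True` downgrades this to a warning. None of these switches removes the cross-architecture differences described above.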

## Setting Up the Environment

In [None]:
# Uninstall pre-installed torch, torchvision, torchaudio, numpy
!pip uninstall -y torch torchvision torchaudio numpy

Found existing installation: torch 2.8.0+cu126
Uninstalling torch-2.8.0+cu126:
  Successfully uninstalled torch-2.8.0+cu126
Found existing installation: torchvision 0.23.0+cu126
Uninstalling torchvision-0.23.0+cu126:
  Successfully uninstalled torchvision-0.23.0+cu126
Found existing installation: torchaudio 2.8.0+cu126
Uninstalling torchaudio-2.8.0+cu126:
  Successfully uninstalled torchaudio-2.8.0+cu126
Found existing installation: numpy 2.0.2
Uninstalling numpy-2.0.2:
  Successfully uninstalled numpy-2.0.2


In [None]:
# Install torch 2.4.1+cu121, torchvision 0.19.1+cu121 and torchaudio 2.4.1+cu121
!pip install \
    torch==2.4.1+cu121 \
    torchvision==0.19.1+cu121 \
    torchaudio==2.4.1+cu121 \
    --extra-index-url https://download.pytorch.org/whl/cu121

Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu121
Collecting torch==2.4.1+cu121
  Downloading https://download.pytorch.org/whl/cu121/torch-2.4.1%2Bcu121-cp312-cp312-linux_x86_64.whl (798.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 798.9/798.9 MB 1.8 MB/s eta 0:00:00
Collecting torchvision==0.19.1+cu121
  Downloading https://download.pytorch.org/whl/cu121/torchvision-0.19.1%2Bcu121-cp312-cp312-linux_x86_64.whl (7.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.1/7.1 MB 117.5 MB/s eta 0:00:00
Collecting torchaudio==2.4.1+cu121
  Downloading https://download.pytorch.org/whl/cu121/torchaudio-2.4.1%2Bcu121-cp312-cp312-linux_x86_64.whl (3.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 88.5 MB/s eta 0:00:00
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.4.1+cu121)
  Downloading https://dow

Import PyTorch and NumPy, which are used in model training, and confirm the installed versions.

In [None]:
import torch
import numpy as np
print("Colab PyTorch:", torch.__version__, "CUDA:", torch.version.cuda, "cuDNN:", torch.backends.cudnn.version())
print("NumPy:", np.__version__)

Colab PyTorch: 2.4.1+cu121 CUDA: 12.1 cuDNN: 90100
NumPy: 2.3.4
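The hypothetical helper below is not part of the repository. It compares the printed versions with those pinned in the disclaimer and reports the GPU's compute capability. With the output above it would flag NumPy: the runtime reports 2.3.4 rather than 1.23.5. NumPy 1.23.x has no wheels for Python 3.12, and the `cp312` wheels installed earlier show that this runtime uses Python 3.12. The reference capability (8, 6) is that of the RTX A5000. Colab's T4, for example, reports (7, 5).

```python
import platform

# Versions pinned in the reproducibility disclaimer at the top of this notebook.
EXPECTED = {
    "torch": "2.4.1+cu121",
    "cuda": "12.1",
    "cudnn": "90100",  # cuDNN 9.1, in the integer form printed by torch.backends.cudnn.version()
    "numpy": "1.23.5",
}
# Compute capability of the NVIDIA RTX A5000 (Ampere) used for the primary experiments.
REFERENCE_CAPABILITY = (8, 6)


def version_mismatches(observed: dict, expected: dict = EXPECTED) -> dict:
    """Return {name: (observed, expected)} for every pinned entry that differs."""
    return {
        name: (observed.get(name), want)
        for name, want in expected.items()
        if observed.get(name) != want
    }


def same_architecture(capability) -> bool:
    """True if a (major, minor) compute capability matches the reference card."""
    return tuple(capability) == REFERENCE_CAPABILITY


def collect_observed() -> dict:
    """Read installed versions; GPU fields are only added when PyTorch sees a CUDA device."""
    import numpy as np

    observed = {"python": platform.python_version(), "numpy": np.__version__}
    try:
        import torch
    except ImportError:
        return observed
    observed["torch"] = torch.__version__
    observed["cuda"] = torch.version.cuda
    observed["cudnn"] = str(torch.backends.cudnn.version())
    if torch.cuda.is_available():
        observed["gpu"] = torch.cuda.get_device_name(0)
        observed["capability"] = torch.cuda.get_device_capability(0)
    return observed


if __name__ == "__main__":
    obs = collect_observed()
    print("Mismatches:", version_mismatches(obs) or "none")
    if "capability" in obs:
        print(obs["gpu"], "matches reference architecture:", same_architecture(obs["capability"]))
```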


## Downloading Replication Code and Data

This section downloads the published codebase (_SpeedTransformer_) and the three pre-processed datasets (_MOBIS_, _GeoLife_, and _Miniprogram_) exactly as referenced in **Section “Data and Codes Availability”**.  
The repository zip is fetched from Zenodo and unzipped into the Colab working directory (`/content`).  
The subsequent cell downloads the CSV files into the repository's `data/` folder so that the training scripts can locate them via the paths used throughout the notebook.

In [None]:
!wget https://zenodo.org/records/17429944/files/SpeedTransformer.zip -O /content/SpeedTransformer.zip
!unzip /content/SpeedTransformer.zip

--2025-10-26 20:59:33--  https://zenodo.org/records/17429944/files/SpeedTransformer.zip
Resolving zenodo.org (zenodo.org)... 188.185.48.194, 188.185.43.25, 188.185.45.92, ...
Connecting to zenodo.org (zenodo.org)|188.185.48.194|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 25755626 (25M) [application/octet-stream]
Saving to: ‘/content/SpeedTransformer.zip’


2025-10-26 20:59:36 (8.78 MB/s) - ‘/content/SpeedTransformer.zip’ saved [25755626/25755626]

Archive:  /content/SpeedTransformer.zip
   creating: SpeedTransformer/
  inflating: __MACOSX/._SpeedTransformer  
  inflating: SpeedTransformer/SpeedTransformer.ipynb  
  inflating: __MACOSX/SpeedTransformer/._SpeedTransformer.ipynb  
  inflating: SpeedTransformer/.DS_Store  
  inflating: __MACOSX/SpeedTransformer/._.DS_Store  
   creating: SpeedTransformer/models/
  inflating: __MACOSX/SpeedTransformer/._models  
  inflating: SpeedTransformer/README.md  
  inflating: __MACOSX/SpeedTransformer/._README.md  
  inf

In [None]:
# Download CSV files into the data/ directory
!wget -O /content/SpeedTransformer/data/mobis_processed.csv https://zenodo.org/record/17429944/files/mobis_processed.csv
!wget -O /content/SpeedTransformer/data/geolife_processed.csv https://zenodo.org/record/17429944/files/geolife_processed.csv
!wget -O /content/SpeedTransformer/data/miniprogram_balanced.csv https://zenodo.org/record/17429944/files/miniprogram_balanced.csv

--2025-10-26 20:59:43--  https://zenodo.org/record/17429944/files/mobis_processed.csv
Resolving zenodo.org (zenodo.org)... 188.185.48.194, 188.185.43.25, 188.185.45.92, ...
Connecting to zenodo.org (zenodo.org)|188.185.48.194|:443... connected.
HTTP request sent, awaiting response... 301 MOVED PERMANENTLY
Location: /records/17429944/files/mobis_processed.csv [following]
--2025-10-26 20:59:44--  https://zenodo.org/records/17429944/files/mobis_processed.csv
Reusing existing connection to zenodo.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 31273348118 (29G) [text/plain]
Saving to: ‘/content/SpeedTransformer/data/mobis_processed.csv’

--2025-10-26 21:30:08--  https://zenodo.org/record/17429944/files/geolife_processed.csv
Resolving zenodo.org (zenodo.org)... 188.185.48.194, 188.185.45.92, 188.185.43.25, ...
Connecting to zenodo.org (zenodo.org)|188.185.48.194|:443... connected.
HTTP request sent, awaiting response... 301 MOVED PERMANENTLY
Location: /records/17429944/files
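Multi-gigabyte downloads can be cut short. Zenodo lists an MD5 checksum for each file on the record page, written as `md5:` followed by the hex digest. Assuming you copy that value, the hypothetical helpers below compare it with the local file. They also offer a cheaper size check against the `Length:` that wget prints (31,273,348,118 bytes for `mobis_processed.csv` above).

```python
import hashlib
import os


def md5_of(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """MD5 hex digest, read in 8 MiB chunks so multi-GB CSVs never sit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def size_matches(path: str, expected_bytes: int) -> bool:
    """Cheap first check against the 'Length:' value printed by wget."""
    return os.path.getsize(path) == expected_bytes


def matches_zenodo_checksum(path: str, checksum: str) -> bool:
    """Compare with a checksum written as 'md5:<32 hex digits>'."""
    algo, _, expected = checksum.partition(":")
    if algo.lower() != "md5" or len(expected) != 32:
        raise ValueError(f"unsupported checksum format: {checksum!r}")
    return md5_of(path) == expected.lower()


if __name__ == "__main__":
    csv = "/content/SpeedTransformer/data/mobis_processed.csv"
    if os.path.exists(csv):
        print("size OK:", size_matches(csv, 31_273_348_118))
```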

Rename the extracted repository folder to `A-SpeedTransformer`; the cells in Appendices H and I refer to it by this name.

In [None]:
!mv /content/SpeedTransformer /content/A-SpeedTransformer

Copy the repository and data to a Google Cloud Storage bucket so that they can be restored after switching to a GPU runtime.

In [None]:
# === Configuration ===
PROJECT_ID = "osmandgdelt" # switch to your project_id
BUCKET_NAME = "speed_transformer"  # switch to your bucket name
BUCKET_LOCATION = "us-west1"          # switch to a region close to you
LOCAL_DIR = "/content/A-SpeedTransformer" # your local source directory
GCS_PREFIX = "speed_transformer"  # folder-like prefix in the bucket

# === Install and authenticate ===
%pip -q install -U google-cloud-storage gcsfs

from google.colab import auth
auth.authenticate_user()  # log in to your Google account
print("Authenticated.")

# === Create bucket if it doesn't exist ===
import google.cloud.storage as gcs

client = gcs.Client(project=PROJECT_ID)
existing = [b.name for b in client.list_buckets() if b.name == BUCKET_NAME]
if not existing:
    bucket = client.bucket(BUCKET_NAME)
    bucket.storage_class = "STANDARD"
    client.create_bucket(bucket, location=BUCKET_LOCATION)
    print(f"Created bucket: gs://{BUCKET_NAME}/ in {BUCKET_LOCATION}")
else:
    print(f"Bucket exists: gs://{BUCKET_NAME}/")

# === Upload with gsutil rsync (recommended for large directory trees) ===
# Preserves filenames exactly; parallelized (-m) and resumable.
# Note: -d deletes objects under the bucket prefix that no longer exist in LOCAL_DIR.
import subprocess, shlex
cmd = f'gsutil -m rsync -r -d "{LOCAL_DIR}" "gs://{BUCKET_NAME}/{GCS_PREFIX}"'
print(cmd)
subprocess.check_call(shlex.split(cmd))
print("Upload complete via gsutil rsync.")



   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 290.1/290.1 kB 7.3 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.3/199.3 kB 9.0 MB/s eta 0:00:00
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-cloud-aiplatform 1.121.0 requires google-cloud-storage<3.0.0,>=1.32.0, but you have google-cloud-storage 3.4.1 which is incompatible.
google-adk 1.16.0 requires google-cloud-storage<3.0.0,>=2.18.0, but you
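Because `-d` makes `gsutil rsync` delete objects in the bucket that are absent locally, a dry run first is a cheap safeguard. The hypothetical helper below builds the command as an argument list, which also avoids the quoting round trip through `shlex.split`. The `-n` flag is gsutil rsync's dry-run option.

```python
import shlex


def build_rsync_cmd(src: str, dst: str, delete: bool = False, dry_run: bool = False) -> list:
    """Assemble a `gsutil -m rsync -r` command as an argument list.

    -m  run transfers in parallel
    -r  recurse into subdirectories
    -d  delete destination objects missing from the source (destructive)
    -n  dry run: list what would be copied or deleted without doing it
    """
    cmd = ["gsutil", "-m", "rsync", "-r"]
    if delete:
        cmd.append("-d")
    if dry_run:
        cmd.append("-n")
    return cmd + [src, dst]


if __name__ == "__main__":
    # Example values matching the configuration cell above.
    print(shlex.join(build_rsync_cmd("/content/A-SpeedTransformer",
                                     "gs://speed_transformer/speed_transformer",
                                     delete=True, dry_run=True)))
```

To use it, run `subprocess.check_call(build_rsync_cmd(LOCAL_DIR, f"gs://{BUCKET_NAME}/{GCS_PREFIX}", delete=True, dry_run=True))` first. Inspect the listed actions, then repeat with `dry_run=False`.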

After switching to a GPU runtime, restore the repository and data from Google Cloud Storage.

In [None]:
# === Configuration ===
PROJECT_ID = "osmandgdelt"
BUCKET_NAME = "speed_transformer"           # your existing bucket
GCS_PREFIX = "speed_transformer"            # folder prefix in GCS
LOCAL_DIR = "/content/A-SpeedTransformer"   # local target directory (important!)

# === Install and authenticate ===
%pip -q install -U google-cloud-storage gcsfs

from google.colab import auth
auth.authenticate_user()  # log in again after switching runtime
print("✅ Authenticated with Google account")

# === Ensure the target directory exists ===
import os
os.makedirs(LOCAL_DIR, exist_ok=True)

# === Sync data from GCS to local ===
import subprocess, shlex

cmd = f'gsutil -m rsync -r "gs://{BUCKET_NAME}/{GCS_PREFIX}" "{LOCAL_DIR}"'
print(f"Running: {cmd}")
subprocess.check_call(shlex.split(cmd))

print(f"✅ Download complete. Files restored to {LOCAL_DIR}")
!ls -lh "{LOCAL_DIR}"


   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 290.1/290.1 kB 8.0 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.3/199.3 kB 23.6 MB/s eta 0:00:00
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-cloud-aiplatform 1.121.0 requires google-cloud-storage<3.0.0,>=1.32.0, but you have google-cloud-storage 3.4.1 which is incompatible.
google-adk 1.16.0 requires google-cloud-storage<3.0.0,>=2.18.0, but yo
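After the restore, a quick check that the three CSVs arrived intact can prevent a failed training run later. Below is a minimal sketch. The helper name is hypothetical, and the file names are the ones downloaded earlier in this notebook.

```python
from pathlib import Path

EXPECTED_CSVS = ("mobis_processed.csv", "geolife_processed.csv", "miniprogram_balanced.csv")


def missing_or_empty(data_dir, names=EXPECTED_CSVS) -> list:
    """Return the expected files that are absent from data_dir or have zero bytes."""
    data_dir = Path(data_dir)
    problems = []
    for name in names:
        path = data_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems


if __name__ == "__main__":
    print(missing_or_empty("/content/A-SpeedTransformer/data") or "all CSVs present")
```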

## Appendix H Results & Figure Plotting

In [None]:
!python /content/A-SpeedTransformer/models/rule_based/rule_baseline.py \
  --data_path "/content/A-SpeedTransformer/data/geolife_processed.csv" \
  --label_encoder_path "/content/A-SpeedTransformer/models/rule_based/label_encoder.joblib" \
  --out_dir "/content/A-SpeedTransformer/models/rule_based/experiments/rule_geolife" \
  --auto_calibrate

https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
Collecting traj_ids: 1it [00:00,  4.90it/s]
Building windows: 1it [00:00,  3.09it/s]
Building windows: 1it [00:00,  3.84it/s]
[Rule Baseline] Test Accuracy: 0.0220
              precision    recall  f1-score   support

        bike     0.0000    0.0000    0.0000         0
         bus     0.0000    0.0000    0.0000       126
         car     0.0490    0.0656    0.0561       183
       train     0.0000    0.0000    0.0000      3856
        walk     1.0000    0.7961    0.8865       103

    accuracy                         0.0220      4268
   macro avg     0.2098    0.1723    0.1885      4268
weighted avg     0.0262    0.0220    0.0238      4268

object address  : 0x7b0d31a30d00
object refcount : 3
object type     : 0xa2a4e0
object type name: KeyboardInterrupt
object repr     : KeyboardInterrupt()
lost sys.stderr
^C
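The GeoLife accuracy of 0.0220 follows directly from the per-class rows. The number of correct predictions per class is recall × support. The `train` class, which the rules never recover, holds 3,856 of the 4,268 test samples. The arithmetic below uses only numbers from the report above.

```python
# (recall, support) per class, copied from the GeoLife rule-baseline report above.
GEOLIFE = {
    "bike": (0.0000, 0),
    "bus": (0.0000, 126),
    "car": (0.0656, 183),
    "train": (0.0000, 3856),
    "walk": (0.7961, 103),
}


def accuracy_from_recall(report: dict) -> float:
    """Accuracy = sum(recall_i * support_i) / sum(support_i).

    Recalls are printed to 4 decimals, so rounding recall * support recovers the
    integer count of correct predictions for supports of this size.
    """
    correct = sum(round(recall * support) for recall, support in report.values())
    total = sum(support for _, support in report.values())
    return correct / total


total = sum(support for _, support in GEOLIFE.values())
print(f"accuracy = {accuracy_from_recall(GEOLIFE):.4f}")                    # → 0.0220
print(f"train share of test samples = {GEOLIFE['train'][1] / total:.3f}")  # → 0.903
```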


In [None]:
!python /content/A-SpeedTransformer/models/rule_based/rule_baseline.py \
  --data_path "/content/A-SpeedTransformer/data/mobis_processed.csv" \
  --label_encoder_path "/content/A-SpeedTransformer/models/rule_based/label_encoder.joblib" \
  --out_dir "/content/A-SpeedTransformer/models/rule_based/experiments/rule_mobis" \
  --auto_calibrate

https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
Collecting traj_ids: 1it [00:00,  1.46it/s]
Building windows: 1it [00:00,  1.20it/s]
Building windows: 1it [00:00,  1.60it/s]
[Rule Baseline] Test Accuracy: 0.7127
              precision    recall  f1-score   support

        bike     0.2019    0.9255    0.3314        94
         bus     0.0000    0.0000    0.0000        49
         car     0.8705    0.7894    0.8280       945
       train     0.0000    0.0000    0.0000       109
        walk     0.9902    0.7632    0.8620       397

    accuracy                         0.7127      1594
   macro avg     0.4125    0.4956    0.4043      1594
weighted avg     0.7746    0.7127    0.7251      1594

[OK] Saved outputs to: /content/A-SpeedTransformer/models/rule_based/experiments/rule_mobis
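For MOBIS, macro-averaged F1 (0.4043) is far below weighted F1 (0.7251). The gap comes from `bus` and `train`, which score zero but carry little support. The code below reproduces both averages from the per-class rows above.

```python
# (f1-score, support) per class, copied from the MOBIS rule-baseline report above.
MOBIS = {
    "bike": (0.3314, 94),
    "bus": (0.0000, 49),
    "car": (0.8280, 945),
    "train": (0.0000, 109),
    "walk": (0.8620, 397),
}


def macro_f1(report: dict) -> float:
    """Unweighted mean over classes: every class counts equally."""
    return sum(f1 for f1, _ in report.values()) / len(report)


def weighted_f1(report: dict) -> float:
    """Support-weighted mean: large classes such as car dominate."""
    total = sum(n for _, n in report.values())
    return sum(f1 * n for f1, n in report.values()) / total


print(f"macro F1    = {macro_f1(MOBIS):.4f}")     # → 0.4043
print(f"weighted F1 = {weighted_f1(MOBIS):.4f}")  # → 0.7251
```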


## Appendix I Figure Plotting

In [None]:
!python /content/A-SpeedTransformer/models/run_speedtransformer_rf_visualize_computation.py \
  --root /content/A-SpeedTransformer \
  --do-transformer \
  --do-lstm \
  --do-plot \
  --monitor

## Appendix G Results

In [None]:
!wget -O /content/SpeedTransformer1.zip https://zenodo.org/records/15535357/files/SpeedTransformer.zip
!unzip /content/SpeedTransformer1.zip

In [None]:
!mv /content/A-SpeedTransformer/data/mobis_processed.csv /content/SpeedTransformer/data/mobis_processed.csv
!mv /content/A-SpeedTransformer/data/geolife_processed.csv /content/SpeedTransformer/data/geolife_processed.csv
!mv /content/A-SpeedTransformer/data/miniprogram_balanced.csv /content/SpeedTransformer/data/miniprogram_balanced.csv

In [None]:
# LSTM on Geolife
%cd /content/SpeedTransformer/models/lstm
!python lstm.py --data_path /content/SpeedTransformer/data/geolife_processed.csv --random_state 1
%cd /content/

In [None]:
# LSTM on Mobis
%cd /content/SpeedTransformer/models/lstm
!python lstm.py --data_path /content/SpeedTransformer/data/mobis_processed.csv --random_state 316
%cd /content/

In [None]:
# SpeedTransformer_v1 on Geolife
%cd /content/SpeedTransformer/models/transformer
!python train.py --data_path /content/SpeedTransformer/data/geolife_processed.csv --random_state 316
%cd /content/

In [None]:
# SpeedTransformer_v1 on Mobis
%cd /content/SpeedTransformer/models/transformer
!python train.py --data_path /content/SpeedTransformer/data/mobis_processed.csv --random_state 1
%cd /content/
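The four training cells above follow one pattern: a script, a dataset, and a seed. As a hypothetical alternative to the `%cd`/`!python` pairs, they can be driven from a single loop. The paths and seeds below are copied from those cells. `build_commands` and `run_all` are illustrative names, not repository functions.

```python
import subprocess
import sys
from pathlib import Path

ROOT = Path("/content/SpeedTransformer")

# (model folder, script, dataset CSV, --random_state) exactly as in the four cells above.
RUNS = [
    ("lstm", "lstm.py", "geolife_processed.csv", 1),
    ("lstm", "lstm.py", "mobis_processed.csv", 316),
    ("transformer", "train.py", "geolife_processed.csv", 316),
    ("transformer", "train.py", "mobis_processed.csv", 1),
]


def build_commands(root: Path = ROOT):
    """Yield (working directory, argv); each script runs from its own folder, as %cd does."""
    for model_dir, script, csv_name, seed in RUNS:
        argv = [sys.executable, script,
                "--data_path", str(root / "data" / csv_name),
                "--random_state", str(seed)]
        yield root / "models" / model_dir, argv


def run_all(root: Path = ROOT) -> None:
    """Run each training job in turn, stopping at the first non-zero exit status."""
    for cwd, argv in build_commands(root):
        print(f"[{cwd}]", " ".join(argv[1:]))
        subprocess.run(argv, cwd=cwd, check=True)
```

Running from the script's own folder mirrors the `%cd` in each cell. This matters if the scripts write outputs such as `mobis/best_model.pth` relative to the working directory, as the fine-tuning paths below suggest.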

In [None]:
## Finetuning - Transformer
%cd /content/SpeedTransformer/models/transformer

# Fine-tune with 2% of data used for training
!python finetune.py \
  --pretrained_model_path /content/SpeedTransformer/models/transformer/mobis/best_model.pth \
  --data_path /content/SpeedTransformer/data/geolife_processed.csv \
  --label_encoder_path /content/SpeedTransformer/models/transformer/mobis/label_encoder.joblib \
  --test_size 0.7786 \
  --val_size 0.2 \
  --random_state 42
%cd /content/

In [None]:
## Finetuning - LSTM
%cd /content/SpeedTransformer/models/lstm

# Fine-tune with 2% of data used for training
!python finetune.py \
  --pretrained_model_path /content/SpeedTransformer/models/lstm/mobis/best_model.pth \
  --data_path /content/SpeedTransformer/data/geolife_processed.csv \
  --scaler_path /content/SpeedTransformer/models/lstm/mobis/scaler.joblib \
  --label_encoder_path /content/SpeedTransformer/models/lstm/mobis/label_encoder.joblib \
  --test_size 0.7893 \
  --val_size 0.2 \
  --random_state 42
%cd /content/
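The training share implied by `--test_size` and `--val_size` depends on how the scripts split the data, which this notebook does not show. The sketch below computes it under two common conventions, both of which are assumptions. Under the flat reading, the Transformer cell's 0.7786 gives about 2.1% and the LSTM cell's 0.7893 gives about 1.1%. Neither reading gives the 40% noted in the Miniprogram cells below (they give 25% and 36%). The scripts may split by trajectory rather than by sample. If they report their split sizes, those figures are authoritative.

```python
def train_fraction(test_size: float, val_size: float, nested: bool) -> float:
    """Share of data left for training under two common conventions (assumptions).

    nested=False: test_size and val_size are both fractions of the full dataset.
    nested=True:  the test split is drawn first and val_size is a fraction of what remains.
    """
    if not (0 <= test_size < 1 and 0 <= val_size < 1):
        raise ValueError("sizes must be fractions in [0, 1)")
    if nested:
        return (1 - test_size) * (1 - val_size)
    return 1 - test_size - val_size


# --test_size values from the fine-tuning cells of this notebook (--val_size is 0.2 in all).
for label, test_size in [("Transformer, GeoLife", 0.7786),
                         ("LSTM, GeoLife", 0.7893),
                         ("Miniprogram", 0.55)]:
    flat_share = train_fraction(test_size, 0.2, nested=False)
    nested_share = train_fraction(test_size, 0.2, nested=True)
    print(f"{label:22s} flat: {flat_share:.2%}  nested: {nested_share:.2%}")
```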

In [None]:
## Miniprogram with pre-trained Transformer
%cd /content/SpeedTransformer/models/transformer

# Fine-tune with 40% of data used for training
!python finetune.py \
  --pretrained_model_path /content/SpeedTransformer/models/transformer/mobis/best_model.pth \
  --data_path /content/SpeedTransformer/data/miniprogram_balanced.csv \
  --label_encoder_path /content/SpeedTransformer/models/transformer/mobis/label_encoder.joblib \
  --test_size 0.55 \
  --val_size 0.2 \
  --random_state 42

# Return to root directory
%cd /content/

In [None]:
## Miniprogram with pre-trained LSTM
%cd /content/SpeedTransformer/models/lstm

# Fine-tune with 40% of data used for training
!python finetune.py \
  --pretrained_model_path /content/SpeedTransformer/models/lstm/mobis/best_model.pth \
  --data_path /content/SpeedTransformer/data/miniprogram_balanced.csv \
  --label_encoder_path /content/SpeedTransformer/models/lstm/mobis/label_encoder.joblib \
  --scaler_path /content/SpeedTransformer/models/lstm/mobis/scaler.joblib \
  --test_size 0.55 \
  --val_size 0.2 \
  --random_state 42

# Return to root directory
%cd /content/