
Awesome-Urban-Foundation-Models


An Awesome Collection of Urban Foundation Models (UFMs).

Urban Foundation Models (UFMs) are large-scale models pre-trained on vast multi-source, multi-granularity, and multimodal urban data. Benefiting significantly from their pre-training phase, UFMs exhibit emergent capabilities and remarkable adaptability to a broad range of downstream tasks and domains in urban contexts.


Survey Paper

Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models

Authors: Weijia Zhang, Jindong Han, Zhao Xu, Hang Ni, Hao Liu, Hui Xiong

🌟 If you find this resource helpful, please consider starring this repository and citing our survey paper:

@misc{zhang2024urban,
      title={Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models}, 
      author={Weijia Zhang and Jindong Han and Zhao Xu and Hang Ni and Hao Liu and Hui Xiong},
      year={2024},
      eprint={2402.01749},
      archivePrefix={arXiv},
      primaryClass={cs.CY}
}

Outline

Taxonomy

1. Language-based Models

1.1 Unimodal Pre-training

Geo-text

  • (KDD'22) ERNIE-GeoL: A Geography-and-Language Pre-trained Model and its Applications in Baidu Maps [paper]
  • (SIGIR'23) MGeo: Multi-Modal Geographic Language Model Pre-Training [paper]

1.2 Unimodal Adaptation

Prompt engineering

  • (arXiv 2023.10) Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning [paper]
  • (arXiv 2023.10) GeoLLM: Extracting Geospatial Knowledge from Large Language Models [paper]
  • (arXiv 2023.05) Towards Human-AI Collaborative Urban Science Research Enabled by Pre-trained Large Language Models [paper]
  • (arXiv 2023.05) GPT4GEO: How a Language Model Sees the World's Geography [paper]
  • (arXiv 2023.05) On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence [paper]
  • (arXiv 2023.05) ChatGPT is on the Horizon: Could a Large Language Model be Suitable for Intelligent Traffic Safety Research and Applications? [paper]
  • (GIScience'23) Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations [paper]
  • (SIGSPATIAL'23) Are Large Language Models Geospatially Knowledgeable? [paper]
  • (SIGSPATIAL'23) Towards Understanding the Geospatial Skills of ChatGPT: Taking a Geographic Information Systems (GIS) Exam [paper]
  • (SIGSPATIAL'22) Towards a Foundation Model for Geospatial Artificial Intelligence (Vision Paper) [paper]

Model fine-tuning

  • (arXiv 2023.11) Optimizing and Fine-tuning Large Language Model for Urban Renewal [paper]
  • (arXiv 2023.09) K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization [paper]
  • (EMNLP'23) GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding [paper]
  • (KDD'23) QUERT: Continual Pre-training of Language Model for Query Understanding in Travel Domain Search [paper]
  • (TOIS'23) Improving First-stage Retrieval of Point-of-interest Search by Pre-training Models [paper]
  • (EMNLP'22) SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation [paper]

2. Vision-based Models

2.1 Unimodal Pre-training

On-site urban visual data

  • (WWW'23) Knowledge-infused Contrastive Learning for Urban Imagery-based Socioeconomic Prediction [paper]
  • (CIKM'22) Predicting Multi-level Socioeconomic Indicators from Structural Urban Imagery [paper]
  • (AAAI'20) Urban2Vec: Incorporating Street View Imagery and POIs for Multi-Modal Urban Neighborhood Embedding [paper]

Remote sensing data

  • (TGRS'23) Foundation Model-Based Multimodal Remote Sensing Data Classification [paper]
  • (arXiv 2023.04) A Billion-scale Foundation Model for Remote Sensing Images [paper]
  • (TGRS'23) RingMo-Sense: Remote Sensing Foundation Model for Spatiotemporal Prediction via Spatiotemporal Evolution Disentangling [paper]
  • (ICCV'23) Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning [paper]
  • (ICML'23) CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [paper]
  • (TGRS'22) Advancing Plain Vision Transformer Toward Remote Sensing Foundation Model [paper]
  • (TGRS'22) RingMo: A Remote Sensing Foundation Model With Masked Image Modeling [paper]

Grid-based meteorological data

  • (arXiv 2023.04) FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead [paper]
  • (arXiv 2023.04) W-MAE: Pre-trained Weather Model with Masked Autoencoder for Multi-variable Weather Forecasting [paper]
  • (arXiv 2022.02) FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators [paper]
  • (Nature'23) Accurate Medium-range Global Weather Forecasting with 3D Neural Networks [paper]
  • (ICML'23) ClimaX: A Foundation Model for Weather and Climate [paper]

2.2 Unimodal Adaptation

Prompt engineering

  • (TGRS'24) RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model [paper]
  • (NeurIPS'23) SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model [paper]

Model fine-tuning

  • (arXiv 2023.11) GeoSAM: Fine-tuning SAM with Sparse and Dense Visual Prompting for Automated Segmentation of Mobility Infrastructure [paper]
  • (arXiv 2023.02) Learning Generalized Zero-Shot Learners for Open-Domain Image Geolocalization [paper]
  • (TGRS'23) RingMo-SAM: A Foundation Model for Segment Anything in Multimodal Remote-Sensing Images [paper]
  • (IJAEOG'22) Migratable Urban Street Scene Sensing Method based on Vision Language Pre-trained Model [paper]

3. Trajectory-based Models

3.1 Unimodal Pre-training

Road network trajectory

  • (WWW'24) More Than Routing: Joint GPS and Route Modeling for Refined Trajectory Representation Learning [paper]
  • (KDD'23) LightPath: Lightweight and Scalable Path Representation Learning [paper]
  • (ICDM'23) Self-supervised Pre-training for Robust and Generic Spatial-Temporal Representations [paper]
  • (TKDE'23) Pre-Training General Trajectory Embeddings With Maximum Multi-View Entropy Coding [paper]
  • (ICDE'23) Self-supervised Trajectory Representation Learning with Temporal Regularities and Travel Semantics [paper]
  • (VLDBJ'22) Unified Route Representation Learning for Multi-Modal Transportation Recommendation with Spatiotemporal Pre-training [paper]
  • (CIKM'21) Robust Road Network Representation Learning: When Traffic Patterns Meet Traveling Semantics [paper]
  • (IJCAI'21) Unsupervised Path Representation Learning with Curriculum Negative Sampling [paper]
  • (TIST'20) Trembr: Exploring Road Networks for Trajectory Representation Learning [paper]
  • (ICDE'18) Deep Representation Learning for Trajectory Similarity Computation [paper]
  • (IJCNN'17) Trajectory Clustering via Deep Representation Learning [paper]

Free space trajectory

  • (AAAI'23) Contrastive Pre-training with Adversarial Perturbations for Check-in Sequence Representation Learning [paper]
  • (KBS'21) Self-supervised Human Mobility Learning for Next Location Prediction and Trajectory Classification [paper]
  • (AAAI'21) Pre-training Context and Time Aware Location Embeddings from Spatial-Temporal Trajectories for User Next Location Prediction [paper]
  • (KDD'20) Learning to Simulate Human Mobility [paper]

3.2 Unimodal Adaptation

Model fine-tuning

  • (ToW'23) Pre-Training Across Different Cities for Next POI Recommendation [paper]
  • (TIST'23) Doing More with Less: Overcoming Data Scarcity for POI Recommendation via Cross-Region Transfer [paper]
  • (CIKM'21) Region Invariant Normalizing Flows for Mobility Transfer [paper]

3.3 Cross-modal Adaptation

Prompt engineering

  • (arXiv 2024.03) DrPlanner: Diagnosis and Repair of Motion Planners Using Large Language Models [paper]
  • (arXiv 2023.11) Exploring Large Language Models for Human Mobility Prediction under Public Events [paper]
  • (arXiv 2023.10) Large Language Models for Spatial Trajectory Patterns Mining [paper]
  • (arXiv 2023.10) GPT-Driver: Learning to Drive with GPT [paper]
  • (arXiv 2023.10) LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving [paper]
  • (arXiv 2023.09) Can You Text What Is Happening? Integrating Pre-trained Language Encoders into Trajectory Prediction Models for Autonomous Driving [paper]
  • (arXiv 2023.08) Where Would I Go Next? Large Language Models as Human Mobility Predictors [paper]
  • (SIGSPATIAL'22) Leveraging Language Foundation Models for Human Mobility Forecasting [paper]

4. Time Series-based Models

4.1 Unimodal Pre-training

Ordinary time series

  • (arXiv 2024.03) UniTS: Building a Unified Time Series Model [paper]
  • (arXiv 2024.02) Timer: Transformers for Time Series Analysis at Scale [paper]
  • (arXiv 2024.02) Generative Pretrained Hierarchical Transformer for Time Series Forecasting [paper]
  • (arXiv 2024.02) TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling [paper]
  • (arXiv 2024.01) TTMs: Fast Multi-level Tiny Time Mixers for Improved Zero-shot and Few-shot Forecasting of Multivariate Time Series [paper]
  • (arXiv 2024.01) HiMTM: Hierarchical Multi-Scale Masked Time Series Modeling for Long-Term Forecasting [paper]
  • (arXiv 2023.12) Prompt-based Domain Discrimination for Multi-source Time Series Domain Adaptation [paper]
  • (arXiv 2023.11) PT-Tuning: Bridging the Gap between Time Series Masked Reconstruction and Forecasting via Prompt Token Tuning [paper]
  • (arXiv 2023.10) UniTime: A Language-Empowered Unified Model for Cross-Domain Time Series Forecasting [paper]
  • (arXiv 2023.03) SimTS: Rethinking Contrastive Representation Learning for Time Series Forecasting [paper]
  • (arXiv 2023.01) Ti-MAE: Self-Supervised Masked Time Series Autoencoders [paper]
  • (NeurIPS'23) ForecastPFN: Synthetically-Trained Zero-Shot Forecasting [paper]
  • (NeurIPS'23) SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling [paper]
  • (NeurIPS'23) Lag-Llama: Towards Foundation Models for Time Series Forecasting [paper]
  • (ICLR'23) A Time Series is Worth 64 Words: Long-term Forecasting with Transformers [paper]
  • (KDD'23) TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting [paper]
  • (AAAI'22) TS2Vec: Towards Universal Representation of Time Series [paper]
  • (ICLR'22) CoST: Contrastive Learning of Disentangled Seasonal-Trend Representations for Time Series Forecasting [paper]
  • (TNNLS'22) Self-Supervised Autoregressive Domain Adaptation for Time Series Data [paper]
  • (IJCAI'21) Time-Series Representation Learning via Temporal and Contextual Contrasting [paper]
  • (ICLR'21) Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding [paper]
  • (AAAI'21) Meta-Learning Framework with Applications to Zero-Shot Time-Series Forecasting [paper]
  • (AAAI'21) Time Series Domain Adaptation via Sparse Associative Structure Alignment [paper]
  • (KDD'21) A Transformer-based Framework for Multivariate Time Series Representation Learning [paper]
  • (KDD'20) Multi-Source Deep Domain Adaptation with Weak Supervision for Time-Series Sensor Data [paper]
  • (NeurIPS'19) Unsupervised Scalable Representation Learning for Multivariate Time Series [paper]

Spatial-correlated time series

  • (arXiv 2024.02) UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction [paper]
  • (NeurIPS'23) GPT-ST: Generative Pre-Training of Spatio-Temporal Graph Neural Networks [paper]
  • (CIKM'23) Mask- and Contrast-Enhanced Spatio-Temporal Learning for Urban Flow Prediction [paper]
  • (CIKM'23) Cross-city Few-Shot Traffic Forecasting via Traffic Pattern Bank [paper]
  • (KDD'23) Transferable Graph Structure Learning for Graph-based Traffic Forecasting Across Cities [paper]
  • (KDD'22) Selective Cross-City Transfer Learning for Traffic Prediction via Source City Region Re-Weighting [paper]
  • (WSDM'22) ST-GSP: Spatial-Temporal Global Semantic Representation Learning for Urban Flow Prediction [paper]
  • (SIGSPATIAL'22) When Do Contrastive Learning Signals Help Spatio-Temporal Graph Forecasting? [paper]
  • (KDD'22) Pre-training Enhanced Spatial-temporal Graph Neural Network for Multivariate Time Series Forecasting [paper]
  • (WWW'19) Learning from Multiple Cities: A Meta-Learning Approach for Spatial-Temporal Prediction [paper]
  • (IJCAI'18) Cross-City Transfer Learning for Deep Spatio-Temporal Prediction [paper]

4.2 Unimodal Adaptation

Prompt tuning

  • (arXiv 2023.12) Prompt-based Domain Discrimination for Multi-source Time Series Domain Adaptation [paper]
  • (arXiv 2023.11) PT-Tuning: Bridging the Gap between Time Series Masked Reconstruction and Forecasting via Prompt Token Tuning [paper]
  • (arXiv 2023.05) Spatial-temporal Prompt Learning for Federated Weather Forecasting [paper]
  • (CIKM'23) PromptST: Prompt-Enhanced Spatio-Temporal Multi-Attribute Prediction [paper]
  • (IJCAI'23) Prompt Federated Learning for Weather Forecasting: Toward Foundation Models on Meteorological Data [paper]

4.3 Cross-modal Adaptation

Prompt engineering

  • (NeurIPS'23) Large Language Models Are Zero-Shot Time Series Forecasters [paper]
  • (TKDE'22) PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting [paper]

Model fine-tuning

  • (arXiv 2024.03) UrbanGPT: Spatio-Temporal Large Language Models [paper]
  • (arXiv 2024.03) TPLLM: A Traffic Prediction Framework Based on Pretrained Large Language Models [paper]
  • (arXiv 2024.02) AutoTimes: Autoregressive Time Series Forecasters via Large Language Models [paper]
  • (arXiv 2024.01) How Can Large Language Models Understand Spatial-Temporal Data? [paper]
  • (arXiv 2024.01) Spatial-Temporal Large Language Model for Traffic Prediction [paper]
  • (arXiv 2023.11) One Fits All: Universal Time Series Analysis by Pretrained LM and Specially Designed Adaptors [paper]
  • (arXiv 2023.11) GATGPT: A Pre-trained Large Language Model with Graph Attention Network for Spatiotemporal Imputation [paper]
  • (arXiv 2023.08) LLM4TS: Two-Stage Fine-Tuning for Time-Series Forecasting with Pre-Trained LLMs [paper]
  • (ICLR'24) TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting [paper]
  • (NeurIPS'23) One Fits All: Power General Time Series Analysis by Pretrained LM [paper]

Model reprogramming

  • (ICLR'24) Time-LLM: Time Series Forecasting by Reprogramming Large Language Models [paper]
  • (arXiv 2023.08) TEST: Text Prototype Aligned Embedding to Activate LLM’s Ability for Time Series [paper]

5. Multimodal-based Models

5.1 Pre-training

Single-domain models

  • (WWW'24) When Urban Region Profiling Meets Large Language Models [paper]
  • (TITS'23) Parallel Transportation in TransVerse: From Foundation Models to DeCAST [paper]

Multi-domain models

  • (arXiv 2023.12) AllSpark: A Multimodal Spatiotemporal General Model [paper]
  • (arXiv 2023.10) City Foundation Models for Learning General Purpose Representations from OpenStreetMap [paper]

5.2 Adaptation

Prompt engineering

  • (arXiv 2024.02) Large Language Model for Participatory Urban Planning [paper]
  • (arXiv 2023.09) TrafficGPT: Viewing, Processing and Interacting with Traffic Foundation Models [paper]
  • (arXiv 2023.07) GeoGPT: Understanding and Processing Geospatial Tasks through An Autonomous GPT [paper]

Model fine-tuning

  • (arXiv 2024.02) TransGPT: Multi-modal Generative Pre-trained Transformer for Transportation [paper]
  • (arXiv 2023.12) Urban Generative Intelligence (UGI): A Foundational Platform for Agents in Embodied City Environment [paper]
  • (arXiv 2023.07) VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View [paper]

6. Others

6.1 Unimodal Approaches

  • (EDBT'23) Spatial Structure-Aware Road Network Embedding via Graph Contrastive Learning [paper]
  • (CIKM'21) GeoVectors: A Linked Open Corpus of OpenStreetMap Embeddings on World Scale [paper]

6.2 Cross-modal Adaptation

  • (arXiv 2023.12) Large Language Models as Traffic Signal Control Agents: Capacity and Opportunity [paper]
  • (arXiv 2023.08) LLM Powered Sim-to-Real Transfer for Traffic Signal Control [paper]
  • (arXiv 2023.06) Can ChatGPT Enable ITS? The Case of Mixed Traffic Control via Reinforcement Learning [paper]

7. Contributing

👍 Contributions to this repository are welcome!

If you have come across relevant resources, feel free to open an issue or submit a pull request.

- (*conference'YY | journal'YY | arXiv YYYY.MM*) Paper Title [[paper](link)][[code](link)]
