🌏 EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain
Official repository for EarthGPT. 😄
Authors: Wei Zhang*, Miaoxin Cai*, Tong Zhang, Yin Zhuang, and Xuerui Mao
- \* These authors contributed equally to this work.
- The dataset, model, code, and demo are coming soon! 🚀
- [2024.05.25]: EarthGPT has been accepted by IEEE TGRS 🎉
- [2024.04.29]: We partially released the data of MMRS-1M! 🔥
- [2024.01.30]: The paper for EarthGPT is released on arXiv. 🔥🔥
EarthGPT is a pioneering model designed to seamlessly unify multi-sensor and diverse remote sensing intelligent visual interpretation tasks in a single framework, guided by user language instructions. EarthGPT is capable of conducting visual-language dialogues across optical, SAR, and infrared images. Its capabilities extend to a wide range of tasks, including scene classification, image description, visual question answering, target description, visual localization, and object detection.
The complete MMRS-1M dataset is coming soon! 🚀
MMRS-1M is the largest multi-modal multi-sensor RS instruction-following dataset, consisting of over 1M image-text pairs that include optical, SAR, and infrared RS images.
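As a rough illustration of what a multi-modal instruction-following image-text pair might look like, the sketch below uses a conversation-style record. The field names (`image`, `modality`, `conversations`) and the file path are hypothetical assumptions for illustration, not the official MMRS-1M schema.

```python
# Hypothetical sketch of one instruction-following record in a
# conversation-style schema. Field names and the image path are
# illustrative assumptions, NOT the official MMRS-1M format.
sample_record = {
    "image": "images/sar/ship_0001.png",  # an optical, SAR, or infrared RS image
    "modality": "SAR",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat objects are visible in this SAR image?"},
        {"from": "gpt", "value": "The image shows several ships near a harbor."},
    ],
}

def count_turns(record: dict) -> int:
    """Count the dialogue turns in one instruction-following record."""
    return len(record["conversations"])

print(count_turns(sample_record))  # 2
```

A schema like this lets a single dataset cover all the listed tasks (classification, captioning, VQA, localization, detection) simply by varying the human instruction, which is how many instruction-tuning corpora are organized.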
We release 9,000+ image-text pairs at the following link.
Link: https://pan.baidu.com/s/1hN7RXQv5xo5Fyq0nHzUlzg
Password: haha
```bibtex
@article{zhang2024earthgpt,
  title={EarthGPT: A universal multi-modal large language model for multi-sensor image comprehension in remote sensing domain},
  author={Zhang, Wei and Cai, Miaoxin and Zhang, Tong and Zhuang, Yin and Mao, Xuerui},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  year={2024},
  publisher={IEEE}
}
```
This work benefits from LLaMA. Thanks for their wonderful work.
If you have any questions about EarthGPT, please feel free to contact w.w.zhanger@gmail.com.