
EarthMarker: A Visual Prompting Multi-modal Large Language Model for Remote Sensing

Official repository for EarthMarker.

Authors: Wei Zhang*, Miaoxin Cai*, Tong Zhang, Yin Zhuang, and Xuerui Mao

  • The authors contributed equally to this work.

📣 News

  • [2025.01.06]: We have released the dataset RSVP! 🔥🔥🔥
  • [2024.12.22]: EarthMarker has been accepted to IEEE TGRS. 🎉
  • [2024.07.19]: The paper for EarthMarker is released on arXiv. 🚀

✨ Introduction

EarthMarker is the first visual prompting MLLM proposed for the remote sensing (RS) domain. It comprehends RS imagery under joint visual and text prompts and flexibly switches among interpretation levels: image, region, and point. More importantly, EarthMarker fills the gap in visual prompting MLLMs for RS, catering to the fine-grained interpretation needs of RS imagery in real-world applications. EarthMarker supports a variety of RS visual tasks, including scene classification, referring object classification, captioning, and relationship analysis, which help inform decisions in real-world applications.
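The idea of switching interpretation levels via a joint visual-text prompt can be sketched as follows. This is only an illustrative sketch: the placeholder tags, class names, and function below are assumptions, not EarthMarker's actual API or prompt format.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Sequence

@dataclass
class VisualPrompt:
    """Hypothetical visual prompt at one of three interpretation levels."""
    level: Literal["image", "region", "point"]
    # Region prompts carry a bounding box (x1, y1, x2, y2);
    # point prompts carry a single (x, y) coordinate.
    coords: Optional[Sequence[float]] = None

def build_query(instruction: str, prompt: VisualPrompt) -> str:
    """Render the visual prompt as a placeholder tag inside the text query
    (the tag syntax here is invented for illustration)."""
    if prompt.level == "image":
        tag = "<image>"
    elif prompt.level == "region":
        tag = f"<region {','.join(map(str, prompt.coords))}>"
    else:
        tag = f"<point {','.join(map(str, prompt.coords))}>"
    return f"{tag} {instruction}"

# Region-level query about a marked object:
q = build_query("Classify the marked object.", VisualPrompt("region", (10, 20, 80, 90)))
print(q)  # <region 10,20,80,90> Classify the marked object.
```

In a real pipeline, the tag would be replaced by (or aligned with) visual features extracted from the marked image area before being fed to the language model.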

✨ RSVP: The First RS Visual Prompting Instruction Dataset

The full RSVP dataset has been released! 🚀 RSVP contains roughly 3.65 M image-point-text and image-region-text pairings.

link1: https://pan.baidu.com/s/1_kMO5bBje7JXTNpxDiCvqg?pwd=gqdb pwd: gqdb

link2: the OneDrive version is being uploaded.
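To make the pairing structure concrete, an image-region-text record might be stored as one JSON object per line. The field names and values below are assumptions for illustration, not the released RSVP schema.

```python
import json

# Illustrative image-region-text pairing (hypothetical schema):
record = {
    "image": "images/airport_001.png",
    "region": [34, 58, 120, 160],  # bounding box in pixel coordinates
    "question": "What is the object in the marked region?",
    "answer": "A parked airplane on the apron.",
}

# Serialize/deserialize as one JSONL line.
line = json.dumps(record)
parsed = json.loads(line)
print(parsed["region"])  # [34, 58, 120, 160]
```

A point-level pairing would look the same, with a single (x, y) coordinate in place of the bounding box.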

🔖 Citation

@article{zhang2024earthmarker,
  title={EarthMarker: A Visual Prompting Multi-modal Large Language Model for Remote Sensing},
  author={Zhang, Wei and Cai, Miaoxin and Zhang, Tong and Zhuang, Yin and Li, Jun and Mao, Xuerui},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  year={2024},
  publisher={IEEE}
}

📝 Acknowledgment

This project benefits from LLaMA. Thanks for their wonderful work.