SkyEyeGPT is a unified multi-modal large language model designed for remote sensing (RS) vision-language tasks. It is trained on a custom instruction-following dataset, SkyEye-968k, which contains 968,000 samples covering both single-task and multi-task conversation instructions. The architecture consists of three components: a visual encoder, an alignment layer, and an LLM-based decoder. Training follows a two-stage instruction tuning approach, which improves the model's ability to follow instructions and to engage in multi-turn conversations. Experiments on eight RS vision-language datasets show that SkyEyeGPT outperforms other models on image-level and region-level tasks, including image captioning, visual grounding, and visual question answering. In some tests, its understanding of remote sensing images is comparable to or better than that of GPT-4V. Ablation studies further validate the design. The architecture is simple and efficient to train and deploy, which makes it suitable for a wide range of RS tasks. The model is open-sourced, with an online chatbot, a model checkpoint, the instruction-following dataset, and the codebase made available for real-world applications and future research on RS vision-language tasks.
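
The three-component design described above (visual encoder, alignment layer, LLM-based decoder) can be illustrated with a small sketch. This is not SkyEyeGPT's actual code: it assumes the alignment layer is a single linear projection, uses toy dimensions, and replaces the encoder output and text embeddings with stand-in values. All function and variable names are hypothetical.

```python
import random

def make_linear(in_dim, out_dim, seed=0):
    """Build a hypothetical linear alignment layer as (weights, bias)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def align(visual_tokens, layer):
    """Project each visual token from the encoder width to the LLM embedding width."""
    weights, bias = layer
    return [
        [sum(w * x for w, x in zip(row, token)) + b
         for row, b in zip(weights, bias)]
        for token in visual_tokens
    ]

def build_decoder_input(visual_tokens, text_embeddings, layer):
    """Prepend the aligned visual tokens to the text embeddings as one sequence."""
    return align(visual_tokens, layer) + text_embeddings

# Toy dimensions, chosen only for illustration (assumption, not from the source).
VISION_DIM, LLM_DIM = 8, 4
layer = make_linear(VISION_DIM, LLM_DIM)

visual_tokens = [[0.5] * VISION_DIM for _ in range(3)]  # stand-in for encoder output
text_embeddings = [[0.0] * LLM_DIM for _ in range(2)]   # stand-in for embedded instruction tokens

sequence = build_decoder_input(visual_tokens, text_embeddings, layer)
print(len(sequence), len(sequence[0]))  # prints "5 4"
```

The point of the sketch is only the data flow: image features are mapped into the decoder's embedding space and then combined with the instruction tokens, so that the decoder sees a single sequence. How the real model orders, delimits, or compresses visual tokens is not stated in the summary above.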