This paper introduces Morph-Tokens to resolve the conflict between visual comprehension and generation in multimodal large language models (MLLMs): comprehension requires abstracting visual features, while generation requires preserving visual details. Morph-Tokens serve a dual purpose, acting as abstract visual prompts for comprehension and as complete visual tokens for image reconstruction. A three-stage training strategy realizes this. In Stage 1, the model is trained on image-text pairs to expand the token vocabulary. In Stage 2, the model is trained to auto-encode morph-tokens, enabling both comprehension and generation. In Stage 3, the model is instruction-tuned to strengthen its capabilities in complex scenarios.
The resulting model achieves state-of-the-art performance on both comprehension and generation benchmarks, surpassing existing MLLMs, with notable gains in multi-turn image editing and in-context learning, and it strongly preserves image fidelity during generation. The project is available at https://github.com/DCDmllm/MorphTokens.