This paper introduces DITTO, a self-alignment method for role-play that enables large language models (LLMs) to simulate role-play dialogues by leveraging character knowledge. The method builds a role-play training set covering 4,000 characters, roughly ten times more than existing datasets, and the LLM is then fine-tuned on this dataset to enhance its role-play capabilities. Evaluation on a carefully constructed role-play benchmark and on the role-play subset of MT-Bench shows that DITTO consistently maintains role identity and provides accurate role-specific knowledge in multi-turn conversations; it outperforms all open-source role-play baselines and performs comparably to advanced proprietary chatbots. The paper also presents the first comprehensive cross-supervision alignment experiment in the role-play domain, revealing that the knowledge expressed in role-play is confined by the intrinsic capabilities of the LLM, whereas role-play styles can be easily acquired under the guidance of smaller models. The method is highly scalable and flexible, and the authors open-source the related resources at https://github.com/OFA-Sys/Ditto.

The paper also proposes an objective role-play evaluation focusing on consistent role identity, accurate role-related knowledge, and cognitive boundaries. Compared with manual annotation, this evaluation is reproducible, explainable, and efficient.

In addition, the paper dissects role-play through cross-supervision experiments, offering rich insights into what underpins role-play capabilities. The experiments show that knowledge is bounded by the inherent capabilities of the LLM in strong-to-weak settings, while weak-to-strong generalization is observed on knowledge-related metrics. The paper concludes that commendable role-play performance requires a strong foundation model, and that SFT data is not the central bottleneck.

The authors also discuss limitations: the best DITTO model, based on Qwen-72B-Chat, is still outperformed by advanced chatbots such as GPT-4 and GPT-4-Turbo. Moreover, the training data contains noticeable noise, and manually cleaning the self-generated dialogue simulations is expected to further boost DITTO's performance. On the ethics side, role-play LLMs aligned by DITTO may have only minimal safety alignment and could generate toxic or harmful content when induced to do so; these models are therefore intended for research purposes only and should be carefully safety-aligned in the future.
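To make the self-simulation recipe described above more concrete, the following minimal Python sketch shows how a knowledge-grounded dialogue simulation and SFT-set construction could be wired up. The `generate` callable, the `Character` fields, and the prompt wording are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a DITTO-style self-simulation pipeline.
# `generate` stands in for any chat LLM call (prompt in, text out).
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Character:
    name: str
    knowledge: str  # e.g., a profile drawn from an encyclopedia entry


def simulate_roleplay_dialogue(
    character: Character,
    generate: Callable[[str], str],
    num_turns: int = 3,
) -> List[Dict[str, str]]:
    """Self-simulate a multi-turn role-play dialogue grounded in character knowledge."""
    dialogue: List[Dict[str, str]] = []
    for _ in range(num_turns):
        # 1) Query simulation: have the LLM play a curious user.
        query = generate(
            f"Given this profile of {character.name}:\n{character.knowledge}\n"
            "Write one question a user might ask this character."
        )
        # 2) Response simulation: answer in character, grounded in the profile.
        response = generate(
            f"You are {character.name}. Stay in character and rely only on this "
            f"profile:\n{character.knowledge}\nUser: {query}\nAnswer:"
        )
        dialogue.append({"user": query, "assistant": response})
    return dialogue


def build_sft_dataset(
    characters: List[Character], generate: Callable[[str], str]
) -> List[Dict[str, object]]:
    """Collect the simulated dialogues into a supervised fine-tuning set."""
    return [
        {"character": c.name, "dialogue": simulate_roleplay_dialogue(c, generate)}
        for c in characters
    ]
```

The same model that is later fine-tuned can serve as the `generate` backend, which is what makes the procedure a form of self-alignment; scaling to thousands of characters only requires enlarging the `characters` list.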
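The three evaluation dimensions (role identity, role-related knowledge, and cognitive boundary) can likewise be pictured as LLM-judged checks. The sketch below uses a generic `judge` callable and simplified yes/no scoring; the benchmark's actual prompts, candidate construction, and aggregation may differ, so treat this as an assumption-laden illustration only.

```python
# Hypothetical sketch of the three role-play evaluation dimensions.
# `judge` stands in for any LLM used as an automatic evaluator.
from typing import Callable, List


def judge_identity(dialogue: str, candidates: List[str], target: str,
                   judge: Callable[[str], str]) -> float:
    """Ask the judge which candidate character the responses portray."""
    answer = judge(
        "Which of these characters do the assistant responses below portray? "
        f"Options: {', '.join(candidates)}\n\nDialogue:\n{dialogue}\nAnswer with one name:"
    )
    return 1.0 if target.lower() in answer.lower() else 0.0


def judge_knowledge(dialogue: str, reference: str,
                    judge: Callable[[str], str]) -> float:
    """Check whether role-specific claims stay consistent with a reference profile."""
    answer = judge(
        "Do the assistant responses below stay factually consistent with this "
        f"reference profile?\nProfile:\n{reference}\n\nDialogue:\n{dialogue}\n"
        "Answer yes or no:"
    )
    return 1.0 if answer.strip().lower().startswith("yes") else 0.0


def judge_boundary(response: str, judge: Callable[[str], str]) -> float:
    """Reward declining a question the character could not plausibly answer."""
    answer = judge(
        "The question asked about something outside the character's era or "
        f"expertise. Does this response decline or express unawareness?\n{response}\n"
        "Answer yes or no:"
    )
    return 1.0 if answer.strip().lower().startswith("yes") else 0.0
```

Because the scores come from fixed prompts and deterministic parsing rather than human raters, this style of evaluation is cheap to rerun and its individual judgments can be inspected, which is the sense in which the paper calls its evaluation reproducible, explainable, and efficient.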