SOTOPIA-π is an interactive learning method designed to improve the social intelligence of language agents. It combines behavior cloning and self-reinforcement training on social interaction data filtered by large language model (LLM) ratings. Each training round generates new social tasks, collects interaction data from both an expert policy (a GPT-4-based agent) and the agent's own policy, and updates the agent policy on the interactions that GPT-4 rates highly. Because it trains on offline data, the method is efficient and scalable, enabling language agents to explore and reinforce social knowledge without online interaction.

The results show that SOTOPIA-π improves the social goal completion ability of a 7B LLM, with the best model matching the performance of the GPT-4-based expert. It also improves other social dimensions such as believability, relationship, and adherence to social rules. Training additionally enhances safety and reduces the toxicity of responses in social tasks, while preserving general question-answering ability as measured on the MMLU benchmark.

The study further reveals that LLM-based evaluators overestimate the abilities of language agents trained specifically for social interaction, highlighting the need for alternative evaluator models that can robustly assess social interaction. It also addresses the limitations of using LLMs as evaluators and the potential for social biases in the interactive system.
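The rating-based filtering and policy-update loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Episode` record, the `goal_completion` rating field, the threshold, and the `fine_tune` hook are all hypothetical placeholders standing in for the actual data format and training stack.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Episode:
    """One multi-turn social interaction (hypothetical record format)."""
    turns: List[str]
    goal_completion: float  # LLM-assigned goal-completion rating (assumed 0-10 scale)

def filter_for_training(episodes: List[Episode], threshold: float) -> List[Episode]:
    """Keep only interactions whose LLM rating clears the threshold,
    mirroring the rating-based filtering step described above."""
    return [ep for ep in episodes if ep.goal_completion >= threshold]

def training_round(
    expert_episodes: List[Episode],   # behavior cloning: data from the expert policy
    agent_episodes: List[Episode],    # self-reinforcement: the agent's own rollouts
    threshold: float,
    fine_tune: Callable[[List[Episode]], None],
) -> None:
    """One offline update: fine-tune on the filtered union of both data sources."""
    data = (filter_for_training(expert_episodes, threshold)
            + filter_for_training(agent_episodes, threshold))
    fine_tune(data)
```

Filtering by an LLM rating before fine-tuning is what makes the self-reinforcement side workable offline: the agent's own rollouts vary in quality, and only the highly rated ones feed the next update.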
The research emphasizes the importance of ethical considerations in AI development, including the need to avoid biases and ensure the safe and ethical use of AI. The study concludes that SOTOPIA-π is a promising approach for improving the social intelligence of language agents, with potential for future research in areas such as online reinforcement learning, learning from humans, and deriving safety metrics for all social tasks.