Assessing and Understanding Creativity in Large Language Models

23 Jan 2024 | Yunpu Zhao, Rui Zhang, Wenyi Li, Di Huang, Jiaming Guo, Shaohui Peng, Yifan Hao, Yuanbo Wen, Xing Hu, Zidong Du, Qi Guo, Ling Li and Yunji Chen
This paper addresses the assessment of creativity in large language models (LLMs), a critical yet underexplored area in natural language processing. The authors propose an efficient framework for evaluating LLM creativity by adapting a modified version of the Torrance Tests of Creative Thinking (TTCT). They construct a dataset of 700 questions spanning seven tasks and score responses on four criteria: Fluency, Flexibility, Originality, and Elaboration. The study uses GPT-4 as an automatic judge to evaluate the responses of six advanced LLMs, including GPT-3.5, LLaMA-2, Vicuna, and Qwen (a minimal sketch of this judging setup appears after the summary).

Key findings include:

1. **Model Differences**: Creativity varies significantly across LLMs. GPT-3.5 performs best, followed by LLaMA-2, Vicuna, and Qwen; model architecture and training process are key determinants of creativity.
2. **Task-Specific Creativity**: LLMs generally excel in Elaboration but struggle with Originality. Tasks differ in creative difficulty, with the Common Problem, Consequences, and Unusual Uses tasks eliciting higher creativity.
3. **Prompt Type Impact**: Instructive and Chain-of-Thought (CoT) prompts enhance creativity on most tasks, while post-instruction prompts have mixed effects (see the prompt-construction sketch below).
4. **Role-Playing Effects**: Assigning specific roles, such as a scientist, significantly improves creativity, while roles such as farmer or merchant yield lower creativity.
5. **Collaboration Benefits**: Having multiple LLMs collaborate can enhance creativity, particularly Originality.
6. **Personality Traits**: LLM creativity correlates with human personality traits such as emotional intelligence, empathy, and self-efficacy, suggesting a link between cognitive and affective factors.

The study highlights the importance of prompt engineering and role-playing in enhancing LLM creativity and underscores the need for further research to bridge the gap between AI and human creativity.
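Below is a minimal sketch of the GPT-4-as-judge setup described above, assuming the official OpenAI Python SDK. The rubric wording, the 1-5 scale, and the helper names are illustrative assumptions; the paper's actual judging prompts are not reproduced here.

```python
# Minimal sketch of GPT-4-as-judge scoring on the four TTCT criteria.
# Assumptions: the OpenAI Python SDK (openai>=1.0) and an OPENAI_API_KEY
# in the environment. The rubric text and 1-5 scale are illustrative,
# not the paper's actual prompts.
from openai import OpenAI

client = OpenAI()

CRITERIA = ["Fluency", "Flexibility", "Originality", "Elaboration"]

JUDGE_TEMPLATE = (
    "You are grading an answer to a Torrance-style creativity task.\n"
    "Task: {task}\nQuestion: {question}\nAnswer: {answer}\n\n"
    "Rate the answer on {criterion} with an integer from 1 (poor) to 5 "
    "(excellent). Reply with the number only."
)

def judge(task: str, question: str, answer: str) -> dict:
    """Score one model answer on each TTCT criterion using GPT-4."""
    scores = {}
    for criterion in CRITERIA:
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": JUDGE_TEMPLATE.format(
                    task=task, question=question,
                    answer=answer, criterion=criterion,
                ),
            }],
            temperature=0,  # deterministic grading
        )
        # The judge is asked to reply with a bare integer, so a plain
        # int() suffices for this sketch; real code should parse defensively.
        scores[criterion] = int(resp.choices[0].message.content.strip())
    return scores
```

Averaging `judge(...)` scores over the 700 questions, grouped by task and by model, would reproduce the shape of the comparisons the paper reports.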
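For the prompt-type and role-playing findings, a hypothetical sketch of the variants the study compares (instructive, Chain-of-Thought, post-instruction, and role-playing) follows. The instruction strings are illustrative assumptions, not the paper's prompts.

```python
# Hypothetical sketch of the prompt variants compared in the study.
# The instruction strings are illustrative, not the paper's prompts.
def build_prompt(question: str, style: str = "basic", role: str = "") -> str:
    prefix = f"You are a {role}. " if role else ""  # role-playing, e.g. "scientist"
    if style == "instructive":
        # Explicit creativity instruction placed before the question.
        return f"{prefix}Be as creative and original as possible. {question}"
    if style == "cot":
        # Chain-of-Thought: elicit step-by-step reasoning.
        return f"{prefix}{question} Let's think step by step."
    if style == "post":
        # Post-instruction: the creativity instruction follows the question.
        return f"{prefix}{question} Be as creative and original as possible."
    return prefix + question

# Example: a role-playing CoT prompt for an Unusual Uses task.
print(build_prompt("List unusual uses for a brick.", style="cot", role="scientist"))
```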