The Better Angels of Machine Personality: How Personality Relates to LLM Safety

17 Jul 2024 | Jie Zhang, Dongrui Liu, Chen Qian, Ziyue Gan, Yong Liu, Yu Qiao, Jing Shao
This paper explores the relationship between personality traits and safety capabilities in Large Language Models (LLMs). It demonstrates that LLMs exhibit human-like personality traits and that these traits are closely related to safety performance across toxicity, privacy, and fairness.

The study uses the MBTI-M scale to assess LLMs' personality traits and finds that safety alignment generally increases Extraversion, Sensing, and Judging, while models with stronger Extraversion, iNtuition, and Feeling traits are more susceptible to jailbreak. Controllably editing an LLM's personality traits can improve its safety performance; for example, shifting a model from ISTJ to ISTP increases privacy and fairness by 43% and 10%, respectively. Conversely, changes in safety capabilities can in turn affect personality traits.

The work pioneers the investigation of LLM safety from a personality perspective, suggesting that personality offers a useful supplement to comprehensive LLM safety and underscoring the importance of aligning LLMs with human values to mitigate potential societal risks. All experiments were conducted in a secure, controlled environment, and the authors emphasize that the personality traits assessed and edited in this research carry no inherent value judgments; the goal is solely to enhance LLM safety. A comprehensive analysis of the personality-safety relationship supports these findings and provides a framework for further exploration.
The authors also discuss the study's limitations, including its focus on 7B-parameter models and the lack of access to model weights for closed-source models, and they address the broader impact and ethical implications of the research, emphasizing the importance of ethical AI development.
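To make the assessment step concrete, the sketch below shows one way a four-letter MBTI type could be derived from questionnaire answers, where each answer votes for one pole of a dimension (E/I, S/N, T/F, J/P). This is an illustrative assumption, not the paper's exact scoring protocol; the vote counts and tie-breaking rule are hypothetical placeholders.

```python
# Illustrative sketch (not the paper's exact protocol): derive a four-letter
# MBTI type from per-question answers, where each answer votes for one pole
# of a personality dimension. The mapping of questions to dimensions and the
# scoring rule here are hypothetical.

DIMENSIONS = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]

def mbti_type(votes):
    """votes: dict mapping each pole letter to its answer count."""
    letters = []
    for first, second in DIMENSIONS:
        # Ties default to the second pole here; a real scale defines its own rule.
        letters.append(first if votes.get(first, 0) > votes.get(second, 0) else second)
    return "".join(letters)

# Example: a model whose answers lean Introverted, Sensing, Thinking, Judging.
votes = {"E": 3, "I": 7, "S": 6, "N": 4, "T": 8, "F": 2, "J": 9, "P": 1}
print(mbti_type(votes))  # ISTJ
```

In the paper's setup, the "answers" would come from prompting the LLM with the MBTI-M items and parsing its choices before tallying them per dimension.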