FEBRUARY 6, 2024 | SAYASH KAPOOR, PETER HENDERSON, ARVIND NARAYANAN
Artificial intelligence (AI) is not yet ready to redefine the legal profession, according to a recent paper. The authors examine three types of legal tasks where AI is increasingly used: information processing; tasks involving creativity, reasoning, or judgment; and predictions about the future. They argue that the ease of evaluating AI applications varies greatly depending on the clarity of correct answers and the availability of relevant information. The tasks that are hardest to evaluate, because they are more complex and less clear-cut, are also the ones most prone to overoptimism about AI capabilities.
The paper highlights the challenges of evaluating AI in legal contexts, particularly for tasks involving creativity, reasoning, or judgment, where there is no single correct answer. It also discusses the limitations of current AI evaluations, such as contamination (evaluation questions appearing in the training data), lack of construct validity (benchmarks that do not measure the real-world task of interest), and prompt sensitivity (results that shift with small changes in prompt wording). These issues make it difficult to trust the reported performance of AI on real-world legal tasks.
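To make prompt sensitivity concrete, here is a minimal sketch, not taken from the paper, of how one might probe it: the same questions are re-asked under several paraphrased prompt templates, and the spread in accuracy across templates is reported alongside the mean. The templates and the model_answer() call are hypothetical placeholders.

```python
# Illustrative sketch (not from the paper): measure how much benchmark
# accuracy shifts when the same questions are asked with paraphrased prompts.
from statistics import mean, pstdev

# Hypothetical paraphrases of the same instruction.
PROMPT_TEMPLATES = [
    "Answer the following bar-exam question: {q}",
    "You are a lawyer. {q} Respond with the best answer choice.",
    "{q}\n\nWhich option is correct?",
]

def model_answer(prompt: str) -> str:
    """Placeholder for a call to the AI system under evaluation."""
    raise NotImplementedError

def accuracy_per_template(questions, gold_answers):
    """Score the model once per prompt template."""
    scores = []
    for template in PROMPT_TEMPLATES:
        correct = sum(
            model_answer(template.format(q=q)).strip() == gold
            for q, gold in zip(questions, gold_answers)
        )
        scores.append(correct / len(questions))
    return scores

def prompt_sensitivity(scores):
    # A large spread across templates suggests the headline number
    # depends on prompt wording rather than underlying capability.
    return {"mean_accuracy": mean(scores), "std_across_prompts": pstdev(scores)}
```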
The authors recommend that evaluations of AI in legal contexts should be based on real-world use and involve legal experts to improve construct validity. They also emphasize the need for naturalistic evaluations and better communication of the limitations of current AI systems. Additionally, they suggest that AI should be used in narrow settings with well-defined outcomes and high observability of evidence, such as checking errors in legal documents.
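As an illustration of the kind of narrow, well-defined setting the authors favor, the sketch below (our assumption, not code from the paper) scores a hypothetical error-flagging system against expert-labeled ground truth; flag_errors() and the document fields are placeholders.

```python
# Illustrative sketch (not from the paper): evaluate a narrow task with a
# clear, observable outcome -- flagging errors in legal documents -- against
# labels provided by legal experts.
from dataclasses import dataclass

@dataclass
class Document:
    text: str
    expert_flagged_errors: set[str]  # ground truth from legal experts

def flag_errors(text: str) -> set[str]:
    """Placeholder for the AI system's error-flagging output."""
    raise NotImplementedError

def evaluate(docs: list[Document]) -> dict[str, float]:
    tp = fp = fn = 0
    for doc in docs:
        predicted = flag_errors(doc.text)
        tp += len(predicted & doc.expert_flagged_errors)   # correctly flagged
        fp += len(predicted - doc.expert_flagged_errors)   # spurious flags
        fn += len(doc.expert_flagged_errors - predicted)   # missed errors
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall}
```

Because every prediction can be checked against an expert label, this kind of evaluation avoids the ambiguity that plagues open-ended legal tasks.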
The paper also discusses the use of AI for predicting court outcomes and making decisions about people, such as pre-trial detention and parole. It highlights the challenges of evaluating such applications, including the risk of contamination and the difficulty of ensuring fairness and accuracy. The authors argue that predictive AI in the legal domain needs to be held to a much higher standard to ensure it functions as its developers claim.
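One way to probe the fairness concern is to compare error rates across groups. The sketch below, again illustrative rather than drawn from the paper, computes false positive rates by group for a hypothetical pre-trial risk predictor; predict_risk() and the record format are assumptions.

```python
# Illustrative sketch (not from the paper): compare false positive rates
# across groups for a hypothetical pre-trial risk predictor.
from collections import defaultdict

def predict_risk(features: dict) -> bool:
    """Placeholder: True if the model predicts the person is high risk."""
    raise NotImplementedError

def false_positive_rates(records):
    """records: iterable of (features, group, reoffended) tuples."""
    fp = defaultdict(int)
    negatives = defaultdict(int)
    for features, group, reoffended in records:
        if not reoffended:                 # person did not reoffend
            negatives[group] += 1
            if predict_risk(features):     # ...but was flagged as high risk
                fp[group] += 1
    return {g: fp[g] / n for g, n in negatives.items() if n}
```

Large gaps in false positive rates between groups would mean the predictor's mistakes fall disproportionately on one group, one of the concerns that motivates holding predictive AI in legal settings to a higher standard.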
Overall, the paper calls for a shift from technical evaluations to robust socio-technical assessments in legal contexts, emphasizing the need for transparency, contestability, and evaluations that consider the societal impact of AI applications.