FEBRUARY 6, 2024 | SAYASH KAPOOR, PETER HENDERSON, ARVIND NARAYANAN
The paper "Promises and Pitfalls of Artificial Intelligence for Legal Applications" by Sayash Kapoor, Peter Henderson, and Arvind Narayanan of Princeton University examines the potential and limitations of AI in legal contexts. The authors argue that while AI is increasingly used for legal tasks, its impact on the legal profession is smaller than often claimed. They analyze three types of legal tasks: information processing; tasks involving creativity, reasoning, or judgment; and predictions about the future. These tasks vary in how difficult they are to evaluate, and the tasks that are hardest to evaluate are precisely those that could change the legal profession most.
The paper highlights several challenges in evaluating AI in legal settings, including overoptimism about AI capabilities, contamination in training and evaluation datasets, and the lack of construct validity in benchmarks. It provides recommendations for better evaluation and deployment of AI in legal contexts, such as involving legal experts in evaluation, developing naturalistic evaluation methods, and communicating the limitations of current LLMs.
For information processing tasks, the authors note that while generative AI can improve accuracy and reduce costs, it does not drastically change the nature of these tasks. For tasks involving creativity, reasoning, or judgment, the impact on the legal profession could be substantial, but current evaluations often suffer from issues like contamination and lack of construct validity. For predictions about the future, the authors emphasize the low accuracy and potential biases in AI-based predictions, particularly in criminal justice settings.
The paper concludes that effective deployment of AI in legal contexts requires robust socio-technical assessments tailored to the specific context in which the AI system will be used. The authors stress the need for stronger transparency, clear mechanisms for contestability, and evaluations that go beyond technical specifications to consider societal impacts.