SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines

31 Mar 2024 | Shreya Shankar, Haotian Li, Parth Asawa, Madelon Hulsebos, Yiming Lin, J.D. Zamfirescu-Pereira, Harrison Chase, Will Fu-Hinthorn, Aditya G. Parameswaran, Eugene Wu
**Abstract:** Large language models (LLMs) are increasingly used in pipelines that process or generate data, but they often make unpredictable errors. To address this, the authors propose *data quality assertions* that identify when LLMs may be making mistakes. They present SPADE, a method for automatically synthesizing these assertions. SPADE analyzes histories of prompt versions over time to create candidate assertion functions, then selects a minimal set that meets coverage and accuracy requirements. Testing across nine real-world LLM pipelines shows that SPADE reduces the number of assertions by 14% and decreases false failure rates by 21% compared to simpler baselines. SPADE has been deployed within LangSmith, LangChain's LLM pipeline hub, and has been used to generate data quality assertions for over 2,000 pipelines across various industries.

**Introduction:** The paper discusses the challenges of deploying LLM pipelines due to data quality errors. It introduces the concept of data quality assertions and presents SPADE, a system for automatically generating them. SPADE leverages prompt version histories to identify assertion criteria and uses an automated approach to filter out redundant and incorrect assertions. The authors analyze prompt deltas from 19 LLM pipelines to construct a taxonomy of assertion criteria, and they demonstrate SPADE's effectiveness on nine real-world pipelines, showing significant improvements in assertion coverage and accuracy.
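To make the idea concrete, below is a minimal Python sketch of what data quality assertions and minimal-set selection might look like. Everything here is an illustrative assumption rather than SPADE's actual interface: the function names (`check_no_apology`, `check_valid_json`, `select_minimal_assertions`), the `(prompt, response, is_bad)` example format, and the greedy set-cover heuristic are all hypothetical. The paper treats selection as an optimization subject to coverage and accuracy requirements; the greedy loop here only conveys the intuition.

```python
"""Minimal sketch of data quality assertions and assertion-set selection.

NOT SPADE's actual API: all names and the greedy heuristic below are
illustrative assumptions based on the paper's high-level description.
"""

import json
from typing import Callable

# An assertion inspects one (prompt, response) pair and returns True
# if the LLM output passes the check.
Assertion = Callable[[str, str], bool]


def check_no_apology(prompt: str, response: str) -> bool:
    """Fails when the model apologizes or refuses instead of answering."""
    return "i'm sorry" not in response.lower()


def check_valid_json(prompt: str, response: str) -> bool:
    """Fails when the output is not parseable JSON."""
    try:
        json.loads(response)
        return True
    except ValueError:
        return False


def select_minimal_assertions(
    candidates: list[Assertion],
    examples: list[tuple[str, str, bool]],  # (prompt, response, is_bad_output)
    max_false_failure: float = 0.2,
) -> list[Assertion]:
    """Pick a small assertion set that covers known-bad outputs while
    keeping each assertion's false failure rate on good outputs low."""
    good = [(p, r) for p, r, is_bad in examples if not is_bad]
    bad = {i for i, (_, _, is_bad) in enumerate(examples) if is_bad}

    # Accuracy filter: discard candidates that flag too many good outputs.
    n_good = max(len(good), 1)
    accurate = [
        a for a in candidates
        if sum(not a(p, r) for p, r in good) / n_good <= max_false_failure
    ]

    # Which bad examples does each surviving candidate catch?
    catches = {
        a: {i for i in bad if not a(examples[i][0], examples[i][1])}
        for a in accurate
    }

    # Greedy set cover: repeatedly take the assertion that catches the
    # most still-uncovered bad outputs; redundant assertions add no gain
    # and are therefore never selected.
    selected: list[Assertion] = []
    covered: set[int] = set()
    while accurate and covered < bad:
        best = max(accurate, key=lambda a: len(catches[a] - covered))
        gain = catches[best] - covered
        if not gain:
            break  # no remaining candidate catches the uncovered bad outputs
        selected.append(best)
        covered |= gain
    return selected
```

In SPADE itself, candidate assertions are synthesized from prompt version deltas (e.g., an added instruction like "respond in JSON" suggests a JSON-validity check) rather than hand-written, and the selection step is solved as a constrained optimization rather than greedily, as summarized above.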