SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines


31 Mar 2024 | Shreya Shankar, Haotian Li, Parth Asawa, Madelon Hulsebos, Yiming Lin, J.D. Zamfirescu-Pereira, Harrison Chase, Will Fu-Hinthorn, Aditya G. Parameswaran, Eugene Wu
SPADE is a system for automatically synthesizing data quality assertions for large language model (LLM) pipelines. Its goal is to catch cases where the LLM makes mistakes while keeping the assertion set small and accurate. SPADE analyzes prompt version histories to create candidate assertion functions and then selects a minimal subset that satisfies coverage and accuracy requirements. It has been deployed in LangSmith, LangChain's LLM pipeline hub, where it has generated data quality assertions for over 2,000 pipelines across various industries.

SPADE builds on the observation that developers often discover data quality issues during prototyping and try to address them by adding instructions to the LLM prompt. It therefore analyzes prompt deltas, the differences between consecutive prompt versions, to generate candidate assertions. Deltas are categorized as structural or content-based, with content-based deltas indicating changes in the meaning or definition of the task.

Candidate generation is a two-step process: SPADE first prompts an LLM to produce natural-language descriptions of the assertion criteria implied by each delta, and then prompts the LLM to generate Python functions that implement those criteria. The resulting candidates are filtered to remove inaccurate or redundant functions, and a minimal set that meets coverage and accuracy requirements is selected.
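To make the two-step generation concrete, here is a minimal sketch of how a delta could be turned first into criteria and then into assertion code. The prompts, helper names, and model choice are assumptions for illustration, not SPADE's actual implementation.

```python
# Illustrative sketch of two-step assertion generation from a prompt delta.
# Prompt wording, helper names, and the model are assumptions, not SPADE's code.
from openai import OpenAI

client = OpenAI()

def generate_criteria(prompt_delta: str) -> list[str]:
    """Step 1: ask an LLM to turn a prompt delta into natural-language criteria."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                "The following text was added to an LLM pipeline's prompt:\n"
                f"{prompt_delta}\n"
                "List the concrete, checkable criteria this addition implies "
                "for the pipeline's responses, one per line."
            ),
        }],
    )
    lines = resp.choices[0].message.content.splitlines()
    return [line.strip("- ").strip() for line in lines if line.strip()]

def generate_assertion_code(criterion: str) -> str:
    """Step 2: ask the LLM to implement one criterion as a Python check."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                "Write a Python function `check(prompt: str, response: str) -> bool` "
                f"that returns True only if the response satisfies: {criterion}"
            ),
        }],
    )
    return resp.choices[0].message.content
```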
In experiments on nine real-world LLM pipelines, SPADE reduced the number of assertions by 14% and decreased false failures by 21% compared to simpler baselines. The system is designed to be efficient and scalable, handling large numbers of candidate assertions and examples, and it runs in production to improve the quality of LLM pipelines across a range of applications.
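The final selection step described above can be pictured as a small covering problem: choose the fewest assertions that still flag every known bad output without failing too many good ones. The greedy sketch below shows one way such a selection could work; the data structures, threshold, and greedy strategy are assumptions for illustration and may differ from SPADE's actual optimization.

```python
# Greedy sketch of picking a small assertion set that covers all known bad
# examples while keeping false failures low. Illustrative only.

def select_assertions(results: dict[str, dict[int, bool]],
                      bad_examples: set[int],
                      good_examples: set[int],
                      max_false_failure_rate: float = 0.1) -> list[str]:
    """results[name][example_id] is True when assertion `name` fails that example."""

    def false_failure_rate(name: str) -> float:
        # Fraction of known-good examples this assertion incorrectly flags.
        flagged = sum(results[name][e] for e in good_examples)
        return flagged / max(len(good_examples), 1)

    # Keep only assertions that are accurate enough on known-good outputs.
    candidates = {n for n in results if false_failure_rate(n) <= max_false_failure_rate}

    selected: list[str] = []
    uncovered = set(bad_examples)
    while uncovered and candidates:
        # Pick the assertion that flags the most still-uncovered bad examples.
        best = max(candidates, key=lambda n: sum(results[n][e] for e in uncovered))
        if sum(results[best][e] for e in uncovered) == 0:
            break
        selected.append(best)
        uncovered -= {e for e in uncovered if results[best][e]}
        candidates.remove(best)
    return selected
```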