May 1997 | Diane M. Strong, Yang W. Lee, and Richard Y. Wang
A new study highlights that businesses are defining data quality with the consumer in mind. Data quality (DQ) problems are increasingly evident, particularly in organizational databases. For example, 50% to 80% of computerized criminal records in the U.S. were found to be inaccurate, incomplete, or ambiguous. Poor-quality data has significant social and economic impacts, costing billions of dollars. Organizational databases are part of a larger information systems (IS) context, where data is collected from multiple sources and stored in databases. From this stored data, useful information is generated for organizational decision-making.
DQ problems can arise anywhere in this IS context. Therefore, the study argues for a conceptualization of DQ that includes this context. Database research aims to ensure the quality of data in databases, and existing research investigates DQ definitions, modeling, and control. However, DQ is often treated as an intrinsic concept, independent of the context in which data is produced and used. This focus on intrinsic DQ problems in stored data fails to solve complex organizational problems. The study distinguishes three roles: data producers (people or groups who generate data); data custodians (people who provide and manage resources for storing and processing data); and data consumers (people or groups who use data). Each role is associated with a process or task: data producers are associated with data-production processes; data custodians with data storage, maintenance, and security; and data consumers with data-utilization processes, which may involve additional data aggregation and integration.
The study defines high-quality data as data that is fit for use by data consumers. It argues that the failure of the intrinsic view to solve organizational DQ problems is partly due to the lack of a broader DQ conceptualization. When quality problems are defined only as errors in stored data, IS professionals may not recognize, and therefore may not solve, the most critical DQ problems in organizations.
The study examined DQ projects from three leading-edge organizations and identified common patterns of quality problems. These patterns emerged because the study used a broader conceptualization of DQ. Based on these patterns, the study developed recommendations for IS professionals to improve DQ from the perspective of data consumers.
The study defines a DQ problem as any difficulty encountered along one or more quality dimensions that renders data completely or largely unfit for use. It defines a DQ project as the organizational actions taken to address a DQ problem once the organization has recognized poor DQ. It examined 42 DQ projects from three data-rich organizations: GoldenAir, BetterCare, and HyCare. These organizations are leaders in their industries and exhibit sufficient variation for investigating DQ projects.
The study employed qualitative data collection and analysis techniques. It collected data about these projects through interviews with data producers, custodians, consumers, and managers. It organized each DQ project in terms of three problem-solving steps: problem finding, problem analysis, and problem resolution. Each project was analyzed using the DQ dimensions as content analysis codes. From the coded projects, the study identified common patterns and sequences of dimensions attended to during DQ projects.
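The coding procedure described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' actual analysis: the project names, the dimension labels, and the helper names (CodedProject, dimension_sequence, common_patterns) are all hypothetical, and the sample records are invented placeholders rather than data from the study. Each project is coded with DQ dimensions per problem-solving step, and projects that attend to the same sequence of dimensions are grouped into a pattern.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

# The three problem-solving steps named in the text, in order.
STEPS = ("problem finding", "problem analysis", "problem resolution")


@dataclass(frozen=True)
class CodedProject:
    """A DQ project with dimension codes assigned per step (hypothetical structure)."""
    name: str
    # One tuple of dimension codes for each entry in STEPS, in the same order.
    codes: Tuple[Tuple[str, ...], ...]


def dimension_sequence(project: CodedProject) -> Tuple[str, ...]:
    """Return the distinct dimensions in the order they are first attended to."""
    seen: List[str] = []
    for step_codes in project.codes:
        for dim in step_codes:
            if dim not in seen:
                seen.append(dim)
    return tuple(seen)


def common_patterns(projects, min_count=2):
    """Count identical dimension sequences and keep those shared by several projects."""
    counts = Counter(dimension_sequence(p) for p in projects)
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]


# Invented placeholder records, NOT taken from the study.
projects = [
    CodedProject("P1", (("accessibility",), ("accessibility", "timeliness"), ("timeliness",))),
    CodedProject("P2", (("accessibility",), ("timeliness",), ("timeliness",))),
    CodedProject("P3", (("accuracy",), ("accuracy",), ("believability",))),
]

patterns = common_patterns(projects)
```

With these placeholder records, the two projects that move from accessibility to timeliness form one shared pattern, while the third project stands alone. A real content analysis would rely on human coders and richer interview material; the sketch only shows the tallying step.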
The study found that data consumers' complaints about