March 19, 2024 | Jianfeng Zhan, Lei Wang, Wanling Gao, Hongxiao Li, Chenxi Wang, Yunyou Huang, Yatao Li, Zhengxin Yang, Guoxin Kang, Chunjie Luo, Hainan Ye, Shaopeng Dai, Zhifei Zhang
Evaluatology: The Science and Engineering of Evaluation
Evaluatology is a discipline that encompasses the science and engineering of evaluation. It aims to provide a universal framework for evaluation, including concepts, terminologies, theories, and methodologies that can be applied across various disciplines. The essence of evaluation lies in conducting experiments that intentionally apply a well-defined evaluation condition (EC) to diverse subjects and infer the impact of different subjects by measuring and/or testing. Derived from the essence of evaluation, five axioms are proposed as the foundational principles of evaluation theory. These axioms focus on key aspects of evaluation outcomes, including true quantity, traceability of discrepancy, comparability, and realistic estimate.
The article formally defines a benchmark as a simplified and sampled evaluation condition that guarantees different levels of equivalency. This definition is the cornerstone of benchmarkology, a universal benchmark-based engineering approach to evaluation across various disciplines. The article sets out the principles and methodologies of benchmarkology, including the establishment of equivalent evaluation conditions (EECs), the least equivalent evaluation condition (LEEC), and the reference evaluation model (REM). It also highlights the importance of establishing a series of evaluation models that maintain transitivity in complex scenarios.
Evaluating complex scenarios poses several challenges: numerous confounding variables, the difficulty of establishing an REM, high evaluation costs arising from huge state spaces, multiple irrelevant problems or tasks running concurrently, and a tendency to be biased toward specific clusters of the EC state space. To address them, the article proposes a pragmatic evaluation model (EM) that simplifies the perfect EM in two ways: removing independent variables that have negligible effect, and sampling from the extensive state space. A pragmatic EM provides a realistic estimate of the parameters of the real-world evaluation system (ES).
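The two simplifications behind a pragmatic EM can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not taken from the article: the three EC variables, the `outcome` model, and the function names `perfect_em` and `pragmatic_em` are all hypothetical. The sketch only shows the idea of dropping a negligible variable and sampling the remaining state space while applying the same sampled conditions to every subject.

```python
import itertools
import random
import statistics

# Hypothetical EC state space: three independent variables with a few levels each.
EC_SPACE = {
    "workload_size": [1, 2, 4, 8],
    "concurrency": [1, 2, 4],
    "log_level": [0, 1, 2],  # assumed (for this toy) to have negligible effect
}

def outcome(subject_speed, workload_size, concurrency, log_level):
    """Toy outcome model: time a subject needs under one EC configuration."""
    return workload_size / (subject_speed * concurrency) + 0.001 * log_level

def perfect_em(subject_speed):
    """Average outcome over the full state space (the 'perfect' EM of this toy)."""
    configs = itertools.product(*EC_SPACE.values())
    return statistics.mean(outcome(subject_speed, *c) for c in configs)

def pragmatic_em(subject_speed, n_samples, seed=0):
    """Fix the negligible variable at 0 and sample the reduced state space.

    The fixed seed means every subject sees the same sampled configurations,
    so outcomes for different subjects remain comparable.
    """
    rng = random.Random(seed)
    reduced = list(itertools.product(EC_SPACE["workload_size"],
                                     EC_SPACE["concurrency"]))
    sample = rng.sample(reduced, n_samples)
    return statistics.mean(outcome(subject_speed, w, c, 0) for w, c in sample)

if __name__ == "__main__":
    for speed in (2.0, 4.0):
        print(speed, perfect_em(speed), pragmatic_em(speed, n_samples=6))
```

In this toy, the gap between the two estimates has two sources that can be inspected separately: the dropped variable and the sampling. That separation is the kind of controlled, traceable discrepancy the article argues an evaluation should have.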
The article identifies four fundamental issues in evaluation and formulates each mathematically: ensuring the transitivity of EMs; performing cost-efficient evaluations with controlled discrepancies; ensuring evaluation traceability; and connecting and correlating evaluation standards across diverse disciplines. It closes with an overview of its structure: the background, the theoretical and methodological framework for evaluatology, the principles and methodologies of benchmarkology, state-of-the-art and state-of-the-practice evaluations and benchmarks, and the overarching conclusion.