Recommender systems are now widely used in commercial and research settings, and many approaches are available for providing recommendations. When selecting an algorithm for a given application, it is essential to identify the properties of that application that matter most, such as accuracy, robustness, scalability, and others that affect the user experience. This paper discusses how to compare recommenders with respect to these properties, focusing on comparative studies rather than absolute benchmarking. It describes three experimental settings for choosing between algorithms: offline experiments, user studies, and large-scale online experiments. For each setting, it explains which questions that type of experiment can answer and suggests protocols that allow trustworthy conclusions to be drawn. The paper also reviews a large set of properties and the evaluation metrics appropriate for measuring each of them.
Recommender systems are used in a variety of applications, such as those of Netflix, Amazon, and Microsoft, to help users choose items they might prefer. Over the past decade, this field has seen extensive research, much of it focused on designing new algorithms. Application designers must choose the algorithm best suited to their needs, based both on its performance and on constraints such as data availability, memory, and CPU usage. Researchers likewise compare new algorithms to existing ones using evaluation metrics. While accurate prediction was initially the main focus, it is now recognized that other properties, such as the discovery of new items, diversity, privacy, and response speed, are also important. Identifying the properties relevant to a specific application is therefore crucial for evaluating the success of a recommender system. This paper reviews the process of evaluating a recommendation system, discussing three types of experiments: offline experiments, user studies, and online experiments.
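To make the offline setting concrete, the sketch below holds out part of a rating dataset, scores a trivial baseline predictor with RMSE (for rating prediction), and computes precision@k (for top-k recommendation). This is an illustrative example, not a method from the paper: the data, the global-mean "recommender", and the relevance threshold are all hypothetical choices made for the sketch.

```python
# Minimal sketch of an offline experiment: hold out ratings, predict,
# and score with RMSE and precision@k. All names and data here are
# illustrative, not taken from the paper.
import math

# Toy (user, item, rating) tuples -- hypothetical data.
ratings = [
    ("u1", "i1", 5.0), ("u1", "i2", 3.0), ("u1", "i3", 1.0),
    ("u2", "i1", 4.0), ("u2", "i2", 2.0),
]

# Simple chronological-style split into train and held-out test sets.
train, test = ratings[:3], ratings[3:]

# A trivial baseline "recommender": predict the global mean training rating.
global_mean = sum(r for _, _, r in train) / len(train)

def predict(user, item):
    return global_mean

def rmse(test_set):
    """Root mean squared error of predictions on held-out ratings."""
    se = [(predict(u, i) - r) ** 2 for u, i, r in test_set]
    return math.sqrt(sum(se) / len(se))

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

print(rmse(test))  # -> 1.0 for this toy split

# Treat held-out items rated >= 4.0 as "relevant" (an arbitrary cutoff).
relevant_u2 = {i for u, i, r in test if u == "u2" and r >= 4.0}
print(precision_at_k(["i1", "i3"], relevant_u2, 2))  # -> 0.5
```

Even this toy setup shows why metric choice matters: RMSE rewards accurate rating prediction everywhere, while precision@k only credits getting the few top recommendations right, and the two can rank algorithms differently.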