The paper introduces the "query by committee" algorithm, in which a committee of learners is trained on the same labeled data and the next query is chosen to be an input on which the committee members maximally disagree. The approach is analyzed in two toy models: the high-low game and perceptron learning. As the number of queries grows, the algorithm achieves an asymptotically finite information gain per query, which leads to exponential decay of the generalization error. This contrasts with learning from random inputs, where the information gain per query approaches zero and the generalization error decreases only as a power law.
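Below is a minimal pool-based sketch of this loop, assuming a two-member committee of perceptron students and a fixed pool of candidate inputs; the pool, the training rule, and the tie-breaking are illustrative choices made for this example, not details fixed by the paper.

```python
# Minimal sketch of query by committee (illustrative assumptions: a finite
# pool of candidates, a two-member committee, simple perceptron training).
import numpy as np

rng = np.random.default_rng(0)
dim, pool_size, n_queries = 10, 2000, 30

teacher = rng.normal(size=dim)               # unknown target perceptron (queried for labels)
pool = rng.normal(size=(pool_size, dim))     # fixed pool of unlabelled candidate inputs

def train_member(X, y, seed, epochs=50):
    """One committee member: a perceptron trained on the labelled data so far.
    Different seeds give different members consistent with the same data."""
    r = np.random.default_rng(seed)
    w = r.normal(size=dim)
    for _ in range(epochs):
        for i in r.permutation(len(X)):
            if np.sign(X[i] @ w) != y[i]:
                w += y[i] * X[i]             # classical perceptron update
    return w

X_lab = np.empty((0, dim))
y_lab = np.empty(0)
for t in range(n_queries):
    committee = [train_member(X_lab, y_lab, seed=2 * t + k) for k in range(2)]
    votes = np.sign(pool @ np.stack(committee).T)          # (pool_size, 2) predicted labels
    disagree = np.flatnonzero(votes[:, 0] != votes[:, 1])  # inputs the members label differently
    q = rng.choice(disagree) if disagree.size else rng.integers(pool_size)
    X_lab = np.vstack([X_lab, pool[q]])                    # query the teacher on that input
    y_lab = np.append(y_lab, np.sign(pool[q] @ teacher))

student = train_member(X_lab, y_lab, seed=12345)
overlap = student @ teacher / (np.linalg.norm(student) * np.linalg.norm(teacher))
print(f"teacher/student overlap after {n_queries} queries: {overlap:.3f}")
# the perceptron generalization error is arccos(overlap) / pi
```

With only two members, disagreement is all-or-nothing, so the sketch simply picks at random among the inputs the two members label differently.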
The algorithm rests on the principle that disagreement among committee members reflects the information content of a query. In the high-low game, the committee's queries approximately bisect the version space, so the information gain per query stays close to its maximum and the version-space volume shrinks exponentially. For perceptron learning, the algorithm likewise attains an asymptotically finite information gain, and the generalization error again decays exponentially. The information gain is maximized when a query divides the version space into equal parts, which gives an upper bound of one bit per query.
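The high-low game makes this concrete: the target is a threshold in [0, 1], the version space after any sequence of answers is an interval, and a query placed between two committee members drawn from that interval roughly bisects it. The sketch below, with illustrative parameters, also prints the entropy of the split, which bounds the expected information gain by one bit.

```python
# Hedged sketch of the high-low game with a two-member committee.
# The expected information gain of a query that splits the version space
# into fractions p and 1 - p is the entropy H(p), at most one bit.
import numpy as np

rng = np.random.default_rng(1)
target = rng.uniform()            # unknown threshold
lo, hi = 0.0, 1.0                 # version space: all thresholds consistent with the answers so far

for t in range(1, 16):
    w1, w2 = rng.uniform(lo, hi, size=2)     # two committee members drawn from the version space
    x = 0.5 * (w1 + w2)                      # a query on which they disagree
    p = (x - lo) / (hi - lo)                 # fraction of the version space below the query
    gain_bound = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))   # expected gain, <= 1 bit
    if x < target:                           # teacher answers "low" or "high"
        lo = x
    else:
        hi = x
    print(f"query {t:2d}: version-space length {hi - lo:.2e}, expected gain {gain_bound:.2f} bits")
```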
The paper shows that, for the committee algorithm, the generalization error falls off exponentially with the cumulative information gain, unlike in the random-input case. The committee algorithm also compares favorably with other query algorithms in both generalization performance and the cost of selecting a query. The results suggest that an asymptotically finite information gain per query is a key characteristic of effective query algorithms, and the study clarifies the relationship between information gain and generalization error in these learning models.
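The contrast can be checked numerically in the high-low game, used here only because its version space is easy to track; the interval length stands in for the generalization error. With committee queries the length shrinks exponentially and the average gain per query stays near a constant fraction of a bit, while with random inputs most queries fall outside the interval, contribute no information, and the length shrinks only as a power law.

```python
# Hedged numerical illustration: committee queries versus random inputs
# in the high-low game (an assumption made to keep the example small).
import numpy as np

def high_low(n_queries, committee, seed):
    rng = np.random.default_rng(seed)
    target, lo, hi = rng.uniform(), 0.0, 1.0
    for _ in range(n_queries):
        if committee:
            x = 0.5 * sum(rng.uniform(lo, hi, size=2))   # between two committee members
        else:
            x = rng.uniform()                            # a random input in [0, 1]
        if x < target:
            lo = max(lo, x)                              # no change if x falls outside the interval
        else:
            hi = min(hi, x)
    return hi - lo                                       # version-space length, tracks the error

for n in (10, 20, 40, 80):
    err_qbc = np.mean([high_low(n, True, s) for s in range(200)])
    err_rand = np.mean([high_low(n, False, s) for s in range(200)])
    bits_qbc = -np.log2(err_qbc) / n                     # average information gain per query
    bits_rand = -np.log2(err_rand) / n
    print(f"{n:3d} queries: committee {err_qbc:.1e} ({bits_qbc:.2f} bits/query), "
          f"random {err_rand:.1e} ({bits_rand:.2f} bits/query)")
```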