2002 | Joseph L. Durant, Burton A. Leland, Douglas R. Henry, and James G. Nourse
The paper describes the reoptimization of MDL keysets for use in drug discovery, with the aim of improving their performance in molecular similarity searches. The authors report on the optimization of the 166-bit and 960-bit keysets, which were originally designed for substructure searching. They give an overview of the technology behind the keysets, in which molecular descriptors are encoded into binary keybits. Keyset performance was evaluated on a test dataset of 957 compounds, using the success measure defined by Briem and Lessel. The results show that keyset performance is not significantly affected by size beyond 1000 bits. Surprisal S/N pruning, which eliminates keybits based on their surprisal signal-to-noise ratio, outperforms random pruning and the other pruning methods tested. Genetic algorithm optimization also produced better-performing keysets, but no single globally optimal keyset was identified. The best keyset had a success measure of 0.711 and contained 548 keybits. The study suggests that keyset optimization should be driven by known constraints rather than by autonomous methods.
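The pruning idea summarized above can be illustrated with a minimal sketch. It assumes fingerprints are stored as lists of 0/1 values and uses the standard information-theoretic surprisal, -log2(p), of each keybit's frequency as the pruning score. The summary does not give the paper's actual surprisal S/N formula, so the score here is only a stand-in, and the function names `bit_surprisal` and `prune_keyset` and the toy data are hypothetical.

```python
import math


def bit_surprisal(fingerprints):
    """Return -log2(p) per bit position, where p is the fraction of
    fingerprints that have the bit set. Bits that are never set get inf."""
    n = len(fingerprints)
    width = len(fingerprints[0])
    result = []
    for j in range(width):
        count = sum(fp[j] for fp in fingerprints)
        result.append(math.inf if count == 0 else -math.log2(count / n))
    return result


def prune_keyset(fingerprints, scores, keep):
    """Keep the `keep` highest-scoring bit positions.

    Returns the kept positions (in ascending order) and the reduced
    fingerprints. `scores` may come from any per-bit measure.
    """
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    kept = sorted(ranked[:keep])
    return kept, [[fp[j] for j in kept] for fp in fingerprints]


# Toy data: four compounds, four keybits (not taken from the paper).
fps = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]

surprisal = bit_surprisal(fps)
# Stand-in score (assumption): plain surprisal, with never-set bits scored 0
# so that they are pruned first. The paper's S/N ratio is not reproduced here.
scores = [0.0 if math.isinf(s) else s for s in surprisal]
kept, pruned = prune_keyset(fps, scores, keep=2)
```

In this toy case the bit that is set in every compound carries no information and is dropped first. Any other per-bit score, such as a signal-to-noise measure computed against known activity classes, could be passed to `prune_keyset` in place of `scores`.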