Chemical Similarity Searching

This paper reviews the use of similarity searching in chemical databases. It introduces the concept of similarity searching, distinguishes it from the more common substructure searching, and then discusses the current generation of fragment-based measures used to search chemical structure databases. The following sections focus on the two principal characteristics of a similarity measure: the coefficient used to quantify the structural resemblance between a pair of molecules, and the structural representations used to characterize the molecules being compared. New types of similarity measure are compared with current approaches, and several applications related to similarity searching are described.
Substructure searching has limitations: a database structure is retrieved only if it contains the entire query substructure, so the search simply partitions the database into two subsets, the molecules that match and those that do not. Similarity searching instead ranks the database in order of decreasing similarity to a target structure, so that the top-ranked molecules can be examined as potential bioactives. The structural descriptors of the target are compared with those of each database molecule to calculate a similarity score. This makes the approach more flexible than substructure searching, and it is particularly useful when only a single bioactive molecule is known, since that molecule can itself serve as the target.
Fragment-based similarity searching has been widely used, with early examples developed at Lederle and Pfizer. These methods characterize each molecule by the fragment substructures it contains and quantify structural resemblance from the fragments that two molecules have in common. The Tanimoto coefficient is the most common measure: it computes the similarity of two molecules as the number of fragments they share divided by the number of fragments present in either molecule. Other coefficients, such as the Cosine and Dice coefficients, are also used, but the Tanimoto coefficient is generally preferred for its computational efficiency and retrieval effectiveness.
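The coefficients named above can be made concrete with fragment sets. For molecules A and B described by a and b fragments respectively, c of which are shared, the Tanimoto coefficient is c/(a + b - c), the Dice coefficient is 2c/(a + b) and the Cosine coefficient is c/sqrt(ab). The Python sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the fragment codes, molecule names and function names are hypothetical, molecules are plain sets of fragment labels rather than bit strings, and the containment test in `substructure_filter` only stands in for a real substructure search, which would also need an atom-by-atom graph match. It shows the contrast drawn in the text: the containment test splits the database into two subsets, whereas `similarity_search` returns every molecule ranked by decreasing similarity.

```python
from math import sqrt

# Each molecule is a set of fragment labels (hypothetical codes for illustration).
Fragments = set[str]


def tanimoto(a: Fragments, b: Fragments) -> float:
    """c / (a + b - c): shared fragments over fragments present in either molecule."""
    c = len(a & b)
    denom = len(a) + len(b) - c
    return c / denom if denom else 0.0


def dice(a: Fragments, b: Fragments) -> float:
    """2c / (a + b)."""
    c = len(a & b)
    denom = len(a) + len(b)
    return 2 * c / denom if denom else 0.0


def cosine(a: Fragments, b: Fragments) -> float:
    """c / sqrt(a * b)."""
    c = len(a & b)
    denom = sqrt(len(a) * len(b))
    return c / denom if denom else 0.0


def substructure_filter(query: Fragments, database: dict[str, Fragments]) -> tuple[list[str], list[str]]:
    """Two-way split: molecules containing every query fragment vs. the rest.
    (A stand-in for substructure search; a real system adds an atom-by-atom match.)"""
    hits = [name for name, frags in database.items() if query <= frags]
    misses = [name for name, frags in database.items() if not query <= frags]
    return hits, misses


def similarity_search(target: Fragments, database: dict[str, Fragments],
                      coefficient=tanimoto) -> list[tuple[str, float]]:
    """Score every database molecule against the target and rank by decreasing similarity."""
    scored = [(name, coefficient(target, frags)) for name, frags in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fragment sets, not real screening data.
DATABASE = {
    "mol_A": {"C-C", "C-O", "O-H", "ring6"},
    "mol_B": {"C-C", "C-N", "ring6"},
    "mol_C": {"C-C", "C-O"},
    "mol_D": {"N-H", "C=O"},
}
TARGET = {"C-C", "C-O", "ring6"}

if __name__ == "__main__":
    print(substructure_filter({"C-C", "C-O"}, DATABASE))
    for name, score in similarity_search(TARGET, DATABASE):
        print(f"{name}: {score:.3f}")
```

With this hypothetical data, the containment test returns only mol_A and mol_C, while the similarity search ranks all four molecules: mol_A first (Tanimoto 3/4 = 0.75), then mol_C (2/3), mol_B (2/4) and mol_D (0). Passing `coefficient=dice` or `coefficient=cosine` ranks with the alternative measures instead.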
The paper then discusses the structural representations used in similarity searching, including 2D and 3D fragment descriptors, physicochemical properties, and topological indices. The choice of representation determines how well molecules can be differentiated, and hence how meaningful the calculated similarities are. Descriptors such as atom pairs, topological torsions, and 3D shape descriptors are discussed, together with their effectiveness in identifying similar molecules. The paper stresses that choosing appropriate descriptors and encoding methods is essential for accurate and efficient similarity searching, and concludes that further research is needed on descriptor selection, encoding methods, and the application of similarity measures to chemical databases.
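The atom-pair descriptor mentioned above describes a molecule by pairs of atoms together with the number of bonds on the shortest path between them. The Python sketch below is a simplified, hypothetical illustration: atom types are just the element symbol plus the count of heavy-atom neighbours (fuller atom-pair schemes also encode properties such as the number of pi electrons), molecules are hand-coded hydrogen-suppressed graphs, and the function names are illustrative. Topological distances come from a breadth-first search over the bonds.

```python
from collections import Counter, deque


def topological_distances(adjacency: dict[int, list[int]], start: int) -> dict[int, int]:
    """Breadth-first search: bonds on the shortest path from start to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def atom_type(elements: dict[int, str], adjacency: dict[int, list[int]], atom: int) -> str:
    """Simplified atom type: element symbol plus number of heavy-atom neighbours, e.g. 'C2'."""
    return f"{elements[atom]}{len(adjacency[atom])}"


def atom_pairs(elements: dict[int, str], adjacency: dict[int, list[int]]) -> Counter:
    """Count (type_i, distance, type_j) triples over all unordered pairs of connected heavy atoms."""
    pairs = Counter()
    atoms = sorted(elements)
    for i in atoms:
        dist = topological_distances(adjacency, i)
        for j in atoms:
            if j <= i or j not in dist:
                continue  # visit each unordered pair once; skip disconnected atoms
            ti, tj = sorted((atom_type(elements, adjacency, i), atom_type(elements, adjacency, j)))
            pairs[(ti, dist[j], tj)] += 1
    return pairs


# Hypothetical hydrogen-suppressed graphs: (element by atom index, neighbour lists).
ETHANOL = ({0: "C", 1: "C", 2: "O"}, {0: [1], 1: [0, 2], 2: [1]})         # C-C-O
DIMETHYL_ETHER = ({0: "C", 1: "O", 2: "C"}, {0: [1], 1: [0, 2], 2: [1]})  # C-O-C

if __name__ == "__main__":
    for name, (elements, adjacency) in [("ethanol", ETHANOL), ("dimethyl ether", DIMETHYL_ETHER)]:
        print(name, dict(atom_pairs(elements, adjacency)))
```

Ethanol and dimethyl ether contain the same heavy atoms (two carbons and one oxygen), yet their atom-pair sets share no entries, so a Tanimoto comparison of the two sets gives 0: the descriptor reflects how the atoms are connected, not just which atoms are present.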


1998 | Peter Willett, John M. Barnard and Geoffrey M. Downs