February 23, 2012 | Purvesh Khatri, Marina Sirota, Atul J. Butte
Pathway analysis has become a key method for understanding the biological functions of differentially expressed genes and proteins. This review traces the evolution of knowledge base-driven pathway analysis over the past decade through three generations of methods, each addressing limitations of the one before: over-representation analysis (ORA), functional class scoring (FCS), and pathway topology (PT)-based approaches. ORA tests whether a list of differentially expressed genes contains more members of a pathway than expected by chance, but it discards the measured expression values and assumes that genes are independent of one another. FCS improves on ORA by using the expression measurements of all genes in a pathway, so that weaker but coordinated changes can also be detected. PT-based methods go further by incorporating the structure of the pathway, that is, how its genes interact.
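To make the first-generation idea concrete, the following is a minimal sketch of an ORA-style test, assuming the commonly used hypergeometric model (the review does not prescribe a single test, and this is not the code of any particular tool). The function names, the gene identifiers, and the toy numbers are all hypothetical and chosen only for illustration.

```python
from math import comb


def ora_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of genes measured (the background)
    n_pathway:  number of background genes annotated to the pathway
    n_selected: number of differentially expressed genes
    n_overlap:  number of selected genes that are in the pathway
    """
    max_overlap = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def ora(universe, pathway, selected):
    """Run the test on gene identifiers, restricted to the background set."""
    universe = set(universe)
    pathway = set(pathway) & universe
    selected = set(selected) & universe
    overlap = pathway & selected
    return ora_p_value(len(universe), len(pathway), len(selected), len(overlap))


if __name__ == "__main__":
    # Hypothetical toy data: 20 measured genes, a 5-gene pathway,
    # and 4 differentially expressed genes, 3 of which are in the pathway.
    universe = [f"g{i}" for i in range(20)]
    pathway = ["g0", "g1", "g2", "g3", "g4"]
    selected = ["g0", "g1", "g2", "g10"]
    print(ora(universe, pathway, selected))
```

Note that the test sees only which genes were selected, not how strongly they changed, and treats each gene as an independent draw. These are the two ORA limitations described above.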
Despite these advancements, challenges remain. Annotation challenges include low-resolution knowledge bases, incomplete or inaccurate annotations, and a lack of condition- and cell-specific information. Methodological challenges involve the inability to model dynamic biological systems and the lack of integration of pathway interactions. The review emphasizes the need for more detailed, high-resolution annotations and better methods to account for dynamic responses and inter-pathway interactions. Future pathway analysis methods must leverage technological advances to improve specificity, sensitivity, and relevance. The community must collaborate to address these challenges and develop the next generation of pathway analysis tools that can effectively utilize high-throughput technologies to better understand complex biological systems.