June 14, 2024 | Rita González-Márquez, Luca Schmidt, Benjamin M. Schmidt, Philipp Berens, Dmitry Kobak
A 2D map of the biomedical literature has been developed from abstract texts, covering 21 million English-language articles from PubMed. The map was created by embedding the abstracts with the large language model PubMedBERT and projecting the embeddings to two dimensions with t-SNE. It supports exploration of the biomedical literature landscape, offering insights into the evolution of research topics, the adoption of machine learning, gender imbalance in authorship, and the distribution of retracted papers, and so helps surface issues such as gender bias and fraudulent research. The map is publicly available as an interactive website, enabling further analysis and research.
The map shows that the COVID-19 literature is uniquely isolated, that neuroscience has evolved into distinct subfields, that machine learning is increasingly used in biomedical research, that gender imbalance persists in academic authorship, and that retracted papers are concentrated in specific areas. It also reveals temporal patterns and heterogeneity within disciplines. The study demonstrates that 2D visualizations can uncover aspects of the data that other analysis methods may miss, and highlights the value of such tools for understanding the evolution and structure of biomedical research.
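
The general recipe described above (represent each abstract as a high-dimensional vector, then project the vectors to 2D with t-SNE) can be illustrated with a minimal sketch. This is not the authors' pipeline: the embeddings here are synthetic random vectors standing in for PubMedBERT outputs, the function name `embed_to_2d` is hypothetical, and the perplexity and initialization settings are illustrative assumptions rather than values taken from the study. It relies on scikit-learn's `TSNE`, which would need a more scalable implementation at the scale of 21 million papers.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_to_2d(embeddings, perplexity=30.0, seed=0):
    """Project high-dimensional document embeddings to 2D with t-SNE.

    `embeddings` is an (n_documents, n_features) array. In the study these
    would come from a language model; here they are assumed to be given.
    """
    tsne = TSNE(
        n_components=2,
        perplexity=perplexity,  # illustrative value, not from the paper
        init="pca",             # PCA initialization, an assumption of this sketch
        random_state=seed,
    )
    return tsne.fit_transform(embeddings)


# Synthetic stand-in for abstract embeddings: three artificial "topics",
# each a cloud of 100 points around its own center in 32 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 32))
labels = np.repeat(np.arange(3), 100)
embeddings = centers[labels] + rng.normal(size=(300, 32))

coords = embed_to_2d(embeddings)  # one (x, y) position per document
```

Each row of `coords` gives a document's position on the map; coloring the points by metadata such as publication year or retraction status is what lets a map of this kind expose the patterns listed above.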