Reference Coverage Analysis of OpenAlex compared to Web of Science and Scopus

26 Feb 2024 | Jack H. Culbert, Anne Hobert, Najko Jahn, Nick Haupka, Marion Schmidt, Paul Donner, Philipp Mayr
This paper evaluates the reference and metadata coverage of OpenAlex, a recently launched open bibliometric database, in comparison with the established proprietary databases Web of Science (WoS) and Scopus. The study aims to assess whether OpenAlex is reliable and suitable for bibliometric research. Key findings include:

1. **Reference Coverage**: When restricted to a cleaned dataset of 16,788,282 recent publications shared by all three databases, OpenAlex shows average source reference numbers and internal coverage comparable to both WoS and Scopus. For other core metadata (abstracts, ORCID identifiers, and open access status), OpenAlex captures more ORCID identifiers, fewer abstracts, and a similar amount of open access information per article compared with WoS and Scopus.
2. **Metadata Coverage**:
   - **Abstracts**: WoS and Scopus have a higher overall availability of abstracts (92%) than OpenAlex (87%).
   - **ORCID**: OpenAlex has higher coverage of ORCID identifiers (92%) than WoS (16%) and Scopus (32%). However, this high coverage is partly due to OpenAlex's generous author disambiguation, which may cause problems with Chinese names.
   - **Open Access**: The distribution of open access information is more linear in OpenAlex, suggesting an indexing lag in WoS and Scopus.
3. **Discrepancies**: The study also identifies discrepancies between reported reference counts and the actual number of references in both WoS and OpenAlex, indicating data errors in both databases.
4. **Limitations**: The study lacks a ground truth for reference counts and does not analyze the accuracy of reference matching algorithms. Future research could extend these analyses to specific disciplines and further investigate duplicate DOI records.

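The indicators named in the findings (average source references, internal coverage, metadata availability, and reported-versus-actual reference counts) can be made concrete with a small sketch. It is not the authors' pipeline. It assumes a hypothetical, simplified per-database record (`Record`) instead of the real WoS, Scopus, or OpenAlex data formats, takes "internal coverage" to mean the share of references that resolve to records indexed in the same database, pools that share over all publications, and runs on invented toy data rather than figures from the study.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One publication as seen by one database (hypothetical, simplified schema)."""
    doi: str
    reported_ref_count: int   # reference count the database itself reports
    references: list[str]     # reference entries the database actually delivers
    source_refs: int          # references resolved to records indexed in the same database
    has_abstract: bool
    has_orcid: bool           # at least one author carries an ORCID iD


def shared_dois(*databases: dict[str, Record]) -> set[str]:
    """DOIs indexed by every database: the common publication set."""
    return set.intersection(*(set(db) for db in databases))


def summarise(db: dict[str, Record], dois: set[str]) -> dict[str, float]:
    """Coverage indicators for one database, restricted to the shared DOIs."""
    recs = [db[d] for d in sorted(dois)]
    if not recs:
        raise ValueError("no shared DOIs to compare")
    n = len(recs)
    total_refs = sum(len(r.references) for r in recs)
    total_source = sum(r.source_refs for r in recs)
    return {
        # mean number of source references per publication
        "avg_source_refs": total_source / n,
        # pooled internal coverage: share of all references that are source references
        "internal_coverage": total_source / total_refs if total_refs else 0.0,
        "abstract_share": sum(r.has_abstract for r in recs) / n,
        "orcid_share": sum(r.has_orcid for r in recs) / n,
        # publications whose reported count disagrees with the delivered reference list
        "count_mismatches": sum(r.reported_ref_count != len(r.references) for r in recs),
    }


# Invented toy data for illustration only (not figures from the study).
wos = {
    "10.1000/a": Record("10.1000/a", 4, ["r1", "r2", "r3", "r4"], 3, True, False),
    "10.1000/b": Record("10.1000/b", 2, ["r1", "r2"], 2, True, False),
    "10.1000/c": Record("10.1000/c", 1, ["r1"], 1, False, False),
}
scopus = {
    "10.1000/a": Record("10.1000/a", 4, ["r1", "r2", "r3", "r4"], 4, True, True),
    "10.1000/b": Record("10.1000/b", 2, ["r1", "r2"], 2, True, False),
}
openalex = {
    "10.1000/a": Record("10.1000/a", 5, ["r1", "r2", "r3", "r4"], 3, False, True),
    "10.1000/b": Record("10.1000/b", 2, ["r1", "r2"], 1, True, True),
}

# Restrict every comparison to publications present in all three databases.
common = shared_dois(wos, scopus, openalex)
for name, db in [("WoS", wos), ("Scopus", scopus), ("OpenAlex", openalex)]:
    print(name, summarise(db, common))
```

Pooling weights publications with long reference lists more heavily; averaging per-publication ratios is an equally plausible reading, so the paper's own definition should be checked before comparing any numbers with it.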
Overall, the study suggests that while OpenAlex is a promising alternative to proprietary databases, it has some limitations in metadata coverage and data accuracy, particularly in handling non-source references and author disambiguation.