2018 | Carrie A. Davis, Benjamin C. Hitz, Cricket A. Sloan, Esther T. Chan, Jean M. Davidson, Idan Gabdank, Jason A. Hilton, Kriti Jain, Ulugbek K. Baymuradov, Aditi K. Narayanan, Kathrina C. Onate, Keenan Graham, Stuart R. Miyasato, Timothy R. Dreszer, J. Seth Strattan, Otto Jolanki, Forrest Y. Tanaka and J. Michael Cherry
The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) has developed the ENCODE Portal as a central source for data and metadata generated by the ENCODE Consortium. The Portal is designed to provide publicly accessible experimental protocols, analytical procedures, and data through a coherent, web-based search and download interface. It also serves as a source for carefully curated metadata that records the provenance of the data and justifies its biological interpretation. Since its initial release in 2013, the Portal has been regularly updated to better reflect these design principles. The Portal now includes metadata and data from related genomics projects such as the Roadmap Epigenome Project, modENCODE, and modERN. It now makes available over 13,000 datasets and their accompanying metadata, accessible at https://www.encodeproject.org/.
The ENCODE Portal provides persistent identifiers for data, so that even revoked or archived data remain accessible. New views include a data matrix and a report table, which let users explore data by project, organism, sample type, and assay. Metadata is captured in JSON format and can be downloaded in TSV format or accessed via the REST API. Users can now visualize data in the UCSC, ENSEMBL, and newly added BioDalliance browsers.
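As a rough illustration of the JSON and TSV access paths described above, the following Python sketch builds a search URL, requests JSON, and flattens records into TSV. It is a minimal sketch under stated assumptions, not the Portal's own client: the `/search/` path, the `format=json`, `type`, and `limit` query parameters, and the `@graph` key mentioned in the usage note are assumptions about the REST API that should be checked against the Portal's documentation, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://www.encodeproject.org"


def build_search_url(**filters):
    """Build a search URL that asks for JSON (path and parameters are assumed)."""
    params = {"format": "json", **filters}
    return f"{BASE_URL}/search/?{urlencode(params)}"


def fetch_json(url):
    """GET a URL and decode the JSON body (requires network access)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)


def to_tsv(records, columns):
    """Flatten a list of JSON records into TSV text with a header row."""
    rows = ["\t".join(columns)]
    for record in records:
        # Fields absent from a record are written as empty cells.
        rows.append("\t".join(str(record.get(col, "")) for col in columns))
    return "\n".join(rows)
```

With network access, something like `to_tsv(fetch_json(build_search_url(type="Experiment", limit=5))["@graph"], ["accession", "status"])` would print a small table, assuming the response keeps its result list under `@graph`.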
The ENCODE DCC has developed standardized pipelines for data processing, which are maintained in the DCC GitHub repository and implemented on the DNAnexus cloud platform. These pipelines allow for consistent and centralized data processing. The DCC also reprocesses data using updated reference genome assemblies and incorporates improved data generated with newer or refined protocols. In addition, it tracks the status of each dataset: 'Submitted', 'Released', 'Revoked', 'Archived', or 'Replaced'.
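The interaction between data status and persistent identifiers can be pictured with a toy in-memory model. This is a sketch for illustration only, not the DCC's implementation: the `Registry` class and its methods are hypothetical, and the rule that only 'Released' data appears in a default listing is an assumption made for the example.

```python
from enum import Enum


class Status(Enum):
    SUBMITTED = "Submitted"
    RELEASED = "Released"
    REVOKED = "Revoked"
    ARCHIVED = "Archived"
    REPLACED = "Replaced"


class Registry:
    """Toy store keyed by persistent identifier (hypothetical, for illustration)."""

    def __init__(self):
        self._records = {}

    def add(self, identifier, status):
        self._records[identifier] = status

    def set_status(self, identifier, status):
        # The identifier is never deleted or reused; only its status changes.
        if identifier not in self._records:
            raise KeyError(identifier)
        self._records[identifier] = status

    def lookup(self, identifier):
        """Resolve an identifier whatever its status, including revoked or archived."""
        return self._records[identifier]

    def default_listing(self):
        """Identifiers shown by default; assumed here to be released data only."""
        return sorted(i for i, s in self._records.items() if s is Status.RELEASED)
```

In this model, revoking a dataset removes it from `default_listing()` while `lookup()` still resolves its identifier, which is the behavior persistent identifiers are meant to guarantee.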
The ENCODE Portal includes automated audits and badges to assess the quality of datasets. These audits evaluate the completeness and accuracy of metadata and the results of quality metrics from the processing pipelines. The Portal also provides antibody characterizations, which help assess the specificity of antibodies used in ChIP-Seq and eCLIP experiments. The DCC continues to develop methods to capture and convey the key metadata properties underlying the creation of data in the Portal. Future efforts will focus on improving user experience, automating data deposition and pipelines, and supporting the integrated Encyclopedia of DNA Elements.
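An automated audit of the kind described above can be sketched as a function that checks metadata completeness and compares pipeline quality metrics against thresholds. Everything specific here is hypothetical: the field names, metric names, threshold values, severity labels, and badge colors are placeholders chosen for the example, not the Portal's actual audit rules.

```python
def audit_dataset(metadata, required_fields, metrics, thresholds):
    """Return a list of (severity, message) flags for one dataset."""
    flags = []
    # Metadata completeness: every required field must be present and non-empty.
    for field in required_fields:
        if not metadata.get(field):
            flags.append(("error", f"missing metadata field: {field}"))
    # Quality metrics: each reported value must meet its minimum threshold.
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            flags.append(("warning", f"quality metric not reported: {name}"))
        elif value < minimum:
            flags.append(("warning", f"{name}={value} below threshold {minimum}"))
    return flags


def badge(flags):
    """Collapse audit flags into one badge, using the most severe flag."""
    severities = {severity for severity, _ in flags}
    if "error" in severities:
        return "red"
    if "warning" in severities:
        return "yellow"
    return "green"
```

A dataset with complete metadata and metrics above every threshold yields no flags and a "green" badge; any missing required field escalates the badge to "red" in this sketch.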