January 21, 2014 | Philip Jones, David Binns, Hsin-Yu Chang, Matthew Fraser, Weizhong Li, Craig McAnulla, Hamish McWilliam, John Maslen, Alex Mitchell, Gift Nuka, Sebastien Pesseat, Antony F. Quinn, Amaia Sangrador-Vegas, Maxim Scheremetjew, Siew-Yit Yong, Rodrigo Lopez, Sarah Hunter
The article introduces InterProScan 5, Java-based software for protein function prediction that has been reimplemented to improve scalability and usability. The new version supports genome-scale analysis, allowing millions of sequences to be processed on multiprocessor machines and clusters. Key improvements include enhanced parallelization at the sequence, application, and binary levels, a modular architecture, and the integration of new analysis algorithms such as Phobius. InterProScan 5 also introduces new output formats (GFF3, SVG), web services for precomputed matches, and the ability to infer pathway memberships. The software is freely available for download from the EMBL-EBI FTP site, and the source code is hosted on Google Code. The article highlights the benefits of the new architecture, including improved installation and configuration, and discusses the system's scalability and robustness.