About




Summary of service



The identification of orthologs is an important cornerstone for many comparative, evolutionary and functional genomics analyses. Yet, the true evolutionary history of genes is generally unknown. Because of the wide range of possible applications and taxonomic interests, benchmarking of orthology predictions remains a difficult challenge for methods developers and users.

This community-developed web service aims to simplify and standardize orthology benchmarking. For users, the benchmarks provide a way to identify the most effective methods for the problem at hand.

The paper describing the service has been published open access in Nature Methods. If you use the orthology benchmark service, please consider citing it.


How does it work?



An orthology method developer should first infer orthologs on the reference proteome dataset. The service assesses the induced pairwise orthologous relations. The method developer must therefore provide the predictions in a format from which the pairwise orthologous predictions can be extracted unambiguously.
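To illustrate what "induced pairwise relations" can mean, here is a minimal Python sketch that expands one flat group of proteins into unordered pairs. It assumes that every pair of members from different species counts as orthologous and that same-species members are paralogs; this is only one possible reading of a flat group, which is why an unambiguous format is required. The function name, protein ids, and species labels are hypothetical.

```python
from itertools import combinations


def pairs_from_group(group):
    """Return the unordered cross-species pairs induced by one flat group.

    `group` maps a protein id to a species label (both hypothetical).
    Assumption: members of different species are orthologs of each other.
    """
    pairs = set()
    # Sorting the items first guarantees id_a < id_b in every emitted pair.
    for (id_a, sp_a), (id_b, sp_b) in combinations(sorted(group.items()), 2):
        if sp_a != sp_b:  # same-species members are treated as paralogs here
            pairs.add((id_a, id_b))
    return pairs


group = {"P1": "HUMAN", "P2": "MOUSE", "P3": "MOUSE"}
induced = pairs_from_group(group)
print(sorted(induced))
```

Storing each pair in a canonical order means that (A, B) and (B, A) are counted as one relation.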

Once the predictions have been uploaded, the service checks that they only involve proteins from valid reference proteomes. Benchmarks are then selected and run in parallel. Finally, statistical analyses of method accuracy are performed on each benchmark dataset. The raw data and summary results, in the form of precision-recall curves, are stored and provided to the submitter.
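The following Python sketch mirrors two of these steps in a simplified form: checking predicted pairs against a set of reference protein ids, and computing one precision/recall point against a trusted pair set. It is not the service's implementation. The ids, the `truth` set, and all function names are hypothetical, and the page does not say whether invalid predictions are filtered out or cause the upload to be rejected.

```python
def canonical(pair):
    """Order a pair so that (A, B) and (B, A) compare equal."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def split_by_validity(pairs, reference_ids):
    """Separate pairs whose ids are all in the reference set from the rest."""
    valid, rejected = set(), set()
    for pair in pairs:
        target = valid if all(p in reference_ids for p in pair) else rejected
        target.add(canonical(pair))
    return valid, rejected


def precision_recall(predicted, truth):
    """Precision and recall of predicted pairs against a trusted pair set."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical data for illustration only.
reference_ids = {"P1", "P2", "P3", "P4"}
predicted = [("P2", "P1"), ("P1", "P3"), ("P1", "X9")]
truth = {("P1", "P2"), ("P3", "P4")}

valid, rejected = split_by_validity(predicted, reference_ids)
precision, recall = precision_recall(valid, truth)
print(valid, rejected, precision, recall)
```

A precision-recall curve would plot one such point per method or per parameter setting.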


Protein Reference Dataset




Orthology inference is most often based on molecular protein sequences. For a comparison of different orthology prediction methods, a common set of sequences must be established. Therefore, only identical proteins are mapped to each other.

To make comparisons of methods easier, the orthology research community agreed in 2009 to establish a common Quest for Orthologs (QfO) reference proteome dataset. Currently we are using the reference proteomes from 2011.

NEW: We are experimenting with moving to the QfO reference proteome dataset of 2017. Be among the first to try it out and upload predictions for this new dataset.

Dataset sources
The QfO Reference Proteomes (version 5; 2011-04) dataset is available. The QfO Reference Proteomes version 2017-04 can now be used for benchmarking as well; this dataset is available from EBI.

Formats



Our benchmarks assess orthology on the basis of protein pairs. We therefore ask users to upload their predictions in a format from which we can extract pairwise relations unambiguously. We support:

  • a simple text file with two tab-separated columns, where each row gives a pair of orthologous proteins by their IDs
  • orthoXML v0.3, which allows for nested orthologGroups and paralogGroups
For both formats, we expect you to submit your predictions in a single file. This file may also be compressed with gzip or bzip2. In that case, it needs to have the proper filename extension (.gz or .bz2).
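As a rough illustration of how the two formats relate, the Python sketch below extracts ortholog pairs from a small orthoXML document with nested groups and writes them as a gzip-compressed two-column text file. It assumes that genes in different child branches of an orthologGroup are orthologs, that a paralogGroup induces no ortholog pairs among its own branches, and that the `protId` attribute is the identifier to report. The example document, the file name, and the function names are hypothetical, and the service's own parser may behave differently.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical orthoXML document; the namespace is ignored by the code below.
EXAMPLE = """<orthoXML xmlns="http://orthoXML.org/2011/" version="0.3"
          origin="demo" originVersion="1">
  <species name="A" NCBITaxId="1"><database name="db" version="1"><genes>
    <gene id="1" protId="A1"/>
  </genes></database></species>
  <species name="B" NCBITaxId="2"><database name="db" version="1"><genes>
    <gene id="2" protId="B1"/><gene id="3" protId="B2"/>
  </genes></database></species>
  <groups>
    <orthologGroup>
      <geneRef id="1"/>
      <paralogGroup><geneRef id="2"/><geneRef id="3"/></paralogGroup>
    </orthologGroup>
  </groups>
</orthoXML>"""


def local(tag):
    """Strip an XML namespace prefix such as '{uri}name' down to 'name'."""
    return tag.rsplit("}", 1)[-1]


def collect(node, id_map, pairs):
    """Return protein ids below `node`, adding ortholog pairs to `pairs`."""
    name = local(node.tag)
    if name == "geneRef":
        return [id_map[node.get("id")]]
    if name not in ("orthologGroup", "paralogGroup"):
        return []  # skip annotations such as scores or properties
    branches = [collect(child, id_map, pairs) for child in node]
    branches = [b for b in branches if b]
    if name == "orthologGroup":
        # Genes in different branches of an orthologGroup are orthologs.
        for i, left in enumerate(branches):
            for right in branches[i + 1:]:
                for a, b in product(left, right):
                    pairs.add((a, b) if a <= b else (b, a))
    return [gene for branch in branches for gene in branch]


def orthoxml_pairs(xml_text):
    root = ET.fromstring(xml_text)
    id_map = {g.get("id"): g.get("protId")
              for g in root.iter() if local(g.tag) == "gene"}
    pairs = set()
    for section in root:
        if local(section.tag) == "groups":
            for group in section:
                collect(group, id_map, pairs)
    return pairs


def write_pairs(pairs, path):
    """Write one tab-separated pair per line into a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for a, b in sorted(pairs):
            handle.write(f"{a}\t{b}\n")


pairs = orthoxml_pairs(EXAMPLE)
out_path = os.path.join(tempfile.mkdtemp(), "predictions.tsv.gz")
write_pairs(pairs, out_path)
print(sorted(pairs))
```

The output file name ends in `.gz` so that the compression can be recognized from the extension, as required above.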


How to cite the orthology benchmark service



Adrian M Altenhoff, Brigitte Boeckmann, Salvador Capella-Gutierrez, Daniel A Dalquen, Todd DeLuca, Kristoffer Forslund, Jaime Huerta-Cepas, Benjamin Linard, Cécile Pereira, Leszek P Pryszcz, Fabian Schreiber, Alan Sousa da Silva, Damian Szklarczyk, Clément-Marie Train, Peer Bork, Odile Lecompte, Christian von Mering, Ioannis Xenarios, Kimmen Sjölander, Lars Juhl Jensen, Maria J Martin, Matthieu Muffato, Toni Gabaldón, Suzanna E Lewis, Paul D Thomas, Erik Sonnhammer, Christophe Dessimoz.
Standardized benchmarking in the quest for orthologs.
Nature Methods, 2016, 13, 425-430. Open access.