
F1000Research (ISSN 2046-1402)

F1000 Research Limited, London, UK

doi: 10.12688/f1000research.24904.1

Editorial

Articles

Making the most of genomic data with OMA

[version 1; peer review: not peer reviewed]

Natasha M. Glover

Roles: Conceptualization; Writing – Original Draft Preparation; Writing – Review & Editing. ORCID: https://orcid.org/0000-0003-1811-4340. Affiliations: 1 Department of Computational Biology, University of Lausanne, Lausanne, 1015, Switzerland; 2 Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland; 3 Center for Integrative Genomics, Lausanne, 1015, Switzerland.

Corresponding author: natasha.glover@unil.ch

No competing interests were disclosed.

1 7 2020

2020

665

24 6 2020

2020

This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In this editorial, I describe the OMA Collection, a resource for users of the Orthologous Matrix (OMA). In this collection, we provide tutorials and protocols on how to leverage the tools provided by OMA to analyse your data. Here, I explain the motivation for the collection and summarize the works it has published so far.

Keywords: OMA, Orthologous Matrix, collection, orthologs

Swiss Institute of Bioinformatics

Supported by a Service and Infrastructure grant from the Swiss Institute of Bioinformatics.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Next-generation sequencing has become commonplace, and we are now entering an age where whole-genome sequences are a “dime a dozen.” Thousands of eukaryotic species’ genomes have been sequenced to date, and some species, such as humans, have been sequenced tens of thousands of times over. But this is just the tip of the iceberg! For example, the <ext-link ext-link-type="uri" xlink:href="https://www.earthbiogenome.org/">Earth BioGenome Project</ext-link> aims to sequence all 1.5 million known eukaryotic species within 10 years. Beyond genome sequences, ‘-omics’ data add further layers of complexity in the form of gene expression, regulation, network interactions, epigenetics, and structural, functional, and comparative genomics, among others. All this data holds a wealth of potential biological knowledge, but making sense of so many genes and genomes requires efficient and smart approaches. One fundamental way to relate genomes is through orthology: the relationship between genes in different species that descend, by speciation, from a single gene in the last common ancestor of those species. Orthology is commonly contrasted with paralogy, the relationship between genes that arose by duplication. By tracing the evolutionary history of genes and their relationships to one another, we can begin to understand the complexity of the biological processes underlying the evolution of life forms. Indeed, orthology has many applications, such as: <list list-type="bullet"> <list-item> Predicting the function of uncharacterized proteins. </list-item> <list-item> Elucidating gene losses, duplications, or gains (i.e., taxonomically restricted genes) to study the evolution of gene families and species. </list-item> <list-item> Choosing the best model system for studying a particular physiological process. </list-item> <list-item> Phylogenetic profiling, or correlating ortholog presence or absence across many species to detect biologically related processes.
</list-item> <list-item> Studying the positional conservation of genes, which can aid genome assembly and homology detection and provide insight into the structural evolution of chromosomes. </list-item> <list-item> Phylogenomics, among others. </list-item> </list> One tool for inferring orthologs is OMA (Orthologous MAtrix), a method and database for inferring orthologs among complete genomes <xref ref-type="bibr" rid="ref-1">1</xref> . OMA covers over 2000 species from a broad phylogenetic range, and distinctive features of the OMA browser include a feature-rich web interface, data availability in a wide range of formats and interfaces, and a twice-yearly update schedule. As a member of the <ext-link ext-link-type="uri" xlink:href="https://lab.dessimoz.org/">Dessimoz lab</ext-link>, I have worked on the OMA project for the past 6 years and have also been a user of the online browser and the standalone software <xref ref-type="bibr" rid="ref-2">2</xref> . Many orthology inference methods, software tools, and databases are at one’s disposal. Over the past decade of working in computational biology, I have found that the main bottlenecks to using bioinformatics tools effectively are: 1) choosing the appropriate tool for your needs; 2) understanding enough of how the tool works that it is no longer a black box; and 3) running the tool efficiently and understanding its output. Thus, this F1000Research Collection aims to provide a resource that helps OMA users with their analyses. I hope to make using OMA, and the many supplementary analysis tools it currently provides, as hassle-free as possible. To this end, we have written several tutorials, guides, and protocols on how to use OMA to get the most out of one’s biological data.
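The definitions of orthology and paralogy above can be made concrete on a toy gene tree. The Python sketch below assumes that every internal node of the tree is already labelled as a speciation ("S") or duplication ("D") event; the gene names and the tree itself are hypothetical. Two genes are then orthologs if their lowest common ancestor is a speciation node and paralogs if it is a duplication node. This is the textbook tree-based reading of the definitions, not OMA's own inference method.

```python
"""Toy classifier for orthology vs. paralogy on a labelled gene tree.

Assumption (hypothetical setup): each internal node already carries an
event label, "S" (speciation) or "D" (duplication). This illustrates the
definitions only; it is not OMA's inference algorithm.
"""
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity-based equality/hashing; avoids recursive comparisons
class Node:
    name: str
    event: Optional[str] = None  # "S" or "D" for internal nodes, None for genes (leaves)
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, *kids: "Node") -> "Node":
        """Attach child nodes and return self so trees can be built inline."""
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def ancestors(node: Optional[Node]) -> List[Node]:
    """Path from a node up to the root, the node itself included."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def lca(a: Node, b: Node) -> Node:
    """Lowest common ancestor: the first ancestor of b that is also an ancestor of a."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("genes are not in the same tree")


def relationship(a: Node, b: Node) -> str:
    """'orthologs' if the genes diverged by speciation, 'paralogs' if by duplication."""
    event = lca(a, b).event
    if event == "S":
        return "orthologs"
    if event == "D":
        return "paralogs"
    raise ValueError("lowest common ancestor has no event label")


def example_tree() -> Dict[str, Node]:
    """Hypothetical family: a duplication (copies A and B) predates the human/mouse split."""
    genes = {n: Node(n) for n in ("human_A", "mouse_A", "human_B", "mouse_B")}
    spec_a = Node("spec_A", "S").add(genes["human_A"], genes["mouse_A"])
    spec_b = Node("spec_B", "S").add(genes["human_B"], genes["mouse_B"])
    Node("dup", "D").add(spec_a, spec_b)  # root; kept alive via the parent links
    return genes


if __name__ == "__main__":
    g = example_tree()
    print(relationship(g["human_A"], g["mouse_A"]))  # orthologs
    print(relationship(g["human_A"], g["human_B"]))  # paralogs
    print(relationship(g["human_A"], g["mouse_B"]))  # paralogs
```

In this hypothetical family, human_A and mouse_A come out as orthologs, while human_A and human_B (or human_A and mouse_B) come out as paralogs, because the duplication that created copies A and B predates the human/mouse speciation.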
So far, this collection contains four papers: <list list-type="bullet"> <list-item> Identifying orthologs with OMA: A primer ( <italic toggle="yes">Software Tool article</italic>) <xref ref-type="bibr" rid="ref-3">3</xref> </list-item> <list-item> How to build phylogenetic species trees with OMA ( <italic toggle="yes">Method article</italic>) <xref ref-type="bibr" rid="ref-4">4</xref> </list-item> <list-item> A hands-on introduction to querying evolutionary relationships across multiple data sources using SPARQL ( <italic toggle="yes">Method article</italic>) <xref ref-type="bibr" rid="ref-5">5</xref> </list-item> <list-item> Expanding the Orthologous Matrix (OMA) programmatic interfaces: REST API and the OmaDB packages for R and Python ( <italic toggle="yes">Software Tool article</italic>) <xref ref-type="bibr" rid="ref-6">6</xref> </list-item> </list> All four articles use publicly available software, and we provide scripts, code snippets, practical examples, and plenty of explanation to facilitate the use of OMA in users’ own analyses. We have several more tutorials planned and aim to keep adding resources to this collection to help our users. Furthermore, to provide real-world examples of how OMA can be used, we welcome research articles, commentaries, conference posters or slides, and any other published work that uses OMA for inclusion in this collection. I hope that the OMA Collection will prove a valuable resource for making the most of genomic data. </sec> <sec> <title>Data availability

No data is associated with this article.

1. Altenhoff, Glover, Train: The OMA orthology database in 2018: retrieving evolutionary relationships among all domains of life through richer web and programmatic interfaces. Nucleic Acids Res. 2018;46(D1):D477–85. PMID: 29106550; PMCID: PMC5753216; doi: 10.1093/nar/gkx1019

2. Altenhoff, Levy, Zarowiecki: OMA standalone: orthology inference among public and custom genomes and transcriptomes. Genome Res. 2019;29(7):1152–63. PMID: 31235654; PMCID: PMC6633268; doi: 10.1101/gr.243212.118

3. Zahn-Zabal, Dessimoz, Glover: Identifying orthologs with OMA: A primer [version 1; peer review: 2 approved]. F1000Res. 2020;9:27. PMID: 32089838; PMCID: PMC7014581; doi: 10.12688/f1000research.21508.1

4. Dylus, Nevers, Altenhoff: How to build phylogenetic species trees with OMA [version 1; peer review: awaiting peer review]. F1000Res. 2020;9:511. doi: 10.12688/f1000research.23790.1

5. Sima, Dessimoz, Stockinger: A hands-on introduction to querying evolutionary relationships across multiple data sources using SPARQL [version 1; peer review: 2 approved with reservations]. F1000Res. 2019;8:1822. doi: 10.12688/f1000research.21027.1

6. Kaleb, Vesztrocy, Altenhoff: Expanding the Orthologous Matrix (OMA) programmatic interfaces: REST API and the OmaDB packages for R and Python [version 2; peer review: 2 approved]. F1000Res. 2019;8:42. PMID: 31001419; PMCID: PMC6464060; doi: 10.12688/f1000research.17548.2