Machine learning algorithms: their applications in plant omics and agronomic traits’ improvement

Itunuoluwa Isewon; Oluwabukola Apata; Fesobi Oluwamuyiwa; Olufemi Aromolaran; Jelili Oyelade

doi:10.12688/f1000research.125425.1

Review

[version 1; peer review: 3 not approved]

Itunuoluwa Isewon ^1-3, Oluwabukola Apata^1,2, Fesobi Oluwamuyiwa^1,2, Olufemi Aromolaran^1,3, Jelili Oyelade^1-3

PUBLISHED 04 Nov 2022

¹ Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, 112233, Nigeria
² Covenant Applied Informatics and Communication African Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, Nigeria
³ Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria

Itunuoluwa Isewon
Roles: Conceptualization, Formal Analysis, Funding Acquisition, Investigation, Methodology, Project Administration, Writing – Original Draft Preparation, Writing – Review & Editing

Oluwabukola Apata
Roles: Formal Analysis, Investigation, Methodology, Visualization, Writing – Original Draft Preparation

Fesobi Oluwamuyiwa
Roles: Data Curation, Methodology, Software, Writing – Original Draft Preparation

Olufemi Aromolaran
Roles: Data Curation, Formal Analysis, Software, Writing – Review & Editing

Jelili Oyelade
Roles: Conceptualization, Funding Acquisition, Resources, Supervision, Writing – Review & Editing

This article is included in the Plant Science gateway.

This article is included in the Artificial Intelligence and Machine Learning gateway.

This article is included in the Plant Computational and Quantitative Genomics collection.

Abstract

Agronomic traits of plants, especially those of economic or aesthetic importance, are threatened by climatic and environmental factors such as climate change and biotic and abiotic stresses. These threats are now being mitigated through the analysis of omics data from genomics, transcriptomics, proteomics, metabolomics, and phenomics. The emergence of high-throughput omics technology has led to an avalanche of plant omics data. Plant research demands novel analytical paradigms to extract and harness large plant omics data for plant improvement effectively and efficiently. Machine learning algorithms are well-suited analytical and computational approaches for the integrative analysis of large, unstructured, heterogeneous datasets. This study presents an overview of omics approaches to improve plant agronomic traits and of crucial curated plant genomic data sources. Furthermore, we summarize machine learning algorithms and software tools/programming packages used in plant omics research. Lastly, we discuss advancements in the application of machine learning algorithms to improving agronomic traits of economically important plants. Extensive application of machine learning would advance plant omics studies. These advancements would consequently help agricultural scientists improve the quality, yield, and tolerance of economically important plants against abiotic and biotic stresses and other plant health-threatening issues.

Keywords

Agronomic traits, machine learning, multi-omics, plant improvement

Corresponding author: Itunuoluwa Isewon

Competing interests: No competing interests were disclosed.

Grant information: This work was supported by funding from the World Bank awarded to Covenant Applied Informatics, Communication Africa Centre of Excellence (CApIC-ACE) through the ACE Impact Project (2019 – 2024) and Covenant University Center for Research, Innovation and Discovery (CUCRID).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2022 Isewon I et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Isewon I, Apata O, Oluwamuyiwa F et al. Machine learning algorithms: their applications in plant omics and agronomic traits’ improvement [version 1; peer review: 3 not approved]. F1000Research 2022, 11:1256 (https://doi.org/10.12688/f1000research.125425.1) First published: 04 Nov 2022, 11:1256 (https://doi.org/10.12688/f1000research.125425.1) Latest published: 04 Nov 2022, 11:1256 (https://doi.org/10.12688/f1000research.125425.1)

1. Introduction

The global agricultural system is threatened by ecological events such as climate change and other environmental stresses.¹,² These events affect the yield, stability, and quality of production of economically important plants, such as medicinal plants, fruit crops, food crops, cereal or grain crops, and legume seed crops or pulses.¹,³ These challenges are being addressed by omics approaches through numerous unconventional improvement methodologies.⁴,⁵ Omics approaches involve analysis of the genome sequence and of the other macromolecules generated from encoded genomic information. However, the use of omics-derived knowledge and technologies in plant improvement strategies remains limited and difficult. Other drawbacks of omics technologies include a lack of data integration and of effective phenotype-genotype correlation strategies. As a result, promoting the integration of computational biology and plant genomics to assist plant development is critical.⁶,⁷ This paper gives an overview of omics methodologies for improving plant agronomic traits as well as essential curated plant genomic data sources. We discuss the bioinformatics software, tools, and packages that are utilized in omics-based plant improvement research. We also examine how machine learning algorithms have been used to improve agronomic traits of commercially significant plants, their major contributions, and their future outlook in plant omics and agronomics.

1.1 Plant genome sequences and bioinformatics resources

Genome sequencing was made possible by the advent of sequencing technologies.⁸,⁹ Complete sequencing of a plant genome was first demonstrated for the model plant Arabidopsis (Arabidopsis thaliana)¹⁰,¹¹ and afterward for rice (Oryza sativa). Subsequently, the whole genomes of over 250 species across the plant kingdom have been sequenced, including bryophytes, pteridophytes, gymnosperms, and angiosperms¹²,¹³ (Figure 1). Angiosperms account for 94% of the sequenced species, most of which are economically important plants or their wild relatives (Figure 2). Food crops like rice, wheat, beans, oat, maize, and soybean are among the sequenced plants, as are ornamental plants like orchid and hibiscus, industrial plants like oilseed and hemp, and spices and herbs like garlic, ginger, turmeric, moringa, artemisia, and neem, which are known for their high therapeutic value.

Figure 1. Published plant genome sequences from 2000 to date.

Most sequenced plants are angiosperms and are subdivided into three groups. Most of the sequenced angiosperms fall under rosids and asterids clades. Other sequenced angiosperms clades are grouped here as other dicots.

Figure 2. Percentage of sequenced plants with their common names.

Angiosperms, comprising both monocots and dicots, account for 94% of sequenced plants. Pteridophytes, bryophytes, and gymnosperms account for 1%, 2%, and 3%, respectively.

A variety of databases have been created to provide access to plant genome datasets.¹⁴,¹¹ The genome database of the model plant A. thaliana, launched in 2001, was the first plant genome database.¹⁰,¹⁵ Subsequently, many databases and resources have been developed for plant genomes. The earliest genome databases were essentially archives of genome sequence data. These databases have since expanded into genome portals/hubs that combine different genomic data and web servers that offer online genomics analysis. The availability of annotated plant genome data has led to many discoveries about genome organization and gene function.¹⁶ These discoveries elucidate the complexity, evolution, and dynamics of plant genomes, contributing to a deeper understanding of plant biology.⁶,¹⁷ Available genomic information includes cis-elements, gene expression data, protein interactomes, and transcriptional and post-transcriptional data. These genome databases exist as single-species and comprehensive databases, as shown in Table 1.

Table 1. General plant genomics databases and tools.

Database	Description	Website
AgBase	A unified resource for functional analysis in agriculture	http://www.agbase.msstate.edu/
Ensembl Plants	A genome-centric portal for plant species	http://plants.ensembl.org
AutoSNPdb	An annotated single nucleotide polymorphism database for crop plants	http://autosnpdb.qfab.org.au/
BarleyBase	An expression profiling database for plant genomics	http://www.barleybase.org/
CR-EST	A resource for crop ESTs: sequence, classification, clustering, and annotation data of crop EST projects	http://pgrc.ipk-gatersleben.de/cr-est/
CSRDB	A small RNA integrated database and browser resource for cereals	http://sundarlab.ucdavis.edu/smrnas/
ChromDB	The Chromatin Database	http://www.chromdb.org/
DRASTIC-INSIGHTS	Querying information in a plant gene expression database	http://www.drastic.org.uk/
FLAGdb++	A Database for the Functional Analysis of the Arabidopsis Genome	http://urgv.evry.inra.fr/projects/FLAGdb++/HTML/index.shtml
GCP	The Generation Challenge Programme	http://www.generationcp.org/
GGT	Graphical GenoTypes - Software for visualization and analysis of genetic data	http://www.plantbreeding.wur.nl/
GabiPD	Integrative Plant Omics Database	http://www.gabipd.org/
GeneSeqer@PlantGDB	Gene structure prediction in plant genomes - Predict gene structures of plant genomes	http://www.plantgdb.org/PlantGDB-cgi/GeneSeqer/PlantGDBgs.cgi
GrainGenes	The genome database for small-grain crops	http://wheat.pw.usda.gov/index.shtml
Gramene	A resource for comparative grass genomics	http://www.gramene.org/
MIPS	Analysis and annotation of genome information	http://mips.gsf.de/
MetaCrop	A detailed database of crop plant metabolism	http://metacrop.ipk-gatersleben.de/
NIASGBdb	National Institute of Agrobiological Sciences Gene Bank DataBase	http://www.gene.affrc.go.jp/databases_en.php
P3DB	Plant Protein Phosphorylation Database	http://digbio.missouri.edu/p3db/
PHYTOPROT	A Database of Clusters of Plant Proteins	https://urgi.versailles.inra.fr/phytoprot/
PIP	A database of potential intron polymorphism markers	http://ibi.zju.edu.cn/pgl/pip/
PLACE	Plant cis-acting regulatory DNA elements	http://www.dna.affrc.go.jp/PLACE/
PLANT-PIs	A database for protease inhibitors and their genes in higher plants	http://plantpis.ba.itb.cnr.it/
PLecDom	Plant Lectin Domains server	http://www.nipgr.res.in/plecdom.html
PMRD	Plant MicroRNA Database	http://bioinformatics.cau.edu.cn/PMRD/
PODB	The Plant Organelles Database	http://podb.nibb.ac.jp/Organellome
POGs/PlantRBP	A resource for comparative genomics in plants	http://cas-pogs.uoregon.edu/#/
PREP Suite	Predictive RNA Editor for Plants	http://prep.unl.edu/
PRGDB	Plant Resistance Genes DataBase	http://prgdb.cbm.fvg.it/
PathoPlant®	A platform for microarray expression data to analyze co-regulated genes involved in plant defense responses	http://www.pathoplant.de/
Phytome	A platform for plant comparative genomics	http://www.phytome.org/
Plant MPSS databases	Signature-based transcriptional resources for analyses of mRNA and small RNA	http://mpss.udel.edu/
Plant snoRNA database	Search for comprehensive information on small nucleolar RNAs in plants	http://bioinf.scri.sari.ac.uk/cgi-bin/plant_snorna/home
PlantCARE	A database of plant cis-acting elements	http://bioinformatics.psb.ugent.be/webtools/plantcare/html/
PlantGDB	Plant Genome Database and Analysis tools	http://www.plantgdb.org/
PlnTFDB	Database of Plant Transcription Factor	http://www.softberry.com/berry.phtml?topic=plantprom&group=data&subgroup=plantprom
PlantTribes	A gene and gene family resource for comparative genomics in plants	http://fgp.huck.psu.edu/tribe.html
PlantTFDB	Plant Transcription Factor Databases	http://planttfdb.cbi.pku.edu.cn/
PmiRKB	Plant MicroRNA Knowledge Base - Find information about plant microRNAs	http://bis.zju.edu.cn/pmirkb/
SALAD	Surveyed contained motif ALignment diagram and the Associating Dendrogram	http://salad.dna.affrc.go.jp/salad/en/
The Adaptive Evolution Database (TAED)	A phylogeny-based comparative genomics tool	http://www.bioinfo.no/tools/TAED
The Plant DNA C-values Database	Search for information on plant DNA C-values and genome sizes.	http://data.kew.org/cvalues/homepage.html
The PlantsP Functional Genomics Database	Search for information on plant kinases and phosphatases	http://plantsp.sdsc.edu/
The PlantsT Functional Genomics Databases	Search for genes and proteins involved in plant membrane transportation	http://plantst.sdsc.edu/
The TIGR Plant Repeat Databases	A Collective Resource for the Identification of Repetitive Sequences in Plants	http://www.tigr.org/tdb/e2k1/plant.repeats/index.shtml
The TIGR Plant Transcript Assemblies database	Search for plant EST and cDNA sequences from this comprehensive collection	http://plantta.tigr.org/
TropGENE-DB	A Multi-Tropical Crop Information System	http://tropgenedb.cirad.fr/
UK CropNet	A collection of databases and bioinformatics resources for crop plant genomics	http://ukcrop.net/
openSputnik	A database for annotated expressed sequence tag information and comparative plant genomics analysis	http://sputnik.btk.fi/

1.2 Plant omics technologies

Plant studies involving the analysis of biological macromolecules are collectively termed plant omics. Omics is a broad field of study encompassing subfields like genomics, transcriptomics, proteomics, metabolomics, phenomics, glycomics, and lipidomics. Plant genomics involves studying the composition, organization, function, and structure of genetic material (DNA/RNA) and the molecular genetic networks of interactions in the plant genome.¹⁷,¹⁸ While genome structure and organization are studied in structural genomics,¹³,¹⁹ functional genomics investigates the functions, interactions, and regulation of genes and gene products.⁵,²⁰

Plant functional genomics is a goldmine for the improvement of agronomic traits. It incorporates other omics approaches like transcriptomics, proteomics, metabolomics, and phenomics¹⁶,²¹ (Figure 3). Other aspects of genomics are epigenomics, mutagenomics, and pangenomics.²² Epigenomics dissects epigenetic changes occurring at the genomic level, such as histone modifications, small RNAs, and DNA methylation. Mutagenomics is used to explore the mutation events that mediate modified genotypes and phenotypes in mutant species. Pangenomics studies the whole set of genomic sequences present in the entire population of a species. It also explores dispensable genomes that are individual specific or partially shared. Mutagenomics and pangenomics are new omics techniques in crop sciences.²²,²³

Figure 3. Important plant omics branches and their major techniques.

Representation of major omics approaches in plant molecular studies and the methods utilized in conventional analysis of plant omics datasets.

Plant transcriptomics involves investigating the control of plant metabolite production processes at the RNA level. Transcript-level gene expression control regulates the whole plant's development and growth.²⁴ Plant proteomics explores the structural and functional features of proteins in a living organism. It encompasses studies on plants’ typical morphological and physiological properties.¹⁶,²² The role of proteins in controlling the plant metabolic processes is also studied in plant proteomics, especially in medicinal plants.²⁵

Plant metabolomics involves profiling primary and secondary metabolites in plants.²⁶ Metabolic data are useful when developing metabolic correlation networks. These networks can aid comparative analysis of processes across cellular compartments, such as carbon and nitrogen transport and partitioning in plants.²⁷ In addition, the molecular and cellular regulation of different enzymatic processes can also be investigated.²⁵ Plant phenomics involves the systematic study of phenotypes, such as the analysis of plant composition, growth, and production. Such studies can be conducted both in controlled environments and in the field. Field phenomics involves the measurement of phenotypes that exist under both cultivated and natural conditions. Studies in controlled environments involve glasshouses, growth chambers, and other systems where growth conditions can be manipulated.²⁸ Over the last few decades, these multi-omics approaches have proven successful in plant research, including the improvement of agronomic traits. Agronomic traits are desirable genetic or phenotypic features of plants, such as quality traits, disease resistance, pest resistance, insecticide tolerance, and tolerance to temperature, drought, and other adverse environmental factors. Quality traits encompass morphological features like plant height and seed weight; physiological features like chlorophyll content and photosynthetic rate²⁹; economic features like improved crop yield, processing, and storage; and pharmaceutical and industrial features like the elimination of toxins and allergens, increased nutritional or dietary value, and increased medicinal value.³⁰ Recent research suggests that integrated multi-omics technologies can be better harnessed to improve genetic development, crop breeding science, plant stress resistance, and other agronomic traits.²²,²³
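As an illustration of the metabolic correlation networks mentioned above, the following minimal sketch computes pairwise Pearson correlations between metabolite abundance profiles and keeps the strongly correlated pairs as network edges. The metabolite names, the abundance values, and the 0.8 threshold are hypothetical choices made for the example; they are not taken from the cited studies.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_network(profiles, threshold=0.8):
    """Return edges (metabolite_a, metabolite_b, r) with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


# Hypothetical abundances of four metabolites across five samples.
profiles = {
    "sucrose": [1.0, 2.0, 3.0, 4.0, 5.0],
    "glucose": [2.1, 3.9, 6.2, 8.0, 9.8],
    "glutamine": [5.0, 4.1, 2.9, 2.0, 1.1],
    "proline": [1.0, 3.0, 1.5, 2.5, 1.0],
}

edges = correlation_network(profiles, threshold=0.8)
for a, b, r in edges:
    print(f"{a} -- {b}: r = {r}")
```

In practice, the threshold would normally be chosen with a significance test and a correction for multiple comparisons rather than fixed by hand.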

2. Major areas of application of omics technologies for agronomic traits improvement

2.1 Genomics-assisted pre-breeding

Genomics-assisted pre-breeding is a genetic manipulation strategy to improve agronomic traits of interest in plants at the DNA level.³¹ Genomics-assisted pre-breeding approaches contribute to the efficient development of disease- and climate-resilient crops.³²–³⁴ Before the genomic era, crop breeding across the globe relied on successive rounds of phenotypic selection and crossing to generate superior crop genotypes.³⁵ The availability of genome sequences has paved the way for identifying the genes and genetic variants associated with agronomic traits.³⁶,³⁷ It has also made it possible to assess genotype-level changes incurred during breeding.³⁸ Plant breeders have used genomics and bioinformatics to resolve agronomic variation at the gene level through quantitative trait locus (QTL) mapping³⁹–⁴¹ and genome-wide association studies (GWAS).⁴²,⁴³ For instance, recent studies have sought to develop rice varieties that are adaptable to multiple stresses and are disease and climate resilient using genomics-assisted breeding techniques such as QTL mapping, gene/marker-phenotype association, and phenotypic selection.⁴⁴–⁴⁶ Pea breeding projects have used genetic marker-trait associations to boost yield and market-preferred agronomic traits.⁴⁷,⁴⁸ Miedaner et al.⁴³ used high-density genotyping arrays and comprehensive phenotyping of the same populations across diverse conditions, locations, and seasons in genomic selection and population mapping to speed up breeding for disease resistance in maize, small-grain cereals, and wheat. Hu et al.³⁹ harnessed genomic selection (GS) and genome-wide genetic variants to avoid repeated phenotyping in breeding cycles. These studies point to new breeding techniques, such as speed breeding and genomic selection, as ways to boost genetic and trait improvement. However, the lack of robust phenotypic data limits the efficient use of available genomic information and technologies in genomics-assisted breeding.
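The marker-trait association idea behind QTL mapping and GWAS can be sketched as a single-marker scan: each marker's allele dosage (0, 1, or 2 copies) is regressed on the phenotype, and markers are ranked by the variance they explain. The sketch below is a deliberately simplified illustration with hypothetical marker names and values. It omits the corrections for population structure, kinship, and multiple testing that real association studies require.

```python
def marker_association(dosages, phenotypes):
    """Least-squares regression of phenotype on allele dosage (0, 1, 2).

    Returns (effect, r_squared): the estimated change in phenotype per
    additional allele copy and the fraction of phenotypic variance explained.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    s_gy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotypes))
    s_gg = sum((g - mean_g) ** 2 for g in dosages)
    s_yy = sum((y - mean_y) ** 2 for y in phenotypes)
    if s_gg == 0 or s_yy == 0:
        return 0.0, 0.0  # monomorphic marker or constant phenotype
    return s_gy / s_gg, (s_gy ** 2) / (s_gg * s_yy)


def scan(genotypes, phenotypes):
    """Rank markers by variance explained, strongest first."""
    results = [(name, *marker_association(d, phenotypes))
               for name, d in genotypes.items()]
    return sorted(results, key=lambda row: row[2], reverse=True)


# Hypothetical phenotypes (e.g. grain yield) for six lines and three markers.
phenotypes = [10.0, 12.0, 14.0, 10.5, 12.5, 13.5]
genotypes = {
    "snp_1": [0, 1, 2, 0, 1, 2],
    "snp_2": [1, 0, 1, 2, 0, 1],
    "snp_3": [0, 0, 0, 0, 0, 0],
}

ranked = scan(genotypes, phenotypes)
for name, effect, r2 in ranked:
    print(f"{name}: effect = {effect:.2f}, R^2 = {r2:.2f}")
```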

2.2 Evolution and crop diversity

Variation in gene content among individuals of the same species arises from genetic variation ranging from single-nucleotide polymorphisms to substantial structural variants (SVs). This variation provides the raw material on which human and natural selection act during evolution.⁴⁹,⁵⁰ Variation in the phenotypic and genetic characteristics of agricultural plants is referred to as crop diversity.³⁶ Plant genomics enhances the understanding of crop diversity at both the species and gene levels.⁵¹ According to recent research, a single reference genome is insufficient to capture a species' entire genetic diversity landscape. Pan-genome analysis provides a platform for evaluating a species' genetic diversity by looking at its whole genomic repertoire. Pan-genomic studies have shed new light on the landscape of diversity and improvement of major crops and model species such as Brachypodium distachyon,⁵² Brassica spp.,⁵³–⁵⁵ maize,⁵⁶ rice,⁵⁷,⁵⁸ soybean,⁵⁹ and wheat.⁶⁰ The evolution of plant diversity is correlated with, and to some extent predictable from, the heterogeneous biotic and abiotic environmental stresses induced by global climate change. These stresses, in turn, affect crop yield and crop-growing seasons.⁶¹ A study on natural plant populations shows that the organization and evolution of diversity in plant populations is nonrandom across genomic regions at both the molecular and organismal levels.⁶²,⁶³ Therefore, plants can evolve along climatic gradients, resulting in clinal adaptation. Hence, the breeding of climate-resilient crops can be facilitated by understanding the genomic basis of clinal adaptation in crop species.⁶⁴
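The pan-genome idea can be made concrete with a small sketch that partitions gene families into a core set (present in every accession) and a dispensable set (absent from at least one accession). The accession names and gene identifiers are hypothetical, and a real analysis would start from assembled genomes and orthology clustering rather than ready-made sets.

```python
def partition_pangenome(gene_sets):
    """Split a pan-genome into core and dispensable gene families.

    gene_sets maps each accession to the set of gene families it carries.
    """
    accessions = list(gene_sets.values())
    pan = set().union(*accessions)           # every family seen anywhere
    core = set.intersection(*accessions)     # families shared by all accessions
    dispensable = pan - core                 # families missing from at least one
    return pan, core, dispensable


# Hypothetical presence/absence data for three accessions.
gene_sets = {
    "accession_A": {"g1", "g2", "g3", "g4"},
    "accession_B": {"g1", "g2", "g4", "g5"},
    "accession_C": {"g1", "g2", "g3", "g6"},
}

pan, core, dispensable = partition_pangenome(gene_sets)
print("pan-genome size:", len(pan))
print("core:", sorted(core))
print("dispensable:", sorted(dispensable))
```

The gap between the pan-genome size and the gene count of any single accession is what a single reference genome fails to capture.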

2.3 Abiotic and biotic stresses

Biotic stresses in production agriculture are caused by living organisms such as insects, plant-parasitic nematodes, pathogens, and weeds.⁶⁵ Genomics approaches to biotic stress include RNA interference (RNAi) silencing and transgenesis. RNAi silencing is employed against viruses and some fungi, while transgenesis has been exploited to develop resistance against some fungal diseases, for example, Fusarium head blight.⁶⁶ Genome-wide identification and expression analysis in legume crops has also revealed the role of small RNA biogenesis mediators in the regulation of biotic stress responses.⁴²

Abiotic stresses, such as low or high temperatures, heavy metals, insufficient or excessive water, high salinity, and ultraviolet radiation, are hostile to developing plants and cause significant declines in crop yield worldwide.⁶⁷ According to Kumar et al.,²⁸ knowledge of plants' responses to abiotic stresses can be enhanced by integrating information generated from metabolomics and proteomics with genomics data. Sustaining yield in crops threatened by abiotic stresses is a significant challenge in breeding resilient crop varieties.⁶⁸ A study on Brassica oleracea showed that heat stress transcription factors are integral to the signal transduction pathways that function in response to environmental stresses and are suggested to contribute significantly to various stress responses.⁶² Heat stress transcription factor genes were identified through in silico analysis of B. oleracea. The identified genes may be exploited in developing crop varieties resilient to global climate change.¹,⁶²

2.4 Population studies

Population genomics is employed to study adaptation and speciation. Population genomics datasets are used in GWAS to detect the genes responsible for adaptive phenotypic variation in large plant population samples.³⁵,⁶⁹ For instance, Bamba et al. identified specific adaptation loci in a GWAS and unveiled the molecular basis of genetic trade-offs. The study also showed that ecological fitness could be predicted from the polygenic effects of several loci associated with local climate.³⁵ The diversity of medicinal plants is of exceptional interest because of their role in ethnomedicine. GWAS on adaptive genotypic and phenotypic variation provide a framework to assess the diversity of medicinal plant application across different cultures and to infer changes in plant use over time.⁷⁰,⁷¹ Other genomic approaches such as genomic selection, nested association mapping, genetic diversity analysis, and allele mining have been integrated into crop improvement programs to address the genetic issues associated with maize productivity and nutritional content.⁷²
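Prediction from the polygenic effects of several loci can be illustrated with an additive polygenic score, in which each individual's allele dosage at a locus is weighted by that locus's estimated effect and the products are summed. The sketch below assumes the effect sizes have already been estimated elsewhere. The locus names, effect sizes, and genotypes are hypothetical and are not taken from the cited study.

```python
def polygenic_score(dosages, effects):
    """Sum of effect size x allele dosage over the loci with known effects."""
    return sum(effects[locus] * dose
               for locus, dose in dosages.items()
               if locus in effects)


# Hypothetical per-allele effects of three climate-associated loci.
effects = {"locus_1": 0.5, "locus_2": -0.25, "locus_3": 1.0}

# Hypothetical allele dosages (0, 1, or 2) for three individuals.
individuals = {
    "plant_A": {"locus_1": 2, "locus_2": 1, "locus_3": 0},
    "plant_B": {"locus_1": 0, "locus_2": 2, "locus_3": 1},
    "plant_C": {"locus_1": 1, "locus_2": 0, "locus_3": 2},
}

scores = {name: polygenic_score(d, effects) for name, d in individuals.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print("ranking:", ranking)
```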

Plant omics studies have greatly helped our understanding and interpretation of plant responses to ecological influences and their contribution to key developmental processes important for crop yield and food quality. However, there are still some problems, such as a lack of data integration and robust techniques for phenotype-genotype correlation. Also, the use of omics-derived knowledge and tools in plant improvement strategies is limited and difficult. As a result, there is a pressing need to promote the integration of computational biology and plant genomics to benefit plant improvement.⁶,⁷

3. Applications of machine learning in plant omics and agronomics

3.1 Machine learning algorithms and resources

Machine learning (ML) is a field of computer science that uses algorithms to learn and capture the characteristics of target patterns in complex datasets.⁷³ ML algorithms are generally classified into the following categories: supervised, semi-supervised, unsupervised, reinforcement, and deep learning.⁷⁴ A supervised ML algorithm is trained using a labeled dataset. It learns to respond more accurately by comparing its output with the known labels of the training examples.⁶⁷,⁷³ Semi-supervised algorithms harness the potential of both supervised and unsupervised learning by combining labeled and unlabeled data. These algorithms are well adapted for model building and can be used for classification, regression, and prediction.⁷⁵ Unsupervised learning identifies previously unexplained patterns in the data to generate pattern rules. It is grounded in statistics and is applied to the problem of discovering hidden structure in unlabeled data.⁷,⁷⁴,⁷⁶
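The difference between supervised and unsupervised learning can be sketched on one small dataset. In the example below, a nearest-centroid classifier learns from labeled examples, while k-means groups the same measurements without ever seeing the labels. Both algorithms are simple stand-ins chosen for brevity, and the trait values (plant height, seed weight) and class names are hypothetical.

```python
def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fit_centroids(points, labels):
    """Supervised: compute one centroid per known label."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: mean_point(ps) for lab, ps in groups.items()}


def predict(centroids, point):
    """Assign the label of the nearest class centroid."""
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], point))


def kmeans(points, k, iterations=20):
    """Unsupervised: group points into k clusters without using labels."""
    centroids = list(points[:k])  # deterministic start, for illustration only
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(centroids[i], p))
            clusters[nearest].append(p)
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return [min(range(k), key=lambda i: sq_dist(centroids[i], p)) for p in points]


# Hypothetical (plant height in cm, seed weight in g) measurements.
points = [(90, 30), (95, 32), (92, 31), (60, 20), (62, 22), (58, 19)]
labels = ["tolerant", "tolerant", "tolerant", "sensitive", "sensitive", "sensitive"]

centroids = fit_centroids(points, labels)      # uses the labels
new_label = predict(centroids, (91, 30))
assignments = kmeans(points, k=2)              # ignores the labels
print(new_label, assignments)
```

The cluster indices returned by k-means carry no meaning of their own; relating them to biological classes requires outside information, which is exactly what the labels supply in the supervised case.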

Reinforcement learning is considered an intermediate form of learning because the algorithm is only given feedback indicating whether its output is correct or not.⁷⁵ Deep learning is built on artificial neural networks (ANNs). The algorithms extract higher-level features from the raw input using multiple layers of neural networks. Learning can be unsupervised, semi-supervised, or supervised.⁷³ Machine learning approaches provide unique techniques for integrating and analyzing omics data, allowing for the improvement of crops and other economically important plants. Some machine learning algorithms have been used to develop tools specifically for plant omics analysis. Table 2 highlights the existing machine learning tools for plant omics analysis. Machine learning algorithms have a broad range of applications in plant genomics. These algorithms play vital roles in genome assembly, iterative gene regulatory network inference, and the identification of true SNPs in polyploid plants.⁷⁷
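The layered structure that deep learning builds on can be sketched as a forward pass through a small fully connected network: each layer computes weighted sums of the previous layer's outputs and applies a nonlinearity. The weights below are set by hand purely for illustration; in a real model they would be learned from data, and the input features are hypothetical.

```python
from math import exp


def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(x, layers):
    """Apply the hidden layers with ReLU, then a single sigmoid output unit."""
    *hidden, (w_out, b_out) = layers
    for weights, biases in hidden:
        x = relu(dense(x, weights, biases))
    return sigmoid(dense(x, w_out, b_out)[0])


# Hand-set (untrained) weights: 3 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[0.5, -1.0, 0.25], [-1.0, 0.5, 0.0]], [0.0, 0.5]),
    ([[2.0, -1.0]], [-2.0]),
]

features = [1.0, 0.0, 2.0]  # hypothetical scaled input features
probability = forward(features, layers)
print(probability)
```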

Table 2. Existing machine learning tools for plant omics data analysis.

Application area	Developed tools	URL	Algorithms	Selected features
Plant mitochondrially localized protein prediction	MU-LOC⁷⁸	http://mu-loc.org/	SVM and DNN	Gene co-expression information, protein position weight matrix, amino acid composition, and N-terminal sequence information
Plant resistance protein (NBS-LRR) prediction	NBSPred⁷⁹	http://soilecology.biol.lu.se/nbs/	SVM	R-protein and non-R-protein sequence attributes such as sequence domains and compositional frequencies
Ribosomal proteins (RPs) prediction	RAMA⁸⁰	http://inctipp.bioagro.ufv.br:8080/Rama.	MLP, RF, and NB	Amino acid side chains attributes
Plant disease resistance proteins prediction	DRPPP⁸¹	http://14.139.240.55/NGS/download.php	SVM, MLP, and RF	genomic sequence (satellite DNAs)
Geminivirus gene and genus classification	Fangorn Forest (F2)⁸²	www.geminivirus.org:8080/geminivirusdw/discoveryGeminivirus.jsp.	SVM, MLP, and RF	genomic sequence (satellite DNAs)
Transcriptomes for stress responses in Arabidopsis	mIDNA⁸³	www.plantcell.org/content/26/2/520.short#def-8	RF with PSOL algorithm, SVM, and NN	patterns of 32 known stress-related gene expression traits and the complementary expression characteristic
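Several of the tools in Table 2 list amino acid composition or compositional frequencies among their features. As a rough sketch of what such a feature looks like, the function below converts a protein sequence into a 20-element vector of residue frequencies that a classifier could take as input. The example sequence is hypothetical, and the exact feature definitions used by the listed tools may differ.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return the relative frequency of each standard amino acid.

    Non-standard or ambiguous symbols (e.g. X) are ignored.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


# Hypothetical short peptide used only to show the output shape.
features = amino_acid_composition("MKKLA")
for aa, freq in zip(AMINO_ACIDS, features):
    if freq > 0:
        print(aa, freq)
```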

3.2 Precision plant breeding

Precision breeding is a genetic engineering technique that involves reproducing organisms of the same species together to preserve desirable characteristics and create a stronger hybrid.⁸⁴ Traditional statistical methods used in plant breeding strategies are often ineffective for plant data analysis because plant traits are non-deterministic and nonlinear, reflecting the effects of environment, genotype, and their interaction.⁸⁵ Machine learning has enabled effective plant phenotyping and data mining for patterns such as genotype-trait correlations.³⁹ It has also been successful in genomic selection. Genomic selection is a critical method for selecting plants with genetic gains of interest in plant breeding. Applying different ML algorithms to build GS models has produced robust and accurate predictions.⁸⁶,⁸⁷ Multilayer neural networks (NNs) have been used for genetic value prediction in plant breeding. NN models are efficient in predicting genetic value, regardless of the population size, heritability, or coefficient of variation. Thus, the ANN is promising for genetic value prediction in unbiased experiments.⁸⁸ Multilayer NNs have been used to select bean genotypes with highly stable phenotypes using 13 common bean genotypes evaluated between 2002 and 2006. The integration of this model in plant breeding has enabled precise genetic value prediction and selection.⁸⁹
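The core of a genomic selection model is a regression from genome-wide marker dosages to phenotype, fitted on a training population and then applied to unphenotyped candidates. The sketch below uses ridge regression fitted by gradient descent as a simple linear baseline. It is not the neural network models described above, and the marker data, penalty, and learning rate are hypothetical choices.

```python
def ridge_fit(X, y, lam=1.0, lr=0.1, epochs=2000):
    """Fit ridge-penalized marker effects by gradient descent on centered data."""
    n, p = len(X), len(X[0])
    mean_y = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - mean_y for v in y]
    beta = [0.0] * p
    for _ in range(epochs):
        residuals = [sum(b * x for b, x in zip(beta, row)) - t
                     for row, t in zip(Xc, yc)]
        grad = [(sum(r * row[j] for r, row in zip(residuals, Xc)) + lam * beta[j]) / n
                for j in range(p)]
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta, col_means, mean_y


def predict_breeding_value(model, markers):
    """Predicted genetic value of a candidate from its marker dosages."""
    beta, col_means, mean_y = model
    return mean_y + sum(b * (x - m) for b, x, m in zip(beta, markers, col_means))


# Hypothetical training population: 6 lines x 3 markers (dosage 0, 1, 2).
X = [[0, 0, 1], [1, 0, 0], [2, 1, 1], [0, 1, 0], [1, 2, 1], [2, 2, 0]]
y = [10.0, 12.0, 13.0, 9.0, 10.0, 12.0]  # phenotype, e.g. yield

model = ridge_fit(X, y, lam=1.0)
beta = model[0]
candidate_high = predict_breeding_value(model, [2, 0, 0])
candidate_low = predict_breeding_value(model, [0, 2, 0])
print([round(b, 3) for b in beta], round(candidate_high, 2), round(candidate_low, 2))
```

The penalty `lam` shrinks marker effects toward zero, which is what keeps the fit stable when markers greatly outnumber phenotyped lines.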

Deep learning has also achieved robust prediction accuracy for grain yield compared with the conventional linear statistical methods used in traditional plant breeding when analyzing multiple traits with mixed ordinal, continuous, and binary phenotype data. The predictive performance of univariate and multivariate deep learning models was assessed using a durum wheat (Triticum turgidum var. durum Desf.) dataset. The results suggest that deep learning has promising potential for accurate genomic prediction in plant breeding.⁹⁰ The flexibility of machine learning algorithms makes them a viable alternative to traditional parametric methods for predicting categorical and continuous responses in genomic selection.⁹¹

An ensemble of random forest (RF) and support vector machine (SVM) models was implemented to improve genotype-phenotype classification using manually derived root trait datasets. The combined model accurately identified the most distinguishing root traits and the corresponding cultivar differentiation. The model's performance demonstrates the potential of ML approaches in unbiased cultivar classification and trait selection.⁹²

Additionally, predictive models have aided the integration of additive and dominance effects in GWAS and have enhanced the prediction of complex agronomic traits in polyploid plant species. For instance, a study revealed the feasibility of genome-wide prediction of potato agronomic traits despite being an autotetraploid food crop. It also shows that GWA prediction is viable in selecting breeding values in elite germplasm with substantial non-additive genetic variance.⁹³
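Additive and dominance effects at a single locus can be illustrated with the classical genotype-mean definitions: the additive effect is half the difference between the two homozygote means, and the dominance effect is the deviation of the heterozygote mean from the homozygote midpoint. The sketch below assumes a diploid, biallelic locus for simplicity. An autotetraploid such as potato has five dosage classes and needs a richer parameterization. The phenotype values are hypothetical.

```python
def additive_dominance_effects(dosages, phenotypes):
    """Estimate additive (a) and dominance (d) effects at one diploid locus.

    dosages holds the number of copies (0, 1, 2) of the reference allele.
    a = (mean of dosage 2 - mean of dosage 0) / 2
    d = mean of dosage 1 - midpoint of the two homozygote means
    """
    groups = {0: [], 1: [], 2: []}
    for g, y in zip(dosages, phenotypes):
        groups[g].append(y)
    if any(len(values) == 0 for values in groups.values()):
        raise ValueError("all three genotype classes must be observed")
    means = {g: sum(values) / len(values) for g, values in groups.items()}
    midpoint = (means[0] + means[2]) / 2
    return (means[2] - means[0]) / 2, means[1] - midpoint


# Hypothetical phenotypes showing heterozygote advantage.
dosages = [0, 0, 1, 1, 2, 2]
phenotypes = [10.0, 12.0, 15.0, 17.0, 14.0, 16.0]

a, d = additive_dominance_effects(dosages, phenotypes)
print("additive effect:", a, "dominance effect:", d)
```

A model that includes only the additive term would miss the heterozygote deviation captured by `d`, which is the non-additive variance referred to above.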

3.3 Phenomics

Plant phenomics is the systematic study of plant phenotypes.²⁸ In recent years, plant field phenotyping has received considerable attention because of the possibility of high-throughput analysis of crop fields.⁹⁴ The application of machine learning methods and various technological developments in image analysis have improved the quantitative assessment of crop traits.⁹²,⁹⁴,⁹⁵ For instance, convolutional neural network (CNN)-based detection and analysis of wheat spikes, using images of wheat field trials captured over one planting season, achieved an average accuracy of 88 to 94% across diverse groups of test images. This accuracy suggests that CNNs are robust models for genome-based selection and prediction in plant breeding.⁹⁶ Also, the RF algorithm was used in plant image segmentation involving the acquisition and processing of several plant image samples.⁹⁷ The predictions made by the model enabled the discovery of various parameters relevant to plant growth.⁹⁴
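To give a sense of what image-based plant segmentation produces, the sketch below separates plant pixels from background with the excess green index (ExG = 2g - r - b on normalized color channels) and a fixed threshold. This is a much simpler, rule-based stand-in for the RF and CNN models discussed above, not a reimplementation of them. The tiny image and the 0.1 threshold are hypothetical.

```python
def excess_green(pixel):
    """Excess green index 2g - r - b on chromaticity-normalized RGB."""
    red, green, blue = pixel
    total = red + green + blue
    if total == 0:
        return 0.0  # black pixel carries no colour information
    r, g, b = red / total, green / total, blue / total
    return 2 * g - r - b


def segment_plant(image, threshold=0.1):
    """Return a boolean mask marking pixels whose ExG exceeds the threshold."""
    return [[excess_green(px) > threshold for px in row] for row in image]


def plant_cover(mask):
    """Fraction of pixels classified as plant."""
    flat = [value for row in mask for value in row]
    return sum(flat) / len(flat)


# Hypothetical 2 x 3 RGB image: green leaf pixels, brown soil, one black pixel.
image = [
    [(40, 180, 50), (120, 90, 60), (35, 160, 45)],
    [(110, 85, 55), (0, 0, 0), (50, 200, 60)],
]

mask = segment_plant(image)
cover = plant_cover(mask)
print(mask, cover)
```

Learned models replace the single hand-picked index and threshold with decision rules fitted to labeled pixels, which is what makes them more robust to lighting and background variation.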

3.4 Stress resilience phenotyping

ML has been exploited to identify favorable agronomic traits, including abiotic and biotic stress resistance. ML algorithms are integrated with conventional statistical methodologies to optimize the accuracy of plant stress prediction and detection.⁹⁶ For instance, 25,000 images of soybean leaflets exposed to various diseases and nutritional perturbations were used to develop a convolutional neural network (CNN) that can infer the image features of the disease types and nutrient deficiencies at high resolution. The prediction accuracy of the ML framework was very close to that of human expert diagnosis. The model can also be adapted to identify, classify, and quantify induced stresses in other plants.⁹⁵

Random forest has been used to predict metabolite and transcript markers for drought tolerance using datasets from field trials of experimentally drought-stressed plants. The low error rate of the model suggests that it could be considered an alternative for accurate prediction and identification of molecular markers.⁹⁸ RF was also used to identify suitable feature combinations for predicting phenotypic traits using data derived from various agro-management treatment experiments. This approach achieved optimal prediction accuracy and improved plant breeding strategies by enabling maximal allocation of stress management resources.⁹⁹ Sanz-Carbonell and colleagues used deep sequencing and computational approaches such as PCA and clustering analysis to infer the miRNA-mediated regulatory network of biotic and abiotic stress responses. The study used 24 miRNAs, all of which are known to alter expression significantly under stress conditions. The analysis led to the inference that target genes of miRNAs down-regulated under stress conditions contribute to the plant stress response, whereas up-regulated miRNAs control genes associated with growth and development.⁶⁷ Soybean fields were screened for tolerance to iron deficiency chlorosis (an abiotic stress in soybean) using linear discriminant analysis (LDA) and SVM. Phenotypic data obtained from the soybean fields were used to train the models and predict iron deficiency chlorosis stress. The ML application has helped evaluate stress severity in real time in the soybean sector.⁹⁵
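As an illustration of how LDA separates two classes, the sketch below implements a two-class Fisher discriminant on two features with equal class priors. It is a minimal example and not the pipeline of the cited study. The two features (a canopy greenness score and a yellow-pixel fraction) and all values are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean(points: Sequence[Point]) -> Point:
    """Mean vector of a set of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points: Sequence[Point], m: Point) -> Tuple[float, float, float]:
    """Entries (sxx, sxy, syy) of the scatter matrix around mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fit_lda(class0: Sequence[Point], class1: Sequence[Point]):
    """Fisher discriminant: w = Sw^-1 (m1 - m0), threshold at the midpoint of the means."""
    m0, m1 = mean(class0), mean(class1)
    a0, b0, c0 = scatter(class0, m0)
    a1, b1, c1 = scatter(class1, m1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1  # pooled within-class scatter [[a, b], [b, c]]
    det = a * c - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Inverse of [[a, b], [b, c]] is (1/det) * [[c, -b], [-b, a]].
    w = ((c * dx - b * dy) / det, (-b * dx + a * dy) / det)
    mid = ((m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2)
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold


def lda_predict(w: Point, threshold: float, x: Point) -> int:
    """Return 1 if x projects beyond the threshold (class 1), else 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0


# Hypothetical plot-level features: (greenness score, yellow-pixel fraction).
tolerant = [(0.80, 0.05), (0.75, 0.10), (0.85, 0.08), (0.78, 0.04)]      # class 0
susceptible = [(0.40, 0.45), (0.35, 0.50), (0.45, 0.38), (0.30, 0.55)]  # class 1

w, threshold = fit_lda(tolerant, susceptible)
print(lda_predict(w, threshold, (0.82, 0.06)), lda_predict(w, threshold, (0.38, 0.48)))
```

The discriminant projects each plot onto a single direction chosen to separate the class means relative to the within-class spread, so classification reduces to comparing one number with a threshold.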

3.5 Plant–pathogen interaction and diseases prediction

Plant diseases and pests pose a significant threat to agriculture. Early identification of plant diseases and pests would aid the development of effective treatment strategies and mitigate economic losses.¹⁰⁰ Diverse ML approaches for precise disease recognition and prediction have been implemented in plant populations.¹⁰¹,¹⁰² Neural networks (NNs) have achieved impressive results in plant disease prediction using image classification. A deep convolutional network was implemented in a leaf image classification model for disease recognition. The developed model showed high predictive performance in distinguishing plant leaves from their surroundings and distinguished 13 types of plant disease from healthy leaves.¹⁰³ In another approach, a heterogeneous ensemble of deep-learning-based neural network models was used to detect tomato plant diseases and pests using images collected on-site by imaging devices of varying resolutions. The ensemble model successfully handled image complexity in the plant's surrounding area and recognized nine different diseases and pests.¹⁰⁰,¹⁰⁴ Deep CNNs are therefore promising for the automatic classification and detection of disease traits from leaf images. In addition, CNNs have shown optimal performance in several studies of plant–pathogen interaction, pest, and disease recognition. These studies include prediction of pest and disease occurrence in cotton¹⁰⁵; recognition of rice plant diseases and pests¹⁰⁶; rice blast disease prediction¹⁰⁷,¹⁰⁸; and image-based potato tuber disease detection.¹⁰⁹ The CNN model is a high-performing method for detecting plant diseases, and it can be implemented and optimized for practical applications.

SVM has also been used for weather-based rice blast prediction and has proven suitable for plant disease forecasting with high predictive accuracy. An SVM-based web server for rice blast prediction, described as the first of its kind worldwide, was developed. Plant scientists and farmers have benefited from this tool, especially in their decision-making.¹¹⁰ For the pixel-wise identification and quantification of powdery mildew-diseased barley tissue, SVM classification was used to identify hypersensitive response spots in multispectral images of diseased barley plants. The SVM application enabled precise automatic identification of the interaction of barley with powdery mildew.¹¹¹

Recently, a data-driven ML approach named ApoplastP was proposed. An RF classifier is the base algorithm of ApoplastP, which has shown high performance in predicting protein localization to the plant apoplast. Differences in composition between apoplastic and intracellular plant proteins were previously unknown, and ApoplastP enabled these differences to be explored. The plant apoplast is integral to plant–pathogen interactions, transport, and intercellular signaling. Hence, integrating and optimizing machine learning algorithms for apoplastic localization prediction will aid functional studies and help predict whether an effector will localize to the apoplast or enter plant cells.¹¹²

RF has also been used to build an inter-species protein–protein interaction (PPI) prediction model using Arabidopsis–pathogen PPI data acquired both experimentally and from public PPI databases (UniProt). A critical assessment of model performance showed that integrating random forest with linear statistical methods, using sequence information and network attributes as model features, resulted in a substantial and robust improvement in performance.²⁴
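Sequence information has to be turned into numeric features before a classifier such as RF can use it. The sketch below shows one simple, widely used encoding (amino acid composition) applied to a host–pathogen protein pair. It is an assumption for illustration only, the cited study may have used different sequence features, and the example sequences are made up.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def composition(seq: str) -> List[float]:
    """Fraction of each standard amino acid in a protein sequence."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(host_seq: str, pathogen_seq: str) -> List[float]:
    """Concatenate the two composition vectors into one 40-value feature vector."""
    return composition(host_seq) + composition(pathogen_seq)


# Hypothetical short sequences standing in for a host and a pathogen protein.
features = pair_features("MKTAYIAKQR", "MSSLLVPGT")
print(len(features))
```

Each protein pair becomes one fixed-length row, so pairs with proteins of different lengths can be stacked into a single training matrix alongside any network-derived attributes.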

In addition, RF classification has been used to exploit the potential of protein biomarkers in precision breeding, using biomarker assays generated for 104 potato (Solanum tuberosum) peptides. These peptides were selected using diagonal linear discriminant analysis, bagging, principal component analysis (PCA), and SVM and were then classified with RF classifiers. The application of these ML algorithms to potato leaf secretome data helped identify Phytophthora infestans resistance in leaves and tubers, as well as its effect on plant yield.¹⁰⁹ Early disease detection enables farmers to use timely and targeted crop protection strategies. With ML, researchers have markedly improved the accuracy of object detection and recognition systems.

3.6 Challenges and future outlook

ML applications in plant genomics and agronomics have contributed substantially to the efficient breeding of crops with desirable agronomic traits, plant phenotyping, genetic trait prediction, and precise disease prediction in crops such as rice, soybean, maize, and beans.⁷²,⁹⁵,¹⁰⁶,¹¹⁰ However, several limitations remain. First, the black-box nature of some sophisticated ML algorithms inhibits interpretation. The plant research community is more interested in the biological implications of a prediction than in the accuracy of the predictive model. Hence, the output of a predictive model needs further processing and careful interpretation in light of established biological knowledge. Additionally, the dimensionality of omics datasets poses challenges such as multicollinearity, overfitting, and sparsity, which are difficult to avoid. Although contemporary machine learning methods and large sample sizes can partially alleviate these problems, model accuracy can be significantly enhanced by fine-tuning, augmentation, and optimization techniques.¹⁰⁷ Data integration from various sources is necessary for GS-assisted breeding and other trait improvement approaches.¹¹³ Simultaneous analysis of multiple omics datasets can advance our understanding of complex biological phenomena.⁷⁸,⁷⁹ Another challenge is the limited and inconsistent phenotypic information on plant–pathogen interactions. The ML models used in plant disease recognition can be extended by enriching plant disease databases with plant–pathogen interaction phenotype data. Developing more robust classification algorithms that cover a larger number of disease classes will improve plant disease recognition and forecasting.¹⁰³,¹⁰⁵,¹⁰⁶,¹⁰⁸ Finally, a comprehensive plant database must be constructed to facilitate comparative studies and promote research collaborations on critical plant science problems.

4. Conclusion

Machine learning has shown tremendous promise for the study of enormous, high-dimensional datasets, although its application in plant molecular studies is still limited. An in-depth understanding of ML models will stimulate their use in the analysis of plant biological data. As sequenced plant genome data continue to accumulate, ML will accelerate all fields of plant genomic research, including identifying genes associated with biotic and abiotic stress resistance and other genes with significant functions, understanding gene regulation mechanisms, exploring the genetic framework of plant genomes, and estimating breeding values. These advances would help agricultural researchers improve the quality and yield of crops with stronger tolerance to abiotic and biotic stress and to other threats to plant health.

Data availability statement

Extended data

OSF: Extended data for “Machine learning algorithms: their applications in plant omics and agronomic traits’ improvement”, https://doi.org/10.17605/OSF.IO/TE6GC.¹⁴

Files included:

Supplementary Table 1. Published sequenced plant genomes. Hundreds of plant genomes have been sequenced and published since 2000. The statistics for each genome are taken from the publication, despite several model plants having significant updates to genome assemblies and gene counts. NA, data not available in publication; Mb, megabases; kb, kilobases.

Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).

References

1. Mousavi-Derazmahalleh M, Bayer PE, Hane JK, et al.: Adapting legume crops to climate change using genomic approaches. Plant Cell Environ. 2019; 42: 6–19.
2. United Nations Convention to Combat Desertification (UNCCD): Sustainable Development Knowledge Platform. n.d. (accessed August 24, 2020).
3. Zhang Q, Xu M, Xia X, et al.: Crop genetics research in Asia: improving food security and nutrition. Theor. Appl. Genet. 2020; 133: 1339–1344.
4. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000; 408: 796–815.
5. Michael TP, VanBuren R: Progress, challenges and the future of crop genomes. Curr. Opin. Plant Biol. 2015; 24: 71–81.
6. Borevitz JO, Ecker JR: PLANT GENOMICS: The Third Wave. Annu. Rev. Genom. Hum. Genet. 2004; 5: 443–477.
7. Abadi S, Yan WX, Amar D, et al.: A machine learning approach for predicting CRISPR-Cas9 cleavage efficiencies and patterns underlying its mechanism of action. PLoS Comput. Biol. 2017; 13: e1005807.
8. Marri PR, Ye L, Jia Y, et al.: Advances in Sequencing and Resequencing in Crop Plants. Varshney RK, Pandey MK, Chitikineni A, editors. Plant Genetics and Molecular Biology. Cham: Springer International Publishing; 2018; pp. 11–35.
9. Akhtar MS, Alaraidh IA, Hakeem KR: Experimental Approaches for Genome Sequencing. Hakeem KR, Shaik NA, Banaganapalli B, et al., editors. Essentials of Bioinformatics, Volume III: In Silico Life Sciences: Agriculture. Cham: Springer International Publishing; 2019; pp. 159–165.
10. Huala E, Dickerman AW, Garcia-Hernandez M, et al.: The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant. Nucleic Acids Res. 2001; 29: 102–105.
11. Hu S, Li G, Yang J, et al.: Aquatic Plant Genomics: Advances, Applications, and Prospects. Int. J. Genomics. 2017; 2017: 1–9.
12. Pryer KM, Schneider H, Zimmer EA, et al.: Deciding among green plants for whole genome studies. Trends Plant Sci. 2002; 7: 550–554.
13. Cooper L, Jaiswal P: The Plant Ontology: A Tool for Plant Genomics. Edwards D, editor. Plant Bioinformatics. New York, NY: Springer New York; 2016; pp. 89–114.
14. Isewon IM, Apata OR, Oluwamuyiwa FA, et al.: Machine Learning Algorithms: Their Applications in Plant Omics and Agronomic Traits Improvement. 2022.
15. Chen F, Dong W, Zhang J, et al.: The Sequenced Angiosperm Genomes and Genome Databases. Front. Plant Sci. 2018; 9.
16. Rai A, Yamazaki M, Saito K: A new era in plant functional genomics. Curr. Opin. Syst. Biol. 2019; 15: 58–67.
17. Abdurakhmonov IY: Genomics Era for Plants and Crop Species – Advances Made and Needed Tasks Ahead. Abdurakhmonov IY, editor. Plant Genomics. InTech; 2016.
18. Shrivastava P, Kumar R: Soil salinity: A serious environmental issue and plant growth promoting bacteria as one of the tools for its alleviation. Saudi J. Biol. Sci. 2015; 22: 123–131.
19. Terryn N, Rouzé P, Montagu MV: Plant genomics. FEBS Lett. 1999; 452: 3–6.
20. Exposito-Alonso M, Drost H-G, Burbano HA, et al.: The Earth BioGenome project: opportunities and challenges for plant genomics and conservation. Plant J. 2020; 102: 222–229.
21. Holtorf H, Guitton M-C, Reski R: Plant functional genomics. Naturwissenschaften. 2002; 89: 235–249.
22. Yang Y, Saand MA, Huang L, et al.: Applications of Multi-Omics Technologies for Crop Improvement. Front. Plant Sci. 2021; 12: 1846.
23. Muthamilarasan M, Singh NK, Prasad M: Multi-omics approaches for strategic improvement of stress tolerance in underutilized crop species: A climate change perspective. Adv. Genet. 2019; 103: 1–38.
24. Yang S, Li H, He H, et al.: Critical assessment and performance improvement of plant-pathogen protein-protein interaction prediction methods. Brief. Bioinformatics. 2019; 20: 274–287.
25. Babar MM, Zaidi N-SS, Pothineni VR, et al.: Application of Bioinformatics and System Biology in Medicinal Plant Studies. Hakeem KR, Malik A, Vardar-Sukan F, et al., editors. Plant Bioinformatics: Decoding the Phyta. Cham: Springer International Publishing; 2017; pp. 375–393.
26. Wolfender J-L, Rudaz S, Hae Choi Y, et al.: Plant Metabolomics: From Holistic Data to Relevant Biomarkers. Curr. Med. Chem. 2013; 20: 1056–1090.
27. Kempinski C, Jiang Z, Bell S, et al.: Metabolic Engineering of Higher Plants and Algae for Isoprenoid Production. Schrader J, Bohlmann J, editors. Biotechnology of Isoprenoids. Cham: Springer International Publishing; 2015; pp. 161–199.
28. Kumar J, Pratap A, Kumar S: Plant Phenomics: An Overview. Kumar J, Pratap A, Kumar S, editors. Phenomics in Crop Plants: Trends, Options and Limitations. New Delhi: Springer India; 2015; pp. 1–10.
29. Tshikunde NM, Mashilo J, Shimelis H, et al.: Agronomic and Physiological Traits, and Associated Quantitative Trait Loci (QTL) Affecting Yield Response in Wheat (Triticum aestivum L.): A Review. Front. Plant Sci. 2019; 10.
30. Fan G, Liu X, Sun S, et al.: The Chromosome Level Genome and Genome-wide Association Study for the Agronomic Traits of Panax Notoginseng. IScience. 2020; 23: 101538.
31. Varshney RK, Bohra A, Yu J, et al.: Designing Future Crops: Genomics-Assisted Breeding Comes of Age. Trends Plant Sci. 2021; 26: 631–649.
32. Varshney RK, Singh VK, Kumar A, et al.: Can genomics deliver climate-change ready crops? Curr. Opin. Plant Biol. 2018; 45: 205–211.
33. Thudi M, Palakurthi R, Schnable JC, et al.: Genomic resources in plant breeding for sustainable agriculture. J. Plant Physiol. 2021; 257: 153351.
34. Zenda T, Liu S, Dong A, et al.: Advances in Cereal Crop Genomics for Resilience under Climate Change. Life (Basel, Switzerland). 2021; 11.
35. Bamba M, Kawaguchi YW, Tsuchimatsu T: Plant adaptation and speciation studied by population genomic approaches. Develop. Growth Differ. 2019; 61: 12–24.
36. Gedil M, Menkir A: An Integrated Molecular and Conventional Breeding Scheme for Enhancing Genetic Gain in Maize in Africa. Front. Plant Sci. 2019; 10: 1430.
37. Jha UC, Sharma KD, Nayyar H, et al.: Breeding and Genomics Interventions for Developing Ascochyta Blight Resistant Grain Legumes. Int. J. Mol. Sci. 2022; 23: 2217.
38. Kole C, Muthamilarasan M, Henry R, et al.: Application of genomics-assisted breeding for generation of climate resilient crops: progress and prospects. Front. Plant Sci. 2015; 6.
39. Hu H, Scheben A, Edwards D: Advances in Integrating Genomics and Bioinformatics in the Plant Breeding Pipeline. Agriculture. 2018; 8: 75.
40. Singh RK, Prasad A, Muthamilarasan M, et al.: Breeding and biotechnological interventions for trait improvement: status and prospects. Planta. 2020; 252: 54.
41. Jia B, Conner RL, Penner WC, et al.: Quantitative Trait Locus Mapping of Marsh Spot Disease Resistance in Cranberry Common Bean (Phaseolus vulgaris L.). Int. J. Mol. Sci. 2022; 23: 7639.
42. Garg V, Agarwal G, Pazhamala LT, et al.: Genome-Wide Identification, Characterization, and Expression Analysis of Small RNA Biogenesis Purveyors Reveal Their Role in Regulation of Biotic Stress Responses in Three Legume Crops. Front. Plant Sci. 2017; 8.
43. Barreto FZ, Rosa JRBF, Balsalobre TWA, et al.: A genome-wide association study identified loci for yield component traits in sugarcane (Saccharum spp.). PLoS One. 2019; 14: e0219843.
44. Sandhu N, Yadav S, Catolos M, et al.: Developing Climate-Resilient, Direct-Seeded, Adapted Multiple-Stress-Tolerant Rice Applying Genomics-Assisted Breeding. Front. Plant Sci. 2021; 12.
45. He Z, Xin Y, Wang C, et al.: Genomics-Assisted Improvement of Super High-Yield Hybrid Rice Variety “Super 1000” for Resistance to Bacterial Blight and Blast Diseases. Front. Plant Sci. 2022; 13: 881244.
46. Sandhu N, Singh J, Singh G, et al.: Development and validation of a novel core set of KASP markers for the traits improving grain yield and adaptability of rice under direct-seeded cultivation conditions. Genomics. 2022; 114: 110269.
47. Bohra A, Saxena KB, Varshney RK, et al.: Genomics-assisted breeding for pigeonpea improvement. Theor. Appl. Genet. 2020; 133: 1721–1737.
48. Mohanty JK, Jha UC, Dixit GP, et al.: Harnessing the hidden allelic diversity of wild Cicer to accelerate genomics-assisted chickpea crop improvement. Mol. Biol. Rep. 2022; 49: 5697–5715.
49. Miedaner T, Boeven ALG-C, Gaikpa DS, et al.: Genomics-Assisted Breeding for Quantitative Disease Resistances in Small-Grain Cereals and Maize. Int. J. Mol. Sci. 2020; 21: E9717.
50. Tao Y, Zhao X, Mace E, et al.: Exploring and Exploiting Pan-genomics for Crop Improvement. Mol. Plant. 2019; 12: 156–169.
51. Muthamilarasan M, Theriappan P, Prasad M: Recent advances in crop genomics for ensuring food security. Curr. Sci. 2013; 105: 155–158.
52. Gordon SP, Contreras-Moreira B, Woods DP, et al.: Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure. Nat. Commun. 2017; 8: 2184.
53. Lin K, Zhang N, Severing EI, et al.: Beyond genomic variation - comparison and functional annotation of three Brassica rapa genomes: a turnip, a rapid cycling and a Chinese cabbage. BMC Genomics. 2014; 15: 250.
54. Golicz AA, Bayer PE, Barker GC, et al.: The pangenome of an agronomically important crop plant Brassica oleracea. Nat. Commun. 2016; 7: 13390.
55. Hurgobin B, Golicz AA, Bayer PE, et al.: Homoeologous exchange is a major cause of gene presence/absence variation in the amphidiploid Brassica napus. Plant Biotechnol. J. 2018; 16: 1265–1274.
56. Hirsch CN, Foerster JM, Johnson JM, et al.: Insights into the Maize Pan-Genome and Pan-Transcriptome. Plant Cell. 2014; 26: 121–135.
57. Wang W, Mauleon R, Hu Z, et al.: Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature. 2018; 557: 43–49.
58. Zhao Q, Feng Q, Lu H, et al.: Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice. Nat. Genet. 2018; 50: 278–284.
59. Li Y, Zhou G, Ma J, et al.: De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits. Nat. Biotechnol. 2014; 32: 1045–1052.
60. Montenegro JD, Golicz AA, Bayer PE, et al.: The pangenome of hexaploid bread wheat. Plant J. 2017; 90: 1007–1013.
61. Neeta L, Golicz A, Singh M, et al.: Genome-wide analysis of the Hsf gene family in Brassica oleracea and a comparative analysis of the Hsf gene family in B. oleracea, B. rapa and B. napus. Funct. Integr. Genomics. 2019; 19: 515–531.
62. Ndjiondjop MN, Alachiotis N, Pavlidis P, et al.: Comparisons of molecular diversity indices, selective sweeps and population structure of African rice with its wild progenitor and Asian rice. Theor. Appl. Genet. 2019; 132: 1145–1158.
63. Nevo E: Ecological genomics of natural plant populations: the Israeli perspective. Methods Mol. Biol. 2009; 513: 321–344.
64. Olatoye MO, Hu Z, Maina F, et al.: Genomic Signatures of Adaptation to a Precipitation Gradient in Nigerian Sorghum. G3 (Bethesda). 2018; 8: 3269–3281.
65. Juliana P, Poland J, Huerta-Espino J, et al.: Improving grain yield, stress resilience and quality of bread wheat using large-scale genomics. Nat. Genet. 2019; 51: 1530–1539.
66. Ricroch AE: What Will Be the Benefits of Biotech Wheat for European Agriculture? Bhalla PL, Singh MB, editors. Wheat Biotechnology: Methods and Protocols. New York, NY: Springer; 2017; pp. 25–35.
67. Sanz-Carbonell A, Marques MC, Bustamante A, et al.: Inferring the regulatory network of the miRNA-mediated response to biotic and abiotic stress in melon. BMC Plant Biol. 2019; 19: 78.
68. Mochida K, Shinozaki K: Genomics and Bioinformatics Resources for Crop Improvement. Plant Cell Physiol. 2010; 51: 497–523.
69. Katuuramu DN, Branham SE, Levi A, et al.: Genome-Wide Association Analysis of Resistance to Pseudoperonospora cubensis in Citron Watermelon. Plant Dis. 2022; 106: 1952–1958.
70. Teixidor-Toneu I, Jordan FM, Hawkins JA: Comparative phylogenetic methods and the cultural evolution of medicinal plant use. Nat Plants. 2018; 4: 754–761.
71. Guidini R, Jahani M, Huang K, et al.: Genome Wide Association Mapping in Sunflower (Helianthus annuus L.) reveals Common Loci and Putative Candidate Genes for resistance to Diaporthe gulyae and D. helianthi causing Phomopsis Stem Canker. Plant Dis. 2022.
72. Chakradhar T, Hindu V, Reddy PS: Genomic-based-breeding tools for tropical maize improvement. Genetica. 2017; 145: 525–539.
73. Sun S, Wang C, Ding H, et al.: Machine learning and its applications in plant molecular studies. Brief. Funct. Genomics. 2020; 19: 40–48.
74. Alzubi J, Nayyar A, Kumar A: Machine Learning from Theory to Algorithms: An Overview. J. Phys. Conf. Ser. 2018; 1142: 012012.
75. Carbonell JG, Michalski RS, Mitchell TM: An Overview of Machine Learning. Michalski RS, Carbonell JG, Mitchell TM, editors. Machine Learning. San Francisco (CA): Morgan Kaufmann; 1983; pp. 3–23.
76. Matukumalli LK, Grefenstette JJ, Hyten DL, et al.: Application of machine learning in SNP discovery. BMC Bioinformatics. 2006; 7: 4.
77. Esposito S, Carputo D, Cardi T, et al.: Applications and Trends of Machine Learning in Genomics and Phenomics for Next-Generation Breeding. Plants. 2020; 9: 34.
78. Zhang N, Rao RSP, Salvato F, et al.: MU-LOC: A Machine-Learning Method for Predicting Mitochondrially Localized Proteins in Plants. Front. Plant Sci. 2018; 9.
79. Kushwaha SK, Chauhan P, Hedlund K, et al.: NBSPred: a support vector machine-based high-throughput pipeline for plant resistance protein NBSLRR prediction. Bioinformatics. 2016; 32: 1223–1225.
80. Carvalho TFM, Silva JCF, Calil IP, et al.: Rama: a machine learning approach for ribosomal protein prediction in plants. Sci. Rep. 2017; 7: 16273.
81. Pal T, Jaiswal V, Chauhan R: DRPPP: A machine learning based tool for prediction of disease resistance proteins in plants. Comput. Biol. Med. 2016; 78: 42–48.
82. Silva JC, Carvalho T, Fontes E, et al.: Fangorn Forest (F2): a machine learning approach to classify genes and genera in the family Geminiviridae. BMC Bioinformatics. 2017; 18: 431.
83. Ma C, Xin M, Feldmann KA, et al.: Machine Learning-Based Differential Network Analysis: A Study of Stress-Responsive Transcriptomes in Arabidopsis. Plant Cell. 2014; 26: 520–537.
84. van Dijk ADJ, Kootstra G, Kruijer W, et al.: Machine learning in plant science and plant breeding. IScience. 2021; 24: 101890.
85. Niazian M, Niedbała G: Machine Learning for Plant Breeding and Biotechnology. Agriculture. 2020; 10: 436.
86. Guzzetta G, Jurman G, Furlanello C: A machine learning pipeline for quantitative phenotype prediction from genotype data. BMC Bioinformatics. 2010; 11: S3.
87. Crossa J, Pérez-Rodríguez P, Cuevas J, et al.: Genomic Selection in Plant Breeding: Methods, Models, and Perspectives. Trends Plant Sci. 2017; 22: 961–975.
88. Peixoto LA, Bhering LL, Cruz CD: Artificial neural networks reveal efficiency in genetic value prediction. Genet. Mol. Res. 2015; 14: 6796–6807.
89. Corrêa AM, Teodoro PE, Gonçalves MC, et al.: Artificial intelligence in the selection of common bean genotypes with high phenotypic stability. Genet. Mol. Res. 2016; 15.
90. Montesinos-López OA, Montesinos-López A, Tuberosa R, et al.: Multi-Trait, Multi-Environment Genomic Prediction of Durum Wheat With Genomic Best Linear Unbiased Predictor and Deep Learning Methods. Front. Plant Sci. 2019; 10.
91. González-Camacho JM, Ornella L, Pérez-Rodríguez P, et al.: Applications of Machine Learning Methods to Genomic Selection in Breeding Wheat for Rust Resistance. Plant Genome. 2018; 11: 170104.
92. Zhao BG, Rewald B: Phenotyping: Using Machine Learning for Improved Pairwise Genotype Classification Based on Root Traits. Front. Plant Sci. 2016; 7.
93. Endelman JB, Carley CAS, Bethke PC, et al.: Genetic Variance Partitioning and Genome-Wide Prediction with Allele Dosage Information in Autotetraploid Potato. Genetics. 2018; 209: 77–87.
94. Selvaraj MG, Valderrama M, Guzman D, et al.: Machine learning for high-throughput field phenotyping and image processing provides insight into the association of above and below-ground traits in cassava (Manihot esculenta Crantz). Plant Methods. 2020; 16: 87.
95. Dobbels AA, Lorenz AJ: Soybean iron deficiency chlorosis high throughput phenotyping using an unmanned aircraft system. Plant Methods. 2019; 15: 97.
96. Hasan MM, Chopin JP, Laga H, et al.: Detection and analysis of wheat spikes using Convolutional Neural Networks. Plant Methods. 2018; 14: 100.
97. de Carvalho RRB, Marmolejo Cortes DF, Sousa MBE, et al.: Image-based phenotyping of cassava roots for diversity studies and carotenoids prediction. PLoS One. 2022; 17: e0263326.
98. Sprenger H, Erban A, Seddig S, et al.: Metabolite and transcript markers for the prediction of potato drought tolerance. Plant Biotechnol. J. 2018; 16: 939–950.
99. Parmley KA, Higgins RH, Ganapathysubramanian B, et al.: Machine Learning Approach for Prescriptive Plant Breeding. Sci. Rep. 2019; 9: 17132.
100. Fuentes A, Yoon S, Kim SC, et al.: A Robust Deep-Learning-Based Detector for Real-Time Tomato Plant Diseases and Pests Recognition. Sensors (Basel). 2017; 17.
101. Danilevicz MF, Gill M, Anderson R, et al.: Plant Genotype to Phenotype Prediction Using Machine Learning. Front. Genet. 2022; 13: 822173.
102. Simón D, Borsani O, Filippi CV: RFPDR: a random forest approach for plant disease resistance protein prediction. PeerJ. 2022; 10: e11683.
103. Sladojevic S, Arsenovic M, Anderla A, et al.: Deep Neural Networks Based Recognition of Plant Diseases by Leaf Image Classification. Comput. Intell. Neurosci. 2016; 2016: 1–11.
104. Wang Q, Qi F, Sun M, et al.: Identification of Tomato Disease Types and Detection of Infected Areas Based on Deep Convolutional Neural Networks and Object Detection Techniques. Comput. Intell. Neurosci. 2019; 2019: 9142715–9142753.
105. Xiao Q, Li W, Kai Y, et al.: Occurrence prediction of pests and diseases in cotton on the basis of weather factors by long short term memory network. BMC Bioinformatics. 2019; 20: 688.
106. Li D, Wang R, Xie C, et al.: A Recognition Method for Rice Plant Diseases and Pests Video Detection Based on Deep Convolutional Neural Network. Sensors (Basel). 2020; 20.
107. Liang W-J, Zhang H, Zhang G-F, et al.: Rice Blast Disease Recognition Using a Deep Convolutional Neural Network. Sci. Rep. 2019; 9: 2869.
108. Nettleton DF, Katsantonis D, Kalaitzidis A, et al.: Predicting rice blast disease: machine learning versus process-based models. BMC Bioinformatics. 2019; 20: 514.
109. Oppenheim D, Shani G, Erlich O, et al.: Using Deep Learning for Image-Based Potato Tuber Disease Detection. Phytopathology. 2019; 109: 1083–1087.
110. Kaundal R, Kapoor AS, Raghava GPS: Machine learning techniques in disease forecasting: a case study on rice blast prediction. BMC Bioinformatics. 2006; 7: 485.
111. Kuska MT, Behmann J, Großkinsky DK, et al.: Screening of Barley Resistance Against Powdery Mildew by Simultaneous High-Throughput Enzyme Activity Signature Profiling and Multispectral Imaging. Front. Plant Sci. 2018; 9: 1074.
112. Sperschneider J, Dodds PN, Singh KB, et al.: ApoplastP: prediction of effectors and plant proteins in the apoplast using machine learning. New Phytol. 2018; 217: 1764–1778.
113. Montesinos-López OA, Montesinos-López A, Pérez-Rodríguez P, et al.: A review of deep learning applications for genomic selection. BMC Genomics. 2021; 22: 19.

Competing interests

No competing interests were disclosed.

Grant information

This work was supported by funding from the World Bank awarded to the Covenant Applied Informatics and Communication African Centre of Excellence (CApIC-ACE) through the ACE Impact Project (2019 – 2024) and by the Covenant University Center for Research, Innovation and Discovery (CUCRID).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Article Versions (1)

version 1

Published: 04 Nov 2022, 11:1256

https://doi.org/10.12688/f1000research.125425.1

Copyright

© 2022 Isewon I et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

how to cite this article

Isewon I, Apata O, Oluwamuyiwa F et al. Machine learning algorithms: their applications in plant omics and agronomic traits’ improvement [version 1; peer review: 3 not approved]. F1000Research 2022, 11:1256 (https://doi.org/10.12688/f1000research.125425.1)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses

Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.

Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.

Reviewer Report 15 Dec 2023

Haixiao Hu, Plant Sciences, University of California Davis, Davis, California, USA

Not Approved

https://doi.org/10.5256/f1000research.137728.r220199

The authors endeavored to review the applications of machine learning algorithms in plant omics and agronomic trait improvement. In Section 1, the authors enumerated various genomics databases and identified five subfields within plant omics technologies. In Section 2, the utilization of plant omics technologies across four major research areas was discussed. In Section 3, the authors classified machine learning algorithms into five categories and highlighted their application across five research areas in Sections 3.2 to 3.5.

While acknowledging the importance of the four application research areas for omics technology and machine learning methods, I noted that the authors did not thoroughly describe how each omics technology, individually or collaboratively, contributes to enhancing agronomic traits in plant science. Similarly, regarding machine learning algorithms/methods, the authors omitted detailed discussions on theories, the gradual implementation of these computational technologies in each area, milestone works, and the most successful applications in major plant species. Therefore, in addition to merely listing the areas to which omics technologies and machine learning algorithms can be applied, it is strongly recommended that the authors delve extensively into each application area. This entails a thorough examination of a substantial number of high-quality papers to ensure a profound understanding of how omics and machine learning technologies have significantly impacted each research domain, who are the major contributors in each of these areas, etc. A comprehensive review should not only outline the application areas but also offer insights into how these technologies have reshaped the landscape of plant science within each specific area.

Minor comments:
Concerning Figure 1, it's recommended that the authors specify the source of the published plant genome sequences from 2000 to the present, indicating whether it comprises a full set or a subset of the total plant genomes released each year. Additionally, the inclusion of a y-axis is suggested for clarity.

Regarding Table 1, clarification is sought on the criteria used for selecting the databases listed. Furthermore, the absence of certain important species-specific databases like TAIR, maizeGDB, SoyBase, etc., raises questions about the selection criteria.

In Section 3.2, the statement "Genomic selection is a critical method used in selecting plant species with genetic gains of interest in plant breeding" is deemed not entirely accurate. It is suggested to revise it to "Genomic prediction is a critical method used to select the best individuals with the optimal combination of beneficial alleles in a breeding population."

Is the topic of the review discussed comprehensively in the context of the current literature?
No

Are all factual statements correct and adequately supported by citations?
Yes

Is the review written in accessible language?
Yes

Are the conclusions drawn appropriate in the context of the current research literature?
Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Bioinformatics, Quantitative Genetics, Statistics, Plant Breeding

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Reviewer Report 24 Nov 2023

Hao Tong, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany

Not Approved

https://doi.org/10.5256/f1000research.137728.r220205

The review paper prepared by Isewon et al. attempts to summarize how machine learning models are applied to agriculture. The manuscript is not well structured, and many sentences are unclear to me. There are many review papers on this topic with different particular focuses, and I did not see what novelty this general review paper contributes to the topic.

It would be better to begin the introduction with why machine learning models are needed in agriculture, instead of using nearly half of the manuscript to summarize the datasets.
Table 1 needs to be better organized to indicate the species and omics data types covered. Also, some links did not work when I clicked them and need to be double-checked for updates. The title refers to plant genomics databases, while some entries are not for genomic data but for proteomic data, for instance.
It would be more helpful if the authors summarized databases not only for genomic data but also those holding multiple omics data types.
Just giving definitions of the many omics data types is not very useful. It would be great to see how multiple omics data can be integrated in machine learning to solve agricultural questions.
The sentence “Quality traits encompass morphological features like plant height, seed weight” is incorrect. These traits are quantitative traits in the context of quantitative genetics.
Having given the definitions of supervised and unsupervised machine learning models, the authors should also state, in the example studies that follow, which are supervised and which are unsupervised.
In the context of genomic selection, the most commonly used models, rrBLUP or Bayes models, should be mentioned and even compared with machine learning models.
For the studies mentioned, it would be better to state the predictability in the main text.
Many statements are unclear to me. For instance, “the lack of robust phenotypic data limits the efficient utilization of available genomic information …”. Why is there a lack of robust phenotypic data: is it due to measurement error or environmental factors?
Many statements need a supporting reference. For instance, “deep learning generated a robust prediction accuracy in grain yield compared to the conventional linear statistical methods…”. The authors need to carefully check the entire manuscript.

Is the topic of the review discussed comprehensively in the context of the current literature?
No

Are all factual statements correct and adequately supported by citations?
No

Is the review written in accessible language?
Yes

Are the conclusions drawn appropriate in the context of the current research literature?
No

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Artificial intelligence in genetics, Systems biology modelling, Quantitative genetics modelling, Multi-omics big data analysis

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Reviewer Report 29 Sep 2023

Aalt-Jan van Dijk, Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands

Not Approved

https://doi.org/10.5256/f1000research.137728.r205805

A variety of reviews on omics data analysis and machine learning (ML) in the fields of plant science and plant breeding have recently been published. It is unclear where exactly this manuscript positions itself with respect to these existing reviews. In particular, the title seems to indicate a strong focus on ML; however, the introduction makes clear that there is a broader view, including review on data sources, “bioinformatics software, tools and packages”, in addition to reviewing the application of ML. This seems like a very broad topic to review, and maybe as a result, various sections of the manuscript are not satisfactory in my opinion:

The description of genomic information in 1.1 seems limited as well as somewhat confusing (“cis-elements, gene expression data, protein interactome, transcriptional and post-transcriptional data”). How can protein interactome be described as “genomic information”?
On the other hand, how about e.g. DNA methylation, histone modifications, transcription factor binding sites etc?
Similarly, the information in Table 1 seems unorganized as well as incomplete. It would help to provide some sub-division with headers of subsections; moreover, it is unclear what criteria are used to include/exclude databases. E.g. what about https://bar.utoronto.ca/thalemine/begin.do, http://plantismash.secondarymetabolites.org/, http://cisbp.ccbr.utoronto.ca/, protein interaction databases, the jaspar database, and various others. Also note that when I tested urls from the table, not all seemed reachable (e.g. chromdb.org, http://mips.gsf.de/).
Sometimes very arbitrary examples are chosen to refer to; e.g. for QTL mapping ref 41 – why would that one specifically be relevant, there would be numerous other QTL mapping studies that could be mentioned?
Figure 2 does not add much compared to Figure 1 – it could well be removed or provided as a (SI) table.
Some sections in the part on ML are quite disappointing. For example, Table 2 is very brief – there are definitely more examples of ML tools for plant omics data analysis and in fact even some tools mentioned in the manuscript itself (e.g. apoplastp) are not mentioned in the table.
The explanation on ML in section 3.1 is not always to the point. E.g. the sentence “Machine learning algorithms are generally classified into the following categories; supervised, semi-supervised, unsupervised, reinforcement, and deep learning” suggests that deep learning is an alternative type of ML compared to supervised, semi-supervised learning etc, which is not correct. Also the statement “Unsupervised learning is a learning approach focused on statistics…” is unclear. Similarly, Reinforcement Learning is not well explained.
Some of the examples mentioned in the text on ML are also not to the point. For example, ref 93 seems like a randomly chosen example of genomic prediction – there would be many studies that could be mentioned, why would this one be mentioned specifically as ML? Similarly for ref 67, this seems just like a straightforward analysis of omics data – it is again not clear why this is ML?
Relevant other examples of application of ML are missing, including e.g. https://www.pnas.org/doi/10.1073/pnas.1814551116, https://www.nature.com/articles/s41467-021-25893-w, https://onlinelibrary.wiley.com/doi/full/10.1111/tpj.13979
Various statements in the text are not clear (sometimes too strongly worded), e.g.:

In the abstract: “These threats are now being mitigated through the analyses of omics data like genomics, transcriptomics, proteomics, metabolomics, and phenomics.” – I don’t think threats are mitigated by data analysis itself.
Section 2.1: “Genome sequence availability has paved the way for identifying all genes and genetic variants associated with agronomics traits” – that seems too simple; just having a genome sequence does not mean we know which genes/variants are associated with traits of interest.
Section 3.6: “Finally, a comprehensive plant database must be constructed to facilitate comparative studies” – not clear what this would mean and why this has to be done.

Is the topic of the review discussed comprehensively in the context of the current literature?
No

Are all factual statements correct and adequately supported by citations?
No

Is the review written in accessible language?
Yes

Are the conclusions drawn appropriate in the context of the current research literature?
No

References

1. Washburn JD, Mejia-Guerra MK, Ramstein G, Kremling KA, et al.: Evolutionarily informed deep learning methods for predicting relative transcript abundance from DNA sequence. Proc Natl Acad Sci U S A. 2019; 116 (12): 5542-5549.
2. Cheng CY, Li Y, Varala K, Bubert J, et al.: Evolutionarily informed machine learning enhances the power of predictive gene-to-phenotype relationships. Nat Commun. 2021; 12 (1): 5627.
3. Demirci S, Peters SA, de Ridder D, van Dijk ADJ: DNA sequence and shape are predictive for meiotic crossovers throughout the plant kingdom. Plant J. 2018.

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Machine learning, plant systems biology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.


