Population structure of Salmonella enterica serotype Mbandaka reveals similar virulence potential irrespective of source and phylogenomic stratification

Background: Salmonella enterica serotype Mbandaka ( Salmonella ser. Mbandaka) is a multi-host adapted Non-typhoidal Salmonella (NTS) that can cause foodborne illnesses in human. Outbreaks of Salmonella ser. Mbandaka contributed to the economic stress caused by NTS due to hospitalizations. Whole genome sequencing (WGS)-based phylogenomic analysis facilitates better understanding of the genomic features that may expedite the foodborne spread of Salmonella ser. Mbandaka. Methods: In the present study, we define the population structure, antimicrobial resistance (AMR), and virulence profile of Salmonella ser. Mbandaka using WGS data of more than 400 isolates collected from different parts of the world. We validated the genotypic prediction of AMR and virulence phenotypically using an available set of representative isolates. Results: Phylogenetic analysis of Salmonella ser. Mbandaka using Bayesian approaches revealed clustering of the population into two major groups; however, clustering of these groups and their subgroups showed no pattern based on the host or geographical origin. Instead, we found a uniform virulence gene repertoire in all isolates. Phenotypic analysis on a representative set of isolates showed a similar trend in cell invasion behavior and adaptation to a low pH environment. Both genotypic and phenotypic analysis revealed the carriage of multidrug resistance (MDR) genes in Salmonella ser. Mbandaka. Conclusions: Overall, our results show that the presence of multidrug resistance along with adaptation to broad range of hosts and uniformity in the virulence potential, isolates of Salmonella ser. Mbandaka from any source could have the potential to cause foodborne outbreaks as well as AMR dissemination.


Introduction
For the last two decades, Salmonella has remained the major foodborne pathogen in the U.S. according to the Centers for Disease Control and Prevention (CDC) 1 . Non-typhoidal Salmonella (NTS) is estimated to cause 1 million foodborne illnesses annually 2 . Salmonella enterica subspecies enterica serotype Mbandaka (Salmonella ser. Mbandaka) has been identified by the CDC as an important outbreak-causing serotype of Salmonella 3 . Classified as one of the top ten Salmonella serotypes that cause human foodborne illnesses in Europe, a clonal population of Salmonella ser. Mbandaka (sequence type ST413) has been shown to be capable of surviving for many years and associated with animal feed, poultry, and human food 4 . Following its initial isolation from the Belgian Congo (Central Africa) in 1948, Salmonella ser. Mbandaka has been reported as a cause of human salmonellosis in several countries, making this serotype globally important for human and animal health [4][5][6] .
Human foodborne illnesses caused by Salmonella ser. Mbandaka have rarely been reported in the U.S.; nevertheless, three multistate outbreaks were reported by the CDC between 2013 and 2018 7-9 . Based on annually compiled data from several sources, Hayward et al. reported that cattle, poultry, and pigs are the major hosts of Salmonella ser. Mbandaka 10 . However, the sources of two of the abovementioned outbreaks were from food preparations 7,9 , indicating the spread of this multi-host adapted serotype by other means.
Although Salmonella ser. Mbandaka has been involved in several multi-serotype comparative studies, its epidemiological and evolutionary characteristics are not well understood. Previous studies on Salmonella ser. Mbandaka have focused on either a very small number of isolates 11 or isolates from a specific geographical location 4 . Comparative genomic analysis has been widely used as a powerful tool for elucidating genomic diversity across Salmonella serotypes as well as epidemiological investigation of outbreaks [12][13][14] . In the present study, we defined the population structure and associated genotypic features of Salmonella ser. Mbandaka in a global context using whole genome sequence (WGS)-based analysis of 403 Salmonella ser. Mbandaka genomes. We assessed the antimicrobial resistance (AMR) and virulence gene repertoire of this Salmonella serotype to understand the potential capability of this serovar to act as an important zoonotic pathogen and public health hazard. To verify the genotypic prediction of the AMR and virulence, we examined these characteristics phenotypically using an available set of isolates that represented the study population of 403 Salmonella ser. Mbandaka isolates.

Results
WGS-based analysis identifies ST413 as the most common Salmonella ser. Mbandaka sequence type To define the phylogenomic characteristics of Salmonella ser. Mbandaka, we used genome sequence data previously deposited in the National Center for Biotechnology Information (NCBI) sequence read archive (SRA) in conjunction with our newly sequenced genomes. To ascertain their serotype as Salmonella ser. Mbandaka with an antigenic formula of z10 e,n,z15, we performed in silico genoserotyping of all genomes using the web-accessible tool Salmonella In Silico Typing Resource (SISTR) 15 . Samples that showed a discrepancy in their serotype between the available metadata and our serotyping results were removed from further study. This resulted in a final set of 403 Salmonella ser. Mbandaka genomes, of which 66 were newly sequenced and the remaining 337 were accessed from the NCBI-SRA. Based on their isolation sources, we grouped them into 12 different categories (Table 1; Supplementary Table 1, Underlying data 16 ).
We performed core genome multilocus sequence typing (cgMLST) analysis of the genomes using the web tool 'SISTR' and identified five different sequence types (STs) in Salmonella ser. Mbandaka. ST413 was the most common type (93% of the isolates) followed by ST1602 (5% of the isolates) (Supplementary Table 2, Underlying data 16 ). ST413 showed a global prevalence, while others were limited to certain geographical areas. For example, ST1602, ST2238, and ST2444, were found only in European or Asian isolates, while ST2404 was found only in North American isolates (Supplementary Figure 1, Extended data 17 ). A similar trend was observed in the case of isolation sources of these STs. ST143 showed a wide distribution of isolation sources, while a narrower source distribution was found in other STs of this serotype (Supplementary Figure 1, Extended data 17 ).
Phylogenomic analysis of Salmonella ser. Mbandaka shows biphasic clustering in its population structure We hypothesized that the population structure of Salmonella ser. Mbandaka may have host-specific clades. To test our hypothesis, we constructed a core genome phylogeny and elucidated the pan-genome of the serotype. Figure 1 illustrates the phylogeny of the serotype as predicted by Bayesian evolutionary analysis sampling trees (BEAST). The results show that the isolates bifurcate into two major groups ( Figure 1A). Contrary to our hypothesis, no major host-specific clades were identified in this analysis. The constituent genomes of groups 1 and 2 did not show any clear pattern with respect to their isolation source, geographical origin, or date of isolation (Supplementary Figure 1, Extended data 17 ). We generated a pairwise single nucleotide polymorphism (SNP) distance matrix from the core gene alignment of 403 genomes. Interestingly, hierarchal clustering of these genomes based on the Pearson correlation showed the same biphasic clustering of the genomes (Figure 2), further supporting the core genome phylogeny. The pairwise SNP difference between the members within a group (either group 1 or group 2) differed by a median of 46 SNPs in both groups. However, the difference was large between group 1 and group 2 members, with a median of 293 SNPs ( Figure 2).
Based on the clustering pattern in the Markov chain Monte Carlo (MCMC) tree, we divided each group into different subgroups and determined the distribution of isolates according to their origin and isolation source. We found that all subgroups contained isolates from multiple sources ( Figure 1A); however, there was a closer association of food isolates with Asian countries and human isolates with Europe. These associations were found in both groups 1 and 2. Pan-genome analysis of Salmonella ser. Mbandaka revealed a core genome size of 4,044 genes and a pan-genome size of ~ 13,000 genes. The core genes that represented 30% of the pan-genome were found in ≥ 99% of the genomes analyzed; however, the cloud genes i.e., those present in only < 15% of the total genomes analyzed, represented the major share (65%) of the Salmonella ser. Mbandaka pan-genome ( Figure 1B).

Genotypic and phenotypic screening for antimicrobial resistance (AMR) genes
The presence of antimicrobial-resistant pathogenic bacteria in food has been addressed as a direct hazard to public health 18 .
We determined the AMR profile of Salmonella ser. Mbandaka at both the phenotypic and genomic levels.  Table 3, Underlying data 16 ). These genes were related to resistance against 12 classes of antibiotics. Most resistance was found against tetracycline (16.87% genomes), followed by aminoglycosides (13.89%), sulfonamide (8.4%), QAC (6.9%), trimethoprim (5.7%), and quinolone    , which confers resistance to the tetracycline group of antibiotics, was most abundant and found in 10.66% of the genomes. This was followed by two aminoglycoside resistance genes aph(6)-Id and aph(3'')-Ib, which had a percentage occurrence of 8.93% and 8.68%, respectively. We identified isolates AMR genes were predicted from the genomic assemblies and categorized into different groups based on the antibiotic class, as shown in the legend. A cladogram of the MCC tree rooted to outgroup (KY1 and ALT1) is shown at the center. Tree branches are colored to visualize the two major groups. The first circular layer immediately around the tree shows the presence (colored triangle) or absence (no color) of genes resistant to the corresponding antibiotic class. The next two circular layers represent the types of point mutations (pmrB T147B -blue, gyrA D87Y/S83Y -red) identified in Salmonella ser. Mbandaka genomes. Newly sequenced isolates (n = 66) formed a representative dataset in the context of their phylogenetic location. The phenotypic resistance profiles of these representative isolates against 12 different antibiotics are depicted in the next circular layer. This is followed by an outermost circular layer that shows the presence (dark) and absence (light) matrix of 26 plasmid replicons in the analyzed genome assemblies. The isolates that showed a match to both phenotypic and genotypic prediction of AMR are marked with a dark color for the tree leaf node. carrying resistance genes against quinolones, lincosamide, bleomycin, and rifampin. Thirty-six genomes (8.9%) were found to carry genes conferring resistance against ≥ 3 classes of antimicrobial agents. A total of five quinolone resistance genes (qnrB1, qnrB19, qnrB6, qnrB9, and qnrS1) were predicted in 14 isolates. Of the 403 isolates, only six (1.5%) carried genes conferring resistance to β-lactam antimicrobials. The blaCMY-2, blaTEM-1, and blaLAP-2 genes were found in three, two, and one genome(s), respectively. These isolates were distributed in the bovine, porcine, food, and human categories of sources and were from North America, Europe and Asia.
We used a representative set of 66 isolates to perform a phenotypic level antibiotic sensitivity assay using a panel of 12 antibiotics [Sensititre™ Gram negative plate (CMV3AGNF, ThermoFisher)]. More than 60% of the isolates showed resistance to at least one antibiotic ( Figure 3). Most resistance was observed against streptomycin (41 isolates, 62%), followed by tetracycline (six isolates, 9%). There were isolates that showed resistance (intermediate) against cefoxitin, chloramphenicol, and sulfonamide; however, no resistance was found against quinolones in these 66 tested isolates (Supplementary  Table 4, Underlying data 16 ).
Comparison of genotypic predictions with phenotypic susceptibility results found 100% sensitivity for genotypic prediction of phenotypic resistance to nine of 12 antimicrobials, with a specificity of ≥ 95% for all antimicrobials tested ( Table 2). Disagreement was found in 49 (6.2%) of a possible 792 resistance/ susceptibility combinations of 12 antimicrobials. The overall specificity was 99%, with that for streptomycin being 100%.
We subjected all 403 genomes of Salmonella ser. Mbandaka to screening of plasmid replicons and point mutations that may confer drug resistance. A total of 26 different plasmids were predicted in our analysis ( Figure 3). With the highest abundance, the IncHI2 and IncHI2A plasmids were predicted in 11.16% of the genomes (Supplementary Figure 4, Extended data 17 ; Supplementary Host cell invasion and survival in the acidic environment of phagosomes are two critical steps in Salmonella pathogenicity; therefore, we used these as surrogate measurements of  Table 10, Underlying data 16 ). All 66 tested isolates showed an increase in their growth after three and six hours of exposure to an acidic environment without any prior adaptation. Taken together, the genomic and phenotypic results show the potential virulence capability of Salmonella ser. Mbandaka isolates, irrespective of their isolation source, geographical location, and population structure.

Discussion
In Salmonella enterica species, host specificity and the ability to cause disease in different hosts are serotype-dependent 22 . Some serotypes are "host-restricted," that is, they are only able to infect one specific host 23 ; others such as Salmonella ser.
Mbandaka have a broad host range. In addition to humans and farm animals, the main sources of Salmonella ser. Mbandaka are dogs, wild birds, and fish. According to outbreak investigation reports, Salmonella ser. Mbandaka can originate from both live animals 8,24 and processed food 7,9 . Geographically, as well as host distribution wise, ST413 was the most prevalent sequence type in Salmonella ser. Mbandaka. Association of ST413 with sources such as animal feed, livestock, food, and humans aids its survival in the food chain and renders Salmonella ser. Mbandaka a serious threat for foodborne outbreaks. Unlike other serotypes 25 , we could not find any specific pattern in the Salmonella ser. Mbandaka population structure in relation to either geographical origin or isolation source, disproving our hypothesis that hostspecific clades may be emerging in this serotype ( Figure 1A). We presume that this was not due to sampling bias, since a similar clustering pattern was observed in a previous study 4 based on pulsed-field gel electrophoresis (PFGE) profiles of a smaller number (n = 70) of Salmonella ser. Mbandaka isolates from a geographically restricted area.
Antibiotic use in agricultural settings and animal-based food production provides major contributions to the overall problem of antibiotic resistance 26 . Due to the widespread use of antimicrobial agents in livestock farming, resistant Salmonella strains are more frequently found in animals used for food 22,27 . WGS is an excellent technique for the prediction of AMR due to its fast turnaround and affordability. It has been proven to be a successful method for genotypic AMR prediction in several gastrointestinal pathogens including Salmonella 28-31 . Our study also shows a high sensitivity and specificity for the comparison of genotypic prediction using WGS with phenotypic resistance to nine antimicrobials out of the 12 tested. There was a high discrepancy in the case of streptomycin resistance, since we found 38 isolates that were phenotypically resistant but genotypically susceptible. There could be two reasons for this. Firstly, we used a low minimum inhibitory concentration (MIC) breakpoint of ≥ 16, since there is not a precise clinical breakpoint for streptomycin susceptibility in Salmonella 32 . Secondly, there may exist unknown resistance mechanisms or resistance determinants, that may be absent in the reference database used for prediction 28 .
Plasmids are one of the main vehicles for dissemination of AMR genes. Resistance genes are assembled on plasmids as arrays by transposition and site-specific recombination mechanisms 33 . For example, the AMR genes blaTEM, tetA, tetB, and tetC have been found to be associated with plasmids in Salmonella ser. Typhimurium. Acquisition of plasmids is not a universal phenomenon in all Salmonella enterica subspecies enterica serotypes. There are many serotypes in this subspecies that do not possess any plasmids 34 . Above all, animals used for food and food products have been reported as major sources of AMR plasmids 35 . We predicted 26 different plasmids in 403 draft genomes, of which incompatibility group HI2 (IncHI2) plasmids were the most predominantly identified in our analysis (Figure 3; Supplementary Figure 4, Extended data 17 ; Supplementary Table  5, Underlying data 16 ). The presence of IncHI2 plasmids in antibiotic-resistant Salmonella, as well as their association with MDR in Salmonella, has been reported previously 36,37 . In addition, we found the presence of chromosomal mutations associated with quinolone resistance 38,39 (gyrA D87Y and S83Y) and resistance to colistin 40 (pmrB), an antibiotic that can be used against carbapenemase-producing Enterobacteriaceae as a last-resort treatment option 41 . This indicates the potential capability of these mobile genetic elements to spread AMR among Salmonella.  Table 7, Underlying data 16 ). In addition to virulence factors, we also made use of WGS-based genotypic prediction to elucidate the distribution of pathogenicity islands, large distinct regions on chromosomes that contain virulence genes 42 . Of the 23 previously reported SPIs in Salmonella 10,20 , seven were detected in our study isolates, including C63PI (Figure 4; Supplementary  16 ).

Prediction of virulence determinants in
Genome sequence data analysis Sequencing reads from the Salmonella isolates used in the present study (Salmonella ser. Mbandaka n = 403, Salmonella ser. Kentucky n = 1, and Salmonella ser. Altona n = 1) were assembled into contigs using SPAdes v.3.0 44 . To ensure that the assemblies were not greatly fragmented, those containing more than 500 contigs (minimum contig length: 200 bp) were removed from the analysis. In silico serotyping and multilocus sequence typing (MLST) of all genomes were performed using the SISTR webserver 15 . Annotation of genomes was performed using Prokka v1.12 45 . A genus-specific database for Salmonella was created using a manually annotated reference genome assembly of Salmonella enterica ser. Typhimurium str. LT2 (RefSeq assembly accession: GCF_000006945.2) and formatted to a Prokka database as described elsewhere (https://github.com/tseemann/prokka). Prokaryote pan-genome analysis pipeline Roary v.3.12.0 46 was used for pan-genome analysis and the generation of a multi-FASTA alignment of core genes from the isolates using the aligner PRANK 47 . The software SNP-sites v2.4.0 48 was used to format the core gene alignment output from Roary to remove gaps and N characters (suitable format for BEAST2 phylogeny). A pairwise SNP distance matrix was created using snp-dists v0.6.3.
To infer the phylogenetic relationship of Salmonella ser. Mbandaka isolates, we used the Bayesian maximum clade credibility approach. For this purpose, we used the BEAST2 (v.2.5.1) platform, which employs the MCMC 49,50 method for phylogenetic tree inference. We performed a model testing of the alignment of all SNPs using 'ModelTest-NG.v.0.1.5' and generated a maximum likelihood tree using a generalized time-reversible (GTR+I+G4) substitution model. This tree was constructed to infer the relationship between genetic divergence and time using Tempest 51 . The resulting phylogeny provided a weak temporal signal (R < 0.10); therefore, tip dates were not included in the final BEAST2 phylogeny. Multiple analyses using both relaxed clock (Log normal) and strict clock models were carried out in combination with coalescent constant population for priors. A MCMC chain length of 100 million generations with 10% preburnin and sampling at every 1000 generations were used for each analysis. The traces from each analysis were examined using Tracer v.1.7.1 52 and the strict clock coalescent constant population model, which showed a better convergence and > 100 effective sample sizes (ESSs) for many of the traces, was selected as a best-fit model for our dataset. Information from 100,000 sample trees produced by BEAST2, after removing 10% burnin, was summarized to a final target MCC tree using TreeAnnotator v2.5.1. We used two other Salmonella serotypes [Salmonella ser. Kentucky (KY1) and Salmonella ser. Altona (ALT1)] as outgroups, since said serotypes were identified as the nearest neighbors to Salmonella ser. Mbandaka by SISTR 15 .
Antimicrobial resistance (AMR) and virulence gene homologs were identified in assemblies using ABRicate. A minimum sequence identity of 95% and a coverage of 60% were used against the NCBI Bacterial Antimicrobial Resistance Reference Gene Database and Virulence Factor Database (VFDB) for AMR gene prediction and virulence gene profiling, respectively. Plasmid-Finder v.2.0 (Center for Genomic Epidemiology) and SPIFinder v.1.0 (Center for Genomic Epidemiology) were used for screening plasmid replicons and Salmonella pathogenicity islands (SPIs) in the genome assemblies 53 . Point mutations were identified using the CLC Genomics Workbench v.12 (Qiagen) after downloading the PointFinder database for Salmonella (accessed on December 2018 ). An open source alternative for finding point mutation is ResFinder 4.0 offered by the Center for Genomic Epidemiology.
Antibiotic sensitivity assay Susceptibility to 12 antimicrobial agents was determined for 66 Salmonella ser. Mbandaka isolates using Sensititre™ Gram negative plates (CMV3AGNF, ThermoFisher). Resistance to antimicrobial agents was determined in accordance with the Clinical and Laboratory Standards Institute (CLSI) and National Antimicrobial Resistance Monitoring System (NARMS) guidelines. Five beta lactams (amoxicillin/clavulanic acid, ampicillin, cefoxitin, ceftiofur, and ceftriaxone), two quinolones (ciprofloxacin and nalidixic acid), two aminoglycosides (gentamicin and streptomycin), trimethoprim/sulfamethoxazole, and chloramphenicol were the antibiotics included in the screening panel. Isolates reported as displaying intermediate resistance to any antimicrobials were also considered as resistant.
Caco2 cell culture and invasion assay Human colorectal adenocarcinoma (Caco2) cells obtained from ATCC were used for the cell invasion assay. Cells were seeded onto a 24-well plate at a density of 0.3 × 10 5 /well and were grown in DMEM medium containing glutamine (DMEM (1×) + glutamine; Gibco) supplemented with 10% (v/v) FBS and 1% antibiotic and antimycotic solution at 37°C with 5% CO 2 (v/v) for 48 hours. Overnight cultures of bacteria were used to infect Caco2 cell monolayers at a multiplicity of infection (MOI) of 1:100. The invasion assay described by Lee et al. 54 was used with some modifications. Briefly, bacteria were resuspended in cell culture medium (DMEM (1×) + glutamine; Gibco, with no supplementation) after washing with sterile PBS. Cells were subsequently incubated with this bacterial suspension for 2 hours at 37°C with 5% CO 2 . After infection, the media was removed, and the cells were washed with sterile PBS. Cells were then treated with 400 µL DMEM supplemented with the antibiotic gentamicin (100 µg/mL) to kill extracellular non-invading bacteria. Plates were incubated for 1 hour at 37°C with 5% CO 2 followed by washing with sterile PBS. A colony forming unit (CFU) count of intracellular bacteria was taken using serial dilution and plating. Cells were lysed with 1% Triton X-100 (Sigma) for 10 minutes to release intracellular bacteria, following which the cell lysate was serially diluted, plated on LB plates, and incubated overnight. Two independent assays, each in triplicate, were performed for each Salmonella ser. Mbandaka isolate (total n = 66).
pH sensitivity assay An overnight bacterial culture with an OD 600 adjusted to 0.4 was used to inoculate (20% v/v) low pH LB broth (pH = 4.0 ± 0.1, adjusted using 1M HCl). The experiment was performed in flat-bottomed, non-treated 96-well plates, in which triplicate wells were used for each bacterial sample. The OD 600 was measured using an ELISA plate reader (BioTek, ELx808) to assess the growth of bacteria over time. The initial OD 600 was taken immediately after inoculation (T 0 ) and then after three hours (T 1 ) and six hours (T 2 ) of incubation at 37°C in an aerobic environment. The assay was performed as two independent experiments for all 66 isolates. This project contains the following extended data: - Supplementary Figure 1. Defining the population structure of Salmonella ser. Mbandaka isolates.

Data availability
An MCC tree of 403 Salmonella ser. Mbandaka isolates along with two outgroup strains of Salmonella ser. Kentucky (KY1) and Salmonella ser. Altona (ALT1). The tree was rerooted to outgroup strains as shown in the circular cladogram. The tree is colored based on the two major groups identified in the phylogeny (Figure 1). Different circles around the tree represent cgMLST sequence type as well as different metadata attributes of the genomes. Heat map showing the predicted AMR genes (n = 40) in the Salmonella ser. Mbandaka isolates. The dark color represents the presence of a predicted gene at 90% sequence identity with a minimum coverage of 60%. The tree represents the MCC tree created using BEAST v.2.5.1. Figure  Simple bar graph showing the percentage of Salmonella ser. Mbandaka genomes harboring each predicted AMR gene. More than 10% of genomes carried the tet(B) gene that confers resistance to tetracycline antibiotics, followed by the aminoglycoside resistance genes aph(6)-ld and aph (3" The 66 newly sequenced Salmonella ser. Mbandaka isolates were used for the invasion assay in Caco2 cells. Bar plot showing the count of intracellular bacteria (log CFU/mL) retrieved after an incubation time of two hours under aerobic conditions followed by treatment with gentamicin to kill all extracellular bacteria. Cell lysates were serially diluted and plated on LB plates in duplicate. Figure  The ability of Salmonella ser. Mbandaka isolates to tolerate an acidic environment was tested using the 66 newly sequenced isolates as representatives. All tested isolates were able to withstand the immediate exposure to a low pH environment and showed an increase in growth after three hours (A) and six hours (B) of incubation at 37°C under aerobic conditions. Bar graph showing the OD 600 immediately after exposure to LB broth at pH 4.0 (T0) and after three hours (T1) and six hours (T2) of incubation. Figure