The identification of retro-DNAs in primate genomes as DNA transposons mobilizing via retrotransposition

Background Mobile elements (MEs) constitute a major portion of the genome in primates and other higher eukaryotes, and they play important role in genome evolution and gene function. MEs can be divided into two fundamentally different classes: DNA transposons which transpose in the genome in a “cut-and-paste” style, and retrotransposons which propagate in a “copy-and-paste” fashion via a process involving transcription and reverse-transcription. In primate genomes, DNA transposons are mostly dead, while many retrotransposons are still highly active. We report here the identification of a unique group of MEs, which we call “retro-DNAs”, for their combined characteristics of these two fundamentally different ME classes. Methods A comparative computational genomic approach was used to analyze the reference genome sequences of 10 primate species consisting of five apes, four monkeys, and marmoset. Results From our analysis, we identified a total of 1,750 retro-DNAs, representing 748 unique insertion events in the genomes of ten primate species including human. These retro-DNAs contain sequences of DNA transposons but lack the terminal inverted repeats (TIRs), the hallmark of DNA transposons. Instead, they show characteristics of retrotransposons, such as polyA tails, longer target-site duplications (TSDs), and the “TT/AAAA” insertion site motif, suggesting the use of the L1-based target- primed reverse transcription (TPRT) mechanism. At least 40% of these retro-DNAs locate into genic regions, presenting potentials for impacting gene function. More interestingly, some retro-DNAs, as well as their parent sites, show certain levels of expression, suggesting that they have the potential to create more retro-DNA copies in the present primate genomes. Conclusions Although small in number, the identification of these retro-DNAs reveals a new mean for propagating DNA transposons in primate genomes without active canonical DNA transposon activity. Our data also suggest that the TPRT machinery may transpose a wider variety of DNA sequences in the genomes.

sequences of DNA transposons but lack the terminal inverted repeats (TIRs), the hallmark of DNA transposons.Instead, they show characteristics of retrotransposons, such as polyA tails, longer targetsite duplications (TSDs), and the "TT/AAAA" insertion site motif, suggesting the use of the L1-based target-primed reverse transcription (TPRT) mechanism.At least 40% of these retro-DNAs locate into genic regions, presenting potentials for impacting gene function.More interestingly, some retro-DNAs, as well as their parent sites, show certain levels of expression, suggesting that they have the potential to create more retro-DNA copies in the present primate genomes.

Conclusions
Although small in number, the identification of these retro-DNAs reveals a new mean for propagating DNA transposons in primate genomes without active canonical DNA transposon activity.Our data also suggest that the TPRT machinery may transpose a wider variety of DNA sequences in the genomes.

Introduction
2][3][4] Despite being initially considered "junk" DNA, research from the last few decades has demonstrated that MEs make significant contributions to genome evolution and impact gene function via a variety of mechanisms.7][8][9][10][11][12][13][14][15][16][17] Based on the type of the transposition intermediate, MEs can be divided into two major classes: Class I, called "retrotransposons", that utilize an RNA-intermediate to transpose in a "copy-and-paste" fashion, and Class II termed "DNA transposons", that work directly at the DNA level to transpose in a "cut-and-paste" style.[19] Retrotransposons represent the majority of MEs in primate genomes, owing to their "copy-and-paste" style transposition, which results in direct copy number increase over time, conjugated with their continuing activity over the course of evolution up to the current time.In this process, a retrotransposon is first transcribed into RNA, which is then reversetranscribed into DNA as a new copy inserting into a new location in the genome. 20Retrotransposons can be divided into two major subtypes: the long terminal repeats (LTR) and non-LTR retrotransposons, with the former carrying two LTRs flanking the internal viral sequences, while the latter lack LTRs but mostly carry a polyA tail. 1 LTRs represent domesticated retroviruses from those infecting the germline cells of the ancestors and becoming integrated into the host genome, and for this reason, they are also called endogenous retrovirus (ERVs). 21,22In primate genomes, LTRs exist either as full-length LTRs and can be as long as 10kb, or solo-LTRs around 1kb in length as a product of post-insertion homology-based recombination between the two LTRs, which removes the long internal viral sequences.[25][26] The non-LTR retrotransposons, on the other hand, as the most successful MEs in primate genomes, contribute to more than 35% of the genomes and more than 80% of all MEs in these genomes with several millions of copies. 3,4From their sequence features, the currently known non-LTR MEs in primate genomes belong to four subclasses, including shortinterspersed nuclear elements (SINEs), long-interspersed nuclear elements (LINEs), SINE-R/VNTR/Alu (SVAs), and processed pseudogenes (i.e.8][29][30][31] Despite having many differences with regard to their length, consensus sequences, and coding capacity, all subclasses of non-LTR retrotransposons share the common properties of having a 3'-polyA tail and the use of target-prime reverse transcription (TPRT) mechanism for retrotransposition. 31,32Among them, LINE-1s (L1s) as the only subfamily of autonomous non-LTR retrotransposons in the primate genomes provide the TPRT machinery for all other non-autonomous non-LTR retrotransposons.[34][35][36] In contrast, DNA transposons, initially known as "jumping genes", move in genomes using a transposase encoded by autonomous copies. 1 Ten out of the twelve DNA transposon superfamilies are known to excise themselves out from their original locations as double-stranded DNA and move to new sites in the genome, which leads to no direct change in their copy numbers. 17,19Two of the superfamilies, Helitrons and Mavericks, transpose through non-canonical mechanisms by utilizing a single-stranded DNA as intermediate, which leads to a "copy-and-paste" style. 17,37,38The ten "cut-and-paste" DNA transposon superfamilies, as well as Mavericks, have TIRs and TSDs, while Helitrons is the only superfamily with neither TIRs nor TSDs, owing to its rolling-circle mechanism. 17,37In addition to these aforementioned DNA transposons, there is another group of DNA transposons named miniature inverted-repeat transposable element (MITEs) characterized by the presence of both TSDs and TIRs yet lacking the coding capacity for the transposase. 39By using DNA transposases

REVISED Amendments from Version 2
Major changes in this version include 1) as requested by the two reviewers, all statement of "a new type of MEs" was removed/changed, except in the end of the Discussion section where the phrase was used with the condition of having their ongoing activity verified by experiments; 2) a critical error associated with figure 7 and its citation has been fixed.In the prior version, Figure 7 missed panel A and was incorrectly cited as "Figure 6"; 3) as suggested by reviewer 2, based on data in Table 3 and Figure 4, new interpretations/discussions were added regarding the possible presence of 5' truncation as an additional characteristic of TPRT-based retrotransposition.
Any further responses from the reviewers can be found at the end of the article encoded by other autonomous DNA transposons, these non-autonomous, short (50-600bp) MITEs can transpose in the host genome. 17,40A transposons have been considered inactive in the current primate genomes and have received very little research attention.Lander et al. (2001) in their initial human genome analysis concluded that there was no evidence for DNA transposon activity during the past 50 My, 2 while a later study suggested that DNA transposons had been highly active during the early part of primate evolution till ~37 Mya. 19There has been no report for lineage-specific or species-specific DNA transposons in primate genomes.However, in our recent comparative analysis of species-specific MEs in eight primates from the Hominidae and the Cercopithecidae families, there were also a total of 2,405 DNA transposons identified to be species-specific in addition to the 228,450 species-specific retrotransposons. 36As part of efforts to understand the mechanism(s) underlying these species-specific DNA transposons, we performed further comparative analysis across ten primate genomes and identified a novel group of MEs with the sequences originated from DNA transposons, but showing some hallmarks of L1-based retrotransposons, which we called "retro-DNAs".

Overall profiles of DNA transposons and lineage-specific retro-DNAs in the ten primate genomes
To identify all retro-DNA events in the primate genomes, we first identified the diallelic DNA transposons (da-DNAs) that are defined as DNA transposons with both the insertion allele and pre-integration allele identifiable in one or more of these genomes.These DNA transposons are likely to be the results of relatively recent transposition events shown as having a low level of sequence divergence from their parent copies, which permits accurate identification of TSDs and TIRs.The starting lists of DNA transposons were based on the RepeatMasker annotation subjected to a consolidation process to ensure the accuracy in identifying DNA transposons with both insertion and pre-integration alleles as well as their TSDs. 3,36One main type of targets for integration in this case are the ME entries split by insertion of other MEs and non-ME sequences.As shown in Table 1, the number of DNA transposons in the primate genomes dropped ~18% on average after integration, leading to less variation in their numbers across genomes ranging from 324,288 in marmoset to 421,580 in chimpanzee, and averaging at 376,720 copies per genome verse 459,521 per genome before integration.These DNA transposons contributed to a total of ~98 Mbp or ~3.6% of these primate genomes on average (Table 1).Various factors could have contributed to the different DNA transposon numbers in these genomes, including, but not limited to, the differences in the versions of RepeatMasker and the ME reference sequences used for ME annotation, the quality of genome assemblies, and probably most importantly the different evolution history of the individual genomes.Using a multi-way comparative genomics approach modified from our previous analysis of human-specific MEs, 36 we identified a total of 271,085 da-DNAs in the 10 primate genomes (Table 1).Specifically, for each da-DNA, we require the presence of a pre-integration allele in at least one of the other nine genomes.As shown in Table 1, the number of da-DNAs varied from 23,923 in the orangutan genome to 34,901 in the marmoset, averaging at 27,109 for the 10 genomes.The largest number of da-DNAs in the marmoset was expected for its largest evolutionary distance from the remaining primate species.Notable differences were also seen between genomes with mutually closest evolutionary relationship among the 10 genomes, making these numbers directly comparable for the paired genomes.For example, between the human and chimpanzee genomes, the latter had >10% more da-DNAs than the former (28,273 versus 25,933), while between the two macaques, the rhesus genome had ~10% more than the crab-eating macaque genome (28,149 versus 26,218) (Table 1).In comparison, the species-specific non-LTR retrotransposons in the crab-eating macaque genome were less than 1/8 of that for the rhesus genome (3,039 versus 25,085), 3 indicating at least that the lower number of da-DNAs in rhesus genome was not due to genome sequence quality differences.
By composition in DNA transposon type, the majority of the da-DNAs belonged to the hAT and TcMar superfamilies with the hAT subfamilies (hAT-Charlie and hAT-Tip100) contributing to ~57% of da-DNAs and the hAT-Charlie subfamily alone contributing to ~50% of all da-DNAs in all genomes (Table S1, Figure 1A).The two TcMar families, TcMar-Tigger and TcMar-Mariner, contributed ~33% of da-DNAs, while the remaining families contributed to ~10% of da-DNAs.This composition pattern seems to be quite similar among all genomes, with the orangutan genome having a slightly lower portion from the hAT-Tip100 and TcMar-Tigger families but slightly more from the other families (Figure 1A, Table S1).In comparison, the family composition for retro-DNAs is shown to be more variable across the genomes, particular in the orangutan genome having lower portion for all 4 named subfamilies but much more from the other (unclassified) class (Figure 1B).

Retro-DNAs in the primate genomes possess non-LTR retrotransposon sequence characteristics
While analyzing the da-DNAs in detail for understanding the possible mechanisms involved, we came across an unusual case of a 201-bp Tigger7 DNA transposon from the TcMar-Tigger family located at chr4:146335052-146335253 of the human genome (GRCh38).It appears to be a human-specific ME for its absence in the orthologous region in the chimp genome (Figure 2A), and more interestingly, this DNA transposon insertion has a 14 bp TSD "AAGAGTCCTGGATC" that is much longer than TSDs for DNA transposons, and it has no identifiable TIR typical of a DNA transposon (Figure 2A).Furthermore, it has a 27 bp polyA tail at its 3'-end and a predicted polyadenylation signal "ATTAAA" before the polyA tail.All these features point to a non-LTR retrotransposon rather than a canonical Tigger7 DNA transposon, which is expected to have TIRs and 2 bp (TA) TSDs.We therefore named it as a "retro-DNA" for being a retrotransposonlike element derived from a DNA transposon sequence.
Following the identification of this retro-DNA, we searched the human genome and other primate genomes and identified more similar cases, as exampled in Figure 2B-D.For instance, a 446 bp Charlie1a fragment from the hAT-Charlie family was identified as a retro-DNA in the genome of three primates (green monkey, rhesus, and crab-eating macaque), which has TSDs in 13 bp long but no TIRs (Figure 2B).
By requiring the presence of longer TSDs (≥8 bp) and the absence of TIRs, we identified a total of 1,750 retro-DNA entries among all da-DNAs using a workflow shown in Figure 3.By classification, these retro-DNAs consist of 847, 478, 156, 74, and 195 entries from the hAT-Charlie, TcMar-Tigger, hAT-Tip100, TcMar-Mariner, and other families, respectively (Table 2).The composition pattern (Figure 1B) in general was similar to that of all da-DNAs (Figure 1A), indicating there is no strong bias for retro-DNA towards any particular subfamily among da-DNAs.However, at the genome level, the ratios of retro-DNAs in the orangutan genome from the hAT-Tip100 and TcMar-Tigger families were much lower, while that from the "other" families was much higher compared to other genomes (25% versus 10%) (Figure 1B).As seen in Table 2, the 1,750 retro-DNAs encompassed all 10 genomes and could be clustered into 748 unique retro-DNA insertion events based on their orthologous relationships.It is worth noting that our list of retro-DNAs may suffer a certain level of false negatives and false positives due to the uses of a set of criteria that might not be optimal and due to the challenges associated with the analysis of MEs and the deficiencies of the reference genome resources, especially for the non-human primates as discussed in our recent study. 3 sequence length, these 748 (after removing orthologous redundancy (Table 2)) retro-DNAs averaged at 209 bp (AE190 bp) in length, representing in all cases only part of the corresponding family consensus sequences (averaging at 21%) (Table 3).While the consensus sequences for DNA transposon families differ in length significantly, ranging from 380 bp for TcMar-Mariner to 1,506 bp for hAT-Tip100, the average length of retro-DNAs seems to be relatively more consistent across the families, ranging from 122 bp for TcMar-Mariner to 251 bp for TcMar-Tigger.Nevertheless, in general, the retro-DNAs from the longer families do have a longer average length (e.g.hAT-Tip100) than those from the shorter families, but at lower proportions of their consensus sequences than those with shorter consensus sequences (e.g.TcMar-Mariner) (Table 3), suggesting more chance for 5 0 truncation than the short ones as expected.This may be considered as another characteristics of the TPRT-driven retrotransposition.
Additionally, we examined whether there were any hotspots in these DNA transposon sequences as the source sequences of these retro-DNAs.By using the retro-DNA entries from the Tigger1 DNA transposon subfamily, which is the largest subfamily containing 41 non-redundant retro-DNAs, we generated a frequency plot to show the usage of the consensus sequences by the retro-DNAs.As illustrated in Figure 4, while all regions of the consensus sequence were covered by the 41 retro-DNAs, the frequency varied substantially from 2.4% to 29.3%, showing that a few regions of the consensus sequence (e.g.~1310-1440 bp and ~1840-2240 bp) were used more frequently than the rest of the regions.The overall lower representation for the 5 0 end of the consensus may suggest there might be a preference for the plus strand for using as retro-DNA at least for this family and presence of 5 0 truncation.
From the total 748 non-redundant retro-DNAs, we identified 176 entries carrying a potential polyA tail (Table S2).We speculate that the relatively low percentage (23.5%) of entries with a polyA tail might be partially due to polyA sequences being more prone to sequence divergence from random post-insertion mutations for their homopolymer nature.The complete list of the 748 non-redundant retro-DNA entries with their genomic coordinates in all applicable genomes is provided in Supplementary File 1.For these retro-DNA insertion events, we further examined the sequence motifs at the insertion sites and the TSD length distribution pattern.As shown in Figure 5A, a sequence motif of 'TT/AAAA', same as the motif for Alus, L1s, and SVAs (Figure 5B), 32,36,41 was observed, despite the signal being much weaker.This, nevertheless, serves as a strong indication of their use of the L1-based TPRT machinery. 33,34As further support, the TSD length distribution peaked at 8 bp (Figure 5C), similar to the second peak seen for the TSDs of human specific L1s, despite missing the major peak at 15 bp observed for the latter (Figure 5D). 36e species-and lineage specific pattern of retro-DNAs We examined the evolutionary timeline of the retro-DNA insertion events by mapping them onto a phylogenetic tree of these primates based on the data in the TimeTree database. 43As shown in Figure 6A (the insert), 450 (60.2%) of these retro-DNAs appeared to be species-specific for being uniquely present in only one genome, while another 295 (39.4%) were found in multiple genomes in a clear lineage-specific pattern.On average, a retro-DNA was shared by two    genomes, implying an average age older than the species-specific MEs (unique to one species) reported in our earlier study. 3As shown in Figure 6B, the number of retro-DNA insertional events appears to show a positive linear correlation with the relative evolutionary ages of the species and lineages (R 2 = 0.5463), suggesting that these retro-DNA insertional events occurred at a low but relatively consistent rate during primate evolution.
The example shown in Figure 2A serves as a very clear case of species-specific retro-DNA.As shown in the multiple sequence alignments with its orthologous sequences including its flanking sequences from other eight primate genomes (not locatable in marmoset genome), this Tigger7 element was absent from the orthologous sites of all non-human primate genomes (Figure 7A), confirming it as an authentic human-specific retro-DNA.On the contrary, the example shown in Figure 2B is shown to be a retro-DNA insertion event shared among three of the four monkey species and absent in the orthologous regions of the remaining seven primate genomes, thus likely as a lineage-specific retro-DNA (Figure 7B).Furthermore, it appears that this retro-DNA sequence in these three genomes had been subject to mutation in the polyA tails shown as having variable lengths, agreeing with its relatively older age as a lineage-specific retro-DNA and the higher rate of mutation in the polyA region.Similarly, the example shown in Figure 2D represents an ape lineagespecific retro-DNA for its presence in all ape genomes but absent in all non-ape genomes examined.

The genome distribution patterns of retro-DNAs and their parent sites in gene context and expression
To assess the potential functional impact of these retro-DNAs, we examined their gene context based on the Ensembl gene annotation for these genomes. 42A total of 698 retro-DNAs, representing ~40% of the 1,750 retro-DNAs were located within the genic regions and promoter regions for 734 transcripts from 414 unique genes (Table 4 and Table S3).Majority of these retro-DNAs were located within the intron regions (699/734 transcripts), while 27 entries were inserted into promoter regions and untranslated regions.The presence of these retro-DNAs in the genic regions provides them the potential to impact gene regulation or splicing.
Further, we identified the potential parent sites for these retro-DNAs by performing a sequence similarity search using their sequences to query the corresponding genome sequences.For each retro-DNA, the best non-self-match was selected as its potential parent site.An example of such a parent-child relationship is shown in Figure 8, in which a human-specific retro-DNA event on chromosome 4 is shown to be a child to a much longer Tigger7 (1882 bp) on chromosome 9, which has orthologous copies in other primate genomes, indicating a much older age of the latter and its validity as a parent copy for the former.As shown in Table S4, we identified a total of 715 potential parent sites for the 1,750 retro-DNA entries (or 325 entries for the 748 retro-DNAs after removing the redundancy across species).The failure in finding the parent copies for the remaining entries could be due to the loss of the parent copy from genomic rearrangements or due to incomplete sequence coverage of the genomes.Like for the retro-DNAs, we examined the gene context for these potential parent sites, and as shown in Table S5, 351 (49.1%) of these redundant potential retro-DNA parent sites locate to 410 different genic regions for 371 unique genes.In these cases, the transcripts of these potential parent sites, likely as part of the transcripts or splicing by-products (e.g., excised intron sequences) of their host genes, might have had the chance to be captured by the L1 TPRT machinery to generate retro-DNAs as in the case of processed pseudogenes/retro-genes.
The ratio of genic entries (49.1%) was higher for the parent sites than that for retro-DNAs (~40%), and the implication of this is discussed later.
We also examined the expression level of retro-DNAs and their potential parent sites using RNA-seq data from the Non-Human Primate Reference TRanscriptome (NHPRTR) dataset 44 and two other studies 45,46 to see if any of these entries had any transcriptional activity in the present-day primate genomes.For this, we collected a total of 21 transcriptomes for seven primates, excluding orangutan, gibbon, and marmoset, for which no transcriptome data was available at the time of our analysis.To minimize false positives due to the high sequence similarity among ME members in the same family, we included only the reads with a perfect match to the retro-DNAs or their parent site regions and with each read used only once in calculating the expression level.However, we believe that this process has inevitably introduced a certain level of false negatives in the results due to sequence polymorphisms and, therefore, may have led to an underestimation of the retro-DNAs and parent sites' expression levels.As seen in Tables 5 and S6, 966 loci from the 1,750 retro-DNA and 715 parent sites in these seven primate genomes were shown to have a certain level of expression ranging in fragments per kilobase of transcript per million reads (fpkm) value from 0.0003 to 27.3.
We further investigated the relationship between retro-DNAs and their parent sites based on their expression levels.Specifically, three human testis transcriptome samples (SRR2040581, SRR2040582, SRR2040583) retrieved from the NCBI SRA (Sequence Read Archive)) were used to analyze the expression level of the retro-DNA/parent site pairs.As shown in Figure 9A, a total of 66 retro-DNA/parent site pairs were shown to have a certain level of expression (fpkm > 0)  62 The bootstrapped consensus tree inferred from 500 replicates 63 is used to represent the evolutionary history of the taxa involved.Branches corresponding to partitions reproduced in less than 50% bootstrap replicates were collapsed.The percentage of replicating trees in which the associated taxa clustered together in the bootstrap test (500 replicates) are shown next to the branches. 63Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach followed by selecting the topology with superior likelihood value in logarithmic scale.This analysis involved 11 DNA sequences with a total of 222 positions in the final dataset.
for either the retro-DNA or the parent site among the three human testis samples.Notably, among these 66 retro-DNA/ parent site pairs, 57 (86.4%) parent sites were shown to be expressed (fpkm > 0) compared to only 42 (63.6%)expressed retro-DNAs (Table S4 and S6, Figure 9A).This difference might indicate that the generation of a retro-DNA requires the expression of its parent site, while a retro-DNA itself may not be expressive depending on its landing location.Therefore, a higher ratio of transcriptionally active sites may be expected for the parent sites than for the progenies (retro-DNAs).More interestingly, the two parent sites responsible for multiple retro-DNA entries were shown to have the highest levels of expression among the parent sites (Figure 9A).This may suggest that the expression level of the parent sites is positively correlated to their potential in generating retro-DNAs.Furthermore, the ongoing expression of the parent sites in the current genomes suggests that they have the potential to generate more retro-DNAs in the future.
We also examined and compared the expression levels of retro-DNAs and their parent sites among gene context-based groups in the three human testis transcriptomes.One critical reason for including testis here is to ensure that these retro-DNAs are expressed in germline cells for generating transmissible daughter copies.As shown in Figure 9B, the average fpkm values of the parent sites were always higher than that of the retro-DNA entries as a whole group or divided into genic and intergenic regions.In addition, the entries located within genic regions showed higher expression than the ones  located outside the genic regions for both retro-DNAs and the parent sites (Figure 9B), suggesting that entries located in the genic regions may have more opportunities to be expressed passively as part of the host gene expression.This difference is larger for retro-DNAs than for the parent sites, likely because parent sites had to be expressed regardless of their position in order to be able to generate new copies.None of these differences are statistically significant, likely due to the small sample size.

Discussion
Retro-DNAs as retrotransposons derived from DNA transposons In this study, we focused on a small number of species-specific DNA transposons identified in primate genomes using a computational comparative genomics pipeline previously established for analyzing species-specific retrotransposons in the human genome and seven other genomes. 3,36Unlike for retrotransposons, for which the ongoing activity during evolution and in the current genomes of primates, as well as their contribution to the lineage-and species-specific MEs, have been well established, 3,32,47 similar research for DNA transposons in primate genomes remains very scarce.As a matter of fact, as at time of writing, no report of species-specific DNA transposons in these primate genomes has been documented, likely due to lack of effort, as DNA transposons are thought to have become inactive in primate genomes about 37 Mya. 17,19 trying to understand the mechanism underlying the mysterious species-specific DNA transposon insertions identified in our comparative genome analysis, we spotted a few interesting entries as exemplified by the case shown in Figure 2A, which manifests the characteristics of non-LTR retrotransposons by having longer TSDs and presence of a polyA tail, while lacking TIRs, the hallmark of new DNA transposon insertions.The remaining cases shown in Figure 2 have the same non-LTR features but do not necessarily have a typical polyA tail.For their non-LTR retrotransposon characteristics, we named them "retro-DNA" for being retrotransposons derived from DNA transposons.We then performed a systematic analysis to look for more of such "retro-DNA" cases.
For this, we expanded our search from the strict species-specific DNA retrotransposons, which are defined as those present in only one of the primate genomes, 3,36 to da-DNAs, which are defined as diallelic DNA transposons with the insertion allele and its pre-integration allele (i.e., the orthologous region without the DNA transposon) both present in at least one of the ten genomes we included.We obtained a total of 271,085 da-DNAs, and from these we then specifically searched for retro-DNA cases, which have long TSDs (≥8bp) and the absences of the TIRs using a protocol shown in Figure 3.This led to the identification of 1,750 of retro-DNA cases, which represent 748 unique events, covering all ten primate genomes with over half being species-specific and the rest being lineage-specific covering different lineages in this group of primates (Figure 6A).Our results indicate that the generation of retro-DNAs has occurred in all ten primate genomes included in our analysis and at a wide-spectrum of evolutionary time with an approximately constant rate (Figure 6).Furthermore, these retro-DNAs are not limited to a single subfamily, but rather cover all major DNA transposon families, suggesting that the existence of such "retro-DNAs" is the product of a consistent and common process actioning in primate evolution.
4][35][36] The major pieces of evidence include the lack of TIRs and the presence of the TPRT insertion site sequence motif and long TSDs.As seen in Figure 5A, the integration sites of the 748 retro-DNAs display, although at a much weaker signal, is the exact same core sequence motif of "TT/AAAA", found for non-LTR retrotransposons in the human genome (Figure 5B). 34,36,41The TSDs for these retro-DNAs show a dominant peak at 8bp (Figure 5C), which is much longer than that of TSDs typically found for DNA transposons (2 bp) and is similar to the secondary peak of TSD length observed for the human-specific L1s 36 (Figure 5D).Furthermore, the presence of parent sites in the same genome for a significant proportion of the retro-DNAs (325/748 or 43.5%) indicates their use of a "copy-and-paste" rather than the "cut-and-paste" mechanism used by canonical DNA transposons.The presence of a polyA tail in many (176/748 or 23.5%) of these retro-DNAs and the apparent occurrences of the 5 0 truncation suggested by data in Table 3, Figure 4 provide additional support for their use of the L1-based TPRT mechanism.
It is worth noting that, as described above, while there is sufficient similarity in sequence features between these retro-DNAs and the known non-LTR retrotransposons for treating these retro-DNAs as non-LTR retrotransposons, several unique aspects are also evident.These include the missing of the major TSD length peak at 15 bp observed for other non-LTR retrotransposons, the low percentage of entries with a polyA tail, and the weaker signal of the sequence motif, "TT/ AAAA", at the integration sites.4][35][36] In other words, the older age of the retro-DNAs leads to higher sequence divergence, which in turn lowers the sensitivity for detecting all of these sequence features.An additional reason for the weaker signal in the insertion site sequence motif for the retro-DNAs could be due to the small sample size.It is also possible that these unique characteristics may suggest that some differences in the detailed retrotransposition process of these DNA transposons, likely with regard to the interaction between the retro-DNA transcripts and the ORF1 and ORF2 proteins, may exist between the retro-DNAs and the canonical non-LTR retrotransposons.One known example for this is that Alu transposition does not seem to require ORF1p unlike for L1s. 32,48,49 is also worth pointing out that in addition to the well-known types of non-autonomous non-LTRs transposed by the TPRT machinery, including SINEs, SVAs, and retro-genes, evidence suggests that some copies of the LTRretrotransposon subfamily, HERV-W, might have also been transposed by this mechanism. 50,51However, these HERV-W sequences are part of retrotransposons and can continue to be transposed using their canonical retrotransposition mechanism.For this reason, we would like to argue that our identification of retro-DNAs is unique and significant in the sense that they represent DNA transposons, which would not be able to transpose anymore in the primate genomes, since their canonical mechanism is no longer active.Overall, the research from this study and others clearly suggests that the L1-based TPRT machinery may be able to transpose a much wider variety of genomic sequences than what are currently known.

The relative retro-DNA activity during primate evolution
In comparison with the other types of non-autonomous non-LTR retrotransposons, including Alus, SVAs, and processed pseudogenes, in primate genomes, 2,3,32,52 the number of retro-DNAs per genome was much lower, averaging at < 200 per genome (Table S2).This number was even substantially lower than that of processed pseudogenes, which represent the smallest class of non-LTR retrotransposons with 10,190 copies in the human genome. 53We reason that the very small copy number of retro-DNAs may primarily attribute to one factor, i.e., the lack of intrinsic internal promoters to drive their own transcription, leading to an overall low level of their transcripts available for retrotransposition.In contrast, retrotransposons carry their intrinsic promoters required for their canonical propagation mechanisms, while a promoter is not required for the canonical DNA transposon activity.This is in agreement with the observation that there is no clear hotspot in the DNA transposon consensus sequences used in generating retro-DNAs, as shown in Figure 4 for Tigger1.Should there be internal promoters driving the transcription, we would expect to observe one or more clear dominant peaks in the frequency of the regions used for retro-DNAs correlated with the location of the internal promoter(s).
Without the ability to drive their own transcription, the only way for DNA transposons to get transcribed is to get transcribed as a part of the host gene transcripts.If this is how retro-DNAs were generated, then we would expect to see a high percentage of retro-DNAs having their parent sites located in the genic regions, more specifically in the transcribed regions, i.e. exon and intron regions.By examining the gene context, 351 of the 715 parent sites (49.0%) for the retro-DNAs were found to locate in 371 unique genes/transcripts in the ten primate genomes.This ratio was higher than that for all DNA transposons in the genic regions (39%, detailed data not shown) as the expected for random distribution and for that of the retro-DNAs (40% in genic sites including promoters) (Table 4 and Table S5), thus supporting the role of passive expression for the parent sites in generating these retro-DNAs.
By the same rationale, we would expect that on average the parent sites should have a higher expression level than retro-DNAs since the parent sites were selected to be biased for this by locating in the genic regions, while the location of the retro-DNAs is more or less random, leading to a relatively lower proportion in genic regions than the parent sites as shown in our data (40% verse 49%) (Table 4, Table S5).This is supported by the expression data showing that among the 66 retro-DNA/parent site pairs, 57 pairs have parent sites with a fpkm > 0 compared to only 42 expressed entries for retro-DNAs (Figure 9A).Additionally, we identified two parent sites, which are the only sites potentially responsible for generating multiple retro-DNA entries, and they showed the highest levels of expression among the parent sites (Figure 9A).By comparing the expression levels of all parent sites with that of retro-DNAs in the human genome, we can see an overall higher expression for the parent sites (Figure 9B), and this is also true when comparing between the sites in the genic and intergenic regions (Figure 9B).Furthermore, the expression level of parent sites in the genic regions is much higher than their counterparts in the intergenic regions as expected (Figure 9B).Another possible factor contributing to the extremely small number of retro-DNAs might be that the sequences of these DNA transposons are much less optimal for TPRT-based retrotransposition than the canonical types of retrotransposons.
The use of the 10 primate genomes, representing several lineages with a large span in primate evolution, allowed us to examine whether there is any positive correlation between the length of evolutionary span and the number of retro-DNA insertional events.As shown in Figure 6B, a moderate positive correlation between the two is observed (R 2 = 0.5463), suggesting that the generation of retro-DNAs is relatively steady during the evolution of this group of primates.Furthermore, the observation that many of the retro-DNA parent sites, as well as 966 of the 1773 (~54.5%)retro-DNAs show certain levels of expression in the seven primate transcriptomes (Table 5 and Table S6), suggests the possibility of ongoing retro-DNA generation from the parent sites and perhaps also from some retro-DNAs.

Conclusions and future perspectives
In this study, through a comparative genomic analysis of 10 primates, we report the first identification of "retro-DNAs" for being non-LTR retrotransposons derived from DNA transposon sequences.This work is significant, as the generation of these retro-DNAs serves to propagate DNA transposon sequences in the absence of the canonical DNA transposon activity in primate genomes and the process involves two fundamentally different ME classes.Despite being very small in number, they do contribute to the genetic diversity among primate species along with other MEs, and our data seem to suggest that at least some of them have the capability to serve as a parent copy to further propagate, differentiating them from other elements that are passively retrotransposed by the L1 machinery, such as processed pseudogenes. 54urthermore, the discovery of these retro-DNAs suggests that the L1-based TRPT machinery may have been used by more diverse types of RNA transcripts than what we currently know.Interesting follow-up work ought to include the verification of the retrotransposition activity of these retro-DNAs and their parent sites using in vitro and in vivo assays 55 and extension of the similar analysis to other types of expressive DNA sequences, such as non-coding RNA genes.Should their intrinsic capacity of retrotransposition as in the case of Alus and SVAs be experimentally verified, we could then classify these retro-DNAs as a new type of non-LTR retrotransposons beyond the current LINE, SINE, and SVA.
In addition, research into the mechanisms underlying the remaining majority of the diallelic DNA transposons would also be very interesting and valuable.

Sources of primate genome sequences
In this study, we chose to use 10 primate genomes including human, among which eight genomes were included in our previous study for identifying species-specific MEs in primates. 3 ).The marmoset genome was added to expand the evolutionary span, also serving as an outgroup for the other nine genomes from the ape and monkey groups, while the gibbon genome was added to increase the coverage and evolutionary span of the ape group.All genome sequences in fasta format and the RepeatMasker annotation files were downloaded from the UCSC genome website onto our local high performance computing servers for in-house analyses.We have used the most recent genome versions available on the UCSC genome browser website at the time of analysis in all cases except for gorilla, for which there is a newer version (March 2016, GSMRT3/gorGor5) available but not scaffolded into chromosomes, making it inadequate for our analysis.

LiftOver overchain file generation
A total of 90 liftOver chain files were needed for all possible pair-wise comparisons of the 10 genomes used in this study.These files contain the information linking the orthologous positions in a pair of genomes based on lastZ alignment. 56 total of 22 of these were available and downloaded from the UCSC genome website, and another 34 liftOver chain files were generated using a modified version of UCSC pipeline RunLastzChain from a previous study. 3The remaining 36 liftOver chain files were newly generated for this study using the same pipeline.
Identification of DNA transposons with diallelic status in the ten primate genomes Pre-processing of DNA transposons: The starting list of DNA transposons in each primate genome was obtained based on the RepeatMasker ME annotation data from the UCSC website.As previously described, we performed a preprocessing to integrate the ME fragments annotated by RepeatMasker back to ME sequences representing the original transposition events. 36entification of DNA transposons with diallelic status: We modified a previously reported comparative genomics bioinformatics pipeline 36 to identify da-DNAs that have the presence of both the insertion and pre-integration alleles in at least one of the 10 primate genomes.Briefly, this pipeline uses a robust multi-way computational comparative genomic approach to determine the presence/absence status of DNA transposons among a group of genomes by using both the whole chromosome alignment-based liftOver tool and the local sequence alignment-based BLAT tool. 57,58The sequence of a DNA transposon at the insertion site and its two flanking regions in a genome were compared to the sequences of the orthologous regions available in all other genomes.If a DNA transposon is absent from the orthologous regions of any of the other nine genomes not due to the existence of a sequence gap (i.e.just missing the insertion), it is selected as a potential candidate of da-DNA subject to further analyses.

Identification of retro-DNAs
Identification of TSDs and TIRs: For the candidate entries from the previous step, using in-house PERL scripts as described previously, 36 we performed identification of the TSDs.Additionally, we modified our scripts to identify the TIRs, the hallmark of all cut-and-paste transposons except for Helitrons. 17da-DNA entries without identifiable TSDs or TSD length < 8 bp, as well as entries with identifiable TIRs, were excluded from further analysis.The 8 bp TSD length cutoff was chosen based on our observation for human-specific retrotransposons that 95% of identified TSDs are at least 8 bp long. 36Additionally, we used MiteFinderII, a tool designed to identify miniature inverted-repeat transposable elements, 59 to verify that none of our candidate entries contain TIRs.
Filtering against retrotransposon transductions: To ensure the presence of a DNA transposon was a result of active transposition, rather than a passive result of other processes, e.g., retrotransposition-mediated transductions, we mapped the candidate entries against the known retrotransposons in the ten primate genomes based on their genomic positions.Specifically, the sequences of candidates from the previous step were mapped back onto the host genome using BLAT, followed by removing all entries located within 50 bps to a retrotransposon (excluding entries inserted into a retrotransposon), because such entries could be a result of retrotransposition-mediated transduction.All entries left at this point were considered candidates of "retro-DNAs" for being retrotransposons derived from DNA transposon sequences but lacking TIRs and having TSD at 8 bp or longer.
Identification of polyA tail: For each candidate retro-DNA, we retrieved the 10 bp sequence from the 3' end of the positive-strand (by the DNA transposon consensus sequence).If the sequence contains six or more "A", the entry is considered to have a polyA tail.
Clustering retro-DNAs to identify unique retro-DNA events The retro-DNA candidates identified from the last step in the 10 primate genomes were subject to a round of "all-againstall" sequence similarity search using BLAT with the sequences of the retro-DNAs plus the 100 bp of the flanking region on each side.Entries with 95% or higher sequence similarity across the entirety of the sequences including the flanking sequences were identified as one orthologous cluster, representing one retro-DNA insertion event during the evolution of these primates.
Estimating the timeline for retro-DNA insertions An organismal phylogenetic tree of the 10 primate genomes with the marmoset genome as the outgroup was obtained from the TimeTree database 43 and displayed using the Treeview program. 60We then manually added the numbers of nonredundant retro-DNA entries onto the nodes and branches of this tree based on the presence of retro-DNAs in the specific genomes or lineages.
Multiple sequence alignment of retro-DNA and parent sites We performed multiple sequence alignment for a few selected retro-DNA entries, including their parent sites.For this, we first collected retro-DNA sequences including 100 bp on both flankings, as well as the orthologous sequences of the parent sites from the rest of primate genomes and performed multiple sequence alignment using the online version of MUltiple Sequence Comparison by Log-Expectation (MUSCLE) 61 from the European Bioinformatics Institute website.Phylogenetic analyses in some cases were performed using the Maximum Likelihood method and Tamura-Nei model 62 with bootstrapping 63 at 500 replications.
Expression analysis of retro-DNAs and their parent copies RNA sequencing (RNA-seq) data for the blood and the generic (mixed of twenty tissues) samples from chimpanzee, gorilla, crab-eating macaque, rhesus and baboon were retrieved from the Non-Human Primate Reference Transcriptome Resource (NHPRTR) 44 for expression analysis of the retro-DNAs and their parent copies.We also collected RNA-seq data for six human testis transcriptomes (three for blood and three for testis) 46 and two green monkey transcriptomes. 45,64he detailed information regarding the NCBI SRA accession numbers and the associated species and tissues is available in Table S6.Tophat2 (version 2.1.1)was used to align the RNA-seq reads to the corresponding reference primate genomes. 65Reads mapped to the retro-DNA/parent copies regions were retrieved in fasta format and aligned back to the reference genome using the NCBI blastn to ensure that each RNA-seq read was only assigned to only one genomic location with perfect match for use to calculate the fpkm values for each DNA transposon using an in-house Perl script.

Facility and software for computational analysis
The data analysis and figure plotting were performed using a combination of Linux shell scripting, R, and Microsoft Excel.The computational analysis was mostly performed on Compute Canada high-performance computing facilities running CentOS Linux.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
This study identifies retro-DNAs in primate genomes as a new type of mobile elements (MEs), derived from DNA transposons but mobilizing via retrotransposition.Several characteristics indicate that retro-DNAs use the L1-based target-primed reverse transcription (TPRT) mechanism for their mobilization.The presence of retro-DNAs in genic regions underscores their impact on gene function in primate genomes.It thus suggests a broader range of targets for retrotransposition than previously known, expanding our understanding of ME activity in primate genomes.
In general, the manuscript has a smooth flow in its writing, presenting interesting and compelling results.Here are some suggestions for improvement: While the term "retro-DNAs" is appropriate, considering them as "a new type of MEs" may be misleading.Similar to retro-copies or retro-genes, retro-DNAs are by-products of L1mediated retrotransposition.In theory, any polyadenylated RNAs could be recognized by reverse transcriptase encoded by L1s [Ref-1].However, these sequences cannot 1.
autonomously transpose as MEs.I agree with the first reviewer that the authors should better tune down their statement.Consistency in formatting TE names is crucial.Ensure all TE names are italicized throughout the manuscript, as some inconsistencies were noted (e.g., hAT and TcMar superfamilies, hAT-Tip100, and Tigger7 on page 10). 2.
Figure 7 was not cited in the manuscript, and it was redundant with Figure 8A.Please delete Figure 7 and reorder the Figures.

3.
Is hAT-Trip100 is a typo of hAT-Tip100?Please check the writing.4.
Most L1copies are 5'-truncated, while their 3' ends are necessary for the recognition by reverse transcriptase [Ref-2].This is another important feature of retroposition by L1s.I noticed that authors identified retro-DNAs and their parent sites.Can they further check whether retro-DNAs are from the 3' ends of their parent sites?This will provide additional evidence supporting L1-mediated retro-DNA formation.The manuscript is well-written and easy to read in all its parts.
Find below a list of major issues.
-While retro transposition can be considered as a way to spread transposable elements fragments around the genome and the information provided in this manuscript is (to my knowledge) new, I would be cautious in defining retro-DNAs as "a new type of non-autonomous non-LTR retrotransposons".retro position of expressed sequences is a common process and indeed, we do not define processed pseudogenes this way.I would therefore recommend modifying such a conclusion to avoid overstatements and misinterpretation of the result.
-The expression analysis shows that some retro-DNA sequences are expressed in the testis.This finding compatible with what is usually seen for retrotranscribed pseudogenes?Testis activation of pseudogenes is well-documented, suggesting potential functional roles.Many pseudogenes are activated in the testis [1], [2].In the case of TEs (and especially non-functional TEs) testis expression may evolve new regulation functions as observed in other transposition systems which are subjected to piRNA regulation.
-I cannot see the supplementary files associated to the manuscript.First, regarding your caution in our claiming retro-DNAs as "a new type of retrotransposon", we would agree with you that many other types of DNAs, notably the mRNA of genes as processed pseudogenes or retrogenes, are not named as new types of retrotransposons in addition to LINEs/L1, SINEs/Alus and SVAs in the human genome.Very likely, we would agree with each other that a new type of retrotranspons (non-LTR specifically in our case), in addition to have the expected sequence features (i.e., polyA taile), would also have the ability to continue to retrotranspose like other non-autonomous non-LTR retrotransposons, such as Alus and SVAs, at least as a group, but not necessarily every individual copies.In other words, some of the retrotransposed copies (beyond the original copies) are able to be transcribed and retrotranspose.This extra feature would distinguish it from processed pseudogenes.Following this reasoning, we demonstrated that some of the retro-DNA copies have the capacity to transcribe or be transcribed and served as the parent copies of additional copies (Table 5 and Fig. 9A).Certainly, we realize the limitation of our data by lack of experimental data to the presence of demonstrate capacity of these copies, and ideally this is to be demonstrated by using the established retrotransposition assay (refs).However, this is beyond our reach as a bioinformatics-oriented research group, and it is our hope that by reporting such potential of these retro-DNA elements, interest can be stimulated by other groups to follow up with experimental verification.Considering limitation of our research data, we have adjusted the relevant point in our conclusion as "retro-DNA could be established as a new type of non-LTR retrotransposons if their intrinsic L1-based transposition capacity can be experimentally approved."Second, regarding our use of testis tissues for expression analysis of these retro-DNAs, our rationale is related to above point, their potential as parent copies of transmissible retro-DNAs, since only they need to be expressed in gamete cells for this to happen.It is true that a lot of genes show sporadic expression beyond their regular expression pattern in other differentiated tissue, likely attributed to the unique epigenetic profile during spermatogenesis.But this may likely also be the attributing factor for the generally higher levels of germline retrotransposition of all known retrotransposons over somatic retrotransposition, with the former responsible for all transmitted retrotransposition events observed in the genomes.For this reason, even though we did include a mixed tissues sample and blood, with the latter as a widely available somatic tissue, in our expression analysis and showed a certain level of their expression, we thought it would be more meaningful to examine their expression in germline tissues.We would like to also include ovary as the female germline tissue, but it is unavailable across multiple species.For these reasons, we would like to argue that our focus on expression in testis tissue is justifiable.If you still have some concerns, we would welcome some specific suggestions for improving our writing.
Lastly, you mentioned that you could not see the supplementary files associated to the manuscript.However, I check the link to the file, https://identifiers.org/biostudies:S-BSST1030, under the "Data availability" section, is working properly.On this linked page, there are 3 listed files you can click and download or you can choose to download all files (a button below the file list).We wonder if you could check it again and let us know if you can access properly.This is one of the public data repositories recommended by the journal.
Thank you in advance for your second read and further comments or full approval.

Competing Interests: none
The benefits of publishing with F1000Research: Your article is published within days, with no editorial bias • You can publish traditional articles, null/negative results, case reports, data notes and more • The peer review process is transparent and collaborative • Your article is indexed in PubMed after passing peer review • Dedicated customer support at every stage • For pre-submission enquiries, contact research@f1000.com

Figure 1 .
Figure 1.The composition of diallelic DNA transposons and retro-DNAs by family in the ten primate genomes.Horizontal stack bar charts showing the family composition of diallelic DNA transposons (A) and retro-DNAs (B) in each of the 10 primate genomes.The color scheme is the same for both panels.

Figure 2 .
Figure 2. Examples of retro-DNAs in different primate genomes.A. A retro-DNA from the human genome (hg38_chr4:146335052-146335253) with the pre-integration allele from the chimpanzee genome (panTro 5_chr4:38758218-38758438). B. A retro-DNA from the green monkey genome (chlSab2_chr8:30005081-30005527) with the pre-integration allele from the gibbon genome (nomLeu3_chr8:37535028-37535236); C. A retro-DNA located from the green monkey genome (chlSab2_chrX:73456937-73457324) with the pre-integration allele from the orangutan genome (ponAbe2_chrX:82896142-82896360).D. A retro-DNA located from the human genome (hg38_chr4:38758216-38758442) with the pre-integration allele from green monkey genome (chlSab2_ch r27:11529606-11529817).In each panel, the sequence at the top is the insertion allele containing the retro-DNA and the sequence at the bottom is the pre-integration allele without the retro-DNA.The yellow highlights indicate TSDs, and the blue highlights indicate the DNA transposon sequences, while the purple highlights indicate possible polyA tail sequences.

Figure 3 .
Figure 3.A flow chart for identification of retro-DNAs.

Figure 4 .
Figure 4.A frequency of the Tigger1 subfamily DNA transposon consensus sequence used for retro-DNA sequences.The plot is based on the data for a total of 41 non-redundant retro-DNA entries from the Tigger1 subfamily.

Figure 5 .
Figure 5. Sequence motifs of pre-integration sites and target site duplications (TSDs) length distribution pattern for retro-DNAs.A. Sequence motif logos for retro-DNAs at the integration sites.B. Sequence motif logos for human-specific L1s at the integration sites, adopted from authors' publication.3C. A line plot showing the distribution of TSD length for retro-DNAs.D. A line plot showing the distribution of TSD length for human-specific L1s, adopted from authors' publication.3

Figure 6 .
Figure 6.The evolutionary timeline of the retro-DNA insertions during the evolution of the ten primate genomes.A. A rooted phylogenetic tree of the ten primate genomes from the TimeTree database (http://www.timetree.org/).The numeric values below each branch represent the number of retro-DNA insertion events happened during the corresponding period of primate evolution.The numeric value above each branch represents the millions of years (Mya) for that branch.The evolutionary time for marmoset has been manually corrected from 21.58 MY to 51.02 MY for the correlation analysis in panel B. The table insert below the tree shows the distribution of the retro-DNAs by the degree of conservation among the genomes as measured by the number of genomes owning a retro-DNA.B. A scatter plot between the number of retro-DNA insertion events and their evolutionary age based on the data in panel A. The trend line shows that the number of retro-DNA insertion events is positively correlated with the relative evolutionary distance (R 2 = 0.919).

Figure 7 .
Figure 7. Multiple sequence alignment and phylogenetic analysis of retro-DNAs. A. Multiple sequence alignment for a retro-DNA located in the human genome (hg38_chr4:146335052-146335253, the same entry in Figure1A) and the corresponding pre-integration sequences from the other eight primate genomes.The pre-integration sequence from the marmoset genome was not identified likely due to the high level of sequence divergence.B. Multiple sequence alignment for the sequences of a retro-DNA shared among green monkey, crab-eating macaque and rhesus genomes (chlSab2_chr8:30005081-30005527, macFas5_chr8:32527581-32528029, and rheMac8_chr8:31992158-31992606) with the flanking sequences, along with their orthologous pre-integration sequences from 7 other primate genomes.The red highlights indicate possible polyA tails with variable lengths across genomes, while the yellow highlights show the observed target site duplications (TSDs).

Figure 8 .
Figure 8. Sequence alignment and phylogenetic analysis of a human retro-DNA, its parent copy in the same genome, and its orthologous copies in other genomes.A. Multiple sequence alignment for a retro-DNA in the human genome (hg38_chr4:146335052-146335253) and its parent copy (hg38_chr9:70197633-70197828, showing only the sequence aligned with the retro-DNA) plus the orthologous sequences of the parent copy from the other 9 non-human primate genomes.The red arrow indicates the retro-DNA entry, while the blue arrow indicates the parent copy.Shared SNPs in red vertical boxes are seen among members of the Hominidae group.B. Phylogenetic analysis of the 11 DNA sequences from the 10 primate genomes shown in A using the Maximum Likelihood method and Tamura-Nei model.62The bootstrapped consensus tree inferred from 500 replicates63 is used to represent the evolutionary history of the taxa involved.Branches corresponding to partitions reproduced in less than 50% bootstrap replicates were collapsed.The percentage of replicating trees in which the associated taxa clustered together in the bootstrap test (500 replicates) are shown next to the branches.63Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach followed by selecting the topology with superior likelihood value in logarithmic scale.This analysis involved 11 DNA sequences with a total of 222 positions in the final dataset.

Figure 9 .
Figure 9.The expression level of retro-DNAs and their parent sites in three human testis transcriptomes.A. A scatter plot based on 66 retro-DNA/parent site pairs which show a certain level of expression (fpkm > 0) for the retro-DNA and/or parent site.The two data points in red with the same value for the parent but different values for the retro-DNA copies point to the same parent copy in human at hg38:chr5:1570263-1570333, and the two data points in blue point to the same parent copy in human at chr5:259441-262665.B. Box plots showing the expression levels of the 66 retro-DNAs and parent sites divided into genic and intergenic groups.Expression data was based on the average fpkm value in the three human testis transcriptomes.

5 . References 1 . 1 Reviewer
Ma H, Wang M, Zhang YE, Tan S: The power of.J Genet Genomics.2023; 50 (7): 462-472 PubMed Abstract | Publisher Full Text 2. Zingler N, Willhoeft U, Brose HP, Schoder V, et al.: Analysis of 5' junctions of human LINE-1 and Alu retrotransposons suggests an alternative model for 5'-end attachment requiring microhomology-mediated end-joining.Genome Res.2005; 15 (6): 780-9 PubMed Abstract | Publisher Full Text Is the work clearly and accurately presented and does it cite the current literature?YesIs the study design appropriate and is the work technically sound?YesAre sufficient details of methods and analysis provided to allow replication by others?YesIf applicable, is the statistical analysis and its interpretation appropriate?YesAre all the source data underlying the results available to ensure full reproducibility?YesAre the conclusions drawn adequately supported by the results?YesCompeting Interests: No competing interests were disclosed.Reviewer Expertise: Transposon, TE-mediated gene duplication, genome evolution, new gene evolution I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.Version Report 26 March 2024 https://doi.org/10.5256/f1000research.142770.r251957© 2024 Marsano R.This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.Rene Massimiliano MarsanoUniversità degli Studi di Bari "Aldo Moro", Bari, Italy In this manuscript, the Authors conduct a comparative analysis of 10 primate genomes unveiling the identification of a new type of mobile elements, which they call "retro-DNAs", displaying combined features of two Class I (non-LTR retrotransposons) and Class II (DNA transposons) elements.The Authors initially annotated DNA transposons (remnants) in the analyzed genomes and subsequently have characterized a subset of sequences featured by an unusually long TSD and a poly-A tail.Furthermore, the Authors inferred the lineage specificity of retro-DNA and determine the expression level of retro-DNA sequences identified.

Table 1 .
Summary of DNA transposons in the 10 primate genomes.

Table 3 .
The composition of retro-DNA by family and the size information.

Table 5 .
The numbers of expressed retro-DNAs and parent sites in 21 primate transcriptomes.