NAR Genom Bioinform. 2020 Mar; 2(1): lqz023.
Few SINEs of life: Alu elements have little evidence for biological relevance despite elevated translation
Laura Martinez-Gomez
1 Bioinformatics Unit, Spanish National Cancer Research Centre, 28029 Madrid, Spain
Federico Abascal
2 Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK
Irwin Jungreis
3 MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA and Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
Fernando Pozo
4 Bioinformatics Unit, Spanish National Cancer Research Centre, 28029 Madrid, Spain
Manolis Kellis
3 MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA and Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
Jonathan M Mudge
5 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Michael L Tress
4 Bioinformatics Unit, Spanish National Cancer Research Centre, 28029 Madrid, Spain
Received 2019 Sep 8; Revised 2019 Oct 30; Accepted 2019 Dec 12.
Abstract
Transposable elements colonize genomes and with time may end up being incorporated into functional regions. SINE Alu elements, which appeared in the primate lineage, are ubiquitous in the human genome and more than a thousand overlap annotated coding exons. Although almost all Alu-derived coding exons appear to be in alternative transcripts, they have been incorporated into the main coding transcript in at least 11 genes. The extent to which Alu regions are incorporated into functional proteins is unclear, but we detected reliable peptide evidence to support the translation to protein of 33 Alu-derived exons. All but one of the Alu elements for which we detected peptides were frame-preserving and there was proportionally seven times more peptide evidence for Alu elements than for other primate exons. Despite this strong evidence for translation to protein we found no evidence of selection, either from cross-species alignments or human population variation data, among these Alu-derived exons. Overall, our results confirm that SINE Alu elements have contributed to the expansion of the human proteome, and this contribution appears to be stronger than might be expected over such a relatively short evolutionary timeframe. Despite this, the biological relevance of these modifications remains open to question.
INTRODUCTION
Transposable elements are mobile DNA sequences that are able to copy themselves into new genomic locations (1). Approximately half the human genome is made up of active and inactive transposable element segments (2–4) but the actual proportion of mobile element-derived sequences in the human genome may be considerably higher since many inactive mobile elements have diverged beyond the detection of standard search algorithms (5).
Transposable elements can be divided into four major and many smaller classes (2). DNA transposons encode the transposase protein, which they need to cut and paste themselves into new genomic regions (6). There are three types of retrotransposons that use RNA intermediates to copy themselves throughout the genome (7). Long terminal repeat (LTR) retrotransposons are derived from endogenous retroviruses with LTRs, most of which are no longer active in the human genome (8). Non-LTR retrotransposons are made up of long interspersed nuclear elements (LINEs), which, like the LTRs, encode a reverse transcriptase, and short interspersed nuclear elements (SINEs), which do not encode any ORF and rely on the LINEs to carry out the copying process (7).
Active transposons in the human genome are relatively infrequent and are vastly outnumbered by a 'graveyard' of fossil transposon copies (3). Active retrotransposons exist among the non-LTR retrotransposons, including LINE-1, SINE Alu and SINE-VNTR-Alu (SVA) elements (3). These three families, which together make up more than a quarter of the human genome, have appeared and proliferated over the past 80 million years (9). However, most copies of these retrotransposons are no longer active due to decay by truncations and mutations. For example, although there are more than 500 000 copies of the LINE-1 retrotransposon in the human genome (10), fewer than 100 copies are still intact and capable of transposition (11,12).
Accumulation of transposable elements has been shown to have a deleterious effect on fitness (13) and their presence has been associated with many diseases (14,15). Nevertheless, with time transposable element sequences can also add to the functionality of genomic features through a process of co-option in which the transposable element sequence, or part of it, is recruited to perform some function. The incorporation of transposable elements (exaptation) has been shown to contribute to the evolution of regulatory motifs (16), promoters (17) and lncRNA (18) amongst others, and transposable elements have been co-opted into ancient protein-coding genes, either in their main isoform (19–21) or as alternative splice variants (22).
The SINE Alu family of retrotransposons are primate-specific elements (23) that derived from the small cytoplasmic 7SL RNA and are ∼300 nt long. The majority map to non-functional regions of introns or intergenic sequences (24). Alu elements can be divided into three large sub-families. The oldest, the AluJ sub-family, arose 65 million years ago and has become entirely extinct through deleterious sequence changes (25). The AluS family evolved 30 million years ago and almost all elements are fossils, though some sub-families have been found to contain active members (25). Almost all active Alu elements are from the youngest subfamily, AluY (26), though not all AluY elements are active. Like other transposable elements, Alu elements are potentially deleterious (27,28).
Unlike most transposable elements, Alu elements have a pair of dinucleotides that can form a weak 3′ splice site and facilitate their conversion into exons (29). In addition, 5′ splice sites (30) and polyadenylation sites (31) can be generated from a minimal number of base substitutions. Sorek et al. (32) found that while SINE Alu elements are incorporated into exons, they are found predominantly in alternative exons rather than constitutive exons. These alternative exons are included in transcripts at lower frequencies than alternatively spliced exons derived from other sources, and they found that the vast majority would lead to a frameshift or a premature termination codon. However, since exons generated from Alu elements are almost always alternatively spliced, the main isoform is intact, allowing the Alu exons to acquire functionality over time (29).
It is not clear to what extent exaptation of primate-specific Alu elements contributes to cellular proteins. Gotea and Makałowski (20) concluded that functional proteins were unlikely to incorporate regions derived from young transposable elements like LINE-1 and Alu. However, support for the incorporation of Alu elements in coding genes has come from microarrays (33) and proteomics data (34). Lin et al. (34) found peptide evidence for 85 Alu-derived exons, which led them to suggest that Alu elements may be a substantial source of novel coding exons and may represent species-specific differences between humans and other primates. However, the peptides that supported these 85 Alu-derived exons came from the PRIDE proteomics database (35). While the PRIDE database is an important repository of experimental data, it is uncurated and the false discovery rate cannot easily be controlled in such a huge database (36). Because of this, many novel sequences identified solely via PRIDE are likely to be false positives (37,38). The Lin et al. study (34) only managed to validate two of the Alu-derived exons when they searched the FDR-controlled PeptideAtlas database (39).
Here we investigate to what extent SINE Alu elements are incorporated into coding genes in the human reference set and attempt to determine what proportion of the Alu elements that overlap coding exons are likely to code for functional proteins.
MATERIALS AND METHODS
Human reference set
The human reference gene set used in this study was v28 of the GENCODE manual annotation (40), which is equivalent to Ensembl 92 (41). The GENCODE v28 gene set is annotated with 97 713 protein-coding transcripts.
APPRIS
The APPRIS database (42) annotates splice isoforms with structural and functional information and cross-species conservation. It also selects a single protein sequence unique isoform as the principal isoform for that gene (43). We have shown that most genes have a main isoform at the cellular level (44) and that the principal isoforms selected by APPRIS are a highly reliable predictor of this main cellular isoform (44). Transcripts from the GENCODE v28 reference set were tagged as principal or alternative by the APPRIS database. The distinction can also be made at the level of exons. We tagged exons whose translation would be included in the principal isoform as principal exons and the rest, exons that belong exclusively to alternative splice variants, were tagged as alternative exons.
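The exon-level tagging described above can be illustrated with a minimal Python sketch. It is not the APPRIS pipeline: the function name `tag_exons`, the tuple representation of an exon and the use of exact coordinate identity (rather than overlap of translated sequence) are all simplifying assumptions made for illustration.

```python
def tag_exons(transcripts, principal_ids):
    """Tag each coding exon as 'principal' or 'alternative'.

    transcripts   : dict mapping a transcript ID to a list of exons, where
                    each exon is a (chrom, start, end, strand) tuple
                    (hypothetical representation).
    principal_ids : set of transcript IDs flagged as principal.

    An exon is 'principal' if it occurs in at least one principal
    transcript, and 'alternative' if it occurs only in alternative ones.
    """
    principal_exons = set()
    for transcript_id in principal_ids:
        principal_exons.update(transcripts.get(transcript_id, []))

    tags = {}
    for exons in transcripts.values():
        for exon in exons:
            tags[exon] = "principal" if exon in principal_exons else "alternative"
    return tags


# Toy example with invented coordinates: T1 is principal, T2 is alternative.
toy = {
    "T1": [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+")],
    "T2": [("chr1", 100, 200, "+"), ("chr1", 250, 280, "+")],
}
toy_tags = tag_exons(toy, {"T1"})
```

In this toy case the exon shared by both transcripts is tagged principal, while the exon found only in T2 is tagged alternative.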
RepeatMasker
RepeatMasker regions [Smit AFA, Hubley R and Green P, http://repeatmasker.org] were obtained from the UCSC genome browser at http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.out.gz and mapped to transcripts from the GENCODE v28 reference set. For the SINE Alu analysis, if a transposon mapped to both principal and alternative isoforms, we counted only the principal isoform. Where a transposon or repeat mapped to more than one gene (generally where the transposon was present in a coding gene and in a read-through gene), we only counted the transposon once.
Selection tests
Using human population variation data (45) we estimated a global dN/dS value with the dNdScv R package (46) for sets of exons overlapping simple repeats, low complexity regions, and transposable elements (all defined by RepeatMasker). dNdScv reports the ratio of the non-synonymous to synonymous substitution rates (dN/dS). Although dNdScv was originally designed for cancer genomic studies, it can be, and has been, used to quantify selection in population variation data (46).
A dN/dS lower than 1 implies purifying selection. Under purifying selection, dN/dS values are expected to be lower for common alleles than for rare alleles. Values of dN/dS close to 1 for both rare and common alleles are compatible with neutral evolution, but can also mean that there is not enough statistical power to infer negative or positive selection, or that there is a perfect balance between negative and positive selection.
To estimate dN/dS ratios cross-species we obtained primate CDS alignments from the 100 vertebrate alignments generated with MultiZ (47) for each Alu-containing exon or exon fraction with evidence of protein expression. Alignments were visually inspected for frame-shifts and STOP codons and species carrying any of these were discarded from dN/dS calculations. To gain statistical power, the alignments of the coding portions of the 36 Alu elements with peptide evidence were concatenated into a single alignment. Based on this alignment a phylogenetic tree was inferred with PhyML 3.0 (48), selecting the best-fit model with SMS (49). Then we used codeml from the PAML package (50) to optimize branch lengths, estimate dN/dS ratios and calculate likelihoods. The likelihood of an M0 model with a free dN/dS ratio parameter was compared to the null hypothesis in which dN/dS was fixed at 1 (neutral evolution). P-values were calculated using a Likelihood Ratio Test (LRT) with one degree of freedom. We tested three different alignments/trees: one containing all simians (Green monkey, Marmoset, Orangutan, Human, Chimp, Gorilla, Gibbon, Squirrel monkey, Baboon, Rhesus and Crab-eating macaque), one containing apes (Orangutan, Human, Chimp, Gorilla and Gibbon), and one with just Chimp and Human. In addition, we conducted a similar analysis but fitting M0 selection models separately for each individual exon and then gathering all the individual likelihoods together (sum of log-likelihoods). An LRT with degrees of freedom equal to the number of exons tested was conducted to compare the neutral evolution and selection models.
We also carried out an analysis of selective pressure within primates using PhyloCSF (51), which uses likelihood ratios calculated from multi-species alignments and pre-computed substitution frequencies to determine whether a given nucleotide sequence is likely to represent a functional, conserved protein-coding sequence. Scores were calculated for the simian subset of the 100-vertebrates MultiZ alignment and the primate subset (simian plus Bushbaby) using the PhyloCSF 'mle' option. A P-value was calculated for each region by estimating the probability a non-coding region of the same length would get the same or higher PhyloCSF score, using the non-coding model previously described for PhyloCSF-psi (51), with a Holm–Bonferroni correction applied for the number of regions tested (36).
Gene family analysis
We performed a phylostratification analysis following a previously described pipeline (52) based on the gene family phylogenetic reconstructions of Ensembl Compara (53). Compara v95 is constructed out of genes from 152 species, providing 43 716 annotated gene family trees. Only species with enough coverage (>5×) were considered for the analysis. Compara assigns the speciation or duplication events represented by each internal tree node to the phylogenetic level in which these events were detected (53).
To estimate the gene family age and the individual gene age for all protein coding genes annotated in GENCODE v28, human coding genes were classified in the following classes or phylostrata: Fungi/Metazoa, Bilateria, Chordata, Vertebrata, Euteleostomi, Sarcopterygii, Tetrapoda, Amniota, Mammalia, Theria, Eutheria, Boreoeutheria, Primates, Simiiformes, Catarrhini, Hominoidea, Hominidae, HomoPanGorilla and Homo sapiens.
Gene family age was defined as the age class at the root of the family tree (the oldest common ancestor with a member of the gene family), while gene age is the phylostratum in which the most recent genomic event took place. Gene age for duplicated genes represents the phylostratum of the last duplication, whereas gene age always agrees with gene family age for genes without a detectable duplication origin in their gene trees. Duplication events with a consistency score (54) below 0.3 were tagged as unclear and nodes with a score of 0 were dismissed from the analysis.
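The one-degree-of-freedom likelihood ratio test described above can be sketched as follows. This is an illustration only and does not reproduce the codeml runs: the function name and the log-likelihood values are hypothetical. For one degree of freedom the chi-squared tail probability has the closed form erfc(sqrt(x/2)), so only the Python standard library is needed.

```python
import math


def lrt_pvalue_1df(lnl_null, lnl_alt):
    """Likelihood ratio test with one degree of freedom.

    lnl_null : log-likelihood with dN/dS fixed at 1 (neutral model)
    lnl_alt  : log-likelihood with dN/dS estimated freely (M0 model)

    Returns (test statistic, P-value). For 1 df the chi-squared
    survival function equals erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    statistic = max(statistic, 0.0)  # guard against tiny negative values from optimisation noise
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = lrt_pvalue_1df(lnl_null=-1000.0, lnl_alt=-999.2)
```

For the per-exon analysis, where the degrees of freedom equal the number of exons tested, the same statistic would be compared against a chi-squared distribution with that many degrees of freedom, for example with `scipy.stats.chi2.sf`.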
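The Holm–Bonferroni step-down correction mentioned above can be written in a few lines. The sketch below is a generic textbook implementation, not the authors' code, and the function name and example P-values are illustrative assumptions.

```python
def holm_bonferroni(pvalues):
    """Return Holm-Bonferroni adjusted P-values in the input order.

    The smallest P-value is multiplied by m, the next by m - 1, and so on;
    adjusted values are capped at 1 and forced to be non-decreasing along
    the sorted order.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[index])
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted


# Illustrative P-values for four regions (invented numbers).
adjusted = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
```

A region is then called significant at level alpha when its adjusted P-value is at or below alpha.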
Primate-derived exons
To determine whether an exon arose in the primate clade we defined as alternative all those exons that did not overlap with any exon integrated in a principal isoform in APPRIS. We removed sequences shorter than 45 bases, as these exons are likely to be too short to identify homology in the TBLASTN search (55). There were 12 540 exons in the GENCODE v28 gene set that met these criteria. The translated sequences of these exons were used as queries to search against six different non-primate mammalian genomes (cat, dog, mouse, sheep, polar bear and pig), retrieved from Ensembl v95 (41), equivalent to GENCODE v28. In the TBLASTN search we turned off low complexity filtering, defined gap opening and extension penalties of 13 and 1, respectively, and set a maximum E-value threshold of 0.1. All exons that had a significant homology hit in one of these species were discarded. We also used APPRIS annotations to filter out non-primate exons. Any alternative exon that formed part of a transcript with a conservation score of more than 1.5 (conservation in human plus chimp) was also discarded from the primate exon list. We defined 7566 primate-derived alternative exons. A total of 777 of these overlapped an Alu element so were discarded. The final list of exons that we were not able to map to any of the six non-primate mammalian species totaled 6789 exons.
Proteomics analysis
The proteomics analysis was carried out using the January 2019 human build of PeptideAtlas (39). We mapped peptides validated by PeptideAtlas to the 1224 Alu elements in the human proteome and to the 6789 alternative primate-derived exons. The advantage of using the PeptideAtlas database is that identifications from large-scale MS experiments are first subject to a pre-processing step that reduces the numbers of false positive matches. For this analysis we also rejected non-tryptic peptides, peptides that mapped to more than one gene and peptides shorter than seven amino acids.
The remaining peptides mapping to SINE Alu regions or primate-derived alternative exons were validated by manual inspection of the spectra. Expert curation of peptide spectrum matches is an essential step when validating peptides that identify novel coding regions. Only those peptide-spectrum matches that passed manual inspection were deemed sufficiently reliable to confirm the translation of the inserted Alu elements or primate-derived exons.
Transcript evidence
Pext (proportion expressed across transcripts) scores are normalized transcript level measures of RNAseq expression. They are generated as part of the gnomAD project from the large-scale RNAseq analyses carried out by the GTEx consortium (56). Pext scores have been shown to distinguish highly conserved exons from exons with poor conservation. Here the Pext scores were used to measure the inclusion rates of Alu-derived exons and primate-derived exons with peptide evidence.
cDNA support for Alu-derived exons and primate-derived exons with peptide evidence came from the European Nucleotide Archive (57) and NCBI RefSeq (58). Exons were counted as supported by a cDNA when the cDNA mapped to the 5′ and 3′ boundaries of the exon. cDNAs that included the exon as part of a retained intron were not counted as supporting the exon.
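The three automatic peptide filters listed above can be sketched as a single predicate. This is a hedged illustration, not the procedure actually used: the function name and arguments are hypothetical, and "tryptic" is assumed here to mean fully tryptic (preceded by K or R or the protein N-terminus, and ending in K or R or at the protein C-terminus), ignoring refinements such as the proline rule or initiator methionine removal.

```python
def passes_peptide_filters(peptide, prev_residue, gene_ids,
                           is_protein_cterm=False, min_length=7):
    """Apply length, uniqueness and (assumed) tryptic filters to one peptide.

    peptide          : amino acid sequence of the peptide
    prev_residue     : residue immediately before the peptide in the protein,
                       or "-" if the peptide starts at the protein N-terminus
    gene_ids         : genes the peptide maps to
    is_protein_cterm : True if the peptide ends at the protein C-terminus
    """
    if len(peptide) < min_length:
        return False                  # shorter than seven amino acids
    if len(set(gene_ids)) != 1:
        return False                  # maps to more than one gene (or none)
    n_term_ok = prev_residue in ("K", "R", "-")
    c_term_ok = peptide[-1] in ("K", "R") or is_protein_cterm
    return n_term_ok and c_term_ok    # fully tryptic under the stated assumption


# Invented example peptide and gene identifier.
keep = passes_peptide_filters("AVLTSDGK", "R", ["GENE1"])
```

Peptides passing these filters would still need the manual inspection of spectra described in the text that follows.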
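The boundary rule for cDNA support can be expressed as a small check. The sketch below is an assumption-laden illustration for an internal exon: a cDNA alignment is represented as a list of (start, end) aligned blocks on the genome, and the function name and coordinates are hypothetical. A block that runs past either exon boundary, as happens with a retained intron, does not count as support.

```python
def cdna_supports_exon(exon, cdna_blocks):
    """Return True if a cDNA alignment supports both boundaries of an exon.

    exon        : (start, end) genomic coordinates of the exon
    cdna_blocks : list of (start, end) aligned blocks of one cDNA

    Support requires one aligned block whose start and end coincide with
    the exon's 5' and 3' boundaries. A longer block spanning the exon
    (retained intron) is not counted.
    """
    exon_start, exon_end = exon
    return any(
        block_start == exon_start and block_end == exon_end
        for block_start, block_end in cdna_blocks
    )


# Invented coordinates: a spliced cDNA and a retained-intron cDNA.
spliced = [(100, 200), (300, 400), (500, 600)]
retained = [(100, 600)]
supported = cdna_supports_exon((300, 400), spliced)
not_supported = cdna_supports_exon((300, 400), retained)
```

First and last exons, where only one splice boundary exists, would need a relaxed version of this rule.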
RESULTS
According to RepeatMasker, remnants of transposon-based elements (not including regions predicted as Simple Repeats and Low Complexity) make up just over half of the bases in the human reference genome (50.66%). More than 20% of the fragments predicted as transposable element-derived in the human genome are SINE Alu elements, though LINE/L1 elements are the most common by number of bases because LINE/L1 elements are longer than Alu elements. By bases, LINE/L1 elements make up 17.3% of the genome compared to the 10.4% of the genome that is contributed by Alu elements (Supplementary Figure S1A).
Transposon-based elements were predicted to overlap CDS in 9% of GENCODE v28 transcripts (40). Nearly 25% of the transposable elements that overlap coding exons are SINE Alu elements. Alu elements overlapped a total of 1224 distinct coding exons. The next most common transposable element classes were SINE MIR (789) and LINE/L1 (684). Almost all common transposon classes were found in much lower proportions within coding sequences (CDS) than within the whole genome (Supplementary Figure S1B); Alu elements total just 0.23% of the bases in the human coding reference set and LINE/L1 elements 0.12%. This is what would be expected if the presence of transposable elements were selected against in coding exons. However, some transposable element families are exceptions to the rule. The proportion of LINE/RTE-BovB elements is almost as high in CDS regions as in the whole genome, and DNA/hAT-Ac elements are actually more prevalent in CDS than in the genome as a whole (Figure 1A).
Taken at face value these proportions might suggest DNA/hAT-Ac transposable elements are not selected against in CDS regions. However, these are ancient transposable elements (2,59). While DNA/hAT-Ac elements preserved in CDS regions are still detectable by RepeatMasker, those outside CDS regions will not have been subject to purifying selection and may no longer be recognizable as deriving from transposable elements. This suggests that many of the ancient DNA/hAT-Ac elements have been co-opted and are evolving under purifying selection. The same is probably true for many LINE/RTE-BovB elements.
Selection
In order to determine whether transposable elements that overlap annotated coding exons have acquired functional importance as proteins, we measured selection using the ratio of the rates of non-synonymous and synonymous changes (dN/dS). We estimated a global dN/dS value for exons overlapping each of the most common categories of RepeatMasker regions using dNdScv (46). The results (Figure 1B) suggest that in general DNA/hAT-Ac and LINE/RTE-BovB transposable elements (along with LINE1/CR1 elements, simple repeats and low complexity regions) are under purifying selection, as might be expected from their partitioning between genome and proteome (Figure 1A), whereas exons overlapping most other elements (including SINE Alu elements) are not, in general, under selection and are therefore less likely to have functional importance.
SINE Alu elements locate preferentially to alternative exons
The APPRIS database (42) divides transcripts into those that give rise to the principal protein isoform and those that if translated would produce alternative isoforms (see 'Materials and Methods' section for more details). Exons that overlapped all RepeatMasker transposon classes were separated into those found in the APPRIS-defined principal transcripts, and those found solely in alternative transcripts.
Alternative exons make up just over 10% of the exons in the reference genome, so if transposable elements were randomly distributed, we would expect to find one in 10 transposable elements in alternative exons and the other 90% should overlap with principal exons. This is true for exon-overlapping simple repeats (87.8%) and some older transposable elements are also found at the expected frequency in principal exons, including DNA/hAT-Ac (88.8%), SINE/5S-Deu-L2 (83.3%), SINE/tRNA (78.9%) and LINE/RTE-BovB (85.4%) elements (Supplementary Figure S2). By contrast, just 9.2% of SINE Alu elements were found in principal exons.
It should be noted that APPRIS determines principal isoforms based on conserved structural and functional features and cross-species conservation. Since Alu elements arose in the primate lineage and do not form part of conserved functional or structural domains, we would expect few Alu element-derived exons to be classified as principal by APPRIS. In any case, APPRIS predictions are backed up by transcript level studies showing that internal exons overlapping Alu elements are predominantly alternatively spliced (32).
SINE Alu elements in the human reference genome
A total of 1074 distinct coding genes in GENCODE v28 have coding exons that overlap SINE Alu elements. There are 1224 Alu elements that overlap coding exons, but several genes harbour more than one element. For example, ZNF506 contains four distinct Alu overlaps in alternative 3′ exons and 23 genes overlap three different Alu elements.
Genes with coding regions that overlap Alu elements are significantly enriched in zinc finger motifs relative to the whole genome. A total of 93 genes are annotated with C2H2 zinc finger domains (Fisher's test, P-value of 9.4e-16) according to SMART (60). Only one other protein domain is significantly enriched in this set, KRAB domains (P-value of 4.8e-24). KRAB domains are generally found in tandem with C2H2 zinc finger domains. Many of these genes are from the cluster of KRAB-ZNF genes at the centromere on chromosome 19. Just over half of these genes overlap a range of different Alu elements, including all six members of the ZNF431 clade (61).
SINE Alu elements are most often found in the final coding exon: about half of the coding exons that overlap Alu elements are 3′ CDS (591). Sixty per cent of the Alu elements that overlap zinc finger genes are found in the final exon. This elevated number has two possible explanations. It may be because Alu elements are likely to produce fewer deleterious effects when inserting into a 3′ exon, or it may be caused by out-of-frame insertions that generate premature stop codons. The fact that Alu insertions can easily form polyadenylation signals (33) would clearly facilitate the establishment of 3′ exons.
Alu elements that insert into internal CDS may generate frameshifts in downstream exons. In fact 50.2% of annotated Alu elements that overlap internal or first CDS are predicted to lead to frameshifts. This is somewhat fewer than expected by chance and in contrast to what was found by Sorek et al. (32). This lower number may be evidence in favour of these being truly functional exons, but it could also be caused by systematic bias given the composition of Alu sequences.
Almost all SINE Alu elements that overlap coding exons are inactive
More than 50% of the Alu elements that overlap exons are from the AluS family (55.2%) against just 7.4% from the youngest Alu family (AluY). The AluY sub-family itself is partly active (28), but only three copies of elements from sub-families known to be active (28) are annotated in (alternative) coding exons in the reference genome. The proportions of Alu sub-families overlapping coding exons are shown in Figure 2.
Over 37% of the Alu elements that overlap coding exons are from the older FRAM/FLAM or AluJ families, compared to just 29.4% across the whole genome (Supplementary Figure S3). The difference is significant in a Chi-squared test (P < 0.0001). This may be partly because older Alu elements are often no longer detectable outside of conserved regions such as coding exons.
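The frameshift reasoning above rests on simple modular arithmetic: an inserted coding segment leaves the downstream reading frame intact only when its length is a multiple of three. The sketch below illustrates this with invented lengths; the function names are hypothetical, and in-frame stop codons are deliberately ignored. If insert lengths were spread evenly across the three residues modulo 3 (an assumption), about two thirds of inserts would shift the frame, which gives a rough reference point for the 50.2% reported.

```python
def preserves_frame(inserted_cds_length):
    """True if an insert of this many coding nucleotides keeps the downstream frame."""
    return inserted_cds_length % 3 == 0


def frameshift_fraction(inserted_lengths):
    """Fraction of inserts whose length is not a multiple of three."""
    shifted = sum(1 for n in inserted_lengths if not preserves_frame(n))
    return shifted / len(inserted_lengths)


# Invented lengths covering each residue class modulo 3 equally often.
uniform_fraction = frameshift_fraction([3, 4, 5, 6, 7, 8])

# An insert of 27 codons (81 nt) keeps the frame; 49 nt does not.
in_frame = preserves_frame(81)
out_of_frame = preserves_frame(49)
```

A fuller check would also translate the inserted sequence to look for premature stop codons, which this sketch does not attempt.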
The NPIPB sub-family
Genes with Alu-derived exons annotated in the reference genome have a similar age distribution to the rest of the human reference set, except that there are proportionally more genes that have arisen in the primate lineage (Supplementary Figure S4). Though the difference is significant (Chi-squared test, P-value of 0.00014), it is entirely due to the ten duplications in the NPIPB sub-family, which itself arose in the primate clade (62).
The 15 members of the nuclear pore complex-interacting protein family are primate specific and found in segmental duplications on chromosome 16 (62). The nuclear pore complex-interacting proteins (NPIPs) are made up of one or two membrane-interacting regions, a central coiled-coil domain and a variable number of C-terminal repeats. Three subfamilies can be distinguished by the length and composition of the repeats; the NPIPA subfamily does not contain any SINE Alu elements, but RepeatMasker defines two distinct SINE Alu elements for each member of the two NPIPB sub-families, NPIPB3/4/5/11/13 and NPIPB6/7/8/9/15. In fact one of the three distinct types of repeats that make up the final exon in this family seems to have derived from Alu elements (Figure 3). The NPIPB6/7/8/9/15 sub-family also has an Alu-derived insertion in the second coding exon.
Phylogenetic reconstruction suggests that the NPIPB sub-families derived from the ancestral NPIPA in a stepwise manner and that the evolution of NPIPB sub-families within the great apes clade coincided with the insertion of Alu elements in the coding region and a number of further retrotransposon events within the 5′ and 3′ UTRs of the NPIPB sub-family members.
Since the duplications are so recent, the genes are very similar. It is not easy to distinguish whether all annotated genes are coding, or whether only some are coding and others are pseudogenes. Nevertheless, at least one member of the NPIPB6/7/8/9/15 sub-family has clear evidence of protein expression in testis. All the peptide evidence in PeptideAtlas mapped to a single gene (NPIPB6), so NPIPB6 was used to represent the whole sub-family.
Alu elements in principal isoforms
Alu elements were predicted to be present in the principal exons of 103 coding genes. We carried out a detailed manual analysis of these genes to determine whether the Alu element had been incorporated into the principal transcript or an alternative variant and whether or not the Alu elements were part of bona fide coding genes (63). Details of the manual annotation can be found in the Supplementary Results section.
We found that the Alu element forms part of the main coding isoform of 10 genes and all the members of the NPIPB sub-family (Table 1).
Table 1.
Gene | Gene family age | Function |
---|---|---|
BEND2 | Euteleostomi | Unknown function. Expressed in testis. Alu element inserts a whole exon into the highly divergent N-terminal. |
HSD17B7 | Fungi-Metazoa | 3-keto-steroid reductase, part of the estrogen synthesis pathway. Adds 8 amino acids to the N-terminal. |
NLRP1 | Euteleostomi | Part of the NLRP1 inflammasome (64). The Alu region corresponds to an inserted exon that adds 27 amino acids. |
NPIPB6 | Simiiformes | Unknown function. Expressed in testis. Represents a primate-derived sub-family with three Alu inserts. All three extend exons. |
TTF1 | Chordata | Transcription termination factor in ribosome biogenesis. The Alu element adds 23 amino acids to the C-terminal. |
USP19 | Fungi-Metazoa | A multi-functional deubiquitinating enzyme. The Alu element extends exon 2 by 46 amino acids. |
ZNF101 | Bilateria | Unknown function. The Alu element inserts 49 base pairs and a stop codon into the 3′ exon of the CDS. |
ZNF394 | Euteleostomi | A transcriptional repressor in MAP kinase signaling (65). The element adds 8 amino acids to the C-terminal. |
ZNF433 | Bilateria | Activation of beta-catenin/TCF signaling. The Alu region changes a single amino acid at the C-terminal. |
ZNF669 | Bilateria | Unknown function. Adds 22 amino acids to the stop codon. |
ZNF91 | Bilateria | SVA transposable element repressor (66). The Alu element displaces 2 zinc finger motifs while adding 33 amino acids. |
Five of the 11 genes in which Alu elements have modified the main coding sequence code forzinc finger proteins and all but ZNF394 are primate-specific duplications of the same zinc finger family (61). The nigh interesting case is ZNF91. Here the Alu element, which only appears in the corking apes, adds 33 amino acids to the C-terminal while displacing eight zinc-bounden residues from the ancestral protein. A farther alter in the homo lineage led to the upstream insertion of seven zinc finger bounden motifs. The gain of these zinc fingers has enabled ZNF91 to become a repressor of SVA transposable elements (66). It is not clear whether the Alu insertion besides contributes to this role.
Eight of the Alu elements, including those in all five zinc finger genes, would extend the C-terminal of the resulting poly peptide. Information technology is known that zinc finger proteins are highly plastic at their C-terminals (67). All the elements, except those in BEND2 and NLRP1, have integrated into the principal isoform past 'hijacking' existing coding exons rather than creating new coding exons.
Peptide evidence for SINE Alu functionality
It is possible that other Alu-derived exons, besides those present in principal isoforms, have evidence for functionality. We attempted to confirm the translation to protein of the SINE Alu elements in the human proteome. We searched the PeptideAtlas database for validated peptides that mapped to the 1224 unique Alu-derived exons and manually verified the peptide-spectrum matches (PSMs) for these peptides (see 'Materials and Methods' section).
The peptide evidence validated the translation of 33 Alu-derived exons from 29 different genes (SLC3A2 and NPIPB6 both contain three Alu-derived exons and all three were validated by spectra from PeptideAtlas). The 29 genes with translated Alu-derived exons are shown in Supplementary Table S1.
There are validated peptides for 8 of the 13 Alu elements in principal isoforms, including all three SINE Alu regions in NPIPB6. The elements in ZNF91 and USP19 are supported by two non-overlapping peptides. Although we do not detect peptides that map to the Alu elements present in zinc finger proteins ZNF101, ZNF394, and ZNF669, there are peptides that uniquely identify the exons that the Alu elements are part of, so we can assume that all these Alu elements are translated as well.
The remaining 25 Alu elements with validated translation are all in alternative isoforms, though some of the variants have so much peptide and RNAseq evidence that they could be considered at least as strong alternative isoforms. The alternative C-terminal in CD55 is supported by three non-overlapping peptides and the inserted Alu region in NEK4 is supported by four peptides. The peptide data for these two Alu regions suggests that the Alu exons have at least as much support as the ancestral isoforms.
Twenty-two of the Alu elements for which we found valid peptides are inserts in the ancestral transcripts, and all but one insert was frame preserving (the indel in DLGAP5 adds 4 amino acids and a stop codon from the last coding exon of the principal variant as a result of a frameshift). Nine of the remaining Alu-derived exons (and DLGAP5) would affect the C-terminal of the proteins while two extend the N-terminal.
All SINE Alu elements for which we found verified peptide evidence modified existing CDS. In all cases the ancestral gene family predated the Alu element insertions, though we cannot be sure whether SINE Alu insertion occurred before or after gene duplication for genes ZNF101, ZNF195 and ZNF669.
We crosschecked the 85 genes identified in the Lin et al. (34) paper against evidence from the PeptideAtlas database. We validated just 5 of the peptides detected by Lin et al. for SINE Alu elements.
How do Alu-derived exons compare to other primate-derived exons?
In order to determine whether the peptide evidence we found for 33 SINE Alu elements was similar to what might be expected for primate-derived alternative exons, we repeated the PeptideAtlas analysis with exons that arose in the primate clade as a comparison. We only looked at primate exons tagged by APPRIS as alternative because exons within principal isoforms would be expected to form part of the expressed proteins (we found peptides for eight of the 13 SINE Alu overlaps in verified principal exons).
We curated a set of 6789 primate-derived alternative exons (see the 'Materials and Methods' section for details). In comparison the curated set of alternative SINE Alu-derived exons totalled 777 exons. SINE Alu elements make up 10.4% of the bases in the human genome and just over 10% of annotated primate exons are Alu-derived, which suggests that Alu elements are not any more likely to be annotated as coding exons than other non-coding regions.
We mapped peptides from the PeptideAtlas database to the exons (as described in the 'Materials and Methods' section). After manual curation we found reliable peptide identifications for only 25 primate-derived alternative exons, 0.37%. As a comparison, we found peptide evidence for 22 of the 777 SINE Alu-derived alternative exons (2.83%), proportionally more than seven times as much and significantly more than would be expected for standard primate-derived exons (P-value of <0.0001 in Chi-squared tests). This shows that a significantly higher proportion of SINE Alu elements are incorporated into expressed proteins than would be expected.
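The proportions and the test reported above can be recomputed from the counts given in the text. The following is a minimal sketch, assuming a Pearson chi-squared test without continuity correction on the 2×2 table of exons with and without peptide evidence; the text does not say which variant of the test was used, and the function name `chi2_2x2` is illustrative rather than taken from the authors' pipeline.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value for the table [[a, b], [c, d]].

    Assumption: no continuity correction is applied (1 degree of freedom).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts quoted in the text.
alu_hits, alu_total = 22, 777              # Alu-derived alternative exons
primate_hits, primate_total = 25, 6789     # other primate-derived alternative exons

alu_rate = alu_hits / alu_total
primate_rate = primate_hits / primate_total
fold = alu_rate / primate_rate

stat, p_value = chi2_2x2(
    alu_hits, alu_total - alu_hits,
    primate_hits, primate_total - primate_hits,
)

print(f"Alu-derived exons with peptides:     {alu_rate:.2%}")
print(f"primate-derived exons with peptides: {primate_rate:.2%}")
print(f"fold difference: {fold:.1f}")
print(f"chi-squared = {stat:.1f}, p = {p_value:.2e}")
```

Under these assumptions the two rates round to the 2.83% and 0.37% quoted above, the fold difference exceeds seven, and the p-value falls well below 0.0001.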
Transcript evidence
We analysed transcript evidence in the form of cDNA support and Pext scores (normalized exon inclusion rates) for the 47 alternative exons with peptide evidence. There was more supporting transcript evidence for the translated Alu-derived exons than for the translated primate-derived exons. cDNA evidence supported the expression of 19 of the 22 alternative Alu-derived exons against only 14 of the 25 primate-derived exons, while 8 of the 22 alternative Alu-derived exons had Pext scores >0.5, against none of the primate-derived alternative exons. The differences between the two sets of exons are significant: Fisher's tests showed a P-value of 0.0293 for the differences in cDNA support and 0.001 for the Pext scores.
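Both Fisher's tests can be checked directly from the counts in the paragraph above. This is a minimal sketch, assuming the two-sided form of the test in which the p-value is the sum of all table probabilities no larger than that of the observed table; the function name `fisher_exact_two_sided` is illustrative.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return Fraction(comb(col1, k) * comb(n - col1, row1 - k), denom)

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    p_observed = prob(a)
    total = sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_observed
    )
    return float(total)


# cDNA support: 19 of 22 Alu-derived exons versus 14 of 25 primate-derived exons.
cdna_p = fisher_exact_two_sided(19, 22 - 19, 14, 25 - 14)

# Pext > 0.5: 8 of 22 Alu-derived exons versus 0 of 25 primate-derived exons.
pext_p = fisher_exact_two_sided(8, 22 - 8, 0, 25 - 0)

print(f"cDNA support: p = {cdna_p:.4f}")
print(f"Pext > 0.5:   p = {pext_p:.4f}")
```

Exact rational arithmetic (`Fraction`) is used so that the comparison against the observed table probability is not affected by floating-point rounding.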
Several of the Alu-derived exons had highly tissue-specific expression patterns. For example, the Alu-derived exon in DLGAP5 had an average Pext score of just 0.1, but was completely included in endocervix, while the inclusion of the 3′ Alu-derived exon in CMC2 was noticeably higher in brain than in other tissues.
SINE Alu inserts and domain composition conservation
Events that cause changes in Pfam (68) domain composition tend not to be detected in proteomics experiments (69). This is presumably because, like frame-changing indels, this would normally lead to gross functional changes in the protein and be selected against. Even though all detected SINE Alu element inserts were frame preserving, six of the events for which we found peptides would break Pfam functional domains.
While this is somewhat surprising, five of the six domain-disrupting events may not actually have much effect on the functional domain. For example, the insertion in the domain in TKT is relatively short, occurs in a loop region, and the Pfam seed alignment (68) includes sequences with similar sized inserts at the same position. In CMC2 the Alu exon removes the C-terminal portion of the Pfam domain, but the C-terminal swap does not affect the beta-hairpin that this protein forms, nor the conserved cysteines. The C-terminal of the Cmc1 domain that is broken by the SINE Alu insertion is not conserved in the Pfam seed alignment (Figure 4A). The A_deamin domain in RNA-editing deaminase 1 from gene ADARB1 has two conserved N- and C-terminal sections and a central linker section without conservation. Sequences from Danio rerio, chicken and Xenopus are among those that also have insertions in this central 'linker' region, and the central linker region is exactly where the ADARB1 SINE Alu exon inserts. The insertion can be visualized mapped onto the crystallized structure (Figure 4B): it inserts into an already disordered region away from the catalytic site, in contrast to what is reported by Lin et al.
SINE Alu element translation and selection
The substantial evidence for the expression and translation of a small set of Alu-derived exons suggested that this subset of Alu elements might have gained functional roles in the cell. We investigated whether there was data to support this hypothesis. We defined 'functional role' for the purpose of this analysis as having evidence of protein-like purifying selection (71). Although SINE Alu elements as a whole are not under selective pressure (Figure 1B), it is possible that the subset of Alu elements with evidence of translation is under measurable selective constraints.
Using PAML we estimated dN/dS from concatenated primate alignments (50) of the coding portion of the 33 elements with peptide evidence and for the Alu elements that overlap expressed coding exons in ZNF101, ZNF394 and ZNF669 that we can assume are also expressed as proteins. The estimated dN/dS values were not significantly different from 1 for the alignments of all simians, of apes, or of human and chimp (see Supplementary Table S2). An alternative analysis fitting the selection models separately for each individual exon and then multiplying the resulting likelihoods did not reject the null hypothesis of neutral evolution either. Furthermore, we found stop gains and frame-shifts in 24 of the 36 Alu-derived exons across primates, suggesting that these Alu elements have not established important functional roles beyond the primate clade. In order to test for significance, we looked at stop gains and frame-shifts in the primate clade for 36 exons of similar size selected at random from the 32 genes with Alu-derived exons that we analysed. Only 4 of these exons had frame-shifts or stop gains in the primate clade.
Analysis of the same 36 elements using PhyloCSF (51), a measure of evolutionary coding potential, produced similar conclusions. The average PhyloCSF score for the coding portion of these Alu elements using alignments of the primate and simian clades is negative, suggesting that these regions have not been under protein-coding constraint in aggregate. However, there is one case for which we found weak evidence for coding selection. The eight-codon Alu-derived region in ZNF394 has a PhyloCSF score of 29.4, which is higher than would be expected for a region of that length that was not under protein-coding selection (uncorrected P = 0.003, multiple-hypothesis corrected P = 0.12). Further support comes from the fact that there are no indels and the stop codon immediately following it is perfectly conserved (CMC2 is the only other C-terminal addition that conserves its stop codon throughout primates). The alignment of the ZNF394 region can be seen in Supplementary Figure S5.
From the point of view of human population variation there is not enough data to assess selection on this small set of exons. Still, only eight of the 35 variants with a MAF greater than 0.1% are synonymous, while six (17.1%) are high impact (four stop gains and two frameshifts). By way of comparison only 3 of the 271 variants with a MAF above 0.1% in non-Alu-derived exons from the relevant principal transcripts were high impact variants (1.1%). The two proportions of high impact variants are significantly different (Fisher's exact test P-value of 0.0002). The high impact variants in the Alu-derived exons occurred in both principal (2) and alternative (4) Alu-derived exons. Although the data is scarce, the frequency of high impact variants further supports the hypothesis that these Alu-derived exons have not yet gained relevant functions.
Discussion
SINE Alu elements make up more than 10% of the human genome; in total the genome has been colonized by close to 1.2 million SINE Alu fragments. The vast majority map to intergenic and intronic regions and just 1224 Alu fragments (0.1%) overlap annotated coding exons. The reduced proportion of SINE Alu elements in exons suggests that there is selective pressure against their inclusion in coding regions.
Even where Alu fragments overlap coding exons, they do not appear to be functionally important. Coding regions that derive from SINE Alu elements are not under selective pressure and almost all annotated Alu-derived exons are found in alternative coding transcripts. Little is known about the cellular roles of any of these Alu-derived exons, though the Alu-derived exon in LIN28B has been shown to be necessary for oncogene activation (72). Alu-derived coding exons are highly enriched in zinc finger proteins (67).
Although Alu elements as a whole are not under selective pressure, we find that Alu-derived exons have become part of the principal splice variant in at least 11 coding genes. In all but two genes the Alu elements have 'colonized' the principal isoform by merging with existing coding exons. This is possibly not surprising since merging with functioning coding exons is likely to be a shortcut to becoming established as part of the principal transcript.
Large-scale proteomics experiments tend not to find evidence for alternative splice variants (69), nor genes that have evolved de novo in the primate lineage (63), so we would expect to find little evidence of translation for Alu-derived exons. Despite this there is clear evidence for the translation of 33 Alu-derived exons and peptide and transcript evidence suggests that many of these alternative exons are strongly expressed. All but one of the 22 insertion events we detected were in-frame, significantly more than would be expected by chance. The proportion of SINE Alu-derived exons detected in large-scale proteomics experiments was also significantly higher than expected; more than seven times higher than that of other primate-derived exons. This may be related to the splice signals present in Alu elements (29,30). Transcription evidence supported the strength of expression of these Alu-derived exons: both inclusion rates and cDNA support were significantly stronger for the Alu-derived exons with peptide evidence than they were for the other primate-derived exons with peptide evidence. A small subset of the 1224 Alu-derived exons has clearly added to the human proteome.
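The scale of the in-frame excess can be illustrated with a one-sided binomial tail. This is a sketch under an explicit assumption that is not stated in the text: that, by chance, an inserted exon has a probability of 1/3 of having a length divisible by three. The helper name `binomial_upper_tail` is illustrative, and the authors may have used a different null model.

```python
from fractions import Fraction
from math import comb


def binomial_upper_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p), computed exactly."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumption: insert lengths are equally likely to be 0, 1 or 2 modulo 3,
# so a random insert preserves the reading frame one time in three.
p_in_frame = Fraction(1, 3)

# 21 of the 22 detected insertion events were frame preserving.
p_value = binomial_upper_tail(21, 22, p_in_frame)

print(f"P(at least 21 of 22 in-frame by chance) = {float(p_value):.2e}")
```

Under this null model the tail probability is 45/3^22, on the order of 10^-9, which is consistent with the statement that the in-frame excess is far greater than chance would produce.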
All the evidence suggests that these SINE Alu elements have added to the human proteome via gene modification rather than de novo gene generation. In 26 of the 29 genes with peptide evidence, the SINE Alu elements added to an established (often ancient) protein-coding gene, while in the remaining three genes the SINE Alu event may have been concurrent with, or just after, a gene duplication. We find no evidence for the conversion of any SINE Alu element into a de novo human coding gene.
Despite the lack of evidence for selection in SINE Alu-derived coding exons at the population level, we expected to find some evidence of evolutionary pressure for those Alu-derived exons with evidence of translation. However, we found none. There was no evidence for any selection from cross-species alignments within the primate clade or even amongst great apes. While there were too few variants in common alleles to be able to draw any conclusions about purifying or positive selection from human population variation, the sizable frequency of high impact variations among the common variants supports the possibility that even those Alu-derived exons with peptide evidence have yet to gain biologically important roles. Overall it seems that although SINE Alu elements contribute to the human proteome, they add little to the range of protein functions.
Supplementary Material
lqz023_Supplemental_Files
Notes
Disclaimer: The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
FUNDING
National Human Genome Research Institute of the National Institutes of Health [2 U41 HG007234 to M.K., I.J., L.M., J.M.M., F.P., M.L.T.; R01 HG004037 to I.J., M.K.].
Conflict of interest statement. None declared.
REFERENCES
1. McClintock B. Controlling elements and the gene. Cold Spring Harb. Symp. Quant. Biol. 1956; 21:197–216.
2. Lander E.S., Linton L.M., Birren B., Nusbaum C., Zody M.C., Baldwin J., Devon K., Dewar K., Doyle M., FitzHugh W. et al. Initial sequencing and analysis of the human genome. Nature. 2001; 409:860–921.
3. Mills R.E., Bennett E.A., Iskow R.C., Devine S.E. Which transposable elements are active in the human genome? Trends Genet. 2007; 23:183–191.
4. Tang W., Mun S., Joshi A., Han K., Liang P. Mobile elements contribute to the uniqueness of human genome with 15,000 human-specific insertions and 14 Mbp sequence increase. DNA Res. 2018; 25:521–533.
5. de Koning A.P., Gu W., Castoe T.A., Batzer M.A., Pollock D.D. Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet. 2011; 7:e1002384.
6. Feschotte C., Pritham E.J. DNA transposons and the evolution of eukaryotic genomes. Annu. Rev. Genet. 2007; 41:331–368.
7. Cordaux R., Batzer M.A. The impact of retrotransposons on human genome evolution. Nat. Rev. Genet. 2009; 10:691–703.
8. Havecker E.R., Gao X., Voytas D.F. The diversity of LTR retrotransposons. Genome Biol. 2004; 5:225.
9. Konkel M.K., Walker J.A., Batzer M.A. LINEs and SINEs of primate evolution. Evol. Anthropol. 2010; 19:236–249.
10. Levin H.L., Moran J.V. Dynamic interactions between transposable elements and their hosts. Nat. Rev. Genet. 2011; 12:615–627.
11. Brouha B., Schustak J., Badge R.M., Lutz-Prigge S., Farley A.H., Moran J.V., Kazazian H.H. Jr. Hot L1s account for the bulk of retrotransposition in the human population. Proc. Natl. Acad. Sci. U.S.A. 2003; 100:5280–5285.
12. Beck C.R., Collier P., Macfarlane C., Malig M., Kidd J.M., Eichler E.E., Badge R.M., Moran J.V. LINE-1 retrotransposition activity in human genomes. Cell. 2010; 141:1159–1170.
13. Pasyukova E.G., Nuzhdin S.V., Morozova T.V., Mackay T.F. Accumulation of transposable elements in the genome of Drosophila melanogaster is associated with a decrease in fitness. J. Hered. 2004; 95:284–290.
14. Reilly M.T., Faulkner G.J., Dubnau J., Ponomarev I., Gage F.H. The role of transposable elements in health and diseases of the central nervous system. J. Neurosci. 2013; 33:17577–17586.
15. Burns K.H. Transposable elements in cancer. Nat. Rev. Cancer. 2017; 17:415–424.
16. Feschotte C. Transposable elements and the evolution of regulatory networks. Nat. Rev. Genet. 2008; 9:397–405.
17. Cohen C.J., Lock W.M., Mager D.L. Endogenous retroviral LTRs as promoters for human genes: a critical assessment. Gene. 2009; 448:105–114.
18. Johnson R., Guigó R. The RIDL hypothesis: transposable elements as functional domains of long noncoding RNAs. RNA. 2014; 20:959–976.
19. Bejerano G., Lowe C.B., Ahituv N., King B., Siepel A., Salama S.R., Rubin E.M., Kent W.J., Haussler D. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature. 2006; 441:87–90.
20. Gotea V., Makałowski W. Do transposable elements really contribute to proteomes? Trends Genet. 2006; 22:260–267.
21. Tellier M., Chalmers R. Human SETMAR is a DNA sequence-specific histone-methylase with a broad effect on the transcriptome. Nucleic Acids Res. 2019; 47:122–133.
22. Abascal F., Tress M.L., Valencia A. Alternative splicing and co-option of transposable elements: the case of TMPO/LAP2α and ZNF451 in mammals. Bioinformatics. 2015; 31:2257–2261.
23. Kriegs J.O., Churakov G., Jurka J., Brosius J., Schmitz J. Evolutionary history of 7SL RNA-derived SINEs in Supraprimates. Trends Genet. 2007; 23:158–161.
24. Krull M., Brosius J., Schmitz J. Alu-SINE exonization: en route to protein-coding function. Mol. Biol. Evol. 2005; 22:1702–1711.
25. Bennett E.A., Keller H., Mills R.E., Schmidt S., Moran J.V., Weichenrieder O., Devine S.E. Active Alu retrotransposons in the human genome. Genome Res. 2008; 18:1875–1883.
26. Konkel M.K., Walker J.A., Hotard A.B., Ranck M.C., Fontenot C.C., Storer J., Stewart C., Marth G.T., 1000 Genomes Consortium, Batzer M.A. Sequence Analysis and Characterization of Active Human Alu Subfamilies Based on the 1000 Genomes Pilot Project. Genome Biol. Evol. 2015; 7:2608–2622.
27. Payer L.M., Steranka J.P., Yang W.R., Kryatova M., Medabalimi S.I., Ardeljan D., Liu C., Boeke J.D., Avramopoulos D., Burns K.H. Structural variants caused by Alu insertions are associated with risks for many human diseases. Proc. Natl. Acad. Sci. U.S.A. 2017; 114:E3984–E3992.
28. Larsen P.A., Lutz M.W., Hunnicutt K.E., Mihovilovic M., Saunders A.M., Yoder A.D., Roses A.D. The Alu neurodegeneration hypothesis: a primate-specific mechanism for neuronal transcription noise, mitochondrial dysfunction, and manifestation of neurodegenerative disease. Alzheimers Dement. 2017; 13:828–838.
29. Lev-Maor G., Sorek R., Shomron N., Ast G. The birth of an alternatively spliced exon: 3′ splice-site selection in Alu exons. Science. 2003; 300:1288–1291.
30. Sorek R., Lev-Maor G., Reznik M., Dagan T., Belinky F., Graur D., Ast G. Minimal conditions for exonization of intronic sequences: 5′ splice site formation in alu exons. Mol. Cell. 2004; 14:221–231.
31. Lavi E., Carmel L. Alu exaptation enriches the human transcriptome by introducing new gene ends. RNA Biol. 2018; 15:715–725.
32. Sorek R., Ast G., Graur D. Alu-containing exons are alternatively spliced. Genome Res. 2002; 12:1060–1067.
33. Lin L., Shen S., Tye A., Cai J.J., Jiang P., Davidson B.L., Xing Y. Diverse splicing patterns of exonized Alu elements in human tissues. PLoS Genet. 2008; 4:e1000225.
34. Lin L., Jiang P., Park J.W., Wang J., Lu Z.X., Lam M.P., Ping P., Xing Y. The contribution of Alu exons to the human proteome. Genome Biol. 2016; 17:15.
35. Vizcaíno J.A., Csordas A., del-Toro N., Dianes J.A., Griss J., Lavidas I., Mayer G., Perez-Riverol Y., Reisinger F., Ternent T. et al. 2016 update of the PRIDE database and its related tools. Nucleic Acids Res. 2016; 44:D447–D456.
36. Ezkurdia I., Calvo E., Del Pozo A., Vázquez J., Valencia A., Tress M.L. The potential clinical impact of the release of two drafts of the human proteome. Expert Rev. Proteomics. 2015; 12:579–593.
37. Gascoigne D.K., Cheetham S.W., Cattenoz P.B., Clark M.B., Amaral P.P., Taft R.J., Wilhelm D., Dinger M.E., Mattick J.S. Pinstripe: a suite of programs for integrating transcriptomic and proteomic datasets identifies novel proteins and improves differentiation of protein-coding and non-coding genes. Bioinformatics. 2012; 28:3042–3050.
38. Guerzoni D., McLysaght A. De novo genes arise at a slow but steady rate along the primate lineage and have been subject to incomplete lineage sorting. Genome Biol. Evol. 2016; 8:1222–1232.
39. Kusebauch U., Deutsch E.W., Campbell D.S., Sun Z., Farrah T., Moritz R.L. Using PeptideAtlas, SRMAtlas, and PASSEL: comprehensive resources for discovery and targeted proteomics. Curr. Protoc. Bioinformatics. 2014; 46:13.25.1–13.25.28.
40. Frankish A., Diekhans M., Ferreira A.M., Johnson R., Jungreis I., Loveland J., Mudge J.M., Sisu C., Wright J., Armstrong J. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 2019; 47:D766–D773.
41. Zerbino D.R., Achuthan P., Akanni W., Amode M.R., Barrell D., Bhai J., Billis K., Cummins C., Gall A., Girón C.G. et al. Ensembl 2018. Nucleic Acids Res. 2018; 46:D754–D761.
42. Rodriguez J.M., Rodriguez-Rivas J., Di Domenico T., Vázquez J., Valencia A., Tress M.L. APPRIS 2017: principal isoforms for multiple gene sets. Nucleic Acids Res. 2018; 46:D213–D217.
43. Rodriguez J.M., Carro A., Valencia A., Tress M.L. APPRIS WebServer and WebServices. Nucleic Acids Res. 2015; 43:W455–W459.
44. Ezkurdia I., Rodriguez J.M., Carrillo-de Santa Pau E., Vázquez J., Valencia A., Tress M.L. Most highly expressed protein-coding genes have a single dominant isoform. J. Proteome Res. 2015; 14:1880–1887.
45. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015; 526:68–74.
46. Martincorena I., Raine K.M., Gerstung M., Dawson K.J., Haase K., Van Loo P., Davies H., Stratton M.R., Campbell P.J. Universal patterns of selection in cancer and somatic tissues. Cell. 2017; 171:1029–1041.
47. Blanchette M., Kent W.J., Riemer C., Elnitski L., Smit A.F., Roskin K.M., Baertsch R., Rosenbloom K., Clawson H., Green E.D. et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004; 14:708–715.
48. Lefort V., Longueville J.-E., Gascuel O. SMS: Smart Model Selection in PhyML. Mol. Biol. Evol. 2017; 34:2422–2424.
49. Guindon S., Dufayard J.F., Lefort V., Anisimova M., Hordijk W., Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 2010; 59:307–321.
50. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 2007; 24:1586–1591.
51. Lin M.F., Jungreis I., Kellis M. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics. 2011; 27:i275–i282.
52. Ezkurdia I., Juan D., Rodriguez J.M., Frankish A., Diekhans M., Harrow J., Vazquez J., Valencia A., Tress M.L. Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes. Hum. Mol. Genet. 2014; 23:5866–5878.
53. Herrero J., Muffato M., Beal K., Fitzgerald S., Gordon L., Pignatelli M., Vilella A.J., Searle S.M., Amode R., Brent S. et al. Ensembl comparative genomics resources. Database. 2016; 2016:baw053.
54. Vilella A.J., Severin J., Ureta-Vidal A., Heng L., Durbin R., Birney E. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009; 19:327–335.
55. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J. Mol. Biol. 1990; 215:403–410.
57. Silvester N., Alako B., Amid C., Cerdeño-Tarrága A., Clarke L., Cleland I., Harrison P.W., Jayathilaka S., Kay S., Keane T. et al. The European Nucleotide Archive in 2017. Nucleic Acids Res. 2018; 46:D36–D40.
58. O'Leary N.A., Wright M.W., Brister J.R., Ciufo S., Haddad D., McVeigh R., Rajput B., Robbertse B., Smith-White B., Ako-Adjei D. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 2016; 44:D733–D745.
59. Arensburger P., Hice R.H., Zhou L., Smith R.C., Tom A.C., Wright J.A., Knapp J., O'Brochta D.A., Craig N.L., Atkinson P.W. Phylogenetic and functional characterization of the hAT transposon superfamily. Genetics. 2011; 188:45–57.
60. Letunic I., Doerks T., Bork P. SMART: recent updates, new developments and status in 2015. Nucleic Acids Res. 2015; 43:D257–D260.
61. Hamilton A.T., Huntley S., Tran-Gyamfi M., Baggott D.M., Gordon L., Stubbs L. Evolutionary expansion and divergence in the ZNF91 subfamily of primate-specific zinc finger genes. Genome Res. 2006; 16:584–594.
62. Johnson M.E., Viggiano L., Bailey J.A., Abdul-Rauf M., Goodwin G., Rocchi M., Eichler E.E. Positive selection of a gene family during the emergence of humans and African apes. Nature. 2001; 413:514–519.
63. Abascal F., Juan D., Jungreis I., Kellis M., Martinez L., Rigau M., Rodriguez J.M., Vazquez J., Tress M.L. Loose ends: almost one in five human genes still have unresolved coding status. Nucleic Acids Res. 2018; 46:7070–7084.
64. Finger J.N., Lich J.D., Dare L.C., Cook M.N., Brown K.K., Duraiswami C., Bertin J.J., Bertin J., Gough P.J. Autolytic proteolysis within the function to find domain (FIIND) is required for NLRP1 inflammasome activity. J. Biol. Chem. 2012; 287:25030–25037.
65. Huang C., Wang Y., Li D., Li Y., Luo J., Yuan W., Ou Y., Zhu C., Zhang Y., Wang Z. et al. Inhibition of transcriptional activities of AP-1 and c-Jun by a new zinc finger protein ZNF394. Biochem. Biophys. Res. Commun. 2004; 320:1298–1305.
66. Jacobs F.M., Greenberg D., Nguyen N., Haeussler M., Ewing A.D., Katzman S., Paten B., Salama S.R., Haussler D. An evolutionary arms race between KRAB zinc-finger genes ZNF91/93 and SVA/L1 retrotransposons. Nature. 2014; 516:242–245.
67. Emerson R.O., Thomas J.H. Adaptive evolution in zinc finger transcription factors. PLoS Genet. 2009; 5:e1000325.
68. El-Gebali S., Mistry J., Bateman A., Eddy S.R., Luciani A., Potter S.C., Qureshi M., Richardson L.J., Salazar G.A., Smart A. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 2019; 47:D427–D432.
69. Abascal F., Ezkurdia I., Rodriguez-Rivas J., Rodriguez J.M., del Pozo A., Vázquez J., Valencia A., Tress M.L. Alternatively spliced homologous exons have ancient origins and are highly expressed at the protein level. PLoS Comput. Biol. 2015; 11:e1004325.
70. Burley S.K., Berman H.M., Christie C., Duarte J.M., Feng Z., Westbrook J., Young J., Zardecki C. RCSB Protein Data Bank: Sustaining a living digital data resource that enables breakthroughs in scientific research and biomedical education. Protein Sci. 2017; 27:316–330.
71. Ophir R., Itoh T., Graur D., Gojobori T. A simple method for estimating the intensity of purifying selection in protein-coding genes. Mol. Biol. Evol. 1999; 16:49–53.
72. Jang H.S., Shah N.M., Du A.Y., Dailey Z.Z., Pehrsson E.C., Godoy P.M., Zhang D., Li D., Xing X., Kim S. et al. Transposable elements drive widespread expression of oncogenes in human cancers. Nat. Genet. 2019; 51:611–617.
73. Hamilton A.T., Huntley S., Tran-Gyamfi M., Baggott D.M., Gordon L., Stubbs L. Evolutionary expansion and divergence in the ZNF91 subfamily of primate-specific zinc finger genes. Genome Res. 2006; 16:584–594.
Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6924539/