(1/53) The evolution of isochores: evidence from SNP frequency distributions.
The large-scale systematic variation in nucleotide composition along mammalian and avian genomes has been a focus of the debate between neutralist and selectionist views of molecular evolution. Here we test whether the compositional variation is due to mutation bias using two new tests, neither of which assumes compositional equilibrium. The first test assumes a standard population genetics model; the second makes no assumptions about the underlying population genetics. We apply the tests to single-nucleotide polymorphism (SNP) data from noncoding regions of the human genome. Both models of neutral mutation bias adequately fit the frequency distributions of SNPs segregating in low- and medium-GC-content regions of the genome, although both suggest compositional nonequilibrium. However, neither model fits the frequency distribution of SNPs from the high-GC-content regions. In contrast, a simple population genetics model that incorporates selection or biased gene conversion cannot be rejected. The results suggest that mutation biases are not solely responsible for the compositional biases found in noncoding regions.
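The tests above work on the frequency distribution of segregating SNPs. As a rough illustration of that kind of data (not the authors' tests), the sketch below builds an unfolded site frequency spectrum separately for weak-to-strong (AT → GC) and strong-to-weak (GC → AT) SNPs and compares their mean derived-allele frequencies. It assumes the ancestral state of each SNP is already known (for example, from an outgroup) and that n chromosomes were sampled; the `Snp` record, the function names, and the toy data are all hypothetical. If GC alleles are favored, whether by selection or by biased gene conversion, derived GC alleles would be expected to segregate at higher frequencies than derived AT alleles.

```python
from collections import Counter
from dataclasses import dataclass

STRONG = {"G", "C"}  # S alleles
WEAK = {"A", "T"}    # W alleles


@dataclass
class Snp:
    ancestral: str      # ancestral base (assumed known, e.g. from an outgroup)
    derived: str        # derived base
    derived_count: int  # copies of the derived allele among n sampled chromosomes


def direction(snp):
    """Return 'W->S', 'S->W', or None for W<->W and S<->S changes."""
    a, d = snp.ancestral.upper(), snp.derived.upper()
    if a in WEAK and d in STRONG:
        return "W->S"
    if a in STRONG and d in WEAK:
        return "S->W"
    return None


def unfolded_sfs(snps, n, kind):
    """Counts of SNPs of the given direction carrying i derived copies, for i = 1 .. n-1."""
    counts = Counter(s.derived_count for s in snps if direction(s) == kind)
    return [counts.get(i, 0) for i in range(1, n)]


def mean_derived_frequency(sfs, n):
    """Mean derived-allele frequency implied by an unfolded SFS (NaN if it is empty)."""
    total = sum(sfs)
    if total == 0:
        return float("nan")
    return sum(i * c for i, c in enumerate(sfs, start=1)) / (n * total)


# Toy data (hypothetical): 10 sampled chromosomes.
n = 10
snps = [Snp("A", "G", 7), Snp("T", "C", 5), Snp("G", "A", 1), Snp("C", "T", 2), Snp("A", "T", 3)]
ws = unfolded_sfs(snps, n, "W->S")
sw = unfolded_sfs(snps, n, "S->W")
print(mean_derived_frequency(ws, n), mean_derived_frequency(sw, n))  # 0.6 0.15
```

With real data the two spectra would still need a formal comparison across frequency classes; the toy numbers only show how the data are organized.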
(2/53) Vanishing GC-rich isochores in mammalian genomes.
To understand the origin and evolution of isochores (the peculiar spatial distribution of GC content within mammalian genomes), we analyzed the synonymous substitution pattern in coding sequences from closely related species in different mammalian orders. In primates and cetartiodactyls, GC-rich genes are undergoing a large excess of GC → AT substitutions over AT → GC substitutions: GC-rich isochores are slowly disappearing from the genomes of these two mammalian orders. In rodents, our analyses suggest both a decrease in the GC content of GC-rich isochores and an increase in the GC content of GC-poor isochores, but more data will be needed to assess the significance of this pattern. These observations call into question the conclusions of previous work that assumed base composition was at equilibrium. Analysis of allele frequencies in human polymorphism data, however, confirmed that in the GC-rich parts of the genome, GC alleles have a higher probability of fixation than AT alleles. This fixation bias does not appear strong enough to overcome the large excess of GC → AT mutations. Thus, whatever evolutionary force (neutral or selective) gave rise to GC-rich isochores, it is no longer effective in mammals. We propose a model based on the biased gene conversion hypothesis that accounts for the origin of GC-rich isochores in the ancestral amniote genome and for their decline in present-day mammals.
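The link between an excess of GC → AT substitutions and disappearing GC-rich isochores can be made concrete with the standard two-state equilibrium: if r_WS is the rate of AT → GC substitution per AT site and r_SW is the rate of GC → AT substitution per GC site, GC content tends toward GC* = r_WS / (r_WS + r_SW), and a region whose current GC content exceeds GC* is losing GC. The sketch below is a simplification that ignores context effects such as CpG hypermutability; the function names and counts are hypothetical and are not the paper's data.

```python
def substitution_rates(ws_subs, sw_subs, at_sites, gc_sites):
    """Per-site rates: AT->GC per ancestral AT site, GC->AT per ancestral GC site."""
    return ws_subs / at_sites, sw_subs / gc_sites


def equilibrium_gc(r_ws, r_sw):
    """GC fraction at which the AT->GC and GC->AT fluxes balance."""
    return r_ws / (r_ws + r_sw)


def gc_change(gc, r_ws, r_sw):
    """Net change in GC fraction, in the same time unit as the rates (positive = GC gain)."""
    return r_ws * (1 - gc) - r_sw * gc


# Hypothetical counts for a region that is currently 60% GC.
r_ws, r_sw = substitution_rates(ws_subs=20, sw_subs=60, at_sites=400, gc_sites=600)
print(equilibrium_gc(r_ws, r_sw))  # ~0.333, well below the current 0.6
print(gc_change(0.6, r_ws, r_sw))  # negative: the region is losing GC
```

A fixation bias favoring GC alleles, as reported above, would raise the effective r_WS relative to r_SW; the abstract's point is that the observed bias is not large enough to bring GC* up to the current GC content of GC-rich regions.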
(3/53) DNA helix: the importance of being GC-rich.
A new explanation for the emergence of heavy (GC-rich) isochores is proposed, based on a study of the thermostability, bendability, propensity for B-Z transition, and curvature of the DNA helix. The absolute values of thermostability, bendability, and B-Z transition propensity correlated positively with GC content, whereas curvature correlated negatively. The relative values of these parameters were determined by comparison with randomized sequences. In genes and intergenic spacers of warm-blooded animals, both relative bendability and relative B-Z transition propensity increased with GC content, whereas relative thermostability and curvature decreased. The usage of synonymous codons in GC-rich genes was also found to increase bendability and B-Z transition propensity and to reduce the thermostability of DNA (compared with synonymous codons of the same GC content). Analysis of transposable elements (Alu and B2 repeats in human and mouse, respectively) showed that their level of divergence from the consensus sequence correlated positively with relative bendability and B-Z transition propensity and negatively with relative thermostability. Bendability and B-Z transition propensity are known to relate to open chromatin and active transcription, whereas curvature facilitates chromatin condensation. Because heavy isochores are known to be gene-rich and highly transcribed, it is suggested here that isochores arose not as an adaptation to elevated temperature but as a consequence of a certain grade of general organization and a correspondingly advanced level of genomic organization, reflected in genome structuring: the physical properties of DNA are optimized for active transcription in gene-rich regions and for chromatin condensation in gene-poor regions (the 'transcription/grade' concept).
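The relative values mentioned above are measured against randomized sequences. One common way to do this, sketched below as an assumption rather than the paper's exact procedure, is to divide the observed value of a property by its mean over composition-preserving shuffles of the same sequence. The property is passed in as a function; the `cpg_count` stand-in used in the example is purely illustrative and is not a bendability, thermostability, B-Z, or curvature scale. The shuffle count and seed are hypothetical defaults.

```python
import random


def relative_value(seq, prop, n_shuffles=100, seed=0):
    """Observed prop(seq) divided by its mean over composition-preserving shuffles."""
    rng = random.Random(seed)  # seeded so results are reproducible
    bases = list(seq)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # same base composition, randomized order
        total += prop("".join(bases))
    baseline = total / n_shuffles
    if baseline == 0:
        raise ValueError("property is zero in every shuffle; relative value undefined")
    return prop(seq) / baseline


def cpg_count(seq):
    """Illustrative stand-in property (uppercase input assumed): number of CpG dinucleotides."""
    return sum(seq[i:i + 2] == "CG" for i in range(len(seq) - 1))


seq = "CG" * 10 + "AT" * 10
print(relative_value(seq, cpg_count))  # well above 1: CpG is enriched relative to shuffles
```

Because the shuffles keep base composition fixed, a relative value separates the effect of nucleotide order from the effect of GC content itself, which is why absolute and relative values can trend in opposite directions with GC.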
(4/53) Distinct changes of genomic biases in nucleotide substitution at the time of Mammalian radiation.
Differences in regional substitution patterns in the human genome created the large-scale variation in base composition known as genomic isochores. To gain insight into the origin of isochores, we develop a maximum-likelihood approach to determine the history of substitution patterns in the human genome. This approach exploits the vast amount of repetitive sequence deposited in the human genome over the past approximately 250 Myr. Using it, we estimate the frequencies of seven types of substitutions: the four transversions, the two transitions, and the methyl-assisted transition of cytosine in CpG. By comparing substitution patterns in repetitive elements of various ages, we reconstruct the history of the base-substitution process in the different isochores over the past 250 Myr. About 90 Mya (near the time of the mammalian radiation), we find an abrupt fourfold to eightfold increase in the rate of cytosine transitions at CpG pairs relative to the reptilian ancestor. Further analysis of nucleotide substitutions in regions with different GC content reveals concurrent changes in the substitution patterns. Before the mammalian radiation, the substitution pattern depended on regional GC content in a way that preserved it; afterward, this dependence was lost. The substitution pattern thus changed from isochore-preserving to isochore-degrading. We conclude that isochores were established before the radiation of the eutherian mammals and have been undergoing homogenization since then.
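The seven substitution types can be enumerated by collapsing complementary strands so that each substitution is named by its ancestral base pair. The hypothetical helper below covers only this classification step, not the maximum-likelihood estimator itself. It assumes the ancestral base is known (for example, from a repeat's consensus sequence) together with its flanking bases, which are needed to recognize CpG transitions on either strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = set("ACGT")
PURINES = {"A", "G"}


def classify(ancestral, derived, prev_base="", next_base=""):
    """Name a substitution by strand-collapsed base pairs; CpG transitions are separated out."""
    a, d = ancestral.upper(), derived.upper()
    if a not in BASES or d not in BASES or a == d:
        raise ValueError(f"not a substitution: {ancestral}->{derived}")
    # C->T at a CpG on this strand, or G->A at a CpG seen from the opposite strand.
    if (a, d, next_base.upper()) == ("C", "T", "G") or (a, d, prev_base.upper()) == ("G", "A", "C"):
        return "CpG transition"
    # Collapse strands: describe each change from the strand whose ancestral base is A or C.
    if a in {"G", "T"}:
        a, d = a.translate(COMPLEMENT), d.translate(COMPLEMENT)
    # Transitions keep purine -> purine or pyrimidine -> pyrimidine.
    kind = "transition" if (a in PURINES) == (d in PURINES) else "transversion"
    return f"{kind} {a}:{a.translate(COMPLEMENT)}->{d}:{d.translate(COMPLEMENT)}"


labels = {classify(a, d) for a in "ACGT" for d in "ACGT" if a != d}
print(len(labels))  # 6: four transversions and two transitions
print(classify("C", "T", next_base="G"))  # CpG transition
```

Together with the separate CpG class, this yields the seven categories listed in the abstract.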
(5/53) A unification of mosaic structures in the human genome.
The human genome is a mosaic on many levels: there are cytogenetic bands, GC-composition bands (isochores), and clusters of broadly expressed genes. How might these interrelate? It has been proposed that, to optimize gene regulation, housekeeping genes should be concentrated in transcriptionally competent chromosomal domains. Prior evidence suggests that high-GC regions and R bands are associated with such domains. Here we report that broadly expressed genes cluster in high-GC regions, in R bands, and in the lightest Giemsa bands. This not only confirms the adaptive hypothesis but also provides the first direct systematic evidence of a general interdependence between expression patterns, base composition, and chromosome structure.
(6/53) Isochores and tissue-specificity.
Housekeeping (ubiquitously expressed) genes in the mammalian genome were shown here to be, on average, slightly GC-richer than tissue-specific genes. Both housekeeping and tissue-specific genes occupy similar ranges of GC content, but the former tend to concentrate in the upper part of the range. In the human genome, the GC-content distribution of tissue-specific genes shows two maxima, one GC-poor and one GC-rich. Strictly tissue-specific human genes tend to concentrate in the GC-poor region; their distribution is left-skewed and thus reciprocal to the distribution of housekeeping genes. Intermediately tissue-specific genes show intermediate GC content and a right-skewed distribution. In both human and mouse, genes specific for some tissues (e.g., parts of the central nervous system) have a higher average GC content than housekeeping genes. Because these genes are not transcribed in the germ line (unlike housekeeping genes) and therefore have a lower probability of heritable gene conversion, this finding contradicts the biased gene conversion (BGC) explanation for the elevated GC content of the heavy isochores of the mammalian genome. Genes specific for germ-line tissues (ovary, testis) show a low average GC content, which also contradicts the BGC explanation. Both for the total data set and for most tissues taken separately, a weak positive correlation was found between gene GC content and expression level. The fraction of ubiquitously expressed genes is nearly 1.5-fold higher in the mouse than in the human. This suggests that mouse tissues are comparatively less differentiated at the molecular level, which may be related to the less pronounced isochore structure of the mouse genome.
In each tissue (in both species), tissue-specific genes do not form a clear-cut frequency peak (in contrast to housekeeping genes) but instead form a continuum with a gradually increasing degree of tissue specificity, which probably reflects the path of cell differentiation and/or the independent use of the same protein in several unrelated tissues.
(7/53) Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome.
Processed pseudogenes were created by reverse transcription of mRNAs; they provide snapshots of genes that existed in the genome millions of years ago. To find them in the present-day human genome, we developed a pipeline using features such as intron absence, frame disruption, polyadenylation, and truncation. This enabled us to identify approximately 8000 processed pseudogenes in recent genome drafts (distributed from http://pseudogene.org). Overall, processed pseudogenes are very similar to their closest corresponding human genes: they are 94% complete in coding regions, with 75% amino acid and 86% nucleotide sequence similarity. Their chromosomal distribution appears random and dispersed, with the number on each chromosome proportional to its length, suggesting sustained "bombardment" over evolution. However, their distribution does vary with GC content: processed pseudogenes occur mostly in regions of intermediate GC content. This is similar to Alus but contrasts with functional genes and L1 repeats. Pseudogenes, moreover, have age profiles similar to those of Alus. The number of pseudogenes associated with a given gene follows a power-law relationship, with a few genes giving rise to many pseudogenes and most giving rise to few. The prevalence of processed pseudogenes agrees well with germ-line gene expression. Highly expressed ribosomal proteins account for approximately 20% of the total. Other notable parent genes include cyclophilin A, keratin, GAPDH, and cytochrome c.
(8/53) IsoFinder: computational prediction of isochores in genome sequences.
Isochores are long genome segments that are homogeneous in G+C content. Here, we describe an algorithm (IsoFinder), running on the web (http://bioinfo2.ugr.es/IsoF/isofinder.html), that predicts isochores at the sequence level. We move a sliding pointer from left to right along the DNA sequence. At each position of the pointer, we compute the mean G+C content to the left and to the right of the pointer. We then find the pointer position at which the difference between the left and right means (as measured by the t-statistic) is maximal. Next, we determine the statistical significance of this potential cutting point, after filtering out short-scale heterogeneities below 3 kb with a coarse-graining technique. Finally, the program checks whether this significance exceeds a probability threshold. If so, the sequence is cut at this point into two subsequences; otherwise, it remains undivided. The procedure is applied recursively to each subsequence created by a cut. This decomposes a chromosome sequence into long homogeneous genome regions (LHGRs) with well-defined mean G+C contents, each significantly different from the G+C contents of the adjacent LHGRs. Most LHGRs can be identified with Bernardi's isochores, given their correlation with biological features such as gene density, SINE and LINE (short and long interspersed nuclear elements) densities, recombination rate, and single-nucleotide polymorphism variability. The resulting isochore maps are available at our web site (http://bioinfo2.ugr.es/isochores/) and at the UCSC Genome Browser (http://genome.cse.ucsc.edu/).
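The recursive segmentation described above can be sketched as follows. This is a simplified illustration, not IsoFinder's code: the sequence is coarse-grained into GC fractions of non-overlapping windows, the cut is placed where the pooled-variance t-statistic between the left and right windows is largest, and, as a stated simplification, a cut is accepted when t exceeds a fixed threshold instead of passing IsoFinder's probability-based significance test. The window size, `t_threshold`, and `min_len` are hypothetical parameters.

```python
import math
import statistics


def coarse_grain(seq, window):
    """GC fraction of consecutive non-overlapping windows (a trailing partial window is dropped)."""
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        values.append(sum(base in "GC" for base in chunk) / window)
    return values


def t_statistic(left, right):
    """Pooled-variance two-sample t-statistic between left and right window values."""
    nl, nr = len(left), len(right)
    ml, mr = statistics.fmean(left), statistics.fmean(right)
    pooled = ((nl - 1) * statistics.variance(left) + (nr - 1) * statistics.variance(right)) / (nl + nr - 2)
    sd = math.sqrt(pooled * (1 / nl + 1 / nr))
    if sd == 0:
        return math.inf if ml != mr else 0.0
    return abs(ml - mr) / sd


def best_cut(values, min_len):
    """Slide the pointer and return (index, t) where t is maximal; index is None if t is never > 0."""
    best_i, best_t = None, 0.0
    for i in range(min_len, len(values) - min_len + 1):
        t = t_statistic(values[:i], values[i:])
        if t > best_t:
            best_i, best_t = i, t
    return best_i, best_t


def segment(values, t_threshold=5.0, min_len=2, offset=0):
    """Recursively split window values; returns half-open (start, end) window ranges.

    min_len must be at least 2 so that each side has a sample variance.
    """
    if len(values) < 2 * min_len:
        return [(offset, offset + len(values))]
    i, t = best_cut(values, min_len)
    if i is None or t < t_threshold:
        return [(offset, offset + len(values))]
    return (segment(values[:i], t_threshold, min_len, offset)
            + segment(values[i:], t_threshold, min_len, offset + i))


gc = [0.38, 0.40, 0.39, 0.41, 0.40, 0.52, 0.54, 0.53, 0.55, 0.52]
print(segment(gc))  # [(0, 5), (5, 10)]
```

Segment boundaries are returned as window indices; multiplying them by the window size used in `coarse_grain` gives approximate base positions.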