Evolutionary role of restriction/modification systems as revealed by comparative genome analysis. (33/588)

Type II restriction modification systems (RMSs) have been regarded either as defense tools or as molecular parasites of bacteria. We analyzed their evolutionary role by studying their impact, in terms of palindrome avoidance, on the complete genomes of 26 bacteria and 35 phages. This analysis reveals that palindrome avoidance is not universally spread among bacterial species and that it does not correlate with taxonomic proximity. Palindrome avoidance is also not universal among bacteriophages, even when their hosts code for RMSs, and depends strongly on the genetic material of the phage. Interestingly, palindrome avoidance is intimately correlated with the infective behavior of the phage. We observe that the degree of palindrome and restriction site avoidance is significantly and consistently less pronounced in phages than in their bacterial hosts. This result brings to the fore a larger selective load for palindrome and restriction site avoidance on the bacterial hosts than on their infecting phages. It is then consistent with a view where type II RMSs are considered as parasites possibly on the verge of mutualism. As a consequence, RMSs constitute a nontrivial third player in the host-parasite relationship between bacteria and phages.
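
As a rough illustration of what "palindrome avoidance" measures, the sketch below compares how often a reverse-complement palindrome (for example the EcoRI site GAATTC) occurs in a sequence with how often it would be expected from base composition alone. The abstract does not state which statistic the authors used; the zero-order (independent bases) expectation and all function names here are assumptions made for illustration only.

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(word: str) -> bool:
    """True if the word equals its own reverse complement (e.g. GAATTC)."""
    return word == reverse_complement(word)


def all_palindromes(k: int) -> list[str]:
    """Enumerate every reverse-complement palindrome of length k."""
    words = ("".join(w) for w in product("ACGT", repeat=k))
    return [w for w in words if is_palindrome(w)]


def count_overlapping(seq: str, word: str) -> int:
    """Count occurrences of word in seq, allowing overlaps."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def observed_expected_ratio(seq: str, word: str) -> float:
    """Observed count divided by the count expected if bases were independent.

    Assumption: a zero-order composition model, which is only one of several
    possible null models. Ratios well below 1 would suggest avoidance.
    """
    n = len(seq)
    freqs = {base: seq.count(base) / n for base in "ACGT"}
    probability = 1.0
    for base in word:
        probability *= freqs[base]
    expected = probability * (n - len(word) + 1)
    if expected == 0:
        return float("nan")
    return count_overlapping(seq, word) / expected
```

In practice one would run `observed_expected_ratio` over every word returned by `all_palindromes(4)` and `all_palindromes(6)` for each genome and compare the distributions between hosts and phages; a higher-order Markov expectation would be a more careful null model.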

Genome evolution at the genus level: comparison of three complete genomes of hyperthermophilic archaea. (34/588)

We have compared three complete genomes of closely related hyperthermophilic species of Archaea belonging to the Pyrococcus genus: Pyrococcus abyssi, Pyrococcus horikoshii, and Pyrococcus furiosus. At the genomic level, the comparison reveals a differential conservation among four regions of the Pyrococcus chromosomes correlated with the location of genetic elements mediating DNA reorganization. This discloses the relative contribution of the major mechanisms that promote genomic plasticity in these Archaea, namely rearrangements linked to the replication terminus, insertion sequence-mediated recombinations, and DNA integration within tRNA genes. The combination of these mechanisms leads to a high level of genomic plasticity in these hyperthermophilic Archaea, at least comparable to the plasticity observed between closely related bacteria. At the proteomic level, the comparison of the three Pyrococcus species sheds light on specific selection pressures acting both on their coding capacities and evolutionary rates. Indeed, using two independent methods, the "reciprocal best hits" approach and a new distance ratio analysis, we detect false orthology relationships within the Pyrococcus lineage. This reveals a large number of differential gains and losses of genes since the divergence of the three closely related species. The resulting polymorphism is probably linked to an adaptation of these free-living organisms to differential environmental constraints. As a corollary, we delineate the set of orthologous genes shared by the three species, that is, the genes that may characterize the Pyrococcus genus. In this conserved core, the amino acid substitution rate is equal between P. abyssi and P. horikoshii for most of their shared proteins, even for fast-evolving ones. In contrast, strong discrepancies exist among the substitution rates observed in P. furiosus relative to the two other species, which is in disagreement with the molecular clock hypothesis.
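
The "reciprocal best hits" approach named above can be sketched as follows: two genes are paired only if each is the other's highest-scoring match when the two proteomes are searched against one another. This is a minimal sketch of the general technique, not the authors' pipeline; the score dictionaries, gene identifiers, and function names are hypothetical, and ties are broken arbitrarily by `max`.

```python
def best_hits(scores: dict[str, dict[str, float]]) -> dict[str, str]:
    """Map each query gene to its single highest-scoring subject gene.

    scores has the shape {query: {subject: similarity_score}}.
    Queries with no hits are skipped.
    """
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_hits(
    a_vs_b: dict[str, dict[str, float]],
    b_vs_a: dict[str, dict[str, float]],
) -> list[tuple[str, str]]:
    """Return (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted(
        (gene_a, gene_b)
        for gene_a, gene_b in best_ab.items()
        if best_ba.get(gene_b) == gene_a
    )


# Hypothetical similarity scores between two small proteomes.
a_vs_b = {"a1": {"b1": 200.0, "b2": 50.0}, "a2": {"b1": 180.0}}
b_vs_a = {"b1": {"a1": 210.0, "a2": 170.0}, "b2": {"a1": 40.0}}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy input, `a2` finds `b1` as its best hit but `b1` prefers `a1`, so only `("a1", "b1")` is retained. A one-directional best hit of this kind is the sort of candidate that a second criterion, such as the distance ratio analysis mentioned in the abstract, would be needed to assess.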

GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. (35/588)

Improving the accuracy of prediction of gene starts is one of a few remaining open problems in computer prediction of prokaryotic genes. Its difficulty is caused by the absence of relatively strong sequence patterns identifying true translation initiation sites. In the current paper we show that the accuracy of gene start prediction can be improved by combining models of protein-coding and non-coding regions and models of regulatory sites near gene start within an iterative hidden Markov model-based algorithm. The new gene prediction method, called GeneMarkS, utilizes a non-supervised training procedure and can be used for a newly sequenced prokaryotic genome with no prior knowledge of any protein or rRNA genes. The GeneMarkS implementation uses an improved version of the gene finding program GeneMark.hmm, heuristic Markov models of coding and non-coding regions and the Gibbs sampling multiple alignment program. GeneMarkS precisely predicted 83.2% of the translation starts of GenBank annotated Bacillus subtilis genes and 94.4% of translation starts in an experimentally validated set of Escherichia coli genes. We have also observed that GeneMarkS detects prokaryotic genes, in terms of identifying open reading frames containing real genes, with an accuracy matching the level of the best currently used gene detection methods. Accurate translation start prediction, in addition to the refinement of protein sequence N-terminal data, provides the benefit of precise positioning of the sequence region situated upstream of a gene start. Therefore, sequence motifs related to transcription and translation regulatory sites can be revealed and analyzed with higher precision. These motifs were shown to possess a significant variability, the functional and evolutionary connections of which are discussed.
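
Start-prediction accuracy figures such as those quoted above are commonly computed by matching predicted and reference genes on their shared 3' end and then asking how many of the matched genes also have the same start coordinate. The sketch below shows that calculation under this assumption; it is not the evaluation code of GeneMarkS, and the data layout, coordinates, and function name are hypothetical.

```python
# A gene is keyed by (strand, stop_coordinate); the value is its start coordinate.
GeneStarts = dict[tuple[str, int], int]


def start_accuracy(annotated: GeneStarts, predicted: GeneStarts) -> float:
    """Fraction of genes found in both sets whose start coordinates agree exactly.

    Genes are matched by strand and stop coordinate, so a prediction with the
    right reading frame but a different start codon counts as a start error.
    """
    shared = annotated.keys() & predicted.keys()
    if not shared:
        return 0.0
    exact = sum(1 for key in shared if annotated[key] == predicted[key])
    return exact / len(shared)


# Hypothetical coordinates for three annotated and three predicted genes.
annotated = {("+", 1000): 100, ("+", 2000): 1500, ("-", 3000): 3900}
predicted = {("+", 1000): 100, ("+", 2000): 1530, ("-", 5000): 5900}
accuracy = start_accuracy(annotated, predicted)
```

Here two genes are shared between the sets and one of them has an exactly matching start, so the accuracy is 0.5. Genes present in only one set affect gene-detection sensitivity and specificity rather than start accuracy, which is why they are excluded from the denominator.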

Evolution of gene order conservation in prokaryotes. (36/588)

BACKGROUND: As more complete genomes are sequenced, conservation of gene order between different organisms is emerging as an informative property of the genomes. Conservation of gene order has been used for predicting function and functional interactions of proteins, as well as for studying the evolutionary relationships between genomes. The reasons for the maintenance of gene order are still not well understood, as the organization of the prokaryote genome into operons and lateral gene transfer cannot possibly account for all the instances of conservation found. Comprehensive studies of gene order are one way of elucidating the nature of these maintaining forces. RESULTS: Gene order is extensively conserved between closely related species, but rapidly becomes less conserved among more distantly related organisms, probably in a cooperative fashion. This trend could be universal in prokaryotic genomes, as archaeal genomes are likely to behave similarly to bacterial genomes. Gene order conservation could therefore be used as a valid phylogenetic measure to study relationships between species. Even between very distant species, remnants of gene order conservation exist in the form of highly conserved clusters of genes. This suggests the existence of selective processes that maintain the organization of these regions. Because the clusters often span more than one operon, common regulation probably cannot be invoked as the cause of the maintenance of gene order. CONCLUSIONS: Gene order conservation is a genomic measure that can be useful for studying relationships between prokaryotes and the evolutionary forces shaping their genomes. Gene organization is extensively conserved in some genomic regions, and further studies are needed to elucidate the reason for this conservation.
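
One simple way to put a number on gene order conservation between two genomes is the fraction of neighbouring ortholog pairs in one genome that are also neighbours in the other. The abstract does not define the measure the authors used, so the sketch below is an assumed, simplified variant: it ignores strand and chromosome circularity, treats each genome as a list of unique ortholog identifiers, and defines adjacency after genes without an ortholog in the other genome have been removed. All names are hypothetical.

```python
def adjacent_pairs(order: list[str]) -> set[frozenset[str]]:
    """Unordered pairs of genes that sit next to each other in the list."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def gene_order_conservation(order_a: list[str], order_b: list[str]) -> float:
    """Fraction of adjacent shared-gene pairs in genome A also adjacent in B.

    Genes absent from either genome are dropped first, so adjacency is
    measured among shared orthologs only (a simplifying assumption).
    """
    shared = set(order_a) & set(order_b)
    filtered_a = [gene for gene in order_a if gene in shared]
    filtered_b = [gene for gene in order_b if gene in shared]
    pairs_a = adjacent_pairs(filtered_a)
    if not pairs_a:
        return 0.0
    return len(pairs_a & adjacent_pairs(filtered_b)) / len(pairs_a)


# Hypothetical ortholog orders along two chromosomes.
genome_a = ["g1", "g2", "g3", "g4", "g5"]
genome_b = ["g2", "g1", "g5", "g3", "g4"]
conservation = gene_order_conservation(genome_a, genome_b)
```

In this toy case the pairs g1-g2 and g3-g4 stay adjacent while g2-g3 and g4-g5 are broken, giving a conservation of 0.5. Plotting such a value against a sequence-based evolutionary distance for many genome pairs would be one way to examine the decay with divergence that the RESULTS describe.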

Genetic fidelity under harsh conditions: analysis of spontaneous mutation in the thermoacidophilic archaeon Sulfolobus acidocaldarius. (37/588)

Microbes whose genomes are encoded by DNA and for which adequate information is available display similar genomic mutation rates (average 0.0034 mutations per chromosome replication, range 0.0025 to 0.0046). However, this value currently is based on only a few well characterized microbes reproducing within a narrow range of environmental conditions. In particular, no genomic mutation rate has been determined either for a microbe whose natural growth conditions may extensively damage DNA or for any member of the archaea, a prokaryotic lineage deeply diverged from both bacteria and eukaryotes. Both of these conditions are met by the extreme thermoacidophile Sulfolobus acidocaldarius. We determined the genomic mutation rate for this species when growing at pH 3.5 and 75 °C based on the rate of forward mutation at the pyrE gene and the nucleotide changes identified in 101 independent mutants. The observed value of about 0.0018 extends the range of DNA-based microbes with rates close to the standard rate simultaneously to an archaeon and to an extremophile whose cytoplasmic pH and normal growth temperature greatly accelerate the spontaneous decomposition of DNA. The mutations include base pair substitutions (BPSs) and additions and deletions of various sizes, but the S. acidocaldarius spectrum differs from those of other DNA-based organisms in being relatively poor in BPSs. The paucity of BPSs cannot yet be explained by known properties of DNA replication or repair enzymes of Sulfolobus spp. It suggests, however, that molecular evolution per genome replication may proceed more slowly in S. acidocaldarius than in other DNA-based organisms examined to date.

The complete genome of the crenarchaeon Sulfolobus solfataricus P2. (38/588)

The genome of the crenarchaeon Sulfolobus solfataricus P2 contains 2,992,245 bp on a single chromosome and encodes 2,977 proteins and many RNAs. One-third of the encoded proteins have no detectable homologs in other sequenced genomes. Moreover, 40% appear to be archaeal-specific, and only 12% and 2.3% are shared exclusively with bacteria and eukarya, respectively. The genome shows a high level of plasticity with 200 diverse insertion sequence elements, many putative nonautonomous mobile elements, and evidence of integrase-mediated insertion events. There are also long clusters of regularly spaced tandem repeats. Different transfer systems are used for the uptake of inorganic and organic solutes, and a wealth of intracellular and extracellular proteases, sugar, and sulfur metabolizing enzymes are encoded, as well as enzymes of the central metabolic pathways and motility proteins. The major metabolic electron carrier is not NADH as in bacteria and eukarya but probably ferredoxin. The essential components required for DNA replication, DNA repair and recombination, the cell cycle, transcriptional initiation and translation, but not DNA folding, show a strong eukaryal character with many archaeal-specific features. The results illustrate major differences between crenarchaea and euryarchaea, especially for their DNA replication mechanism and cell cycle processes and their translational apparatus.

A Web-based classification system of DNA-binding protein families. (39/588)

Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The family of DNA-binding proteins is one of the most populated and studied amongst the various genomes of bacteria, archaea and eukaryotes and the Web-based system presented here is an approach to their classification. The DnaProt resource is an annotated and searchable collection of protein sequences for the families of DNA-binding proteins. The database contains 3238 full-length sequences (retrieved from the SWISS-PROT database, release 38) that include at least one DNA-binding domain. Sequence entries are organized into families defined by PROSITE patterns, PRINTS motifs and de novo excised signatures. Combining global similarities and functional motifs into a single classification scheme, DNA-binding proteins are classified into 33 unique classes, which helps to reveal comprehensive family relationships. To maximize family information retrieval, DnaProt contains a collection of multiple alignments for each DNA-binding family while the recognized motifs can be used as diagnostically functional fingerprints. All available structural class representatives have been referenced. The resource was developed as a Web-based management system for online free access of customized data sets. Entries are fully hyperlinked to facilitate easy retrieval of the original records from the source databases while functional and phylogenetic annotation will be applied to newly sequenced genomes. The database is freely available for online search of a library containing specific patterns of the identified DNA-binding protein classes and retrieval of individual entries from our WWW server (http://kronos.biol.uoa.gr/~mariak/dbDNA.html).

Complete genome sequence of an aerobic thermoacidophilic crenarchaeon, Sulfolobus tokodaii strain7. (40/588)

The complete genomic sequence of an aerobic thermoacidophilic crenarchaeon, Sulfolobus tokodaii strain7, which grows optimally at 80 °C, at low pH, and under aerobic conditions, has been determined by the whole genome shotgun method with slight modifications. The genome was 2,694,756 bp long and the G + C content was 32.8%. The following RNA-coding genes were identified: a single 16S-23S rRNA cluster, one 5S rRNA gene and 46 tRNA genes (including 24 intron-containing tRNA genes). The repetitive sequences identified were SR-type repetitive sequences, long dispersed-type repetitive sequences and Tn-like repetitive elements. The genome contained 2826 potential protein-coding regions (open reading frames, ORFs). By similarity search against public databases, 911 (32.2%) ORFs were related to functionally assigned genes, 921 (32.6%) were related to conserved ORFs of unknown function, 145 (5.1%) contained some motifs, and the remaining 849 (30.0%) did not show any significant similarity to the registered sequences. The ORFs with functional assignments included the candidate genes involved in sulfide metabolism, the TCA cycle and the respiratory chain. Sequence comparison provided evidence suggesting the integration of plasmid, rearrangement of genomic structure, and duplication of genomic regions that may be responsible for the larger genomic size of the S. tokodaii strain7 genome. The genome contained eukaryote-type genes which were not identified in other archaea and lacked the CCA sequence in the tRNA genes. The result suggests that this strain is closer to eukaryotes than the other archaeal strains sequenced so far. The data presented in this paper are also available on the internet homepage (http://www.bio.nite.go.jp/E-home/genome_list-e.html/).
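
The ORF breakdown quoted above can be checked with a few lines of arithmetic: the four categories should add up to the 2826 predicted ORFs, and each percentage should follow from its count. The sketch below does that with the figures from the abstract; the category labels, the function names, and the small `gc_content` helper (showing how a G + C figure is derived from a sequence) are illustrative additions, not part of the published analysis.

```python
# Counts taken from the abstract; the dictionary keys are illustrative labels.
TOTAL_ORFS = 2826
ORF_COUNTS = {
    "functionally assigned": 911,
    "conserved, unknown function": 921,
    "motif only": 145,
    "no significant similarity": 849,
}


def percentage_shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Express each category as a percentage of the total, to one decimal."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in an upper-case DNA string."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


shares = percentage_shares(ORF_COUNTS, TOTAL_ORFS)
categories_sum_to_total = sum(ORF_COUNTS.values()) == TOTAL_ORFS
```

With these inputs the categories sum exactly to 2826 and the computed shares reproduce the reported 32.2%, 32.6%, 5.1%, and 30.0%.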