Gene3D: structural assignment for whole genes and genomes using the CATH domain structure database. (57/588)

We present a novel web-based resource, Gene3D, of precalculated structural assignments to gene sequences and whole genomes. This resource assigns structural domains from the CATH database to whole genes and links these to their curated functional and structural annotations within the CATH domain structure database, the functional Dictionary of Homologous Superfamilies (DHS) and PDBsum. Currently Gene3D provides annotation for 36 complete genomes (two eukaryotes, six archaea, and 28 bacteria). On average, between 30% and 40% of the genes of a given genome can be structurally annotated. Matches to structural domains are found using a profile-based method (PSI-BLAST), and a novel protocol, DRange, is used to resolve conflicts in matches involving different homologous superfamilies.

Re-annotation of genome microbial coding-sequences: finding new genes and inaccurately annotated genes. (58/588)

BACKGROUND: Analysis of any newly sequenced bacterial genome starts with the identification of protein-coding genes. Despite the accumulation of multiple complete genome sequences, which provide useful comparisons with close relatives among other organisms during the annotation process, accurate gene prediction remains quite difficult. A major reason for this situation is that genes are tightly packed in prokaryotes, resulting in frequent overlap. Thus, detection of translation initiation sites and/or selection of the correct coding regions remain difficult unless appropriate biological knowledge (about the structure of a gene) is imbedded in the approach. RESULTS: We have developed a new program that automatically identifies biologically significant candidate genes in a bacterial genome. Twenty-six complete prokaryotic genomes were analyzed using this tool, and the accuracy of gene finding was assessed by comparison with existing annotations. This analysis revealed that, despite the enormous effort of genome program annotators, a small but not negligible number of genes annotated within the framework of sequencing projects are likely to be partially inaccurate or plainly wrong. Moreover, the analysis of several putative new genes shows that, as expected, many short genes have escaped annotation. In most cases, these new genes revealed frameshifts that could be either artifacts or genuine frameshifts. Some entirely unexpected new genes have also been identified. This allowed us to get a more complete picture of prokaryotic genomes. The results of this procedure are progressively integrated into the SWISS-PROT reference databank. CONCLUSIONS: The results described in the present study show that our procedure is very satisfactory in terms of gene finding accuracy. 
Except in a few cases, discrepancies between our results and annotations provided by individual authors can be accounted for by the nature of each annotation process or by specific characteristics of some genomes. This stresses that close cooperation between scientists and regular update and curation of the findings in databases are clearly required to reduce the level of errors in genome annotation (and hence to reduce the unfortunate spreading of errors through centralized data libraries).

Evolutionary analysis by whole-genome comparisons. (59/588)

A total of 37 complete genome sequences of bacteria, archaea, and eukaryotes were compared. The percentage of orthologous genes of each species contained within any of the other 36 genomes was established. In addition, the mean identity of the orthologs was calculated. Several conclusions result: (i) a greater absolute number of orthologs of a given species is found in larger species than in smaller ones; (ii) a greater percentage of the orthologous genes of smaller genomes is contained in other species than is the case for larger genomes, which corresponds to a larger proportion of essential genes; (iii) before species can be specifically related to one another in terms of gene content, it is first necessary to correct for the size of the genome; (iv) eukaryotes have a significantly smaller percentage of bacterial orthologs after correction for genome size, which is consistent with their placement in a separate domain; (v) the archaebacteria are specifically related to one another but are not significantly different in gene content from the bacteria as a whole; (vi) determination of the mean identity of all orthologs (involving hundreds of gene comparisons per genome pair) reduces the impact of errors due to misidentification of orthologs and to misalignments, and thus it is far more reliable than single gene comparisons; (vii) however, there is a maximum amount of change in protein sequences of 37% mean identity, which limits the use of percentage sequence identity to the lower taxa, a result which should also be true for single gene comparisons of both proteins and rRNA; (viii) most of the species that appear to be specifically related based upon gene content also appear to be specifically related based upon the mean identity of orthologs; (ix) the genes of a majority of species considered in this study have diverged too much to allow the construction of all-encompassing evolutionary trees. 
However, we have shown that eight species of gram-negative bacteria, six species of gram-positive bacteria, and eight species of archaebacteria are specifically related in terms of gene content, mean identity of orthologs, or both.
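
The two quantities compared in this abstract, the percentage of one genome's genes that have an ortholog in another genome and the mean identity of those orthologs, can be illustrated with a minimal sketch. The data layout, the function name `ortholog_summary`, and the toy numbers are assumptions made for illustration; the abstract does not describe how orthologs were identified or how the genome-size correction was done, and neither is attempted here.

```python
from statistics import mean


def ortholog_summary(gene_counts, ortholog_pairs, query, target):
    """Return (percent of query genes with an ortholog in target, mean % identity).

    gene_counts:    dict mapping genome name -> total number of genes (hypothetical input)
    ortholog_pairs: iterable of (query_genome, query_gene, target_genome, percent_identity)
    """
    best = {}
    for q_genome, q_gene, t_genome, identity in ortholog_pairs:
        if q_genome == query and t_genome == target:
            # Keep only the best-scoring hit for each query gene.
            best[q_gene] = max(identity, best.get(q_gene, 0.0))
    if not best:
        return 0.0, None
    percent_shared = 100.0 * len(best) / gene_counts[query]
    return percent_shared, mean(best.values())


# Toy data: a small genome (4 genes) and a large genome (10 genes).
gene_counts = {"small": 4, "large": 10}
pairs = [
    ("small", "s1", "large", 80.0),
    ("small", "s2", "large", 60.0),
    ("small", "s2", "large", 50.0),  # weaker duplicate hit, ignored
    ("large", "l1", "small", 80.0),
    ("large", "l2", "small", 60.0),
]

small_vs_large = ortholog_summary(gene_counts, pairs, "small", "large")
large_vs_small = ortholog_summary(gene_counts, pairs, "large", "small")
```

In this toy case the same two ortholog pairs cover 50% of the small genome but only 20% of the large one, which is the kind of size dependence behind conclusions (ii) and (iii).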

Complete reconstitution of the human coenzyme A biosynthetic pathway via comparative genomics. (60/588)

The biosynthesis of CoA from pantothenic acid (vitamin B5) is an essential universal pathway in prokaryotes and eukaryotes. The CoA biosynthetic genes in bacteria have all recently been identified, but their counterparts in humans and other eukaryotes remained mostly unknown. Using comparative genomics, we have identified human genes encoding the last four enzymatic steps in CoA biosynthesis: phosphopantothenoylcysteine synthetase (EC ), phosphopantothenoylcysteine decarboxylase (EC ), phosphopantetheine adenylyltransferase (EC ), and dephospho-CoA kinase (EC ). Biological functions of these human genes were verified using a complementation system in Escherichia coli based on transposon mutagenesis. The individual human enzymes were overexpressed in E. coli and purified, and the corresponding activities were experimentally verified. In addition, the entire pathway from phosphopantothenate to CoA was successfully reconstituted in vitro using a mixture of purified recombinant enzymes. Human recombinant bifunctional phosphopantetheine adenylyltransferase/dephospho-CoA kinase was kinetically characterized. This enzyme was previously suggested as a point of CoA biosynthesis regulation, and we have observed significant differences in mRNA levels of the corresponding human gene in normal and tumor cells by Northern blot analysis.

Comparative genomics using data mining tools. (61/588)

We have analysed the genomes of representatives of three kingdoms of life, namely, archaea, eubacteria and eukaryota, using data mining tools based on compositional analyses of the protein sequences. The representatives chosen in this analysis were Methanococcus jannaschii, Haemophilus influenzae and Saccharomyces cerevisiae. We have identified features of the protein evolution patterns that are shared by, and that differ between, the three genomes. M. jannaschii has been seen to have a greater number of proteins with more charged amino acids whereas S. cerevisiae has been observed to have a greater number of hydrophilic proteins. Despite the differences in intrinsic compositional characteristics between the proteins from the different genomes we have also identified certain common characteristics. We have carried out exploratory Principal Component Analysis of the multivariate data on the proteins of each organism in an effort to classify the proteins into clusters. Interestingly, we found that most of the proteins in each organism cluster closely together, but there are a few 'outliers'. We focus on the outliers for the functional investigations, which may aid in revealing any unique features of the biology of the respective organisms.
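
As a rough illustration of what a compositional analysis of protein sequences involves, the sketch below computes per-residue fractions and the fraction of charged residues for a single sequence. The residue set treated as charged (D, E, K, R), the function names, and the example sequence are assumptions; the abstract does not specify the features or tools used, and the Principal Component Analysis step is not reproduced here.

```python
from collections import Counter

# Assumption: aspartate, glutamate, lysine and arginine count as charged.
# Histidine is sometimes included as well; it is left out of this sketch.
CHARGED = frozenset("DEKR")


def composition(seq):
    """Return a dict mapping each residue in seq to its fractional abundance."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in sorted(counts)}


def charged_fraction(seq):
    """Return the fraction of residues in seq that belong to CHARGED."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(1 for aa in seq if aa in CHARGED) / len(seq)


# Made-up peptide: four charged residues out of eight.
example = "DEKRAAGG"
example_composition = composition(example)
example_charged = charged_fraction(example)
```

Vectors like `example_composition`, computed for every protein in a genome, are the sort of multivariate data on which an exploratory PCA could then be run.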

The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens. (62/588)

We have determined the complete 1,694,969-nt sequence of the GC-rich genome of Methanopyrus kandleri by using a whole direct genome sequencing approach. This approach is based on unlinking of genomic DNA with the ThermoFidelase version of M. kandleri topoisomerase V and cycle sequencing directed by 2'-modified oligonucleotides (Fimers). Sequencing redundancy (3.3x) was sufficient to assemble the genome with less than one error per 40 kb. Using a combination of sequence database searches and coding potential prediction, 1,692 protein-coding genes and 39 genes for structural RNAs were identified. M. kandleri proteins show an unusually high content of negatively charged amino acids, which might be an adaptation to the high intracellular salinity. Previous phylogenetic analysis of 16S RNA suggested that M. kandleri belonged to a very deep branch, close to the root of the archaeal tree. However, genome comparisons indicate that, in both trees constructed using concatenated alignments of ribosomal proteins and trees based on gene content, M. kandleri consistently groups with other archaeal methanogens. M. kandleri shares the set of genes implicated in methanogenesis and, in part, its operon organization with Methanococcus jannaschii and Methanothermobacter thermoautotrophicum. These findings indicate that archaeal methanogens are monophyletic. A distinctive feature of M. kandleri is the paucity of proteins involved in signaling and regulation of gene expression. Also, M. kandleri appears to have fewer genes acquired via lateral transfer than other archaea. These features might reflect the extreme habitat of this organism.

The genome of M. acetivorans reveals extensive metabolic and physiological diversity. (63/588)

Methanogenesis, the biological production of methane, plays a pivotal role in the global carbon cycle and contributes significantly to global warming. The majority of methane in nature is derived from acetate. Here we report the complete genome sequence of an acetate-utilizing methanogen, Methanosarcina acetivorans C2A. Methanosarcineae are the most metabolically diverse methanogens, thrive in a broad range of environments, and are unique among the Archaea in forming complex multicellular structures. This diversity is reflected in the genome of M. acetivorans. At 5,751,492 base pairs it is by far the largest known archaeal genome. The 4524 open reading frames code for a strikingly wide and unanticipated variety of metabolic and cellular capabilities. The presence of novel methyltransferases indicates the likelihood of undiscovered natural energy sources for methanogenesis, whereas the presence of single-subunit carbon monoxide dehydrogenases raises the possibility of nonmethanogenic growth. Although motility has not been observed in any Methanosarcineae, a flagellin gene cluster and two complete chemotaxis gene clusters were identified. The availability of genetic methods, coupled with its physiological and metabolic diversity, makes M. acetivorans a powerful model organism for the study of archaeal biology. [Sequence, data, annotations and analyses are available at http://www-genome.wi.mit.edu/.]

Pyrococcus genome comparison evidences chromosome shuffling-driven evolution. (64/588)

The genomes of three Pyrococcus species, P.abyssi, P.furiosus and P.horikoshii, were compared at the DNA level, taking advantage of our identification of their replication origins. Three types of rearrangements have been identified: (i) inversion and translation across the replication axis (origin/terminus), (ii) inversion and translocation restricted to a replichore (the half chromosome divided by the replication axis) and (iii) apparent mobility of long clusters of repeated sequences. Rearrangements restricted within a replichore were more common between P.furiosus and the two other Pyrococcus species than between P.horikoshii and P.abyssi. A strong correlation was found between 23 homologous insertion sequence elements, present only in P.furiosus, and recombined segment boundaries, suggesting that transposition events have been a major cause of genomic disruption in this species. Moreover, gene orientation bias was much more disrupted than strand composition biases in fragments that switched their orientation within a replichore upon recombination. This allowed us to conclude that one reversion and one translation occurred in P.abyssi after its divergence from P.horikoshii, and that a smaller segment has specifically recombined in P.furiosus. Whereas a majority of genes are transcribed in the same direction as DNA replication in P.horikoshii and P.abyssi, the colinearity of transcription and replication is only maintained for highly transcribed genes in P.furiosus. We discuss the implications of genomic rearrangements on gene orientation and composition biases, and their consequences on sequence evolution.
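
The gene orientation bias discussed in this abstract, that is, the share of genes transcribed in the same direction as replication, can be sketched for a circular chromosome once the origin and terminus are known. The coordinates, gene list, and function names below are hypothetical, and the sketch is not the authors' procedure. It assumes that replication runs from the origin in both directions to the terminus, so a gene is co-directional if it lies on the plus strand of the replichore traversed with increasing coordinates, or on the minus strand of the other replichore.

```python
def is_codirectional(gene_start, strand, origin, terminus, genome_length):
    """Return True if the gene is transcribed in the direction of fork movement.

    strand is '+' or '-'; all positions are 0-based coordinates on a circular genome.
    """
    # Distances measured from the origin in the direction of increasing coordinates.
    gene_offset = (gene_start - origin) % genome_length
    terminus_offset = (terminus - origin) % genome_length
    on_forward_replichore = gene_offset < terminus_offset
    return (strand == "+") == on_forward_replichore


def orientation_bias(genes, origin, terminus, genome_length):
    """Return the fraction of genes co-directional with replication."""
    if not genes:
        raise ValueError("no genes given")
    codirectional = sum(
        is_codirectional(start, strand, origin, terminus, genome_length)
        for start, strand in genes
    )
    return codirectional / len(genes)


# Hypothetical 1,000-bp circular genome with origin at 0 and terminus at 500.
toy_genes = [(100, "+"), (200, "-"), (600, "-"), (900, "+")]
toy_bias = orientation_bias(toy_genes, origin=0, terminus=500, genome_length=1000)
```

A value near 0.5 indicates no orientation bias. Values well above 0.5 correspond to the majority co-orientation the abstract reports for P.horikoshii and P.abyssi.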