Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. (17/3795)

We measured the expression pattern and analyzed codon usage in 8,133, 1,550, and 2,917 genes from Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana, respectively. In those three species, we observed a clear correlation between codon usage and gene expression levels and showed that this correlation is not due to a mutational bias. This provides direct evidence for selection on silent sites in those three distantly related multicellular eukaryotes. Surprisingly, there is a strong negative correlation between codon usage bias and protein length. This effect is not due to a smaller size of highly expressed proteins. Thus, for a given expression pattern, the selective pressure on codon usage appears to be lower in genes encoding long rather than short proteins. This puzzling observation is not predicted by any of the current models of selection on codon usage and thus raises the question of how translation efficiency affects fitness in multicellular organisms.
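
As a rough illustration of what "analyzing codon usage" can involve, the sketch below counts codons in a coding sequence and reports the fraction that fall in a preferred set. The abstract does not state which measure was used, so both the measure and the `preferred` set here are assumptions made purely for illustration.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons in a coding sequence (trailing partial codon ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def preferred_fraction(cds, preferred):
    """Fraction of codons that belong to a (hypothetical) set of preferred codons."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for codon, n in counts.items() if codon in preferred) / total


# Toy sequence; the preferred set {"GCT"} is an illustrative assumption.
example_fraction = preferred_fraction("ATGGCTGCTGCC", {"GCT"})
```

Computed per gene, such a score could then be compared against expression level and protein length, which is the kind of correlation the abstract reports.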

Panning for genes--A visual strategy for identifying novel gene orthologs and paralogs. (18/3795)

We have developed a rapid visual method for identifying novel members of gene families. Starting with an evolutionary tree, 20-50 protein query sequences for a gene family are selected from different branches of the tree. These query sequences are used to search the GenBank and expressed sequence tag (EST) DNA databases and their nightly updates using the tfastx3 or tfasty3 programs. The results of all 20-50 searches are collated and re-sorted to highlight EST or genomic sequences that share significant similarity with the query sequences. The statistical significance of each DNA/protein alignment is plotted, highlighting the portion of the query sequence that is present in the database sequence and the percent identity in the aligned region. The collated results for database sequences are linked using the WWW to the underlying scores and alignments; these links can also be used to perform additional searches to characterize the novel sequence further. With traditional "deep" scoring matrices (BLOSUM50) one can search for previously unrecognized families of large protein superfamilies. Alternatively, by using query sequences and EST libraries from the same species (e.g., human or mouse) together with "shallow" scoring matrices and filters that remove high-identity sequences, one can highlight new paralogs of previously described subfamilies. Using query sequences from the glutathione transferase superfamily, we identified two novel mammalian glutathione transferase families that were recognized previously only in plants. Using query sequences from known mammalian glutathione transferase subfamilies, we identified new candidate paralogs from the mouse class-mu, class-pi, and class-theta families.
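
The collate-and-re-sort step can be sketched as follows. This is a minimal sketch, not the authors' software: the input layout (a dict mapping each query to a list of `(subject_id, evalue, percent_identity)` tuples), the function name `collate_hits`, and the `max_evalue` cutoff are all hypothetical, and parsing of real tfastx3/tfasty3 output is not shown.

```python
def collate_hits(results, max_evalue=1e-3):
    """Merge hits from many query searches, keeping the best hit per database sequence.

    results: dict of query_id -> list of (subject_id, evalue, percent_identity).
    Returns a list of (subject_id, (query_id, evalue, percent_identity)),
    sorted from most to least significant.
    """
    best = {}
    for query_id, hits in results.items():
        for subject_id, evalue, identity in hits:
            if evalue > max_evalue:
                continue  # skip hits that are not significant
            current = best.get(subject_id)
            if current is None or evalue < current[1]:
                best[subject_id] = (query_id, evalue, identity)
    return sorted(best.items(), key=lambda item: item[1][1])


# Made-up example data for two queries.
example = collate_hits({
    "queryA": [("est1", 1e-20, 45.0), ("est2", 0.5, 30.0)],
    "queryB": [("est1", 1e-30, 52.0), ("est3", 1e-5, 38.0)],
})
```

A filter on `percent_identity` could be added in the same loop to drop high-identity sequences, as the abstract describes for finding new paralogs.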

Murine Gcm1 gene is expressed in a subset of placental trophoblast cells. (19/3795)

The gcm gene of Drosophila melanogaster encodes a transcription factor that is an important component in cell fate specification within the nervous system. In the absence of a functional gcm gene, progenitor cells differentiate into neurons, whereas when the gene is ectopically expressed the cells produce excess glial cells at the expense of neuronal differentiation. Recent searches of databases have uncovered high sequence similarity between the Drosophila gcm gene and an anonymous human placental cDNA clone (Altschuller et al., 1996; this communication). Here we report the molecular organization of the murine Gcm1, its spatio-temporal pattern of expression in developing placenta, and its map position at E1-E3 on murine chromosome 9. The murine gene is composed of at least 6 exons. The promoter region contains an "initiation sequence" and is GC rich, characteristics of the promoters of several transcription factors. The mRNA has a modest 5' UTR (ca. 200 bases) but an extensive 3' UTR (ca. 2 kb). Northern blot and mRNA in situ hybridization studies showed that Gcm1 expression was readily detectable only in the placenta. It began at embryonic day 7.5 within trophoblast cells of the chorion and continued to about embryonic day 17.5 within a subset of labyrinthine trophoblast cells. Comparison with other transcription factors revealed that Gcm1 expression defines a unique subset of trophoblast cells.

DNA microarray technology: the anticipated impact on the study of human disease. (20/3795)

One can imagine that, one day, there will be a general requirement that relevant array data be deposited, at the time of publication of manuscripts in which they are described, into a single site made available for the storage and analysis of array data (modeled after the GenBank submission requirements for DNA sequence information). With this system in place, one can anticipate a time when data from thousands of gene expression experiments will be available for meta-analysis, which has the potential to balance out artifacts from many individual studies, thus leading to more robust results and subtle conclusions. This will require that data adhere to some type of uniform structure and format that would ideally be independent of the particular expression technology used to generate it. The pros and cons of various publication modalities for these large electronic data sets have been discussed elsewhere [12], but, practical difficulties aside, general depositing must occur for this technology to reach the broadest range of investigators. Finally, as mentioned at the beginning of this review, it is unfortunate that this important research tool remains largely restricted to a few laboratories that have developed expertise in this area and to a growing number of commercial interests. Ultimately the real value of microarray technology will only be realized when this approach is generally available. It is hoped that issues including platforms, instrumentation, clone availability, and patents [20] will be resolved shortly, making this technology accessible to the broadest range of scientists at the earliest possible moment.

Identifying and mapping novel retinal-expressed ESTs from humans. (21/3795)

PURPOSE: The goal of this study was to develop efficient methods to identify tissue-specific expressed sequence tags (ESTs) and to map their locations in the human genome. Through a combination of database analysis and laboratory investigation, unique retina-specific ESTs were identified and mapped as candidate genes for inherited retinal diseases. METHODS: DNA sequences from retina-specific EST clusters were obtained from the TIGR Human Gene Index Database. Further processing of the EST sequence data was necessary to ensure that each EST cluster represented a novel, non-redundant mapping candidate. Processing involved screening for homologies to known genes and proteins using BLAST, excluding known human gene sequences and repeat sequences, and developing primers for PCR amplification of the gene encoding each cDNA cluster from genomic DNA. The EST clusters were mapped using the GeneBridge 4.0 Radiation Hybrid Mapping Panel with standard PCR conditions. RESULTS: A total of 83 retinal-expressed EST clusters were examined as potential novel, non-redundant mapping candidates. Fifty-five clusters were mapped successfully and their locations compared to the locations of known retinal disease genes. Fourteen EST clusters localize to candidate regions for inherited retinal diseases. CONCLUSIONS: This pilot study developed methodology for mapping uniquely expressed retinal ESTs and for identifying potential candidate genes for inherited retinal disorders. Despite the overall success, several complicating factors contributed to the high failure rate (33%) for mapping EST-clustered sequences. These include redundancy in the sequence data, widely dispersed sequences, ambiguous nucleotides within the sequences, the possibility of amplifying through introns and the presence of repetitive elements within the sequence. However, the combination of database analysis and laboratory mapping is a powerful method for identification of candidate genes for inherited diseases.

Expressed sequence tags from immature female sexual organ of a liverwort, Marchantia polymorpha. (22/3795)

A total of 970 expressed sequence tag (EST) clones were generated from the immature female sexual organ of a liverwort, Marchantia polymorpha. Of these, 376 ESTs fell into 123 redundant groups; thus the total number of unique sequences in the EST set was 717. A database search with the BLAST algorithm showed that 302 of the unique sequences shared significant similarities to known nucleotide or amino acid sequences. Six unique sequences showed significant similarities to genes that are involved in flower development and sexual reproduction, such as cynarase, fimbriata-associated protein and S-receptor kinase genes. The remaining 415 unique sequences had no significant similarity with any database-registered genes or proteins. The 123 redundant groups implied the presence of gene families and abundant transcripts of unknown identity. Analyses of the coding sequences of 61 unique sequences, which contained no ambiguous bases in the predicted coding regions, were highly homologous to known sequences at the amino acid level (similarity score greater than 400), and had stop codons at positions similar to those of their possible orthologues, indicated the presence of biased codon usage and a higher GC content within the coding sequences (50.4%) than within the 3' flanking sequences (41.9%).
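
The counts in this abstract are mutually consistent, which a short check makes explicit. The variable names are illustrative; the numbers come directly from the abstract.

```python
total_clones = 970        # EST clones generated
redundant_ests = 376      # ESTs that fell into redundant groups
redundant_groups = 123    # each group counts as one unique sequence

# ESTs seen only once, plus one representative per redundant group.
singletons = total_clones - redundant_ests
unique_sequences = singletons + redundant_groups

with_similarity = 302     # unique sequences with significant BLAST similarity
without_similarity = unique_sequences - with_similarity

print(singletons, unique_sequences, without_similarity)
```

The result matches the 717 unique sequences and the 415 sequences without significant similarity reported above.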

Molecular cloning and tissue-specific expression of a novel murine laminin gamma3 chain. (23/3795)

A novel laminin gamma3 chain was identified from the expressed sequence tag database at the National Center for Biotechnology Information. A complete cDNA-derived peptide sequence reveals a 1592-amino acid-long primary translation product, including a tentative 33-amino acid-long signal peptide. Comparison with the laminin gamma1 chain predicts that the two polypeptides have equal spatial dimensions. In addition, the well conserved domains VI and III(LE4) predict that gamma3-containing laminins are able to integrate into the laminin network and also, via nidogen, to connect to other protein networks in the basement membranes. A combination of Northern analysis and in situ hybridization experiments indicates that expression of the gamma3 chain is highly tissue- and cell-specific, being particularly strong in capillaries and arterioles of kidney as well as in interstitial Leydig cells of testis.

Inventory of high-abundance mRNAs in skeletal muscle of normal men. (24/3795)

The serial analysis of gene expression (SAGE) method was used to generate a catalog of 53,875 short (14 base) expressed sequence tags from polyadenylated RNA obtained from vastus lateralis muscle of healthy young men. Over 12,000 unique tags were detected. The frequency of occurrence of each tag reflects the relative abundance of the corresponding mRNA. The mRNA species that were detected 10 or more times, each comprising ≥0.02% of the mRNA population, accounted for 64% of the mRNA mass but <10% of the total number of mRNA species detected. Almost all of the abundant tags matched mRNA or EST sequences cataloged in GenBank. Mitochondrial transcripts accounted for approximately 20% of the polyadenylated RNA. Transcripts encoding proteins of the myofibrils were the most abundant nuclear-encoded mRNAs. Transcripts encoding ribosomal proteins, and those encoding proteins involved in energy metabolism, also were very abundant. The database can be used as a reference for investigations of alterations in gene expression associated with conditions that influence muscle function, such as muscular dystrophies, aging, and exercise.
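
The link between "detected 10 or more times" and "≥0.02% of the mRNA population" follows from the catalog size, since 10 tags out of 53,875 is roughly 0.02%. The sketch below shows that arithmetic and a simple way to tabulate abundant tags; the function name `abundant_tags` and the toy tag strings are assumptions for illustration, not part of the study.

```python
from collections import Counter

TOTAL_TAGS = 53875  # catalog size reported in the abstract

# Share of the catalog represented by a tag seen exactly 10 times, in percent.
threshold_percent = 100 * 10 / TOTAL_TAGS


def abundant_tags(tags, min_count=10):
    """Return {tag: (count, percent of all tags)} for tags seen at least min_count times."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {
        tag: (n, 100 * n / total)
        for tag, n in counts.items()
        if n >= min_count
    }


# Toy input with made-up 14-base tags.
toy = abundant_tags(["CATGAAAAAAAAAA"] * 10 + ["CATGCCCCCCCCCC"] * 5)
```

On a real catalog, the tag counts would come from the sequenced SAGE concatemers rather than a hand-built list.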