There is considerable interest in the discovery and characterization of single nucleotide polymorphisms (SNPs) to enable the analysis of the potential relationships between human genotype and phenotype. Here we present a strategy that permits the rapid discovery of SNPs from publicly available expressed sequence tag (EST) databases. From a set of ESTs derived from 19 different cDNA libraries, we assembled 300,000 distinct sequences and identified 850 mismatches from contiguous EST data sets (candidate SNP sites), without de novo sequencing. Through a polymerase-mediated, single-base, primer extension technique, Genetic Bit Analysis (GBA), we confirmed the presence of a subset of these candidate SNP sites and have estimated the allele frequencies in three human populations with different ethnic origins. Altogether, our approach provides a basis for rapid and efficient regional and genome-wide SNP discovery using data assembled from sequences from different libraries of cDNAs.
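The idea of flagging mismatches within an assembled set of ESTs can be illustrated with a minimal sketch. It assumes the reads are already aligned to a common coordinate system and that a column counts as a candidate site only when at least two different bases are each seen a minimum number of times; the function name `candidate_snp_sites`, the `min_minor_count` threshold, and the toy reads are all hypothetical and are not taken from the study.

```python
from collections import Counter


def candidate_snp_sites(aligned_reads, min_minor_count=2):
    """Return (column, base_counts) for columns with two or more supported bases.

    aligned_reads: strings sharing one coordinate system (assumed pre-aligned).
    min_minor_count: hypothetical threshold; a base must be seen at least this
    many times to count, which filters out singleton sequencing errors.
    """
    if not aligned_reads:
        return []
    sites = []
    length = max(len(read) for read in aligned_reads)
    for col in range(length):
        # Count only unambiguous bases present in this column.
        counts = Counter(
            read[col]
            for read in aligned_reads
            if col < len(read) and read[col] in "ACGT"
        )
        supported = [base for base, n in counts.items() if n >= min_minor_count]
        if len(supported) >= 2:
            sites.append((col, dict(counts)))
    return sites


# Toy alignment (illustrative data only).
reads = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACGTTCGTAC",
    "ACGTACGAAC",  # single mismatch at column 7, seen only once
]
sites = candidate_snp_sites(reads)
print(sites)
```

With the default threshold, the column where two reads agree on the alternative base is reported, while the mismatch seen in only one read is ignored. Candidates found this way would still need experimental confirmation, as the abstract describes.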
UCP4, a novel brain-specific mitochondrial protein that reduces membrane potential in mammalian cells.
Uncoupling proteins (UCPs) are a family of mitochondrial transporter proteins that have been implicated in thermoregulatory heat production and maintenance of the basal metabolic rate. We have identified and partially characterized a novel member of the human uncoupling protein family, termed uncoupling protein-4 (UCP4). Protein sequence analyses showed that UCP4 is most related to UCP3 and possesses features characteristic of mitochondrial transporter proteins. Unlike other known UCPs, UCP4 transcripts are exclusively expressed in both fetal and adult brain tissues. UCP4 maps to human chromosome 6p11.2-q12. Consistent with its potential role as an uncoupling protein, UCP4 is localized to the mitochondria and its ectopic expression in mammalian cells reduces mitochondrial membrane potential. These findings suggest that UCP4 may be involved in thermoregulatory heat production and metabolism in the brain.
Combining SSH and cDNA microarrays for rapid identification of differentially expressed genes.
Comparing patterns of gene expression in cell lines and tissues has important applications in a variety of biological systems. In this study we have examined whether the emerging technology of cDNA microarrays will allow a high throughput analysis of expression of cDNA clones generated by suppression subtractive hybridization (SSH). A set of cDNA clones including 332 SSH inserts amplified by PCR was arrayed using robotic printing. The cDNA arrays were hybridized with fluorescent labeled probes prepared from RNA from ER-positive (MCF7 and T47D) and ER-negative (MDA-MB-231 and HBL-100) breast cancer cell lines. Ten clones were identified that were over-expressed by at least a factor of five in the ER-positive cell lines. Northern blot analysis confirmed over-expression of these 10 cDNAs. Sequence analysis identified four of these clones as cytokeratin 19, GATA-3, CD24 and glutathione-S-transferase mu-3. Of the remaining six cDNA clones, four clones matched EST sequences from two different genes and two clones were novel sequences. Flow cytometry and immunofluorescence confirmed that CD24 protein was over-expressed in the ER-positive cell lines. We conclude that SSH and microarray technology can be successfully applied to identify differentially expressed genes. This approach allowed the identification of differentially expressed genes without the need to obtain previously cloned cDNAs.
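The "factor of five" criterion can be sketched as a simple fold-change filter. The abstract does not say how the ratio was computed, so this sketch assumes a ratio of mean signal intensities between the two groups of cell lines; the function `overexpressed_clones`, the clone names, and the intensity values are hypothetical.

```python
def overexpressed_clones(intensities, group_a, group_b, min_fold=5.0):
    """Return clones whose mean signal in group_a is >= min_fold times group_b.

    intensities: {clone: {sample: signal}} (hypothetical layout).
    Assumption: fold change is a ratio of group means; the study may have
    used a different calculation.
    """
    hits = []
    for clone, by_sample in intensities.items():
        mean_a = sum(by_sample[s] for s in group_a) / len(group_a)
        mean_b = sum(by_sample[s] for s in group_b) / len(group_b)
        # Skip clones with no signal in the reference group to avoid
        # division by zero.
        if mean_b > 0 and mean_a / mean_b >= min_fold:
            hits.append(clone)
    return hits


er_positive = ["MCF7", "T47D"]
er_negative = ["MDA-MB-231", "HBL-100"]

# Made-up intensities for two made-up clones.
signals = {
    "clone_A": {"MCF7": 1000.0, "T47D": 1200.0, "MDA-MB-231": 100.0, "HBL-100": 120.0},
    "clone_B": {"MCF7": 300.0, "T47D": 280.0, "MDA-MB-231": 250.0, "HBL-100": 310.0},
}
hits = overexpressed_clones(signals, er_positive, er_negative)
print(hits)
```

Only the clone whose ER-positive mean is at least five times its ER-negative mean passes the filter. A filter like this only nominates candidates; the study confirmed its hits by Northern blot analysis.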
The Dictyostelium developmental cDNA project: generation and analysis of expressed sequence tags from the first-finger stage of development.
In an effort to identify and characterize genes expressed during multicellular development in Dictyostelium, we have undertaken a cDNA sequencing project. Using size-fractionated subsets of cDNA from the first finger stage, two sets of gridded libraries were constructed for cDNA sequencing. One, library S, consisting of 9984 clones, carries relatively short inserts, and the other, library L, which consists of 8448 clones, has longer inserts. We sequenced all the selected clones in library S from their 3'-ends, and this generated 3093 non-redundant, expressed sequence tags (ESTs). Among them, 246 ESTs hit known Dictyostelium genes and 910 showed significant similarity to genes of Dictyostelium and other organisms. For library L, 1132 clones were randomly sequenced and 471 non-redundant ESTs were obtained. In combination, the ESTs from the two libraries represent approximately 40% of genes expressed in late development, assuming that the non-redundant ESTs correspond to independent genes. They will provide a useful resource for investigating the genetic networks that regulate multicellular development of this organism.
Molecular identification of human G-substrate, a possible downstream component of the cGMP-dependent protein kinase cascade in cerebellar Purkinje cells.
G-substrate, an endogenous substrate for cGMP-dependent protein kinase, exists almost exclusively in cerebellar Purkinje cells, where it is possibly involved in the induction of long-term depression. A G-substrate cDNA was identified by screening expressed sequence tag databases from a human brain library. The deduced amino acid sequence of human G-substrate contained two putative phosphorylation sites (Thr-68 and Thr-119) with amino acid sequences [KPRRKDT(p)PALH] that were identical to those reported for rabbit G-substrate. G-substrate mRNA was expressed almost exclusively in the cerebellum as a single transcript. The human G-substrate gene was mapped to human chromosome 7p15 by radiation hybrid panel analysis. In vitro translation products of the cDNA showed an apparent molecular mass of 24 kDa on SDS/PAGE which was close to that of purified rabbit G-substrate (23 kDa). Bacterially expressed human G-substrate is a heat-stable and acid-soluble protein that cross-reacts with antibodies raised against rabbit G-substrate. Recombinant human G-substrate was phosphorylated efficiently by cGMP-dependent protein kinase exclusively at Thr residues, and it was recognized by antibodies specific for rabbit phospho-G-substrate. The amino acid sequences surrounding the sites of phosphorylation in G-substrate are related to those around Thr-34 and Thr-35 of the dopamine- and cAMP-regulated phosphoprotein DARPP-32 and inhibitor-1, respectively, two potent inhibitors of protein phosphatase 1. However, purified G-substrate phosphorylated by cGMP-dependent protein kinase inhibited protein phosphatase 2A more effectively than protein phosphatase 1, suggesting a distinct role as a protein phosphatase inhibitor.
Cloning, expression, and genetic mapping of Sema W, a member of the semaphorin family.
The semaphorins comprise a large family of membrane-bound and secreted proteins, some of which have been shown to function in axon guidance. We have cloned a transmembrane semaphorin, Sema W, that belongs to the class IV subgroup of the semaphorin family. The mouse and rat forms of Sema W show 97% amino acid sequence identity with each other, and each shows about 91% identity with the human form. The gene for Sema W is divided into 15 exons, up to 4 of which are absent in the human cDNAs that we sequenced. Unlike many other semaphorins, Sema W is expressed at low levels in the developing embryo but was found to be expressed at high levels in the adult central nervous system and lung. Functional studies with purified membrane fractions from COS7 cells transfected with a Sema W expression plasmid showed that Sema W has growth-cone collapse activity against retinal ganglion-cell axons, indicating that vertebrate transmembrane semaphorins, like secreted semaphorins, can collapse growth cones. Genetic mapping of human SEMAW with human/hamster radiation hybrids localized the gene to chromosome 2p13. Genetic mapping of mouse Semaw with mouse/hamster radiation hybrids localized the gene to chromosome 6, and physical mapping placed the gene on bacterial artificial chromosomes carrying microsatellite markers D6Mit70 and D6Mit189. This localization places Semaw within the locus for motor neuron degeneration 2, making it an attractive candidate gene for this disease.
Exon shuffling by L1 retrotransposition.
Long interspersed nuclear elements (LINE-1s or L1s) are the most abundant retrotransposons in the human genome, and they serve as major sources of reverse transcriptase activity. Engineered L1s retrotranspose at high frequency in cultured human cells. Here it is shown that L1s insert into transcribed genes and retrotranspose sequences derived from their 3' flanks to new genomic locations. Thus, retrotransposition-competent L1s provide a vehicle to mobilize non-L1 sequences, such as exons or promoters, into existing genes and may represent a general mechanism for the evolution of new genes.
A RAPID algorithm for sequence database comparisons: application to the identification of vector contamination in the EMBL databases.
MOTIVATION: Word-matching algorithms such as BLAST are routinely used for sequence comparison. These algorithms typically use areas of matching words to seed alignments which are then used to assess the degree of sequence similarity. In this paper, we show that by formally separating the word-matching and sequence-alignment process, and using information about word frequencies to generate alignments and similarity scores, we can create a new sequence-comparison algorithm which is both fast and sensitive. The formal split between word searching and alignment allows users to select an appropriate alignment method without affecting the underlying similarity search. The algorithm has been used to develop software for identifying entries in DNA sequence databases which are contaminated with vector sequence. RESULTS: We present three algorithms, RAPID, PHAT and SPLAT, which together allow vector contaminations to be found and assessed extremely rapidly. RAPID is a word search algorithm which uses probabilities to modify the significance attached to different words; PHAT and SPLAT are alignment algorithms. An initial implementation has been shown to be approximately an order of magnitude faster than BLAST. The formal split between word searching and alignment not only offers considerable gains in performance, but also allows alignment generation to be viewed as a user interface problem, allowing the most useful output method to be selected without affecting the underlying similarity search. Receiver Operator Characteristic (ROC) analysis of an artificial test set allows the optimal score threshold for identifying vector contamination to be determined. ROC curves were also used to determine the optimum word size (nine) for finding vector contamination. An analysis of the entire expressed sequence tag (EST) subset of EMBL found a contamination rate of 0.27%. 
A more detailed analysis of the 50 000 ESTs in est10.dat (an EST subset of EMBL) finds an error rate of 0.86%, principally due to two large-scale projects. AVAILABILITY: A Web page for the software exists at http://bioinf.man.ac.uk/rapid, or it can be downloaded from ftp://ftp.bioinf.man.ac.uk/RAPID CONTACT: [email protected]
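The general idea of weighting shared words by how rare they are can be illustrated with a minimal sketch. This is not the published RAPID scoring scheme, whose details are not given in the abstract: the sketch simply assumes that each shared word contributes the negative log of its observed frequency in the database, so rare words count for more than common ones. The word size of nine follows the abstract; the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter

WORD_SIZE = 9  # optimum word size reported in the abstract


def words(seq, k=WORD_SIZE):
    """Return the set of overlapping k-letter words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def word_probabilities(database, k=WORD_SIZE):
    """Estimate each word's probability from its frequency in the database."""
    counts = Counter()
    for seq in database:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def weighted_score(query, target, probs, k=WORD_SIZE):
    """Sum -log(p) over words shared by query and target.

    Assumption: rare shared words are stronger evidence of a true match
    than common ones. Words absent from probs are ignored.
    """
    shared = words(query, k) & words(target, k)
    return sum(-math.log(probs[word]) for word in shared if word in probs)


# Toy data: a made-up "vector" and two made-up database entries.
vector = "GGATCCGAATTCAAGCTTGCGGCCGC"
contaminated = "ATATATATAT" + vector[:15] + "CACACACACA"
clean = "ATATATATATCACACACACATGTGTGTGTG"

probs = word_probabilities([contaminated, clean])
print(weighted_score(vector, contaminated, probs))
print(weighted_score(vector, clean, probs))
```

The entry carrying a fragment of the vector scores above zero, while the entry sharing no nine-letter word with it scores zero. In practice a score threshold would be needed to separate the two cases, and the abstract reports choosing that threshold by ROC analysis.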