Archive of mass spectral data files on recordable CD-ROMs and creation and maintenance of a searchable computerized database.

A database containing the names of mass spectral data files generated in a forensic toxicology laboratory, and two Microsoft Visual Basic programs to maintain and search this database, are described. The data files (approximately 0.5 KB each) were collected from six mass spectrometers during routine casework and archived on 650 MB (74 min) recordable CD-ROMs. Each recordable CD-ROM was given a unique name, and its list of data file names was entered into the database. This manuscript describes how the search and maintenance programs are used to search the database, to perform its routine upkeep, and to create CD-ROMs for archiving data files.
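The disc-name/file-name index described above can be sketched as follows. This is a minimal Python illustration, not the authors' Visual Basic programs: the class `ArchiveIndex`, its methods, the disc labels, and the file names are all hypothetical, and shell-style wildcard matching with `fnmatchcase` is an assumed search style.

```python
from fnmatch import fnmatchcase


class ArchiveIndex:
    """Hypothetical index mapping each CD-ROM name to the data file names it holds."""

    def __init__(self) -> None:
        self._discs: dict[str, set[str]] = {}

    def add_disc(self, disc_name: str, file_names: list[str]) -> None:
        # Every disc gets a unique name, so a duplicate label is rejected.
        if disc_name in self._discs:
            raise ValueError(f"disc {disc_name!r} is already in the database")
        self._discs[disc_name] = set(file_names)

    def search(self, pattern: str) -> list[tuple[str, str]]:
        """Return (disc, file) pairs whose file name matches a wildcard pattern.

        Both sides are upper-cased so matching is case-insensitive on every OS.
        """
        hits = []
        for disc, files in sorted(self._discs.items()):
            for name in sorted(files):
                if fnmatchcase(name.upper(), pattern.upper()):
                    hits.append((disc, name))
        return hits


# Usage with invented disc labels and file names.
index = ArchiveIndex()
index.add_disc("GCMS_CD001", ["A1001.D", "A1002.D"])
index.add_disc("GCMS_CD002", ["B2001.D"])
print(index.search("A100*"))  # [('GCMS_CD001', 'A1001.D'), ('GCMS_CD001', 'A1002.D')]
```

Storing only names, rather than the spectra themselves, keeps the database small while still telling the analyst which disc to load.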

Mining SNPs from EST databases.

There is considerable interest in discovering and characterizing single nucleotide polymorphisms (SNPs) to enable analysis of potential relationships between human genotype and phenotype. Here we present a strategy for the rapid discovery of SNPs from publicly available expressed sequence tag (EST) databases. From a set of ESTs derived from 19 different cDNA libraries, we assembled 300,000 distinct sequences and, without de novo sequencing, identified 850 mismatches within contiguous EST data sets (candidate SNP sites). Using Genetic Bit Analysis (GBA), a polymerase-mediated, single-base primer extension technique, we confirmed a subset of these candidate SNP sites and estimated their allele frequencies in three human populations of different ethnic origins. Altogether, our approach provides a basis for rapid and efficient regional and genome-wide SNP discovery using sequence data assembled from different cDNA libraries.
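One way to picture mismatch detection in assembled ESTs is a column-by-column scan of a multiple alignment. The sketch below assumes the ESTs are already aligned to a common length with `-` for gaps, and it applies a hypothetical rule (each allele must appear in at least `min_reads` ESTs) to discount single-read sequencing errors; the abstract does not state the authors' actual filtering criteria, and `candidate_snps` is an illustrative name.

```python
from collections import Counter


def candidate_snps(aligned_ests: list[str], min_reads: int = 2) -> list[tuple[int, dict[str, int]]]:
    """Return (0-based column, allele counts) for columns where two or more bases
    each occur in at least `min_reads` ESTs. Gaps ('-') and 'N' calls are ignored."""
    if not aligned_ests:
        return []
    if len({len(s) for s in aligned_ests}) > 1:
        raise ValueError("all ESTs must be aligned to the same length")
    sites = []
    for col in range(len(aligned_ests[0])):
        counts = Counter(s[col] for s in aligned_ests if s[col] in "ACGT")
        # Keep only bases seen often enough to be unlikely read errors (assumed rule).
        alleles = {base: n for base, n in counts.items() if n >= min_reads}
        if len(alleles) >= 2:
            sites.append((col, dict(sorted(alleles.items()))))
    return sites


ests = [
    "ACGTTGCA",
    "ACGTCGCA",
    "ACGTTGCA",
    "ACGTCGCA",
    "ACGATGC-",  # lone 'A' at column 3 is treated as a probable read error
]
print(candidate_snps(ests))  # [(4, {'C': 2, 'T': 3})]
```

Raising `min_reads` trades sensitivity for fewer false candidates; such candidates would still need experimental confirmation, as the GBA step in the abstract provides.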

An effective approach for analyzing "prefinished" genomic sequence data.

Ongoing efforts to sequence the human genome are already generating large amounts of data, with substantial increases anticipated over the next few years. In most cases, a shotgun sequencing strategy is being used, which rapidly yields most of the primary sequence as incompletely assembled sequence contigs ("prefinished" sequence) and more slowly produces the final, completely assembled sequence ("finished" sequence). Thus, prefinished sequence is generally produced in excess of finished sequence, and this trend is certain to continue and even accelerate over the next few years. Even at the prefinished stage, genomic sequence is a rich source of biological information of great interest to many investigators. However, analyzing such data is challenging, both because of its sheer volume and because it can change from day to day. To facilitate the discovery and characterization of genes and other important elements within prefinished sequence, we have developed an analytical strategy and system that combine readily available software tools in new ways. Applying this strategy to prefinished sequence data from human chromosome 7 has demonstrated that it is a convenient, inexpensive, and extensible solution to the problem of analyzing the large amounts of preliminary data produced by large-scale sequencing efforts. Our approach is accessible to any investigator who wishes to gather additional information about particular sequence data en route to developing richer annotations of a finished sequence.

The Genexpress IMAGE knowledge base of the human brain transcriptome: a prototype integrated resource for functional and computational genomics.

Expression profiles of 5058 human gene transcripts, represented by an array of 7451 clones from the first IMAGE Consortium cDNA library from infant brain, were collected by semiquantitative hybridization of the array with complex probes derived by reverse transcription of mRNA from brain and five other human tissues. Twenty-one percent of the clones corresponded to transcripts that could be classified into general categories of low, moderate, or high abundance. These expression profiles were integrated with cDNA clone and sequence clustering and gene mapping information from an upgraded version of the Genexpress Index. For seven gene transcripts found to be transcribed preferentially or specifically in brain, the expression profiles were confirmed by Northern blot analyses of mRNA from eight adult and four fetal tissues and from 15 distinct brain regions. In four instances, the sites of expression were further documented by in situ hybridization of rat brain tissue sections. A systematic effort was undertaken to further integrate available cytogenetic, genetic, physical, and genic map information through radiation-hybrid mapping, providing a unique, validated map location for each of these genes relative to the disease map. The resulting Genexpress IMAGE Knowledge Base is illustrated by five examples presented in the printed article, with additional data available on a dedicated Web site.

Renal failure predisposes patients to adverse outcome after coronary artery bypass surgery. VA Cooperative Study #5.

BACKGROUND: More than 600,000 coronary artery bypass graft (CABG) procedures are performed annually in the United States. Some data indicate that 10 to 20% of patients undergoing CABG have a serum creatinine of more than 1.5 mg/dl. Few data exist on the impact of a mild increase in serum creatinine concentration on CABG outcome. METHODS: We analyzed a Veterans Affairs database compiled prospectively from 1992 through 1996 at 14 of 43 centers performing heart surgery. We compared outcomes after CABG in patients with a baseline serum creatinine of less than 1.5 mg/dl (median 1.1 mg/dl, N = 3271) with those in patients with a baseline serum creatinine of 1.5 to 3.0 mg/dl (median 1.7 mg/dl, N = 631). RESULTS: Univariate analysis revealed that patients with a serum creatinine of 1.5 to 3.0 mg/dl had higher rates of 30-day mortality (7% vs. 3%, P < 0.001), prolonged mechanical ventilation (15% vs. 8%, P = 0.001), stroke (7% vs. 2%, P < 0.001), renal failure requiring dialysis at discharge (3% vs. 1%, P < 0.001), and bleeding complications (8% vs. 3%, P < 0.001) than patients with a baseline serum creatinine of less than 1.5 mg/dl. Multiple logistic regression analyses showed that, when controlling for all other variables, patients with a baseline serum creatinine of less than 1.5 mg/dl had significantly lower (P < 0.02) 30-day mortality and rates of postoperative bleeding and ventilatory complications than patients with a serum creatinine of 1.5 to 3.0 mg/dl. CONCLUSION: These results demonstrate that mild renal failure is an independent risk factor for adverse outcome after CABG.
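The univariate results above are comparisons of two proportions. As an illustration only, the sketch below applies a pooled two-proportion z-test; for a 2×2 table this is equivalent to a chi-square test without continuity correction, since z² equals the chi-square statistic. The abstract does not say which univariate test was used, the multiple logistic regression is not reproduced, and the event counts are approximations back-calculated from the rounded percentages, not the study's data.

```python
import math


def two_proportion_z_test(events_a: int, n_a: int, events_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)  # common rate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


# APPROXIMATE counts from rounded 30-day mortality (7% of 631 vs. 3% of 3271);
# these are illustrative reconstructions, NOT the study's raw counts.
z, p = two_proportion_z_test(round(0.07 * 631), 631, round(0.03 * 3271), 3271)
print(f"z = {z:.2f}, two-sided P = {p:.1e}")
```

Because the inputs are reconstructed from rounded percentages, the output shows only that a difference of this size in groups of this size is consistent with the reported P < 0.001.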

Complete exon-intron organization of the mouse fibulin-1 gene and its comparison with the human fibulin-1 gene.

Fibulin-1 is a 90 kDa calcium-binding protein present in the extracellular matrix and in blood. Its two major variants, C and D, differ in their C-termini and in their ability to bind the basement membrane protein nidogen. Here we characterized genomic clones encoding the mouse fibulin-1 gene, which contains 18 exons spanning at least 75 kb of DNA. The two variants are generated by alternative splicing of exons at the 3' end. By database searching, we identified most of the exons of the human fibulin-1 gene and showed that its exon-intron organization is similar to that of the mouse gene.

Estimation of the number of alpha-helical and beta-strand segments in proteins using circular dichroism spectroscopy.

A simple approach for estimating the number of alpha-helical and beta-strand segments from protein circular dichroism spectra is described. The alpha-helix and beta-sheet conformations in globular protein structures, as assigned by the DSSP and STRIDE algorithms, were divided into regular and distorted fractions by treating a certain number of terminal residues in each alpha-helix or beta-strand segment as distorted. The resulting secondary structure fractions for 29 reference proteins were used to analyze circular dichroism spectra with the SELCON method. From the performance indices of these analyses, we determined that, on average, four residues per alpha-helix and two residues per beta-strand may be considered distorted in proteins. The number of alpha-helical and beta-strand segments in a given protein, and their average length, were then estimated from the fractions of distorted alpha-helix and beta-strand conformations determined by analysis of its circular dichroism spectrum. A statistical test on the reference protein set indicates that this classification of protein secondary structure is highly reliable. The method was used to analyze the circular dichroism spectra of four additional proteins, and the predicted structural characteristics agree with the crystal structure data.

A novel clan of zinc metallopeptidases with possible intramembrane cleavage properties.

Computer-based database searching and protein multiple sequence alignment have identified a novel clan of zinc metallopeptidases, which phylogenetic analysis shows to contain six subfamilies. Members of the clan share four transmembrane segments and three conserved sequence motifs. Combining topology analysis with motif identification has revealed three potential Zn2+-coordinating residues. Only two sequences in this clan carry any functional annotation, and one of them cleaves its substrate within a cytosol/transmembrane segment junction. Several observations suggest that the remaining members of this clan may also cleave their substrates within transmembrane segments.
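Motif identification of the kind mentioned above can be illustrated with a simple pattern scan. The abstract does not name the three conserved motifs, so this sketch assumes the canonical HEXXH zinc-binding motif found in many zinc metallopeptidases, in which the two histidines act as Zn2+ ligands. The example sequence is invented, `find_hexxh` is a hypothetical helper, and a bare regular-expression scan is a simplification of the alignment- and topology-based analysis the authors describe.

```python
import re

# HEXXH: His, Glu, any two residues, His (assumed motif, not taken from the abstract).
# The zero-width lookahead lets overlapping occurrences all be reported.
HEXXH = re.compile(r"(?=(HE..H))")


def find_hexxh(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched motif) for every HEXXH occurrence."""
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(sequence.upper())]


example = "MKLAHEAGHLLVHEFGHS"  # invented sequence for illustration
print(find_hexxh(example))  # [(5, 'HEAGH'), (13, 'HEFGH')]
```

A hit only flags a candidate site; deciding whether it lies within a predicted transmembrane segment would require the topology analysis described in the abstract.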