Organization of heterogeneous scientific data using the EAV/CR representation.

Entity-attribute-value (EAV) representation is a means of organizing highly heterogeneous data using a relatively simple physical database schema. EAV representation is widely used in the medical domain, most notably in the storage of data related to clinical patient records. Its potential strengths suggest its use in other biomedical areas, in particular research databases whose schemas are complex as well as constantly changing to reflect evolving knowledge in rapidly advancing scientific domains. When deployed for such purposes, the basic EAV representation needs to be augmented significantly to handle the modeling of complex objects (classes) as well as to manage interobject relationships. The authors refer to their modification of the basic EAV paradigm as EAV/CR (EAV with classes and relationships). They describe EAV/CR representation with examples from two biomedical databases that use it.
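The core idea of EAV, and the two additions named above (classes and relationships), can be illustrated with a toy in-memory store. This is a minimal sketch under stated assumptions, not the authors' EAV/CR implementation: the class names `EAVStore` and `ObjectRef`, the method names, and the choice to represent a relationship as a value that references another entity are all hypothetical. Each fact is stored as one (entity, attribute, value) row, so adding a new attribute requires no schema change; records are rebuilt by pivoting an entity's rows.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class ObjectRef:
    """Hypothetical value type pointing at another entity (the 'relationship' idea)."""
    entity_id: int


Value = Union[str, int, float, ObjectRef]


class EAVStore:
    """Toy EAV store with a class per entity; illustrative only."""

    def __init__(self) -> None:
        self.entity_class: Dict[int, str] = {}           # entity id -> class name
        self.rows: List[Tuple[int, str, Value]] = []      # (entity id, attribute, value)

    def add_entity(self, entity_id: int, class_name: str) -> None:
        self.entity_class[entity_id] = class_name

    def set(self, entity_id: int, attribute: str, value: Value) -> None:
        # One row per fact: new attributes need no change to the physical schema.
        if entity_id not in self.entity_class:
            raise KeyError(f"unknown entity {entity_id}")
        if isinstance(value, ObjectRef) and value.entity_id not in self.entity_class:
            raise KeyError(f"reference to unknown entity {value.entity_id}")
        self.rows.append((entity_id, attribute, value))

    def get_entity(self, entity_id: int) -> Dict[str, object]:
        """Pivot one entity's sparse rows into a record; every attribute maps to a list."""
        values: Dict[str, List[Value]] = defaultdict(list)
        for eid, attribute, value in self.rows:
            if eid == entity_id:
                values[attribute].append(value)
        return {"class": self.entity_class[entity_id], **values}

    def related(self, entity_id: int, attribute: str) -> List[int]:
        """Follow the object-reference values of one attribute to the ids they point at."""
        record = self.get_entity(entity_id)
        return [v.entity_id for v in record.get(attribute, []) if isinstance(v, ObjectRef)]


if __name__ == "__main__":
    store = EAVStore()
    store.add_entity(1, "Gene")
    store.add_entity(2, "Publication")
    store.set(1, "symbol", "geneA")           # hypothetical example data
    store.set(1, "cited_in", ObjectRef(2))
    print(store.get_entity(1))
    print(store.related(1, "cited_in"))
```

Scanning every row on each lookup keeps the sketch short; a real EAV database would index rows by entity and attribute.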

Analyzing qualitative data with computer software.

OBJECTIVE: To provide health services researchers with an overview of the qualitative data analysis process and the role of software within it; to provide a principled approach to choosing among software packages to support qualitative data analysis; to alert researchers to the potential benefits and limitations of such software; and to provide an overview of the developments to be expected in the field in the near future. DATA SOURCES, STUDY DESIGN, METHODS: This article does not include reports of empirical research. CONCLUSIONS: Software for qualitative data analysis can benefit the researcher in terms of speed, consistency, rigor, and access to analytic methods not available by hand. Software, however, is not a replacement for methodological training.

The COG database: a tool for genome-scale analysis of protein functions and evolution.

Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56-83% of the gene products from each of the complete bacterial and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program, which is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.
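The "consistency of genome-specific best hits" criterion can be illustrated roughly as follows. This is a simplified, hypothetical sketch, not the COG construction pipeline: it assumes best hits have already been computed from an all-against-all comparison and are supplied as a dictionary keyed by (protein, target genome); it treats only reciprocal best hits as consistent (a stricter simplification); it forms triangles of proteins from three different genomes that are pairwise consistent; and it merges triangles that share a side into clusters. All function names and the input format are assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical input format: (protein, target genome) -> best-scoring protein in that genome.
BestHits = Dict[Tuple[str, str], str]


def reciprocal_pairs(best_hit: BestHits, genome_of: Dict[str, str]) -> Set[FrozenSet[str]]:
    """Keep protein pairs whose genome-specific best hits point at each other."""
    pairs: Set[FrozenSet[str]] = set()
    for (p, _target_genome), q in best_hit.items():
        if best_hit.get((q, genome_of[p])) == p:
            pairs.add(frozenset((p, q)))
    return pairs


def triangles(pairs: Set[FrozenSet[str]], genome_of: Dict[str, str]) -> Set[FrozenSet[str]]:
    """Find triples from three different genomes that are pairwise consistent best hits."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for pair in pairs:
        a, b = tuple(pair)
        adj[a].add(b)
        adj[b].add(a)
    found: Set[FrozenSet[str]] = set()
    for a in list(adj):
        for b, c in combinations(sorted(adj[a]), 2):
            if c in adj[b] and len({genome_of[a], genome_of[b], genome_of[c]}) == 3:
                found.add(frozenset((a, b, c)))
    return found


def merge_triangles(tris: Set[FrozenSet[str]]) -> List[Set[str]]:
    """Union-find over triangles: triangles sharing a side (two proteins) end up together."""
    tri_list = list(tris)
    parent = list(range(len(tri_list)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    by_edge: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for i, tri in enumerate(tri_list):
        for edge in combinations(sorted(tri), 2):
            by_edge[edge].append(i)
    for members in by_edge.values():
        for j in members[1:]:
            parent[find(j)] = find(members[0])

    clusters: Dict[int, Set[str]] = defaultdict(set)
    for i, tri in enumerate(tri_list):
        clusters[find(i)] |= tri
    return list(clusters.values())


def build_clusters(best_hit: BestHits, genome_of: Dict[str, str]) -> List[Set[str]]:
    return merge_triangles(triangles(reciprocal_pairs(best_hit, genome_of), genome_of))
```

Proteins that never join a triangle are left unclustered, which loosely mirrors the point in the abstract that only a fraction of each genome's gene products fall into COGs.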

The EcoCyc and MetaCyc databases.

EcoCyc is an organism-specific Pathway/Genome Database that describes the metabolic and signal-transduction pathways of Escherichia coli, its enzymes and, as a new addition, its transport proteins. MetaCyc is a new metabolic-pathway database that describes pathways and enzymes of many different organisms, with a microbial focus. Both databases are queried using the Pathway Tools graphical user interface, which provides a wide variety of query operations and visualization tools. EcoCyc and MetaCyc are available at http://ecocyc.PangeaSystems.com/ecocyc/

Integrating functional genomic information into the Saccharomyces genome database.

The Saccharomyces Genome Database (SGD) stores and organizes information about the nearly 6200 genes in the yeast genome. The information is organized around the 'locus page' and directs users to the detailed information they seek. SGD is endeavoring to integrate the existing information about yeast genes with the large volume of data generated by functional analyses that are beginning to appear in the literature and on web sites. New features will include searches of systematic analyses and Gene Summary Paragraphs that succinctly review the literature for each gene. In addition to current information, such as gene product and phenotype descriptions, the new locus page will also describe a gene product's cellular process, function and localization using a controlled vocabulary developed in collaboration with two other model organism databases. We describe these developments in SGD through the newly reorganized locus page. The SGD is accessible via the WWW at http://genome-www.stanford.edu/Saccharomyces/

The Intronerator: exploring introns and alternative splicing in Caenorhabditis elegans.

The Intronerator (http://www.cse.ucsc.edu/~kent/intronerator/) is a set of web-based tools for exploring RNA splicing and gene structure in Caenorhabditis elegans. It includes a display of cDNA alignments with the genomic sequence, a catalog of alternatively spliced genes and a database of introns. The cDNA alignments include >100,000 ESTs and almost 1000 full-length cDNAs. ESTs from embryos and mixed-stage animals, as well as full-length cDNAs, can be compared in the alignment display with each other and with predicted genes. The alternative-splicing catalog includes 844 open reading frames for which there is evidence of alternative splicing of pre-mRNA. The intron database includes 28,478 introns and can be searched for patterns near the splice junctions.
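A pattern search near splice junctions could look something like the sketch below. It is hypothetical and does not describe how the Intronerator is implemented: it assumes plus-strand introns given as 0-based, half-open coordinates on a single genomic string, takes the search pattern as a Python regular expression, and looks in a window of `flank` bases on either side of the 5' or 3' junction. The `Intron` type and function names are illustrative only. A check for the canonical GT...AG intron boundaries is included for comparison.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Intron:
    """Hypothetical intron record: plus strand, 0-based half-open [start, end)."""
    name: str
    start: int
    end: int


def junction_windows(genome: str, intron: Intron, flank: int = 10) -> Tuple[str, str]:
    """Return the sequence around the 5' (donor) and 3' (acceptor) junctions."""
    five = genome[max(0, intron.start - flank): intron.start + flank]
    three = genome[max(0, intron.end - flank): intron.end + flank]
    return five, three


def is_canonical(genome: str, intron: Intron) -> bool:
    """True if the intron starts with GT and ends with AG."""
    seq = genome[intron.start:intron.end]
    return seq.startswith("GT") and seq.endswith("AG")


def search_junctions(genome: str, introns: Sequence[Intron], pattern: str,
                     site: str = "5", flank: int = 10) -> List[str]:
    """Names of introns whose 5' or 3' junction window matches the regex pattern."""
    if site not in ("5", "3"):
        raise ValueError("site must be '5' or '3'")
    regex = re.compile(pattern)
    hits = []
    for intron in introns:
        five, three = junction_windows(genome, intron, flank)
        window = five if site == "5" else three
        if regex.search(window):
            hits.append(intron.name)
    return hits
```

Usage with made-up sequence: `search_junctions(genome, introns, "GTAAGT", site="5", flank=6)` returns the names of introns whose donor window contains `GTAAGT`.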

DAtA: database of Arabidopsis thaliana annotation.

The Database of Arabidopsis thaliana Annotation (DAtA) was created to enable easy access to and analysis of all the Arabidopsis genome project annotation. The database was constructed using the completed A. thaliana genomic sequence data currently in GenBank. An automated annotation process was used to predict coding sequences for GenBank records that do not include annotation. DAtA also contains protein motifs and protein similarities derived from searches of the proteins in DAtA against motif databases and the non-redundant protein database. The database is routinely updated to include new GenBank submissions for Arabidopsis genomic sequences and new BLAST and protein motif search results. A web interface to DAtA allows coding sequences to be searched by name, comment, BLAST similarity or motif field. In addition, browse options present lists of either all the protein names or all the identified motifs present in the sequenced A. thaliana genome. The database can be accessed at http://baggage.stanford.edu/group/arabprotein/

UK CropNet: a collection of databases and bioinformatics resources for crop plant genomics.

The UK Crop Plant Bioinformatics Network (UK CropNet) was established in 1996 to harness the extensive work on genome mapping in crop plants in the UK. Since then we have published five databases from our central UK CropNet WWW site (http://synteny.nott.ac.uk/), with a further three to follow shortly. Our resource facilitates the identification and manipulation of agronomically important genes by laying a foundation for comparative analysis among crop plants and model species. In addition, we have developed a number of software tools that facilitate the visualisation and analysis of our data. Many of our tools are made freely available for use with both crop plant data and data from other species.