Representing genomic knowledge in the UMLS semantic network.

Genomics research has a significant impact on the understanding and treatment of human hereditary diseases, and biomedical literature concerning the genome project is becoming increasingly important for clinicians. The Unified Medical Language System (UMLS) is designed to facilitate the retrieval and integration of information from multiple machine-readable biomedical information resources. This paper describes our efforts to integrate concepts important to genomics research with the UMLS semantic network. We found that the UMLS contains over 30 semantic types and most of the semantic relations that are essential for representing the underlying genomic knowledge. In addition, we observed that the organization of the network was appropriate for representing the hierarchical organization of the concepts. Because some of the concepts critical to the genomic domain were found to be missing, we propose to extend the network by adding six new semantic types and sixteen new semantic relations.

Hunting with traps: genome-wide strategies for gene discovery and functional analysis.

With sequence analysis of the human genome well underway, there is an increasingly urgent challenge to understand the fundamental function and interplay of genes that build and maintain an organism. Several approaches will be critical for interpreting gene function, including random cDNA sequencing, expression profiling in different tissues, genetic analysis of human or model organism phenotypes, and creation of transgenic or "knockout" animals. Traditional gene-trapping approaches, in which genes are randomly disrupted with DNA elements inserted throughout the genome, have been used to generate large numbers of mutant organisms for genetic analysis. Recent modifications of gene-trapping methods and their increased use in mammalian systems are likely to result in a wealth of new information on gene function. Various trapping strategies allow genes to be segregated based on criteria like the specific subcellular location of an encoded protein, the tissue expression profile, or responsiveness to specific stimuli. Genome-wide gene-trapping strategies, which integrate gene discovery and expression profiling, can be applied in a massively parallel format to produce living assays for drug discovery.

A comprehensive approach to clustering of expressed human gene sequence: the sequence tag alignment and consensus knowledge base.

The expressed human genome is being sequenced and analyzed by disparate groups producing disparate data. The majority of the identified coding portion is in the form of expressed sequence tags (ESTs). The need to discover exonic representation and expression forms of full-length cDNAs for each human gene is frustrated by the partial and variable quality nature of this data delivery. A highly redundant human EST data set has been processed into integrated and unified expressed transcript indices that consist of hierarchically organized human transcript consensi reflecting gene expression forms and genetic polymorphism within an index class. The expression index and its intermediate outputs include cleaned transcript sequence, expression, and alignment information and a higher fidelity subset, SANIGENE. The STACK_PACK clustering system has been applied to dbEST release 121598 (GenBank version 110). Sixty-four percent of 1,313,103 Homo sapiens ESTs are condensed into 143,885 tissue level multiple sequence clusters; linking through clone-ID annotations produces 68,701 total assemblies, such that 81% of the original input set is captured in a STACK multiple sequence or linked cluster. Indexing of alignments by substituent EST accession allows browsing of the data structure and its cross-links to UniGene. STACK metaclusters consolidate a greater number of ESTs by a factor of 1.86 with respect to the corresponding UniGene build. Fidelity comparison with genome reference sequence AC004106 demonstrates consensus expression clusters that reflect significantly lower spurious repeat sequence content and capture alternate splicing within a whole body index cluster and three STACK v.2.3 tissue-level clusters. Statistics of a staggered release whole body index build of STACK v.2.0 are presented.

Improved fidelity of thermostable ligases for detection of microsatellite repeat sequences using nucleoside analogs.

Microsatellite repeats consisting of dinucleotide sequences are ubiquitous in the human genome and have proven useful for linkage analysis, positional cloning and forensic identification purposes. In this study, the potential of utilizing the ligase detection reaction for the analysis of such microsatellite repeat sequences was investigated. Initially, the fidelity of thermostable DNA ligases was measured for model dinucleotide repeat sequences. Subsequently, the effect of modified oligonucleotides on ligation fidelity for dinucleotide repeats was determined using the nucleoside analogs nitroimidazole, inosine, 7-deazaguanosine and 2-pyrimidinone, as well as natural base mismatches. The measured error rates for a standard dinucleotide template indicated that the nitroimidazole nucleoside analogs could be used to increase the fidelity of ligation when compared to unmodified primers. Furthermore, use of formamide in the ligation buffer also increased ligation fidelity for dinucleotide repeat sequences. Using ligation-based assays to detect polymorphic alleles of microsatellite repeats in the human genome opens the possibility of using array-based typing of these loci for human identification, loss-of-heterozygosity studies and linkage analysis.

Mutation rates in humans. II. Sporadic mutation-specific rates and rate of detrimental human mutations inferred from hemophilia B.

We estimated the rates per base per generation of specific types of mutations, using our direct estimate of the overall mutation rate for hemophilia B and information on the mutations present in the United Kingdom's population as well as those reported year by year in the hemophilia B world database. These rates are as follows: transitions at CpG sites 9.7 × 10^-8, other transitions 7.3 × 10^-9, transversions at CpG sites 5.4 × 10^-9, other transversions 6.9 × 10^-9, and small deletions/insertions causing frameshifts 3.2 × 10^-10. By taking into account the ratio of male to female mutation rates, the above figures were converted into rates appropriate for autosomal DNA, namely 1.3 × 10^-7, 9.9 × 10^-9, 7.3 × 10^-9, 9.4 × 10^-9, and 6.5 × 10^-10, where the latter is the rate for all small deletion/insertion events. Mutation rates were also independently estimated from the sequence divergence observed in randomly chosen sequences from the human and chimpanzee X and Y chromosomes. These estimates were highly compatible with those obtained from hemophilia B and showed higher mutation rates in the male, but they showed no evidence for a significant excess of transitions at CpG sites in the spectrum of Y-sequence divergence relative to that of X-chromosome divergence. Our data suggest an overall mutation rate of 2.14 × 10^-8 per base per generation, or 128 mutations per human zygote. Since the effective target for hemophilia B mutations is only 1.05% of the factor IX gene, the rate of detrimental mutations, per human zygote, suggested by the hemophilia data is approximately 1.3.
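The per-zygote figures quoted above can be reproduced with a short calculation. The sketch below is only an illustration: the diploid genome size of roughly 6 × 10^9 base pairs is an assumption not stated in the abstract, and scaling the total by the 1.05% effective-target fraction is one plausible reading of how the detrimental rate was obtained.

```python
# Values taken from the abstract.
overall_rate = 2.14e-8              # mutations per base per generation
effective_target_fraction = 0.0105  # 1.05% effective target

# ASSUMPTION (not in the abstract): approximate diploid human genome size.
diploid_genome_bp = 6.0e9

# Expected new mutations in one zygote = rate per base x number of bases.
mutations_per_zygote = overall_rate * diploid_genome_bp

# One plausible reading: the same target fraction applies genome-wide.
detrimental_per_zygote = mutations_per_zygote * effective_target_fraction

print(round(mutations_per_zygote))        # 128
print(round(detrimental_per_zygote, 1))   # 1.3
```

Under these assumptions the results match the abstract's 128 mutations and approximately 1.3 detrimental mutations per zygote.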

Variant genotypes of the low-affinity Fcgamma receptors in two control populations and a review of low-affinity Fcgamma receptor polymorphisms in control and disease populations.

Fcgamma-receptors (FcgammaR) provide a critical link between humoral and cellular immunity. The genes of the low-affinity receptors for IgG and their isoforms, namely, FcgammaRIIa, FcgammaRIIb, FcgammaRIIIa, FcgammaRIIIb, and SH-FcgammaRIIIb, are located in close proximity on chromosome 1q22. Variant alleles may differ in biologic activity and a number of studies have reported the frequencies of variant FcgammaR alleles in both disease and control populations. No large study has evaluated the possibility of a nonrandom distribution of variant genotypes. We analyzed 395 normal individuals (172 African Americans [AA] and 223 Caucasians [CA]) at the following loci: FcgammaRIIa, FcgammaRIIIa, and FcgammaRIIIb, including the SH-FcgammaRIIIb. The genotypic distributions of FcgammaRIIa, FcgammaRIIIa, and FcgammaRIIIb conform to the Hardy-Weinberg law in each group. There was no strong evidence that combinations of 2-locus genotypes of the 3 loci deviated from random distributions in these healthy control populations. SH-FcgammaRIIIb is underrepresented in CA compared with AA (P < .0001) controls. A previously reported variant FcgammaRIIb was not detected in 70 normal individuals, indicating that this allele, if it exists, is very rare (<1%). In conclusion, we present data that should serve as the foundation for the interpretation of association studies involving multiple variant alleles of the low-affinity FcgammaR.
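As an illustration of what conforming to the Hardy-Weinberg law means for a biallelic locus, the following sketch computes a chi-square goodness-of-fit statistic from genotype counts. The function name and the counts are hypothetical and are not data from the study, and the abstract does not say which test the authors used.

```python
def hwe_chi_square(n_hom1, n_het, n_hom2):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions.

    Arguments are observed counts of the two homozygotes and the
    heterozygote at a biallelic locus (hypothetical helper).
    """
    n = n_hom1 + n_het + n_hom2
    # Allele frequency estimated from the genotype counts.
    p = (2 * n_hom1 + n_het) / (2 * n)
    q = 1 - p
    if p in (0.0, 1.0):
        raise ValueError("locus is monomorphic; test is undefined")
    # Expected counts under Hardy-Weinberg: p^2, 2pq, q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom1, n_het, n_hom2)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up counts: the first set fits the expected proportions exactly,
# the second has a strong heterozygote deficit.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(40, 20, 40))
```

With one degree of freedom, a statistic above about 3.84 indicates deviation at the 0.05 level; the second made-up example exceeds that threshold while the first does not.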

Database resources of the National Center for Biotechnology Information.

In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI's Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing pages, GeneMap'99, Davis Human-Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP) pages, Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP) pages, SAGEmap, Online Mendelian Inheritance in Man (OMIM) and the Molecular Modeling Database (MMDB). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov

Cloning of polymorphisms (COP): enrichment of polymorphic sequences from complex genomes.

Here we describe a new procedure (cloning of polymorphisms, COP) for enrichment of single nucleotide polymorphisms (SNPs) that represent restriction fragment length polymorphisms (RFLPs). COP would be applicable to the isolation of SNPs from particular regions of the genome, e.g. CpG islands, chromosomal bands, YACs or PAC contigs. A combination of digestion with restriction enzymes, treatment with uracil-DNA glycosylase and mung bean nuclease, PCR amplification and purification with streptavidin magnetic beads was used to isolate polymorphic sequences from the genomes of two human samples. After only two cycles of enrichment, 80% of the isolated clones were found to contain RFLPs. A simple method for the PCR detection of these polymorphisms was also developed.
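To make concrete why a SNP inside a restriction site shows up as an RFLP, here is a minimal sketch that digests two alleles in silico. It does not model the COP enrichment procedure itself. The sequences are invented, and EcoRI (recognition site GAATTC, cutting after the first G) is chosen only as a familiar example; the abstract does not name the enzymes used.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; cuts as G^AATTC


def fragment_lengths(seq, site=ECORI_SITE, cut_offset=1):
    """Return fragment lengths after cutting a linear sequence at every site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)  # cut position within the site
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical alleles differing by a single A>G change in the second site.
allele_a = "ACGTGAATTCTTGACCGGAATTCAGT"
allele_b = "ACGTGAATTCTTGACCGGAGTTCAGT"

print(fragment_lengths(allele_a))  # [5, 13, 8]
print(fragment_lengths(allele_b))  # [5, 21]
```

The single-base change removes one cut site, so the two alleles yield different fragment patterns, which is the property that lets such SNPs be detected as RFLPs.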