Annotation. Medical search. Definitions

The addition of descriptive information about the function or structure of a molecular sequence to its MOLECULAR SEQUENCE DATA record.

Databases devoted to knowledge about specific genes and gene products.

Sequential operating programs and data which instruct the functioning of a digital computer.

A field of biology concerned with the development of techniques for the collection and manipulation of biological data, and the use of such data to make biological discoveries or predictions. This field encompasses all computational methods and theories for solving biological problems including manipulation of models and datasets.

A loose confederation of computer communication networks around the world. The networks that make up the Internet are connected through several backbone networks. The Internet grew out of the US Government ARPAnet project and was designed to facilitate information exchange.

The systematic study of the complete DNA sequences (GENOME) of organisms.

The portion of an interactive computer program that issues messages to and receives commands from a user.

Databases containing information about PROTEINS such as AMINO ACID SEQUENCE; PROTEIN CONFORMATION; and other properties.

Software designed to store, manipulate, manage, and control data for specific uses.

Systematic organization, storage, retrieval, and dissemination of specialized information, especially of a scientific or technical nature (From ALA Glossary of Library and Information Science, 1983). It often involves authenticating or validating information.

A specified list of terms with a fixed and unalterable meaning, and from which a selection is made when CATALOGING; ABSTRACTING AND INDEXING; or searching BOOKS; JOURNALS AS TOPIC; and other documents. The control is intended to avoid the scattering of related subjects under different headings (SUBJECT HEADINGS). The list may be altered or extended only by the publisher or issuing agency. (From Harrod's Librarians' Glossary, 7th ed, p163)

Databases containing information about NUCLEIC ACIDS such as BASE SEQUENCE; SNPS; NUCLEIC ACID CONFORMATION; and other properties. Information about the DNA fragments kept in a GENE LIBRARY or GENOMIC LIBRARY is often maintained in DNA databases.

A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate a result or accomplish a given task.

The genetic complement of an organism, including all of its GENES, as represented in its DNA, or in some cases, its RNA.

A process that includes the determination of AMINO ACID SEQUENCE of a protein (or peptide, oligopeptide or peptide fragment) and the information analysis of the sequence.

Partial cDNA (DNA, COMPLEMENTARY) sequences that are unique to the cDNAs from which they were derived.

The process of pictorial communication between humans and computers, in which the computer input and output take the form of charts, drawings, or other appropriate pictorial representations.

The determination of the pattern of genes expressed at the level of GENETIC TRANSCRIPTION, under specific circumstances or in a specific cell.

Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein.

The genetic complement of a BACTERIA as represented in its DNA.

Computer processing of a language with rules that reflect and describe current usage rather than prescribed usage.

The arrangement of two or more amino acid or base sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms.
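The idea of scoring an alignment from weights assigned to aligned elements can be illustrated with the classic Needleman–Wunsch dynamic-programming recurrence for global pairwise alignment. This is a minimal sketch, assuming a simple linear scoring scheme (match +1, mismatch -1, gap -1); practical tools typically use substitution matrices such as BLOSUM or PAM and affine gap penalties. The function name and default weights are illustrative assumptions.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Scoring weights are illustrative assumptions, not a standard.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            # Either align a[i-1] with b[j-1], or open a gap in one sequence.
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GCATGCU"))
```

A higher score indicates more shared, co-linear positions; recovering the alignment itself would require a traceback through the `score` table, which this sketch omits.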

Extensive collections, reputedly complete, of facts and data garnered from material of a specialized subject area and made available for analysis and application. The collection can be automated by various contemporary methods for retrieval. The concept should be differentiated from DATABASES, BIBLIOGRAPHIC which is restricted to collections of bibliographic references.

Use of sophisticated analysis tools to sort through, organize, examine, and combine large sets of information.

The terms, expressions, designations, or symbols used in a particular science, discipline, or specialized subject area.

Hybridization of a nucleic acid sample to a very large set of OLIGONUCLEOTIDE PROBES, which have been attached individually in columns and rows to a solid support, to determine a BASE SEQUENCE, to detect variations in a gene sequence or in GENE EXPRESSION, or for GENE MAPPING.

The genetic complement of a plant (PLANTS) as represented in its DNA.

The complete genetic complement contained in the DNA of a set of CHROMOSOMES in a HUMAN. The length of the human genome is about 3 billion base pairs.

The protein complement of an organism coded for by its genome.

A set of statistical methods used to group variables or observations into strongly inter-related subgroups. In epidemiology, it may be used to analyze a closely grouped series of events or cases of disease or other health-related phenomena with well-defined distribution patterns in relation to time or place or both.
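k-means is one widely used member of this family of methods. The sketch below is a minimal, assumption-laden illustration: it groups points by Euclidean distance, initializes centroids deterministically from the first `k` points (real implementations usually use random or k-means++ initialization), and stops when centroids no longer change. The function name is hypothetical.

```python
from math import dist


def k_means(points, k, iterations=100):
    """Minimal k-means clustering on tuples of coordinates.

    Assumption: the first k points serve as initial centroids.
    Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
    return labels, centroids


labels, centroids = k_means([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(labels, centroids)
```

The number of clusters `k` must be chosen in advance, which is one reason hierarchical and density-based clustering are also common in practice.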

Sets of structured vocabularies used for describing and categorizing genes and gene products by their molecular function, involvement in biological processes, and cellular location. These vocabularies and their associations to genes and gene products (Gene Ontology annotations) are generated and curated by the Gene Ontology Consortium.

A multistage process that includes cloning, physical mapping, subcloning, sequencing, and information analysis of an RNA SEQUENCE.

Any method used for determining the location of and relative distances between genes on a chromosome.

Specific languages used to prepare computer programs.

Theory and development of COMPUTER SYSTEMS which perform tasks that normally require human intelligence. Such tasks may include speech recognition, LEARNING; VISUAL PERCEPTION; MATHEMATICAL COMPUTING; reasoning, PROBLEM SOLVING, DECISION-MAKING, and translation of language.

The systematic study of the complete complement of proteins (PROTEOME) of organisms.

Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories.

Techniques of nucleotide sequence analysis that increase the range, complexity, sensitivity, and accuracy of results by greatly increasing the scale of operations and thus the number of nucleotides, and the number of copies of each nucleotide sequenced. The sequencing may be done by analysis of the synthesis or ligation products, hybridization to preexisting sequences, etc.

Methods for determining interaction between PROTEINS.

The complete gene complement contained in a set of chromosomes in a fungus.

The pattern of GENE EXPRESSION at the level of genetic transcription in a specific organism or under specific circumstances in specific cells.

Collections of facts, assumptions, beliefs, and heuristics that are used in combination with databases to achieve desired results, such as a diagnosis, an interpretation, or a solution to a problem (From McGraw Hill Dictionary of Scientific and Technical Terms, 6th ed).

Software used to locate data or information stored in machine-readable form locally or at a distance such as an INTERNET site.

In INFORMATION RETRIEVAL, machine-sensing or identification of visible patterns (shapes, forms, and configurations). (Harrod's Librarians' Glossary, 7th ed)

The genetic complement of an archaeal organism (ARCHAEA) as represented in its DNA.

The relationships of groups of organisms as reflected by their genetic makeup.

The relationships between symbols and their meanings.

A sequence of successive nucleotide triplets that are read as CODONS specifying AMINO ACIDS and begin with an INITIATOR CODON and end with a stop codon (CODON, TERMINATOR).
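The definition above translates directly into a scan over the three forward reading frames. This is a minimal sketch under stated assumptions: it checks only the forward strand, treats ATG as the sole start codon (in prokaryotes GTG/GUG can also initiate), uses the standard stop codons TAA, TAG, and TGA, and reports each ORF from its first in-frame ATG. The function name and `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def find_orfs(dna: str, min_codons: int = 2):
    """Return (start, end, sequence) for forward-strand ORFs.

    An ORF runs from an in-frame ATG to the next in-frame stop codon,
    inclusive; `end` is exclusive, as in Python slicing.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next start codon
    return orfs


print(find_orfs("CCATGAAATAGCC"))
```

Scanning the reverse complement in the same way would cover the other strand; gene-prediction software adds further evidence such as codon usage and ribosome-binding sites.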

A category of nucleic acid sequences that function as units of heredity and which code for the basic instructions for the development, reproduction, and maintenance of organisms.

Activities performed to identify concepts and aspects of published information and research reports.

Specifications and instructions applied to the software.

A bibliographic database that includes MEDLINE as its primary subset. It is produced by the National Center for Biotechnology Information (NCBI), part of the NATIONAL LIBRARY OF MEDICINE. PubMed, which is searchable through NLM's Web site, also includes access to additional citations to selected life sciences journals not in MEDLINE, and links to other resources such as the full-text of articles at participating publishers' Web sites, NCBI's molecular biology databases, and PubMed Central.

A large collection of DNA fragments cloned (CLONING, MOLECULAR) from a given organism, tissue, organ, or cell type. It may contain complete genomic sequences (GENOMIC LIBRARY) or complementary DNA sequences, the latter being formed from messenger RNA and lacking intron sequences.

Overlapping of cloned or sequenced DNA to construct a continuous region of a gene, chromosome or genome.

The process of cumulative change at the level of DNA; RNA; and PROTEINS, over successive generations.

The sequence of PURINES and PYRIMIDINES in nucleic acids and polynucleotides. It is also called nucleotide sequence.

Biological molecules that possess catalytic activity. They may occur naturally or be synthetically created. Enzymes are usually proteins; however, CATALYTIC RNA and CATALYTIC DNA molecules have also been identified.

Complex sets of enzymatic reactions connected to each other via their product and substrate metabolites.

Genes bearing close resemblance to known genes at different loci, but rendered non-functional by additions or deletions in structure that prevent normal transcription or translation. When lacking introns and containing a poly-A segment near the downstream end (as a result of reverse copying from processed nuclear RNA into double-stranded DNA), they are called processed genes.

Theoretical representations that simulate the behavior or activity of genetic processes or phenomena. They include the use of mathematical equations, computers, and other electronic equipment.

A set of genes descended by duplication and variation from some ancestral gene. Such genes may be clustered together on the same chromosome or dispersed on different chromosomes. Examples of multigene families include those that encode the hemoglobins, immunoglobulins, histocompatibility antigens, actins, tubulins, keratins, collagens, heat shock proteins, salivary glue proteins, chorion proteins, cuticle proteins, yolk proteins, and phaseolins, as well as histones, ribosomal RNA, and transfer RNA genes. The latter three are examples of reiterated genes, where hundreds of identical genes are present in a tandem array. (King & Stanfield, A Dictionary of Genetics, 4th ed)

The complete genetic complement contained in a set of CHROMOSOMES in a protozoan.

A sequence of amino acids in a polypeptide or of nucleotides in DNA or RNA that is similar across multiple species. A known set of conserved sequences is represented by a CONSENSUS SEQUENCE. AMINO ACID MOTIFS are often composed of conserved sequences.

Controlled operation of an apparatus, process, or system by mechanical or electronic devices that take the place of human organs of observation, effort, and decision. (From Webster's Collegiate Dictionary, 1993)

The statistical reproducibility of measurements (often in a clinical context), including the testing of instrumentation or techniques to obtain reproducible results. The concept includes reproducibility of physiological measurements, which may be used to develop rules to assess probability or prognosis, or response to a stimulus; reproducibility of occurrence of a condition; and reproducibility of experimental results.

Biological activities and function of the whole organism in humans, animals, microorganisms, and plants, and of the biosphere.

The functional hereditary units of PLANTS.

Annual cereal grass of the family POACEAE and its edible starchy grain, rice, which is the staple food of roughly one-half of the world's population.

The parts of the messenger RNA sequence that do not code for the protein product, i.e., the 5' UNTRANSLATED REGIONS and 3' UNTRANSLATED REGIONS.

A definite pathologic process with a characteristic set of signs and symptoms. It may affect the whole body or any of its parts, and its etiology, pathology, and prognosis may be known or unknown.

The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining PROTEIN CONFORMATION.

A multistage process that includes the determination of a sequence (protein, carbohydrate, etc.), its fragmentation and analysis, and the interpretation of the resulting sequence information.

The genetic complement of a helminth (HELMINTHS) as represented in its DNA.

Cells lacking a nuclear membrane so that the nuclear material is either scattered in the cytoplasm or collected in a nucleoid region.

Application of statistical procedures to analyze specific observed or assumed facts from a particular study.

The premier bibliographic database of the NATIONAL LIBRARY OF MEDICINE. MEDLINE® (MEDLARS Online) is the primary subset of PUBMED and can be searched on NLM's Web site in PubMed or the NLM Gateway. MEDLINE references are indexed with MEDICAL SUBJECT HEADINGS (MeSH).

Text editing and storage functions using computer software.

The presence of two or more genetic loci on the same chromosome. Extensions of this original definition refer to the similarity in content and organization between chromosomes, for example between chromosomes of different species.

A social media model for enabling public involvement and recruiting participants. Use of social media to collect feedback and recruit volunteer subjects.

Interacting DNA-encoded regulatory subsystems in the GENOME that coordinate input from activator and repressor TRANSCRIPTION FACTORS during development, cell differentiation, or in response to environmental cues. The networks function to ultimately specify expression of particular sets of GENES for specific conditions, times, or locations.

The degree of 3-dimensional shape similarity between proteins. It can be an indication of distant AMINO ACID SEQUENCE HOMOLOGY and used for rational DRUG DESIGN.

Single-stranded complementary DNA synthesized from an RNA template by the action of RNA-dependent DNA polymerase. cDNA (i.e., complementary DNA, not circular DNA, not C-DNA) is used in a variety of molecular cloning experiments as well as serving as a specific hybridization probe.

Computerized compilations of information units (text, sound, graphics, and/or video) interconnected by logical nonlinear linkages that enable users to follow optimal paths through the material and also the systems used to create and display this information. (From Thesaurus of ERIC Descriptors, 1994)

The genetic complement of an insect (INSECTS) as represented in its DNA.

The genomic analysis of assemblages of organisms.

A process whereby multiple RNA transcripts are generated from a single gene. Alternative splicing involves the splicing together of other possible sets of EXONS during the processing of some, but not all, transcripts of the gene. Thus a particular exon may be connected to any one of several alternative exons to form a mature RNA. The alternative forms of mature MESSENGER RNA produce PROTEIN ISOFORMS in which one part of the isoforms is common while the other parts are different.

Structured vocabularies describing concepts from the fields of biology and relationships between concepts.

Systems where the input data enter the computer directly from the point of origin (usually a terminal or workstation) and/or in which output data are transmitted directly to that terminal point of origin. (Sippl, Computer Dictionary, 4th ed)

Description of a pattern of recurrent functions or procedures frequently found in organizational processes, such as notification, decision, and action.

The level of protein structure in which combinations of secondary protein structures (alpha helices, beta sheets, loop regions, and motifs) pack together to form folded shapes called domains. Disulfide bridges between cysteines in two different parts of the polypeptide chain along with other interactions between the chains play a role in the formation and stabilization of tertiary structure. Small proteins usually consist of only one domain but larger proteins may contain a number of domains connected by segments of polypeptide chain which lack regular secondary structure.

Graphs representing sets of measurable, non-covalent physical contacts with specific PROTEINS in living organisms or in cells.

The systematic arrangement of entities in any field into categories or classes based on common characteristics such as properties, morphology, subject matter, etc.

The restriction of a characteristic behavior, anatomical structure or physical system, such as immune response; metabolic response, or gene or gene variant to the members of one species. It refers to that property which differentiates one species from another but it is also used for phylogenetic levels higher or lower than the species.

RNA which does not code for protein but has some enzymatic, structural, or regulatory function. Although ribosomal RNA (RNA, RIBOSOMAL) and transfer RNA (RNA, TRANSFER) are also untranslated RNAs, they are not included in this scope.

The degree of similarity between sequences of amino acids. This information is useful for analyzing the genetic relatedness of proteins and species.

Statistical formulations or analyses which, when applied to data and found to fit the data, are then used to verify the assumptions and parameters used in the analysis. Examples of statistical models are the linear model, binomial model, polynomial model, two-parameter model, etc.

A single nucleotide variation in a genetic sequence that occurs at appreciable frequency in the population.

The act of testing the software for compliance with a standard.

A stochastic process such that the conditional probability distribution for a state at any future instant, given the present state, is unaffected by any additional knowledge of the past history of the system.
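For a finite-state chain, the Markov property means the state distribution one step ahead depends only on the current distribution and a fixed transition matrix P, via p(n+1) = p(n) P. The sketch below iterates that update for a hypothetical two-state chain; the transition probabilities are made up for illustration.

```python
def step(distribution, transition):
    """One Markov step: new[j] = sum_i distribution[i] * transition[i][j]."""
    n = len(transition)
    return [sum(distribution[i] * transition[i][j] for i in range(n))
            for j in range(n)]


def distribution_after(initial, transition, steps):
    """Propagate a state distribution forward a number of steps.

    Only the current distribution is needed at each step; no history
    is stored, which is exactly the Markov property.
    """
    d = list(initial)
    for _ in range(steps):
        d = step(d, transition)
    return d


# Hypothetical two-state chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
print(distribution_after([1.0, 0.0], P, 200))
```

For this chain the distribution converges to the stationary distribution (5/6, 1/6), which solves pi P = pi regardless of the starting state.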

The functional hereditary units of INSECTS.

The different gene transcripts generated from a single gene by RNA EDITING or ALTERNATIVE SPLICING of RNA PRECURSORS.

A plant genus of the family BRASSICACEAE that contains ARABIDOPSIS PROTEINS and MADS DOMAIN PROTEINS. The species A. thaliana is used for experiments in classical plant genetics as well as molecular genetic studies in plant physiology, biochemistry, and development.

A publication issued at stated, more or less regular, intervals.

The degree of similarity between sequences. Studies of AMINO ACID SEQUENCE HOMOLOGY and NUCLEIC ACID SEQUENCE HOMOLOGY provide useful information about the genetic relatedness of genes, gene products, and species.

Genotypic differences observed among individuals in a population.

The parts of a transcript of a split GENE remaining after the INTRONS are removed. They are spliced together to become a MESSENGER RNA or other functional RNA.

Data processing largely performed by automatic means.

The genetic complement of a microorganism as represented in its DNA or in some microorganisms its RNA.

A codon that directs initiation of protein translation (TRANSLATION, GENETIC) by stimulating the binding of initiator tRNA (RNA, TRANSFER, MET). In prokaryotes, the codons AUG or GUG can act as initiators while in eukaryotes, AUG is the only initiator codon.

Nucleotide sequences of a gene that are involved in the regulation of GENETIC TRANSCRIPTION.

The sequential correspondence of nucleotides in one nucleic acid molecule with those of another nucleic acid molecule. Sequence homology is an indication of the genetic relatedness of different organisms and gene function.
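One simple way to quantify this correspondence is percent identity over a pairwise alignment. This is a minimal sketch, assuming the two sequences are already aligned (equal length, with "-" marking gaps) and that the denominator is the full alignment length; other conventions divide by the shorter sequence length or by ungapped columns, so reported values can differ between tools. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns where both sequences share the same nucleotide.

    Assumes pre-aligned input; gap characters ('-') never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


print(percent_identity("ACGT-A", "ACCTGA"))  # 4 identical columns out of 6
```

High identity suggests, but does not by itself prove, homology; statistical significance of an alignment is usually assessed separately.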

A coordinated effort of researchers to map (CHROMOSOME MAPPING) and sequence (SEQUENCE ANALYSIS, DNA) the human GENOME.

A species of fruit fly much used in genetics because of the large size of its chromosomes.

A system containing any combination of computers, computer terminals, printers, audio or visual display devices, or telephones interconnected by telecommunications equipment or cables: used to transmit or receive information. (Random House Unabridged Dictionary, 2d ed)

The functional hereditary units of BACTERIA.

Works containing information articles on subjects in every field of knowledge, usually arranged in alphabetical order, or a similar work limited to a special field or subject. (From The ALA Glossary of Library and Information Science, 1983)

A system for verifying and maintaining a desired level of quality in a product or process by careful planning, use of proper equipment, continued inspection, and corrective action as required. (Random House Unabridged Dictionary, 2d ed)

The first nucleotide of a transcribed DNA sequence where RNA polymerase (DNA-DIRECTED RNA POLYMERASE) begins synthesizing the RNA transcript.

Management of the acquisition, organization, storage, retrieval, and dissemination of information. (From Thesaurus of ERIC Descriptors, 1994)

A genus of pufferfish commonly used for research.

Lists of words, usually in alphabetical order, giving information about form, pronunciation, etymology, grammar, and meaning.

Comprehensive, methodical analysis of complex biological systems by monitoring responses to perturbations of biological processes. Large scale, computerized collection and analysis of the data are used to develop and test models of biological systems.

The outward appearance of the individual. It is the product of interactions between genes, and between the GENOTYPE and the environment.

Computer-based representation of physical systems and phenomena such as chemical processes.

Short tracts of DNA sequence that are used as landmarks in GENOME mapping. In most instances, 200 to 500 base pairs of sequence define a Sequence Tagged Site (STS) that is operationally unique in the human genome (i.e., can be specifically detected by the polymerase chain reaction in the presence of all other genomic sequences). The overwhelming advantage of STSs over mapping landmarks defined in other ways is that the means of testing for the presence of a particular STS can be completely described as information in a database.

The parts of a macromolecule that directly participate in its specific combination with another molecule.

A family of gram-negative, non-motile bacteria from human and animal sources. One saprophytic species is known.

Theoretical representations that simulate the behavior or activity of biological processes or diseases. For disease models in living animals, DISEASE MODELS, ANIMAL is available. Biological models include the use of mathematical equations, computers, and other electronic equipment.

Processes occurring in various organisms by which new genes are copied. Gene duplication may result in a MULTIGENE FAMILY; supergenes or PSEUDOGENES.

DNA constructs that are composed of, at minimum, a REPLICATION ORIGIN required for successful replication, propagation, and maintenance as an extra chromosome in bacteria. In addition, they can carry large amounts (about 200 kilobases) of other sequence for a variety of bioengineering purposes.

Genes whose nucleotide sequences overlap to some degree. The overlapped sequences may involve structural or regulatory genes of eukaryotic or prokaryotic cells.

The sequential location of genes on a chromosome.

RNA sequences that serve as templates for protein synthesis. Bacterial mRNAs are generally primary transcripts in that they do not require post-transcriptional processing. Eukaryotic mRNA is synthesized in the nucleus and must be exported to the cytoplasm for translation. Most eukaryotic mRNAs have a sequence of polyadenylic acid at the 3' end, referred to as the poly(A) tail. The function of this tail is not known for certain, but it may play a role in the export of mature mRNA from the nucleus as well as in helping stabilize some mRNA molecules by retarding their degradation in the cytoplasm.

Writings having excellence of form or expression and expressing ideas of permanent or universal interest. The body of written works produced in a particular language, country, or age. (Webster, 3d ed)

Organized collections of computer records, standardized in format and content, that are stored in any of a variety of computer-readable modes. They are the basic sets of data from which computer-readable files are created. (from ALA Glossary of Library and Information Science, 1983)

Proteins found in any species of bacterium.

Commonly observed structural components of proteins formed by simple combinations of adjacent secondary structures. A commonly observed structure may be composed of a CONSERVED SEQUENCE which can be represented by a CONSENSUS SEQUENCE.

Sequences of DNA in the genes that are located between the EXONS. They are transcribed along with the exons but are removed from the primary gene transcript by RNA SPLICING to leave mature RNA. Some introns code for separate genes.

Any of the processes by which nuclear, cytoplasmic, or intercellular factors influence the differential control (induction or repression) of gene action at the level of transcription or translation.

Commonly observed BASE SEQUENCE or nucleotide structural components which can be represented by a CONSENSUS SEQUENCE or a SEQUENCE LOGO.

Ribonucleic acid in plants having regulatory and catalytic roles as well as involvement in protein synthesis.

A basis of value established for the measure of quantity, weight, extent or quality, e.g. weight standards, standard solutions, methods, techniques, and procedures used in diagnosis and therapy.

A species of the genus SACCHAROMYCES, family Saccharomycetaceae, order Saccharomycetales, known as "baker's" or "brewer's" yeast. The dried form is used as a dietary supplement.

Nucleotide sequences located at the ends of EXONS and recognized in pre-messenger RNA by SPLICEOSOMES. They are joined during the RNA SPLICING reaction, forming the junctions between exons.

The systematic identification and quantitation of all the metabolic products of a cell, tissue, organ, or organism under varying conditions. The METABOLOME of a cell or organism is a dynamic collection of metabolites which represent its net response to current conditions.

Deoxyribonucleic acid that makes up the genetic material of plants.

One of the three domains of life (the others being BACTERIA and ARCHAEA), also called Eukarya. These are organisms whose cells are enclosed in membranes and possess a nucleus. They comprise almost all multicellular and many unicellular organisms, and are traditionally divided into groups (sometimes called kingdoms) including ANIMALS; PLANTS; FUNGI; and various algae and other taxa that were previously part of the old kingdom Protista.

The simultaneous analysis, on a microchip, of multiple samples or targets arranged in an array format.

A genus of gram-negative, aerotolerant, spiral-shaped bacteria isolated from water and associated with diarrhea in humans and animals.

The biosynthesis of RNA carried out on a template of DNA. The biosynthesis of DNA from an RNA template is called REVERSE TRANSCRIPTION.

Proteins found in plants (flowers, herbs, shrubs, trees, etc.). The concept does not include proteins found in vegetables for which VEGETABLE PROTEINS is available.

Any of the processes by which nuclear, cytoplasmic, or intercellular factors influence the differential control of gene action in plants.

Small double-stranded, non-protein-coding RNAs, 21-25 nucleotides in length, generated from single-stranded microRNA gene transcripts by the same RIBONUCLEASE III, Dicer, that produces small interfering RNAs (RNA, SMALL INTERFERING). They become part of the RNA-INDUCED SILENCING COMPLEX and repress the translation (TRANSLATION, GENETIC) of target RNA by binding, as an imperfect match, to homologous regions in its 3' UTR. The small temporal RNAs (stRNAs) let-7 and lin-4 from C. elegans were the first two miRNAs discovered, and belong to a class of miRNAs involved in developmental timing.

The phenotypic manifestation of a gene or genes by the processes of GENETIC TRANSCRIPTION and GENETIC TRANSLATION.

Models used experimentally or theoretically to study molecular shape, electronic properties, or interactions; includes analogous molecules, computer-generated graphics, and mechanical structures.

An analytical method used in determining the identity of a chemical based on its mass using mass analyzers/mass spectrometers.

The visual display of data in a man-machine system. An example is when data is called from the computer and transmitted to a CATHODE RAY TUBE DISPLAY or LIQUID CRYSTAL display.

Endogenous substances, usually proteins, which are effective in the initiation, stimulation, or termination of the genetic transcription process.

Any of the DNA in between gene-coding DNA, including untranslated regions, 5' and 3' flanking regions, INTRONS, non-functional pseudogenes, and non-functional repetitive sequences. This DNA may or may not encode regulatory functions.

Computer programs based on knowledge developed from consultation with experts on a problem, and the processing and/or formalizing of this knowledge using these programs in such a manner that the problems may be solved.

Approximate, quantitative reasoning that is concerned with the linguistic ambiguity which exists in natural or synthetic language. At its core are variables such as good, bad, and young as well as modifiers such as more, less, and very. These ordinary terms represent fuzzy sets in a particular problem. Fuzzy logic plays a key role in many medical expert systems.
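The terms and modifiers mentioned above can be made concrete with membership functions that return a degree of truth between 0 and 1. This sketch follows Zadeh's classic conventions (AND as minimum, OR as maximum, NOT as complement, "very" as squaring, "more or less" as square root); the piecewise-linear shape and the age thresholds for "young" are purely illustrative assumptions.

```python
def young(age: float) -> float:
    """Hypothetical membership function for the fuzzy set 'young'.

    Fully young up to 25, not young from 45, linear in between.
    """
    if age <= 25:
        return 1.0
    if age >= 45:
        return 0.0
    return (45 - age) / 20


# Linguistic modifiers (hedges), Zadeh's conventions.
def very(mu: float) -> float:
    return mu ** 2  # concentration: sharpens the set


def more_or_less(mu: float) -> float:
    return mu ** 0.5  # dilation: loosens the set


# Fuzzy connectives.
def f_and(a: float, b: float) -> float:
    return min(a, b)


def f_or(a: float, b: float) -> float:
    return max(a, b)


def f_not(a: float) -> float:
    return 1.0 - a


print(young(35), very(young(35)), f_not(young(35)))
```

Unlike Boolean logic, a 35-year-old is here "young" to degree 0.5 and "very young" only to degree 0.25; fuzzy expert systems combine such degrees through rules built from these connectives.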

Genetic loci associated with a QUANTITATIVE TRAIT.

A research and development program initiated by the NATIONAL LIBRARY OF MEDICINE to build knowledge sources for the purpose of aiding the development of systems that help health professionals retrieve and integrate biomedical information. The knowledge sources can be used to link disparate information systems to overcome retrieval problems caused by differences in terminology and the scattering of relevant information across many databases. The three knowledge sources are the Metathesaurus, the Semantic Network, and the Specialist Lexicon.

The characteristic 3-dimensional shape of a protein, including the secondary, supersecondary (motifs), tertiary (domains) and quaternary structure of the peptide chain. PROTEIN STRUCTURE, QUATERNARY describes the conformation assumed by multimeric proteins (aggregates of more than one polypeptide chain).

A species of nematode that is widely used in biological, biochemical, and genetic studies.

The functional hereditary units of FUNGI.

An analysis comparing the allele frequencies of all available (or a whole GENOME representative set of) polymorphic markers in unrelated patients with a specific symptom or disease condition, and those of healthy controls to identify markers associated with a specific disease or condition.
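The per-marker comparison described above is often an allelic chi-square test on a 2x2 table of allele counts (cases vs controls). This is a minimal sketch for a single marker, assuming biallelic markers and no zero row or column totals; it uses the fact that for one degree of freedom the upper-tail p-value equals erfc(sqrt(chi2 / 2)). The function name and the example counts are hypothetical.

```python
from math import erfc, sqrt


def allelic_chi_square(case_counts, control_counts):
    """Allelic association test for one biallelic marker.

    case_counts, control_counts: (count of allele A, count of allele a).
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    Assumes every row and column total is non-zero.
    """
    table = [list(case_counts), list(control_counts)]
    total = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency is independent of case status.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # chi-square upper tail, 1 df
    return chi2, p_value


# Hypothetical counts: allele A is more frequent in cases than in controls.
print(allelic_chi_square((60, 40), (40, 60)))
```

Because a genome-wide study repeats this test for every marker, results are judged against a stringent multiple-testing threshold; 5e-8 is a commonly used genome-wide significance convention.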

Discrete segments of DNA which can excise and reintegrate to another site in the genome. Most are inactive, i.e., have not been found to exist outside the integrated state. DNA transposable elements include bacterial IS (insertion sequence) elements, Tn elements, the maize controlling elements Ac and Ds, Drosophila P, gypsy, and pogo elements, the human Tigger elements and the Tc and mariner elements which are found throughout the animal kingdom.

The functional genetic units of ARCHAEA.

The field of information science concerned with the analysis and dissemination of data through the application of computers.

Multicellular, eukaryotic life forms of kingdom Plantae (sensu lato), comprising the VIRIDIPLANTAE; RHODOPHYTA; and GLAUCOPHYTA; all of which acquired chloroplasts by direct endosymbiosis of CYANOBACTERIA. They are characterized by a mainly photosynthetic mode of nutrition; essentially unlimited growth at localized regions of cell divisions (MERISTEMS); cellulose within cells providing rigidity; the absence of organs of locomotion; absence of nervous and sensory systems; and an alternation of haploid and diploid generations.

A set of related supervised machine learning algorithms that analyze data and recognize patterns; they are used for classification and regression analysis.
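
As a rough illustration of the classification case only, a linear support vector machine can be trained by stochastic subgradient descent on the regularized hinge loss (a Pegasos-style update). The regularization strength `lam`, the epoch count, and the toy data are assumptions. Production libraries use dedicated solvers and also support nonlinear kernels and regression, none of which this sketch covers.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    # Pegasos-style stochastic subgradient descent on
    #   (lam / 2) * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)).
    # X: list of feature vectors; y: labels in {-1, +1}.
    # A constant 1.0 feature is appended so the last weight acts as a bias
    # (regularized along with the other weights, a simplification).
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient of the regularizer: shrink every weight.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                # Hinge loss is active: move w toward classifying x with a margin.
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    # Sign of the linear score, with the bias feature appended.
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```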

A genus of the family POXVIRIDAE, subfamily CHORDOPOXVIRINAE, comprising poxviruses infecting sheep, goats, and cattle. Transmission is usually mechanical by arthropods, but also includes contact, airborne routes, and non-living reservoirs (fomites).

A chemical dictionary is a reference book or digital resource that provides definitions, descriptions, and information about various chemicals, their properties, reactions, uses, and safety measures, organized in an alphabetical or systematic order for easy lookup and understanding.

Complex nucleoprotein structures which contain the genomic DNA and are part of the CELL NUCLEUS of PLANTS.

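Binary classification measures to assess test results. Sensitivity, or recall rate, is the proportion of individuals who have the condition that the test correctly identifies as positive. Specificity is the probability of correctly determining the absence of a condition, i.e., the proportion of individuals without the condition that the test correctly identifies as negative. (From Last, Dictionary of Epidemiology, 2d ed)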
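
Both measures can be computed from the four cells of a confusion table, as in the sketch below. The class and function names and the toy labels are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # condition present, test positive
    fn: int  # condition present, test negative
    tn: int  # condition absent, test negative
    fp: int  # condition absent, test positive

    def sensitivity(self) -> float:
        # Recall: share of people WITH the condition whom the test flags as positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of people WITHOUT the condition whom the test reports as negative.
        return self.tn / (self.tn + self.fp)


def confusion_counts(truth, predicted):
    # truth / predicted: parallel sequences of booleans
    # (True = condition present / test positive).
    tp = fn = tn = fp = 0
    for has_condition, test_positive in zip(truth, predicted):
        if has_condition and test_positive:
            tp += 1
        elif has_condition:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    return ConfusionCounts(tp=tp, fn=fn, tn=tn, fp=fp)


counts = confusion_counts(
    truth=[True, True, True, False, False, False, False],
    predicted=[True, True, False, False, False, True, False],
)
print(counts.sensitivity(), counts.specificity())
```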

Cells of the higher organisms, containing a true nucleus bounded by a nuclear membrane.

Rapid methods of measuring the effects of an agent in a biological or chemical assay. The assay usually involves some form of automation or a way to conduct multiple assays at the same time using sample arrays.

A polynucleotide consisting essentially of chains with a repeating backbone of phosphate and ribose units to which nitrogenous bases are attached. RNA is unique among biological macromolecules in that it can encode genetic information, serve as an abundant structural component of cells, and possess catalytic activity. (Rieger et al., Glossary of Genetics: Classical and Molecular, 5th ed)

A plant genus of the family SALICACEAE. Balm of Gilead is a common name used for P. candicans, P. gileadensis, or P. jackii, and is sometimes also used for ABIES BALSAMEA or for COMMIPHORA.

The parts of the gene sequence that carry out the different functions of the GENES.

A collective genome representative of the many organisms, primarily microorganisms, existing in a community.

The complete genetic complement contained in a DNA or RNA molecule in a virus.

One of the three domains of life (the others being Eukarya and ARCHAEA), also called Eubacteria. They are unicellular prokaryotic microorganisms which generally possess rigid cell walls, multiply by cell division, and exhibit three principal forms: round or coccal, rodlike or bacillary, and spiral or spirochetal. Bacteria can be classified by their response to OXYGEN: aerobic, anaerobic, or facultatively anaerobic; by the mode by which they obtain their energy: chemotrophy (via chemical reaction) or PHOTOTROPHY (via light reaction); for chemotrophs, by their source of chemical energy: CHEMOLITHOTROPHY (from inorganic compounds) or chemoorganotrophy (from organic compounds); and by their source of CARBON, NITROGEN, etc.: HETEROTROPHY (from organic sources) or AUTOTROPHY (from CARBON DIOXIDE). They can also be classified by whether or not they stain with CRYSTAL VIOLET dye (a property based on the structure of their CELL WALLS): gram-positive or gram-negative.

Sets of enzymatic reactions, occurring in organisms, that form biochemicals by making new covalent bonds.

The functional hereditary units of HELMINTHS.

A mass spectrometry technique using two (MS/MS) or more mass analyzers. With two in tandem, the precursor ions are mass-selected by a first mass analyzer and focused into a collision region, where they are fragmented into product ions that are then characterized by a second mass analyzer. A variety of techniques are used to separate the compounds, ionize them, and introduce them to the first mass analyzer. For example, in GC-MS/MS (see GAS CHROMATOGRAPHY-MASS SPECTROMETRY), relatively small compounds are separated by GAS CHROMATOGRAPHY before being injected into an ionization chamber for mass selection.

Tabular numerical representations of sequence motifs displaying their variability as likelihood values for each possible residue at each position in a sequence. Position-specific scoring matrices (PSSMs) are calculated from position frequency matrices.
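
One common way to derive a PSSM from a position frequency matrix is to add pseudocounts and take log-odds scores against a background distribution. The sketch below assumes a DNA alphabet, a uniform background, a pseudocount of 1, and base-2 logarithms; these are illustrative choices, and other weighting schemes exist.

```python
import math

ALPHABET = "ACGT"  # assumption: DNA motifs; a protein PSSM would use the 20 amino acids


def position_frequency_matrix(sites):
    # Count how often each residue occurs at each position of equal-length aligned sites.
    length = len(sites[0])
    pfm = [{base: 0 for base in ALPHABET} for _ in range(length)]
    for site in sites:
        for i, base in enumerate(site):
            pfm[i][base] += 1
    return pfm


def pssm_from_pfm(pfm, background=None, pseudocount=1.0):
    # Turn counts into log2-odds scores:
    #   log2(P(residue at this position) / P(residue in background)).
    # Pseudocounts keep residues never seen at a position from producing log2(0).
    if background is None:
        background = {base: 1 / len(ALPHABET) for base in ALPHABET}
    pssm = []
    for column in pfm:
        total = sum(column.values()) + pseudocount * len(ALPHABET)
        pssm.append({
            base: math.log2((column[base] + pseudocount) / total / background[base])
            for base in ALPHABET
        })
    return pssm


def score(pssm, seq):
    # Sum of per-position scores; higher means a better match to the motif.
    return sum(column[base] for column, base in zip(pssm, seq))


sites = ["ACGT", "ACGT", "ACGA", "TCGT"]  # made-up aligned motif instances
pssm = pssm_from_pfm(position_frequency_matrix(sites))
print(round(score(pssm, "ACGT"), 2), round(score(pssm, "TTTT"), 2))
```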

Nucleic acid sequences involved in regulating the expression of genes.

The branch of science concerned with the means and consequences of transmission and generation of the components of biological inheritance. (Stedman, 26th ed)

Short RNA, about 200 nucleotides in length or shorter, that does not code for protein.

The dynamic collection of metabolites which represent a cell's or organism's net metabolic response to current conditions.

A theorem in probability theory named for Thomas Bayes (1702-1761). In epidemiology, it is used to obtain the probability of disease in a group of people with some characteristic on the basis of the overall rate of that disease and of the likelihood of that characteristic in healthy and diseased individuals. The most familiar application is in clinical decision analysis where it is used for estimating the probability of a particular diagnosis given the appearance of some symptoms or test result.
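
The diagnostic use described above can be sketched as a post-test probability calculation from the prevalence (the overall rate of disease) and the test's sensitivity and specificity. The function names and the example figures (1% prevalence, 90% sensitivity, 95% specificity) are hypothetical.

```python
def probability_disease_if_positive(prevalence, sensitivity, specificity):
    # Bayes: P(D | +) = P(+ | D) P(D) / [P(+ | D) P(D) + P(+ | no D) P(no D)]
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


def probability_disease_if_negative(prevalence, sensitivity, specificity):
    # P(D | -) = P(- | D) P(D) / [P(- | D) P(D) + P(- | no D) P(no D)]
    false_negative = (1 - sensitivity) * prevalence
    true_negative = specificity * (1 - prevalence)
    return false_negative / (false_negative + true_negative)


# Hypothetical test: 1% prevalence, 90% sensitivity, 95% specificity.
print(round(probability_disease_if_positive(0.01, 0.90, 0.95), 3))
print(round(probability_disease_if_negative(0.01, 0.90, 0.95), 4))
```

With these figures, a positive result implies only about a 15% probability of disease, because false positives from the much larger healthy group outnumber the true positives.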

An agency of the NATIONAL INSTITUTES OF HEALTH concerned with overall planning, promoting, and administering programs pertaining to advancement of medical and related sciences. Major activities of this institute include the collection, dissemination, and exchange of information important to the progress of medicine and health, research in medical informatics and support for medical library development.

A variation of the PCR technique in which cDNA is made from RNA via reverse transcription. The resultant cDNA is then amplified using standard PCR protocols.

Mapping of the linear order of genes on a chromosome with units indicating their distances by using methods other than genetic recombination. These methods include nucleotide sequencing, overlapping deletions in polytene chromosomes, and electron micrography of heteroduplex DNA. (From King & Stansfield, A Dictionary of Genetics, 5th ed)

Extensive collections, reputedly complete, of references and citations to books, articles, publications, etc., generally on a single subject or specialized subject area. Databases can operate through automated files, libraries, or computer disks. The concept should be differentiated from DATABASES, FACTUAL which is used for collections of data and facts apart from bibliographic references to them.

An individual's right to obtain and use information collected or generated by others.

A genus of gram-negative, facultatively anaerobic rods. It is a saprophytic, marine organism which is often isolated from spoiling fish.

Discussion of lists of works, documents or other publications, usually with some relationship between them, e.g., by a given author, on a given subject, or published in a given place. Such a list differs from a catalog, whose contents are restricted to the holdings of a single collection, library, or group of libraries. (from The ALA Glossary of Library and Information Science, 1983)

A strain of gram-negative, rod-shaped bacteria belonging to the K serogroup of ESCHERICHIA COLI. It lives as a harmless inhabitant of the human LARGE INTESTINE and is widely used in medical and GENETIC RESEARCH.

A theoretical representative nucleotide or amino acid sequence in which each nucleotide or amino acid is the one which occurs most frequently at that site in the different sequences which occur in nature. The phrase also refers to an actual sequence which approximates the theoretical consensus. A known CONSERVED SEQUENCE set is represented by a consensus sequence. Commonly observed supersecondary protein structures (AMINO ACID MOTIFS) are often formed by conserved sequences.
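
The "most frequent residue at each site" rule can be sketched for a set of already-aligned sequences as follows. Breaking ties alphabetically is an arbitrary choice made for this sketch, and the example sequences are made up.

```python
from collections import Counter


def consensus_sequence(aligned):
    # aligned: equal-length nucleotide or amino acid sequences from an alignment.
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    consensus = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned)
        # Most frequent residue at this site; max() keeps the first maximal item,
        # so iterating in sorted order resolves ties alphabetically.
        consensus.append(max(sorted(counts), key=counts.__getitem__))
    return "".join(consensus)


sites = ["TATAAT", "TATAAT", "TACAAT", "GATACT"]
print(consensus_sequence(sites))
```

Note that no single input sequence needs to match the consensus exactly; here two of the four do.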

Anatomy (5)

Organisms (27)

Diseases (2)

Chemicals and Drugs (28)

Analytical, Diagnostic and Therapeutic Techniques and Equipment (35)

Psychiatry and Psychology (2)

Phenomena and Processes (93)

Disciplines and Occupations (12)

Anthropology, Education, Sociology and Social Phenomena (2)

Technology, Industry, Agriculture (3)

Humanities (2)

Information Science (67)

Health Care (14)

Molecular Sequence Annotation

Databases, Genetic

Software

Computational Biology

Internet

Genomics

User-Computer Interface

Databases, Protein

Database Management Systems

Documentation

Vocabulary, Controlled

Databases, Nucleic Acid

Algorithms

Genome

Sequence Analysis, Protein

Expressed Sequence Tags

Computer Graphics

Gene Expression Profiling

Proteins

Genome, Bacterial

Natural Language Processing

Sequence Alignment

Databases, Factual

Data Mining

Terminology as Topic

Oligonucleotide Array Sequence Analysis

Genome, Plant

Genome, Human

Proteome

Cluster Analysis

Gene Ontology

Sequence Analysis, RNA

Chromosome Mapping

Programming Languages

Artificial Intelligence

Proteomics

Molecular Sequence Data

High-Throughput Nucleotide Sequencing

Protein Interaction Mapping

Genome, Fungal

Transcriptome

Knowledge Bases

Search Engine

Pattern Recognition, Automated

Genome, Archaeal

Phylogeny

Semantics

Open Reading Frames

Genes

Abstracting and Indexing as Topic

Software Design

PubMed

Gene Library

Contig Mapping

Evolution, Molecular

Base Sequence

Enzymes

Metabolic Networks and Pathways

Pseudogenes

Models, Genetic

Multigene Family

Genome, Protozoan

Conserved Sequence

Automation

Reproducibility of Results

Biological Processes

Genes, Plant

Oryza sativa

Untranslated Regions

Disease

Amino Acid Sequence

Sequence Analysis

Genome, Helminth

Prokaryotic Cells

Data Interpretation, Statistical

MEDLINE

Word Processing

Synteny

Crowdsourcing

Gene Regulatory Networks