The addition of descriptive information about the function or structure of a molecular sequence to its MOLECULAR SEQUENCE DATA record.
Sequential operating programs and data which instruct the functioning of a digital computer.
The portion of an interactive computer program that issues messages to and receives commands from a user.
The relationships of groups of organisms as reflected by their genetic makeup.
A field of biology concerned with the development of techniques for the collection and manipulation of biological data, and the use of such data to make biological discoveries or predictions. This field encompasses all computational methods and theories for solving biological problems including manipulation of models and datasets.
Databases devoted to knowledge about specific genes and gene products.
The arrangement of two or more amino acid or base sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms.
Software designed to store, manipulate, manage, and control data for specific uses.
Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories.
A multistage process that includes cloning, physical mapping, subcloning, determination of the DNA SEQUENCE, and information analysis.
A loose confederation of computer communication networks around the world. The networks that make up the Internet are connected through several backbone networks. The Internet grew out of the US Government ARPAnet project and was designed to facilitate information exchange.
A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task.
Systematic organization, storage, retrieval, and dissemination of specialized information, especially of a scientific or technical nature (From ALA Glossary of Library and Information Science, 1983). It often involves authenticating or validating information.
A process that includes the determination of AMINO ACID SEQUENCE of a protein (or peptide, oligopeptide or peptide fragment) and the information analysis of the sequence.
Databases containing information about PROTEINS such as AMINO ACID SEQUENCE; PROTEIN CONFORMATION; and other properties.
The process of cumulative change at the level of DNA; RNA; and PROTEINS, over successive generations.
The systematic study of the complete DNA sequences (GENOME) of organisms.
Databases containing information about NUCLEIC ACIDS such as BASE SEQUENCE; SNPS; NUCLEIC ACID CONFORMATION; and other properties. Information about the DNA fragments kept in a GENE LIBRARY or GENOMIC LIBRARY is often maintained in DNA databases.
Organized activities related to the storage, location, search, and retrieval of information.
A theorem in probability theory named for Thomas Bayes (1702-1761). In epidemiology, it is used to obtain the probability of disease in a group of people with some characteristic on the basis of the overall rate of that disease and of the likelihood of that characteristic in healthy and diseased individuals. The most familiar application is in clinical decision analysis where it is used for estimating the probability of a particular diagnosis given the appearance of some symptoms or test result.
The genetic complement of an organism, including all of its GENES, as represented in its DNA, or in some cases, its RNA.
The sequence of PURINES and PYRIMIDINES in nucleic acids and polynucleotides. It is also called nucleotide sequence.
Partial cDNA (DNA, COMPLEMENTARY) sequences that are unique to the cDNAs from which they were derived.
Theoretical representations that simulate the behavior or activity of genetic processes or phenomena. They include the use of mathematical equations, computers, and other electronic equipment.
The process of pictorial communication, between human and computers, in which the computer input and output have the form of charts, drawings, or other appropriate pictorial representation.
Extensive collections, reputedly complete, of facts and data garnered from material of a specialized subject area and made available for analysis and application. The collection can be automated by various contemporary methods for retrieval. The concept should be differentiated from DATABASES, BIBLIOGRAPHIC which is restricted to collections of bibliographic references.
Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein.
Functions constructed from a statistical model and a set of observed data which give the probability of that data for various values of the unknown model parameters. Those parameter values that maximize the probability are the maximum likelihood estimates of the parameters.
A stochastic process such that the conditional probability distribution for a state at any future instant, given the present state, is unaffected by any additional knowledge of the past history of the system.
Remains, impressions, or traces of animals or plants of past geological times which have been preserved in the earth's crust.
A multistage process that includes the determination of a sequence (protein, carbohydrate, etc.), its fragmentation and analysis, and the interpretation of the resulting sequence information.
Constituent of the 40S subunit of eukaryotic ribosomes. 18S rRNA is involved in the initiation of polypeptide synthesis in eukaryotes.
The process of cumulative change over successive generations through which organisms acquire their distinguishing morphological and physiological characteristics.
The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining PROTEIN CONFORMATION.
In statistics, a technique for numerically approximating the solution of a mathematical problem by studying the distribution of some random variable, often generated by a computer. The name alludes to the randomness characteristic of the games of chance played at the gambling casinos in Monte Carlo. (From Random House Unabridged Dictionary, 2d ed, 1993)
A specified list of terms with a fixed and unalterable meaning, and from which a selection is made when CATALOGING; ABSTRACTING AND INDEXING; or searching BOOKS; JOURNALS AS TOPIC; and other documents. The control is intended to avoid the scattering of related subjects under different headings (SUBJECT HEADINGS). The list may be altered or extended only by the publisher or issuing agency. (From Harrod's Librarians' Glossary, 7th ed, p163)
Double-stranded DNA of MITOCHONDRIA. In eukaryotes, the mitochondrial GENOME is circular and codes for ribosomal RNAs, transfer RNAs, and about 10 proteins.
Computer-based representation of physical systems and phenomena such as chemical processes.
Genotypic differences observed among individuals in a population.
DNA sequences encoding RIBOSOMAL RNA and the segments of DNA separating the individual ribosomal RNA genes, referred to as RIBOSOMAL SPACER DNA.
The determination of the pattern of genes expressed at the level of GENETIC TRANSCRIPTION, under specific circumstances or in a specific cell.
The genetic complement of a BACTERIA as represented in its DNA.
Computer processing of a language with rules that reflect and describe current usage rather than prescribed usage.
Use of sophisticated analysis tools to sort through, organize, examine, and combine large sets of information.
The terms, expressions, designations, or symbols used in a particular science, discipline, or specialized subject area.
Hybridization of a nucleic acid sample to a very large set of OLIGONUCLEOTIDE PROBES, which have been attached individually in columns and rows to a solid support, to determine a BASE SEQUENCE, or to detect variations in a gene sequence, GENE EXPRESSION, or for GENE MAPPING.
The genetic complement of a plant (PLANTS) as represented in its DNA.
The complete genetic complement contained in the DNA of a set of CHROMOSOMES in a HUMAN. The length of the human genome is about 3 billion base pairs.
The protein complement of an organism coded for by its genome.
A set of statistical methods used to group variables or observations into strongly inter-related subgroups. In epidemiology, it may be used to analyze a closely grouped series of events or cases of disease or other health-related phenomenon with well-defined distribution patterns in relation to time or place or both.
Sets of structured vocabularies used for describing and categorizing genes, and gene products by their molecular function, involvement in biological processes, and cellular location. These vocabularies and their associations to genes and gene products (Gene Ontology annotations) are generated and curated by the Gene Ontology Consortium.
A multistage process that includes cloning, physical mapping, subcloning, sequencing, and information analysis of an RNA SEQUENCE.
Any method used for determining the location of and relative distances between genes on a chromosome.
Specific languages used to prepare computer programs.
Theory and development of COMPUTER SYSTEMS which perform tasks that normally require human intelligence. Such tasks may include speech recognition, LEARNING; VISUAL PERCEPTION; MATHEMATICAL COMPUTING; reasoning, PROBLEM SOLVING, DECISION-MAKING, and translation of language.
The systematic study of the complete complement of proteins (PROTEOME) of organisms.
Techniques of nucleotide sequence analysis that increase the range, complexity, sensitivity, and accuracy of results by greatly increasing the scale of operations and thus the number of nucleotides, and the number of copies of each nucleotide sequenced. The sequencing may be done by analysis of the synthesis or ligation products, hybridization to preexisting sequences, etc.
Methods for determining interaction between PROTEINS.
The complete gene complement contained in a set of chromosomes in a fungus.
The pattern of GENE EXPRESSION at the level of genetic transcription in a specific organism or under specific circumstances in specific cells.
Collections of facts, assumptions, beliefs, and heuristics that are used in combination with databases to achieve desired results, such as a diagnosis, an interpretation, or a solution to a problem (From McGraw Hill Dictionary of Scientific and Technical Terms, 6th ed).
Software used to locate data or information stored in machine-readable form locally or at a distance such as an INTERNET site.
In INFORMATION RETRIEVAL, machine-sensing or identification of visible patterns (shapes, forms, and configurations). (Harrod's Librarians' Glossary, 7th ed)
The genetic complement of an archaeal organism (ARCHAEA) as represented in its DNA.
The relationships between symbols and their meanings.
A sequence of successive nucleotide triplets that are read as CODONS specifying AMINO ACIDS and begin with an INITIATOR CODON and end with a stop codon (CODON, TERMINATOR).
A category of nucleic acid sequences that function as units of heredity and which code for the basic instructions for the development, reproduction, and maintenance of organisms.
Activities performed to identify concepts and aspects of published information and research reports.
Specifications and instructions applied to the software.
A bibliographic database that includes MEDLINE as its primary subset. It is produced by the National Center for Biotechnology Information (NCBI), part of the NATIONAL LIBRARY OF MEDICINE. PubMed, which is searchable through NLM's Web site, also includes access to additional citations to selected life sciences journals not in MEDLINE, and links to other resources such as the full-text of articles at participating publishers' Web sites, NCBI's molecular biology databases, and PubMed Central.
A large collection of DNA fragments cloned (CLONING, MOLECULAR) from a given organism, tissue, organ, or cell type. It may contain complete genomic sequences (GENOMIC LIBRARY) or complementary DNA sequences, the latter being formed from messenger RNA and lacking intron sequences.
Overlapping of cloned or sequenced DNA to construct a continuous region of a gene, chromosome or genome.
Biological molecules that possess catalytic activity. They may occur naturally or be synthetically created. Enzymes are usually proteins, however CATALYTIC RNA and CATALYTIC DNA molecules have also been identified.
Complex sets of enzymatic reactions connected to each other via their product and substrate metabolites.
Genes bearing close resemblance to known genes at different loci, but rendered non-functional by additions or deletions in structure that prevent normal transcription or translation. When lacking introns and containing a poly-A segment near the downstream end (as a result of reverse copying from processed nuclear RNA into double-stranded DNA), they are called processed genes.
A set of genes descended by duplication and variation from some ancestral gene. Such genes may be clustered together on the same chromosome or dispersed on different chromosomes. Examples of multigene families include those that encode the hemoglobins, immunoglobulins, histocompatibility antigens, actins, tubulins, keratins, collagens, heat shock proteins, salivary glue proteins, chorion proteins, cuticle proteins, yolk proteins, and phaseolins, as well as histones, ribosomal RNA, and transfer RNA genes. The latter three are examples of reiterated genes, where hundreds of identical genes are present in a tandem array. (King & Stanfield, A Dictionary of Genetics, 4th ed)
The complete genetic complement contained in a set of CHROMOSOMES in a protozoan.
A sequence of amino acids in a polypeptide or of nucleotides in DNA or RNA that is similar across multiple species. A known set of conserved sequences is represented by a CONSENSUS SEQUENCE. AMINO ACID MOTIFS are often composed of conserved sequences.
Controlled operation of an apparatus, process, or system by mechanical or electronic devices that take the place of human organs of observation, effort, and decision. (From Webster's Collegiate Dictionary, 1993)
The statistical reproducibility of measurements (often in a clinical context), including the testing of instrumentation or techniques to obtain reproducible results. The concept includes reproducibility of physiological measurements, which may be used to develop rules to assess probability or prognosis, or response to a stimulus; reproducibility of occurrence of a condition; and reproducibility of experimental results.
Biological activities and function of the whole organism in human, animal, microorgansims, and plants, and of the biosphere.
The functional hereditary units of PLANTS.
Annual cereal grass of the family POACEAE and its edible starchy grain, rice, which is the staple food of roughly one-half of the world's population.
The parts of the messenger RNA sequence that do not code for product, i.e. the 5' UNTRANSLATED REGIONS and 3' UNTRANSLATED REGIONS.
A definite pathologic process with a characteristic set of signs and symptoms. It may affect the whole body or any of its parts, and its etiology, pathology, and prognosis may be known or unknown.
The genetic complement of a helminth (HELMINTHS) as represented in its DNA.
Cells lacking a nuclear membrane so that the nuclear material is either scattered in the cytoplasm or collected in a nucleoid region.
Application of statistical procedures to analyze specific observed or assumed facts from a particular study.
The premier bibliographic database of the NATIONAL LIBRARY OF MEDICINE. MEDLINE® (MEDLARS Online) is the primary subset of PUBMED and can be searched on NLM's Web site in PubMed or the NLM Gateway. MEDLINE references are indexed with MEDICAL SUBJECT HEADINGS (MeSH).
Text editing and storage functions using computer software.
The presence of two or more genetic loci on the same chromosome. Extensions of this original definition refer to the similarity in content and organization between chromosomes, of different species for example.
Social media model for enabling public involvement and recruitment in participation. Use of social media to collect feedback and recruit volunteer subjects.
Interacting DNA-encoded regulatory subsystems in the GENOME that coordinate input from activator and repressor TRANSCRIPTION FACTORS during development, cell differentiation, or in response to environmental cues. The networks function to ultimately specify expression of particular sets of GENES for specific conditions, times, or locations.
The degree of 3-dimensional shape similarity between proteins. It can be an indication of distant AMINO ACID SEQUENCE HOMOLOGY and used for rational DRUG DESIGN.
Single-stranded complementary DNA synthesized from an RNA template by the action of RNA-dependent DNA polymerase. cDNA (i.e., complementary DNA, not circular DNA, not C-DNA) is used in a variety of molecular cloning experiments as well as serving as a specific hybridization probe.
Computerized compilations of information units (text, sound, graphics, and/or video) interconnected by logical nonlinear linkages that enable users to follow optimal paths through the material and also the systems used to create and display this information. (From Thesaurus of ERIC Descriptors, 1994)
The genetic complement of an insect (INSECTS) as represented in its DNA.
The genomic analysis of assemblages of organisms.
A process whereby multiple RNA transcripts are generated from a single gene. Alternative splicing involves the splicing together of other possible sets of EXONS during the processing of some, but not all, transcripts of the gene. Thus a particular exon may be connected to any one of several alternative exons to form a mature RNA. The alternative forms of mature MESSENGER RNA produce PROTEIN ISOFORMS in which one part of the isoforms is common while the other parts are different.
Structured vocabularies describing concepts from the fields of biology and relationships between concepts.
Systems where the input data enter the computer directly from the point of origin (usually a terminal or workstation) and/or in which output data are transmitted directly to that terminal point of origin. (Sippl, Computer Dictionary, 4th ed)
Description of pattern of recurrent functions or procedures frequently found in organizational processes, such as notification, decision, and action.
The level of protein structure in which combinations of secondary protein structures (alpha helices, beta sheets, loop regions, and motifs) pack together to form folded shapes called domains. Disulfide bridges between cysteines in two different parts of the polypeptide chain along with other interactions between the chains play a role in the formation and stabilization of tertiary structure. Small proteins usually consist of only one domain but larger proteins may contain a number of domains connected by segments of polypeptide chain which lack regular secondary structure.
Graphs representing sets of measurable, non-covalent physical contacts with specific PROTEINS in living organisms or in cells.
The systematic arrangement of entities in any field into categories classes based on common characteristics such as properties, morphology, subject matter, etc.
The restriction of a characteristic behavior, anatomical structure or physical system, such as immune response; metabolic response, or gene or gene variant to the members of one species. It refers to that property which differentiates one species from another but it is also used for phylogenetic levels higher or lower than the species.
RNA which does not code for protein but has some enzymatic, structural or regulatory function. Although ribosomal RNA (RNA, RIBOSOMAL) and transfer RNA (RNA, TRANSFER) are also untranslated RNAs they are not included in this scope.