Human Genome Project
The complete genetic complement contained in the DNA of a set of CHROMOSOMES in a HUMAN. The length of the human genome is about 3 billion base pairs.
The systematic study of the complete DNA sequences (GENOME) of organisms.
The theory that human CHARACTER and BEHAVIOR are shaped by the GENES that comprise the individual's GENOTYPE rather than by CULTURE; ENVIRONMENT; and individual choice.
Sequence Analysis, DNA
Genetic Diseases, Inborn
Diseases that are caused by genetic mutations present during embryo or fetal development, although they may be observed later in life. The mutations may be inherited from a parent's genome or they may be acquired in utero.
A field of biology concerned with the development of techniques for the collection and manipulation of biological data, and the use of such data to make biological discoveries or predictions. This field encompasses all computational methods and theories for solving biological problems including manipulation of models and datasets.
Sequence Tagged Sites
Short tracts of DNA sequence that are used as landmarks in GENOME mapping. In most instances, 200 to 500 base pairs of sequence define a Sequence Tagged Site (STS) that is operationally unique in the human genome (i.e., can be specifically detected by the polymerase chain reaction in the presence of all other genomic sequences). The overwhelming advantage of STSs over mapping landmarks defined in other ways is that the means of testing for the presence of a particular STS can be completely described as information in a database.
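The idea that an STS is "operationally unique" and fully describable as database information can be illustrated with a small in-silico PCR check. This is only a sketch under stated assumptions: the function names (`reverse_complement`, `find_amplicons`), the exact-match primer search, and the toy sequence are hypothetical and not part of any STS database or tool.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_amplicons(genome, fwd_primer, rev_primer, max_size=500):
    """List (start, end) spans that the primer pair could amplify.

    Assumption: primers must match exactly, and only the forward
    strand of `genome` is searched. Real PCR tolerates mismatches.
    """
    rev_site = reverse_complement(rev_primer)  # where the reverse primer binds
    hits = []
    start = genome.find(fwd_primer)
    while start != -1:
        end = genome.find(rev_site, start + len(fwd_primer))
        while end != -1 and end + len(rev_site) - start <= max_size:
            hits.append((start, end + len(rev_site)))
            end = genome.find(rev_site, end + 1)
        start = genome.find(fwd_primer, start + 1)
    return hits


# Hypothetical toy sequence and primers, for illustration only.
toy_genome = "TTTACGTAAAAGGCATTT"
amplicons = find_amplicons(toy_genome, "ACGT", "TGCC")
is_unique = len(amplicons) == 1  # one product means operationally unique here
```

In this sketch the whole STS record reduces to two primer strings and an expected product size, which is the sense in which the test for an STS can be stored as plain database information.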
Molecular Sequence Data
Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories.
Extensive collections, reputedly complete, of facts and data garnered from material of a specialized subject area and made available for analysis and application. The collection can be automated by various contemporary methods for retrieval. The concept should be differentiated from DATABASES, BIBLIOGRAPHIC which is restricted to collections of bibliographic references.
A coordinated international effort to identify and catalog patterns of linked variations (HAPLOTYPES) found in the human genome across the entire human population.
Genetic Predisposition to Disease
Databases, Nucleic Acid
Databases containing information about NUCLEIC ACIDS such as BASE SEQUENCE; SNPS; NUCLEIC ACID CONFORMATION; and other properties. Information about the DNA fragments kept in a GENE LIBRARY or GENOMIC LIBRARY is often maintained in DNA databases.
Polymorphism, Single Nucleotide
Overlapping of cloned or sequenced DNA to construct a continuous region of a gene, chromosome or genome.
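The overlap step described above can be sketched for two reads as a longest suffix-prefix match. This is a minimal illustration, assuming exact overlaps and error-free reads; the helper names `overlap_length` and `merge_reads` and the `min_overlap` parameter are hypothetical and do not refer to any particular assembler.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_overlap` bases exists.
    Assumes min_overlap >= 1.
    """
    longest_possible = min(len(left), len(right))
    for k in range(longest_possible, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_reads(left, right, min_overlap=3):
    """Join two overlapping reads into one contiguous sequence, or None."""
    k = overlap_length(left, right, min_overlap)
    if k == 0:
        return None
    return left + right[k:]


# Toy reads (hypothetical): they share the four bases "GTAC".
contig = merge_reads("ATGCGTAC", "GTACGGA")
```

Real contig construction also has to handle sequencing errors, repeats, and reads from either strand, none of which this sketch attempts.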
The arrangement of two or more amino acid or base sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms.
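One common way to align two sequences using weights assigned to the aligned elements is global alignment by dynamic programming (the Needleman-Wunsch approach). The sketch below is illustrative only: the function name `global_alignment` and the scoring weights (`match`, `mismatch`, `gap`) are assumptions chosen for the example, not values taken from the definition above.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment.

    Uses a simple linear gap penalty; the weights are illustrative.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best_score, top, bottom = global_alignment("ACGT", "AGT")
```

With these weights, aligning "ACGT" against "AGT" places a single gap opposite the "C". Practical tools use substitution matrices and more elaborate gap penalties, and the resulting score is only an indicator of relatedness.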
Expressed Sequence Tags
Physical Chromosome Mapping
Mapping of the linear order of genes on a chromosome with units indicating their distances by using methods other than genetic recombination. These methods include nucleotide sequencing, overlapping deletions in polytene chromosomes, and electron micrography of heteroduplex DNA. (From King & Stansfield, A Dictionary of Genetics, 5th ed)
The Alu sequence family (named for the restriction endonuclease Alu I, which cleaves within it) is the most abundant interspersed repeat element in humans (over a million copies). It is derived from the 7SL RNA component of the SIGNAL RECOGNITION PARTICLE and contains an RNA polymerase III promoter. Transposition of this element into coding and regulatory regions of genes is responsible for many heritable diseases.
Genes bearing close resemblance to known genes at different loci, but rendered non-functional by additions or deletions in structure that prevent normal transcription or translation. When lacking introns and containing a poly-A segment near the downstream end (as a result of reverse copying from processed nuclear RNA into double-stranded DNA), they are called processed genes.
Chromosomes, Artificial, Bacterial
Genomic Structural Variation
High-Throughput Nucleotide Sequencing
Techniques of nucleotide sequence analysis that increase the range, complexity, sensitivity, and accuracy of results by greatly increasing the scale of operations and thus the number of nucleotides, and the number of copies of each nucleotide sequenced. The sequencing may be done by analysis of the synthesis or ligation products, hybridization to preexisting sequences, etc.
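The "number of copies of each nucleotide sequenced" is often summarized as mean sequencing depth, estimated as total sequenced bases divided by genome length. The sketch below is a back-of-the-envelope illustration; the function name `mean_coverage` and the example read count and read length are hypothetical, while the genome length of about 3 billion base pairs is the figure given for the human genome in this list.

```python
def mean_coverage(num_reads, read_length, genome_length):
    """Estimate mean depth as (reads x read length) / genome length.

    Assumes every read maps once and coverage is uniform, which real
    sequencing runs only approximate.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


# Hypothetical run: 600 million reads of 150 bases over a 3 billion bp genome.
depth = mean_coverage(600_000_000, 150, 3_000_000_000)
```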
A set of genes descended by duplication and variation from some ancestral gene. Such genes may be clustered together on the same chromosome or dispersed on different chromosomes. Examples of multigene families include those that encode the hemoglobins, immunoglobulins, histocompatibility antigens, actins, tubulins, keratins, collagens, heat shock proteins, salivary glue proteins, chorion proteins, cuticle proteins, yolk proteins, and phaseolins, as well as histones, ribosomal RNA, and transfer RNA genes. The latter three are examples of reiterated genes, where hundreds of identical genes are present in a tandem array. (From King & Stansfield, A Dictionary of Genetics, 4th ed)
Molecular Sequence Annotation
The addition of descriptive information about the function or structure of a molecular sequence to its MOLECULAR SEQUENCE DATA record.