High-Throughput Nucleotide Sequencing: Techniques of nucleotide sequence analysis that increase the range, complexity, sensitivity, and accuracy of results by greatly increasing the scale of operations and thus the number of nucleotides, and the number of copies of each nucleotide sequenced. The sequencing may be done by analysis of the synthesis or ligation products, hybridization to preexisting sequences, etc.
High-Throughput Screening Assays: Rapid methods of measuring the effects of an agent in a biological or chemical assay. The assay usually involves some form of automation or a way to conduct multiple assays at the same time using sample arrays.
Molecular Sequence Data: Descriptions of specific amino acid, carbohydrate, or nucleotide sequences which have appeared in the published literature and/or are deposited in and maintained by databanks such as GENBANK, European Molecular Biology Laboratory (EMBL), National Biomedical Research Foundation (NBRF), or other sequence repositories.
Base Sequence: The sequence of PURINES and PYRIMIDINES in nucleic acids and polynucleotides. It is also called nucleotide sequence.
Sequence Analysis, DNA: A multistage process that includes cloning, physical mapping, subcloning, determination of the DNA SEQUENCE, and information analysis.
Amino Acid Sequence: The order of amino acids as they occur in a polypeptide chain. This is referred to as the primary structure of proteins. It is of fundamental importance in determining PROTEIN CONFORMATION.
Cloning, Molecular: The insertion of recombinant DNA molecules from prokaryotic and/or eukaryotic sources into a replicating vehicle, such as a plasmid or virus vector, and the introduction of the resultant hybrid molecules into recipient cells without altering the viability of those cells.
Nucleotides: The monomeric units from which DNA or RNA polymers are constructed. They consist of a purine or pyrimidine base, a pentose sugar, and a phosphate group.
(From King & Stansfield, A Dictionary of Genetics, 4th ed)
Polymerase Chain Reaction: In vitro method for producing large amounts of specific DNA or RNA fragments of defined length and sequence from small amounts of short oligonucleotide flanking sequences (primers). The essential steps include thermal denaturation of the double-stranded target molecules, annealing of the primers to their complementary sequences, and extension of the annealed primers by enzymatic synthesis with DNA polymerase. The reaction is efficient, specific, and extremely sensitive. Uses for the reaction include disease diagnosis, detection of difficult-to-isolate pathogens, mutation analysis, genetic testing, DNA sequencing, and analyzing evolutionary relationships.
Genes, Bacterial: The functional hereditary units of BACTERIA.
Phylogeny: The relationships of groups of organisms as reflected by their genetic makeup.
Sequence Homology, Nucleic Acid: The sequential correspondence of nucleotides in one nucleic acid molecule with those of another nucleic acid molecule. Sequence homology is an indication of the genetic relatedness of different organisms and gene function.
Restriction Mapping: Use of restriction endonucleases to analyze and generate a physical map of genomes, genes, or other segments of DNA.
Mutation: Any detectable and heritable change in the genetic material that causes a change in the GENOTYPE and which is transmitted to daughter cells and to succeeding generations.
DNA, Bacterial: Deoxyribonucleic acid that makes up the genetic material of bacteria.
Escherichia coli: A species of gram-negative, facultatively anaerobic, rod-shaped bacteria (GRAM-NEGATIVE FACULTATIVELY ANAEROBIC RODS) commonly found in the lower part of the intestine of warm-blooded animals. It is usually nonpathogenic, but some strains are known to produce DIARRHEA and pyogenic infections.
Pathogenic strains (virotypes) are classified by their specific pathogenic mechanisms such as toxins (ENTEROTOXIGENIC ESCHERICHIA COLI), etc.
DNA: A deoxyribonucleotide polymer that is the primary genetic material of all cells. Eukaryotic and prokaryotic organisms normally contain DNA in a double-stranded state, yet several important biological processes transiently involve single-stranded regions. DNA, which consists of a polysugar-phosphate backbone possessing projections of purines (adenine and guanine) and pyrimidines (thymine and cytosine), forms a double helix that is held together by hydrogen bonds between these purines and pyrimidines (adenine to thymine and guanine to cytosine).
Polymorphism, Single Nucleotide: A single nucleotide variation in a genetic sequence that occurs at appreciable frequency in the population.
Sequence Homology, Amino Acid: The degree of similarity between sequences of amino acids. This information is useful for analyzing the genetic relatedness of proteins and species.
Genotype: The genetic constitution of the individual, comprising the ALLELES present at each GENETIC LOCUS.
Sequence Alignment: The arrangement of two or more amino acid or base sequences from an organism or organisms in such a way as to align areas of the sequences sharing common properties. The degree of relatedness or homology between the sequences is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn can serve as a potential indicator of the genetic relatedness between the organisms.
Plasmids: Extrachromosomal, usually CIRCULAR DNA molecules that are self-replicating and transferable from one organism to another. They are found in a variety of bacterial, archaeal, fungal, algal, and plant species.
They are used in GENETIC ENGINEERING as CLONING VECTORS.
Genetic Variation: Genotypic differences observed among individuals in a population.
DNA Primers: Short sequences (generally about 10 base pairs) of DNA that are complementary to sequences of messenger RNA and allow reverse transcriptases to start copying the adjacent sequences of mRNA. Primers are used extensively in genetic and molecular biology techniques.
Open Reading Frames: A sequence of successive nucleotide triplets that are read as CODONS specifying AMINO ACIDS and begin with an INITIATOR CODON and end with a stop codon (CODON, TERMINATOR).
Gene Library: A large collection of DNA fragments cloned (CLONING, MOLECULAR) from a given organism, tissue, organ, or cell type. It may contain complete genomic sequences (GENOMIC LIBRARY) or complementary DNA sequences, the latter being formed from messenger RNA and lacking intron sequences.
Bacterial Proteins: Proteins found in any species of bacterium.
Blotting, Southern: A method (first developed by E.M. Southern) for detection of DNA that has been electrophoretically separated and immobilized by blotting on nitrocellulose or other type of paper or nylon membrane followed by hybridization with labeled NUCLEIC ACID PROBES.
Genes: A category of nucleic acid sequences that function as units of heredity and which code for the basic instructions for the development, reproduction, and maintenance of organisms.
DNA Restriction Enzymes: Enzymes that are part of the restriction-modification systems. They catalyze the endonucleolytic cleavage of DNA sequences which lack the species-specific methylation pattern in the host cell's DNA. Cleavage yields random or specific double-stranded fragments with terminal 5'-phosphates. The function of restriction enzymes is to destroy any foreign DNA that invades the host cell. Most have been studied in bacterial systems, but a few have been found in eukaryotic organisms.
They are also used as tools for the systematic dissection and mapping of chromosomes, in the determination of base sequences of DNAs, and have made it possible to splice and recombine genes from one organism into the genome of another. EC 3.1.21.
Exons: The parts of a transcript of a split GENE remaining after the INTRONS are removed. They are spliced together to become a MESSENGER RNA or other functional RNA.
RNA, Messenger: RNA sequences that serve as templates for protein synthesis. Bacterial mRNAs are generally primary transcripts in that they do not require post-transcriptional processing. Eukaryotic mRNA is synthesized in the nucleus and must be exported to the cytoplasm for translation. Most eukaryotic mRNAs have a sequence of polyadenylic acid at the 3' end, referred to as the poly(A) tail. The function of this tail is not known for certain, but it may play a role in the export of mature mRNA from the nucleus as well as in helping stabilize some mRNA molecules by retarding their degradation in the cytoplasm.
Chromosome Mapping: Any method used for determining the location of and relative distances between genes on a chromosome.
Genomics: The systematic study of the complete DNA sequences (GENOME) of organisms.
DNA Mutational Analysis: Biochemical identification of mutational changes in a nucleotide sequence.
Nucleic Acid Hybridization: Widely used technique which exploits the ability of complementary sequences in single-stranded DNAs or RNAs to pair with each other to form a double helix. Hybridization can take place between two complementary DNA sequences, between a single-stranded DNA and a complementary RNA, or between two RNA sequences. The technique is used to detect and isolate specific sequences, measure homology, or define other characteristics of one or both strands.
(Kendrew, Encyclopedia of Molecular Biology, 1994, p503)
DNA, Viral: Deoxyribonucleic acid that makes up the genetic material of viruses.
Software: Sequential operating programs and data which instruct the functioning of a digital computer.
Sequence Analysis, RNA: A multistage process that includes cloning, physical mapping, subcloning, sequencing, and information analysis of an RNA SEQUENCE.
Transcription, Genetic: The biosynthesis of RNA carried out on a template of DNA. The biosynthesis of DNA from an RNA template is called REVERSE TRANSCRIPTION.
RNA, Viral: Ribonucleic acid that makes up the genetic material of viruses.
Adenine Nucleotides
Small Molecule Libraries: Large collections of small molecules (molecular weight about 600 or less), of similar or diverse nature which are used for high-throughput screening analysis of the gene function, protein interaction, cellular processing, biochemical pathways, or other chemical interactions.
DNA, Complementary: Single-stranded complementary DNA synthesized from an RNA template by the action of RNA-dependent DNA polymerase. cDNA (i.e., complementary DNA, not circular DNA, not C-DNA) is used in a variety of molecular cloning experiments as well as serving as a specific hybridization probe.
Species Specificity: The restriction of a characteristic behavior, anatomical structure or physical system, such as immune response, metabolic response, or gene or gene variant, to the members of one species. It refers to that property which differentiates one species from another but it is also used for phylogenetic levels higher or lower than the species.
Computational Biology: A field of biology concerned with the development of techniques for the collection and manipulation of biological data, and the use of such data to make biological discoveries or predictions.
This field encompasses all computational methods and theories for solving biological problems including manipulation of models and datasets.
Codon: A set of three nucleotides in a protein coding sequence that specifies individual amino acids or a termination signal (CODON, TERMINATOR). Most codons are universal, but some organisms do not produce the transfer RNAs (RNA, TRANSFER) complementary to all codons. These codons are referred to as unassigned codons (CODONS, NONSENSE).
DNA Transposable Elements: Discrete segments of DNA which can excise and reintegrate to another site in the genome. Most are inactive, i.e., have not been found to exist outside the integrated state. DNA transposable elements include bacterial IS (insertion sequence) elements, Tn elements, the maize controlling elements Ac and Ds, Drosophila P, gypsy, and pogo elements, the human Tigger elements and the Tc and mariner elements which are found throughout the animal kingdom.
Genome, Viral: The complete genetic complement contained in a DNA or RNA molecule in a virus.
Guanine Nucleotides
Drug Evaluation, Preclinical: Preclinical testing of drugs in experimental animals or in vitro for their biological and toxic effects and potential clinical applications.
Algorithms: A procedure consisting of a sequence of algebraic formulas and/or logical steps to calculate or determine a given task.
Gene Expression Profiling: The determination of the pattern of genes expressed at the level of GENETIC TRANSCRIPTION, under specific circumstances or in a specific cell.
Haloarcula: A genus of HALOBACTERIACEAE distinguished from other genera in the family by the presence of specific derivatives of TGD-2 polar lipids.
Haloarcula are found in neutral saline environments such as salt lakes, marine salterns, and saline soils.
Alleles: Variant forms of the same gene, occupying the same locus on homologous CHROMOSOMES, and governing the variants in production of the same gene product.
Genes, Viral: The functional hereditary units of VIRUSES.
Reverse Transcriptase Polymerase Chain Reaction: A variation of the PCR technique in which cDNA is made from RNA via reverse transcription. The resultant cDNA is then amplified using standard PCR protocols.
Sequence Homology: The degree of similarity between sequences. Studies of AMINO ACID SEQUENCE HOMOLOGY and NUCLEIC ACID SEQUENCE HOMOLOGY provide useful information about the genetic relatedness of genes, gene products, and species.
Phenotype: The outward appearance of the individual. It is the product of interactions between genes, and between the GENOTYPE and the environment.
Nucleic Acid Conformation: The spatial arrangement of the atoms of a nucleic acid or polynucleotide that results in its characteristic 3-dimensional shape.
Cell Line: Established cell cultures that have the potential to propagate indefinitely.
Polymorphism, Single-Stranded Conformational: Variation in a population's DNA sequence that is detected by determining alterations in the conformation of denatured DNA fragments. Denatured DNA fragments are allowed to renature under conditions that prevent the formation of double-stranded DNA and allow secondary structure to form in single-stranded fragments. These fragments are then run through polyacrylamide gels to detect variations in the secondary structure that are manifested as an alteration in migration through the gels.
Genomic Library: A form of GENE LIBRARY containing the complete DNA sequences present in the genome of a given organism.
It contrasts with a cDNA library which contains only sequences utilized in protein coding (lacking introns).
Polymorphism, Restriction Fragment Length: Variation occurring within a species in the presence or length of a DNA fragment generated by a specific endonuclease at a specific site in the genome. Such variations are generated by mutations that create or abolish recognition sites for these enzymes or change the length of the fragment.
Purine Nucleotides: Purines attached to a RIBOSE and a phosphate that can polymerize to form DNA and RNA.
Reproducibility of Results: The statistical reproducibility of measurements (often in a clinical context), including the testing of instrumentation or techniques to obtain reproducible results. The concept includes reproducibility of physiological measurements, which may be used to develop rules to assess probability or prognosis, or response to a stimulus; reproducibility of occurrence of a condition; and reproducibility of experimental results.
Operon: In bacteria, a group of metabolically related genes, with a common promoter, whose transcription into a single polycistronic MESSENGER RNA is under the control of an OPERATOR REGION.
Sequence Analysis: A multistage process that includes the determination of a sequence (protein, carbohydrate, etc.), its fragmentation and analysis, and the interpretation of the resulting sequence information.
Repetitive Sequences, Nucleic Acid: Sequences of DNA or RNA that occur in multiple copies. There are several types: INTERSPERSED REPETITIVE SEQUENCES are copies of transposable elements (DNA TRANSPOSABLE ELEMENTS or RETROELEMENTS) dispersed throughout the genome. TERMINAL REPEAT SEQUENCES flank both ends of another sequence, for example, the long terminal repeats (LTRs) on RETROVIRUSES. Variations may be direct repeats, those occurring in the same direction, or inverted repeats, those opposite to each other in direction.
TANDEM REPEAT SEQUENCES are copies which lie adjacent to each other, direct or inverted (INVERTED REPEAT SEQUENCES).
Exome: That part of the genome that corresponds to the complete complement of EXONS of an organism or cell.
Genome, Human: The complete genetic complement contained in the DNA of a set of CHROMOSOMES in a HUMAN. The length of the human genome is about 3 billion base pairs.
Oligonucleotide Array Sequence Analysis: Hybridization of a nucleic acid sample to a very large set of OLIGONUCLEOTIDE PROBES, which have been attached individually in columns and rows to a solid support, to determine a BASE SEQUENCE, or to detect variations in a gene sequence, GENE EXPRESSION, or for GENE MAPPING.
Point Mutation: A mutation caused by the substitution of one nucleotide for another. This results in the DNA molecule having a change in a single base pair.
Introns: Sequences of DNA in the genes that are located between the EXONS. They are transcribed along with the exons but are removed from the primary gene transcript by RNA SPLICING to leave mature RNA. Some introns code for separate genes.
RNA: A polynucleotide consisting essentially of chains with a repeating backbone of phosphate and ribose units to which nitrogenous bases are attached. RNA is unique among biological macromolecules in that it can encode genetic information, serve as an abundant structural component of cells, and also possess catalytic activity. (Rieger et al., Glossary of Genetics: Classical and Molecular, 5th ed)
Recombination, Genetic: Production of new arrangements of DNA by various mechanisms such as assortment and segregation, CROSSING OVER; GENE CONVERSION; GENETIC TRANSFORMATION; GENETIC CONJUGATION; GENETIC TRANSDUCTION; or mixed infection of viruses.
Promoter Regions, Genetic: DNA sequences which are recognized (directly or indirectly) and bound by a DNA-dependent RNA polymerase during the initiation of transcription.
Highly conserved sequences within the promoter include the Pribnow box in bacteria and the TATA BOX in eukaryotes.
Oligonucleotide Probes: Synthetic or natural oligonucleotides used in hybridization studies in order to identify and study specific nucleic acid fragments, e.g., DNA segments near or within a specific gene locus or gene. The probe hybridizes with a specific mRNA, if present. Conventional techniques used for testing for the hybridization product include dot blot assays, Southern blot assays, and DNA:RNA hybrid-specific antibody tests. Conventional labels for the probe include the radioisotope labels 32P and 125I and the chemical label biotin.
Databases, Genetic: Databases devoted to knowledge about specific genes and gene products.
Evolution, Molecular: The process of cumulative change at the level of DNA; RNA; and PROTEINS, over successive generations.
Genome: The genetic complement of an organism, including all of its GENES, as represented in its DNA, or in some cases, its RNA.
Automation: Controlled operation of an apparatus, process, or system by mechanical or electronic devices that take the place of human organs of observation, effort, and decision. (From Webster's Collegiate Dictionary, 1993)
Gene Expression: The phenotypic manifestation of a gene or genes by the processes of GENETIC TRANSCRIPTION and GENETIC TRANSLATION.
Multigene Family: A set of genes descended by duplication and variation from some ancestral gene. Such genes may be clustered together on the same chromosome or dispersed on different chromosomes. Examples of multigene families include those that encode the hemoglobins, immunoglobulins, histocompatibility antigens, actins, tubulins, keratins, collagens, heat shock proteins, salivary glue proteins, chorion proteins, cuticle proteins, yolk proteins, and phaseolins, as well as histones, ribosomal RNA, and transfer RNA genes. The latter three are examples of reiterated genes, where hundreds of identical genes are present in a tandem array.
(King & Stansfield, A Dictionary of Genetics, 4th ed)
Recombinant Proteins: Proteins prepared by recombinant DNA technology.
Gene Expression Regulation, Bacterial: Any of the processes by which cytoplasmic or intercellular factors influence the differential control of gene action in bacteria.
Genetic Complementation Test: A test used to determine whether or not complementation (compensation in the form of dominance) will occur in a cell with a given mutant phenotype when another mutant genome, encoding the same mutant phenotype, is introduced into that cell.
Polymorphism, Genetic: The regular and simultaneous occurrence in a single interbreeding population of two or more discontinuous genotypes. The concept includes differences in genotypes ranging in size from a single nucleotide site (POLYMORPHISM, SINGLE NUCLEOTIDE) to large nucleotide sequences visible at a chromosomal level.
Molecular Sequence Annotation: The addition of descriptive information about the function or structure of a molecular sequence to its MOLECULAR SEQUENCE DATA record.
Transcriptome: The pattern of GENE EXPRESSION at the level of genetic transcription in a specific organism or under specific circumstances in specific cells.
Sensitivity and Specificity: Binary classification measures to assess test results. Sensitivity, or recall rate, is the proportion of actual positives that are correctly identified. Specificity is the probability of correctly determining the absence of a condition. (From Last, Dictionary of Epidemiology, 2d ed)
Heteroduplex Analysis: A method of detecting gene mutation by mixing PCR-amplified mutant and wild-type DNA followed by denaturation and reannealing. The resultant products are resolved by gel electrophoresis, with single base substitutions detectable under optimal electrophoretic conditions and gel formulations.
Large base pair mismatches may also be analyzed by using electron microscopy to visualize heteroduplex regions.
Blotting, Northern: Detection of RNA that has been electrophoretically separated and immobilized by blotting on nitrocellulose or other type of paper or nylon membrane followed by hybridization with labeled NUCLEIC ACID PROBES.
Mutagenesis, Insertional: Mutagenesis where the mutation is caused by the introduction of foreign DNA sequences into a gene or extragenic sequence. This may occur spontaneously in vivo or be experimentally induced in vivo or in vitro. Proviral DNA insertions into or adjacent to a cellular proto-oncogene can interrupt GENETIC TRANSLATION of the coding sequences or interfere with recognition of regulatory elements and cause unregulated expression of the proto-oncogene resulting in tumor formation.
Binding Sites: The parts of a macromolecule that directly participate in its specific combination with another molecule.
Pedigree: The record of descent or ancestry, especially of a particular condition or trait, indicating individual family members, their relationships, and their status with respect to the trait or condition.
Molecular Epidemiology: The application of molecular biology to the answering of epidemiological questions. The examination of patterns of changes in DNA to implicate particular carcinogens and the use of molecular markers to predict which individuals are at highest risk for a disease are common examples.
RNA Splicing: The ultimate exclusion of nonsense sequences or intervening sequences (introns) before the final RNA transcript is sent to the cytoplasm.
Cluster Analysis: A set of statistical methods used to group variables or observations into strongly inter-related subgroups.
In epidemiology, it may be used to analyze a closely grouped series of events or cases of disease or other health-related phenomenon with well-defined distribution patterns in relation to time or place or both.
Cattle: Domesticated bovine animals of the genus Bos, usually kept on a farm or ranch and used for the production of meat or dairy products or for heavy labor.
Viral Proteins: Proteins found in any species of virus.
Electrophoresis, Polyacrylamide Gel: Electrophoresis in which a polyacrylamide gel is used as the diffusion medium.
Gene Expression Regulation: Any of the processes by which nuclear, cytoplasmic, or intercellular factors influence the differential control (induction or repression) of gene action at the level of transcription or translation.
RNA, Bacterial: Ribonucleic acid in bacteria having regulatory and catalytic roles as well as involvement in protein synthesis.
Saccharomyces cerevisiae: A species of the genus SACCHAROMYCES, family Saccharomycetaceae, order Saccharomycetales, known as "baker's" or "brewer's" yeast. The dried form is used as a dietary supplement.
Nucleotides, Cyclic
Mutation, Missense: A mutation in which a codon is mutated to one directing the incorporation of a different amino acid. This substitution may result in an inactive or unstable product. (From A Dictionary of Genetics, King & Stansfield, 5th ed)
Substrate Specificity: A characteristic feature of enzyme activity in relation to the kind of substrate on which the enzyme or catalytic molecule reacts.
Guanine Nucleotide Exchange Factors: Protein factors that promote the exchange of GTP for GDP bound to GTP-BINDING PROTEINS.
Molecular Weight: The sum of the weight of all the atoms in a molecule.
Cosmids: Plasmids containing at least one cos (cohesive-end site) of PHAGE LAMBDA. They are used as cloning vehicles.
Models, Genetic: Theoretical representations that simulate the behavior or activity of genetic processes or phenomena.
They include the use of mathematical equations, computers, and other electronic equipment.
Escherichia coli Proteins: Proteins obtained from ESCHERICHIA COLI.
Disease Outbreaks: Sudden increase in the incidence of a disease. The concept includes EPIDEMICS and PANDEMICS.
Drug Discovery: The process of finding chemicals for potential therapeutic use.
Protein Biosynthesis: The biosynthesis of PEPTIDES and PROTEINS on RIBOSOMES, directed by MESSENGER RNA, via TRANSFER RNA that is charged with standard proteinogenic AMINO ACIDS.
DNA, Ribosomal: DNA sequences encoding RIBOSOMAL RNA and the segments of DNA separating the individual ribosomal RNA genes, referred to as RIBOSOMAL SPACER DNA.
Protein Binding: The process in which substances, either endogenous or exogenous, bind to proteins, peptides, enzymes, protein precursors, or allied compounds. Specific protein-binding measures are often used as assays in diagnostic assessments.
Proteomics: The systematic study of the complete complement of proteins (PROTEOME) of organisms.
Proteins: Linear POLYPEPTIDES that are synthesized on RIBOSOMES and may be further modified, crosslinked, cleaved, or assembled into complex proteins with several subunits. The specific sequence of AMINO ACIDS determines the shape the polypeptide will take, during PROTEIN FOLDING, and the function of the protein.
Transcription Factors: Endogenous substances, usually proteins, which are effective in the initiation, stimulation, or termination of the genetic transcription process.
Pyrimidine Nucleotides: Pyrimidines with a RIBOSE and phosphate attached that can polymerize to form DNA and RNA.
Electrophoresis, Agar Gel: Electrophoresis in which agar or agarose gel is used as the diffusion medium.
DNA, Fungal: Deoxyribonucleic acid that makes up the genetic material of fungi.
RNA, Ribosomal, 16S: Constituent of 30S subunit prokaryotic ribosomes containing 1600 nucleotides and 21 proteins.
16S rRNA is involved in initiation of polypeptide synthesis.
Genes, Fungal: The functional hereditary units of FUNGI.
Kinetics: The rate dynamics in chemical or physical systems.
Gastroenteritis: INFLAMMATION of any segment of the GASTROINTESTINAL TRACT from ESOPHAGUS to RECTUM. Causes of gastroenteritis are many including genetic, infection, HYPERSENSITIVITY, drug effects, and CANCER.
beta-Lactamases: Enzymes found in many bacteria which catalyze the hydrolysis of the amide bond in the beta-lactam ring. Well known antibiotics destroyed by these enzymes are penicillins and cephalosporins.
Streptomyces griseus: An actinomycete from which the antibiotics STREPTOMYCIN, grisein, and CANDICIDIN are obtained.
DNA-Binding Proteins: Proteins which bind to DNA. The family includes proteins which bind to both double- and single-stranded DNA and also includes specific DNA binding proteins in serum which can be used as markers for malignant diseases.
Sequence Deletion: Deletion of sequences of nucleic acids from the genetic material of an individual.
Heterozygote: An individual having different alleles at one or more loci regarding a specific character.
Genes, Regulator: Genes which regulate or circumscribe the activity of other genes; specifically, genes which code for PROTEINS or RNAs which have GENE EXPRESSION REGULATION functions.
Genotyping Techniques: Methods used to determine individuals' specific ALLELES or SNPS (single nucleotide polymorphisms).
Expressed Sequence Tags: Partial cDNA (DNA, COMPLEMENTARY) sequences that are unique to the cDNAs from which they were derived.
Adenosine Triphosphate: An adenine nucleotide containing three phosphate groups esterified to the sugar moiety. In addition to its crucial roles in metabolism adenosine triphosphate is a neurotransmitter.
Gene Deletion: A genetic rearrangement through loss of segments of DNA or RNA, bringing sequences which are normally separated into close proximity.
This deletion may be detected using cytogenetic techniques and can also be inferred from the phenotype, indicating a deletion at one specific locus.
Genome, Bacterial: The genetic complement of a BACTERIA as represented in its DNA.
Combinatorial Chemistry Techniques: A technology in which sets of reactions for solution or solid-phase synthesis are used to create molecular libraries for analysis of compounds on a large scale.
Anti-Bacterial Agents: Substances that reduce the growth or reproduction of BACTERIA.
Haplotypes: The genetic constitution of individuals with respect to one member of a pair of allelic genes, or sets of genes that are closely linked and tend to be inherited together such as those of the MAJOR HISTOCOMPATIBILITY COMPLEX.
Microbial Sensitivity Tests: Any tests that demonstrate the relative efficacy of different chemotherapeutic agents against specific microorganisms (i.e., bacteria, fungi, viruses).
Protein Array Analysis: Ligand-binding assays that measure protein-protein, protein-small molecule, or protein-nucleic acid interactions using a very large set of capturing molecules, i.e., those attached separately on a solid support, to measure the presence or interaction of target molecules in the sample.
Internet: A loose confederation of computer communication networks around the world. The networks that make up the Internet are connected through several backbone networks. The Internet grew out of the US Government ARPAnet project and was designed to facilitate information exchange.
Automation, Laboratory: Controlled operations of analytic or diagnostic processes, or systems by mechanical or electronic devices.
Databases, Nucleic Acid: Databases containing information about NUCLEIC ACIDS such as BASE SEQUENCE; SNPS; NUCLEIC ACID CONFORMATION; and other properties.
Information about the DNA fragments kept in a GENE LIBRARY or GENOMIC LIBRARY is often maintained in DNA databases.
Genome, Plant: The genetic complement of a plant (PLANTS) as represented in its DNA.
Models, Molecular: Models used experimentally or theoretically to study molecular shape, electronic properties, or interactions; includes analogous molecules, computer-generated graphics, and mechanical structures.
Microfluidic Analytical Techniques: Methods utilizing the principles of MICROFLUIDICS for sample handling, reagent mixing, and separation and detection of specific components in fluids.
Biological Evolution: The process of cumulative change over successive generations through which organisms acquire their distinguishing morphological and physiological characteristics.
Oligodeoxyribonucleotides: A group of deoxyribonucleotides (up to 12) in which the phosphate residues of each deoxyribonucleotide act as bridges in forming diester linkages between the deoxyribose moieties.
Amino Acid Substitution: The naturally occurring or experimentally induced replacement of one or more AMINO ACIDS in a protein with another. If a functionally equivalent amino acid is substituted, the protein may retain wild-type activity. Substitution may also diminish, enhance, or eliminate protein function. Experimentally induced substitution is often used to study enzyme activities and binding site properties.
Single-Cell Analysis: Assaying the products of or monitoring various biochemical processes and reactions in an individual cell.
Protein Interaction Mapping: Methods for determining interaction between PROTEINS.
DNA-Directed DNA Polymerase: DNA-dependent DNA polymerases found in bacteria, animal and plant cells. During the replication process, these enzymes catalyze the addition of deoxyribonucleotide residues to the end of a DNA strand in the presence of DNA as template-primer.
They also possess exonuclease activity and therefore function in DNA repair.
Contig Mapping: Overlapping of cloned or sequenced DNA to construct a continuous region of a gene, chromosome or genome.
Horse Diseases: Diseases of domestic and wild horses of the species Equus caballus.
Viral Nonstructural Proteins: Proteins encoded by a VIRAL GENOME that are produced in the organisms they infect, but not packaged into the VIRUS PARTICLES. Some of these proteins may play roles within the infected cell during VIRUS REPLICATION or act in regulation of virus replication or VIRUS ASSEMBLY.
MicroRNAs: Small double-stranded, non-protein coding RNAs, 21-25 nucleotides in length generated from single-stranded microRNA gene transcripts by the same RIBONUCLEASE III, Dicer, that produces small interfering RNAs (RNA, SMALL INTERFERING). They become part of the RNA-INDUCED SILENCING COMPLEX and repress the translation (TRANSLATION, GENETIC) of target RNA by binding to homologous 3'UTR region as an imperfect match. The small temporal RNAs (stRNAs), let-7 and lin-4, from C. elegans, are the first 2 miRNAs discovered, and are from a class of miRNAs involved in developmental timing.
Brazil
Metagenomics: The genomic analysis of assemblages of organisms.
Membrane Proteins: Proteins which are found in membranes including cellular and intracellular membranes. They consist of two types, peripheral and integral proteins. They include most membrane-associated enzymes, antigenic proteins, transport proteins, and drug, hormone, and lectin receptors.
Capsid: The outer protein protective shell of a virus, which protects the viral nucleic acid.
Mass Spectrometry: An analytical method used in determining the identity of a chemical based on its mass using mass analyzers/mass spectrometers.
Temperature: The property of objects that determines the direction of heat flow when they are placed in direct thermal contact.
The temperature is the energy of microscopic motions (vibrational and translational) of the particles of atoms.
Pseudomonas: A genus of gram-negative, aerobic, rod-shaped bacteria widely distributed in nature. Some species are pathogenic for humans, animals, and plants.
Drug Resistance, Bacterial: The ability of bacteria to resist or to become tolerant to chemotherapeutic agents, antimicrobial agents, or antibiotics. This resistance may be acquired through gene mutation or foreign DNA in transmissible plasmids (R FACTORS).
Bacterial Outer Membrane Proteins: Proteins isolated from the outer membrane of Gram-negative bacteria.
INDEL Mutation: A mutation named with the blend of insertion and deletion. It refers to a length difference between two ALLELES where it is unknowable if the difference was originally caused by a SEQUENCE INSERTION or by a SEQUENCE DELETION. If the number of nucleotides in the insertion/deletion is not divisible by three, and it occurs in a protein coding region, it is also a FRAMESHIFT MUTATION.
Genetic Vectors: DNA molecules capable of autonomous replication within a host cell and into which other DNA sequences can be inserted and thus amplified. Many are derived from PLASMIDS; BACTERIOPHAGES; or VIRUSES. They are used for transporting foreign genes into recipient cells. Genetic vectors possess a functional replicator site and contain GENETIC MARKERS to facilitate their selective recognition.
Carrier Proteins: Transport proteins that carry specific substances in the blood or across cell membranes.
Gene Frequency: The proportion of one particular allele in the total of all ALLELES for one genetic locus in a breeding POPULATION.
Rotavirus: A genus of REOVIRIDAE, causing acute gastroenteritis in BIRDS and MAMMALS, including humans. Transmission is horizontal and by environmental contamination. Seven species (Rotaviruses A thru G) are recognized.
Bacteria: One of the three domains of life (the others being Eukarya and ARCHAEA), also called Eubacteria.
They are unicellular prokaryotic microorganisms which generally possess rigid cell walls, multiply by cell division, and exhibit three principal forms: round or coccal, rodlike or bacillary, and spiral or spirochetal. Bacteria can be classified by their response to OXYGEN: aerobic, anaerobic, or facultatively anaerobic; by the mode by which they obtain their energy: chemotrophy (via chemical reaction) or PHOTOTROPHY (via light reaction); for chemotrophs by their source of chemical energy: CHEMOLITHOTROPHY (from inorganic compounds) or chemoorganotrophy (from organic compounds); and by their source for CARBON; NITROGEN; etc.; HETEROTROPHY (from organic sources) or AUTOTROPHY (from CARBON DIOXIDE). They can also be classified by whether or not they stain (based on the structure of their CELL WALLS) with CRYSTAL VIOLET dye: gram-negative or gram-positive.
Base Composition: The relative amounts of the PURINES and PYRIMIDINES in a nucleic acid.
DNA, Neoplasm: DNA present in neoplastic tissue.
Computer Simulation: Computer-based representation of physical systems and phenomena such as chemical processes.
Virulence: The degree of pathogenicity within a group or species of microorganisms or viruses as indicated by case fatality rates and/or the ability of the organism to invade the tissues of the host. The pathogenic capacity of an organism is determined by its VIRULENCE FACTORS.
User-Computer Interface: The portion of an interactive computer program that issues messages to and receives commands from a user.
Feces: Excrement from the INTESTINES, containing unabsorbed solids, waste products, secretions, and BACTERIA of the DIGESTIVE SYSTEM.
Capsid Proteins: Proteins that form the CAPSID of VIRUSES.
Genetic Predisposition to Disease: A latent susceptibility to disease at the genetic level, which may be activated under certain conditions.
Chickens: Common name for the species Gallus gallus, the domestic fowl, in the family Phasianidae, order GALLIFORMES.
It is descended from the red jungle fowl of SOUTHEAST ASIA.
Chromosomes, Bacterial: Structures within the nucleus of bacterial cells consisting of or containing DNA, which carry genetic information essential to the cell.
Genome-Wide Association Study: An analysis comparing the allele frequencies of all available (or a whole GENOME representative set of) polymorphic markers in unrelated patients with a specific symptom or disease condition, and those of healthy controls to identify markers associated with a specific disease or condition.
Bacillus subtilis: A species of gram-positive bacteria that is a common soil and water saprophyte.
Genetic Markers: A phenotypically recognizable genetic trait which can be used to identify a genetic locus, a linkage group, or a recombination event.
Oligonucleotides: Polymers made up of a few (2-20) nucleotides. In molecular genetics, they refer to a short sequence synthesized to match a region where a mutation is known to occur, and then used as a probe (OLIGONUCLEOTIDE PROBES). (Dorland, 28th ed)
Transfection: The uptake of naked or purified DNA by CELLS, usually meaning the process as it occurs in eukaryotic cells. It is analogous to bacterial transformation (TRANSFORMATION, BACTERIAL) and both are routinely employed in GENE TRANSFER TECHNIQUES.
Bacillus: A genus of BACILLACEAE that are spore-forming, rod-shaped cells. Most species are saprophytic soil forms with only a few species being pathogenic.
Transformation, Genetic: Change brought about to an organism's genetic composition by unidirectional transfer (TRANSFECTION; TRANSDUCTION, GENETIC; CONJUGATION, GENETIC, etc.)
and incorporation of foreign DNA into prokaryotic or eukaryotic cells by recombination of part or all of that DNA into the cell's genome.
Genetic Techniques: Chromosomal, biochemical, intracellular, and other methods used in the study of genetics.
Conserved Sequence: A sequence of amino acids in a polypeptide or of nucleotides in DNA or RNA that is similar across multiple species. A known set of conserved sequences is represented by a CONSENSUS SEQUENCE. AMINO ACID MOTIFS are often composed of conserved sequences.
Streptomyces: A genus of bacteria that form a nonfragmented aerial mycelium. Many species have been identified with some being pathogenic. This genus is responsible for producing a majority of the ANTI-BACTERIAL AGENTS of practical value.
Enzyme Assays: Methods used to measure the relative activity of a specific enzyme or its concentration in solution. Typically an enzyme substrate is added to a buffer solution containing enzyme and the rate of conversion of substrate to product is measured under controlled conditions. Many classical enzymatic assay methods involve the use of synthetic colorimetric substrates and measuring the reaction rates using a spectrophotometer.
Sequence Analysis, Protein: A process that includes the determination of AMINO ACID SEQUENCE of a protein (or peptide, oligopeptide or peptide fragment) and the information analysis of the sequence.
Metagenome: A collective genome representative of the many organisms, primarily microorganisms, existing in a community.
Serotyping: Process of determining and distinguishing species of bacteria or viruses based on antigens they share.
Horses: Large, hoofed mammals of the family EQUIDAE. Horses are active day and night with most of the day spent seeking and consuming food. Feeding peaks occur in the early morning and late afternoon, and there are several daily periods of rest.
Sulfites: Inorganic salts of sulfurous acid.
Massive parallel sequencing: Massive parallel sequencing or massively parallel sequencing is any of several high-throughput approaches to DNA sequencing using the concept of massively parallel processing; it is also called next-generation sequencing (NGS) or second-generation sequencing. Some of these technologies emerged in 1994-1998 and have been commercially available since 2005.
High throughput biology
Coles Phillips
Symmetry element: A symmetry element is a point of reference about which symmetry operations can take place. In particular, symmetry elements can be centers of inversion, axes of rotation and mirror planes.
DNA sequencer: A DNA sequencer is a scientific instrument used to automate the DNA sequencing process. Given a sample of DNA, a DNA sequencer is used to determine the order of the four bases: G (guanine), C (cytosine), A (adenine) and T (thymine).
Protein primary structure: The primary structure of a peptide or protein is the linear sequence of its amino acid structural units, and partly comprises its overall biomolecular structure. By convention, the primary structure of a protein is reported starting from the amino-terminal (N) end to the carboxyl-terminal (C) end.
Ligation-independent cloning: Ligation-independent cloning (LIC) is a form of molecular cloning that is able to be performed without the use of restriction endonucleases or DNA ligase. This allows genes that have restriction sites to be cloned without worry of chopping up the insert.
NTP binding site: An NTP binding site is a type of binding site found in nucleoside monophosphate (NMP) kinases; N can be adenosine or guanosine.
A P-loop is one of the structural motifs common for nucleoside triphosphate (NTP) binding sites; it interacts with the bound nucleotide's phosphoryl groups.
Thermal cycler
Branching order of bacterial phyla (Gupta, 2001): There are several models of the branching order of bacterial phyla; one of these was proposed in 2001 by Gupta based on conserved indels or proteins, termed "protein signatures", an alternative approach to molecular phylogeny. Some problematic exceptions and conflicts to these conserved indels are present; however, they are in agreement with several groupings of classes and phyla.
Silent mutation: Silent mutations are mutations in DNA that do not significantly alter the phenotype of the organism in which they occur. Silent mutations can occur in non-coding regions (outside of genes or within introns), or they may occur within exons.
List of strains of Escherichia coli: Escherichia coli is a well studied bacterium that was first identified by Theodor Escherich, after whom it was later named.
DNA condensation: DNA condensation refers to the process of compacting DNA molecules in vitro or in vivo. Mechanistic details of DNA packing are essential for its functioning in the process of gene regulation in living systems.
WGAViewer: WGAViewer is a bioinformatics software tool which is designed to visualize, annotate, and help interpret the results generated from a genome wide association study (GWAS).
Alongside the P values of association, WGAViewer allows a researcher to visualize and consider other supporting evidence, such as the genomic context of the SNP, linkage disequilibrium (LD) with ungenotyped SNPs, gene expression database, and the evidence from other GWAS projects, when determining the potential importance of an individual SNP.
CS-BLAST
Triparental mating: Triparental mating is a form of bacterial conjugation where a conjugative plasmid present in one bacterial strain assists the transfer of a mobilizable plasmid present in a second bacterial strain into a third bacterial strain. Plasmids are introduced into bacteria for such purposes as transformation, cloning, or transposon mutagenesis.
Genetic variation
Open reading frame: In molecular genetics, an open reading frame (ORF) is the part of a reading frame that has the potential to code for a protein or peptide. An ORF is a continuous stretch of codons that does not contain a stop codon (usually UAA, UAG or UGA).
Library (biology): In molecular biology, a library is a collection of DNA fragments that is stored and propagated in a population of micro-organisms through the process of molecular cloning. There are different types of DNA libraries, including cDNA libraries (formed from reverse-transcribed RNA), genomic libraries (formed from genomic DNA) and randomized mutant libraries (formed by de novo gene synthesis where alternative nucleotides or codons are incorporated).
Ferric uptake regulator family: In molecular biology, the ferric uptake regulator (FUR) family of proteins includes metal ion uptake regulator proteins. These are responsible for controlling the intracellular concentration of iron in many bacteria.
Restriction fragment: A restriction fragment is a DNA fragment resulting from the cutting of a DNA strand by a restriction enzyme (restriction endonucleases), a process called restriction.
Each restriction enzyme is highly specific, recognising a particular short DNA sequence, or restriction site, and cutting both DNA strands at specific points within this site.
Alternative splicing: Alternative splicing is a regulated process during gene expression that results in a single gene coding for multiple proteins. In this process, particular exons of a gene may be included within or excluded from the final, processed messenger RNA (mRNA) produced from that gene.
Mature messenger RNA: Mature messenger RNA, often abbreviated as mature mRNA, is a eukaryotic RNA transcript that has been spliced and processed and is ready for translation in the course of protein synthesis. Unlike the eukaryotic RNA immediately after transcription known as precursor messenger RNA, it consists exclusively of exons, with all introns removed.
Chromosome regions
Ontario Genomics Institute: The Ontario Genomics Institute (OGI) is a not-for-profit organization that manages cutting-edge genomics research projects and platforms. OGI also helps scientists find paths to the marketplace for their discoveries and the products to which they lead, and it works through diverse outreach and educational activities to raise awareness and facilitate informed public dialogue about genomics and its social impacts.
Mac OS X Server 1.0
Eukaryotic transcription: Eukaryotic transcription is the elaborate process that eukaryotic cells use to copy genetic information stored in DNA into units of RNA replica. Gene transcription occurs in both eukaryotic and prokaryotic cells.
Energy charge: Energy charge is an index used to measure the energy status of biological cells. It is related to ATP, ADP and AMP concentrations.
DNA-encoded chemical library: DNA-encoded chemical libraries (DEL) is a technology for the synthesis and screening of collections of small molecule compounds of unprecedented size.
DEL is used in medicinal chemistry to bridge the fields of combinatorial chemistry and molecular biology.
PSI Protein Classifier: PSI Protein Classifier is a program generalizing the results of both successive and independent iterations of the PSI-BLAST program. PSI Protein Classifier determines whether the proteins found by PSI-BLAST belong to known families.
Codon Adaptation Index: The Codon Adaptation Index (CAI) is the most widespread technique for analyzing codon usage bias. As opposed to other measures of codon usage bias, such as the 'effective number of codons' (Nc), which measure deviation from a uniform bias (null hypothesis), CAI measures the deviation of a given protein coding gene sequence with respect to a reference set of genes.
Composite transposon: A composite transposon is similar in function to simple transposons and Insertion Sequence (IS) elements in that it has protein coding DNA segments flanked by inverted, repeated sequences that can be recognized by transposase enzymes. A composite transposon, however, is flanked by two separate IS elements which may or may not be exact replicas.
Clonal Selection Algorithm: In artificial immune systems, clonal selection algorithms are a class of algorithms inspired by the clonal selection theory of acquired immunity, which explains how B and T lymphocytes improve their response to antigens over time, a process called affinity maturation. These algorithms focus on the Darwinian attributes of the theory where selection is inspired by the affinity of antigen-antibody interactions, reproduction is inspired by cell division, and variation is inspired by somatic hypermutation.
Gene signature: A gene signature is a group of genes in a cell whose combined expression pattern (Itadani H, Mizuarai S, Kotani H. Can systems biology understand pathway activation?)
Haloarcula hispanica SH1 virus: Haloarcula hispanica SH1 virus is a double-stranded DNA virus that infects the archaeon Haloarcula hispanica. (Bamford DH, Ravantti JJ, Rönnholm G, Laurinavicius S, Kukkaro P, Dyall-Smith M, Somerharju P, Kalkkinen N, Bamford JK (2005) Constituents of SH1, a novel lipid-containing virus infecting the halophilic euryarchaeon Haloarcula hispanica.)
Infinite alleles model: The infinite alleles model is a mathematical model for calculating genetic mutations. The Japanese geneticist Motoo Kimura and American geneticist James F.
Phenotype microarray: The phenotype microarray approach is a technology for high-throughput phenotyping of cells.
Nucleic acid structure: Nucleic acid structure refers to the structure of nucleic acids such as DNA and RNA. Chemically speaking, DNA and RNA are very similar.
Single-strand conformation polymorphism: Single-strand conformation polymorphism (SSCP), or single-strand chain polymorphism, is defined as conformational difference of single-stranded nucleotide sequences of identical length as induced by differences in the sequences under certain experimental conditions. This property allows sequences to be distinguished by means of gel electrophoresis, which separates fragments according to their different conformations.
Amplified fragment length polymorphism
Purine nucleotide cycle: The Purine Nucleotide Cycle is a metabolic pathway in which fumarate is generated from aspartate in order to increase the concentration of Krebs cycle intermediates. (Salway, J.)
Generalizability theory: Generalizability theory, or G Theory, is a statistical framework for conceptualizing, investigating, and designing reliable observations. It is used to determine the reliability (i.
Operon: In genetics, an operon is a functioning unit of genomic DNA containing a cluster of genes under the control of a single promoter.
The genes are transcribed together into an mRNA strand and either translated together in the cytoplasm, or undergo trans-splicing to create monocistronic mRNAs that are translated separately, i.
Direct repeat: Direct repeats are a type of genetic sequence that consists of two or more repeats of a specific sequence.
Exome: The exome is the part of the genome formed by exons, the sequences which when transcribed remain within the mature RNA after introns are removed by RNA splicing. It consists of all DNA that is transcribed into mature RNA in cells of any type as distinct from the transcriptome, which is the RNA that has been transcribed only in a specific cell population.
Cellular microarray: A cellular microarray is a laboratory tool that allows for the multiplex interrogation of living cells on the surface of a solid support. The support, sometimes called a "chip", is spotted with varying materials, such as antibodies, proteins, or lipids, which can interact with the cells, leading to their capture on specific spots.
Point mutation
Intron
YjdF RNA motif
Recombination (cosmology): In cosmology, recombination refers to the epoch at which charged electrons and protons first became bound to form electrically neutral hydrogen atoms. Note that the term recombination is a misnomer, considering that it represents the first time that electrically neutral hydrogen formed.
GC box: In molecular biology, a GC box is a distinct pattern of nucleotides found in the promoter region of some eukaryotic genes upstream of the TATA box and approximately 110 bases upstream from the transcription initiation site. It has a consensus sequence GGGCGG which is position dependent and orientation independent.
Allele-specific oligonucleotide: An allele-specific oligonucleotide (ASO) is a short piece of synthetic DNA complementary to the sequence of a variable target DNA.
It acts as a probe for the presence of the target in a Southern blot assay or, more commonly, in the simpler Dot blot assay.
Extracellular: In cell biology, molecular biology and related fields, the word extracellular (or sometimes extracellular space) means "outside the cell". This space is usually taken to be outside the plasma membranes, and occupied by fluid.
Molecular evolution: Molecular evolution is a change in the sequence composition of cellular molecules such as DNA, RNA, and proteins across generations. The field of molecular evolution uses principles of evolutionary biology and population genetics to explain patterns in these changes.
List of sequenced eukaryotic genomes
XAP Home Automation protocol: xAP is an open protocol used for home automation and supports integration of telemetry and control devices primarily within the home. Common communications networks include RS232, RS485, Ethernet and wireless.
ParaHox: The ParaHox gene cluster is an array of homeobox genes (involved in morphogenesis, the regulation of patterns of anatomical development) from the Gsx, Xlox (Pdx) and Cdx gene families.
Gene polymorphism
De novo transcriptome assembly: De novo transcriptome assembly is the method of creating a transcriptome without the aid of a reference genome.
Assay sensitivity: Assay sensitivity is a property of a clinical trial defined as the ability of a trial to distinguish an effective treatment from a less effective or ineffective intervention. Without assay sensitivity, a trial is not internally valid and is not capable of comparing the efficacy of two interventions.
Heteroduplex: A heteroduplex is a double-stranded (duplex) molecule of nucleic acid originated through the genetic recombination of single complementary strands derived from different sources, such as from different homologous chromosomes or even from different organisms.
Signature-tagged mutagenesis: Signature-tagged mutagenesis (STM) is a genetic technique used to study gene function.
Recent advances in genome sequencing have allowed us to catalogue a large variety of organisms' genomes, but the function of the genes they contain is still largely unknown.
DNA binding site: DNA binding sites are a type of binding site found in DNA where other molecules may bind. DNA binding sites are distinct from other binding sites in that (1) they are part of a DNA sequence (e.
Pedigree chart: A pedigree chart is a diagram that shows the occurrence and appearance or phenotypes of a particular gene or organism and its ancestors from one generation to the next. (Pedigree chart, Genealogy Glossary, About.com, a part of The New York Times Company.)
Beef cattle: Beef cattle are cattle raised for meat production (as distinguished from dairy cattle, used for milk production). The meat of adult cattle is known as beef.
(1/2909) Accurate taxonomic assignment of short pyrosequencing reads.
Ambiguities in the taxonomy-dependent assignment of pyrosequencing reads are usually resolved by mapping each read to the lowest common ancestor in a reference taxonomy of all those sequences that match the read. This conservative approach has the drawback of mapping a read to a possibly large clade that may also contain many sequences not matching the read. A more accurate taxonomic assignment of short reads can be made by mapping each read to the node in the reference taxonomy that provides the best precision and recall. We show that given a suffix array for the sequences in the reference taxonomy, a short read can be mapped to the node of the reference taxonomy with the best combined value of precision and recall in time linear in the size of the taxonomy subtree rooted at the lowest common ancestor of the matching sequences. An accurate taxonomic assignment of short reads can thus be made with about the same efficiency as when mapping each read to the lowest common ancestor of all matching sequences in a reference taxonomy. We demonstrate the effectiveness of our approach on several metagenomic datasets of marine and gut microbiota.
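The contrast between the two assignment strategies can be illustrated with a small Python sketch. It is not the authors' implementation: it is a naive traversal that does not use a suffix array and is not linear-time, it assumes the F-measure (harmonic mean of precision and recall) as the "combined value", and the toy taxonomy, the node names, and the functions `lowest_common_ancestor` and `best_f_node` are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def leaves_under(children: Dict[str, List[str]], node: str) -> Set[str]:
    """Return the leaf names (reference sequences) in the subtree rooted at node."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    found: Set[str] = set()
    for kid in kids:
        found |= leaves_under(children, kid)
    return found


def lowest_common_ancestor(parent: Dict[str, str], matched: Set[str]) -> Optional[str]:
    """Return the deepest node shared by the root paths of all matched leaves."""
    def path_from_root(node: str) -> List[str]:
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path[::-1]  # root first

    paths = [path_from_root(leaf) for leaf in sorted(matched)]
    lca = None
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca


def best_f_node(children: Dict[str, List[str]], parent: Dict[str, str],
                matched: Set[str]) -> Tuple[str, float]:
    """Scan the subtree under the LCA for the node with the highest F-measure.

    Assumption: precision = matched leaves under node / all leaves under node,
    recall = matched leaves under node / all matched leaves.
    """
    lca = lowest_common_ancestor(parent, matched)
    best, best_f = lca, -1.0
    stack = [lca]
    while stack:
        node = stack.pop()
        leaves = leaves_under(children, node)
        hits = len(leaves & matched)
        if hits:
            precision = hits / len(leaves)
            recall = hits / len(matched)
            f = 2 * precision * recall / (precision + recall)
            if f > best_f:
                best, best_f = node, f
        stack.extend(children.get(node, []))
    return best, best_f


# Hypothetical toy taxonomy: inner nodes map to children, leaves are sequences.
children = {
    "root": ["Firmicutes", "Proteobacteria"],
    "Firmicutes": ["f1"],
    "Proteobacteria": ["Gamma", "Alpha"],
    "Gamma": ["g1", "g2", "g3"],
    "Alpha": ["x1", "x2", "x3", "x4", "x5"],
}
parent = {kid: node for node, kids in children.items() for kid in kids}
matched = {"g1", "g2", "g3", "x1"}  # sequences that match one read

lca = lowest_common_ancestor(parent, matched)
node, score = best_f_node(children, parent, matched)
print(lca, node, round(score, 3))
```

In this toy case the lowest common ancestor is the large clade "Proteobacteria", where only half of the sequences match the read, while the F-measure favours the narrower node "Gamma", which gives up one matching sequence in exchange for perfect precision.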
(2/2909) Identification and classification of small RNAs in transcriptome sequence data.
Current methods for high-throughput sequencing (HTS) for the first time offer the opportunity to investigate the entire transcriptome in an essentially unbiased way. In many species, small non-coding RNAs with specific secondary structures constitute a significant part of the transcriptome. Some of these RNA classes, in particular microRNAs and snoRNAs, undergo maturation processes that lead to the production of shorter RNAs. After mapping the sequences to the reference genome, specific patterns of short reads can be observed. These read patterns seem to reflect the processing and thus are specific to the RNA transcripts from which they are derived. We explore here the potential of short read sequence data in the classification and identification of non-coding RNAs.
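As a rough illustration of what a "read pattern" at a locus might look like in code, the Python sketch below summarises mapped reads as per-position coverage and read-start counts. This is only one simple way to describe such patterns and is not taken from the paper; the function `read_pattern`, the half-open coordinate convention, and the example reads are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple


def read_pattern(reads: List[Tuple[int, int]], locus_start: int,
                 locus_end: int) -> Tuple[List[int], Counter]:
    """Summarise reads mapped to one locus.

    reads: (start, end) half-open intervals on the same strand (assumed format).
    Returns per-position coverage over the locus and a count of read starts.
    """
    coverage = [0] * (locus_end - locus_start)
    starts: Counter = Counter()
    for start, end in reads:
        if end <= locus_start or start >= locus_end:
            continue  # read lies entirely outside the locus
        starts[start] += 1
        for pos in range(max(start, locus_start), min(end, locus_end)):
            coverage[pos - locus_start] += 1
    return coverage, starts


# Hypothetical 60-nt hairpin locus with two stacks of reads, loosely
# resembling the two arms of a processed precursor.
reads = [(2, 24)] * 5 + [(3, 24)] + [(38, 60)] * 2
coverage, starts = read_pattern(reads, 0, 60)
print(starts.most_common(2))
print(coverage[:5], coverage[22:26])
```

Sharp, shared read starts and block-like coverage of this kind are the sort of signal that could distinguish processed small RNAs from randomly degraded transcripts, although which features are actually informative is a question for the study itself.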
(3/2909) Targeted high-throughput DNA sequencing for gene discovery in retinitis pigmentosa.
(4/2909) AutoMeDIP-seq: a high-throughput, whole genome, DNA methylation assay.
(5/2909) Introduction into the analysis of high-throughput-sequencing based epigenome data.
(6/2909) The next generation of molecular markers from massively parallel sequencing of pooled DNA samples.
(7/2909) Next-generation genomics: an integrative approach.
(8/2909) Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample.