• query sequences
  • The general algorithmic process followed by BLAT is similar to BLAST's in that it first searches for short segments in the database and query sequences which have a certain number of matching elements. (wikipedia.org)
  • It does this by keeping an indexed list (hash table) of the target database in memory, which significantly reduces the time required for the comparison of the query sequences with the target database. (wikipedia.org)
  • Calculating a global alignment is a form of global optimization that "forces" the alignment to span the entire length of all query sequences. (wikipedia.org)
  • In many cases, the input set of query sequences is assumed to have an evolutionary relationship by which they share a linkage and are descended from a common ancestor. (wikipedia.org)
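The seed-and-index idea described in the BLAT snippets above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `build_kmer_index` and `find_seed_hits`, the choice of exact k-mer matches as "short segments", and the value `k=4` are hypothetical and are not taken from BLAT itself.

```python
from collections import defaultdict


def build_kmer_index(target, k):
    """Map each k-mer of the target to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(target) - k + 1):
        index[target[i:i + k]].append(i)
    return index


def find_seed_hits(index, query, k):
    """Return (query_pos, target_pos) pairs where a query k-mer occurs in the target."""
    hits = []
    for q in range(len(query) - k + 1):
        # .get() avoids adding empty entries to the defaultdict on a miss
        for t in index.get(query[q:q + k], []):
            hits.append((q, t))
    return hits


# Toy data, invented for illustration only
target = "ACGTACGTTAGC"
index = build_kmer_index(target, k=4)
hits = find_seed_hits(index, "CGTTAG", k=4)
```

Because the index is built once and kept in memory, each query only pays for hash lookups of its own k-mers rather than a scan of the whole target.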
  • molecule
  • In bacteria, RNase P consists of an RNA molecule of some 400 nt in length (11, 28) and a small (about 120 aa) protein (33). (google.fr)
  • Hence, at least in these organisms, RNase P is a ribozyme (an RNA molecule catalysing chemical reactions). (google.fr)
  • The method used in this study, which is called the "Sanger method" or Sanger sequencing, was a milestone in sequencing long-strand molecules such as DNA. (wikipedia.org)
  • Robert Holley and his team at Cornell University are believed to have been the first to sequence an RNA molecule. (wikipedia.org)
  • Transfer-messenger RNA (abbreviated tmRNA, also known as 10Sa RNA and by its genetic name SsrA) is a bacterial RNA molecule with dual tRNA-like and messenger RNA-like properties. (wikipedia.org)
  • genes
  • The information contained in the genome of an organism, its DNA, is expressed through transcription of its genes to RNA, in quantities determined by many internal and external factors. (helsinki.fi)
  • Novel small RNA‐encoding genes in the intergenic regions of Escherichia coli. (currentprotocols.com)
  • Given the initial success of the project, GENCODE now aims to build an "Encyclopedia of genes and gene variants" by identifying all gene features in the human and mouse genomes using a combination of computational analysis, manual annotation, and experimental validation, and annotating all evidence-based gene features in the entire human genome at a high accuracy. (wikipedia.org)
  • The comparison of the sequences from these genes is sometimes used in molecular analysis to construct phylogenetic trees, for example in Protists, Fungi, Insects, Tardigrades, and Vertebrates. (wikipedia.org)
  • gene expression
  • Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression. (wikipedia.org)
  • For example, miRNAs regulate protein-coding gene expression by binding to 3' UTRs; small nucleolar RNAs guide post-transcriptional modifications by binding to rRNA; U4 spliceosomal RNA and U6 spliceosomal RNA bind to each other, forming part of the spliceosome; and many small bacterial RNAs regulate gene expression by antisense interactions. (wikipedia.org)
  • rRNA
  • The project also financed deep sequencing of bacterial 16S rRNA sequences amplified by polymerase chain reaction from human subjects. (wikipedia.org)
  • 28S ribosomal RNA is the structural ribosomal RNA (rRNA) for the large component, or large subunit (LSU) of eukaryotic cytoplasmic ribosomes, and thus one of the basic components of all eukaryotic cells. (wikipedia.org)
  • bacterial
  • Bacterial RNase P RNAs have been separated into two main structural classes. (google.fr)
  • Important components of the HMP were culture-independent methods of microbial community characterization, such as metagenomics (which provides a broad genetic perspective on a single microbial community), as well as extensive whole genome sequencing (which provides a "deep" genetic perspective on certain aspects of a given microbial community, i.e. of individual bacterial species). (wikipedia.org)
  • The latter served as reference genomic sequences - 3000 such sequences of individual bacterial isolates are currently planned - for comparison purposes during subsequent metagenomic analysis. (wikipedia.org)
  • In other bacterial species, a permuted ssrA gene produces a two-piece tmRNA in which two separate RNA chains are joined by base-pairing. (wikipedia.org)
  • Watson-Crick and G-U base pairs were identified by comparing the bacterial tmRNA sequences using automated computational methods in combination with manual alignment procedures. (wikipedia.org)
  • Genomes
  • The INFERNAL package can also be used with Rfam to annotate sequences (including complete genomes) for homologues to known ncRNAs. (wikipedia.org)
  • reads
  • The Python script htseq-qa takes a file with sequencing reads (either raw or aligned reads) and produces a PDF file with useful plots to assess the technical quality of a run. (wikipedia.org)
  • QC-Chain is a package of quality control tools for next generation sequencing (NGS) data, consisting of both raw reads quality evaluation and de novo contamination screening, which could identify all possible contamination sequences. (wikipedia.org)
  • Quickly scans reads and gathers statistics on base and quality frequencies, read length, and frequent sequences. (wikipedia.org)
  • RNA reads may be obtained using a variety of RNA-seq methods. (wikipedia.org)
  • common ancestor
  • If two sequences in an alignment share a common ancestor, mismatches can be interpreted as point mutations and gaps as indels (that is, insertion or deletion mutations) introduced in one or both lineages in the time since they diverged from one another. (wikipedia.org)
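As a small illustration of reading an alignment this way, the sketch below tallies matching, mismatching, and gapped columns of a pairwise alignment. The function name `classify_columns`, the use of `-` as the gap character, and the example sequences are assumptions made for this example. Note that it counts gapped columns, not indel events, since a run of adjacent gaps may reflect a single insertion or deletion.

```python
def classify_columns(aligned_a, aligned_b, gap="-"):
    """Count match, mismatch, and gap columns in a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    counts = {"match": 0, "mismatch": 0, "gap_columns": 0}
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # column carries no information for this pair
        if x == gap or y == gap:
            counts["gap_columns"] += 1  # candidate insertion or deletion
        elif x == y:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1  # candidate point mutation
    return counts


# Toy alignment, invented for illustration only
counts = classify_columns("ACG-TA", "ATGCTA")
```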
  • Rfam
  • Rfam is a database containing information about non-coding RNA (ncRNA) families and other structured RNA elements. (wikipedia.org)
  • Rfam researchers also contribute to Wikipedia's RNA WikiProject. (wikipedia.org)
  • The interface at the Rfam website allows users to search ncRNAs by keyword, family name, or genome as well as to search by ncRNA sequence or EMBL accession number. (wikipedia.org)
  • This seed alignment is used to create the SCFG, which is used with the Rfam software INFERNAL to identify additional family members and add them to the alignment. (wikipedia.org)
  • structures
  • Multiple sequence alignment is often used to assess sequence conservation of protein domains, tertiary and secondary structures, and even individual amino acids or nucleotides. (wikipedia.org)
  • The single sequence methods mentioned above have a difficult job detecting a small sample of reasonable secondary structures from a large space of possible structures. (wikipedia.org)
  • This suggests that tmRNA folding outside the TLD can be important, yet the pseudoknot region lacks conserved residues and pseudoknots are among the first structures to be lost as ssrA sequences diverge in plastid and endosymbiont lineages. (wikipedia.org)
  • tRNA
  • It catalyses the removal of 5′ leader sequences from tRNA precursor molecules. (google.fr)
  • The presence of pseudouridine in the mixed 10S RNA hinted that tmRNA has modified bases found also in tRNA. (wikipedia.org)
  • Subsequent sequence comparison revealed the full tRNA-like domain (TLD) formed by the 5' and 3' ends of tmRNA, including the acceptor stem with elements like those in alanine tRNA that promote its aminoacylation by alanine-tRNA ligase. (wikipedia.org)
  • structure
  • It includes sophisticated editing options and provides a range of analysis tools to investigate the structure and function of macromolecules through a multiple window interface. (jalview.org)
  • For example, it can be used to predict secondary structure, generate trees, and assess consensus and conservation across sequence families. (jalview.org)
  • The course provides training in loading, editing, annotating and saving alignments, viewing 3D structure and implementing a range of analyses. (jalview.org)
  • The course considered the relationship between protein sequence and structure. (jalview.org)
  • It used JPred4 to predict protein structure from multiple alignments. (jalview.org)
  • The picture was taken as Jim talked about the relationship between protein sequences and their structure. (jalview.org)
  • JPred predicts the location of secondary structure (α-helix and β-strand) and solvent accessibility from a single sequence or multiple alignment. (jalview.org)
  • PCFGs have applications in areas as diverse as natural language processing, the study of the structure of RNA molecules, and the design of programming languages. (wikipedia.org)
  • RSeQC analyzes diverse aspects of RNA-Seq experiments: sequence quality, sequencing depth, strand specificity, GC bias, read distribution over the genome structure and coverage uniformity. (wikipedia.org)
  • Producing multiple sequence alignments (MSA) of these families can provide insight into their structure and function, similar to the case of protein families. (wikipedia.org)
  • In the database, the information of the secondary structure and the primary sequence, represented by the MSA, is combined in statistical models called profile stochastic context-free grammars (SCFGs), also known as covariance models. (wikipedia.org)
  • Access to various analysis tools (including opening a 3D structure viewer from a PDBid) is provided as separate command buttons to analyze every record from a BLAST report before making a final selection. (wikipedia.org)
  • Such a direct relationship can for example be the evolutionary pressure for two positions to maintain mutual compatibility in the biomolecular structure of the sequence, leading to molecular coevolution between the two positions. (wikipedia.org)
  • Revealing the evolution and genetic diversity of sequences and organisms; identification of molecular structure from sequence alone. In chemistry, sequence analysis comprises techniques used to determine the sequence of a polymer formed of several monomers. (wikipedia.org)
  • In sociology, sequence methods are increasingly used to study life-course and career trajectories, patterns of organizational and national development, conversation and interaction structure, and the problem of work/family synchrony. (wikipedia.org)
  • The complete E. coli tmRNA secondary structure was elucidated by comparative sequence analysis and structural probing. (wikipedia.org)
  • data
  • The first part of this thesis focuses on the analysis of short-read RNA-seq data. (helsinki.fi)
  • The second part, where the main contributions of this thesis lie, focuses on the analysis of long-read RNA-seq data. (helsinki.fi)
  • Analysis of DNA/RNA and protein sequence data. (umich.edu)
  • Analysis of expression array data. (umich.edu)
  • Next-generation sequencing (NGS) technologies, such as Illumina/Solexa, AB SOLiD and 454 Pyrosequencing, are revolutionizing the acquisition of genomics data. (utoronto.ca)
  • These platforms offer much reduced costs and an increased speed of data acquisition, but the length of the sequences acquired is much reduced, from 500-1000 base pairs, to as little as 25 base pairs per read. (utoronto.ca)
  • In this class we will explore the features of NGS data that make it different from classical sequencing data, and try to determine what are the possible methods to address some of these differences. (utoronto.ca)
  • Among them, sequence data is increasing at an exponential rate due to the advent of next-generation sequencing technologies. (wikipedia.org)
  • Another limitation of alignment-based approaches is their computational complexity: they are time-consuming and thus limited when dealing with large-scale sequence data. (wikipedia.org)
  • The advent of next-generation sequencing technologies has resulted in the generation of voluminous sequencing data. (wikipedia.org)
  • Often, it is necessary to filter data, removing low-quality sequences or bases (trimming), adapters, contaminants, and overrepresented sequences, or correcting errors, to ensure a coherent final result. (wikipedia.org)
  • mRIN: assesses mRNA integrity directly from RNA-Seq data. (wikipedia.org)
  • NGSQC: cross-platform quality analysis pipeline for deep sequencing data. (wikipedia.org)
  • NGS QC Toolkit: a toolkit for the quality control (QC) of next generation sequencing (NGS) data. (wikipedia.org)
  • The toolkit comprises user-friendly standalone tools for quality control of the sequence data generated using Illumina and Roche 454 platforms with detailed results in the form of tables and graphs, and filtering of high-quality sequence data. (wikipedia.org)
  • It also includes a few other tools, which are helpful in NGS data quality control and analysis. (wikipedia.org)
  • PRINSEQ is a tool that generates summary statistics of sequence and quality data and that is used to filter, reformat and trim next-generation sequence data. (wikipedia.org)
  • It is particularly designed for 454/Roche data, but can also be used for other types of sequences. (wikipedia.org)
  • QC3: a quality control tool for DNA sequencing data, covering raw data, alignment, and variant calling. (wikipedia.org)
  • As such, a variety of approaches may be used to improve phylogenetic inference using transcriptomic data obtained from RNA-Seq and processed using computational phylogenetics. (wikipedia.org)
  • There are a number of public databases that contain freely available RNA-Seq data. (wikipedia.org)
  • RNA-Seq data may be directly assembled into transcripts using sequence assembly. (wikipedia.org)
  • Genome-guided assembly (sometimes called mapping or reference-guided assembly) is capable of using a pre-existing reference to guide the assembly of transcripts. Both methods attempt to generate biologically representative isoform-level constructs from RNA-seq data and generally attempt to associate isoforms with a gene-level construct. (wikipedia.org)
  • When selecting or generating sequence data, it is also vital to consider the tissue type, developmental stage and environmental conditions of the organisms. (wikipedia.org)
  • RevTrans will even use protein data to inform DNA alignments, which can be beneficial for resolving more distant phylogenetic relationships. (wikipedia.org)
  • It is not uncommon to translate RNA sequence into protein sequence when using transcriptomic data, especially when analyzing highly diverged taxa. (wikipedia.org)
  • Sequerome directly queries the input sequence against a variety of databases/tools ('popular public domains' and 'privately hosted services') including BLAST, Protein Data Bank (PDB), REBASE and others, and generates outputs that are intuitive and easily comprehensible. (wikipedia.org)
  • One of the key features of profiling input sequence data is the ability to store, retrieve, and effectively combine and reuse older inputs. (wikipedia.org)
  • Direct coupling analysis or DCA is an umbrella term comprising several methods for analyzing sequence data in computational biology. (wikipedia.org)
  • conservation
  • Although DNA and RNA nucleotide bases are more similar to each other than are amino acids, the conservation of base pairs can indicate a similar functional or structural role. (wikipedia.org)
  • In protein alignments, such as the one in the image above, color is often used to indicate amino acid properties to aid in judging the conservation of a given amino acid substitution. (wikipedia.org)
  • The consensus sequence is also often represented in graphical format with a sequence logo in which the size of each nucleotide or amino acid letter corresponds to its degree of conservation. (wikipedia.org)
  • Rather than using a single sequence, profile methods use a multiple sequence alignment to encode a profile which contains information about the conservation level of each residue. (wikipedia.org)
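To make "conservation level of each residue" concrete, here is a minimal sketch that scores each column of a multiple sequence alignment by the fraction of sequences carrying the most common residue. The scoring rule, the function name `column_conservation`, the `-` gap character, and the toy alignment are all assumptions chosen for illustration. Real profile methods and sequence logos typically use richer measures, such as information content.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Return, per column, the fraction of sequences sharing the most common residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != gap]
        if not residues:
            scores.append(0.0)  # column made only of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        # Gaps stay in the denominator, so gapped columns score lower
        scores.append(top_count / len(column))
    return scores


# Toy alignment, invented for illustration only
scores = column_conservation(["ACGT", "ACGA", "AC-T", "ATGT"])
```

Keeping gaps in the denominator is a design choice; dividing by the number of non-gap residues instead would treat a sparsely populated column as fully conserved.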
  • structural
  • It is a hand-curated alignment that contains representative members of the ncRNA family and is annotated with structural information. (wikipedia.org)
  • The absence of substitutions, or the presence of only very conservative substitutions (that is, the substitution of amino acids whose side chains have similar biochemical properties) in a particular region of the sequence, suggests that this region has structural or functional importance. (wikipedia.org)
  • mismatches
  • Because the search is greedy, the first valid alignment encountered by Bowtie will not necessarily be the 'best' in terms of the number of mismatches or in terms of quality. (wikipedia.org)
  • For nucleotide sequences a similar gap penalty is used, but a much simpler substitution matrix, wherein only identical matches and mismatches are considered, is typical. (wikipedia.org)
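The simple nucleotide scoring scheme described above (identical matches versus mismatches, plus a gap penalty) can be sketched with a standard global-alignment dynamic program in the Needleman-Wunsch style. The function name `global_alignment_score`, the linear gap penalty, and the score values (+1, -1, -2) are assumptions picked for the example, not values given in the quoted sources.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score under match/mismatch scoring and a linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]


# Toy sequences, invented for illustration only
identical = global_alignment_score("ACGT", "ACGT")
one_deletion = global_alignment_score("ACGT", "AGT")
```

Only two rows of the table are kept, which is enough to obtain the score; recovering the alignment itself would require storing the full table or traceback pointers.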
  • Computational Biology
  • Currently, TopHat is a collaborative effort between Cole Trapnell at the University of Washington and Daehwan Kim and Steven Salzberg at the Center for Computational Biology at Johns Hopkins University; together, in 2013, they also released TopHat2, which accurately aligns transcriptomes in the presence of insertions, deletions and gene fusions. (wikipedia.org)