• algorithm
  • The evaluated approaches included aligning the long reads to a graph created from short read alignments instead of the reference genome, which led to our final contribution: extending a co-linear chaining algorithm from between two sequences to between a sequence and a directed acyclic graph. (helsinki.fi)
  • We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy. These datasets allowed for comparison of the algorithm with other methods. In these tests, RNAspa performed better than four other programs. (nih.gov)
  • We first tested the robustness of our algorithm to the arbitrary order of the sequences chosen. (nih.gov)
  • The fact that the algorithm is based on calculating paths in linear time, and only in the final stage is a quadratic time Sum-of-Pair score calculated, enables the algorithm to scale, in practice, almost linearly with the number of sequences. (nih.gov)
  • The Inside-Outside algorithm is used in model parametrization to estimate prior frequencies observed from training sequences in the case of RNAs. (wikipedia.org)
  • Dynamic programming variants of the CYK algorithm find the Viterbi parse of a RNA sequence for a PCFG model. (wikipedia.org)
  • BLAT (BLAST-like alignment tool) is a pairwise sequence alignment algorithm that was developed by Jim Kent at the University of California Santa Cruz (UCSC) in the early 2000s to assist in the assembly and annotation of the human genome. (wikipedia.org)
  • This is because a small k-mer size is necessary in order to achieve high levels of sensitivity, but this increases the number of false positive hits, thus increasing the amount of time spent in the alignment stage of the algorithm. (wikipedia.org)
  • In 1989, building on the Carrillo-Lipman algorithm, Altschul introduced a practical method that uses pairwise alignments to constrain the n-dimensional search space. (wikipedia.org)
  • In 1970, Saul B. Needleman and Christian D. Wunsch published the first computer algorithm for aligning two sequences. (wikipedia.org)
  • Essentially, tree alignment is an algorithm that optimizes a phylogenetic tree by minimizing the edit distance. (wikipedia.org)
  • The tree alignment problem is NP-hard when its scoring mode and alphabet size are restricted, although an algorithm that finds the optimal solution can be constructed. (wikipedia.org)
  • Depending on its transformation strategy, the combinatorial optimization strategy can be divided into the tree alignment algorithm and the star alignment algorithm. (wikipedia.org)
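The pairwise alignment idea attributed above to Needleman and Wunsch can be illustrated with a small dynamic-programming sketch. This is a minimal textbook-style version, not the 1970 formulation itself: the function name `needleman_wunsch` and the scoring values (`match=1`, `mismatch=-1`, linear `gap=-1`) are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b.

    Scoring parameters are illustrative assumptions, not values from the source.
    """
    def sub(x, y):
        # Substitution score for aligning characters x and y.
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align two characters
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("GATTACA", "GATACA")[0])  # 5
```

The table has (n+1) x (m+1) cells, so time and memory grow with the product of the two sequence lengths, which is one reason seed-based tools are used for large databases.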
  • rRNA
  • The project also financed deep sequencing of bacterial 16S rRNA sequences amplified by polymerase chain reaction from human subjects. (wikipedia.org)
  • 28S ribosomal RNA is the structural ribosomal RNA (rRNA) for the large component, or large subunit (LSU) of eukaryotic cytoplasmic ribosomes, and thus one of the basic components of all eukaryotic cells. (wikipedia.org)
  • For example, miRNAs regulate protein-coding gene expression by binding to 3' UTRs, small nucleolar RNAs guide post-transcriptional modifications by binding to rRNA, U4 spliceosomal RNA and U6 spliceosomal RNA bind to each other to form part of the spliceosome, and many small bacterial RNAs regulate gene expression by antisense interactions. (wikipedia.org)
  • Genomes
  • The INFERNAL package can also be used with Rfam to annotate sequences (including complete genomes) for homologues to known ncRNAs. (wikipedia.org)
  • reads
  • The Python script htseq-qa takes a file with sequencing reads (either raw or aligned reads) and produces a PDF file with useful plots to assess the technical quality of a run. (wikipedia.org)
  • QC-Chain is a package of quality control tools for next generation sequencing (NGS) data, consisting of both raw reads quality evaluation and de novo contamination screening, which could identify all possible contamination sequences. (wikipedia.org)
  • Quickly scans reads and gathers statistics on base and quality frequencies, read length, and frequent sequences. (wikipedia.org)
  • RNA reads may be obtained using a variety of RNA-seq methods. (wikipedia.org)
  • Strand NGS also allows users to perform quality control on the imported data and filter reads before the main analysis is performed. (wikipedia.org)
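The kind of read scan described above (base and quality frequencies, read lengths) can be sketched in a few lines. This is a hedged illustration, not the code of any tool listed here: the function name `fastq_stats`, the four-line FASTQ record layout, and the Phred+33 quality encoding are assumptions.

```python
from collections import Counter


def fastq_stats(lines, phred_offset=33):
    """Collect base, quality-score, and read-length frequencies from FASTQ lines.

    Assumes simple four-line records and Phred+33 encoding (both assumptions).
    """
    rows = [ln.strip() for ln in lines if ln.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")

    base_counts = Counter()
    quality_counts = Counter()
    length_counts = Counter()
    for k in range(0, len(rows), 4):
        header, seq, plus, qual = rows[k:k + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {k + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        base_counts.update(seq)
        # Convert each quality character to its numeric Phred score.
        quality_counts.update(ord(c) - phred_offset for c in qual)
        length_counts[len(seq)] += 1
    return base_counts, quality_counts, length_counts


example = ["@r1", "ACGT", "+", "IIII", "@r2", "AC", "+", "!I"]
bases, quals, lengths = fastq_stats(example)
print(bases["A"], quals[40], lengths[4])  # 2 5 1
```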
  • Illumina
  • Illumina HiSeq2000 technology was utilized to perform deep sequencing of small RNAs (sRNAs) extracted from field-collected H. rufipes ticks in Gansu Province, China. (frontiersin.org)
  • Next-generation sequencing (NGS) technologies, such as Illumina/Solexa, AB SOLiD and 454 Pyrosequencing, are revolutionizing the acquisition of genomics data. (utoronto.ca)
  • The toolkit comprises user-friendly stand alone tools for quality control of the sequence data generated using Illumina and Roche 454 platforms with detailed results in the form of tables and graphs, and filtering of high-quality sequence data. (wikipedia.org)
  • It can import raw read sequences from sequencing platforms like Illumina, Ion Torrent, PacBio, ABI, and 454 Life Sciences and supports fragment, single-end, paired-end, mate-paired, directional single/ paired end library types. (wikipedia.org)
  • read sequences
  • One common approach to analyzing RNA sequencing data consists of three parts: aligning the read sequences to a reference genome, assembling the transcripts, and determining the expression levels of the transcripts. (helsinki.fi)
  • bacterial
  • Important components of the HMP were culture-independent methods of microbial community characterization, such as metagenomics (which provides a broad genetic perspective on a single microbial community), as well as extensive whole genome sequencing (which provides a "deep" genetic perspective on certain aspects of a given microbial community, i.e. of individual bacterial species). (wikipedia.org)
  • In other bacterial species, a permuted ssrA gene produces a two-piece tmRNA in which two separate RNA chains are joined by base-pairing. (wikipedia.org)
  • Watson-Crick and G-U base pairs were identified by comparing the bacterial tmRNA sequences using automated computational methods in combination with manual alignment procedures. (wikipedia.org)
  • structures
  • We used 150 suboptimal structures for each sequence. (nih.gov)
  • The single sequence methods mentioned above have a difficult job detecting a small sample of reasonable secondary structures from a large space of possible structures. (wikipedia.org)
  • This suggests that tmRNA folding outside the TLD can be important, yet the pseudoknot region lacks conserved residues and pseudoknots are among the first structures to be lost as ssrA sequences diverge in plastid and endosymbiont lineages. (wikipedia.org)
  • Antisense
  • According to the pairing region of a sense and antisense RNA pair, hNATs are divided into 6 classes, of which about 87% involve 5' or 3' UTR sequences, supporting the regulatory role of UTRs. (mendeley.com)
  • infer
  • Our results describe a link between the evolutionary conservation of plant MIRNAs and the mechanisms underlying the biogenesis of these small RNAs and show that the MIRNA pattern of conservation can be used to infer the mode of miRNA biogenesis. (plantcell.org)
  • As proper ortholog identification is pivotal to phylogenetic analyses, there are a variety of methods available to infer orthologs and paralogs. (wikipedia.org)
  • Rfam
  • Rfam is a database containing information about non-coding RNA (ncRNA) families and other structured RNA elements. (wikipedia.org)
  • Rfam researchers also contribute to Wikipedia's RNA WikiProject. (wikipedia.org)
  • The interface at the Rfam website allows users to search ncRNAs by keyword, family name, or genome as well as to search by ncRNA sequence or EMBL accession number. (wikipedia.org)
  • This seed alignment is used to create the SCFG, which is used with the Rfam software INFERNAL to identify additional family members and add them to the alignment. (wikipedia.org)
  • structure
  • For example, it can be used to predict secondary structure, generate trees, and assess consensus and conservation across sequence families. (jalview.org)
  • The course provides training in loading, editing, annotating and saving alignments, viewing 3D structure and implementing a range of analysis. (jalview.org)
  • Here, we designed a strategy to systematically analyze MIRNAs from different species generating a graphical representation of the conservation of the primary sequence and secondary structure. (plantcell.org)
  • miRNAs are transcribed as longer precursors harboring an imperfect fold-back structure, with the small RNA embedded in one of its arms. (plantcell.org)
  • A typical animal miRNA primary transcript harbors a fold-back structure that consists of an ∼35-bp stem and a terminal loop that is flanked by single-stranded RNA (ssRNA) segments (Ha and Kim, 2014). (plantcell.org)
  • The course considered the relationship between protein sequence and structure. (jalview.org)
  • The picture was taken as Jim talked about the relationship between protein sequences and their structure. (jalview.org)
  • PCFGs have applications in areas as diverse as natural language processing, the study of the structure of RNA molecules, and the design of programming languages. (wikipedia.org)
  • RSeQC analyzes diverse aspects of RNA-Seq experiments: sequence quality, sequencing depth, strand specificity, GC bias, read distribution over the genome structure and coverage uniformity. (wikipedia.org)
  • In the database, the information of the secondary structure and the primary sequence, represented by the MSA, is combined in statistical models called profile stochastic context-free grammars (SCFGs), also known as covariance models. (wikipedia.org)
  • Access to various analysis tools (including a 3D structure viewer opened from a PDB ID) is provided as separate command buttons to analyze every record from a BLAST report before making a final selection. (wikipedia.org)
  • Revealing the evolution and genetic diversity of sequences and organisms. Identification of molecular structure from sequence alone. In chemistry, sequence analysis comprises techniques used to determine the sequence of a polymer formed of several monomers. (wikipedia.org)
  • In sociology, sequence methods are increasingly used to study life-course and career trajectories, patterns of organizational and national development, conversation and interaction structure, and the problem of work/family synchrony. (wikipedia.org)
  • Such a direct relationship can for example be the evolutionary pressure for two positions to maintain mutual compatibility in the biomolecular structure of the sequence, leading to molecular coevolution between the two positions. (wikipedia.org)
  • The complete E. coli tmRNA secondary structure was elucidated by comparative sequence analysis and structural probing. (wikipedia.org)
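The coevolution of two positions mentioned above can be probed with a simple covariation score between alignment columns. The sketch below uses plain mutual information, which is a simpler stand-in and not direct coupling analysis itself; the function name `column_mutual_information` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length strings. This is a basic covariation
    measure used for illustration, not a DCA implementation.
    """
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (x, y), count in pairs.items():
        p_xy = count / n
        p_x = col_i[x] / n
        p_y = col_j[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


# Toy alignment: columns 0 and 2 always form a complementary pair,
# while column 1 is fully conserved.
toy = ["GAC", "CAG", "AAU", "UAA"]
print(column_mutual_information(toy, 0, 2))  # 2.0
print(column_mutual_information(toy, 0, 1))  # 0.0
```

A fully conserved column carries no covariation signal, which is why the second score is zero even though both columns are perfectly "compatible".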
  • platforms
  • Now another revolution is imminent with the third-generation sequencing platforms producing an order of magnitude higher read lengths. (helsinki.fi)
  • These platforms offer much reduced costs and an increased speed of data acquisition, but the length of the sequences acquired is much reduced, from 500-1000 base pairs, to as little as 25 base pairs per read. (utoronto.ca)
  • January 16th -- Next Generation Sequencing Platforms. (utoronto.ca)
  • query sequences
  • The general algorithmic process followed by BLAT is similar to BLAST's in that it first searches for short segments in the database and query sequences which have a certain number of matching elements. (wikipedia.org)
  • It does this by keeping an indexed list (hash table) of the target database in memory, which significantly reduces the time required for the comparison of the query sequences with the target database. (wikipedia.org)
  • In many cases, the input set of query sequences are assumed to have an evolutionary relationship by which they share a linkage and are descended from a common ancestor. (wikipedia.org)
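The hash-table lookup described above can be sketched as a k-mer index over the target that is probed with k-mers from the query. This is a simplified illustration under stated assumptions, not BLAT's actual implementation: the names `build_kmer_index` and `find_seed_hits` are hypothetical, every overlapping target k-mer is indexed, and only exact matches are reported.

```python
from collections import defaultdict


def build_kmer_index(target, k):
    """Map every k-mer in `target` to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(target) - k + 1):
        index[target[pos:pos + k]].append(pos)
    return index


def find_seed_hits(query, index, k):
    """Return (query_pos, target_pos) pairs where a query k-mer matches exactly."""
    hits = []
    for qpos in range(len(query) - k + 1):
        # .get avoids inserting empty entries into the defaultdict.
        for tpos in index.get(query[qpos:qpos + k], ()):
            hits.append((qpos, tpos))
    return hits


index = build_kmer_index("ACGTACGT", 4)
print(find_seed_hits("TACGT", index, 4))  # [(0, 3), (1, 0), (1, 4)]
```

Because the index is built once and kept in memory, each query k-mer costs one dictionary lookup. A smaller `k` finds more seeds but also more spurious ones, which matches the sensitivity trade-off noted for small k-mer sizes.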
  • data
  • The first part of this thesis focuses on the analysis of short-read RNA-seq data. (helsinki.fi)
  • The second part, where the main contributions of this thesis lie, focuses on the analysis of long-read RNA-seq data. (helsinki.fi)
  • Analysis of DNA/RNA and protein sequence data. (umich.edu)
  • Analysis of expression array data. (umich.edu)
  • In this class we will explore the features of NGS data that make it different from classical sequencing data, and try to determine what are the possible methods to address some of these differences. (utoronto.ca)
  • Among them, sequence data is increasing at an exponential rate due to the advent of next-generation sequencing technologies. (wikipedia.org)
  • Another limitation of alignment-based approaches is their computational complexity: they are time-consuming and thus limited when dealing with large-scale sequence data. (wikipedia.org)
  • The advent of next-generation sequencing technologies has resulted in generation of voluminous sequencing data. (wikipedia.org)
  • Often, it is necessary to filter data by removing low-quality sequences or bases (trimming), adapters, contamination, and overrepresented sequences, or by correcting errors, to ensure a coherent final result. (wikipedia.org)
  • mRIN: assessing mRNA integrity directly from RNA-Seq data. (wikipedia.org)
  • NGSQC: cross-platform quality analysis pipeline for deep sequencing data. (wikipedia.org)
  • NGS QC Toolkit: a toolkit for the quality control (QC) of next generation sequencing (NGS) data. (wikipedia.org)
  • It also includes a few other tools, which are helpful in NGS data quality control and analysis. (wikipedia.org)
  • PRINSEQ is a tool that generates summary statistics of sequence and quality data and that is used to filter, reformat and trim next-generation sequence data. (wikipedia.org)
  • It is particularly designed for 454/Roche data, but can also be used for other types of sequence. (wikipedia.org)
  • QC3 is a quality control tool designed for DNA sequencing data, covering raw data, alignment, and variant calling. (wikipedia.org)
  • As such, a variety of approaches may be used to improve phylogenetic inference using transcriptomic data obtained from RNA-Seq and processed using computational phylogenetics. (wikipedia.org)
  • There are a number of public databases that contain freely available RNA-Seq data. (wikipedia.org)
  • RNA-Seq data may be directly assembled into transcripts using sequence assembly. (wikipedia.org)
  • Genome-guided assembly (sometimes called mapping or reference-guided assembly) is capable of using a pre-existing reference to guide the assembly of transcripts. Both methods attempt to generate biologically representative isoform-level constructs from RNA-seq data and generally attempt to associate isoforms with a gene-level construct. (wikipedia.org)
  • When selecting or generating sequence data, it is also vital to consider the tissue type, developmental stage and environmental conditions of the organisms. (wikipedia.org)
  • RevTrans will even use protein data to inform DNA alignments, which can be beneficial for resolving more distant phylogenetic relationships. (wikipedia.org)
  • It is not uncommon to translate RNA sequence into protein sequence when using transcriptomic data, especially when analyzing highly diverged taxa. (wikipedia.org)
  • Sequerome directly queries the input sequence against a variety of databases/tools ('popular public domains' and 'privately hosted services') including BLAST, Protein Data Bank (PDB), REBASE and others, and generates outputs that are intuitive and easily comprehensible. (wikipedia.org)
  • One of the key features of profiling input sequence data is the ability to store, retrieve, and effectively combine and re-use older inputs. (wikipedia.org)
  • Direct coupling analysis or DCA is an umbrella term comprising several methods for analyzing sequence data in computational biology. (wikipedia.org)
  • Strand NGS is a software platform for next-generation sequencing data analysis. (wikipedia.org)
  • Statistical tests specifically designed to handle count-based data can be used for differential gene expression and alternative splicing analysis. (wikipedia.org)
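The trimming and filtering of low-quality bases mentioned in this group can be illustrated with a small 3'-end trimmer. This is a minimal sketch, not the behavior of any specific tool listed above: the function name `trim_read`, the thresholds `min_quality=20` and `min_length=3`, and the Phred+33 encoding are all assumptions.

```python
def trim_read(seq, qual, min_quality=20, min_length=3, phred_offset=33):
    """Trim low-quality bases from the 3' end of a read.

    Returns (trimmed_seq, trimmed_qual), or None if the trimmed read is shorter
    than `min_length`. Thresholds and Phred+33 encoding are assumptions.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and ord(qual[end - 1]) - phred_offset < min_quality:
        end -= 1
    if end < min_length:
        return None  # read is discarded by the filter
    return seq[:end], qual[:end]


print(trim_read("ACGTAC", "IIII##"))  # ('ACGT', 'IIII')
print(trim_read("ACGT", "####"))      # None
```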
  • Transcripts
  • A thorough in silico analysis of human transcripts will help expand our knowledge of NATs. (mendeley.com)
  • Combined with endogenous micro RNAs, hNATs could be regarded as a special group of transcripts contributing to the complex regulation networks. (mendeley.com)
  • transcript
  • For transcript assembly we propose a novel (at the time of the publication) approach of using minimum-cost flows to solve the problem of covering a graph created from the read alignments with a set of paths with the minimum cost, under some cost model. (helsinki.fi)
  • GENCODE Release 1 contained 416 known loci, 26 novel (coding DNA sequence) CDS loci, 82 novel transcript loci, 78 putative loci, 104 processed pseudogenes and 66 unprocessed pseudogenes. (wikipedia.org)
  • At the time of release, GENCODE Release 7 had the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. (wikipedia.org)
  • Computational Biology
  • Currently, TopHat is a collaborative effort between Cole Trapnell at the University of Washington and Daehwan Kim and Steven Salzberg in the Center for Computational Biology at Johns Hopkins University, who together in 2013 also developed TopHat2, which performs accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. (wikipedia.org)