PLOS ONE: Taxonomic Classification of Bacterial 16S rRNA Genes Using Short Sequencing Reads: Evaluation of Effective Study...
Massively parallel high-throughput sequencing technologies allow us to interrogate the microbial composition of biological samples at unprecedented resolution. The typical approach is to perform high-throughput sequencing of 16S rRNA genes, which are then taxonomically classified based on similarity to known sequences in existing databases. Current technologies cause a predicament though, because although they enable deep coverage of samples, they are limited in the length of sequence they can produce. As a result, high-throughput studies of microbial communities often do not sequence the entire 16S rRNA gene. The challenge is to obtain reliable representation of bacterial communities through taxonomic classification of short 16S rRNA gene sequences. In this study we explored properties of different study designs and developed specific recommendations for effective use of short-read sequencing technologies for the purpose of interrogating ...http://journals.plos.org/plosone/article/related?id=10.1371/journal.pone.0053608
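The similarity-based classification described above can be sketched in a few lines: assign a short read to the reference taxon whose sequence shares the most k-mers with it. The reference sequences and taxon names below are invented for illustration; real pipelines query curated databases (e.g. SILVA or Greengenes) with far more sophisticated classifiers.

```python
def kmers(seq, k=8):
    """Return the set of k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify_read(read, references, k=8):
    """Assign a read to the reference taxon sharing the most k-mers."""
    read_kmers = kmers(read, k)
    best_taxon, best_score = None, -1
    for taxon, ref_seq in references.items():
        score = len(read_kmers & kmers(ref_seq, k))
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon, best_score

# Hypothetical mini "database" of 16S fragments (not real 16S sequences).
REFS = {
    "Escherichia": "AGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCAGGCCTAA",
    "Bacillus":    "AGAGTTTGATCCTGGCTCAGGACGAACGCTGGCGGCGTGCCTAA",
}

taxon, shared = classify_read("ATTGAACGCTGGCGGCAGGC", REFS)
```

Because the query read is an exact substring of the first reference, every one of its k-mers matches there, while only the conserved middle portion matches the second; the short read length is exactly why such assignments become ambiguous at finer taxonomic ranks.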
16S rRNA gene sequencing of mock microbial populations - impact of DNA extraction method, primer choice and sequencing platform
Background Next-generation sequencing platforms have revolutionised our ability to investigate the microbiota composition of complex environments, frequently through 16S rRNA gene sequencing of the bacterial component of the community. Numerous factors, including DNA extraction method, primer sequences and sequencing platform employed, can affect the accuracy of the results achieved. The aim of this study was to determine the impact of these three factors on 16S rRNA gene sequencing results, using mock communities and mock community DNA. Results The use of different primer sequences (V4-V5, V1-V2 and V1-V2 degenerate primers) resulted in differences in the genera and species detected. The V4-V5 primers gave the most comparable results across platforms. The three Ion PGM primer sets detected more of the 20 mock community species than the equivalent MiSeq primer sets. Data generated from DNA extracted using the 2 extraction methods were very similar. Conclusions Microbiota ...https://t-stor.teagasc.ie/handle/11019/1018
Optimal spliced alignments of short sequence reads | BMC Bioinformatics | Full Text
Next generation sequencing technologies open exciting new possibilities for genome and transcriptome sequencing. While reads produced by these technologies are relatively short and error-prone compared to the Sanger method, their throughput is several magnitudes higher. We present a novel approach, called QPALMA, for computing accurate spliced alignments of short sequence reads that take advantage of the read's quality information as well as computational splice site predictions. In computational experiments we illustrate that the quality information as well as the splice site predictions help to considerably improve the alignment quality. Our algorithms were optimized and tested using artificially spliced genomic reads produced with the Illumina Genome Analyzer for the model plant Arabidopsis thaliana. ...https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-S10-O7
Hybrid de novo tandem repeat detection using short and long reads | BMC Medical Genomics | Full Text
As one of the most studied genome rearrangements, tandem repeats have a considerable impact on genetic backgrounds of inherited diseases. Many methods designed for tandem repeat detection on reference sequences obtain high quality results. However, in the case of a de novo context, where no reference sequence is available, tandem repeat detection remains a difficult problem. The short reads obtained with the second-generation sequencing methods are not long enough to span regions that contain long repeats. This length limitation was tackled by the long reads obtained with the third-generation sequencing platforms such as Pacific Biosciences technologies. Nevertheless, the gain in read length came with a significant increase of the error rate. The main objective of current studies on long reads is to handle the high error rate of up to 16%. In this paper we present MixTaR, the first de novo method for tandem repeat detection that combines the high quality of short reads and ...https://bmcmedgenomics.biomedcentral.com/articles/10.1186/1755-8794-8-S3-S5
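To make the underlying problem concrete, here is a minimal sketch of perfect tandem repeat detection on a known sequence: scan for a short unit repeated consecutively. This is not MixTaR's algorithm (which works de novo from noisy long reads and accurate short reads); it only illustrates what a tandem repeat is and why reads shorter than the full repeat array cannot resolve its copy number.

```python
def find_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=3):
    """Naive scan for perfect tandem repeats: a unit of length
    min_unit..max_unit repeated at least min_copies times in a row.
    Returns (start, unit, copies) tuples."""
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        i = 0
        while i + unit_len * min_copies <= len(seq):
            unit = seq[i:i + unit_len]
            copies = 1
            while seq[i + copies * unit_len: i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * unit_len  # skip past the detected array
            else:
                i += 1
    return hits

# "CAG" repeated 5 times, flanked by unique sequence.
hits = find_tandem_repeats("ACGT" + "CAG" * 5 + "TTAA")
```

A read shorter than the 15-base repeat array above would match it at multiple offsets, which is exactly the ambiguity that long (if error-prone) reads resolve.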
Press Releases - Edico Genome
SEOUL, South Korea and SAN DIEGO, Jan. 11, 2016 - Macrogen, a global leader in genome sequencing services, and Edico Genome today announced Macrogen has chosen multiple DRAGEN™ Bio-IT Processors to reinforce its big data processing and analysis capacity for large-scale genome analysis and clinical sequencing services. Macrogen has world-class next-generation sequencing (NGS) facilities, which are equipped with Illumina's HiSeq™ X Ten, HiSeq 2000, HiSeq 2500, HiSeq 4000 and MiSeq® sequencing systems; Thermo Fisher's Ion PGM™ and Ion Proton™ systems; Roche's GS-FLX system; and PacBio instruments. Macrogen's IT infrastructure capacity exceeds 11 petabytes of storage and more than 3,000 core clusters. Using DRAGEN, Macrogen was able to analyze each genome (30x coverage) produced by their HiSeq X Ten sequencing system in only 26 minutes, while maintaining high sensitivity and specificity. This analysis included conversion from BCL, the file that is delivered by ...http://edicogenome.com/news/press-releases/
Short read sequencing and shellfish - English
Short-read sequencing used for genomic characterization in aquacultured shellfish. Steven Roberts, University of Washington, School of Aquatic and Fishery Sciences. ...https://www.slideshare.net/sr320/roberts-pagxx-satam
Employing whole genome mapping for optimal de novo assembly of bacterial genomes | BMC Research Notes | Full Text
Genome assembly is often a primary step in the process of yielding results that lead to interpretation of biological data and hence sub-optimally assembled genomes might lead to faulty conclusions. Factors causing such low quality genome assembly include sequence quality, presence of repetitive sequences, base composition, size and low genome coverage [2, 3], all of which complicate downstream data analysis using the available tools. Currently, de novo assemblers based on de Bruijn graphs are considered to yield the best results provided sufficient sequence quality and coverage are achieved. Such assembly tools based on de Bruijn graph algorithms, like Velvet and SPAdes, use k-mers as building blocks, but as most users are not bio-informaticians, these tools are often considered as an encrypted black box with the quality of the assembly usually determined by statistical parameters such as the N50 and the size and number of ...https://bmcresnotes.biomedcentral.com/articles/10.1186/1756-0500-7-484
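The N50 statistic mentioned above is easy to compute and worth defining precisely, since it is the headline number in most assembly reports: it is the contig length at which contigs of that length or longer cover at least half of the total assembly size. A minimal sketch:

```python
def n50(contig_lengths):
    """N50: walking contigs from longest to shortest, the length of the
    contig at which the running total first reaches half the assembly size."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

# Toy assembly: five contigs totalling 300 bases.
assembly = [100, 80, 60, 40, 20]
value = n50(assembly)
```

Here the two longest contigs (100 + 80 = 180) cover more than half of the 300-base total, so the N50 is 80. Note that N50 says nothing about correctness, which is why the article pairs it with whole-genome mapping for validation.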
Analysing 454 amplicon resequencing experiments using the modular and database oriented Variant Identification Pipeline | BMC...
Recent DNA sequencing technology, the so-called next-generation sequencing (NGS) technology, enables researchers to read a number of DNA sequences that is several orders of magnitude bigger and at a cost that is several orders of magnitude smaller than the previous generation DNA sequencing technologies. The cost of determining the human genome was estimated at $2.7 billion for the IHGSC genome and at $300 million for the Celera genome. Recently several human genomes were sequenced in about 1.5 months at a cost that is around $1.5 million [1, 2]. Large-scale parallel pyrosequencing from 454/Roche generates hundreds of thousands of sequenced DNA reads within a matter of hours. The latest version of the sequencing technology (Titanium) enables a throughput of 0.4-0.6 gigabases per 10 h run. The amount of data to be analyzed keeps growing at an increasing speed. Other NGS platforms such as Illumina's Genome Analyzer (San Diego, CA, USA), Applied ...https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-269
PLOS Computational Biology: Systematic Inference of Copy-Number Genotypes from Personal Genome Sequencing Data Reveals...
Author Summary Human individual genome sequencing has recently become affordable, enabling highly detailed genetic sequence comparisons. While the identification and genotyping of single-nucleotide polymorphisms has already been successfully established for different sequencing platforms, the detection, quantification and genotyping of large-scale copy-number variants (CNVs), i.e., losses or gains of long genomic segments, has remained challenging. We present a computational approach that enables detecting CNVs in sequencing data and accurately identifies the actual copy-number at which DNA segments of interest occur in an individual genome. This approach enabled us to obtain novel insights into the largest human gene family - the olfactory receptors (ORs) - involved in smell perception. While previous studies reported an abundance of CNVs in ORs, our approach enabled us to globally identify absolute differences in OR gene counts that exist between humans. While several OR genes have very ...http://journals.plos.org/ploscompbiol/article/comments?id=10.1371/journal.pcbi.1000988&imageURI=info:doi/10.1371/journal.pcbi.1000988.g003
BFAST: An Alignment Tool for Large Scale Genome Resequencing
Background The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, result in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25-100 base range, in the presence of errors and true biological variation. Methodology We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple ...http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0007767
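The index-then-map strategy the abstract describes can be illustrated with a toy seed-and-vote scheme: index every k-mer position in the reference once, then let each k-mer of a read vote for the alignment start it implies. This is only a conceptual stand-in for BFAST's indexes (which use spaced masks and handle mismatches); the genome and read below are made up.

```python
from collections import defaultdict

def build_index(genome, k=5):
    """Index every k-mer position in the reference genome."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index

def candidate_locations(read, index, k=5):
    """Each k-mer hit votes for the read start position it implies;
    well-supported positions become candidate alignment locations."""
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for pos in index[read[offset:offset + k]]:
            votes[pos - offset] += 1
    return sorted(votes, key=votes.get, reverse=True)

GENOME = "TTACGGATTACCGTAACGGATT"  # contains the read at two positions
idx = build_index(GENOME)
cands = candidate_locations("ACGGATT", idx)
```

The read occurs twice in this toy genome, so two candidate locations emerge with equal support; a real aligner would then score each candidate with a full (gapped) alignment, which is the expensive step the index exists to avoid running genome-wide.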
Genotype and Haplotype Reconstruction from Low-Coverage Short Sequencing Reads | SpringerLink
Recent advances in high-throughput sequencing (HTS) technologies have led to orders of magnitude higher throughput compared to classic Sanger sequencing (see for a review). Coupled with continuous ...https://link.springer.com/chapter/10.1007%2F978-3-642-00727-9_7
Announcements - Mockler Lab
Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly1. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE)2. Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (~16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a 'near-complete' draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, ...http://mocklerlab.org/announcements/56
OTU Analysis Using Metagenomic Shotgun Sequencing Data
Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison ...http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0049785
Whole genome sequencing used to help inform cancer therapy - Healthcanal.com : Healthcanal.com
SCOTTSDALE, Ariz. - Whole genome sequencing - spelling out a person's entire DNA genetic code - has moved one step closer to being a medical option for direct patient care. Physicians and researchers at Mayo Clinic in Arizona and the Translational Genomics Research Institute (TGen) successfully completed sequencing both a single patient's normal and cancer cells - a tour de force of more than 6 billion DNA chemical bases. While the whole genomes of several individuals or their cancers have been sequenced in recent years, this is believed to be among the first successful applications of whole genome sequencing performed in support of the medical care of a specific cancer patient. A male patient with pancreatic cancer was the first patient at Mayo Clinic to have whole genome sequencing performed on both his tumor and non-cancerous cells as part of a clinical research project. By comparing the tumor DNA to the patient's normal DNA, researchers found genetic changes (mutations) that were ...https://www.healthcanal.com/cancers/14596-whole-genome-sequencing-used-to-help-inform-cancer-therapy.html
GigaDB Dataset - DOI 10.5524/100311 - Supporting data for 'De Novo PacBio long-read and phased avian genome assemblies correct...
Reference quality genomes provide a resource for studying gene structure, function, and evolution. However, often genes of interest are not completely or accurately assembled, leading to unknown errors in analyses or additional cloning efforts for the correct sequences. A promising solution is long-read sequencing. Here we tested PacBio-based long-read sequencing and diploid assembly for potential improvements to the Sanger-based intermediate-read zebra finch reference and Illumina-based short-read Anna's hummingbird reference, two vocal learning avian species widely studied in neuroscience and genomics. With DNA of the same individuals used to generate the reference genomes, we generated diploid assemblies with the FALCON-Unzip assembler, resulting in contigs with no gaps in the megabase range, representing 150-fold and 200-fold improvements over the current zebra finch and hummingbird references, respectively. These long-read and phased assemblies corrected and resolved what we discovered ...http://gigadb.org/dataset/view/id/100311/File_page/2/File_sort/name
Groundtruthing Next-Gen Sequencing for Microbial Ecology-Biases and Errors in Community Structure Estimates from PCR Amplicon...
Analysis of microbial communities by high-throughput pyrosequencing of SSU rRNA gene PCR amplicons has transformed microbial ecology research and led to the observation that many communities contain a diverse assortment of rare taxa-a phenomenon termed the Rare Biosphere. Multiple studies have investigated the effect of pyrosequencing read quality on operational taxonomic unit (OTU) richness for contrived communities, yet there is limited information on the fidelity of community structure estimates obtained through this approach. Given that PCR biases are widely recognized, and further unknown biases may arise from the sequencing process itself, a priori assumptions about the neutrality of the data generation process are at best unvalidated. Furthermore, post-sequencing quality control algorithms have not been explicitly evaluated for the accuracy of recovered representative sequences and its impact on downstream analyses, reducing useful discussion on pyrosequencing reads ...http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0044224
Methods and Apparatuses for Estimating Parameters in a Predictive Model for Use in Sequencing-by-Synthesis - Patent...
[0085] FIG. 9 shows a sensor array 901 including a plurality of regions or subgroups 902 of reaction areas or wells. In an embodiment, each region or subgroup 902 includes at least two sets of reaction areas or wells 903 and 904. The sets of reaction areas or wells 903 and 904 may be physically distinguishable from each other (e.g., by shape, dimensions, or inclusion into some defined physical location such as wells that are to the left and right, respectively, of some dividing line or any other type of spatial classification such as a "checkerboard"-type arrangement, or any other physical attribute) and/or they may be distinguishable on the basis of some arbitrary classification that may depend on position or some labeling in software (e.g., "odd" wells versus "even" wells or any other type of label allowing differentiation between wells). In an embodiment, both sets of reaction areas or wells 903 and 904 include a population of template nucleic acids. Each of reaction areas or wells 903 may contain a ...http://www.patentsencyclopedia.com/app/20140051584
Translating Genomics to the Clinic: Implications of Cancer Heterogeneity | Clinical Chemistry
The advent of next-generation sequencing (NGS)4 technologies, which grew exponentially in the decade after publication of the first iteration of the human genome sequence (4), has provided substantial insights into new genes and the biological processes that underlie cancer pathogenesis. These insights are outlined below. NGS technologies "parallelize" sequencing processes via high-throughput means to produce millions of short sequencing "reads" from amplified DNA clones (5). NGS is also referred to as "massively parallel sequencing," because the reaction steps occur in parallel with the detection steps and millions of reactions occur simultaneously (6). This parallelism makes it possible to read the same segment of a DNA sequence repeatedly to increase confidence in the sequence obtained for the targeted genomic segment. This multiple sampling of a genomic segment is referred to as the "coverage" of the sequencing run. Before the NGS era, much progress had ...http://clinchem.aaccjnls.org/content/59/1/127.long
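The coverage concept described above - how many reads sample each genomic position - reduces to a simple tally once reads are aligned. A minimal sketch, using made-up alignments represented as (start, read_length) pairs:

```python
def per_base_coverage(ref_len, alignments):
    """Depth of coverage at each reference position, given read
    alignments as (start, read_length) pairs on a 0-based reference."""
    depth = [0] * ref_len
    for start, length in alignments:
        for pos in range(start, min(start + length, ref_len)):
            depth[pos] += 1
    return depth

# Three 4-base reads over a 10-base reference: two reads overlap
# positions 2-3, which are therefore sampled three times.
cov = per_base_coverage(10, [(0, 4), (2, 4), (2, 4)])
mean_cov = sum(cov) / len(cov)
```

Phrases like "30x coverage" refer to the mean of exactly this per-base depth over the genome; clinical pipelines also check the minimum depth over targeted regions, since a confident variant call at a position requires that position itself to be sampled many times.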
A scaling normalization method for differential expression analysis of RNA-seq data | Genome Biology | Full Text
The transcriptional architecture is a complex and dynamic aspect of a cell's function. Next generation sequencing of steady state RNA (RNA-seq) gives unprecedented detail about the RNA landscape within a cell. Not only can expression levels of genes be interrogated without specific prior knowledge, but comparisons of expression levels between genes within a sample can be made. It has also been demonstrated that splicing variants [1, 2] and single nucleotide polymorphisms can be detected through sequencing the transcriptome, opening up the opportunity to interrogate allele-specific expression and RNA editing. An important aspect of dealing with the vast amounts of data generated from short read sequencing is the processing methods used to extract and interpret the information. Experience with microarray data has repeatedly shown that normalization is a critical component of the processing pipeline, allowing accurate estimation and detection of differential expression (DE). The aim of ...https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25
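Why scaling normalization matters can be shown with the simplest possible scheme, counts-per-million. This is not the paper's TMM method (which uses a trimmed mean of log ratios to stay robust when a few genes dominate a library); it only illustrates the baseline problem that raw counts from libraries of different depths are not comparable.

```python
def cpm(counts):
    """Counts-per-million: rescale one library's gene counts so that
    they sum to one million, removing the effect of sequencing depth."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]

# Two libraries with identical gene composition but 10x different depth:
# raw counts differ everywhere, yet no gene is differentially expressed.
lib_a = cpm([10, 90])
lib_b = cpm([100, 900])
```

After scaling, the two libraries are identical, so no spurious differential expression is called. TMM goes further because total-count scaling itself breaks down when a handful of very highly expressed genes consume most of one library's reads, which is the scenario the article's normalization factors are designed to handle.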
Publications | Microbial Ecology, University of Vienna
Genomic heterogeneity of bacterial species is observed and studied in experimental evolution experiments and clinical diagnostics, and occurs as micro-diversity of natural habitats. The challenge for genome research is to accurately capture this heterogeneity with the currently used short sequencing reads. Recent advances in NGS technologies improved the speed and coverage and thus allowed for deep sequencing of bacterial populations. This facilitates the quantitative assessment of genomic heterogeneity, including low frequency alleles or haplotypes. However, false positive variant predictions due to sequencing errors and mapping artifacts of short reads need to be prevented. We therefore created VarCap, a workflow for the reliable prediction of different types of variants even at low frequencies. In order to predict SNPs, InDels and structural variations, we evaluated the sensitivity and accuracy of different software tools using synthetic read data. The results suggested that the best ...http://dome.csb.univie.ac.at/publications?id=1083
A DNA sequence-based identification checklist for Taiwanese chondrichthyans
This dataset contains the digitized treatments in Plazi based on the original journal article Straube, Nicolas, White, William T., Ho, Hsuan-Ching, Rochel, Elisabeth, Corrigan, Shannon, Li, Chenhong, Naylor, Gavin J. P. (2013): A DNA sequence-based identification checklist for Taiwanese chondrichthyans. Zootaxa 3752 (1): 256-278, DOI: http://dx.doi.org/10.11646/zootaxa.3752.1.16 ...https://www.gbif.org/dataset/0486185a-4e5f-4536-8486-78389f2699d7
CSHL scientists develop new method to detect copy number variants using DNA sequencing technologies
Genome sequencing technologies are improving at a rapid pace. The current challenge is to find ways to extract all of the genetic information from the data. One of the biggest challenges has been the detection of CNVs. Sebat, in collaboration with Seungtai Yoon of CSHL and Kenny Ye, Ph.D., at the Albert Einstein College of Medicine, developed a statistical method to estimate DNA copy number of a genomic region based on the number of sequences that map to that location (or "read depth"). When the genomes of multiple individuals are compared, regions that differ in copy number between individuals can be identified. The new method allows the detection of small structural variants that could not be detected using earlier microarray-based methods. This is significant because most of the CNVs in the genome are less than 5000 nucleotides in length. The new method is also able to detect certain classes of CNVs that other sequencing-based approaches struggle with, particularly those located in complex ...http://www.innovations-report.com/html/reports/life-sciences/cshl-scientists-develop-method-detect-copy-number-138384.html
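The read-depth idea described above can be sketched directly: count reads falling in fixed-size windows, then scale so that the typical (median) window corresponds to the normal diploid copy number. This is only a conceptual toy, not the authors' statistical model, and the window size, read positions, and scaling below are illustrative choices.

```python
def window_copy_numbers(read_starts, genome_len, window=100, ploidy=2):
    """Estimate per-window copy number from read depth: count reads
    starting in each window, then scale so the median window maps to
    the normal (diploid) copy number."""
    n_windows = genome_len // window
    counts = [0] * n_windows
    for s in read_starts:
        w = s // window
        if w < n_windows:
            counts[w] += 1
    median = sorted(counts)[n_windows // 2]
    return [round(ploidy * c / median) for c in counts]

# Five 100-base windows; the third receives twice the typical read
# count, consistent with a duplication (copy number 4).
starts = [10, 50, 110, 160, 210, 230, 250, 280, 310, 370, 420, 470]
cn = window_copy_numbers(starts, 500)
```

Real read-depth callers additionally correct for GC bias and mappability before this scaling step, since depth fluctuates for reasons other than copy number.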
Publication - The Elizabeth H. and James S. McDonnell III Genome Institute at Washington University
Strategies for assembling large, complex genomes have evolved to include a combination of whole-genome shotgun sequencing and hierarchical map-assisted sequencing. Whole-genome maps of all types can aid genome assemblies, generally starting with low-resolution cytogenetic maps and ending with the highest resolution of sequence. Fingerprint clone maps are based upon complete restriction enzyme digests of clones representative of the target genome, and ultimately comprise a near-contiguous path of clones across the genome. Such clone-based maps are used to validate sequence assembly order, supply long-range linking information for assembled sequences, anchor sequences to the genetic map and provide templates for closing gaps. Fingerprint maps are also a critical resource for subsequent functional genomic studies, because they provide a redundant and ordered sampling of the genome with clones. In an accompanying paper we describe the draft genome ...http://genome.wustl.edu/publications/detail/a-physical-map-of-the-chicken-genome/
CRAM | Storage of high throughput DNA sequencing data using reference-based compression - OMICtools
A framework technology comprising file format and toolkit in which we combine highly efficient and tunable reference-based compression of sequence data with a data format that is directly available for computational use. This compression method is tunable: The storage of quality scores and unaligned sequences may be adjusted for different experiments to conserve information or to minimize storage costs, and provides one opportunity to address the threat that increasing DNA sequence volumes will overcome our ability to store the sequences.https://omictools.com/cram-tool
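The core idea of reference-based compression can be sketched simply: store an aligned read as its position plus its differences from the reference, rather than its full sequence. CRAM's actual encoding is far richer (per-column codecs, quality-score treatment, unaligned-read handling), so the record format below is purely illustrative.

```python
def compress_read(ref, start, read):
    """Encode an aligned read as (start, length, mismatches-vs-reference).
    Reads that match the reference exactly compress to just coordinates."""
    diffs = [(i, b) for i, b in enumerate(read) if ref[start + i] != b]
    return (start, len(read), diffs)

def decompress_read(ref, record):
    """Rebuild the original read from the reference and the diff record."""
    start, length, diffs = record
    seq = list(ref[start:start + length])
    for i, b in diffs:
        seq[i] = b
    return "".join(seq)

REF = "ACGTACGTACGT"                       # toy reference
rec = compress_read(REF, 4, "ACGTTCGT")    # read with one mismatch
restored = decompress_read(REF, rec)
```

Because most reads match the reference at most positions, a diff list is usually tiny; the tunability mentioned above then comes from deciding how much of the remaining data (quality scores, unaligned reads) to keep alongside these records.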
Informed Consent for Whole Genome Sequencing: Ideals and Norms Referenced by Early Participants - Full Text View -...
Since 2007, the cost of sequencing a diploid human genome has fallen dramatically, from approximately $70 million to $20,000 (Illumina, 2010). As affordable sequencing platforms become more widely available, the advancement of biomedical science will draw increasingly on whole genome sequencing research requiring large cohorts of diverse populations (Lunshof et al., 2009; Need & Goldstein, 2009). Key policy, ethical and legal implications of these developments will need to be understood in order to promote the efficacy and effectiveness of genomic research going forward. In addition to information about well-understood regions of the genome, both sought-after and incidental, whole genome sequencing yields results of probabilistic, uncertain, and changing significance over indefinite periods of time. Sequence data is most useful when shared widely among investigators in conjunction with detailed clinical information (Angrist, 2010). It may have implications for individuals and families ...https://clinicaltrials.gov/ct2/show/NCT01369953?cond=%22Proteus+syndrome%22&rank=1