BioGenomics2017 - Global Biodiversity Genomics Conference
February 21-23, 2017
Smithsonian National Museum of Natural History | Washington, D.C.

Program - Single Session


5  Genome Sequencing Technologies I

Room: Salon 2, Marriott Hotel

09:00 - 10:30

Moderator: Adam Felsenfeld, National Human Genome Research Institute



5.1  09:10  Linked-Reads enable efficient de novo, diploid assembly. Church DM*, 10x Genomics; Catalanotti C, 10x Genomics; Herschleb J, 10x Genomics; Kumar V, 10x Genomics; Shah P, 10x Genomics; Weisenfeld N, 10x Genomics; Schnall-Levin M, 10x Genomics; Jaffe D, 10x Genomics

The determination of a reference sequence for the human genome fundamentally changed the way we approach studying human health and development. An important lesson from the past decade of research is that generating a single haploid consensus assembly for diploid organisms can lead to both assembly errors and limited representation of biologically important sequences. Reconstruction of accurate, individual haplotypes provides a more complete picture of a genome. However, haplotype reconstruction of diploid genomes using cost-effective, accurate short reads remains challenging. We describe a novel approach for the de novo assembly of individual mammalian genomes, requiring only small amounts (0.5-1.25 ng) of input DNA to construct a single library. We have developed a microfluidic system that allows for the high-throughput partitioning of high-molecular-weight DNA. Unique barcodes are applied within each partition, allowing for the retention of long-range information using short-read sequencing, creating a data type called Linked-Reads. The Supernova De Novo Assembler takes advantage of Linked-Reads to perform de novo diploid assembly. Heterozygosity within the sample, coupled with molecular barcodes, allows for the separation of the assembled scaffolds into their distinct haplotypes, referred to as phase-blocks. The reconstruction of individual haplotypes, rather than a haploid consensus sequence, allows for a more complete and accurate representation of the sample. We demonstrate the performance of this process on several human genomes of diverse ethnic origin. As some of these samples are members of well-characterized trios, we can validate the accuracy of the phase information using orthogonal data. We will also show performance on a variety of other genomes, including hummingbird, dog and olive fly.
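The way shared barcodes preserve long-range information can be illustrated with a toy sketch. It is not the Supernova algorithm, which assembles de novo; it assumes, purely for illustration, that each short read already carries a barcode and a position on some coordinate system, and it groups reads that share a barcode and lie close together into one inferred source molecule. The function name `infer_molecules` and the `max_gap` threshold are hypothetical.

```python
from collections import defaultdict


def infer_molecules(reads, max_gap=50_000):
    """Group (barcode, position) reads into inferred source molecules.

    Hypothetical illustration: reads sharing a barcode are assumed to come
    from the same long DNA fragment unless consecutive positions are more
    than max_gap bases apart, in which case a new molecule is started.
    Returns a list of (barcode, start, end, read_count) tuples.
    """
    by_barcode = defaultdict(list)
    for barcode, pos in reads:
        by_barcode[barcode].append(pos)

    molecules = []
    for barcode, positions in sorted(by_barcode.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule, open a new one.
                molecules.append((barcode, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        molecules.append((barcode, start, prev, count))
    return molecules


example_reads = [("AAAC", 1000), ("AAAC", 21000), ("AAAC", 400000), ("GGTT", 5000)]
example_molecules = infer_molecules(example_reads)
```

In this toy input, the two nearby `AAAC` reads are linked into one molecule spanning 20 kb, while the distant `AAAC` read and the `GGTT` read each form their own.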


5.2  09:30  Accurate and phased $100 de novo genome sequencing using low cost co-barcoded reads generated on DNA nanoball arrays. Drmanac R*, Complete Genomics

Our extremely massively parallel DNA sequencing process using genomic nanoarrays (Drmanac et al, Science, 2010) continues to advance our ability to efficiently read DNA. Recent improvements in making patterned DNA Nano-Ball (DNB) arrays provide strong signal for long (200-400 bases) reads without amplification biases and errors. These nanoarrays, combined with novel powerful cameras and fast scanning stages, allow sequence data to be generated with extreme efficiency. Using this technology, BGI is building the world's largest sequencing facility in the China National Gene Bank in Shenzhen. Our Long Fragment Read (LFR) technology generates co-barcoded reads for each 30 kb-300 kb long genomic DNA fragment, allowing separate (phased) assembly of parental chromosome sequences (Drmanac, Nature 2012). Furthermore, using co-barcoded reads, accurate WGS including detection and phasing of de novo mutations (~1 error per Gb, ~6 errors per genome) is now achievable from ~10 cells (Drmanac, Genome Research 2015). LFR's co-barcoded reads also allow phased de novo genome assembly (Peters, Drmanac, Frontiers in Genetics, 2015). We are now implementing an inexpensive single-tube LFR process with >2M barcodes to enable "perfect" (close to 100% complete and accurate) $100 de novo WGS when combined with our advanced massively parallel sequencing process.


5.3  09:50  De Novo PacBio Long-read Assembled Vertebrate Genomes Correct and Add to Genes Important in Conservation Research. Korlach J*, PacBio; Gedman G, Rockefeller University; Howard J, Rockefeller University; Baybayan P, PacBio; Kingan S, PacBio; Hall R, PacBio; Gu J, PacBio; Robertson B, University of Otago; Digby A, University of Otago; Ryder O, San Diego Zoo Institute; Jarvis ED, Rockefeller University

To test the impact of high-quality genome assemblies on conservation research, we applied PacBio long-read sequencing in conjunction with the new, diploid-aware FALCON-Unzip assembler to a number of vertebrate species, including both non-threatened and critically endangered species. All PacBio de novo genome assemblies had contiguities in the megabase range (contig N50s ranging between 5.4 and 7.7 Mb), representing an improvement of over two orders of magnitude over previous short-read-based assemblies. Gapless, allele-resolved contigs of this size range translated into the resolution of thousands of gaps present in previous assemblies, correction of erroneous sequence flanking those gaps, correction of misassemblies in previous assemblies and resolution of complex repeat structures, as well as allelic differences between the two chromosome haplotypes. For the first time, we were able to assemble the complete structure of many genes critical in conservation research. These findings demonstrate the impact of higher-quality, phased and gapless assemblies vs. fragmented, incomplete scaffold-based assemblies in conservation research.
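For readers unfamiliar with the contiguity metric quoted above, contig N50 is conventionally the length of the shortest contig in the set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of that standard definition follows; the function name `n50` and the toy lengths are illustrative and are not taken from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches at least half of
    the summed length of all contigs.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


# Toy example: total length is 32, and the two longest contigs (8 + 8)
# already cover half of it, so the N50 is 8.
toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

Because N50 is weighted by length, a handful of very long contigs can dominate it, which is why it is usually reported alongside total assembly size and contig count.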


5.4  10:10  Promising prospects of Oxford nanopore sequencing for chloroplast genomics. Sauvage T*, Smithsonian Marine Station at Fort Pierce; Schmidt W, University of Louisiana at Lafayette; Paul V, Smithsonian Marine Station at Fort Pierce; Yoon H, Sungkyunkwan University; Fredericq S, University of Louisiana at Lafayette

Eukaryotic algae represent low-complexity metagenomes that include chloroplast, mitochondrial, and nuclear genomes, as well as bacterial chromosomes from associated prokaryotes (epiphytic and, in some cases, endophytic bacteria). Among these genomic compartments, chloroplasts are generally the most abundant in the cell and the most sought after for the characterization of algal species in the context of biodiversity (evolutionary and ecological), biomedical (i.e. metabolite-producing taxa) and biofuel studies. While most de novo studies conducted to date on the MinION platform have focused on the sequencing of megabase bacterial chromosomes (i.e. >1 Mb), the potential of this technology to assemble smaller molecules from metagenomes, such as chloroplast genomes (~0.1-0.5 Mb) in the eukaryotic cell, has not been assessed. In the present study, we tested this platform's ability to sequence long reads and assemble de novo the chloroplast genome of the marine green algal holobiont Caulerpa ashmeadii (Bryopsidales, Ulvophyceae), which a preliminary short-read assembly failed to resolve. Thanks to long nanopore reads, we identify ORF and intron presence-absence polymorphism within a 30 kb region and resolve C. ashmeadii's chloroplast genome.

