Whole Genome Sequencing


The cost and effort required to resequence an entire mammalian genome are dropping rapidly, but remain too high for most applications. Most researchers working with mammalian genomes balance these costs by sequencing only a subset of the genome. For a smaller genome, however, whole genome sequencing can be an attractive solution. As a rule of thumb, de novo assembly should be undertaken primarily using long reads. For genomes with a reference available, short reads will usually work just as well.

Platform Information

The GSL offers Illumina, 454, and SOLiD sequencing, any of which may be appropriate for whole genome sequencing depending on the size of the genome and the goals of the experiment. Samples can be indexed so that multiple samples run per lane (or plate, etc.).

Platform                                 Read Type  Sample Unit  Units/Run  Expected Reads  Read Length
Illumina HiSeq 2000                      short      lane         16         200 million     50 or 100bp
Illumina Genome Analyzer IIx             short      lane         8          25 million      36 or 72bp
Applied Biosystems SOLiD 4               short      slide        1          400 million     50bp
Roche 454 Genome Sequencer FLX Titanium  long       region       2          500 thousand    400 - 600bp
Ion Torrent PGM                          long       chip         1          190 thousand    ~100bp

The "Sample Unit" for each platform in this table is the most natural way to run a single sample on that platform; the units are not necessarily equivalent in terms of cost. The number of expected reads is for a single-end sequencing run. In paired-end or mate-pair mode, the number of reads will double.
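To judge whether one sample unit is enough for a given genome, the read counts in the table can be turned into a rough average depth of coverage: total sequenced bases divided by genome size. The sketch below is illustrative only. The function name, the 120 Mb genome size, and the assumption that indexed samples share a unit evenly are not from the GSL, and real yields vary from run to run.

```python
# Expected single-end reads per sample unit and read length (bp), from the table.
# Where two read lengths are offered, the longer one is used here.
PLATFORMS = {
    "Illumina HiSeq 2000": (200_000_000, 100),
    "Illumina Genome Analyzer IIx": (25_000_000, 72),
    "Applied Biosystems SOLiD 4": (400_000_000, 50),
}


def expected_coverage(reads, read_length, genome_size,
                      paired=False, samples_per_unit=1):
    """Rough average depth: total sequenced bases / genome size."""
    if paired:
        reads *= 2  # paired-end or mate-pair mode doubles the read count
    # Assumption: indexed samples split the unit's reads evenly.
    reads_per_sample = reads / samples_per_unit
    return reads_per_sample * read_length / genome_size


# Hypothetical example: a 120 Mb genome (assumed size) on one HiSeq 2000 lane.
reads, length = PLATFORMS["Illumina HiSeq 2000"]
single = expected_coverage(reads, length, genome_size=120_000_000)
indexed = expected_coverage(reads, length, genome_size=120_000_000,
                            samples_per_unit=4)
print(f"one sample per lane:   {single:.1f}x")
print(f"four samples per lane: {indexed:.1f}x")
```

The estimate ignores reads that fail quality filters or do not align, so it is best treated as an upper bound when planning a run.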

Sample Submission Requirements

A total of 3µg genomic DNA (or high-quality WGA DNA) should be provided in 55µl of 10mM Tris, Qiagen EB, TE, or dH2O. As little as 2.5µg can be used if sample quantities are limited, and amplification methods are available for very small inputs of gDNA. DNA samples need no special treatment, provided that they were NOT isolated from whole blood collected in heparin tubes. Any other standard DNA isolation method that yields clean, intact, high molecular weight DNA is appropriate. If possible, please send 5µg of sample, as we find that users' quantitation methods tend to over-estimate amounts by as much as 30-50%. The GSL will NOT pool samples if additional sample is needed to perform the assay.
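The arithmetic behind the 5µg request can be sketched as follows. This is a minimal illustration that assumes "over-estimate by 30-50%" means the measured value equals the true amount multiplied by 1.3 to 1.5; the function name is hypothetical and the interpretation is ours, not a GSL formula.

```python
REQUIRED_UG = 3.0  # standard amount requested above
MINIMUM_UG = 2.5   # lower limit when sample is scarce


def actual_amount_ug(measured_ug, overestimate_fraction):
    """True amount if the measurement overstates it by the given fraction.

    Assumption: measured = actual * (1 + overestimate_fraction).
    """
    return measured_ug / (1.0 + overestimate_fraction)


for measured in (3.0, 5.0):
    for overestimate in (0.30, 0.50):
        actual = actual_amount_ug(measured, overestimate)
        print(f"measured {measured:.1f} ug, {overestimate:.0%} over-estimate: "
              f"{actual:.2f} ug actual, "
              f"meets 3 ug: {actual >= REQUIRED_UG}, "
              f"meets 2.5 ug: {actual >= MINIMUM_UG}")
```

Under this assumption, a sample measured at 5µg still contains more than 3µg in the worst case, whereas a sample measured at exactly 3µg could fall below even the 2.5µg minimum.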

GSL Whole Genome Sequencing Data

A quality assessment of the sequencing run and alignment of reads with BWA to a standard genome (human, rat, mouse, or Drosophila) are included in the cost. Because of the size of sequencing data, when an alignment is done only the .bam files (and the accompanying .bai index files) are provided as results. Fastq files are not provided in addition, since they would duplicate the data. Users wishing to analyze the raw data themselves can easily regenerate the original fastq files using the bam2fastq program.
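The reason nothing is lost by delivering only .bam files is that each alignment record still carries the read name, bases, and base qualities. The sketch below illustrates the idea on SAM text lines (the tab-separated text form of BAM, as produced for example by `samtools view`). It is not the bam2fastq program's implementation; the function name and the example record are hypothetical, and paired-end mate handling is left out for brevity.

```python
# Complement table for reads stored on the reverse strand
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

# Bits of the SAM FLAG field used here
FLAG_REVERSE = 0x10         # sequence is stored reverse-complemented
FLAG_SECONDARY = 0x100      # secondary alignment of an already-listed read
FLAG_SUPPLEMENTARY = 0x800  # supplementary piece of a split alignment


def sam_line_to_fastq(line):
    """Convert one SAM alignment line to a FASTQ record, or None to skip it."""
    fields = line.rstrip("\n").split("\t")
    name, flag = fields[0], int(fields[1])
    seq, qual = fields[9], fields[10]
    if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return None  # avoid writing the same read more than once
    if flag & FLAG_REVERSE:
        # Restore the orientation in which the read came off the sequencer.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"


# Hypothetical record: a 4 bp read aligned to the reverse strand (flag 16)
example = "read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tAACG\tIIH#"
print(sam_line_to_fastq(example), end="")
```

Reads aligned to the reverse strand are reverse-complemented back, with their quality strings reversed, so the output matches the original orientation of the sequenced read.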

©2010-11 HudsonAlpha Institute for Biotechnology
genomics@hudsonalpha.org