Transcript Discovery


Transcript discovery experiments aim to characterize the expression of known mRNA transcripts in a sample as well as novel and non-coding RNA species. Discovery of novel transcripts is usually expected, unlike transcript profiling, which targets only mRNA.

Platform Information

For transcript discovery, the GSL uses the NuGEN Ovation RNA-Seq System, which creates cDNA from total RNA, followed by sequencing on the Illumina platform. Both mRNA and non-polyadenylated species are targeted using specialized primers designed to avoid amplifying rRNA species. Once cDNA is created, samples can be indexed so that multiple samples can be sequenced per lane if desired. However, for most standard RNA-Seq experiments aimed at transcript discovery, the GSL recommends running one sample per lane on a 50bp paired-end HiSeq run to generate approximately 80 million paired-end reads. The additional coverage will likely be necessary for true transcript discovery and increases confidence in the results. The following sequencing conditions are available: 36bp or 72bp (~25-40 million PE reads); 50bp or 100bp (~80 million PE reads).
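As a rough check on what these run configurations deliver, the arithmetic can be sketched in Python. This is only an illustration: it assumes that "paired-end reads" counts read pairs (two reads per pair), which the text does not state explicitly, and the function name `total_bases` is hypothetical.

```python
def total_bases(read_pairs: int, read_length_bp: int) -> int:
    """Total bases sequenced, assuming each pair contributes two reads."""
    return read_pairs * 2 * read_length_bp


# Recommended configuration: ~80 million pairs of 50bp reads.
recommended = total_bases(80_000_000, 50)
print(f"50bp PE, 80M pairs: {recommended / 1e9:.1f} Gb")

# Lower-yield configuration: ~25-40 million pairs of 36bp reads.
low = total_bases(25_000_000, 36)
high = total_bases(40_000_000, 36)
print(f"36bp PE, 25-40M pairs: {low / 1e9:.1f}-{high / 1e9:.1f} Gb")
```

Under these assumptions the recommended run yields about 8 Gb per sample, several times the output of the 36bp configuration.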

Sample Submission Requirements

RNA-Seq samples from different projects are batched together in groups of 24 or more to allow indexing, so if a project submits fewer than 24 samples, there may be a delay until additional samples fill the batch. Submission requirements are 100ng of total RNA in 10ul of 10mM Tris or Qiagen EB. If these conditions are not met, a fee of $25/sample will be charged. For best results, RNA should be intact and free of contaminants, with a Bioanalyzer RIN of 7 or higher. Standard sequencing conditions for transcript discovery via RNA-Seq are one sample per lane of a 50bp paired-end HiSeq run, yielding ~80 million paired-end reads per sample. More or less sequencing can be accommodated by special request.
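Before submitting, the requirements above can be checked with a short script. The following is a minimal sketch, not a GSL tool: the `Sample` fields and the `review` function are hypothetical, and it assumes that "100ng" means at least 100ng, that the volume must be 10ul, and that the $25 fee applies once per non-compliant sample. The RIN threshold is treated as a recommendation only, as in the text.

```python
import math
from dataclasses import dataclass

MIN_RNA_NG = 100.0                             # total RNA per sample
VOLUME_UL = 10.0                               # required volume
ACCEPTED_BUFFERS = {"10mM Tris", "Qiagen EB"}  # accepted buffers
MIN_RIN = 7.0                                  # recommended, not required
FEE_PER_SAMPLE_USD = 25                        # charged per failing sample (assumed)
BATCH_SIZE = 24                                # samples needed to fill a batch


@dataclass
class Sample:
    name: str
    rna_ng: float
    volume_ul: float
    buffer: str
    rin: float


def meets_requirements(s: Sample) -> bool:
    """True if mass, volume, and buffer match the stated requirements."""
    return (
        s.rna_ng >= MIN_RNA_NG
        and math.isclose(s.volume_ul, VOLUME_UL)
        and s.buffer in ACCEPTED_BUFFERS
    )


def review(samples: list) -> dict:
    """Summarize likely fees, low-RIN samples, and possible batching delay."""
    failing = [s.name for s in samples if not meets_requirements(s)]
    low_rin = [s.name for s in samples if s.rin < MIN_RIN]
    return {
        "fee_usd": FEE_PER_SAMPLE_USD * len(failing),
        "failing": failing,
        "low_rin": low_rin,
        "may_wait_for_batch": len(samples) < BATCH_SIZE,
    }


samples = [
    Sample("A", 120.0, 10.0, "Qiagen EB", 8.2),
    Sample("B", 60.0, 10.0, "10mM Tris", 6.5),
]
report = review(samples)
print(report)
```

In this example, sample B falls short on RNA mass and RIN, and the two-sample project may wait for other submissions to fill a batch.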

GSL RNA-Seq Data

All demultiplexing (i.e. the sorting of indexed reads by sample) is included in the cost of basic RNA-Seq, with users receiving fastq files for each sample. Additional fees are charged for analysis, including alignment. The current GSL pipeline for RNA-Seq analysis can include alignment with TopHat against the genomes of standard organisms (human, mouse, rat) and differential gene expression analysis with Cufflinks. If a non-standard organism is sequenced and analysis help is requested, an additional fee is charged to cover manipulation and installation of the new genome.
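To illustrate what demultiplexing involves, here is a minimal Python sketch that sorts FASTQ records by index sequence. It is not the GSL's procedure: it assumes the index is the last colon-separated field of each header line (as in newer Illumina headers), matches indexes exactly with no mismatch tolerance, and uses a hypothetical `demultiplex` function with made-up index-to-sample assignments.

```python
from collections import defaultdict
from itertools import islice


def demultiplex(fastq_lines, index_to_sample):
    """Group 4-line FASTQ records by sample using the index in the header."""
    by_sample = defaultdict(list)
    lines = iter(fastq_lines)
    while True:
        record = list(islice(lines, 4))  # header, sequence, '+', qualities
        if len(record) < 4:
            break  # end of input (or a truncated trailing record)
        header = record[0].rstrip("\n")
        index = header.rsplit(":", 1)[-1]  # assumed header layout
        sample = index_to_sample.get(index, "undetermined")
        by_sample[sample].append(record)
    return dict(by_sample)


# Toy input: three reads, one with an index not assigned to any sample.
reads = [
    "@r1 1:N:0:ATCACG", "ACGT", "+", "IIII",
    "@r2 1:N:0:CGATGT", "TTGA", "+", "IIII",
    "@r3 1:N:0:GGGGGG", "CCCC", "+", "IIII",
]
groups = demultiplex(reads, {"ATCACG": "sample_A", "CGATGT": "sample_B"})
for sample, records in sorted(groups.items()):
    print(sample, len(records))
```

Reads whose index matches no sample are collected under "undetermined" rather than discarded, so they can be inspected later.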

Please note that analysis of RNA-Seq data, even when provided at the 'finished' level of Cufflinks output, is still a work in progress and is computationally intensive. End users must be willing to delve into the provided output and interpret the results rather than simply relying on a fold-change Excel spreadsheet of differentially expressed genes. This is especially true for transcript discovery RNA-Seq experiments, in which annotation will be spotty or even non-existent for many of the RNA species identified.

©2010-11 HudsonAlpha Institute for Biotechnology
genomics@hudsonalpha.org