The Evolution of DNA Sequencing Technology
Part 3: Third-generation sequencing and beyond
by: Sarah Sharman, PhD, Science writer
Although next-generation sequencing greatly improved the ability to sequence large, complex genomes at a low cost, it still has some limitations. Next-generation sequencing platforms can sequence a lot of DNA at once, but they typically produce short DNA reads of about 50 to 500 base pairs in length (remember the whole human genome has 3 billion bases). Other steps in next-generation sequencing, like template amplification, also lead to errors and information loss.
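To get a feel for the gap between read length and genome size, here is a small back-of-the-envelope sketch in Python. It uses only the figures quoted above (reads of 50 to 500 bases, a genome of 3 billion bases). The function name is made up for illustration, and the calculation is deliberately idealized: it assumes reads line up end to end with no overlap and no errors, whereas real sequencing projects read each position many times over.

```python
import math

# Figure taken from the text: the human genome has about 3 billion bases.
HUMAN_GENOME_BASES = 3_000_000_000


def reads_for_one_pass(genome_bases: int, read_length: int) -> int:
    """Fewest reads needed to cover a genome once, end to end.

    Idealized assumption: reads never overlap and contain no errors.
    """
    return math.ceil(genome_bases / read_length)


# The short-read range quoted in the text: about 50 to 500 bases.
for read_length in (50, 500):
    n_reads = reads_for_one_pass(HUMAN_GENOME_BASES, read_length)
    print(f"{read_length}-base reads: at least {n_reads:,} reads")
```

Even in this best case, millions of short reads have to be pieced back together, which is part of why read length matters so much.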
In Part 3 of The Evolution of DNA Sequencing Technology blog series, we will discover how scientists are still improving sequencing technology, and we will discuss how DNA sequencing impacts the world around us.
Sequencing a single molecule at a time
To address the limitations of next-generation sequencing, scientists began developing real-time, single-molecule DNA sequencing platforms. In 2009, two vastly different single-molecule sequencing technologies emerged: single-molecule real-time sequencing (SMRT-seq by Pacific Biosciences) and nanopore sequencing (Oxford Nanopore Technologies).
Pacific Biosciences’ SMRT platform relies on the accuracy and speed of DNA polymerase, the enzyme that adds bases to growing strands of DNA. The bottom of each well of the SMRT platform has polymerase attached to it. As labeled nucleotides are introduced into the well, polymerase adds them to a single-stranded piece of DNA. Since each base is labeled with a different color, scientists can tell which letter was added to the DNA strand in real time.
Nanopore sequencing, used in platforms like Oxford Nanopore Technologies’ MinION, is based on the fact that each nucleotide base is a different size and has different electrical properties. The wells of the machine measure the electrical current changes that occur when single-stranded DNA passes through tiny nanopores on the surface. Each base has its own electrical signature that the machine measures and records in real time.
Long-read sequencing has a number of advantages over next-generation sequencing, most notably, its ability to produce long reads of up to tens of thousands of bases. Long reads contain more information compared to short reads produced by other platforms. The human genome has many complex regions, such as stretches of repeated bases, that are hard to interpret. With short reads of 200 bases or less, it is hard to tell where the repetitive region starts and stops. However, long reads that include the entire repetitive region can tell you where the repeats start and end.
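The repeat problem can be shown with a toy example. The Python sketch below builds two made-up DNA sequences that differ only in how many times a three-letter unit is repeated, then compares every possible short read and every possible long read from each. Everything here is an assumption chosen for illustration: the sequences, the repeat unit, and the read lengths (12 and 40 bases stand in for "short" and "long"). It also ignores sequencing errors and how often each read is seen.

```python
def all_reads(genome: str, read_length: int) -> set[str]:
    """Every possible error-free read of a given length, as a set of substrings."""
    last_start = len(genome) - read_length
    return {genome[i:i + read_length] for i in range(last_start + 1)}


# Hypothetical flanking sequences on either side of the repeat.
LEFT, RIGHT = "ACGTTGCA", "TGACCGTA"

# Two toy genomes that differ only in the number of "CAG" repeats.
genome_a = LEFT + "CAG" * 10 + RIGHT
genome_b = LEFT + "CAG" * 12 + RIGHT

# Short reads (12 bases) never reach from one flank to the other,
# so both genomes yield exactly the same collection of reads.
short_same = all_reads(genome_a, 12) == all_reads(genome_b, 12)

# Long reads (40 bases) can span the whole repeat in genome_a,
# so the two genomes can be told apart.
long_same = all_reads(genome_a, 40) == all_reads(genome_b, 40)

print("Short reads identical:", short_same)
print("Long reads identical:", long_same)
```

In this toy case the short reads cannot reveal whether there are 10 or 12 repeats, while a long read that includes both flanking sequences settles the question.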
Because long-read sequencing uses native, or unprocessed, DNA, scientists can also detect DNA epigenetic modifications like methylation that change the function of DNA without changing its sequence. Longer reads are valuable for certain applications like de novo sequencing and rare variant detection. (To learn how scientists at HudsonAlpha have begun incorporating long-read sequencing technology into their sequencing arsenal, see Research Highlights #3 and #4.)
Even though third-generation sequencing platforms provide invaluable long reads, they are still very expensive. Scientists and engineers are actively working to improve these technologies so that one day we might have a sequencing platform that produces highly accurate long reads at a fraction of today’s cost.
How does DNA sequencing impact you?
Now that you know about the evolution of DNA sequencing technology, you may be asking yourself, ‘How does this technology affect me?’ Because DNA is the basis for much of life on earth, understanding the sequence of a plant or animal’s DNA can provide valuable information about health and disease, development, ancestry, evolution, the environment, and much more.
In the decades since the development of the first rapid DNA sequencing technologies, scientists and clinicians have used DNA sequencing to make monumental discoveries across many fields of science— from drug discovery to cancer research, rare disease diagnosis to environmental studies, agricultural research to infectious disease research.
At the HudsonAlpha Institute for Biotechnology, scientists use DNA sequencing technology and the genomic sciences to improve the human condition around the globe. The scientists at HudsonAlpha are constantly making meaningful contributions to their fields—in fact, it would probably take days to cover them all. In the next few sections, we will discuss a few of the real-world applications of DNA sequencing technology.
Human health and disease research
Prior to the completion of the first human genome, progress in understanding the cause of many complex diseases and disorders was slow. Equipped with the human genome sequence, scientists are now rapidly identifying specific DNA variants that can impact the development of symptoms or disease, increase your risk of developing disease, or affect how your body responds to the foods you eat, environmental exposures like sunlight and pollution, infectious agents, therapeutics and medications.
Discoveries in genetic research, made possible in part by DNA sequencing technology, have unearthed tremendous opportunities in the diagnosis, detection, and treatment of countless diseases and disorders. The genetic basis for hundreds of single-gene disorders has been discovered. Now, scientists are hard at work trying to identify which genes contribute to conditions like obesity and heart disease that arise due to more than one gene variant.
Scientists at HudsonAlpha apply genomic technology to uncover the genetic causes of a variety of diseases including cancer, psychiatric disease, neurological disorders, childhood diseases, and autoimmune diseases. Identifying disease-causing genetic variants allows the scientists to find or develop specific treatments for the disease. For example, HudsonAlpha Faculty Investigator Sara Cooper’s lab focuses on identifying genes associated with chemotherapy resistance in pancreatic and ovarian cancer. They are collaborating with HudsonAlpha resident company CFD Research to develop drugs to target some of the chemotherapy resistance targets they have found.
Insight into disease-causing genetic variants, coupled with advanced genomic sequencing technology, also allows scientists and clinicians to detect and treat some genetic diseases early in disease progression. As an example, Faculty Investigators Richard M. Myers, PhD, and Devin Absher, PhD, discovered biomarkers that are very strong predictors of renal cell carcinoma, the most common type of kidney cancer in adults. Myers’ lab is currently looking for predictive blood biomarkers for several neurodegenerative diseases.
Today, scientists and physicians recognize the potential of genomics to revolutionize medicine by giving physicians the ability to administer personalized healthcare based on an individual’s genome. Genomics enhances physicians’ ability to determine the causes of inherited diseases and advance the understanding of complex disorders.
Genomic testing can provide insights into disease risk and intervention, help make better health decisions, and even target therapies that lead to better health outcomes. Single gene testing can be performed if a physician believes a patient has a specific inherited condition, or when there is a known genetic mutation in a family. A panel genetic test looks for changes in many genes in one test. These are usually grouped in categories based on the medical concern, like breast cancer or epilepsy.
Exome and genome sequencing look at all of the genes in protein-coding DNA (exome) or your entire genetic code (genome). Oftentimes, large-scale genetic testing like exome or genome sequencing is used when the patient has an undiagnosed disease that has not been solved using other types of clinical tests. The Smith Family Clinic for Genomic Medicine is a stand-alone medical office residing on the HudsonAlpha campus that integrates whole genome sequencing into the routine practice of genetics, something that not many clinics are doing.
The genetic tests detailed above are all generally ordered by a physician who is trying to diagnose disease or provide preventative care for a patient with a family history of disease. However, because of the falling cost of DNA sequencing technology and the new insight into the role of DNA variants in health and disease, entrepreneurs also market genetic testing to customers who are not necessarily ill or at high risk for a disease, but who are looking to learn a little more about their genome.
This growing industry, called direct-to-consumer genetic testing, became popular in the 2000s with 23andMe and deCODEme offering direct-to-consumer personal genetic tests beginning in 2007. Customers send the company a DNA sample (usually saliva) and receive results directly from the company. By looking for specific genetic variations, the direct-to-consumer genetic testing companies offer predictions about health, provide information about common traits, and offer clues about a person’s ancestry.
Rare disease diagnosis
Genomic sequencing technology also benefits patients with rare diseases. Although each rare disease affects a small number of people (as the name suggests), rare diseases as a whole affect over 350 million people. Patients with rare diseases often embark on years-long diagnostic journeys, visiting doctor after doctor and often receiving misdiagnoses.
In many cases if the disease is genetic in nature, whole genome sequencing can help avoid this long and painful path to diagnosis. By identifying the specific changes to the patient’s DNA that are causing the disease, clinicians can often provide the patient with a diagnosis.
Scientists at HudsonAlpha use whole genome sequencing as a routine part of many research studies. As part of The Clinical Sequencing Exploratory Research (CSER) program, HudsonAlpha faculty researchers are working with physicians, scientists and genetic counselors at partner institutions to sequence the genomes and exomes of hundreds of North Alabama children and their families with the goal of providing a diagnosis to the children.
SouthSeq is a collaborative research effort among researchers and clinicians at HudsonAlpha, the University of Alabama-Birmingham, and the University of Mississippi at Jackson that tries to find the reason for medical problems among newborns in neonatal intensive care units. By making diagnoses earlier in the child’s life, doctors can provide better care and treatments to them.
Through these research projects, scientists at HudsonAlpha and their collaborators have sequenced more than 1,467 affected children and approximately 1,511 parents. They have found a genetic cause for about 27 percent of the affected children, leading to a more precise and definitive clinical diagnosis.
The applications of genomic sequencing technology reach wider than human health discoveries. Genomics can also help to improve the foods we eat, the energy we use, and the clothes we wear. Plants produce the food, fiber and fuel that sustain our lives. However, the demand for crop production is rising due to the increasing human population worldwide, greater food consumption, and the rise of biofuel use.
To keep up with demand, growers are always looking for new varieties of a crop that can better withstand drought, pests, and pathogens, while still producing a high-quality product. HudsonAlpha applies the techniques of genomic research to plants and agriculture with the ultimate goal of accelerating discoveries in crops, improving sustainable crops, and developing new scientific methods that will change the way we grow and use plants in agriculture.
By combining their expertise in evolutionary biology, genome editing, and plant genomics, scientists at the HudsonAlpha Center for Plant Science and Sustainable Agriculture created a pipeline to accelerate plant breeding programs to develop the next generation of crops. This pipeline relies heavily on the strong foundation of the high-quality genome sequencing generated at the Institute.
Advances in computational biology and plant biotechnology allow HudsonAlpha faculty to identify key genes related to important crop traits such as increased yields, drought tolerance, or pest resistance. The beneficial traits can be introduced into existing crop lines using accelerated breeding programs or precision genome editing. Current research focuses include peanuts, cotton, the common bean, barley, poplar trees, pecans, duckweed, and bioenergy crops like switchgrass, miscanthus, and sugarcane.
By using the ever-advancing next-generation and third-generation sequencing technologies, scientists at the HudsonAlpha Genome Sequencing Center (GSC) sequence complex plant genomes. Many people do not know this, but some plants have genomes that are much larger than the human genome. The loblolly pine, for example, has a genome of over 20 billion bases. As of early 2021, the GSC has sequenced reference genomes for more than 175 plants, approximately half of the plants sequenced as high-quality references worldwide. Reference genomes serve as a point of comparison for future studies and lay the foundation for downstream functional studies for the improvement and production of domesticated crops.
On the human health side, several HudsonAlpha labs, including that of Faculty Investigator Greg Cooper, PhD, are using long-read sequencing technology for human disease research. In a recent study, Cooper’s lab used long-read sequencing technology to help physicians make diagnoses for two pediatric patients affected by neurodevelopmental disorders.
The team used long-read sequencing to re-analyze the genomes of six family trios (mom, dad, and the affected child), each with a child affected by a neurodevelopmental disorder. Although each family had previously had their genomes sequenced with short-read technology, no causal genetic variant had been identified. With long reads, the team found thousands of genetic variants in each family that had previously been missed. Among them, the team identified likely pathogenic variation in two of the six children.
Scientists in the HudsonAlpha Center for Plant Science and Sustainable Agriculture are leaders in de novo sequencing, or the process of finding the DNA sequence of a species that has never been sequenced before. Because de novo genomes are constructed without using a reference genome as a template, long-read sequencing technology helps the researchers assemble the genomes more accurately than short-read technology.
Recently, several HudsonAlpha scientists, along with collaborators at a number of other institutes, used long-read sequencing technology to improve a genome they had been sequencing for well over a decade. The new and improved switchgrass genome allows the scientists to spot regions in the genome that are associated with important traits like cold tolerance. Once they identify these regions, breeders can use them to develop new strains of switchgrass, which is a promising biofuel crop.