Advancing genomic research: the power of high-quality reference genomes
The rapid advancement of genome sequencing technology has revolutionized our understanding of life at the molecular level. A cornerstone of this revolution is the development of high-quality reference genomes. These critical genomic tools serve as fundamental blueprints for studying genetic variation, identifying disease-causing mutations, and exploring evolution.
A reference genome is a comprehensive map of an organism’s genetic code. It provides a standardized reference point for comparing and analyzing the genomes of other individuals of the same species. By aligning individual genomes to a reference, scientists can identify variations, such as single nucleotide polymorphisms (SNPs) and structural variants, that may contribute to phenotypic differences or disease susceptibility.
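To make the idea of comparing an individual to a reference concrete, here is a minimal Python sketch. It assumes the two sequences are already aligned and of equal length, which skips the alignment step that real pipelines perform with dedicated tools. The function name and the short example sequences are hypothetical and chosen only for illustration.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumption: both sequences are already aligned and have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


# Hypothetical toy sequences, not real genomic data.
reference = "ACGTTGCAAC"
sample = "ACGTCGCAAT"
snps = find_snps(reference, sample)
for position, ref_base, alt_base in snps:
    print(f"position {position}: {ref_base} -> {alt_base}")
```

Each reported position is a single-base difference from the reference; structural variants, which span many bases, need more involved methods than this base-by-base comparison.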
In plant biology, reference genomes are important for identifying genes associated with traits such as disease resistance, yield, and quality. They also facilitate marker-assisted selection, a technique that allows breeders to select plants with desirable traits more efficiently.
Creating a reference genome from scratch is a complex and computationally intensive task. It involves sequencing an organism’s entire genome and then assembling the resulting DNA sequences into longer, contiguous sequences. This process requires sophisticated algorithms and powerful computing resources.
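The assembly step can be illustrated with a toy greedy overlap merger in Python. This is a simplified sketch, not the method used by any particular sequencing center: production assemblers must also cope with sequencing errors, repeats, and billions of reads. The function names, the minimum overlap of three bases, and the example reads are all assumptions made for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [
            c for k, c in enumerate(contigs) if k not in (best_i, best_j)
        ] + [merged]


# Hypothetical short reads sampled from one toy sequence.
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
contigs = greedy_assemble(reads)
print(contigs)
```

The nested comparison of every read against every other read hints at why assembly is computationally intensive: the work grows quickly as the number of reads increases.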
HudsonAlpha scientists have long been at the forefront of genome sequencing and assembly. They have made significant contributions to the field by generating high-quality reference genomes for a wide range of organisms, including plants, animals, and microorganisms. Below are some of the noteworthy reference genomes recently published by HudsonAlpha scientists.
Cotton
Cotton is a cornerstone of the global textile industry, providing employment and income for millions worldwide. Cotton breeders have improved fiber yield and quality over the years using traditional breeding methods. However, achieving additional improvements, like drought tolerance and resistance to emerging pests and diseases, may be difficult due to the lack of genetic variation across modern domesticated cotton. Creating new genomic tools for the cotton industry will help take cotton improvements to the next level.
Researchers from the HudsonAlpha Genome Sequencing Center (GSC) created high-quality genome sequences for three key cotton varieties, providing invaluable tools for breeders to develop more resilient and productive crops. These new reference genomes offer a deeper understanding of cotton’s genetic makeup, enabling the identification of genes associated with traits like yield, fiber quality, and disease resistance.
By leveraging these genomic resources, breeders can accelerate the development of cotton varieties that can withstand changing climates, pests, and diseases, ensuring a sustainable future for this vital crop. This groundbreaking research demonstrates the power of genomics to address global challenges and drive agricultural innovation.

Sugarcane
Sugarcane is an important global crop that faces increasing challenges from climate change and emerging pests and diseases. Traditional breeding methods have created new varieties of sugarcane that can grow in new environments and survive some pathogens; however, genome-directed breeding could help speed up the process of creating new varieties of sugarcane that can thrive in our changing world.
Until recently, sugarcane breeding could not benefit from genome-directed breeding because genomic tools did not exist. In 2024, the GSC, along with numerous international collaborators, released a high-quality reference genome for a common variety of sugarcane called R570. The GSC has a long track record of sequencing complex plant genomes, but the sugarcane genome is the most complicated genome they’ve assembled to date, having, on average, 12 copies of each chromosome and a total of 114 chromosomes with highly repetitive regions.
Having a high-quality reference genome is a game changer for the sugarcane industry. Using the genome, scientists have already discovered two genes that protect sugarcane from brown rust disease, a notorious foe for sugarcane breeders and farmers. The reference genome will help accelerate sugarcane breeding and the adaptation of sugarcane to our changing environmental conditions.

The Power of Pangenomes in Agricultural Breeding Programs
Genomics has revolutionized crop breeding by providing a deeper understanding of plants’ genetic makeup. This knowledge is crucial for ensuring global food security by increasing crop productivity, reducing reliance on pesticides, and adapting to the changing needs of a growing population. By analyzing the complete set of an organism’s DNA, scientists can pinpoint specific genes responsible for desirable traits like drought tolerance, disease resistance, and increased yield.
Historically, scientists relied on a single reference genome to represent an entire species in comparative analyses. However, this approach can mask significant genetic variation, so many researchers have turned to pangenomes, which provide a more comprehensive view of genetic diversity.
A pangenome is the full collection of genes and genetic elements found across the individuals of a species. It encompasses both the core genome, which is shared by all individuals of the species, and the accessory genome, which contains genes present in only some individuals. By analyzing pangenomes, scientists can gain insights into a species’ evolutionary history, identify genes associated with specific traits, and develop new breeding strategies.
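The split between core and accessory genomes maps naturally onto set operations. The Python sketch below assumes each individual has already been reduced to a set of gene identifiers, which is a large simplification of real pangenome construction. The individual and gene names are hypothetical.

```python
def split_pangenome(gene_sets: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (core, accessory) gene sets for a group of individuals.

    core      = genes present in every individual
    accessory = genes present in at least one individual, but not all
    """
    if not gene_sets:
        return set(), set()
    all_sets = list(gene_sets.values())
    pangenome = set().union(*all_sets)      # every gene seen anywhere
    core = set.intersection(*all_sets)      # genes shared by all individuals
    return core, pangenome - core


# Hypothetical gene content for three individuals of one species.
individuals = {
    "plant_A": {"g1", "g2", "g3"},
    "plant_B": {"g1", "g2", "g4"},
    "plant_C": {"g1", "g3", "g5"},
}
core, accessory = split_pangenome(individuals)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
```

In this toy example only one gene is shared by all three plants, so most of the diversity sits in the accessory genome, which is exactly what a single reference genome cannot show.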
HudsonAlpha scientists are at the forefront of pangenome research, particularly in plant genomics. They have made significant contributions to our understanding of genetic diversity in crops like barley, switchgrass, and millet.
Incorporating pangenomics into crop breeding programs
HudsonAlpha Faculty Investigator Josh Clevenger, PhD, is helping crop and animal breeding programs worldwide incorporate pangenomics into their breeding strategies. When breeding a new variety of plants or a new breed of livestock, breeders often begin with a founder population: a small group of individuals selected to serve as the genetic basis for the new varieties.
Pangenomes are especially helpful in breeding programs based on founder populations. They can reveal genes that are present in some founders but absent in the reference genome. These accessory genes might be crucial for adaptation or specific traits, and a pangenome ensures they are not overlooked. Pangenomes also help capture larger-scale genetic differences, such as insertions, deletions, and rearrangements of DNA. These structural variations can have significant effects on traits and are often missed by traditional methods.
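As a rough illustration of the first point, the Python sketch below lists genes that are carried by founders but missing from a reference, along with the founders that carry them. It is not a representation of Khufu or any other real platform; the function name, founder labels, and gene names are hypothetical, and gene content is assumed to be available as simple sets.

```python
def genes_missing_from_reference(
    reference_genes: set[str],
    founder_genes: dict[str, set[str]],
) -> dict[str, list[str]]:
    """Map each gene absent from the reference to the founders that carry it."""
    missing: dict[str, list[str]] = {}
    for founder, genes in founder_genes.items():
        for gene in genes - reference_genes:  # genes the reference lacks
            missing.setdefault(gene, []).append(founder)
    # Sort for stable, readable output.
    return {gene: sorted(carriers) for gene, carriers in sorted(missing.items())}


# Hypothetical reference and founder gene content.
reference = {"geneA", "geneB", "geneC"}
founders = {
    "founder_1": {"geneA", "geneB", "geneD"},
    "founder_2": {"geneA", "geneC", "geneD", "geneE"},
    "founder_3": {"geneA", "geneB", "geneC"},
}
hidden = genes_missing_from_reference(reference, founders)
for gene, carriers in hidden.items():
    print(f"{gene}: carried by {', '.join(carriers)}")
```

An analysis built only on the reference would never report the two extra genes here, even though two of the three founders carry at least one of them.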
Using their proprietary data analysis platform Khufu, the Clevenger team generates custom genomic tools for breeding programs, including pangenome graphs (KhufuPAN). With the pangenome graph, breeders can make more informed decisions about the selection, breeding or mating, and management of their founder populations, leading to more rapid genetic gain and the development of superior cultivars or livestock breeds.
The KhufuPAN technology allows for a more comprehensive and accurate analysis of genetic variation, which can benefit breeding programs. Clevenger and his team offer pangenome analysis that helps reveal novel genetic markers that may be associated with important traits and helps identify and correctly call genetic variants, even in complex regions of the genome.
So far, Clevenger and his lab have built pangenome graphs for collaborators for chicken, grape, hemp, pumpkin, watermelon, blueberry, oak, and peanut breeding programs. Through a collaboration with the Groundnut Improvement Network of Africa, Clevenger and his lab are also creating pangenomic resources for peanut breeders across the continent of Africa.
Wiregrass Peanut Project
Dr. Clevenger doesn’t just offer pangenomic services to collaborators; he is also using pangenomics as the basis for a peanut breeding program he leads in Dothan, Alabama. The Wiregrass Peanut Project is one of the first known breeding programs in the world to base breeding decisions on founder pangenomics.
A diverse set of peanut plants was used to start the Wiregrass Peanut Project breeding population. Clevenger and his team created a pangenome for this founder population. Now, when the students participating in the project sequence their individual peanut plants, they can compare the resulting genomes to the pangenome instead of a standard reference genome.
This higher-resolution comparison allows the students and the Clevenger lab to identify useful genomic variants that could confer benefits to peanut plants specific to the Wiregrass region, such as drought tolerance and disease resistance.
