In his nearly eight years working in peanut breeding and genetics, HudsonAlpha Institute for Biotechnology faculty investigator Josh Clevenger, PhD, has grown passionate about improving crops for more robust and sustainable agriculture. This led to a collaboration with HudsonAlpha computational biologist Walid Korani, PhD, and the creation of a computational tool called Khufu that quickly and accurately identifies and analyzes variants in complex plant genomes.
“I wanted to bridge the gap between science and nature by more rapidly introducing beneficial traits into cultivated crops that farmers can plant on their land,” Clevenger explained. To do this, his team at HudsonAlpha has developed a better computational tool to help identify genetic factors to select for beneficial traits, and new, rapid breeding practices to introduce these traits into existing crop lines.
To map traits to genes, the DNA sequences of the plants being studied must be lined up against a reference genome. In complex plant genomes, software struggles to map short DNA reads to the reference and to accurately identify molecular markers, such as single-nucleotide polymorphisms (SNPs), that correlate with an observed trait.
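The mapping difficulty described above can be illustrated with a toy sketch. It assumes a made-up reference string and a naive exact-match search; the function name `find_exact_matches` and the sequences are hypothetical, and real aligners (and Khufu itself) work very differently. It only shows why a short read from a repeated region cannot be placed unambiguously.

```python
def find_exact_matches(reference: str, read: str) -> list[int]:
    """Return every 0-based position where `read` occurs exactly in `reference`."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Continue searching one base further so overlapping hits are found too.
        start = reference.find(read, start + 1)
    return positions


# Hypothetical reference containing the same 7-base segment twice,
# standing in for a repeated region of a complex genome.
reference = "TTGCA" + "GATTACA" + "CCGGT" + "GATTACA" + "AGCTT"

repeat_read = "GATTACA"  # falls entirely inside the repeated segment
unique_read = "CACCGG"   # spans the edge of a repeat into unique sequence

repeat_hits = find_exact_matches(reference, repeat_read)
unique_hits = find_exact_matches(reference, unique_read)

print("repeat read maps to:", repeat_hits)  # two candidate positions
print("unique read maps to:", unique_hits)  # one position
```

A read with more than one candidate position gives a variant caller no way to tell which copy a difference belongs to, which is one source of falsely identified SNPs.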
After years of struggling with identifying SNPs in peanuts, Clevenger joined with Korani to develop a solution, the new computational tool Khufu (www.hudsonalpha.org/khufudata/plant-improvement).
Khufu uses low-coverage, short-read sequencing data to provide genotyping results at a fraction of the normal cost. Its novel approach identifies SNPs with high accuracy even from very low-coverage sequence data. Because each individual requires less sequence, whole-genome sequencing becomes practical even for small breeding programs. Genome-wide markers greatly increase the power and precision of trait mapping and integration, and Khufu makes these tools available to a wider array of breeders and geneticists.
“Collaborators can expect fast results, with analyses being done within days of the sequence being generated. Khufu is not a data generation service, but a data analysis service that can provide sequencing for a low cost or can analyze generated sequences for different applications,” says Clevenger. “For a similar cost to the raw output of SNP arrays, Khufu provides full analysis of genotypes, provides marker targets for traits of interest, and saves valuable time. Khufu is a highly accurate informatics platform that outperforms published methods.”
Identifying SNPs from low-depth Illumina™ short reads of plant samples is challenging because most plant genomes are polyploid and contain large repetitive regions. Even hard-filtering approaches discard many informative SNPs.
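The trade-off with hard filtering can be sketched in a few lines. This is a toy illustration under stated assumptions, not Khufu's algorithm: the `Site` record, the `hard_filter` function, the thresholds, and the read counts are all hypothetical, and the `is_true_snp` flag exists only because the data are invented.

```python
from dataclasses import dataclass


@dataclass
class Site:
    position: int
    ref_reads: int     # reads supporting the reference allele
    alt_reads: int     # reads supporting the alternate allele
    is_true_snp: bool  # ground truth, known only in this toy data


def hard_filter(sites, min_depth=8, min_alt_fraction=0.2):
    """Keep sites that pass fixed depth and allele-fraction cutoffs."""
    kept = []
    for site in sites:
        depth = site.ref_reads + site.alt_reads
        if depth < min_depth:
            continue  # too few reads to trust the call
        if site.alt_reads / depth < min_alt_fraction:
            continue  # alternate allele looks like sequencing noise
        kept.append(site)
    return kept


# Hypothetical low-coverage candidate sites.
sites = [
    Site(101, ref_reads=6, alt_reads=6, is_true_snp=True),    # well covered
    Site(250, ref_reads=1, alt_reads=2, is_true_snp=True),    # real, shallow
    Site(377, ref_reads=2, alt_reads=1, is_true_snp=True),    # real, shallow
    Site(512, ref_reads=18, alt_reads=2, is_true_snp=False),  # noise
]

kept = hard_filter(sites)
kept_positions = {s.position for s in kept}
lost_true = [s.position for s in sites
             if s.is_true_snp and s.position not in kept_positions]

print("kept:", [s.position for s in kept])
print("true SNPs lost to the filter:", lost_true)
```

In this toy data the fixed cutoffs remove the noisy site but also drop the two real SNPs that happen to be shallowly covered, which is the loss of informative markers described above.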
“Khufu created a series of algorithms that efficiently extract falsely-identified SNPs, making it 99.9 percent accurate at identifying SNPs correlated to a given trait in both plant and animal populations. In addition, Khufu utilizes computational resources very efficiently to significantly speed up the calling process,” Korani explains.
Khufu was originally developed for Clevenger and Korani’s work with peanut populations, but re-analysis of large datasets from previous studies made clear how well it works across a variety of species and populations. Khufu is effective in genotyping small or large numbers of plants, regardless of population structure or genome complexity (https://www.biorxiv.org/content/10.1101/2021.03.13.435236v1).
“We hope that by offering other researchers the ability to use this low-cost, highly accurate computational software, we can help to advance genomic research in many different fields,” Clevenger said.