A multi-laboratory collaboration to improve plant gene function descriptions across diverse plant species
A conversation with HudsonAlpha researcher Avinash Sreedasyam, PhD
by: Sarah Sharman, PhD
All life depends on the flow of genetic information from DNA to functional molecules in a cell. Transcription is a central step in the process, copying DNA sequences into RNA molecules, many of which serve as templates for protein synthesis. Proteins are the workhorses of the cell, performing essential functions in cellular structure, metabolism, signaling, and regulation. The transcriptome is the complete set of RNA molecules transcribed from all the genes within a cell or tissue at a specific time or under specific conditions. It represents the entire collection of expressed genes and their RNA products in an organism.
In plants, the transcriptome is dynamic and varies greatly depending on factors like plant species, tissue type, developmental stage, and responses to environmental stimuli or stressors. For example, different genes may be expressed in plant roots compared to leaves, and gene expression levels may change during plant growth or in response to temperature changes, water availability, or pathogen attacks.
Studying the plant transcriptome is crucial to understanding gene function, gene regulation, and the molecular mechanisms underlying various physiological processes and responses to environmental cues. Modern technologies, such as RNA sequencing (RNA-seq), have revolutionized transcriptome analysis, enabling researchers to explore the expression patterns of thousands of genes simultaneously and gain insights into the complex biology of plants.
One project spanning 15 years and more than 17 research groups set out to characterize the transcriptome in over a dozen plants. To learn more about the project, called JGI Plant Gene Atlas, I sat down with Avinash Sreedasyam, PhD, a senior scientist in the HudsonAlpha Genome Sequencing Center (GSC), who led the work recently published in Nucleic Acids Research.
Sarah Sharman: What is the JGI Plant Gene Atlas?
Avinash Sreedasyam: JGI Plant Gene Atlas is a huge updateable transcriptome resource spanning diverse plant species. It was developed to improve plant genome annotations at the US Department of Energy (DOE) Joint Genome Institute (JGI)*, a national user facility located at Lawrence Berkeley National Laboratory, and to add further gene function descriptions. This resource also helps in performing cross-species comparative transcriptomics.
*The HudsonAlpha Genome Sequencing Center works closely with scientists at the JGI. In fact, HudsonAlpha Faculty Investigator Jeremy Schmutz is the Plant Program Lead at the JGI.
Sarah: Why is a resource like this important to the field of plant genetics/genomics?
Avinash: Having a better handle on gene function helps identify the molecular targets for plant improvement. Surprisingly, about 16 to 56 percent of plant genes are poorly characterized, meaning they have no known function. This is due to the overreliance on a few species like Arabidopsis or rice as homology models for computational function predictions and also due to the inability to link experimental evidence across species. Centralized databases with large-scale transcriptome projects, such as Expression Atlas and Plant Public RNA-seq Database (PPRD), could help with understanding gene functional roles, but experimental inconsistency makes interpretation and integration across studies difficult. Our Plant Gene Atlas resource addresses that by providing standardized experimental conditions, tissue types, and analytical protocols that permit gene expression analysis across plants and add experimentally derived biological roles to genes.
Sarah: How many plants did you all look at?
Avinash: We started off with 12 plants, which are JGI flagship plants, mostly related to biofuels and feedstocks. And then we expanded that to include six more species, so in total, we are looking at 18 different species, which included over 2,000 RNA-seq libraries. As I previously mentioned, this is an updateable resource. To demonstrate that, we included datasets from two species, one of which is sweet sorghum Rio from a JGI Community Science Program project and another is Lupinus albus from a non-JGI project.
Sarah: I assume you were not doing this alone. How many groups were you working with?
Avinash: In the initial planning phase, there were about 10 groups that came together to standardize the experimental protocols. Then, in 2020, seven more groups joined the project to contribute data for six new species. I must mention that some of the members from the initial team contributed additional sample sets. Dr. John Mullet contributed sorghum internode time course data from four different genotypes. Dr. Tom Juenger and his postdoc Xiaoyu Weng from the University of Texas at Austin, who previously led work on Arabidopsis and Panicum hallii, contributed panicle time course data from two ecotypes of Panicum hallii as well as data from multiple switchgrass (Panicum virgatum) experiments.
Sarah: How important is collaboration in the big data and plant science community?
Avinash: Having collaboration within the plant science community is of utmost importance. The scope of handling 18 different species is beyond the capacity of a single lab. Growing different species poses significant challenges, as establishing standardized growth protocols demands time, and subjecting them to diverse conditions is a time-consuming process. Successfully accomplishing this requires a diverse team of specialists, each contributing their expertise to different aspects of the project.
Sarah: You all hit a big milestone getting the manuscript describing the atlas published in Nucleic Acids Research. Does this mean the project is over, or will you continue to update it as more plants are studied?
Avinash: JGI puts huge effort into generating reference genomes for new species, which allows it to fill gaps in under-sampled areas of the plant phylogeny and improve genome annotations. For that undertaking, new transcriptome datasets are generated through Community Science Program (CSP) funding calls. JGI has research funding grants aimed specifically at “Gene function” and “Functional genomics.” We will keep updating the Gene Atlas with curated datasets from CSP projects and aim to improve plant gene function descriptions.
Sarah: Have you all learned anything from the data yet? Are there any specific examples you can share?
Avinash: Yes, I will mention two here. The main purpose of this project is to understand gene function and add biological information. As I said earlier, 16 to 56 percent of plant genes are poorly characterized. So the first thing we aimed for was to understand the functions of genes across the investigated plants. We did so by analyzing this huge sample set, specifically using results from tissue- and condition-specific expression groups, differential expression, co-expression network analysis, and ortholog function descriptions from nearest phylogenetic neighbors. Our pipeline allowed us to add expression-derived biological information to an average of 40 percent of genes across Gene Atlas plants. Comparing orthologs among common gene sets between species allowed us to pinpoint and rank biologically relevant and evolutionarily conserved genes that could be potential future targets for functional genomic studies.
We also looked at a cross-species comparable study, where plants were given one of three compounds (urea, ammonium, or nitrate) as the sole nitrogen source. We looked at the plants’ responses in the aboveground and root tissues. The striking thing we found was related to tissue-specific gene expression variation within genotypes. The root transcriptome was more responsive than aboveground tissues in all studied plants except Arabidopsis. We also observed that, in comparisons against urea, nitrogen- and amino acid-specific metabolic pathways were overrepresented in the nitrate-treated plants but not in the ammonium-treated plants. These results highlight differences in plants’ response to nitrate compared to ammonium as the sole nitrogen source at the metabolic level.
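For readers curious about what "overrepresented" means in practice, one common way to test whether a pathway is overrepresented among differentially expressed genes is a hypergeometric (one-sided Fisher) test. The sketch below is only an illustration of that general idea and is not necessarily the method used for the Gene Atlas; the function name and all gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, pathway_genes, de_genes, overlap):
    """Probability of seeing at least `overlap` pathway genes among
    `de_genes` differentially expressed genes drawn at random from
    `total_genes` genes, of which `pathway_genes` belong to the pathway.
    (Hypothetical helper; upper tail of the hypergeometric distribution.)
    """
    max_overlap = min(pathway_genes, de_genes)
    favorable = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, de_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favorable / comb(total_genes, de_genes)


# Made-up numbers: 20,000 genes in the genome, 100 in a pathway,
# 500 differentially expressed, 10 of which fall in the pathway.
p = enrichment_p_value(20000, 100, 500, 10)
print(f"p-value: {p:.3g}")
```

A small p-value suggests the pathway appears among the responsive genes more often than chance alone would explain. In a real analysis, many pathways are tested at once, so the p-values would also need a multiple-testing correction.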
Sarah: Can just anyone use Plant Gene Atlas, or do you have to be a subscriber?
Avinash: It has been publicly available since 2018. There were more than 15 citations even before this was published, and people regularly contact Jeremy [Schmutz] or me to seek more information on the usage of this resource.
There are two different portals where this data is currently hosted. The first one is the JGI plant portal called Phytozome. There, you can query a single gene and look at the expression across the different tissues and conditions available for a species. You can also look at its co-expressed genes. It provides detailed functional annotations, protein homologs, plant family information, and a genome browser view of gene models.
The other one is the JGI Plant Gene Atlas, a dedicated portal where you can do bulk downloads of data and look at the expression of a single gene or multiple genes across the species. For your genes of interest, you can look at the expression of those genes in the 17 other species currently available. You can also access the differentially expressed genes and visualize plots representing the GO and KEGG pathway enrichments. Detailed documentation about using this resource is included under the “Help” tab on the portal.