Large language models can accurately predict plant gene functions: Study

By IANS | Updated: June 1, 2025 18:08 IST

New Delhi, June 1: Large language models (LLMs), when trained on extensive plant genomic data, can accurately predict gene functions and regulatory elements, researchers said on Sunday.

This advance holds promise for accelerating crop improvement, enhancing biodiversity conservation, and bolstering food security in the face of global challenges, according to the study, published in the journal Tropical Plants.

Plant genomics has long grappled with vast and complex datasets, and progress has often been limited by the specificity of conventional machine learning models and the scarcity of annotated data.

While LLMs have revolutionised fields like natural language processing, their application in plant genomics has remained nascent. The primary hurdle has been adapting these models to interpret the unique "language" of plant genomes, which differs significantly from human language.

In this study, researchers explored the potential of LLMs in plant genomics.

By drawing parallels between the structures of natural language and genomic sequences, the study highlights how LLMs can be trained to understand and predict gene functions, regulatory elements, and expression patterns in plants.

The research discusses various LLM architectures, including encoder-only models like DNABERT, decoder-only models such as DNAGPT, and encoder-decoder models like ENBED.
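The practical difference between these architecture families lies largely in which positions each token is allowed to attend to. The snippet below is a minimal sketch of that idea, not code from the study: the function names are hypothetical, and the masks are plain nested lists rather than tensors from any particular library.

```python
def bidirectional_mask(n: int) -> list[list[bool]]:
    """Encoder-style mask: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def causal_mask(n: int) -> list[list[bool]]:
    """Decoder-style mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


# Illustrative only: a 4-token DNA fragment, one token per base.
tokens = ["A", "T", "G", "C"]
encoder_view = bidirectional_mask(len(tokens))  # encoder-only models
decoder_view = causal_mask(len(tokens))         # decoder-only models

for row in decoder_view:
    print("".join("x" if allowed else "." for allowed in row))
```

An encoder-decoder model combines the two: the encoder reads the input with the bidirectional mask, and the decoder generates output under the causal mask.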

The team employed a methodology that involved pre-training LLMs on vast datasets of plant genomic sequences, followed by fine-tuning with specific annotated data to enhance accuracy.
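As a rough illustration of what pre-training on unannotated sequence can look like, the sketch below prepares one masked-token training example, the objective commonly used by BERT-style encoder models. The article does not say which objective was used, so this is an assumption; the helper name `mask_tokens`, the 15% masking rate, and the toy tokens are all illustrative.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Hide a random subset of tokens; the model is trained to recover them.

    Returns (inputs, labels). labels[i] holds the original token where
    inputs[i] was masked, and None where the token was left visible.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


# Toy example: 3-base tokens from an arbitrary sequence (not real data).
tokens = ["ATG", "TGC", "GCG", "CGT", "GTA", "TAC"]
inputs, labels = mask_tokens(tokens, mask_prob=0.3, seed=0)
print(inputs)
print(labels)
```

Fine-tuning would then reuse the pre-trained weights and continue training on a smaller annotated set, for example sequences labelled as promoter or non-promoter.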

By treating DNA sequences akin to linguistic sentences, the models could identify patterns and relationships within the genetic code.
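One common way to turn a DNA sequence into "words" is to split it into overlapping k-mers, the approach popularised by the original DNABERT. The sketch below assumes that scheme purely for illustration; the function names, the choice of k, and the special tokens are not taken from the study.

```python
def kmer_tokenize(sequence: str, k: int = 6, stride: int = 1) -> list[str]:
    """Split a DNA sequence into overlapping k-mers, the 'words' of the genome."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Map each distinct k-mer to an integer id, after a few special tokens."""
    vocab = {tok: idx for idx, tok in enumerate(["[PAD]", "[UNK]", "[MASK]"])}
    for tok in sorted(set(tokens)):
        vocab[tok] = len(vocab)
    return vocab


def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Convert k-mers to ids, falling back to [UNK] for unseen k-mers."""
    return [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]


tokens = kmer_tokenize("ATGCGTAC", k=3)
vocab = build_vocab(tokens)
print(tokens)                 # ['ATG', 'TGC', 'GCG', 'CGT', 'GTA', 'TAC']
print(encode(tokens, vocab))  # [3, 8, 5, 4, 6, 7]
```

Once a sequence is a list of integer ids, it can be fed to the same model architectures used for text.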

These models have shown promise in tasks like promoter prediction, enhancer identification, and gene expression analysis. Notably, plant-specific models like AgroNT and FloraBERT have been developed, demonstrating improved performance in annotating plant genomes and predicting tissue-specific gene expression.

However, the study also notes that most existing LLMs are trained on animal or microbial data, which often lack comprehensive genomic annotations. It nonetheless points to the versatility and robustness of LLMs across diverse plant species.

In summary, this study underscores the immense potential of integrating artificial intelligence, particularly large language models, into plant genomics research. The study was conducted by Meiling Zou, Haiwei Chai and Zhiqiang Xia’s team from Hainan University.

Disclaimer: This post has been auto-published from an agency feed without any modifications to the text and has not been reviewed by an editor
