Artificial intelligence (AI) has been used to reveal details of a diverse and fundamental branch of life living right under our feet and in every corner of the globe.
A total of 161,979 new species of RNA virus have been discovered using a machine learning tool that researchers believe will vastly improve the mapping of life on Earth and could aid in the identification of many millions more viruses yet to be characterised.
Published in Cell and conducted by an international team of researchers, the study is the largest virus species discovery paper ever published.
Window into hidden world
“We have been offered a window into an otherwise hidden part of life on Earth, revealing remarkable biodiversity,” said senior author Professor Edward Holmes from the School of Medical Sciences in the Faculty of Medicine and Health at the University of Sydney.
“This is the largest number of new virus species discovered in a single study, massively expanding our knowledge of the viruses that live among us,” Professor Holmes said. “To find this many new viruses in one fell swoop is mind-blowing, and it just scratches the surface, opening up a world of discovery. There are millions more to be discovered, and we can apply this same approach to identifying bacteria and parasites.”
Although RNA viruses are commonly associated with human disease, they are also found in extreme environments around the world and may even play key roles in global ecosystems. In this study they were found living in the atmosphere, hot springs and hydrothermal vents.
“That extreme environments carry so many types of viruses is just another example of their phenomenal diversity and tenacity to live in the harshest settings, potentially giving us clues on how viruses and other elemental life-forms came to be,” Professor Holmes said.
Deep learning algorithm
The researchers built a deep learning algorithm, LucaProt, to process vast troves of genetic sequence data, including lengthy virus genomes of up to 47,250 nucleotides and genomically complex information, and used it to discover more than 160,000 viruses.
“The vast majority of these viruses had been sequenced already and were on public databases, but they were so divergent that no one knew what they were,” Professor Holmes said. “They comprised what is often referred to as sequence ‘dark matter’. Our AI method was able to organise and categorise all this disparate information, shedding light on the meaning of this dark matter for the first time.”
The AI tool was trained to compute the dark matter and identify viruses based on sequences and the secondary structures of the protein that all RNA viruses use for replication.
It was able to significantly fast-track virus discovery, which would be time-intensive using traditional methods.
Dark matter
Co-author Professor Mang Shi from Sun Yat-sen University, the study’s institutional lead, said: “We used to rely on tedious bioinformatics pipelines for virus discovery, which limited the diversity we could explore. Now, we have a much more effective AI-based model that offers exceptional sensitivity and specificity, and at the same time allows us to delve much deeper into viral diversity. We plan to apply this model across various applications.”
Co-author Dr Zhao-Rong Li, who researches in the Apsara Lab of Alibaba Cloud Intelligence, said: “LucaProt represents a significant integration of cutting-edge AI technology and virology, demonstrating that AI can effectively accomplish tasks in biological exploration. This integration provides valuable insights and encouragement for further decoding of biological sequences and the deconstruction of biological systems from a new perspective. We will also continue our research in the field of AI for virology.”
Professor Holmes said: “The obvious next step is to train our method to find even more of this amazing diversity, and who knows what extra surprises are in store.”