Researchers Use Natural-Language Processing (NLP) Algorithms to Predict SARS-CoV-2 Virus Mutations

By LabMedica International staff writers
Posted on 18 Jan 2021
Natural-language processing (NLP) algorithms are now able to generate protein sequences and predict virus mutations, including key changes that help the SARS-CoV-2 virus evade the immune system.

The key insight making this possible is that many properties of biological systems can be interpreted in terms of words and sentences. In the last few years, a handful of researchers have shown that protein sequences and genetic codes can be modeled using NLP techniques. Now, computational biologists at the Massachusetts Institute of Technology (MIT; Cambridge, MA, USA) have pulled several of these strands together and used NLP to predict mutations that allow viruses to evade detection by antibodies in the human immune system, a process known as viral immune escape. The basic idea is that the interpretation of a virus by an immune system is analogous to the interpretation of a sentence by a human.

The team uses two different linguistic concepts: grammar and semantics (or meaning). The genetic or evolutionary fitness of a virus - characteristics such as how good it is at infecting a host - can be interpreted in terms of grammatical correctness. A successful, infectious virus is grammatically correct; an unsuccessful one is not. Similarly, mutations of a virus can be interpreted in terms of semantics. Mutations that make a virus appear different to things in its environment - such as changes in its surface proteins that make it invisible to certain antibodies - have altered its meaning. Viruses with different mutations can have different meanings, and a virus with a different meaning may need different antibodies to read it.

To model these properties, the researchers used an LSTM, a type of neural network that predates the transformer-based ones used by large language models like GPT-3. These older networks can be trained on far less data than transformers and still perform well for many applications. Instead of millions of sentences, they trained the NLP model on thousands of genetic sequences taken from three different viruses: 45,000 unique sequences for a strain of influenza, 60,000 for a strain of HIV, and between 3,000 and 4,000 for a strain of the SARS-CoV-2 virus.
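
To make the "sequences as sentences" idea concrete, here is a minimal sketch of how a sequence might be turned into training examples for a language model. It assumes the sequences are written as amino-acid letters and that each letter is one token; the article does not describe the researchers' actual preprocessing, and the names `AMINO_ACIDS`, `TOKEN_ID`, `encode` and `context_target_pairs` are hypothetical.

```python
# Hypothetical preprocessing sketch, not the researchers' pipeline.
# Assumption: each sequence is a string over the 20 standard amino-acid letters.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Map each letter to an integer token; 0 is left free for padding.
TOKEN_ID = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence):
    """Convert a sequence string into a list of integer tokens."""
    return [TOKEN_ID[aa] for aa in sequence]


def context_target_pairs(sequence):
    """For every position, return (left context, right context, target token).

    A language model can be trained to predict the target token from the
    tokens around it, much as it would predict a missing word in a sentence.
    """
    ids = encode(sequence)
    return [(ids[:i], ids[i + 1:], ids[i]) for i in range(len(ids))]


# Toy example: a three-letter fragment yields three training examples.
examples = context_target_pairs("MKV")
for left, right, target in examples:
    print(left, right, target)
```

A model trained this way assigns a probability to each possible letter at each position, which is one way the "grammatical correctness" of a mutation could be scored.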

NLP models work by encoding words in a mathematical space in such a way that words with similar meanings are closer together than words with different meanings. This is known as an embedding. For viruses, the embedding of the genetic sequences grouped viruses according to how similar their mutations were. The overall aim of the approach is to identify mutations that might let a virus escape an immune system without making it less infectious - that is, mutations that change a virus’s meaning without making it grammatically incorrect.
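
The search for mutations that change meaning while staying grammatical can be sketched as a ranking problem. The example below is an illustration under stated assumptions, not the researchers' implementation: the embedding vectors and grammaticality probabilities are made-up toy numbers, semantic change is measured as Euclidean distance from the unmutated embedding, and the two criteria are combined by summing ranks with a hypothetical weight `beta`.

```python
from math import dist


def ascending_ranks(values):
    """Return 1-based ranks: the smallest value gets 1, the largest gets len(values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def rank_escape_candidates(wild_type_vec, candidates, beta=1.0):
    """Order mutations so that high semantic change AND high grammaticality come first.

    candidates maps a mutation name to (embedding vector, grammaticality probability).
    beta is an assumed weight on the grammaticality rank.
    """
    names = list(candidates)
    semantic_change = [dist(wild_type_vec, candidates[n][0]) for n in names]
    grammaticality = [candidates[n][1] for n in names]
    sem_rank = ascending_ranks(semantic_change)
    gram_rank = ascending_ranks(grammaticality)
    combined = {n: sem_rank[i] + beta * gram_rank[i] for i, n in enumerate(names)}
    return sorted(names, key=lambda n: combined[n], reverse=True)


# Toy data (invented for illustration only).
wild_type = (0.0, 0.0)
candidates = {
    "mut_A": ((0.1, 0.0), 0.90),  # barely changes meaning, very grammatical
    "mut_B": ((2.0, 1.5), 0.80),  # changes meaning, still grammatical
    "mut_C": ((3.0, 3.0), 0.05),  # changes meaning, but ungrammatical
    "mut_D": ((0.2, 0.1), 0.10),  # little change, ungrammatical
}
ranking = rank_escape_candidates(wild_type, candidates)
print(ranking[0])  # mut_B
```

In this toy setting the top candidate is the mutation that scores well on both criteria, rather than the one with the largest semantic change alone.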

To test their approach, the team used a common metric for assessing predictions made by machine-learning models that scores accuracy on a scale between 0.5 (no better than chance) and 1 (perfect). In this case, they took the top mutations identified by the tool and, using real viruses in a lab, checked how many of them were actual escape mutations. Their results ranged from 0.69 for HIV to 0.85 for one coronavirus strain. This is better than results from other state-of-the-art models, according to the researchers.
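
The article does not name the metric. One common metric that matches the description (0.5 for chance, 1 for perfect) is the area under the ROC curve (AUC), so the sketch below assumes that is what is meant. It computes AUC as the fraction of (escape, non-escape) pairs in which the escape mutation received the higher score; the scores and labels are toy values, not the study's data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: model score per mutation (higher means more likely to escape).
    labels: 1 if the mutation was confirmed as an escape mutation, else 0.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: one confirmed escape mutation is scored below one non-escape.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(round(auc(scores, labels), 3))  # 0.889
```

A model that ranks every escape mutation above every non-escape mutation scores 1, and one that scores mutations at random averages about 0.5.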

The team has been running models on new variants of the coronavirus, including the so-called UK mutation, the mink mutation from Denmark, and variants taken from South Africa, Singapore and Malaysia. Using NLP accelerates a slow process. Conventionally, the genome of a virus taken from a hospitalized COVID-19 patient is sequenced and its mutations are re-created and studied in a lab, which can take weeks. The NLP model, by contrast, predicts potential mutations straight away, which focuses the lab work and speeds it up.

“We’re learning the language of evolution,” said Bonnie Berger, a computational biologist at MIT. “Biology has its own language.”

Related Links:
Massachusetts Institute of Technology (MIT)


Copyright © 2000-2025 Globetech Media. All rights reserved.