Virus mutation and virus escape. NLP Model predicts mutation.

Virus mutation and virus escape cause a huge number of deaths every year. Researchers at MIT have developed a method for studying viral escape: a natural language processing (NLP) model that analyzes how viruses evade the immune system. The model identifies the parts of a virus that are likely to mutate rapidly as well as the parts that are least likely to mutate. These stable parts of the viral surface are potential targets for vaccines, so NLP may help scientists develop vaccines that hold up against virus mutation and virus escape.

Virus mutation and virus escape: definitions

Simply put, viruses mutate rapidly, and some of these mutations change the shapes of their surface proteins. Different viruses have different rates and patterns of mutation in their genes. This makes it hard for antibodies to attach to these viruses and neutralize them. A surface or envelope protein can change shape without losing its function, so the virus remains viable. When such a mutation lets a virus slip past the body's defenses, the result is called virus escape. Existing vaccines and our immune systems are then far less effective against the mutated virus.

Why is it difficult to develop a strong vaccine against viruses?

Producing an effective vaccine against a rapidly mutating virus is often a slow and laborious task. Scientists are still struggling to develop a strong vaccine for HIV and similar viruses. Bonnie Berger is the Simons Professor of Mathematics and head of the Computation and Biology group in CSAIL at MIT. She explains that the real difficulty in vaccine development for influenza and HIV is the highly mutable nature of the surface protein and the envelope surface protein, respectively. Apart from Berger, the research team includes Bryan Bryson and Brian Hie (the lead author of the paper). The researchers have also applied their computational model to other viruses, including SARS-CoV-2.

Natural Language Processing for virus proteins

The researchers trained a computational model on viral sequence data. The model comes from another fast-growing area of computer science, natural language processing. Such models were originally designed to analyze and predict natural language, but the researchers found them well suited to biological sequence data too. In the analogy, the rules that determine whether a viral protein remains functional play the role of grammar, and whether the protein can take on a new shape plays the role of semantics (meaning). The approach is promising because the model needs only genetic sequence information as training input.
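
To make the language analogy more concrete, here is a minimal sketch in Python. It is not the researchers' model. It is a toy bigram model, assumed here purely for illustration, that learns from sequences alone which residue tends to follow which. The function names, the pseudocount, and the short training sequences are all hypothetical. A higher log-likelihood loosely corresponds to a more "grammatical" sequence; the "semantic" side of the analogy would need learned embeddings and is not shown.

```python
from collections import defaultdict
from math import log

# Assumption: sequences contain only the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
START = "^"  # hypothetical start-of-sequence marker


def train_bigram_model(sequences, pseudocount=1.0):
    """Estimate P(next residue | previous residue) from raw sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        prev = START
        for residue in seq:
            counts[prev][residue] += 1
            prev = residue
    model = {}
    for prev in START + AMINO_ACIDS:
        # Additive smoothing so unseen transitions keep a small probability.
        total = sum(counts[prev].values()) + pseudocount * len(AMINO_ACIDS)
        model[prev] = {
            aa: (counts[prev][aa] + pseudocount) / total for aa in AMINO_ACIDS
        }
    return model


def log_likelihood(model, seq):
    """Sum of log transition probabilities; higher means more 'grammatical'."""
    prev, total = START, 0.0
    for residue in seq:
        total += log(model[prev][residue])
        prev = residue
    return total


# Toy data, invented for illustration only.
training = ["MKTAY", "MKTAF", "MKSAY"]
model = train_bigram_model(training)
print(log_likelihood(model, "MKTAY"))  # sequence seen in training
print(log_likelihood(model, "MWTAY"))  # lower: W never follows M in training
```

A model this small captures only adjacent-residue patterns, but it shows the key property described above: nothing other than sequence data is needed to train it.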

Number of sequences used

Some 60,000 HIV sequences, 45,000 influenza sequences, and 4,000 coronavirus sequences were used in this research. Once trained, the model was used to predict which sequences of the coronavirus, HIV, and the influenza hemagglutinin (HA) protein are more or less likely to produce mutations that help the virus escape.

What did the model find for influenza, coronavirus, and HIV?

In influenza, the stalk of the HA protein carries the least mutable sequences and is therefore the region least likely to mutate and escape. This prediction is consistent with existing studies of antibodies that target the HA stalk; vaccines aimed at this region could provide a broader shield against these viruses. For the coronavirus, a subunit of the spike protein called S2 shows the slowest mutation, which makes it the most favorable target for antibodies. The mutation rate of the coronavirus is still understudied, but prior evidence suggests it does not mutate as fast as influenza or HIV. In HIV, by contrast, the V1-V2 region of the envelope protein is hypervariable and has the highest chance of mutation.
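
One simple way to picture "least mutable" regions is to measure how much each position varies across aligned sequences. The Python sketch below uses Shannon entropy per alignment column, a classical conservation statistic and not the language model described in this article. The function names and the toy alignment are hypothetical. Low entropy means the position rarely changes in the data.

```python
from collections import Counter
from math import log2


def column_entropy(aligned_seqs):
    """Shannon entropy (in bits) of each column of an alignment."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    n = len(aligned_seqs)
    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned_seqs)
        entropies.append(-sum((c / n) * log2(c / n) for c in counts.values()))
    return entropies


def most_conserved_window(entropies, width):
    """Return (start, end) of the window with the lowest total entropy."""
    starts = range(len(entropies) - width + 1)
    best = min(starts, key=lambda s: sum(entropies[s:s + width]))
    return best, best + width


# Toy alignment, invented for illustration only.
alignment = ["ACDEFG", "ACDQFG", "ACDKYG", "ACDEWG"]
entropies = column_entropy(alignment)
print(entropies)
print(most_conserved_window(entropies, 3))
```

In this toy alignment the first three columns never change, so they form the most conserved window, while the fourth and fifth columns vary and score higher. Real analyses would need a proper multiple sequence alignment and far more data.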

Natural language processing model: its promising future

The team is now sharing the model with other researchers to help identify targets for cancer vaccines that stimulate the body's immune system to attack tumors. The model could also be used to design drugs for diseases like tuberculosis that are less likely to provoke resistance. Bryson is hopeful about the future of vaccines in health care because the model needs minimal data and produces strong results.


The combination of natural language processing and the prediction of virus mutation and virus escape looks promising. The model's use is not limited to predicting viral mutations: it may also help in designing small-molecule drugs that are expected to provoke the least resistance. It is emerging as landmark research in the health sector.

Disclaimer: The above article has been aggregated by a computer program and summarised by a Steamdaily specialist. You can read the original article at mit