Monday, September 23, 2024

Protein language model


Here are some recent generative protein language models:


1. ProtGPT2 (2022):

- GPT-2-style autoregressive Transformer (738 million parameters) trained on ~50 million protein sequences from UniRef50

- Generates de novo protein sequences that follow natural amino acid propensities

- Can sample unexplored regions of protein sequence space
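ProtGPT2 generates left to right, sampling one token at a time from the model's predicted next-token distribution. The minimal sketch below shows that loop with top-k sampling and a temperature. Everything in it is an illustrative assumption: `toy_next_token_logits` is a hypothetical stand-in for a trained model, the sketch samples one residue per step (ProtGPT2 itself uses subword BPE tokens), and the `k` and `temperature` defaults are arbitrary, not the model's recommended settings.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_next_token_logits(prefix: str) -> dict[str, float]:
    """Hypothetical stand-in for a trained model: one logit per residue.

    A real model would condition on the full prefix; this toy version only
    uses its length as a seed so the example is deterministic.
    """
    rng = random.Random(len(prefix))
    return {aa: rng.gauss(0.0, 1.0) for aa in AMINO_ACIDS}


def sample_top_k(logits: dict[str, float], k: int, temperature: float,
                 rng: random.Random) -> str:
    """Keep the k highest-scoring tokens, then sample from their softmax."""
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    best = max(v for _, v in top)  # subtract the max for numerical stability
    weights = [math.exp((v - best) / temperature) for _, v in top]
    tokens = [t for t, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate(length: int, k: int = 10, temperature: float = 1.0,
             seed: int = 0) -> str:
    """Autoregressively extend an empty prefix to `length` residues."""
    rng = random.Random(seed)
    seq = ""
    for _ in range(length):
        seq += sample_top_k(toy_next_token_logits(seq), k, temperature, rng)
    return seq


print(generate(30))
```

Lowering `k` or `temperature` concentrates sampling on the most likely tokens. Raising them spreads probability mass over less likely continuations, which is one knob for trading plausibility against exploring less-visited regions of sequence space.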


2. ProGen (2023):

- Trained on 280 million protein sequences from over 19,000 families  

- Can generate functional protein sequences across diverse protein families

- Demonstrated ability to generate artificial lysozymes with catalytic activity
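A common way to quantify how different a generated sequence is from natural proteins is percent identity to its closest natural counterpart. The sketch below assumes the two sequences are already aligned to the same length with `-` as the gap symbol; the alignment step itself (for example Needleman-Wunsch or a BLAST search) is not shown. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumes both strings come from a pairwise alignment and have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip any column where either sequence has a gap.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


print(percent_identity("MKV-LA", "MKIALA"))
```

Here the gapped column is ignored, so 4 of the 5 remaining columns match (80%). Tools differ in the denominator they use (aligned columns, shorter sequence length, or full alignment length), so identity figures are only comparable when computed the same way.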


3. PoET (Protein Evolutionary Transformer) (2023):

- Autoregressive generative model of whole protein families, trained without labels on sets of homologous sequences treated as a "sequence of sequences"

- Can make zero-shot variant effect predictions and generate sequences de novo

- Outperforms larger models on perplexity evaluations, especially for proteins with few homologs
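Zero-shot variant effect prediction and perplexity both come from the per-residue log-probabilities a generative model assigns to a sequence. The sketch below assumes those log-probabilities (natural log) have already been obtained from the model; how PoET conditions them on homologous sequences is not modeled here, and the function names are hypothetical.

```python
import math


def log_likelihood(per_residue_log_probs: list[float]) -> float:
    """Sum of log p(x_i | context) over all residues."""
    return sum(per_residue_log_probs)


def variant_effect_score(wt_log_probs: list[float],
                         mut_log_probs: list[float]) -> float:
    """Log-likelihood ratio of mutant vs. wild type.

    Positive values mean the model finds the mutant more plausible;
    negative values suggest the mutation is likely deleterious.
    """
    return log_likelihood(mut_log_probs) - log_likelihood(wt_log_probs)


def perplexity(per_residue_log_probs: list[float]) -> float:
    """exp of the mean negative log-likelihood per residue (lower is better)."""
    n = len(per_residue_log_probs)
    return math.exp(-sum(per_residue_log_probs) / n)


# A model that is uniform over the 20 amino acids has perplexity 20.
print(perplexity([math.log(1 / 20)] * 10))
```

Perplexity can be read as the effective number of residues the model is choosing between at each position, which is why a model that has learned family-specific constraints scores well below the uniform baseline of 20.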


4. ESM-2 (2022):

- Large protein language model from Meta AI with up to 15 billion parameters

- Trained on protein sequences from UniRef (UniRef50 clusters, with member sequences sampled from UniRef90)

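- A masked (encoder-only) language model rather than a left-to-right generator; its representations underpin structure prediction (ESMFold) and function and variant-effect prediction, and new sequences can be proposed by repeatedly masking and resampling residues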
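ESM-2 is trained as a masked language model: it predicts hidden residues from the context on both sides rather than left to right. One common way to sample sequences from such a model is to repeatedly mask a position and resample it from the model's prediction. The sketch below illustrates that loop only. `toy_masked_predictor` is a hypothetical stand-in for the real model, `#` is an arbitrary mask symbol chosen for this example, and nothing here reflects the actual `fair-esm` API.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "#"  # hypothetical mask symbol for this sketch only


def toy_masked_predictor(seq: str, pos: int) -> dict[str, float]:
    """Hypothetical stand-in for a masked LM: p(residue at `pos` | rest of seq).

    For illustration it mildly favours repeating the left neighbour.
    """
    left = seq[pos - 1] if pos > 0 and seq[pos - 1] != MASK else None
    weights = {aa: (3.0 if aa == left else 1.0) for aa in AMINO_ACIDS}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


def mask_and_fill(seq: str, n_steps: int, seed: int = 0) -> str:
    """Iteratively mask one random position and resample it."""
    rng = random.Random(seed)
    residues = list(seq)
    for _ in range(n_steps):
        pos = rng.randrange(len(residues))
        residues[pos] = MASK  # hide the current residue
        probs = toy_masked_predictor("".join(residues), pos)
        residues[pos] = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
    return "".join(residues)


print(mask_and_fill("MKTAYIAKQR", n_steps=50))
```

Starting from a natural sequence and running only a few steps keeps the output close to the original; more steps drift further away, so `n_steps` acts as a rough control on how far sampling moves from the starting point.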


5. xTrimoPGLM (2023):

- Very large protein language model with 100 billion parameters

- Trained on about 1 trillion tokens of protein sequence data

- Demonstrates strong few-shot learning capabilities for protein tasks
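To put 100 billion parameters in perspective, a quick calculation of the memory needed just to hold the weights at common numeric precisions is shown below. This is back-of-the-envelope arithmetic under stated assumptions: it counts weights only (no activations, optimizer state, or inference caches) and uses 1 GB = 10^9 bytes. The precisions listed are generic options, not a statement about how xTrimoPGLM was trained or served.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Generic precisions: 4 bytes (fp32), 2 bytes (bf16/fp16), 1 byte (int8).
for label, nbytes in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    print(f"{label}: {weight_memory_gb(100e9, nbytes):.0f} GB")
```

At 16-bit precision the weights alone come to roughly 200 GB, more than a single current accelerator holds, which is why models at this scale are typically sharded across many devices.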


These models represent some of the latest advances in generative protein language modeling, with capabilities spanning de novo sequence generation, property prediction, and exploration of protein sequence space. All of them rely on large-scale pretraining on unlabeled protein sequences to learn representations that transfer across protein families and tasks.


