Based on the search results, scGPT determines "known" and "unknown" genes in the following ways:
1. During Training:
- A randomly chosen proportion of the genes is designated as "unknown" genes.
- The expression values of these "unknown" genes are omitted from the input.
- The remaining genes are considered "known" genes.
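The training-time split can be illustrated with a small, self-contained sketch. It is not scGPT's actual code: the function name `split_known_unknown`, the placeholder `MASK_VALUE`, and the fixed `unknown_fraction` are assumptions made only for illustration.

```python
import random

# Hypothetical placeholder standing in for an omitted expression value.
MASK_VALUE = -1.0

def split_known_unknown(expr, unknown_fraction, seed=0):
    """Randomly mark a fraction of genes as "unknown" and hide their values."""
    rng = random.Random(seed)
    n_genes = len(expr)
    n_unknown = round(unknown_fraction * n_genes)
    # Sample gene positions without replacement.
    unknown = set(rng.sample(range(n_genes), n_unknown))
    # "Unknown" genes lose their value in the input; "known" genes keep it.
    model_input = [MASK_VALUE if i in unknown else v for i, v in enumerate(expr)]
    is_unknown = [i in unknown for i in range(n_genes)]
    return model_input, is_unknown

expr = [3.0, 0.0, 5.0, 1.0, 2.0, 4.0, 0.0, 7.0]
model_input, is_unknown = split_known_unknown(expr, unknown_fraction=0.25)
```

With 8 genes and a fraction of 0.25, two positions are hidden and the other six pass through unchanged.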
2. Attention Masking:
- The model uses a specialized attention mask: each query gene may attend only to the "known" genes and to itself.
- As a result, an "unknown" gene cannot attend to any other "unknown" gene.
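The masking rule above can be written as a boolean matrix. This is a minimal sketch of the stated rule only, not the library's implementation; `build_attention_mask` is a hypothetical helper, and `True` is assumed to mean "attention allowed".

```python
def build_attention_mask(is_unknown):
    """Return allowed[q][k]: may query gene q attend to key gene k?"""
    n = len(is_unknown)
    # A key is visible if it is a "known" gene or if it is the query itself.
    return [[(not is_unknown[k]) or (k == q) for k in range(n)]
            for q in range(n)]

# Genes 0 and 1 are "known"; genes 2 and 3 are "unknown".
is_unknown = [False, False, True, True]
mask = build_attention_mask(is_unknown)

for row in mask:
    print(["x" if allowed else "." for allowed in row])
```

In the rows for genes 2 and 3, the only "unknown" column that is open is the diagonal entry, so the two "unknown" genes never see each other.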
3. Iterative Prediction Process:
- The model predicts expression values for "unknown" genes based on information from "known" genes.
- During inference, the process is conducted in K iterative steps.
4. Confidence-based Selection:
- In each iteration, scGPT selects the top 1/K fraction of genes in the "unknown" set, ranked by prediction confidence.
- These newly predicted genes become "known" genes for the next iteration.
5. Auto-regressive Generation:
- This process creates a form of auto-regressive generation for non-sequential data.
- Expression values predicted with the highest confidence are generated first and are then used to help predict the remaining genes.
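Steps 3 to 5 can be sketched together as one loop. Everything here is a toy under stated assumptions: `iterative_generation`, `toy_predict`, and the fixed `TOY_CONFIDENCE` scores are hypothetical stand-ins for the real model, and "top 1/K" is assumed to mean 1/K of the initial "unknown" set, so that all genes are generated within K steps.

```python
import math

def iterative_generation(known, unknown, predict, K):
    """Promote the most confident predictions to "known" over K steps.

    known:   dict mapping gene name -> expression value
    unknown: list of gene names to generate
    predict: callable (gene, known) -> (value, confidence); a stand-in model
    """
    known = dict(known)
    remaining = list(unknown)
    # Assumption: each step takes 1/K of the initial unknown set.
    per_step = math.ceil(len(unknown) / K)
    order = []
    for _ in range(K):
        if not remaining:
            break
        # Predict every remaining gene from the current "known" set only.
        preds = {g: predict(g, known) for g in remaining}
        ranked = sorted(remaining, key=lambda g: preds[g][1], reverse=True)
        chosen = ranked[:per_step]
        for g in chosen:
            known[g] = preds[g][0]  # the prediction becomes a "known" value
            order.append(g)
        remaining = [g for g in remaining if g not in chosen]
    return known, order

# Toy model: fixed confidence per gene, value = mean of current known values.
TOY_CONFIDENCE = {"g3": 0.9, "g4": 0.2, "g5": 0.6, "g6": 0.4}

def toy_predict(gene, known):
    return sum(known.values()) / len(known), TOY_CONFIDENCE[gene]

known, order = iterative_generation(
    {"g1": 2.0, "g2": 4.0}, ["g3", "g4", "g5", "g6"], toy_predict, K=2
)
print(order)
```

With K=2, the two most confident genes (`g3`, `g5`) are fixed in the first step and join the "known" set before `g6` and `g4` are predicted in the second step.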
6. Cell-prompt vs. Gene-prompt Generation:
- Cell-prompt generation starts from a cell embedding that represents the cell-type condition.
- Gene-prompt generation starts from a set of "known" genes with observed expression values.
This approach allows scGPT to handle the non-sequential nature of single-cell data while still leveraging the power of transformer models for prediction tasks.