Tuesday, July 30, 2024

scGPT

 

According to the sources cited below, scGPT distinguishes "known" from "unknown" genes as follows:


1. During Training:

   - A randomly chosen proportion of the genes in each cell is designated as "unknown" genes.

   - The expression values of these "unknown" genes are omitted from the input.

   - The remaining genes are considered "known" genes.
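
The training-time split can be illustrated with a minimal sketch. It assumes a cell is represented as a plain dictionary from gene name to expression value; the function name `split_known_unknown`, the `unknown_fraction` parameter, and the example values are hypothetical and are not taken from the scGPT code base.

```python
import random

def split_known_unknown(expression, unknown_fraction, rng):
    """Randomly partition one cell's genes into 'known' and 'unknown' sets.

    Returns (known_input, targets): the model would see the values in
    known_input and be trained to predict the held-out values in targets.
    """
    genes = sorted(expression)  # fixed order so a seeded RNG is reproducible
    n_unknown = round(len(genes) * unknown_fraction)
    unknown = set(rng.sample(genes, n_unknown))
    # Expression values of unknown genes are omitted from the input.
    known_input = {g: v for g, v in expression.items() if g not in unknown}
    targets = {g: expression[g] for g in unknown}
    return known_input, targets

# Made-up expression values for illustration only.
cell = {"CD3E": 2.0, "CD19": 0.0, "MS4A1": 0.0, "GZMB": 3.5, "NKG7": 4.1, "LYZ": 0.3}
known_input, targets = split_known_unknown(cell, unknown_fraction=0.5, rng=random.Random(0))
print(sorted(known_input), sorted(targets))
```

Which genes land in each set depends on the random seed; only the sizes of the two sets are fixed by `unknown_fraction`.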


2. Attention Masking:

   - The model uses a specialized attention mask: each query gene may attend only to the "known" genes and to itself.

   - "Unknown" genes cannot attend to other "unknown" genes.
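
The masking rule above can be written down as a small boolean matrix. This is only a sketch of the rule as stated here, assuming `mask[i][j] == True` means "query gene i may attend to gene j"; the helper name `build_attention_mask` is hypothetical, and the real model applies its mask inside the transformer layers rather than as Python lists.

```python
def build_attention_mask(is_known):
    """Build a square boolean mask from per-gene known/unknown flags.

    Entry [i][j] is True when query gene i may attend to key gene j,
    i.e. when j is a known gene or j is the query gene itself.
    """
    n = len(is_known)
    return [[is_known[j] or i == j for j in range(n)] for i in range(n)]

# Two known genes followed by two unknown genes.
is_known = [True, True, False, False]
mask = build_attention_mask(is_known)
for row in mask:
    print(["x" if allowed else "." for allowed in row])
```

In this example, the third gene (unknown) can see both known genes and itself, but not the fourth gene, which is also unknown.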


3. Iterative Prediction Process:

   - The model predicts expression values for "unknown" genes based on information from "known" genes.

   - During inference, the process is conducted in K iterative steps.


4. Confidence-based Selection:

   - In each iteration, scGPT selects the 1/K fraction of genes in the "unknown" set whose predictions have the highest confidence.

   - These newly predicted genes become "known" genes for the next iteration.


5. Auto-regressive Generation:

   - This process creates a form of auto-regressive generation for non-sequential data.

   - Gene expressions predicted with highest confidence are generated first and then used to help predict subsequent genes.
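
The iterative, confidence-ordered generation described in points 3 to 5 can be sketched as a simple loop. Everything here is an illustrative assumption: `predict` stands in for the model and returns a `(value, confidence)` pair, `toy_predict` and `TOY_CONFIDENCE` are made-up placeholders, and "top 1/K" is interpreted as 1/K of the original unknown set so that every gene is filled in after K steps.

```python
import math

def iterative_generation(known, unknown, predict, k_steps):
    """Fill in unknown genes over k_steps rounds, most confident first."""
    known = dict(known)
    remaining = list(unknown)
    per_step = math.ceil(len(remaining) / k_steps)  # assumed reading of "top 1/K"
    order = []
    for _ in range(k_steps):
        if not remaining:
            break
        # Predict every remaining gene from the currently known genes.
        preds = {g: predict(g, known) for g in remaining}
        ranked = sorted(remaining, key=lambda g: preds[g][1], reverse=True)
        accepted = ranked[:per_step]
        # Accepted predictions become known genes for the next round.
        for g in accepted:
            known[g] = preds[g][0]
        order.append(accepted)
        remaining = [g for g in remaining if g not in accepted]
    return known, order

# Hypothetical stand-in for the model: fixed confidences, mean-of-known values.
TOY_CONFIDENCE = {"GZMB": 0.9, "NKG7": 0.8, "LYZ": 0.4, "MS4A1": 0.2}

def toy_predict(gene, known):
    value = sum(known.values()) / len(known)
    return value, TOY_CONFIDENCE[gene]

filled, order = iterative_generation(
    {"CD3E": 2.0, "CD19": 0.0}, ["GZMB", "NKG7", "LYZ", "MS4A1"], toy_predict, k_steps=2
)
print(order)
```

With K = 2 and four unknown genes, the two most confident genes are accepted in the first round and then contribute to the predictions made in the second round.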


6. Cell-prompt vs. Gene-prompt Generation:

   - For cell-prompt generation, it starts with a cell embedding representing the cell type condition.

   - For gene-prompt generation, it begins with a set of known genes with observed expression values.


This approach lets scGPT apply transformer-based, auto-regressive prediction to single-cell data even though genes in a cell have no natural sequential order.


Citations:

[1] https://twitter.com/simocristea/status/1676323087959179264

[2] https://www.linkedin.com/pulse/new-generative-ai-tool-predicts-gene-expression-single-colangelo-x7ebf

[3] https://www.the-scientist.com/a-new-ai-tool-predicts-gene-expression-in-a-single-cell-71295

[4] https://www.biorxiv.org/content/10.1101/2023.04.30.538439v2.full

[5] https://www.biorxiv.org/content/10.1101/2023.04.30.538439v1.full

