
Monday, January 6, 2025

human scRNA aging data

There are some human single-cell RNA-seq aging datasets available; one example:
https://pmc.ncbi.nlm.nih.gov/articles/PMC10306289/#_ad93_

Human PBMC scRNA-seq–based aging clocks reveal ribosome to inflammation balance as a single-cell aging hallmark and super longevity


Friday, November 22, 2024

scGPT only has classification decoders

 

It seems that the current scGPT has three classification decoders:

- CLS — cell type classification

- ExpressClas — expression classification

- MVC — masked value prediction for cell embedding
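To make the three heads concrete, here is a minimal PyTorch sketch of how such decoders could sit on top of a shared transformer embedding. The class names, shapes, and the binned-expression assumption are my own illustrative choices, not scGPT's actual API:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: three decoder heads over a shared transformer
# embedding, loosely mirroring the heads listed above.

class ClsHead(nn.Module):
    """Cell-type classification from the cell (CLS-token) embedding."""
    def __init__(self, d_model: int, n_cell_types: int):
        super().__init__()
        self.fc = nn.Linear(d_model, n_cell_types)

    def forward(self, cell_emb):            # (batch, d_model)
        return self.fc(cell_emb)            # (batch, n_cell_types) logits

class ExprClsHead(nn.Module):
    """Per-gene classification over binned expression values (assumed)."""
    def __init__(self, d_model: int, n_bins: int):
        super().__init__()
        self.fc = nn.Linear(d_model, n_bins)

    def forward(self, gene_embs):           # (batch, n_genes, d_model)
        return self.fc(gene_embs)           # (batch, n_genes, n_bins)

class MVCHead(nn.Module):
    """Masked value prediction for cell embedding: predict a gene's
    (binned) value from the cell embedding plus that gene's identity
    embedding."""
    def __init__(self, d_model: int, n_bins: int):
        super().__init__()
        self.fc = nn.Linear(2 * d_model, n_bins)

    def forward(self, cell_emb, gene_id_embs):
        # cell_emb: (batch, d_model); gene_id_embs: (batch, n_genes, d_model)
        cell = cell_emb.unsqueeze(1).expand(-1, gene_id_embs.size(1), -1)
        return self.fc(torch.cat([cell, gene_id_embs], dim=-1))

# Smoke test with random embeddings
B, G, D = 2, 5, 32
cls, expr, mvc = ClsHead(D, 10), ExprClsHead(D, 51), MVCHead(D, 51)
print(cls(torch.randn(B, D)).shape)                          # (2, 10)
print(expr(torch.randn(B, G, D)).shape)                      # (2, 5, 51)
print(mvc(torch.randn(B, D), torch.randn(B, G, D)).shape)    # (2, 5, 51)
```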

Monday, September 23, 2024

self-attention maps in gene-based transformer models

 

Based on the search results, self-attention maps offer several key advantages over traditional methods for analyzing gene interaction networks:


1. Dynamic and context-dependent relationships:

- Self-attention maps can capture complex, non-linear relationships between genes that change based on the specific cellular context or condition being analyzed.

- Traditional interaction networks are typically static and do not adapt to different contexts.


2. Long-range dependencies:

- Transformer architectures with self-attention can effectively model long-range dependencies between distant elements in the genome sequence.

- This allows capturing interactions between genes or regulatory elements that are far apart in the linear sequence, which is challenging for traditional methods.


3. Learning novel interactions:

- Self-attention maps are learned from data during training, potentially discovering novel relationships between genes that are not captured in existing interaction databases.

- This data-driven approach can reveal previously unknown interactions.


4. Improved prediction accuracy:

- Models using self-attention have demonstrated superior performance on tasks like gene expression prediction compared to previous approaches.

- For example, the Enformer model showed improved correlation between predictions and measured data relative to previous state-of-the-art models without self-attention.


5. Capturing regulatory relationships:

- Studies have shown that attention maps can reveal meaningful biological patterns like regulatory elements, coding vs non-coding regions, and gene expression relationships.

- The Enformer model, for instance, learned about the role of tissue-specific enhancers, promoters, and insulator elements.


6. Integration of multiple data types:

- Self-attention mechanisms can integrate information from multiple omics data types and experimental assays to build a more comprehensive view of gene interactions.


7. Interpretability:

- While more abstract than traditional networks, attention weights can be analyzed to understand which gene relationships the model deems important for a given prediction (see the sketch after this list).

- This allows researchers to discern which parts of the input the model prioritizes, providing insights into structure-activity relationships.


8. Parallel processing:

- Because self-attention operates on all input positions at once, these models can process data in parallel, greatly improving computational efficiency over sequential architectures.
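
As a concrete illustration of point 7, here is a minimal, self-contained sketch of pulling a gene-by-gene attention map out of scaled dot-product attention. The embeddings and projection weights are random stand-ins; in practice they would come from a chosen layer and head of a trained model:

```python
import torch
import torch.nn.functional as F

# Minimal illustration of extracting a gene-gene attention map.
# All tensors here are random stand-ins, not real model weights.

torch.manual_seed(0)
n_genes, d_model, d_head = 6, 32, 16
gene_names = [f"gene_{i}" for i in range(n_genes)]

x = torch.randn(n_genes, d_model)        # one cell: one token per gene
W_q = torch.randn(d_model, d_head)       # query projection
W_k = torch.randn(d_model, d_head)       # key projection

q, k = x @ W_q, x @ W_k
scores = q @ k.T / d_head ** 0.5          # (n_genes, n_genes)
attn = F.softmax(scores, dim=-1)          # each row sums to 1

# Row i shows how much gene i attends to every other gene; strong
# off-diagonal weights are candidate gene-gene relationships.
for i, name in enumerate(gene_names):
    j = attn[i].argmax().item()
    print(f"{name} attends most to {gene_names[j]} ({attn[i, j]:.2f})")
```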


In summary, self-attention maps offer a more flexible, data-driven approach to modeling gene interactions that can adapt to specific contexts, capture long-range dependencies, and potentially reveal novel biological insights beyond what is captured in static interaction networks.


Citations:

[1] https://www.nature.com/articles/s41592-021-01252-x

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10772984/

[3] https://academic.oup.com/bib/article/25/1/bbad467/7512647

[4] https://academic.oup.com/nar/article/49/13/e77/6266414

[5] https://www.biorxiv.org/content/10.1101/2023.04.30.538439v1.full

[6] https://www.instadeep.com/2024/04/building-the-next-generation-of-ai-models-to-decipher-human-biology/

[7] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10376273/

[8] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10400306/





Tuesday, July 30, 2024

scGPT

 

Based on the search results, scGPT determines "known" and "unknown" genes in the following ways:


1. During Training:

   - A random proportion of genes are selected as "unknown" genes.

   - The expression values of these "unknown" genes are omitted from the input.

   - The remaining genes are considered "known" genes.


2. Attention Masking:

   - The model uses a specialized attention mask that only allows attention computation between the "known" genes and the query gene itself.

   - "Unknown" genes cannot attend to other "unknown" genes.


3. Iterative Prediction Process:

   - The model predicts expression values for "unknown" genes based on information from "known" genes.

   - During inference, the process is conducted in K iterative steps.


4. Confidence-based Selection:

   - In each iteration, scGPT selects the top 1/K genes from the "unknown" set with the highest prediction confidence.

   - These newly predicted genes become "known" genes for the next iteration (a toy version of this loop is sketched at the end of this post).


5. Auto-regressive Generation:

   - This process creates a form of auto-regressive generation for non-sequential data.

   - Gene expressions predicted with highest confidence are generated first and then used to help predict subsequent genes.


6. Cell-prompt vs. Gene-prompt Generation:

   - For cell-prompt generation, it starts with a cell embedding representing the cell type condition.

   - For gene-prompt generation, it begins with a set of known genes with observed expression values.
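
To make the attention masking in item 2 concrete, here is a small sketch of such a mask under my reading of the description: position i may attend to position j only when gene j is "known" or j equals i. The boolean convention (True = may attend) and shapes are my own choices, not scGPT's exact implementation:

```python
import torch

# Sketch of the specialized attention mask described above: every gene
# may attend to all "known" genes and to itself, but "unknown" genes
# cannot attend to each other.

def build_known_gene_mask(known: torch.Tensor) -> torch.Tensor:
    """known: (n_genes,) bool, True where the expression value is given.
    Returns an (n_genes, n_genes) bool mask; mask[i, j] is True when
    position i is allowed to attend to position j."""
    n = known.shape[0]
    mask = known.unsqueeze(0).expand(n, n).clone()  # attend to known genes
    mask |= torch.eye(n, dtype=torch.bool)          # ...and to yourself
    return mask

known = torch.tensor([True, True, False, False, True])
print(build_known_gene_mask(known).int())
# Rows 2 and 3 (the "unknown" genes) can see columns 0, 1, 4 and
# themselves, but not each other.
```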


This approach allows scGPT to handle the non-sequential nature of single-cell data while still leveraging the power of transformer models for prediction tasks.
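
Below is a toy version of the K-step, confidence-based loop from items 3 through 5. The predictor is a random stub rather than a real model; the point is the bookkeeping of promoting the most confident unknown genes to "known" at each step. All names and sizes are illustrative:

```python
import torch

# Toy scGPT-style iterative generation: at each of K steps, predict all
# unknown genes, keep the most confident fraction, and treat those as
# known in the next step.

torch.manual_seed(0)
n_genes, n_bins, K = 12, 5, 3
known = torch.zeros(n_genes, dtype=torch.bool)
known[:4] = True                         # genes 0-3 start with observed values
values = torch.full((n_genes,), -1)      # -1 marks "not yet predicted"
values[known] = torch.randint(0, n_bins, (int(known.sum()),))

def predict(values, known):
    """Stand-in for the model: per-gene probabilities over expression bins.
    A real model would condition on the known genes via masked attention."""
    return torch.softmax(torch.randn(n_genes, n_bins), dim=-1)

for step in range(K):
    probs = predict(values, known)
    conf, pred = probs.max(dim=-1)            # confidence + predicted bin
    conf[known] = -1.0                        # never re-select known genes
    n_unknown = int((~known).sum())
    n_pick = max(1, n_unknown // (K - step))  # ~1/K of the unknown set
    picked = conf.topk(n_pick).indices
    values[picked] = pred[picked]             # commit confident predictions
    known[picked] = True                      # they are "known" from now on
    print(f"step {step}: promoted genes {sorted(picked.tolist())}")

assert bool(known.all())                      # all genes filled after K steps
```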


Citations:

[1] https://twitter.com/simocristea/status/1676323087959179264

[2] https://www.linkedin.com/pulse/new-generative-ai-tool-predicts-gene-expression-single-colangelo-x7ebf

[3] https://www.the-scientist.com/a-new-ai-tool-predicts-gene-expression-in-a-single-cell-71295

[4] https://www.biorxiv.org/content/10.1101/2023.04.30.538439v2.full

[5] https://www.biorxiv.org/content/10.1101/2023.04.30.538439v1.full