Thursday, February 20, 2025

scGPT on wahab



To use scGPT, please follow these commands:

module load pytorch-gpu/2.1                       # load the PyTorch 2.1 GPU module
salloc -p gpu --gres gpu:1 -c 8                   # request an interactive job with 1 GPU and 8 CPU cores
crun -p ~/envs/scGPT wandb                        # invoke wandb from the scGPT environment
crun.pytorch-gpu -p ~/envs/scGPT python code.py   # run your script (code.py) under the PyTorch GPU runtime in that environment


crun on wahab

module load python3   # load the default Python 3 module

crun python3          # launch Python 3 through crun


Wednesday, February 19, 2025

A threshold of 0.997 in inductive conformal prediction

A threshold of 0.997 means that, when forming conformal prediction sets, we only include a class if the model's predicted probability for it is at least 99.7%. In other words, the nonconformity score cutoff is so strict that only extremely confident predictions make it into the prediction set.

Here’s what that implies:

Very High Confidence Requirement:
With a 0.997 threshold, the model must be almost certain (≥ 99.7% probability) about a class before it’s included in the prediction set. For many examples, this might result in a prediction set with only one class (if that class’s probability exceeds 0.997) or—even worse—an empty set if no class meets that bar.

Coverage vs. Set Size Tradeoff:
Conformal prediction is designed to guarantee that the true label is included in the prediction set at a desired rate (coverage). If you set the threshold so high, you risk lowering the coverage (i.e. many instances might not have the true label in their prediction set) or you might get very sparse (tiny) prediction sets. In practical terms, if most of your examples end up with empty or overly “confident” (but possibly incorrect) prediction sets, then the threshold is too strict.

Is it Good or Bad?
  • If the model is extremely well-calibrated and truly confident: a threshold of 0.997 could indicate that the model is rarely uncertain and its predictions are reliable. In such a rare scenario, you might see high coverage (almost every true label is included) and prediction sets that almost always contain a single label.
  • In most realistic settings: such a high threshold is likely too conservative. It may lead to prediction sets that are too small (or even empty), failing to capture the uncertainty inherent in the data. That would be "bad" because it undermines one of the strengths of conformal prediction: providing informative prediction sets that reflect the model's uncertainty.

In summary, unless your model is known to be extremely confident and well-calibrated (so that nearly every correct prediction is given with ≥ 99.7% probability), a threshold of 0.997 is likely too strict. You would typically aim for a threshold that balances having reasonably sized prediction sets (capturing uncertainty) while still meeting your desired coverage rate (for instance, 90% coverage when targeting a 10% error rate).
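To make this concrete, here is a minimal sketch of how a probability cutoff turns softmax outputs into prediction sets, using synthetic, uncalibrated probabilities (all numbers below are illustrative assumptions, not output from a real model):

import numpy as np

rng = np.random.default_rng(0)

# Synthetic softmax outputs for 1,000 examples over 5 classes (illustrative only).
logits = rng.normal(size=(1000, 5))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
true_labels = rng.integers(0, 5, size=1000)

def prediction_sets(probs, threshold):
    # Include class c in an example's set iff its predicted probability >= threshold.
    return [set(np.where(p >= threshold)[0]) for p in probs]

for threshold in (0.5, 0.9, 0.997):
    sets = prediction_sets(probs, threshold)
    coverage = np.mean([y in s for y, s in zip(true_labels, sets)])
    avg_size = np.mean([len(s) for s in sets])
    empty = np.mean([len(s) == 0 for s in sets])
    print(f"threshold={threshold}: coverage={coverage:.2f}, "
          f"avg set size={avg_size:.2f}, empty sets={empty:.0%}")

On uncalibrated probabilities like these, the 0.997 cutoff leaves essentially every prediction set empty, which is exactly the failure mode described above; in practice the cutoff should be derived from a calibration set so that the desired coverage rate is met.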

Wednesday, February 12, 2025

YouTube channel on biological AI

 

https://www.youtube.com/@valence_labs


https://pmc.ncbi.nlm.nih.gov/articles/PMC11118704/


Multimodal Learning for Mapping the Genotype-Phenotype Dynamics

PMCID: PMC11118704  PMID: 38798675

Predicting gene expression from histone marks using chromatin deep learning models depends on histone mark function, regulatory distance and cellular states


https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkae1212/7921050?login=false#498065910


Tuesday, February 11, 2025

ESM (EvolutionaryScale) AI

 https://github.com/evolutionaryscale/esm/tree/main

Write a tutorial for the AI for Health Science course.

Sunday, February 9, 2025

Friday, February 7, 2025

Ant image classification

 https://images.cv/dataset/ant-image-classification-dataset


https://github.com/naturalis/FormicID



CS795 lecture biomedical image

Zoom: start recording

Unit 2: biomedical image

Project topics, presentations

Use iPad to illustrate data organization


TEAM drive for data sharing



Wednesday, February 5, 2025

Heavy.AI

 https://www.heavy.ai/

Why the first epoch takes the longest time during training

The first epoch takes the longest time during training, the second epoch takes slightly less, and later epochs are faster still. This is due to several factors, including data preprocessing, caching, and optimization mechanisms. Let's break it down:


1. Initial Data Loading & Preprocessing

  • First Epoch:
    • During the first epoch, the dataset needs to be fully loaded from disk and passed through the preprocessing pipeline.
    • If you are using ImageDataGenerator or tf.data pipelines, the images may be resized, augmented, normalized, and converted to tensors for the first time.
    • This initial processing is computationally expensive.
  • Later Epochs:
    • TensorFlow and Keras utilize caching and preloaded batches to speed up data access in subsequent epochs.
    • Once the images are processed in the first epoch, they may be cached in RAM (or SSD) and reused, reducing load times.
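
For example, with tf.data this caching can be made explicit. A minimal sketch, using synthetic arrays in place of images read from disk (shapes and sizes are illustrative):

import numpy as np
import tensorflow as tf

# Synthetic data standing in for images loaded from disk (illustrative shapes).
images = np.random.rand(1000, 64, 64, 3).astype("float32")
labels = np.random.randint(0, 10, size=1000)

ds = tf.data.Dataset.from_tensor_slices((images, labels))
ds = ds.map(lambda x, y: (tf.image.resize(x, [32, 32]), y),
            num_parallel_calls=tf.data.AUTOTUNE)
ds = ds.cache()  # the first epoch pays the full map() cost; later epochs read from the cache
ds = ds.shuffle(1000).batch(32).prefetch(tf.data.AUTOTUNE)

Note that .cache() is placed after the expensive map() so the preprocessed tensors, not the raw inputs, are what get reused in later epochs.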

2. CPU/GPU Warm-Up and Optimization

  • First Epoch:
    • When training starts, TensorFlow initializes GPU memory allocation, CUDA kernels, and computational graph optimizations.
    • This setup process introduces additional overhead, making the first epoch the slowest.
  • Later Epochs:
    • By the second epoch, TensorFlow has already optimized its execution path.
    • GPU computation becomes more efficient as TensorFlow compiles kernels and caches intermediate computations.

3. Adaptive Learning Rate & Optimizer State

  • Some optimizers, like Adam, RMSprop, and SGD with momentum, dynamically adjust learning rates and gradients over epochs.
  • This results in:
    • First Epoch: Gradients might be noisier, and initial computations may take longer.
    • Later Epochs: Optimizers become more stable, requiring slightly less computation.

4. Disk I/O Bottleneck

  • If you are loading images from a slow disk (HDD) or network storage, the first epoch takes longer due to disk read latency.
  • In later epochs, data might be cached in RAM or SSD, making loading much faster.

Conclusion

  • First epoch is slow due to data preprocessing, GPU warm-up, and disk I/O.
  • Second epoch is faster because some of these operations are now cached or optimized.
  • Later epochs become even more efficient as optimizations and caching mechanisms continue to improve performance.
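
If you want to verify this on your own pipeline, a minimal Keras callback like the sketch below prints the wall-clock duration of each epoch (the model and dataset in the usage comment are placeholders):

import time
import tensorflow as tf

class EpochTimer(tf.keras.callbacks.Callback):
    """Print the wall-clock duration of each epoch."""

    def on_epoch_begin(self, epoch, logs=None):
        self._t0 = time.time()

    def on_epoch_end(self, epoch, logs=None):
        print(f"epoch {epoch + 1}: {time.time() - self._t0:.1f}s")

# Usage, assuming an already-compiled `model` and dataset `ds`:
# model.fit(ds, epochs=20, callbacks=[EpochTimer()])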


Monday, February 3, 2025

Explain the training progress printout



Below is an overview of what you’re seeing in each line of the training log and why the number “163” appears:

  1. Epoch n/20
    This indicates which epoch (out of 20 total) is currently being trained.

  2. 163/163

    • This means your dataset has been split into 163 mini-batches (steps) for each epoch.
    • In other words, every epoch processes 163 batches of data before finishing.
    • For instance, if you have a total of 16,300 samples in your training set and you set the batch size to 100, you would end up with 163 steps per epoch (see the arithmetic sketch at the end of this section).
  3. Time/Step Info
    For example: 80s 415ms/step. The first number (80s) usually shows how long the entire epoch took. Then 415ms/step is roughly how long each training step took on average.

  4. acc: 0.7738 (Training Accuracy)
    The model’s accuracy on the training set at the end of the epoch.

  5. f1_score: 0.6252 (Training F1)
    The model’s F1 score on the training set at the end of the epoch, if you’ve added F1 as a metric.

  6. loss: 0.5913 (Training Loss)
    The training loss calculated over all batches in that epoch.

  7. val_acc, val_f1_score, val_loss (Validation Metrics)
    These show how the model performed on the validation set during that epoch:

    • val_acc: Validation accuracy.
    • val_f1_score: Validation F1 score.
    • val_loss: Loss on the validation set.

Putting it all together: each epoch processes 163 mini-batches (hence 163/163), and at the end, TensorFlow displays the final metrics on both the training set (acc, f1_score, loss) and the validation set (val_acc, val_f1_score, val_loss).
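
As a quick check on item 2, the steps-per-epoch count is just the dataset size divided by the batch size, rounded up; the sample and batch numbers below are the illustrative ones from the example above:

import math

n_samples = 16_300   # illustrative training-set size
batch_size = 100
steps_per_epoch = math.ceil(n_samples / batch_size)
print(steps_per_epoch)  # 163 -> the "163/163" shown in each epoch's progress bar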