Inferring effects of mutations on SARS-CoV-2 transmission from genomic surveillance data
Nature Communications 16, Article number: 441 (2025)
This site is to serve as my note-book and to effectively communicate with my students and collaborators. Every now and then, a blog may be of interest to other researchers or teachers. Views in this blog are my own. All rights of research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
module load python3
crun python3
A threshold of 0.997 means that, when forming conformal prediction sets, we only include a class if the model's predicted probability for that class is at least 99.7%. In other words, the nonconformity score cutoff is so strict that only extremely confident predictions will make it into the prediction set.
Here’s what that implies:
• Very High Confidence Requirement:
With a 0.997 threshold, the model must be almost certain (≥ 99.7% probability) about a class before it’s included in the prediction set. For many examples, this might result in a prediction set with only one class (if that class’s probability exceeds 0.997) or—even worse—an empty set if no class meets that bar.
• Coverage vs. Set Size Tradeoff:
Conformal prediction is designed to guarantee that the true label is included in the prediction set at a desired rate (coverage). If you set the threshold so high, you risk lowering the coverage (i.e. many instances might not have the true label in their prediction set) or you might get very sparse (tiny) prediction sets. In practical terms, if most of your examples end up with empty or overly “confident” (but possibly incorrect) prediction sets, then the threshold is too strict.
• Is it Good or Bad?
– If the model is extremely well-calibrated and truly confident: A threshold of 0.997 could indicate that the model is rarely uncertain, and its predictions are reliable. In such a rare scenario, you might see high coverage (almost every true label is included) and prediction sets that almost always have a single label.
– In most realistic settings: Such a high threshold is likely too conservative. It may lead to prediction sets that are too small (or even empty), failing to capture the uncertainty inherent in the data. That would be “bad” because it undermines one of the strengths of conformal prediction—providing informative prediction sets that reflect the model’s uncertainty.
In summary, unless your model is known to be extremely confident and well-calibrated (so that nearly every correct prediction is given with ≥ 99.7% probability), a threshold of 0.997 is likely too strict. You would typically aim for a threshold that balances having reasonably sized prediction sets (capturing uncertainty) while still meeting your desired coverage rate (for instance, 90% coverage when targeting a 10% error rate).
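To make this concrete, here is a minimal sketch (not from the original analysis; the probabilities and labels below are simulated placeholders) of forming prediction sets with a fixed probability cutoff and then checking the coverage and set sizes that result:

import numpy as np

# Simulated softmax probabilities (1000 samples, 5 classes) and true labels.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5) * 0.3, size=1000)
labels = rng.integers(0, 5, size=1000)

threshold = 0.997  # the strict cutoff discussed above

# Prediction set per sample: every class whose predicted probability >= threshold.
pred_sets = [set(np.where(p >= threshold)[0]) for p in probs]

coverage = np.mean([labels[i] in pred_sets[i] for i in range(len(labels))])
avg_size = np.mean([len(s) for s in pred_sets])
empty = np.mean([len(s) == 0 for s in pred_sets])
print(f"coverage={coverage:.3f}  avg set size={avg_size:.2f}  empty sets={empty:.1%}")

With a cutoff this strict, a check like this will typically show coverage well below a 90% target and many empty sets unless the model's probabilities are extremely peaked, which is exactly the tradeoff described above.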
https://www.youtube.com/@valence_labs
https://pmc.ncbi.nlm.nih.gov/articles/PMC11118704/
Predicting gene expression from histone marks using chromatin deep learning models depends on histone mark function, regulatory distance and cellular states
https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkae1212/7921050?login=false#498065910
https://github.com/evolutionaryscale/esm/tree/main
Write a tutorial for the AI for Health Science course.
https://images.cv/dataset/ant-image-classification-dataset
https://github.com/naturalis/FormicID
Zoom: start recording
Unit 2: biomedical images
Project topics, presentations
Use iPad to illustrate data organization
TEAM Drive for data sharing
The first epoch usually takes the longest during training, the second takes slightly less time, and later epochs are faster still. This comes down to several factors, including data preprocessing, caching, and optimization mechanisms. In particular, when using ImageDataGenerator or tf.data pipelines, the images may be resized, augmented, normalized, and converted to tensors for the first time during the first epoch; once that work is cached or the pipeline has warmed up, later epochs can reuse it.
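A hedged sketch of how caching and prefetching in a tf.data pipeline lets later epochs reuse that preprocessing work (the directory name and image size are placeholders, not from the actual course project):

import tensorflow as tf

IMG_SIZE = (224, 224)   # placeholder image size
BATCH_SIZE = 32

# Placeholder path; point this at your own image folders.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=IMG_SIZE, batch_size=BATCH_SIZE)

rescale = tf.keras.layers.Rescaling(1.0 / 255)

train_ds = (
    train_ds
    .map(lambda x, y: (rescale(x), y), num_parallel_calls=tf.data.AUTOTUNE)
    .cache()                     # decoded, normalized batches are kept after epoch 1
    .prefetch(tf.data.AUTOTUNE)  # overlap input preparation with training
)

With .cache() in place, the decode-and-preprocess cost is paid mostly during the first epoch, and later epochs read the cached tensors, which matches the timing pattern described above.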
Below is an overview of what you’re seeing in each line of the training log and why the number “163” appears:
Epoch n/20
This indicates which epoch (out of 20 total) is currently being trained.
163/163
This shows how many mini-batches (steps) have been completed out of the 163 batches that make up one epoch.
Time/Step Info
For example: 80s 415ms/step. The first number (80s) usually shows how long the entire epoch took, and 415ms/step is roughly how long each training step took on average.
acc: 0.7738 (Training Accuracy)
The model’s accuracy on the training set at the end of the epoch.
f1_score: 0.6252 (Training F1)
The model’s F1 score on the training set at the end of the epoch, if you’ve added F1 as a metric.
loss: 0.5913 (Training Loss)
The training loss calculated over all batches in that epoch.
val_acc, val_f1_score, val_loss (Validation Metrics)
These show how the model performed on the validation set during that epoch:
val_acc: Validation accuracy.
val_f1_score: Validation F1 score.
val_loss: Loss on the validation set.
Putting it all together: each epoch processes 163 mini-batches (hence 163/163), and at the end, TensorFlow displays the final metrics on both the training set (acc, f1_score, loss) and the validation set (val_acc, val_f1_score, val_loss).