module load python3
crun python3
A threshold of 0.997 means that, when forming conformal prediction sets, we only include a class if the model assigns it a predicted probability of at least 99.7%. In other words, the nonconformity score cutoff is so strict that only extremely confident predictions make it into the prediction set.
Here’s what that implies:
• Very High Confidence Requirement:
With a 0.997 threshold, the model must be almost certain (≥ 99.7% probability) about a class before it’s included in the prediction set. For many examples, this might result in a prediction set with only one class (if that class’s probability exceeds 0.997) or—even worse—an empty set if no class meets that bar.
• Coverage vs. Set Size Tradeoff:
Conformal prediction is designed to guarantee that the true label is included in the prediction set at a desired rate (coverage). If you set the threshold so high, you risk lowering the coverage (i.e. many instances might not have the true label in their prediction set) or you might get very sparse (tiny) prediction sets. In practical terms, if most of your examples end up with empty or overly “confident” (but possibly incorrect) prediction sets, then the threshold is too strict.
• Is it Good or Bad?
– If the model is extremely well-calibrated and truly confident: A threshold of 0.997 could indicate that the model is rarely uncertain, and its predictions are reliable. In such a rare scenario, you might see high coverage (almost every true label is included) and prediction sets that almost always have a single label.
– In most realistic settings: Such a high threshold is likely too conservative. It may lead to prediction sets that are too small (or even empty), failing to capture the uncertainty inherent in the data. That would be “bad” because it undermines one of the strengths of conformal prediction—providing informative prediction sets that reflect the model’s uncertainty.
In summary, unless your model is known to be extremely confident and well-calibrated (so that nearly every correct prediction is given with ≥ 99.7% probability), a threshold of 0.997 is likely too strict. You would typically aim for a threshold that balances having reasonably sized prediction sets (capturing uncertainty) while still meeting your desired coverage rate (for instance, 90% coverage when targeting a 10% error rate).
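To make the tradeoff concrete, here is a small sketch with invented softmax outputs (the probabilities and the helper function are purely illustrative, not from any model discussed here). With a 0.997 cutoff, most prediction sets collapse to a single class or become empty, while a looser cutoff yields larger, more informative sets.

```python
# Toy illustration: how the probability cutoff shapes prediction sets.
# The softmax outputs below are invented for illustration only.
import numpy as np

probs = np.array([
    [0.998, 0.001, 0.001],   # extremely confident example
    [0.700, 0.250, 0.050],   # moderately confident example
    [0.400, 0.350, 0.250],   # ambiguous example
])

def prediction_sets(probs, threshold):
    """Include every class whose predicted probability is >= threshold."""
    return [set(np.where(p >= threshold)[0].tolist()) for p in probs]

print(prediction_sets(probs, threshold=0.997))  # [{0}, set(), set()]  mostly empty
print(prediction_sets(probs, threshold=0.10))   # [{0}, {0, 1}, {0, 1, 2}]  more informative
```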
https://www.youtube.com/@valence_labs
https://pmc.ncbi.nlm.nih.gov/articles/PMC11118704/
Predicting gene expression from histone marks using chromatin deep learning models depends on histone mark function, regulatory distance and cellular states
https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkae1212/7921050?login=false#498065910
https://github.com/evolutionaryscale/esm/tree/main
write a tutorial for AI for health science course.
Nature Communications, volume 16, Article number: 441 (2025)
https://images.cv/dataset/ant-image-classification-dataset
https://github.com/naturalis/FormicID
Zoom, start recording
unit 2. biomedical image
project topics, presentation
use ipad to illustrate data organization
TEAM drive for data sharing
The first epoch typically takes the longest during training, the second somewhat less, and later epochs are faster still. This is due to several factors, including data preprocessing, caching, and optimization mechanisms. In particular, when using ImageDataGenerator or tf.data pipelines, the images are read from disk, resized, augmented, normalized, and converted to tensors for the first time during the first epoch; later epochs can reuse cached or already-prepared data, so less time is spent on input processing.
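As a hedged illustration of the caching point (the file paths, image size, and batch size below are hypothetical placeholders, not the course pipeline), a tf.data input pipeline can pay the preprocessing cost once and reuse it in later epochs:

```python
# Minimal tf.data sketch: the heavy work (disk reads, decoding, resizing,
# normalization) happens on the first pass; .cache() lets later epochs
# reuse the preprocessed tensors.
import tensorflow as tf

file_paths = ["img_0001.jpg", "img_0002.jpg"]   # hypothetical image paths
labels = [0, 1]                                 # hypothetical labels

def preprocess(path, label):
    img = tf.io.read_file(path)                     # disk I/O
    img = tf.io.decode_jpeg(img, channels=3)        # decode
    img = tf.image.resize(img, (224, 224)) / 255.0  # resize + normalize
    return img, label

ds = (tf.data.Dataset.from_tensor_slices((file_paths, labels))
        .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
        .cache()                      # epoch 1 fills the cache; later epochs reuse it
        .shuffle(buffer_size=100)
        .batch(32)
        .prefetch(tf.data.AUTOTUNE))  # overlap input preparation with training
```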
Below is an overview of what you’re seeing in each line of the training log and why the number “163” appears:
Epoch n/20
This indicates which epoch (out of 20 total) is currently being trained.
163/163
The step counter for the epoch: the training data is split into 163 mini-batches (steps), and this shows how many of them have been processed so far.
Time/Step Info
For example: 80s 415ms/step. The first number (80s) shows roughly how long the entire epoch took, and 415ms/step is roughly how long each training step took on average.
acc: 0.7738 (Training Accuracy)
The model’s accuracy on the training set at the end of the epoch.
f1_score: 0.6252 (Training F1)
The model’s F1 score on the training set at the end of the epoch, if you’ve added F1 as a metric.
loss: 0.5913 (Training Loss)
The training loss calculated over all batches in that epoch.
val_acc, val_f1_score, val_loss (Validation Metrics)
These show how the model performed on the validation set during that epoch:
val_acc: Validation accuracy.
val_f1_score: Validation F1 score.
val_loss: Loss on the validation set.
Putting it all together: each epoch processes 163 mini-batches (hence 163/163), and at the end, TensorFlow displays the final metrics on both the training set (acc, f1_score, loss) and the validation set (val_acc, val_f1_score, val_loss).
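As a quick arithmetic check (the sample count and batch size below are hypothetical, not read from this particular log), the 163 is simply the number of mini-batches needed to cover the training set once:

```python
import math

n_train = 5200     # hypothetical number of training examples
batch_size = 32    # hypothetical batch size
steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch)   # 163 steps per epoch, matching the "163/163" counter
```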
Zoom, start recording
Datacamp review,
= A primer on deep learning in genetics, classification model, continued
Review. ask a student to run it and explain.
pytorch
transformer
Office hours: breakout room with each student. expectation. what to learn.
what are your short- and long-term career goals? How does the course align with them?
what kind of topics do you suggest?
kaggle analysis, logistic regression
Prior-knowledge-defined attention masks for transformers involve incorporating domain-specific information or constraints into the attention mechanism. This approach can offer several advantages and disadvantages:
## Advantages
1. Enhanced Interpretability: By incorporating prior knowledge, the model's attention patterns become more aligned with human understanding, making the model's decision-making process more transparent[2].
2. Improved Performance: In specific domains, prior knowledge can guide the model to focus on relevant information, potentially leading to better performance on targeted tasks[2].
3. Reduced Computational Complexity: By limiting attention to specific areas defined by prior knowledge, the model may require fewer computations, especially for long sequences[4].
4. Task-Specific Adaptation: Prior-knowledge masks can be tailored to specific tasks or domains, allowing for more efficient fine-tuning of pre-trained models[4].
## Disadvantages
1. Limited Flexibility: Rigid prior-knowledge masks might constrain the model's ability to learn unexpected patterns or relationships in the data[2].
2. Potential for Bias: If the prior knowledge is incomplete or biased, it may lead the model to make suboptimal decisions or reinforce existing biases in the data[4].
3. Increased Complexity in Design: Creating effective prior-knowledge masks requires domain expertise and careful design, which can be time-consuming and challenging[2].
4. Reduced Generalization: Highly specific prior-knowledge masks might limit the model's ability to generalize across different tasks or domains[4].
To implement prior-knowledge-defined attention masks:
1. Define the Mask: Create a binary or continuous mask based on domain knowledge or task-specific requirements[2].
2. Integration: Incorporate the mask into the attention mechanism, typically by adding a large negative value to masked positions of the attention logits before the softmax, so that disallowed query-key pairs receive near-zero attention weight[7] (see the sketch below).
3. Training: Fine-tune the model with the integrated mask, allowing it to learn within the constraints of the prior knowledge[4].
4. Evaluation: Assess the model's performance and interpretability to ensure the prior-knowledge mask enhances rather than hinders the model's capabilities[2].
By carefully designing and implementing prior-knowledge-defined attention masks, researchers can potentially create more efficient, interpretable, and task-specific transformer models. However, it's crucial to balance the benefits of prior knowledge with the need for model flexibility and generalization.
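As a hedged sketch of the integration step above (a generic additive-mask formulation written from scratch, not code from any of the cited papers; the matrices and the example mask are made up), a prior-knowledge mask can be folded into scaled dot-product attention by assigning a large negative logit to disallowed query-key pairs before the softmax:

```python
# Minimal NumPy sketch: scaled dot-product attention with a
# prior-knowledge mask. Shapes and the example mask are illustrative.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def masked_attention(Q, K, V, allow_mask):
    """allow_mask[i, j] = 1 if query i may attend to key j (prior knowledge), else 0."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # raw attention logits
    scores = np.where(allow_mask == 1, scores, -1e9)   # block disallowed pairs
    weights = softmax(scores, axis=-1)                 # each row sums to 1 over allowed keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8)); K = rng.normal(size=(4, 8)); V = rng.normal(size=(4, 8))
# Example prior: token 0 may only attend to tokens 0 and 1 (domain-defined).
allow = np.ones((4, 4), dtype=int)
allow[0, 2:] = 0
out, w = masked_attention(Q, K, V, allow)
print(np.round(w[0], 3))   # attention of token 0 is confined to positions 0 and 1
```

Because the softmax renormalizes over the remaining positions, the allowed keys still form a proper probability distribution for each query.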
Citations:
[1] https://stackoverflow.blog/2024/09/26/masked-self-attention-how-llms-learn-relationships-between-tokens/
[2] https://arxiv.org/html/2406.02761v1
[3] https://stackoverflow.com/questions/58127059/how-to-understand-masked-multi-head-attention-in-transformer/59713254
[4] https://openreview.net/forum?id=abHtkQkumD
[5] https://www.reddit.com/r/MLQuestions/comments/1fqjdrf/understanding_masked_attention_in_transformer/
[6] https://blog.pangeanic.com/what-are-transformers-in-nlp
[7] https://datascience.stackexchange.com/questions/65067/proper-masking-in-the-transformer-model
[8] https://www.turing.com/kb/brief-introduction-to-transformers-and-their-power
Conformity in machine learning model prediction evaluation is calculated using a measure called the nonconformity score. This score quantifies how different or "nonconforming" a new data point is compared to the patterns observed in the training data[2]. The process of calculating conformity involves several steps:
1. Training Phase:
- Split the dataset into a proper training set and a calibration set.
- Train the model on the proper training set.
- Use the trained model to make predictions on the calibration set.
2. Nonconformity Calculation:
- For each instance in the calibration set, calculate a nonconformity score.
- This score measures how different the prediction is from the actual value.
3. Prediction Phase:
- For a new data point, calculate its nonconformity score using the trained model.
- Compare this score to the distribution of nonconformity scores from the calibration set.
The nonconformity score can be calculated in various ways, depending on the type of problem:
- For regression: It could be the absolute difference between the predicted and actual values.
- For classification: It might be based on the probability assigned to the correct class.
The key idea is that instances with higher nonconformity scores are less conforming to the training patterns and are therefore associated with higher uncertainty[2].
By using this approach, Inductive Conformal Prediction (ICP) can generate prediction intervals or sets that capture the uncertainty associated with individual predictions. This allows for a more nuanced evaluation of model performance, going beyond simple point predictions to provide a measure of confidence in each prediction.
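Here is a minimal, self-contained sketch of that split-conformal recipe for classification (the calibration probabilities and labels are synthetic placeholders standing in for a trained model's outputs on a held-out calibration set; alpha = 0.10 targets roughly 90% coverage):

```python
# Split (inductive) conformal prediction sketch for classification.
# Calibration data below are synthetic stand-ins for a real model's outputs.
import numpy as np

rng = np.random.default_rng(1)
n_cal, n_classes = 500, 3
cal_probs = rng.dirichlet(alpha=[2.0, 1.0, 1.0], size=n_cal)           # fake softmax outputs
cal_labels = np.array([rng.choice(n_classes, p=p) for p in cal_probs])  # fake true labels

# 1) Nonconformity score: one minus the probability assigned to the true class.
cal_scores = 1.0 - cal_probs[np.arange(n_cal), cal_labels]

# 2) Calibration: take the finite-sample-corrected (1 - alpha) quantile.
alpha = 0.10
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
qhat = np.quantile(cal_scores, q_level, method="higher")

# 3) Prediction: include every class whose nonconformity score is below the cutoff.
test_probs = np.array([[0.80, 0.15, 0.05], [0.45, 0.40, 0.15]])  # toy test softmax outputs
pred_sets = [set(np.where(1.0 - p <= qhat)[0].tolist()) for p in test_probs]
print(round(float(qhat), 3), pred_sets)
```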
Citations:
[1] https://www.geeksforgeeks.org/metrics-for-machine-learning-model/
[2] https://www.linkedin.com/pulse/inductive-conformal-prediction-yeshwanth-n
[3] https://kanerika.com/glossary/model-evaluation-metrics/
[4] https://www.youtube.com/watch?v=oqK6rM8fbkk
[5] https://towardsdatascience.com/metrics-to-evaluate-your-machine-learning-algorithm-f10ba6e38234?gi=7cd5f38faaf8
[6] https://www.datasource.ai/en/data-science-articles/model-evaluation-metrics-in-machine-learning
[7] https://www.nature.com/articles/s41598-024-56706-x
[8] https://towardsdatascience.com/all-you-need-is-conformal-prediction-726f18920241?gi=6dd1cfc4136e
Conformity and SHAP (SHapley Additive exPlanations) value analysis are related in the context of machine learning model interpretation and uncertainty quantification. Both approaches aim to provide insights into model behavior, but they focus on different aspects:
1. Uncertainty Quantification: Conformity measures, particularly in the form of nonconformity scores, are used to quantify the uncertainty of model predictions. SHAP values, on the other hand, explain the impact of individual features on model outputs[1][3].
2. Shapley-value Conformity Scores: Recent research has explored combining Shapley values with conformal prediction to create more informative prediction sets. This approach uses Shapley values as conformity scores, resulting in smaller prediction sets for certain significance levels compared to traditional methods[5].
3. Complementary Information: While SHAP values provide feature importance and impact on model predictions, conformity measures offer insights into the reliability and uncertainty of those predictions. Together, they can provide a more comprehensive understanding of model behavior[2].
4. Uncertainty in SHAP Values: Research has also focused on quantifying uncertainty in SHAP value estimations. This includes using Shapley Residuals, Mean-Standard-Error, and Bayesian SHAP to capture different sources of uncertainty in SHAP explanations[6].
5. Application to Uncertainty Explanation: Recent work has adapted the Shapley value framework to explain various types of predictive uncertainty, quantifying each feature's contribution to the conditional entropy of model outputs[4].
By combining conformity measures with SHAP value analysis, researchers and practitioners can gain a more nuanced understanding of both model predictions and their associated uncertainties, leading to more reliable and interpretable machine learning applications.
Citations:
[1] https://proceedings.neurips.cc/paper_files/paper/2023/file/16e4be78e61a3897665fa01504e9f452-Paper-Conference.pdf
[2] https://papers.phmsociety.org/index.php/phmap/article/download/3694/2161
[3] https://mindfulmodeler.substack.com/p/shap-is-not-all-you-need
[4] https://arxiv.org/abs/2306.05724
[5] https://proceedings.mlr.press/v152/jaramillo21a.html
[6] https://scholarship.tricolib.brynmawr.edu/items/1c209352-e4ab-454e-822c-1fe30211b92d
[7] https://pmc.ncbi.nlm.nih.gov/articles/PMC10985608/
[8] https://soil.copernicus.org/articles/10/679/2024/
Conformity and attention masks in transformers can be combined in innovative ways to enhance model performance and uncertainty quantification. Here are some key approaches:
1. Uncertainty-Guided Transformer (UGT): This approach uses conformity measures to guide the attention mechanism. By introducing an uncertainty-guided random masking algorithm (UGRM), higher probability of masking is assigned to uncertain regions during training. This forces the transformer to become more efficient at inferring and recovering content in uncertain regions by exploiting contextual information.[2]
2. Stochastic Attention: Instead of using deterministic attention distributions, the attention mechanism can be made stochastic. This involves sampling attention from a Gumbel-Softmax distribution, which controls the concentration over values. Additionally, key heads in self-attention can be regularized to attend to a set of learnable centroids, effectively performing clustering over keys or hidden states.[4]
3. Probabilistic Transformer: This approach uses probabilistic attention scores to quantify epistemic uncertainties in model predictions. It involves training two models - a majority model focusing on low-uncertainty samples and a minority model focusing on high-uncertainty samples. During inference, these models are dynamically combined based on the input uncertainty to make the final prediction.[6]
4. Transformer Conformal Prediction: This method uses the Transformer architecture, particularly the decoder, as a conditional quantile estimator to predict the quantiles of prediction residuals. These quantiles are then used to estimate prediction intervals. The Transformer's ability to learn temporal dependencies across past prediction residuals benefits the estimation of prediction intervals.[5]
5. Topological Feature Extraction: This approach extracts topological features from attention matrices, providing a low-dimensional, interpretable representation of the model's internal dynamics. This can be used to estimate uncertainty in the transformer's predictions.[8]
These approaches demonstrate how conformity measures and attention masks can be combined to improve uncertainty quantification, enhance model interpretability, and potentially boost performance in various tasks. By integrating these concepts, researchers can develop more robust and reliable transformer models that not only make accurate predictions but also provide valuable insights into their confidence levels.
Citations:
[1] https://www.reddit.com/r/MLQuestions/comments/1fqjdrf/understanding_masked_attention_in_transformer/
[2] https://openaccess.thecvf.com/content/ICCV2021/papers/Yang_Uncertainty-Guided_Transformer_Reasoning_for_Camouflaged_Object_Detection_ICCV_2021_paper.pdf
[3] https://proceedings.mlr.press/v206/seedat23a/seedat23a.pdf
[4] https://cdn.aaai.org/ojs/21364/21364-13-25377-1-2-20220628.pdf
[5] https://arxiv.org/html/2406.05332v1
[6] https://sites.ecse.rpi.edu/~cvrl/Publication/pdf/Guo2022.pdf
[7] https://nejsds.nestat.org/journal/NEJSDS/article/10/text
[8] https://arxiv.org/abs/2308.11295
Zoom, start recording
Datacamp review,
slides,
= A primer on deep learning in genetics, classification model
https://colab.research.google.com/github/hongqin/Python-CoLab-bootcamp/blob/master/A_Primer_on_Deep_Learning_in_Genomics_Public.ipynb
= CoLab
= Github - Wahab
= Wahab ondemand,
todo: kaggle analysis, logistic regression
Here are some recent papers exploring the mathematical foundations of artificial intelligence:
"Formal Mathematical Reasoning: A New Frontier in AI" (December 2024)
"Artificial Intelligence: Advanced Mathematical Constructs and Applications" (November 2024)
"Integrating Discrete Mathematics in Artificial Intelligence: A Computational Perspective with a Vision for Future Technologies" (June 2024)
"A Mathematical Framework of Intelligence and Consciousness Based on Riemannian Geometry" (July 2024)
"Artificial Intelligence and Inherent Mathematical Difficulty" (August 2024)
These papers provide valuable insights into the mathematical underpinnings of AI, highlighting the ongoing efforts to establish rigorous theoretical foundations for the field.
Tried to install whisperx to run on GPU, but had trouble with the CUDA libraries, so defaulted to CPU instead.
On Wahab, crun only works after module load xxxx
[hqin@wahab-01 ~]$ ls
[hqin@wahab-01 ~]$ module load container_env
[hqin@wahab-01 ~]$ python -m venv whisperx_env
python: Command not found.
[hqin@wahab-01 ~]$ crun python -m venv whisperx_env
Did not work.
On ODU Wahab, GitHub can be used with an RSA public key.
For example
git clone git@github.com:hongqin/AI4Health.git
Zoom, start recording
Datacamp: registration
HPC survey (5 minutes)
The Research & Cloud Computing group (RCC) recently launched a survey regarding the need for training for research computing users. We would like to ask you to promote this survey among your students in classes and research groups as well as your colleagues, postdocs and other staff. The survey link is:
https://odu.co1.qualtrics.com/jfe/form/SV_9zCyC5peVHeQgl0
Please encourage them to submit responses by the end of January so we can use the findings to adjust offerings for this semester. Your help will be greatly appreciated!
CoLab
syllabus,
Socrative ice breaker, anonymous
Github
AI101, TensorFlow Playground.
== did not finish. leave for next class.
skip self-introduction video.
project team,
ChatGPT, Anthropic,
All of Us account
A primer on deep learning in genetics, classification model
https://colab.research.google.com/github/hongqin/Python-CoLab-bootcamp/blob/master/A_Primer_on_Deep_Learning_in_Genomics_Public.ipynb
rsync
https://www.cisecurity.org/advisory/multiple-vulnerabilities-in-rsync-could-allow-for-remote-code-execution_2025-007
https://zenodo.org/records/10405106
CS 795/895 DASC, AI for health and life sciences.
| Type | Time | Days | Where | Date Range | Schedule Type | Instructors |
| --- | --- | --- | --- | --- | --- | --- |
| Scheduled In-Class Meetings | 4:30 pm - 7:10 pm | F | ENGINEERING & COMP SCI BLDG 2120 | Jan 11, 2025 - Apr 28, 2025 | LECTURE | HONG QIN (P) |
There are some human single-cell aging data sets.
https://arxiv.org/abs/2412.14135
OpenAI o1 represents a significant milestone in artificial intelligence, achieving expert-level performance on many challenging tasks that require strong reasoning ability. OpenAI has claimed that the main technique behind o1 is reinforcement learning. Recent works use alternative approaches, such as knowledge distillation, to imitate o1's reasoning style, but their effectiveness is limited by the capability ceiling of the teacher model. Therefore, this paper analyzes the roadmap to achieving o1 from the perspective of reinforcement learning, focusing on four key components: policy initialization, reward design, search, and learning. Policy initialization enables models to develop human-like reasoning behaviors, equipping them with the ability to effectively explore solution spaces for complex problems. Reward design provides dense and effective signals via reward shaping or reward modeling, which guide both search and learning. Search plays a crucial role in generating high-quality solutions during both training and testing, and can produce better solutions with more computation. Learning uses the data generated by search to improve the policy, achieving better performance with more parameters and more searched data. Existing open-source projects that attempt to reproduce o1 can be seen as parts or variants of our roadmap. Collectively, these components underscore how learning and search drive o1's advancement, making meaningful contributions to the development of LLMs.