Showing posts with label deep learning. Show all posts

Sunday, July 2, 2023

Shapley Attributed Ablation with Augmented Learning (ShapAAL)

One new method that builds on Shapley values for deep learning explanation is **Shapley Attributed Ablation with Augmented Learning (ShapAAL)**. It is a push-pull deep architecture: subset selection through Shapley value attribution pushes the model to a lower dimension, while augmented training strengthens the model's learning capability on unseen data¹.


The ShapAAL paper demonstrates that a deep learning algorithm trained on a suitably selected subset of the seen examples, that is, with the unimportant ones ablated from a limited training dataset, can achieve consistently better classification performance under augmented training¹.
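As a rough illustration of the attribute-then-ablate idea (not the authors' implementation), the sketch below computes exact Shapley values for a toy three-item game and then drops the lowest-scoring item. The value function `toy_value`, the item names, and the `keep` count are all hypothetical, chosen only to make the arithmetic easy to follow; the items are called features here as an assumption.

```python
from itertools import combinations
from math import factorial

FEATURES = ["a", "b", "c"]  # hypothetical feature names


def toy_value(subset):
    """Hypothetical 'model score' for a feature subset (an assumption, not ShapAAL's)."""
    weights = {"a": 0.5, "b": 0.3, "c": 0.0}
    score = sum(weights[f] for f in subset)
    # Small interaction bonus when a and b are both present.
    if "a" in subset and "b" in subset:
        score += 0.1
    return score


def shapley_values(features, value):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for coalition in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_f = value(set(coalition) | {f})
                without_f = value(set(coalition))
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi


def ablate(features, phi, keep):
    """Keep the `keep` features with the largest Shapley values."""
    ranked = sorted(features, key=lambda f: phi[f], reverse=True)
    return ranked[:keep]


phi = shapley_values(FEATURES, toy_value)
selected = ablate(FEATURES, phi, keep=2)
```

The attributions sum to the score of the full feature set, which is a handy sanity check. Exact enumeration is only practical for a handful of features; real tools approximate it.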




Source: Conversation with Bing, 7/2/2023

(1) When less is more powerful: Shapley value attributed ablation with .... https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0277975.

(2) Explaining a series of models by propagating Shapley values. https://www.nature.com/articles/s41467-022-31384-3.

(3) [2104.02297] Shapley Explanation Networks - arXiv.org. https://arxiv.org/abs/2104.02297.

(4) GitHub - slundberg/shap: A game theoretic approach to explain the .... https://github.com/slundberg/shap.

flash attention

 FlashAttention is an **IO-aware exact attention algorithm** that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM. It is designed to be **fast and memory-efficient**¹.


Its number of HBM accesses is optimal for a range of SRAM sizes, and it requires fewer HBM accesses than standard attention¹. FlashAttention trains Transformers faster than existing baselines and enables longer context in Transformers, yielding higher quality models¹.
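The tiling idea can be sketched on the CPU for a single query vector. This is only an illustration of the running-softmax trick that lets attention be computed block by block without storing all scores; it is not the GPU kernel, it does not model HBM or SRAM traffic, and it omits the usual 1/sqrt(d) scaling. The function names and `block_size` are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_reference(q, keys, values):
    """Standard attention for one query: softmax over all scores at once."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)
    dim = len(values[0])
    return [sum(e * v[j] for e, v in zip(exps, values)) / denom for j in range(dim)]


def attention_tiled(q, keys, values, block_size=2):
    """Same result, visiting keys/values one block at a time.

    Only a running max, a running normaliser and a running output are kept,
    so the full score vector is never materialised.
    """
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [dot(q, k) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


# Hypothetical toy inputs.
q = [1.0, 0.5]
keys = [[0.1, 0.2], [1.0, -1.0], [0.3, 0.8], [-0.5, 0.4], [2.0, 0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0], [0.5, 0.5]]
out_ref = attention_reference(q, keys, values)
out_tiled = attention_tiled(q, keys, values, block_size=2)
```

Both functions return the same vector up to floating-point rounding, which is what "exact attention" means here: tiling changes the order of computation, not the result.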




Source: Conversation with Bing, 7/2/2023

(1) [2205.14135] FlashAttention: Fast and Memory-Efficient Exact Attention .... https://arxiv.org/abs/2205.14135.

(2) arXiv:2205.14135v2 [cs.LG] 23 Jun 2022. https://arxiv.org/pdf/2205.14135.pdf.

(3) Introducing Lightning Flash — From Deep Learning Baseline To ... - Medium. https://medium.com/pytorch/introducing-lightning-flash-the-fastest-way-to-get-started-with-deep-learning-202f196b3b98.

(4) Attention in Neural Networks - 1. Introduction to attention mechanism. https://buomsoo-kim.github.io/attention/2020/01/01/Attention-mechanism-1.md/.

Monday, May 1, 2023

Prompt-based training and fine-tuning training

 

Prompt-based training and fine-tuning training are two different approaches to customizing a pre-trained language model for a specific task.


Fine-tuning involves training a pre-trained model on a new dataset to improve its performance on a specific task. This customization step will let you get more out of the service by providing higher quality results than what you can get just from prompt design, the ability to train on more examples than can fit into a prompt, lower-latency requests and token savings due to shorter prompts². In particular, while prompts for base models often consist of multiple examples (few-shot learning), for fine-tuning, each training example generally consists of a single input example and its associated output, without the need to give detailed instructions or include multiple examples in the same prompt¹.


On the other hand, prompt-based training involves designing prompts that elicit the desired behavior from a pre-trained model without updating its weights. The main difference between pretrain-finetuning and prompt-tuning is that the former makes the model fit the downstream task, while the latter elicits the knowledge from the model by prompting³.
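The difference in how examples are packaged can be sketched as follows. The sentiment examples are invented, and the `prompt`/`completion` field names are an assumption about the training-file format; the exact schema depends on the provider and model, so check the linked documentation before using it.

```python
import json

# Hypothetical sentiment examples, invented for illustration.
EXAMPLES = [
    ("The movie was wonderful.", "positive"),
    ("I want my money back.", "negative"),
]


def build_few_shot_prompt(examples, query):
    """Prompt-based use: instructions and examples go in the prompt; no weights change."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


def build_finetune_records(examples):
    """Fine-tuning use: one input/output pair per record, serialised as JSON Lines."""
    return "\n".join(
        json.dumps({"prompt": text, "completion": label}) for text, label in examples
    )


few_shot_prompt = build_few_shot_prompt(EXAMPLES, "Great soundtrack, weak plot.")
finetune_jsonl = build_finetune_records(EXAMPLES)
```

The few-shot prompt repeats the instructions and examples on every request, while each fine-tuning record holds a single pair. That is where the shorter prompts and token savings mentioned above come from.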


Source: Conversation with Bing, 5/1/2023

(1) How to customize a model with Azure OpenAI Service - Azure OpenAI. https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/fine-tuning.

(2) Fine-tuning - OpenAI API. https://platform.openai.com/docs/guides/fine-tuning.

(3) Brief Introduction to NLP Prompting | Finisky Garden. https://finisky.github.io/briefintrotoprompt.en/.

(4) Can prompt engineering methods surpass fine-tuning performance ... - Medium. https://medium.com/@lucalila/can-prompt-engineering-surpass-fine-tuning-performance-with-pre-trained-large-language-models-eefe107fb60e.

Wednesday, April 12, 2023

knowledge map and DCell

It seems that the GO (Gene Ontology)-based DCell deep learning method is very similar to a knowledge-map-based machine learning approach.

Saturday, December 24, 2022

data set, imaging

 https://radiopaedia.org/articles/imaging-data-sets-artificial-intelligence



Monday, July 18, 2022

Are Deep Neural Networks Dramatically Overfitted? March 14, 2019 · 22 min · Lilian Weng

 https://lilianweng.github.io/posts/2019-03-14-overfit/


Monday, April 11, 2022

Unified AI framework to uncover deep interrelationships between gene expression and Alzheimer’s disease neuropathologies

 


https://www.nature.com/articles/s41467-021-25680-7

All code for our study, including code to train the MD-AD model and to generate all figures included in the manuscript, are available at https://github.com/suinleelab/MD-AD (archived at https://doi.org/10.5281/zenodo.5043447).

Thursday, March 24, 2022

attention in deep learning

In computer science, a sequence often refers to time-varying data.

A biological sequence is not time-varying data, but it is similar to a sentence in NLP.

Ref: 

https://theaisummer.com/attention/
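To make the sentence analogy concrete, here is a minimal sketch that turns a DNA string into "words" and integer ids, the same kind of input an NLP model would receive. Splitting into overlapping k-mers is just one assumed tokenization, and the function names and example sequence are hypothetical.

```python
def kmer_tokens(sequence, k=3):
    """Split a sequence into overlapping k-mers, analogous to words in a sentence."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def encode(tokens):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    ids = []
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
        ids.append(vocab[tok])
    return ids, vocab


tokens = kmer_tokens("ATGCGA", k=3)  # ['ATG', 'TGC', 'GCG', 'CGA']
ids, vocab = encode(tokens)          # ids == [0, 1, 2, 3]
```

Here position along the sequence plays the role that word order plays in a sentence; no time axis is involved.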



Tuesday, February 15, 2022

NLP with transformer

Natural Language Processing with Transformers

https://transformersbook.com/  

https://github.com/nlp-with-transformers/notebooks


Friday, February 11, 2022

1D, 2D, 3D CNN

 

  • In a 1D CNN, the kernel moves in 1 direction. The input and output data of a 1D CNN are 2-dimensional (per sample, not counting the batch axis). Mostly used on time-series data.
  • In a 2D CNN, the kernel moves in 2 directions. The input and output data of a 2D CNN are 3-dimensional (per sample). Mostly used on image data.
  • In a 3D CNN, the kernel moves in 3 directions. The input and output data of a 3D CNN are 4-dimensional (per sample). Mostly used on 3D image data (MRI, CT scans, video).
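The dimension counts in the list above can be checked with a small shape calculator. It is a sketch under stated assumptions: channels-last layout, stride 1, no padding, and shapes given per sample without the batch axis. The helper `conv_output_shape` is hypothetical, not a library function.

```python
def conv_output_shape(input_shape, kernel_size, filters):
    """Per-sample output shape of a stride-1 convolution with no padding.

    input_shape: spatial dims followed by channels, e.g. (steps, channels).
    kernel_size: one kernel length per spatial dim.
    """
    *spatial, _channels = input_shape
    if len(spatial) != len(kernel_size):
        raise ValueError("kernel_size needs one entry per spatial dimension")
    out_spatial = [size - k + 1 for size, k in zip(spatial, kernel_size)]
    return (*out_spatial, filters)


# 1D: (steps, channels) in, 2-dimensional out
shape_1d = conv_output_shape((100, 1), (3,), filters=8)              # (98, 8)
# 2D: (height, width, channels) in, 3-dimensional out
shape_2d = conv_output_shape((28, 28, 3), (3, 3), filters=8)         # (26, 26, 8)
# 3D: (depth, height, width, channels) in, 4-dimensional out
shape_3d = conv_output_shape((16, 64, 64, 1), (3, 3, 3), filters=8)  # (14, 62, 62, 8)
```

In each case the number of axes the kernel slides over is one less than the per-sample rank, because the last axis holds channels or filters.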

https://towardsdatascience.com/understanding-1d-and-3d-convolution-neural-network-keras-9d8f76e29610 

Saturday, November 27, 2021

training set, validation set, and test set for machine learning / deep learning

 

from: https://machinelearningmastery.com/difference-test-validation-datasets/

– Training set: A set of examples used for learning, that is to fit the parameters of the classifier.

– Validation set: A set of examples used to tune the parameters of a classifier, for example to choose the number of hidden units in a neural network.

– Test set: A set of examples used only to assess the performance of a fully-specified classifier.
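A minimal sketch of producing the three sets from one pool of examples is shown below. The 70/15/15 proportions, the fixed seed, and the function name `split_dataset` are assumptions for illustration; the definitions above do not prescribe any of them.

```python
import random


def split_dataset(examples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle once, then cut into disjoint training, validation and test sets."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_fraction)
    n_val = round(len(items) * val_fraction)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
```

Because the three slices come from one shuffled list, no example appears in more than one set, which is what keeps the test set unseen until the final assessment.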



Friday, October 29, 2021

LSTM is dead. Long live transformers.

 


LSTMs are hard to train and generally not transferable, so an LSTM typically requires a new labeled training set for each new task.


https://youtu.be/S27pHKBEp30



Sunday, September 26, 2021

validation for deep learning


It seems that "validation data sets" may be used in different ways in practice.  

https://stackoverflow.com/questions/46308374/what-is-validation-data-used-for-in-a-keras-sequential-model
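One common use is monitoring: the validation set is evaluated after each epoch, but it never contributes to the weight updates. The toy loop below sketches this for a one-parameter model y = w * x in plain Python. The data, learning rate, and function names are hypothetical, and the `loss`/`val_loss` keys are chosen to mirror the names Keras reports in its training history.

```python
def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def fit(train_data, val_data, epochs=20, lr=0.01):
    """Gradient descent on the training data; validation data is only evaluated."""
    w = 0.0
    history = {"loss": [], "val_loss": []}
    for _ in range(epochs):
        # The weight update uses the training set only.
        grad = sum(2 * (w * x - y) * x for x, y in train_data) / len(train_data)
        w -= lr * grad
        # Validation loss is recorded but never feeds back into w.
        history["loss"].append(mse(w, train_data))
        history["val_loss"].append(mse(w, val_data))
    return w, history


# Hypothetical data drawn from y = 2x.
train_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
val_data = [(5.0, 10.0), (6.0, 12.0)]
w, history = fit(train_data, val_data)
```

Comparing the two curves in `history` across runs with different settings (for example different `lr` values) is how the validation set supports hyperparameter tuning, while the test set stays untouched until the end.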


Qin: 

See 

https://www.tensorflow.org/guide/keras/train_and_evaluate#using_a_validation_dataset

model.fit(train_dataset, epochs=1, validation_data=val_dataset)

Thanks,

 

From TP: 

"After the meeting I wasn't 100% satisfied with our explanation of what the validation set is used for. I realized if we train using the training set, then applying the loss of the validation set to the training set is useless.

 

I found two articles to this question which sum up the answer very well:

 

 

To summarize,

 

You use the validation set to determine how well your model is learning during training. It is mostly used for hyperparameter training as you can retrain the model with different parameters and see how it compares. The idea is that it is also trained on so you can see how fast the model picks it up.

 

Overall though, we would use the Test set at the very end to gauge the accuracy of the model on completely new data it's never seen before.

 

To me, this seems like it can be done with the training set alone, however I understand the concept to just check a small subset of the training data to see how quickly the model will learn it. Since it isn't too difficult, I will incorporate this into the models and try to add some graphs to chart the training. This way, I can do some hyperparameter tuning once the transfer learning is set up and working.

 "