Friday, April 30, 2021

review of open source deep learning segmentation tools

 

Open-source deep-learning software for bioimage segmentation 

https://carpenterlab.broadinstitute.org/files/anne/files/mbc.e20-10-0660.pdf



Ready4R

 

Ready 4 R

https://ready4r.netlify.app/schedule/


Thursday, April 29, 2021

testing event effect in time series

 change point analysis can be applied. 

https://www.nature.com/articles/s41598-017-19067-2


https://www.researchgate.net/post/How_to_test_the_impact_of_single_event_ie_introducing_property_tax_on_housing_market_using_time_series_data

It seems that regression was discussed, treating the event as an intervention. So, a before-and-after t-test can be used. 

For COVID-19 analysis, we can test cointegration of deaths ~ mobility within a window around holiday events. 
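
A minimal sketch of the before-and-after comparison, assuming a Welch t-test is an acceptable stand-in for the t-test mentioned above. The series values and the event index are made up for illustration, and the test ignores autocorrelation, which real time series usually have.

```python
import math
import statistics

def welch_t(before, after):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    m1, m2 = statistics.mean(before), statistics.mean(after)
    v1, v2 = statistics.variance(before), statistics.variance(after)
    n1, n2 = len(before), len(after)
    se2 = v1 / n1 + v2 / n2
    t = (m2 - m1) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Hypothetical daily values in a window around an event at index 7.
series = [10, 11, 9, 10, 12, 10, 11, 15, 16, 14, 15, 17, 16, 15]
event_index = 7
t_stat, df = welch_t(series[:event_index], series[event_index:])
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A large absolute t value suggests the mean level shifted after the event; a change point analysis would instead estimate where the shift happened.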



Sunday, April 25, 2021

late layers of neural networks are responsible for memorization

 

https://openreview.net/forum?id=V8jrrnwGbuc


time lag, Johansen test, periodic time series

 

When the random series is linear, cross-correlation over time lags gave only a gradual trend. When a periodic time series was given, cross-correlation gave an obvious cycling effect. 


see https://github.com/hongqin/cointegration-sandbox/blob/main/random-perioditic-walk.pdf
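
A small sketch of the cycling effect, assuming a pure sine wave as the periodic series (the period of 20 and length of 200 are arbitrary choices, not taken from the linked PDF). The lagged correlation swings between +1 and -1 as the lag moves through the period.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_corr(x, y, lag):
    """Correlation between x[t] and y[t + lag] for lag >= 0."""
    return pearson(x[:len(x) - lag], y[lag:])

period = 20
wave = [math.sin(2 * math.pi * t / period) for t in range(200)]
for lag in (0, 5, 10, 15, 20):
    print(lag, round(lagged_corr(wave, wave, lag), 2))
```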













order of integration I(d)

 


Order of Integration I(d)

If you have unit roots in your time series, a series of successive differences, d, can transform the time series into one with stationarity. The differences are denoted by I(d), where d is the order of integration. Non-stationary time series that can be transformed in this way are called series integrated of order d. Usually, the order of integration is either I(0) or I(1); it’s rare to see values for d that are 2 or more.

From: https://www.statisticshowto.com/order-of-integration/
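
A minimal sketch of differencing, using a made-up random walk. One round of differencing recovers the increments of the walk, which is what I(1) means. The quadratic example is deterministic and only shows the arithmetic of differencing twice.

```python
def difference(series, d=1):
    """Apply first differences d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

steps = [1, -1, 1, 1, -1, -1, 1, -1]   # hypothetical stationary increments
walk = [0]
for s in steps:
    walk.append(walk[-1] + s)          # the random walk accumulates the increments

print(difference(walk, d=1))           # recovers the increments
print(difference([t * t for t in range(6)], d=2))  # a quadratic trend needs d = 2
```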

lag does not affect cointegration test

 

"Thus in theory you can test for cointegration either between y_{1,t} and y_{2,t} or y_{1,t} and y_{2,t-h} and the answer should be the same. Empirically the answer may differ, but hopefully you have a large enough sample so that it does not differ in your case."

Reference: 

https://stats.stackexchange.com/questions/285582/cointegration-with-lagged-variables

So, the lag is best analyzed with cross-correlation analysis. 



Saturday, April 24, 2021

Co-dominant neutralizing epitopes make anti-measles immunity resistant to viral evolution


https://www.cell.com/action/showPdf?pii=S2666-3791%2821%2900073-2

Measles leads to a polyclonal antibody response against multiple epitopes, so antigenic shift has a very small chance. 

SARS-CoV-2 and influenza lead to a focused antibody response, so antigenic shift has a higher chance. 

Theoretically, a vaccine that elicits a poly-epitope response would be a better vaccine. 



Friday, April 23, 2021

PROGRAMMING QUANTUM COMPUTERS: A PRIMER WITH IBM Q AND D-WAVE EXERCISES


https://sites.google.com/ncsu.edu/qc-tutorial/


somatic mutation patterns in human body

PMCID: PMC6930685
PMID: 31874648

The somatic mutation landscape of the human body

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6930685/

 



CVPR learning representation via graph structured networks

 

CVPR 2020, The 2nd Tutorial on

Learning Representations via Graph-structured Networks

Slides and recorded videos are provided in this webpage.
Sunday afternoon (1PM - 4:30PM PDT), June 14, 2020


https://xiaolonw.github.io/graphnnv2/


graph meta learning

 

https://zitniklab.hms.harvard.edu/projects/G-Meta/


ongoing recombination in SARS-CoV-2 genomes

 Ignatieva

https://www.biorxiv.org/content/10.1101/2021.01.21.427579v1.full.pdf


CR conserved pathway with deep learning

Classification:  Yeast deletion with CR effect: extend or shorten lifespan

Input: double deletion genetic interactions

neural networks: DCell or a hypothesis-based-graph model


Tuesday, April 20, 2021

Monday, April 19, 2021

CPSC2100, introduction to deep learning,

zoom

Socrative

UTC anonymous survey


 deep learning illustrated

Student interested in research. 

key concepts:

loss function

activation function

regularization

epoch
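
A toy sketch that touches all four key concepts in one place, assuming a one-feature logistic regression is enough for illustration. The data, learning rate, and regularization strength are made up.

```python
import math

def sigmoid(z):                       # activation function
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, data, l2):             # loss function: cross-entropy plus L2 regularization
    ce = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        ce -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return ce / len(data) + l2 * w * w

data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]   # made-up toy data
w, b, lr, l2 = 0.0, 0.0, 0.5, 0.01
history = []
for epoch in range(20):               # one epoch = one full pass over the data
    grad_w = grad_b = 0.0
    for x, y in data:
        err = sigmoid(w * x + b) - y
        grad_w += err * x
        grad_b += err
    w -= lr * (grad_w / len(data) + 2 * l2 * w)   # the L2 term shrinks w
    b -= lr * grad_b / len(data)
    history.append(loss(w, b, data, l2))

print(round(history[0], 3), round(history[-1], 3))
```

The loss recorded after each epoch should go down as training proceeds.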




Friday, April 16, 2021

Artis ML intrusion detection


There are several GitHub repos with intrusion detection code:

https://github.com/rahulvigneswaran/Intrusion-Detection-Systems

https://github.com/cstub/ml-ids 

 https://github.com/rambasnet/DeepLearning-IDS

https://github.com/vinayakumarr/Network-Intrusion-Detection

We only need to pick 2 of these methods that work for us. 

There is a Kaggle competition on intrusion detection; it provides training and testing data at

https://www.kaggle.com/c/fcupdf1920


For the MS thesis, Artis may try two ML methods on the Kaggle data set and compare their performance, which would be good for the thesis. You can start by trying to run the GitHub sample code.

GitHub 

https://raw.githubusercontent.com/CynthiaKoopman/Network-Intrusion-Detection/master/KDDTrain%2B_2.csv
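
A rough sketch of how two methods could be compared once each has produced predictions on the same test set. The labels and predictions below are made up, and `method_a` and `method_b` are hypothetical placeholders, not outputs from any of the repos above.

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = attack, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

y_true   = [1, 0, 1, 1, 0, 0, 1, 0]   # made-up ground truth
method_a = [1, 0, 1, 0, 0, 1, 1, 0]   # hypothetical predictions from method A
method_b = [1, 0, 0, 0, 0, 0, 1, 0]   # hypothetical predictions from method B

for name, pred in (("A", method_a), ("B", method_b)):
    print(name, evaluate(y_true, pred))
```

Two methods can tie on accuracy and still differ in precision and recall, so reporting all three is safer for an intrusion detection comparison.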




Thursday, April 15, 2021

GTEx tissue and cell specific gene networks in humans


Some of the best multi-view data are at the NIH GTEx portal. 

https://www.gtexportal.org/home/datasets

 

These data sets, however, are quite complicated and need substantial analysis before they can be fed into deep learning models. 

 

nano -w GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_reads.gct 

This file seems to show Ensembl gene ids and counts
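
A minimal sketch of reading a GCT file without an editor, assuming the usual GCT 1.2 layout: a version line, a line with the number of rows and columns, a header starting with Name and Description, then one gene per line. The gene IDs, sample names, and counts below are made up. For the real file, pass `open(path)` instead of the in-memory toy.

```python
import io

def read_gct(handle):
    """Parse a GCT 1.2 stream into (sample names, {gene id: counts})."""
    handle.readline()                                   # version line, e.g. "#1.2"
    n_rows, n_cols = map(int, handle.readline().split()[:2])
    header = handle.readline().rstrip("\n").split("\t")
    samples = header[2:]                                # skip Name and Description
    counts = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        counts[fields[0]] = [int(float(v)) for v in fields[2:]]
    if len(counts) != n_rows or len(samples) != n_cols:
        raise ValueError("dimensions do not match the GCT header")
    return samples, counts

toy = io.StringIO(
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tSAMPLE-A\tSAMPLE-B\tSAMPLE-C\n"
    "ENSG00000000001.1\tGENE1\t0\t12\t7\n"
    "ENSG00000000002.1\tGENE2\t5\t0\t31\n"
)
samples, counts = read_gct(toy)
print(samples)
print(counts)
```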

There are genotypic data, so we can infer how SNPs -> expression -> phenotypic changes

The GTEx Consortium atlas of genetic regulatory effects across human tissues

https://www.biorxiv.org/content/10.1101/787903v1

ML papers using GTEx
* standard ML outperforms deep learning
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-3427-8 

human protein atlas

 
https://www.proteinatlas.org/humanproteome/cell

69 cell lines of human, 
deep RNA seq. 
protein localization by antibody profiling with immunofluorescence and confocal microscopy -> 35 locations


cpsc2100 review cipher coding, plotting

plotting with pylab. 

Subplot position is a bit tricky. The line style is 'r--' (red dashed line).
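
A small sketch of the subplot position arithmetic. In `subplot(nrows, ncols, index)` the index starts at 1 in the upper-left corner and increases to the right, row by row. The helper name `subplot_index` and the `demo_plot` function are hypothetical; `demo_plot` needs matplotlib and is not called automatically.

```python
def subplot_index(row, col, ncols):
    """1-based position for subplot(nrows, ncols, index), given 0-based row and col."""
    return row * ncols + col + 1

def demo_plot():
    """Draw a 2x2 grid of curves; requires matplotlib (imported lazily)."""
    import matplotlib.pyplot as plt
    xs = list(range(10))
    for row in range(2):
        for col in range(2):
            plt.subplot(2, 2, subplot_index(row, col, 2))
            power = row * 2 + col + 1
            plt.plot(xs, [x ** power for x in xs], "r--")   # red dashed line
            plt.title(f"x^{power}")
    plt.tight_layout()
    plt.show()

print([subplot_index(r, c, 2) for r in range(2) for c in range(2)])
```

Call `demo_plot()` in a notebook to see the four panels.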


2:20pm -> 

 cipher coding review




Wednesday, April 14, 2021

virtual imaging lab

 https://www.scholastic.com/pathways/techlab/index.html



multi-view or multi-task genomics data sets

 yeast: 

scmd version morphology

fitness

lifespan, RLS, CLS


 GTEx -> cell type-specific genetic regulation, Science

Cell type–specific genetic regulation of gene expression across human tissues

https://science.sciencemag.org/content/sci/369/6509/eaaz8528.full.pdf


Tuesday, April 13, 2021

ISCB HPC AI competition

 

The International Society for Computational Biology is pleased to announce the HPC-AI Advisory Council (HPCAIAC) and National Supercomputing Centre (NSCC) Singapore 2021 APAC HPC-AI Competition.

High-performance computing and artificial intelligence are the most essential tools fueling the advancement of science. In order to handle the ever-growing demands for higher computation performance and the increase in the complexity of research problems, the world of scientific computing continues to re-innovate itself in a fast pace.

The competition encourages international teams in the APAC region to showcase their HPC and AI expertise in a friendly yet spirited competition that builds critical skills, professional relationships, competitive spirits and lifelong camaraderie.

Important Deadlines

  • The HPC-AI Advisory Council will finalize the list of competitive teams by April 30th, 2021
  • The HPC-AI Advisory Council will announce the training plan on May 7th, 2021
  • All teams should submit presentation slides together with their code before October 15th 2021
  • The HPC-AI Advisory Council will announce the presentation review agenda on October 19th, 2021
  • The presentation review is scheduled from October 26 to November 6, 2021 via video conference. Each team will have 30 minutes to present and 30 minutes for Q&A.
  • The final results will be announced at the Supercomputing Conference 2021 in November 2021, in St. Louis, MO, USA.
  • The award ceremony will take place at the SupercomputingAsia 2022 conference in Singapore

The winning teams will receive the following awards*:

  • First Place (one team): $5,000 (USD) and a reserved spot representing APAC at the 2022 International ISC Student Cluster Competition
  • Second Place (one team): $3,000 (USD)
  • Third Place (one team): $1,500 (USD)
  • Merit Prize (up to three teams): $1,000 (USD)
  • Each team member will receive a certificate.

To become part of a team – register here - http://www.hpcadvisorycouncil.com/events/2021/APAC-AI-HPC/register.php



NIH genomics

 encoder, imputing, 


Use a GAN to generate a more diverse data set: train on data from European populations to generate data for under-represented groups. 

There are things that we know we don't know, and there are things that we do not know we don't know. Calibrate, test, re-calibrate.

race, socioeconomic status, lifestyle, 



Sunday, April 11, 2021

quantum alignment, shor's algorithm

 

convert Shor's algorithm:

https://qiskit.org/textbook/ch-algorithms/shor.html

To my intuitive understanding, Shor's algorithm factors an integer into primes by combining a good guess with a transformation (the quantum Fourier transform, used to find a period). 
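
A classical sketch of the reduction behind Shor's algorithm, assuming brute-force period finding in place of the quantum step. The function names are hypothetical. The idea: guess a, find the period r of a^x mod N, and take gcd(a^(r/2) - 1, N).

```python
from math import gcd

def find_period(a, n):
    """Smallest r > 0 with a**r % n == 1 (brute force; this is the quantum step in Shor)."""
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r

def factor_with_guess(n, a):
    """Try to split n using the guess a; return None if this guess fails."""
    shared = gcd(a, n)
    if shared != 1:
        return shared, n // shared        # lucky guess already shares a factor
    r = find_period(a, n)
    if r % 2 == 1:
        return None                       # odd period: try another guess
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None                       # trivial square root: try another guess
    p = gcd(half - 1, n)
    return p, n // p

print(find_period(7, 15), factor_with_guess(15, 7))
```

With N = 15 and the guess a = 7, the period is 4 and the factors come out as 3 and 5. The guess a = 14 fails and would need to be replaced.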

Saturday, April 10, 2021

edge dynamics are easy to control in power-law and transcription networks

controlling edge dynamics in complex networks

 https://www.nature.com/articles/nphys2327

Nepusz, Vicsek

 "We also find that transcriptional regulatory networks are particularly easy to control. Analytic calculations show that networks with scale-free degree distributions have better controllability properties than uncorrelated networks, and positively correlated in- and out-degrees enhance the controllability of the proposed dynamics."

Qin: an interesting point that positively correlated in- and out-degrees enhance controllability. Any implications for the evolution of biological networks? 




High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications

 

High-Dimensional Data Analysis with Low-Dimensional Models:
Principles, Computation, and Applications

John Wright   and   Yi Ma,    Cambridge University Press

https://book-wright-ma.github.io/



Friday, April 9, 2021

PhD positions

 My lab has multiple PhD positions open. Research directions are in data science, machine learning, and biomedical big data. One research direction is to develop multi-view deep learning neural networks to integrate heterogeneous genomics data sets to predict aging and diseases. The second research direction is to develop Mask R-CNN models to detect and quantify cell objects, and develop graph-based algorithms to infer cell division events. The third research direction is to apply algebraic graph theory and develop deep-learning methods for single-cell genomics data analysis. Lab GitHub projects can be seen at github.com/hongqin


Please contact hong-qin@utc.edu or qinstat@gmail.com with your resume, transcripts, personal statement, and references.


UTC graduate school application

https://www.utc.edu/apply/

Select semester "Spring 2022"

create an account

In Enrollment, select "PhD Computational Science: Computer Science". 








 





Wednesday, April 7, 2021

Tuesday, April 6, 2021

cpsc2100 sorting algorithm

utc course evaluation


Poker example with video recording? (Poker cards do not have a zero, so I used the Joker card instead.) 

PowerPoint example? 

For selection sort, my poker card demo and Python output are not consistent, likely due to the implementation of the inner loop. 

I used bisection search as a reverse analogy for merge sort. However, bisection search is O(log2(n)), whereas merge sort is O(n log2(n)). 
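
A sketch of both sorts on a made-up hand of cards, with 0 standing for the Joker. This selection sort does one swap per pass; a variant that swaps inside the inner loop ends with the same sorted hand but passes through different intermediate states, which may be one reason a card demo and the Python output disagree.

```python
def selection_sort(cards):
    """Selection sort: find the smallest remaining card, then swap once per pass."""
    cards = list(cards)
    for i in range(len(cards) - 1):
        smallest = i
        for j in range(i + 1, len(cards)):      # inner loop scans the unsorted part
            if cards[j] < cards[smallest]:
                smallest = j
        cards[i], cards[smallest] = cards[smallest], cards[i]
    return cards

def merge_sort(cards):
    """Merge sort: split in half (log2(n) levels), then merge (n work per level)."""
    if len(cards) <= 1:
        return list(cards)
    mid = len(cards) // 2
    left, right = merge_sort(cards[:mid]), merge_sort(cards[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

hand = [7, 0, 12, 3, 9, 1]    # 0 stands for the Joker
print(selection_sort(hand))
print(merge_sort(hand))
```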



 




PAIZA cloud

 write and learn programming languages

https://paiza.io/en


Monday, April 5, 2021

cpsc2100 search algorithm, part 1

 poker cards, unorganized vs. organized, search for the ace of hearts. 


key concepts: 
search space
unstructured data --> structured data  
bisection search

Poker cards:
clubs (♣), diamonds (♦), hearts (♥), and spades (♠)

Looking for the jack of spades 

bisection search
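
A sketch of bisection search on an organized deck, assuming the cards are sorted by suit (clubs, diamonds, hearts, spades) and then by rank. The encoding of a card as a (suit index, rank) pair is an illustration choice.

```python
SUITS = ["clubs", "diamonds", "hearts", "spades"]
RANKS = list(range(1, 14))       # 1 = ace, 11 = jack, 12 = queen, 13 = king

# Organized (structured) deck: sorted by suit, then by rank.
deck = [(s, r) for s in range(len(SUITS)) for r in RANKS]

def bisection_search(cards, target):
    """Return (position, steps); position is -1 if the card is not found."""
    low, high, steps = 0, len(cards) - 1, 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if cards[mid] == target:
            return mid, steps
        if cards[mid] < target:
            low = mid + 1        # discard the lower half of the search space
        else:
            high = mid - 1       # discard the upper half of the search space
    return -1, steps

jack_of_spades = (SUITS.index("spades"), 11)
position, steps = bisection_search(deck, jack_of_spades)
print(position, steps)
```

Each step halves the search space, so 52 cards need at most 6 steps. An unorganized deck would need up to 52 looks.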




Skeletal muscle transcriptome in healthy aging

 


https://www.nature.com/articles/s41467-021-22168-2

 RNA was extracted and sequenced from muscle biopsies collected from 53 healthy individuals (22–83 years old) of the GESTALT study of the National Institute on Aging–NIH



multi-task deep learning

 implicit data augmentation

efficient training. 


cost function in deep learning

 

1. The cost function should be expressible as an average (or sum) over the costs of individual training samples.

2. The cost function should not depend on any activation values of the network other than the output-layer values, which is required for back-propagation to work. 
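
A minimal sketch of a cost function that meets both conditions, assuming a quadratic (mean squared error) cost as the example. The outputs and targets are made up.

```python
def sample_cost(output, target):
    """Cost of one sample; depends only on the network output and the target."""
    return 0.5 * (output - target) ** 2

def total_cost(outputs, targets):
    """Total cost written as an average over per-sample costs."""
    return sum(sample_cost(o, t) for o, t in zip(outputs, targets)) / len(outputs)

outputs = [0.9, 0.2, 0.6]     # made-up output-layer values
targets = [1.0, 0.0, 1.0]
print(total_cost(outputs, targets))
```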



reference:

https://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications



Friday, April 2, 2021

image transformer

 

https://youtu.be/TrdevFK_am4



A transformer is a self-attention model, which seems to resemble my gene-network model based on gene expression!
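
A bare-bones sketch of scaled dot-product self-attention, assuming identity projections (Q = K = V = the input). Real transformers learn separate projection matrices, which are omitted here. The three tokens are made up; they could stand for genes with two expression features each.

```python
import math

def softmax(row):
    """Numerically stable softmax of one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x."""
    d = len(x[0])
    scores = [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x] for q in x]
    weights = [softmax(row) for row in scores]            # who attends to whom
    output = [[sum(w * v[j] for w, v in zip(row, x)) for j in range(d)]
              for row in weights]
    return weights, output

tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]   # made-up feature vectors
weights, output = self_attention(tokens)
for row in weights:
    print([round(w, 2) for w in row])
```

The weight matrix can be read as a soft, directed graph between tokens, with similar tokens attending more to each other.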





multiview multi-task on plant images

 

For herbarium and plant images, segmented leaves, barks, and flowers can be separated into views, which can then be fed into feature extraction layers (such as shapes and veins), followed by typical neural networks or graph networks, and multi-task prediction of family-genus-species.

Multi-task training makes sense here because the predicted family, genus, and species are hierarchical by nature. 

 https://en.wikipedia.org/wiki/Species

Q: ImageNet classification is inherently multi-task, isn't it? 

vision transformers with attention: an image is worth 16x16 words

https://arxiv.org/abs/2010.11929
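
A sketch of the "16x16 words" idea: cut the image into 16x16 patches and flatten each patch into one token. This assumes a single-channel image whose height and width are multiples of the patch size; the pixel values are made up.

```python
def to_patches(image, patch=16):
    """Split a 2D image (list of rows) into flattened patch tokens, row-major order."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            block = [image[r][left:left + patch] for r in range(top, top + patch)]
            patches.append([pixel for row in block for pixel in row])
    return patches

# Made-up 224x224 grayscale image.
image = [[(r * 224 + c) % 256 for c in range(224)] for r in range(224)]
patches = to_patches(image)
print(len(patches), len(patches[0]))
```

A 224x224 image gives 14 x 14 = 196 patch tokens, which the transformer then treats like words in a sentence.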








multitask multiview deep learning on yeast fitness and lifespan

 multitask deep learning on yeast fitness and lifespan, morphology, integrated learning

Multi-task learning on similar tasks can mitigate missing data. This is in contrast to transfer learning. 

basically, a vector output for multiple outcomes, 
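
A sketch of how a vector output can tolerate missing data, assuming a masked mean squared error where missing targets contribute no loss. The three tasks and all values are made up.

```python
def masked_mse(predictions, targets):
    """Mean squared error over the observed targets only (None = missing)."""
    total, count = 0.0, 0
    for pred_vec, target_vec in zip(predictions, targets):
        for pred, target in zip(pred_vec, target_vec):
            if target is None:          # missing measurement: skip this task
                continue
            total += (pred - target) ** 2
            count += 1
    return total / count if count else 0.0

# Columns: fitness, lifespan, morphology score (all values made up).
targets     = [[1.0, 25.0, None], [0.8, None, 0.3], [None, 30.0, 0.5]]
predictions = [[0.9, 24.0, 0.1],  [0.8, 20.0, 0.4], [0.7, 28.0, 0.5]]
print(masked_mse(predictions, targets))
```

Every strain still contributes to training through whichever tasks were measured, so the shared layers see more data than any single task would provide. In practice the tasks would likely need rescaling so that one with large values, such as lifespan, does not dominate the loss.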




Partek single cell analysis

 

https://www.partek.com/webinar-registration-follow-up/


Thursday, April 1, 2021

Past data science videos

 cpsc 4180

https://youtu.be/TEpD7aP9m3I 

https://youtu.be/qcEjJGjnIcA

https://youtu.be/maK7uYSK2Bk

https://youtu.be/gTAnRNaIBps

https://youtu.be/zrZXhRmGaMU


cpsc5180



cpsc 2100 computational complexity,

 Zoom recording, 

// no Breakout room in spring 2021


 * Basic running time types:

 Constant, linear, log-linear, quadratic, polynomial, exponential. 

 * Dominant terms in Big O notation (addition and subtractions)

* law of multiplication
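
A short sketch of both rules, assuming a made-up running-time formula. For large n the n^2 term dominates the sum, and nested loops multiply their step counts.

```python
def f(n):
    """Hypothetical running-time formula with three terms."""
    return n ** 2 + 100 * n + 1000

for n in (10, 1_000, 100_000):
    print(n, f(n) / n ** 2)             # the ratio approaches 1, so f(n) is O(n^2)

def nested_loop_steps(n, m):
    """Count the steps of two nested loops."""
    steps = 0
    for _ in range(n):                  # outer loop: O(n)
        for _ in range(m):              # inner loop: O(m)
            steps += 1
    return steps                        # O(n * m) by the law of multiplication

print(nested_loop_steps(30, 40))
```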

Why can recursion be exponential? 
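
A sketch using naive recursive Fibonacci as the example, which is an assumption about which recursion the question has in mind. Each call makes two more calls, so the number of calls grows exponentially with n. Not all recursion behaves this way: bisection search and merge sort are recursive but not exponential.

```python
def fib(n, counter):
    """Naive recursive Fibonacci; counter[0] records the number of calls."""
    counter[0] += 1
    if n < 2:
        return n
    return fib(n - 1, counter) + fib(n - 2, counter)   # two recursive calls per call

for n in (5, 10, 15, 20):
    calls = [0]
    value = fib(n, calls)
    print(n, value, calls[0])
```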

 socrative test on linear, log-linear, quadratic etc // converted to Canvas multiple-choice questions


TODO: in unit 13 plot, 

 review computational complexity with plot

 with plotting from unit 13plotting.ipynb. 

# poker cards as an example, organized vs unorganized for search and sort.