https://cyrin.atcorp.com/catalog/
cybersecurity catalog
This site serves as my notebook and as a way to communicate effectively with my students and collaborators. Every now and then, a post may be of interest to other researchers or teachers. Views in this blog are my own. All rights of research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
In Zoom breakout rooms, I went over student projects one on one.
== pre-class to do:
calendar email invitation: including guests; done.
socrative questions (midterm exam, questions on contents from last lecture ).
update Canvas course materials, update learning objectives. assignments as needed. done.
Test-run code: Rmd -> HTML report with content. not today
learning objectives: not today
== In-class to do:
clean up desktop space, calendars,
announce midterm surveys.
ZOOM, live transcript (start video recording). Turn computer speaker on.
Socrative sign in
go over student problems, go over final projects, add sample student project presentation videos.
need to generate final project sign up sheets.
It seems that "validation data sets" may be used in different ways in practice.
https://stackoverflow.com/questions/46308374/what-is-validation-data-used-for-in-a-keras-sequential-model
Qin:
See
https://www.tensorflow.org/guide/keras/train_and_evaluate#using_a_validation_dataset
model.fit(train_dataset, epochs=1, validation_data=val_dataset)
Thanks,
"After the meeting I wasn't 100% satisfied with our explanation of what the validation set is used for. I realized if we train using the training set, then applying the loss of the validation set to the training set is useless.
I found two articles to this question which sum up the answer very well:
To summarize,
You use the validation set to determine how well your model is learning during training. It is mostly used for hyperparameter training as you can retrain the model with different parameters and see how it compares. The idea is that it is also trained on so you can see how fast the model picks it up.
Overall though, we would use the Test set at the very end to gauge the accuracy of the model on completely new data it's never seen before.
To me, this seems like it can be done with the training set alone; however, I understand the concept is to check a small subset of the training data to see how quickly the model will learn it. Since it isn't too difficult, I will incorporate this into the models and try to add some graphs to chart the training. This way, I can do some hyperparameter tuning once the transfer learning is set up and working."
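A note on the point above: in Keras, the data passed as `validation_data` is only evaluated at the end of each epoch and is not used to update the weights. The sketch below is not Keras code. It is a toy 1-D ridge regression in plain Python meant to illustrate the three roles: fit on the training set, pick a hyperparameter on the validation set, and report once on the test set. All names (`fit_ridge_1d`, `select_lambda`), the candidate penalties, and the synthetic data are assumptions made up for illustration.

```python
import random


def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression through the origin: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_lambda(train, val, candidates):
    """Fit on the training set only, score each candidate penalty on the validation set."""
    scores = {}
    for lam in candidates:
        w = fit_ridge_1d(train[0], train[1], lam)  # validation data never enters the fit
        scores[lam] = mse(w, val[0], val[1])
    best = min(scores, key=scores.get)
    return best, fit_ridge_1d(train[0], train[1], best), scores


# Synthetic data (assumption): y = 2x + small Gaussian noise
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(300)]
ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]
train = (xs[:200], ys[:200])
val = (xs[200:250], ys[200:250])
test = (xs[250:], ys[250:])

best_lam, w, val_scores = select_lambda(train, val, [0.0, 1.0, 10.0, 100.0])
test_mse = mse(w, test[0], test[1])  # touched once, at the very end
print("best lambda:", best_lam, "w:", round(w, 3), "test MSE:", round(test_mse, 4))
```

The validation scores drive the choice of penalty, so they are no longer an unbiased estimate of performance on new data; that is why the test set is held back until the end.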
== pre-class to do:
calendar email invitation: including guests; done.
socrative questions (midterm exam, questions on contents from last lecture ). done.
update Canvas course materials, update learning objectives. assignments as needed. done.
Test-run code: Rmd -> HTML report with content. done.
learning objectives: done.
== In-class to do:
clean up desktop space, calendars,
announce midterm surveys.
ZOOM, live transcript (start video recording). Turn computer speaker on.
Socrative sign in
Redo CoLab (without GoogleDrive) with R
Graph permutation, part 2 on Z-score calculation.
Midterm exams, sharing tips
use network with known theoretical random permutation
this direction might be too theoretic and has very little practical importance.
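For reference, a minimal sketch of a permutation Z-score on a graph, z = (observed - mean of null) / sd of null. The statistic (number of edges joining same-labeled nodes), the null model (shuffling node labels), the toy graph, and the function names are all assumptions for illustration; the class material may use a different statistic or a different permutation scheme.

```python
import random
import statistics


def same_label_edges(edges, labels):
    """Count edges whose two endpoints carry the same label."""
    return sum(1 for u, v in edges if labels[u] == labels[v])


def permutation_z_score(edges, labels, n_perm=1000, seed=0):
    """Z-score of the observed statistic against a label-shuffling null distribution."""
    rng = random.Random(seed)
    observed = same_label_edges(edges, labels)
    nodes = list(labels)
    values = [labels[n] for n in nodes]
    null = []
    for _ in range(n_perm):
        shuffled = values[:]
        rng.shuffle(shuffled)  # permute labels, keep the graph fixed
        null.append(same_label_edges(edges, dict(zip(nodes, shuffled))))
    mu = statistics.mean(null)
    sd = statistics.stdev(null)
    z = (observed - mu) / sd if sd > 0 else float("nan")
    return observed, mu, sd, z


# Toy graph (assumption): two triangles joined by one bridge edge
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
observed, mu, sd, z = permutation_z_score(edges, labels)
print("observed:", observed, "null mean:", round(mu, 2), "z:", round(z, 2))
```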
== pre-class to do:
calendar email invitation: including guests; done.
socrative questions (midterm exam, questions on contents from last lecture ). done.
update Canvas course materials, update learning objectives. assignments as needed. done.
Test-run code: Rmd -> HTML report with content. done.
learning objectives: done.
== In-class to do:
clean up desktop space, calendars,
announce midterm surveys.
ZOOM, live transcript (start video recording). Turn computer speaker on.
Socrative sign in
In the 48 continental states example, I found that California and Florida always end up as neighbors. So the power law seems to confine the nodes to a limited search space.
ssh -X user@ecs323gpustation
gs *pdf
An X window from the remote machine popped up on the local computer.
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE138544
It contained seq counts in a matrix.
reconstruct boolean networks
https://www.sciencedirect.com/science/article/pii/S2001037021003974?via%3Dihub
https://www.nitrd.gov/stem4all/
== pre-class to do:
calendar email invitation: including guests; done
socrative questions (midterm exam, questions on contents from last lecture ). done
update Canvas course materials, update learning objectives. assignments as needed. done.
Test-run code: Rmd -> HTML report with content. done
learning objectives: done
== In-class to do:
clean up desktop space, calendars,
announce midterm surveys.
ZOOM, live transcript (start video recording). Turn computer speaker on.
Socrative sign in
CoLab and Google Drive for midterm projects.
https://twitter.com/FenixAmmunition
== pre-class to do:
calendar email invitation: including faculty peer evaluators. done
socrative questions (midterm exam, questions on contents from last lecture ). done.
update Canvas course materials, update learning objectives. assignments as needed. done
Test-run code: Rmd -> HTML report with content. done
== In-class to do:
clean up desktop space, calendars,
announce midterm surveys.
ZOOM, live transcript (start video recording). Turn computer speaker on.
Socrative sign in
input, output.
CoLab and Google Drive for midterm projects.
https://github.com/Qin-Courses/Emojinator
mummer -maxmatch -n -l 100 ratg13.fasta prC31.fasta > ratg13-prc31.mumm
mummer -maxmatch -n -l 50 cov-fasta/ncbi-ref.fasta cov-fasta/ratg13.fasta > output/ncbiref-ratg13-09091457.mumm
mummerplot output/ncbiref-ratg13-09091457.mumm
mummerplot -x "[0,32000]" -y "[0,32000]" --png output/ncbiref-ratg13-09091457.mumm
under root
sudo apt install autoconf automake libtool
install yaggo.
then download release tarball.
./configure --prefix=/opt/mummer
make
sudo make install
== pre-class to do:
calendar email invitation: including faculty peer evaluators. done.
socrative questions (rbind, cbind, merge, questions on contents from last lecture ). done.
update Canvas course materials, update learning objectives. assignments as needed. done
Test-run code: Rmd -> HTML report with content. done
== In-class to do:
clean up desktop space, calendars,
announce midterm projects.
ZOOM, live transcript (start video recording). Turn computer speaker on.
Socrative sign in
R coding
slides,
make solutions
take even
https://www.gisaid.org/hcov19-variants/
Alpha peaked and then dropped.
Delta increasing and accelerating.
https://github.com/zonination/perceptions
https://rpubs.com/landon2000/790207
From:
Due to the naturally expanding genetic diversity of hCoV-19 viruses, GISAID introduced a nomenclature system for major clades, developed by Sebastian Maurer-Stroh et al, based on marker mutations within 8 high-level phylogenetic groupings from the early split of S and L, to the further evolution of L into V and G, and later of G into GH, GR and GV, and more recently GR into GRY.
GISAID clades are augmented with more detailed lineages assigned by the Phylogenetic Assignment of Named Global Outbreak LINeages (Pango lineage) tool, aiding in the understanding of patterns and determinants of the global spread of the pandemic strain causing COVID-19. A third effort uses a Year-Letter nomenclature to facilitate discussion of large-scale diversity patterns of hCoV-19 and label clades that persist for at least several months and have significant geographic spread.
The list of the marker variants is as follows:
S: C8782T,T28144C includes NS8-L84S
L: C241,C3037,A23403,C8782,G11083,G26144,T28144 (early clade markers in WIV04-reference sequence)
V: G11083T,G26144T NSP6-L37F + NS3-G251V
G: C241T,C3037T,A23403G includes S-D614G
GK: C241T,C3037T,A23403G,C22995A S-D614G + S-T478K
GH: C241T,C3037T,A23403G,G25563T includes S-D614G + NS3-Q57H
GR: C241T,C3037T,A23403G,G28882A includes S-D614G + N-G204R
GV: C241T,C3037T,A23403G,C22227T includes S-D614G + S-A222V
GRY: C241T,C3037T,21765-21770del,21991-21993del,A23063T,A23403G,G28882A includes S-H69del, S-V70del, S-Y144del, S-N501Y + S-D614G + N-G204R
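The marker list above can be read as a lookup table. The sketch below is only an illustration and is not GISAID's actual assignment procedure: it assumes a sample is described by a set of mutation strings written exactly as in the list, and it picks the clade with the largest marker set fully contained in the sample. The names `CLADE_MARKERS` and `assign_clade` are hypothetical. Clade L is left out because its markers are the reference bases, not mutations.

```python
# Marker mutations copied from the list above (clade L omitted: it is defined by reference bases).
G_CORE = {"C241T", "C3037T", "A23403G"}
CLADE_MARKERS = {
    "S": {"C8782T", "T28144C"},
    "V": {"G11083T", "G26144T"},
    "G": set(G_CORE),
    "GK": G_CORE | {"C22995A"},
    "GH": G_CORE | {"G25563T"},
    "GR": G_CORE | {"G28882A"},
    "GV": G_CORE | {"C22227T"},
    "GRY": G_CORE | {"21765-21770del", "21991-21993del", "A23063T", "G28882A"},
}


def assign_clade(mutations):
    """Return the clade whose full marker set is present and largest, or None if none match.

    Ties between equally specific clades (for example GH and GR markers together)
    are broken arbitrarily by clade name; a real tool would need an explicit rule.
    """
    observed = set(mutations)
    matches = [(len(markers), clade)
               for clade, markers in CLADE_MARKERS.items()
               if markers <= observed]
    if not matches:
        return None
    return max(matches)[1]


# Example: the G core markers plus G28882A
print(assign_clade(["C241T", "C3037T", "A23403G", "G28882A"]))
```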
SARS coronavirus Tor2 complete genome
https://www.ncbi.nlm.nih.gov/nuccore/NC_004718.3
https://www.ncbi.nlm.nih.gov/assembly/GCF_000864885.1/?&utm_source=None
Tor2 is the Toronto strain. Urbani strain is the Asian strain.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1125963/
"The differences between the two strains turn out to be minor. Both comprise about 30000 nucleotides, making the genome of SARS-CoV the largest of any RNA virus. It is possible but unlikely that the differences are a result of sequencing errors."
"The structural differences from other coronaviruses, and the lack of evidence of recombination, suggest that the SARS virus is not a result of other viruses swapping DNA with a previously benign coronavirus that already lived unnoticed in humans."
"Rather, the researchers say, the evidence indicates that SARS is genuinely new in humans and until recently inhabited an unknown animal species, probably in Guangdong province, China."
== pre-class to do:
calendar email invitation: including faculty peer evaluators. done
socrative questions (rbind, cbind, merge, questions on contents from last lecture ): done.
update Canvas course materials, update learning objectives. assignments as needed: done
Test-run code: Rmd -> HTML report with content. done
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording). Turn computer speaker on.
Socrative sign in
Review Chapter 3
R-COVID19 Chapter 4. weather. + socrative questions.
---
title: "yeast power study"
author: "H Qin"
date: "8/31/2021"
output:
  pdf_document: default
  html_document: default
---
```{r simulate-genotypes}
rm(list=ls())
N = 150
nuc_means = rpois(10, lambda=10)
mit_means = rpois(15, lambda=10)
summary(nuc_means)
summary(mit_means)
b0= 0
b1= 1 # mito influence on phenotype
b2= 1 # nuclear influence on phenotype
#b3 = 0.2 # mit X nuc interaction influence on phenotype, p << 0.001
b3 = 0.1 # mit X nuc interaction influence on phenotype, p = 0.049
#b3 = 0.05 # p = 0.3
```
```{r simulate-phenotype}
debug = 0
# Phenotype as a linear function of the mitochondrial and nuclear genotype means
# plus their interaction (b3); noise is added by the caller.
phenotype_mit_nuc = function(b0, b1, b2, b3, mit_single_mean, nuc_single_mean, debug){
  y = b0 + b1*mit_single_mean + b2*nuc_single_mean + b3*mit_single_mean * nuc_single_mean
  if (debug > 0) {
    print( paste("pmn:: mit_single_mean =", mit_single_mean, "nuc_single_mean", nuc_single_mean) )
  }
  return (y)
}
# Assign each of the N individuals a random nuclear and mitochondrial genotype
nuc_genotypes = sample(1:10, N, replace=TRUE)
mit_genotypes = sample(1:15, N, replace=TRUE)
y = numeric(N)
for ( i in 1:N ){
  #print(paste("i:", i, "mit_genotypes[i]",mit_genotypes[i] ))
  # Expected phenotype plus standard normal noise
  y[i] = phenotype_mit_nuc(b0, b1, b2, b3, mit_means[mit_genotypes[i]], nuc_means[nuc_genotypes[i]], debug=0) + rnorm(1)
}
tb = data.frame( cbind( y, mit_genotypes, nuc_genotypes))
tb$mit_genotypes = factor( tb$mit_genotypes)
tb$nuc_genotypes = factor( tb$nuc_genotypes)
summary(tb)
```
```{r}
library(nlme)
m1a = glm(y ~ mit_genotypes, data=tb, family='gaussian')
m2 = glm(y ~ mit_genotypes + nuc_genotypes, data=tb, family='gaussian')
m3 = glm(y ~ mit_genotypes + nuc_genotypes + mit_genotypes:nuc_genotypes, data=tb, family='gaussian')
```
```{r}
#summary(m1a)
```
```{r}
anova( m1a, m2, test='F')
```
```{r}
summary(m2)
summary(m3)
anova(m2, m3, test='F')
```