Thursday, September 30, 2021

Cyber Range, CYRIN

https://cyrin.atcorp.com/catalog/

cybersecurity catalog

Wednesday, September 29, 2021

cpsc4180 midterm Q&A

In Zoom breakout rooms, I went over student projects one on one.

Tuesday, September 28, 2021

estimate house roof area from satellite image

estimate house roof area from satellite images

Monday, September 27, 2021

cpsc4180 Q&A midterm projects

== pre-class to do:

calendar email invitation: including guests; done.

socrative questions (midterm exam, questions on contents from last lecture ).

update Canvas course materials, update learning objectives. assignments as needed. done.

Test-run code: Rmd -> HTML report with content. not today

learning objectives: not today

== In-class to do:

clean up desktop space, calendars,

announce midterm surveys.

ZOOM, live transcript (start video recording). Turn computer speaker on.

Socrative sign in

go over student problems, go over final projects, add sample student project presentation videos.

need to generate final project sign up sheets.

Sunday, September 26, 2021

validation for deep learning

It seems that "validation data sets" may be used in different ways in practice.

https://stackoverflow.com/questions/46308374/what-is-validation-data-used-for-in-a-keras-sequential-model

Qin:

See

https://www.tensorflow.org/guide/keras/train_and_evaluate#using_a_validation_dataset

model.fit(train_dataset, epochs=1, validation_data=val_dataset)

Thanks,

From TP:

"After the meeting I wasn't 100% satisfied with our explanation of what the validation set is used for. I realized if we train using the training set, then applying the loss of the validation set to the training set is useless.

I found two articles to this question which sum up the answer very well:

https://stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set

https://datascience.stackexchange.com/questions/18339/why-use-both-validation-set-and-test-set

To summarize,

You use the validation set to determine how well your model is learning during training. It is mostly used for hyperparameter training as you can retrain the model with different parameters and see how it compares. The idea is that it is also trained on so you can see how fast the model picks it up.

Overall though, we would use the Test set at the very end to gauge the accuracy of the model on completely new data it's never seen before.

To me, this seems like it can be done with the training set alone, however I understand the concept to just check a small subset of the training data to see how quickly the model will learn it. Since it isn't too difficult, I will incorporate this into the models and try to add some graphs to chart the training. This way, I can do some hyperparameter tuning once the transfer learning is set up and working."
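Note to self: in Keras, the data passed as validation_data is only evaluated at the end of each epoch and is not used to update the weights. A minimal sketch of the three-way split, in plain Python rather than Keras. The toy data, the k-nearest-neighbor model, and all function names are made up for illustration: train on the training set, pick the hyperparameter (k) on the validation set, and touch the test set only once at the end.

```python
import random

def split_indices(n, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle range(n) and cut it into disjoint train/validation/test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]

def knn_predict(train_x, train_y, x, k):
    """Predict y at x as the mean of the k nearest training points (1-D inputs)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of the k-NN predictor on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

# Toy data (an assumption for this sketch): y = x^2 plus Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(-3, 3) for _ in range(200)]
ys = [x * x + rng.gauss(0, 0.5) for x in xs]

train, val, test = split_indices(len(xs))
tx, ty = [xs[i] for i in train], [ys[i] for i in train]
vx, vy = [xs[i] for i in val], [ys[i] for i in val]
sx, sy = [xs[i] for i in test], [ys[i] for i in test]

# Hyperparameter tuning: compare candidate k values on the validation set only.
candidates = [1, 3, 5, 15, 45]
val_errors = {k: mse(tx, ty, vx, vy, k) for k in candidates}
best_k = min(val_errors, key=val_errors.get)

# The test set is used once, after k has been fixed.
test_error = mse(tx, ty, sx, sy, best_k)
print("best k:", best_k, "validation MSE:", val_errors[best_k], "test MSE:", test_error)
```

The validation error is slightly optimistic because k was chosen to minimize it, which is why a separate test set is kept for the final number.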

Thursday, September 23, 2021

UTC new password link

https://ds.tennessee.edu/passwords/login.asp?redirect=%2Fpasswords%2Fpassword%2Easp

Wednesday, September 22, 2021

cpsc4180 9/22 Colab, permutation-Zscore, midterm exam

== pre-class to do:

calendar email invitation: including guests; done.

socrative questions (midterm exam, questions on contents from last lecture ). done.

update Canvas course materials, update learning objectives. assignments as needed. done.

Test-run code: Rmd -> HTML report with content. done.

learning objectives: done.

== In-class to do:

clean up desktop space, calendars,

announce midterm surveys.

ZOOM, live transcript (start video recording). Turn computer speaker on.

Socrative sign in

Redo CoLab (without GoogleDrive) with R

Graph permutation, part 2 on Z-score calculation.

Midterm exams, sharing tips
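A minimal Python sketch of a permutation Z-score for class prep. The class material is in R, and the statistic used here (number of edges joining nodes with the same label), the label-shuffling null, and the toy graph are all assumptions for illustration, not the exact procedure from the lecture.

```python
import random
import statistics

def same_label_edges(edges, labels):
    """Count edges whose two endpoints carry the same label."""
    return sum(1 for a, b in edges if labels[a] == labels[b])

def permutation_zscore(edges, labels, n_perm=1000, seed=0):
    """Z-score of the observed statistic against a label-permutation null."""
    rng = random.Random(seed)
    observed = same_label_edges(edges, labels)
    nodes = list(labels)
    values = [labels[n] for n in nodes]
    null = []
    for _ in range(n_perm):
        shuffled = values[:]
        rng.shuffle(shuffled)  # permute labels, keep the graph fixed
        null.append(same_label_edges(edges, dict(zip(nodes, shuffled))))
    mu = statistics.mean(null)
    sd = statistics.stdev(null)
    return observed, mu, sd, (observed - mu) / sd

# Toy graph (hypothetical): two triangles joined by one bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

observed, mu, sd, z = permutation_zscore(edges, labels)
print("observed:", observed, "null mean:", mu, "null sd:", sd, "Z:", z)
```

A large positive Z means same-label nodes are connected more often than expected when labels are assigned at random.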

Monday, September 20, 2021

ms02 randomness verification

use network with known theoretical random permutation

This direction might be too theoretical and have little practical importance.

single-cell multiplexed imaging and proteomic data

Automated assignment of cell identity from single-cell multiplexed imaging and proteomic data

Geuenich Michael; Hou Jinyu; Lee Sunyun; Ayub Shanza; Jackson Hartland; Campbell Kieran

https://zenodo.org/record/5156049#.YUjywGZKjzc

cpsc4180 9/20 data science, CoLab, graph permutation

== pre-class to do:

calendar email invitation: including guests; done.

socrative questions (midterm exam, questions on contents from last lecture ). done.

update Canvas course materials, update learning objectives. assignments as needed. done.

Test-run code: Rmd -> HTML report with content. done.

learning objectives: done.

== In-class to do:

clean up desktop space, calendars,

announce midterm surveys.

ZOOM, live transcript (start video recording). Turn computer speaker on.

Socrative sign in

In the example with the 48 contiguous states, I found that California and Florida always end up as neighbors. So it seems the power law confines the nodes to a limited search space.

Sunday, September 19, 2021

ssh -X ecs323gpustation

ssh -X user@ecs323gpustation

gs *pdf

An X window from the remote machine popped up on the local computer.

Saturday, September 18, 2021

single cell RNA, human aging

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE138544

It contains sequencing counts in a matrix.

reconstruct boolean networks

https://www.sciencedirect.com/science/article/pii/S2001037021003974?via%3Dihub

eureka covid19 CoLab Python

Thursday, September 16, 2021

Gov internship search

https://www.nitrd.gov/stem4all/

Wednesday, September 15, 2021

cpsc4180 9/15 simple statistics with US election results

== pre-class to do:

calendar email invitation: including guests; done

socrative questions (midterm exam, questions on contents from last lecture ). done

update Canvas course materials, update learning objectives. assignments as needed. done.

Test-run code: Rmd -> HTML report with content. done

learning objectives: done

== In-class to do:

clean up desktop space, calendars,

announce midterm surveys.

ZOOM, live transcript (start video recording). Turn computer speaker on.

Socrative sign in

CoLab and Google Drive for midterm projects.

UTC MS thesis

Dickerson, Jessica <Jessica-Dickerson@utc.edu>

The requirements to finish your thesis is to have six credit of satisfactory progress and defend your thesis. If you are at the six credit limit then you need to continue registering for thesis credit hours until you defend your thesis. Note that you need to register for at least two credit hours of thesis in the semester you are defending in.

As for evaluating your thesis credits, the evaluation criteria is decided by your thesis advisor. So I would encourage you to discuss this with your advisor and see what are the final deliverables for a satisfactory progress grade.

Tuesday, September 14, 2021

deep learning in bio sequence analysis and modeling

https://github.com/hussius/deeplearning-biology

https://colab.research.google.com/drive/17E4h5aAOioh5DiTo7MZg4hpL6Z_0FyWr

https://github.com/nageshsinghc4/DNA-Sequence-Machine-learning

https://github.com/search?q=dna+sequence+deep+learning

https://www.researchgate.net/publication/301703031_DNA_Sequence_Classification_by_Convolutional_Neural_Network

STIGs

Security Technical Implementation Guides (STIGs)

https://public.cyber.mil/stigs/downloads/

Monday, September 13, 2021

COVID19 accounts

https://twitter.com/FenixAmmunition

CPSC 4180 R, input output

== pre-class to do:

calendar email invitation: including faculty peer evaluators. done

socrative questions (midterm exam, questions on contents from last lecture ). done.

update Canvas course materials, update learning objectives. assignments as needed. done

Test-run code: Rmd -> HTML report with content. done

== In-class to do:

clean up desktop space, calendars,

announce midterm surveys.

ZOOM, live transcript (start video recording). Turn computer speaker on.

Socrative sign in

input, output.

CoLab and Google Drive for midterm projects.

Saturday, September 11, 2021

Emojinator, Python

https://github.com/Qin-Courses/Emojinator

Thursday, September 9, 2021

mummer, mummerplot ecs323gpu

mummer -maxmatch -n -l 100 ratg13.fasta prC31.fasta > ratg13-prc31.mumm

mummer -maxmatch -n -l 50 cov-fasta/ncbi-ref.fasta cov-fasta/ratg13.fasta > output/ncbiref-ratg13-09091457.mumm

mummerplot output/ncbiref-ratg13-09091457.mumm

mummerplot -x "[0,32000]" -y "[0,32000]" --png output/ncbiref-ratg13-09091457.mumm

JC Question: Do mutation hotspots cover spike and Mpro regions?

Based on NCBI NC_045512:

S gene is 21563 - 25384, which contains a large gap.

3CL nsp5A is 10055 - 10972. The maxmatch results are:

ratg wuhan length

9227 9339 57

10487 10615 77

10661 10791 63

So, the beginning and middle sections of 3CL are mutation hotspots too.
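A quick Python check of which of the listed matches fall inside the annotated regions. This is only a sketch under two assumptions: the three columns are RaTG13 start, Wuhan start, and match length (the usual mummer layout of two positions plus a length), and the Wuhan column is in NC_045512 coordinates. The function and variable names are made up.

```python
# Gene coordinates on the Wuhan reference, as noted above.
REGIONS = {"S": (21563, 25384), "3CL": (10055, 10972)}

# (ratg_start, wuhan_start, length) copied from the maxmatch output above.
MATCHES = [(9227, 9339, 57), (10487, 10615, 77), (10661, 10791, 63)]

def overlapping_regions(start, length, regions=REGIONS):
    """Return names of regions that overlap the interval [start, start + length - 1]."""
    end = start + length - 1
    return [name for name, (lo, hi) in regions.items() if start <= hi and end >= lo]

for ratg_start, wuhan_start, length in MATCHES:
    hits = overlapping_regions(wuhan_start, length)
    print(wuhan_start, wuhan_start + length - 1, hits if hits else "outside S and 3CL")
```

Under these assumptions, the second and third matches lie inside 3CL and the first one ends upstream of it.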

MUMMER installation ecs323gpu workstation

under root

sudo apt install autoconf automake libtool

install yaggo.

then download release tarball.

./configure --prefix=/opt/mummer

make

sudo make install

Wednesday, September 8, 2021

Student project,

I uploaded the notebook I'm working with to my own github. I can also add it to the one you shared with me in our initial meeting.

https://github.com/rwedell/covid/blob/main/COVID-19%20Variant%20Distribution.ipynb

CPSC 4180 Sep 8, R coding

== pre-class to do:

calendar email invitation: including faculty peer evaluators. done.

socrative questions (rbind, cbind, merge, questions on contents from last lecture ). done.

update Canvas course materials, update learning objectives. assignments as needed. done

Test-run code: Rmd -> HTML report with content. done

== In-class to do:

clean up desktop space, calendars,

announce midterm projects.

ZOOM, live transcript (start video recording). Turn computer speaker on.

Socrative sign in

R coding

slides,

make solutions

take even

Tuesday, September 7, 2021

online SVG neural network drawing and SVG editor

http://alexlenail.me/NN-SVG/index.html

https://boxy-svg.com/app

Monday, September 6, 2021

GISAID tracking variant

https://www.gisaid.org/hcov19-variants/

Alpha peaked and then dropped.

Delta increasing and accelerating.

Sunday, September 5, 2021

advanced R book

https://adv-r.hadley.nz/index.html

stacked barplots in ggplot2

https://www.r-graph-gallery.com/48-grouped-barplot-with-ggplot2.html

Friday, September 3, 2021

ggplot overlay, joy plots

https://github.com/zonination/perceptions

ggplot2 gallery

https://exts.ggplot2.tidyverse.org/gallery/

SARS-CoV-2 lineages A and B

https://virological.org/t/evidence-against-the-veracity-of-sars-cov-2-genomes-intermediate-between-lineages-a-and-b/754

Thursday, September 2, 2021

Landon REU 2021 GISAID resampling

https://rpubs.com/landon2000/790207

GISAID lineage information

From:

https://www.gisaid.org/references/statements-clarifications/clade-and-lineage-nomenclature-aids-in-genomic-epidemiology-of-active-hcov-19-viruses/

Clade and lineage nomenclature aids in genomic epidemiology studies of active hCoV-19 viruses

Due to the naturally expanding genetic diversity of hCoV-19 viruses, GISAID introduced a nomenclature system for major clades, developed by Sebastian Maurer-Stroh et al, based on marker mutations within 8 high-level phylogenetic groupings from the early split of S and L, to the further evolution of L into V and G, and later of G into GH, GR and GV, and more recently GR into GRY.

GISAID clades are augmented with more detailed lineages assigned by the Phylogenetic Assignment of Named Global Outbreak LINeages (Pango lineage) tool, aiding in the understanding of patterns and determinants of the global spread of the pandemic strain causing COVID-19. A third effort uses a Year-Letter nomenclature to facilitate discussion of large-scale diversity patterns of hCoV-19 and label clades that persist for at least several months and have significant geographic spread.

The list of the marker variants is as follows:

S: C8782T,T28144C includes NS8-L84S
L: C241,C3037,A23403,C8782,G11083,G26144,T28144 (early clade markers in WIV04-reference sequence)
V: G11083T,G26144T NSP6-L37F + NS3-G251V
G: C241T,C3037T,A23403G includes S-D614G
GK: C241T,C3037T,A23403G,C22995A S-D614G + S-T478K
GH: C241T,C3037T,A23403G,G25563T includes S-D614G + NS3-Q57H
GR: C241T,C3037T,A23403G,G28882A includes S-D614G + N-G204R
GV: C241T,C3037T,A23403G,C22227T includes S-D614G + S-A222V
GRY: C241T,C3037T,21765-21770del,21991-21993del,A23063T,A23403G,G28882A includes S-H69del, S-V70del, S-Y144del, S-N501Y + S-D614G + N-G204R
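A minimal Python sketch of how the marker list above could be used to call a clade from a genome's mutation list. The "largest fully matched marker set wins" rule and the function name are my assumptions, not GISAID's actual procedure. Clade L is left out because it is defined by reference alleles rather than by mutations.

```python
# Marker variants copied from the GISAID list above (clade L omitted, see note).
CLADE_MARKERS = {
    "S": {"C8782T", "T28144C"},
    "V": {"G11083T", "G26144T"},
    "G": {"C241T", "C3037T", "A23403G"},
    "GK": {"C241T", "C3037T", "A23403G", "C22995A"},
    "GH": {"C241T", "C3037T", "A23403G", "G25563T"},
    "GR": {"C241T", "C3037T", "A23403G", "G28882A"},
    "GV": {"C241T", "C3037T", "A23403G", "C22227T"},
    "GRY": {"C241T", "C3037T", "21765-21770del", "21991-21993del",
            "A23063T", "A23403G", "G28882A"},
}

def assign_clade(mutations):
    """Return the clade whose full marker set is present, preferring the most specific.

    Assumption: "most specific" means the largest marker set. Ties (for example a
    genome carrying both GH and GR markers) resolve to whichever clade comes first
    in CLADE_MARKERS. Returns None when no marker set is fully matched.
    """
    mutations = set(mutations)
    matches = [clade for clade, markers in CLADE_MARKERS.items() if markers <= mutations]
    if not matches:
        return None
    return max(matches, key=lambda clade: len(CLADE_MARKERS[clade]))

print(assign_clade(["C241T", "C3037T", "A23403G", "G25563T", "C14408T"]))
```

Extra mutations that are not markers (C14408T in the usage line) are ignored by the subset test.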

SARS-COV-1 info

SARS coronavirus Tor2, complete genome

https://www.ncbi.nlm.nih.gov/nuccore/NC_004718.3

https://www.ncbi.nlm.nih.gov/assembly/GCF_000864885.1/?&utm_source=None

Tor2 is the Toronto strain. Urbani strain is the Asian strain.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1125963/

"The differences between the two strains turn out to be minor. Both comprise about 30000 nucleotides, making the genome of SARS-CoV the largest of any RNA virus. It is possible but unlikely that the differences are a result of sequencing errors."

"The structural differences from other coronaviruses, and the lack of evidence of recombination, suggest that the SARS virus is not a result of other viruses swapping DNA with a previously benign coronavirus that already lived unnoticed in humans."

"Rather, the researchers say, the evidence indicates that SARS is genuinely new in humans and until recently inhabited an unknown animal species, probably in Guangdong province, China."

Wednesday, September 1, 2021

CPSC 4180 data science Sep 1, weather + COVID19

== pre-class to do:

calendar email invitation: including faculty peer evaluators. done

socrative questions (rbind, cbind, merge, questions on contents from last lecture ): done.

update Canvas course materials, update learning objectives. assignments as needed: done

Test-run code: Rmd -> HTML report with content. done

== In-class to do:

clean up desktop space, calendars,

ZOOM, live transcript (start video recording). Turn computer speaker on.

Socrative sign in

Review Chapter 3

R-COVID19 Chapter 4. weather. + socrative questions.

yeast quantitative genetics cross study

---

title: "yeast power study"

author: "H Qin"

date: "8/31/2021"

output:

  pdf_document: default

  html_document: default

---

```{r simulate-genotypes}

rm(list=ls())

N = 150

nuc_means = rpois(10, lambda=10)

mit_means = rpois(15, lambda=10)

summary(nuc_means)

summary(mit_means)

b0= 0

b1= 1 # mito influence on phenotype

b2= 1 # nuclear influence on phenotype

#b3 = 0.2 # mit X nuc interaction influence on phenotype, p << 0.001

b3 = 0.1 # mit X nuc interaction influence on phenotype, p = 0.049

#b3 = 0.05 # p = 0.3

```

```{r simulate-phenotype}

debug = 0

phenotype_mit_nuc = function(b0, b1, b2, b3, mit_single_mean, nuc_single_mean, debug){

  # expected phenotype: main effects plus a mit x nuc interaction term
  y = b0 + b1*mit_single_mean + b2*nuc_single_mean + b3*mit_single_mean * nuc_single_mean

  if (debug > 0) {

    print( paste("pmn:: mit_single_mean =", mit_single_mean, "nuc_single_mean =", nuc_single_mean) )

  }

  return (y)

}

nuc_genotypes = sample(1:10, N, replace=TRUE)

mit_genotypes = sample(1:15, N, replace=TRUE)

y = numeric(N)

for ( i in 1:N ){

  #print(paste("i:", i, "mit_genotypes[i]",mit_genotypes[i] ))

  # expected phenotype for individual i plus standard normal noise
  y[i] = phenotype_mit_nuc(b0, b1, b2, b3, mit_means[mit_genotypes[i]], nuc_means[nuc_genotypes[i]], debug=debug) + rnorm(1)

}

tb = data.frame( cbind( y, mit_genotypes, nuc_genotypes))

tb$mit_genotypes = factor( tb$mit_genotypes)

tb$nuc_genotypes = factor( tb$nuc_genotypes)

summary(tb)

```

```{r}

library(nlme);

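# m1a: mitochondrial genotype only
m1a = glm(y ~ mit_genotypes, data=tb, family='gaussian')

# m2: additive mitochondrial + nuclear effects
m2 = glm(y ~ mit_genotypes + nuc_genotypes, data=tb, family='gaussian')

# m3: adds the mit x nuc interaction (gaussian is the glm default family)
m3 = glm(y ~ mit_genotypes + nuc_genotypes + mit_genotypes:nuc_genotypes, data=tb)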

```

```{r}

#summary(m1a)

```

```{r}

anova( m1a, m2, test='F')

```

```{r}

summary(m2)

summary(m3)

anova(m2, m3, test='F')

```
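For reference, the data-generating model in the simulate-phenotype chunk can be written as below. The symbols $m_i$ and $n_i$ are my shorthand (not in the code) for the Poisson-drawn means attached to the mitochondrial and nuclear genotypes of individual $i$, and the noise term is the rnorm(1) draw.

$$
y_i = b_0 + b_1 m_i + b_2 n_i + b_3 m_i n_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 1)
$$

The fitted models treat the genotypes as factors rather than using $m_i$ and $n_i$ directly, so anova(m2, m3) asks whether adding the interaction terms improves the fit, which is an indirect test of $b_3 \neq 0$.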