Tuesday, August 31, 2021

simulate correlated random values in R

 

The rnorm_multi() function makes multiple normally distributed vectors with specified parameters and relationships.

https://cran.r-project.org/web/packages/faux/vignettes/rnorm_multi.html
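
For intuition about what such a function has to do, here is a minimal sketch in Python (not R, and not the faux API). It draws pairs of standard normal values and mixes them so that the two outputs have a target correlation r. The names `correlated_normal_pairs` and `pearson` are made up for this note, and the sketch covers only the two-variable case.

```python
import math
import random


def correlated_normal_pairs(n, r, mu=(0.0, 0.0), sd=(1.0, 1.0), seed=None):
    """Draw n (x, y) pairs from a bivariate normal with correlation r.

    Hypothetical helper for illustration; not part of faux.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # x uses z1 only; y mixes z1 and z2 so that cor(x, y) = r.
        x = mu[0] + sd[0] * z1
        y = mu[1] + sd[1] * (r * z1 + math.sqrt(1.0 - r * r) * z2)
        pairs.append((x, y))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    sample = correlated_normal_pairs(20000, r=0.5, seed=1)
    # The sample correlation should land close to the requested 0.5.
    print(round(pearson(sample), 2))
```

The sample correlation only approximates r, and the gap shrinks as n grows.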


SGRP Sanger yeast resequencing data download

 

http://www.moseslab.csb.utoronto.ca/sgrp/download.html

Data from Bergström et al. (2014)

Download the new SGRP sequences here
Download the annotations for the new SGRP here
Download the loss of function variants for the new SGRP here
Download the SIFT prediction for SNPs here
Download the reads here
Download the S. cerevisiae reference-based SNP calls and strain genotypes here
Download the S. paradoxus reference-based SNP calls and strain genotypes here


Monday, August 30, 2021

cpsc4180 Aug 30, Google mobility and covid19 analysis

  == pre-class to do: 

calendar email invitation: done

socrative questions (questions on content from the last lecture):

update Canvas course materials, update learning objectives. assignments as needed: done

Test-run code: Rmd -> HTML report with content. done

* GitHub had a connection problem today, possibly due to Hurricane Ida, so we had to download the JHU data manually. *

== In-class to do: 

clean up desktop space, calendars

ZOOM, live transcript (start video recording). 

Socrative sign in 

Review Chapter 2

R-COVID19  Chapter 3. Google Mobility. 

Only one student was in person today, an Army veteran.


Thursday, August 26, 2021

MS thesis, fall 2021, important dates

 

  • Fall Primary Graduation Deadlines for Master's Students:
    • Oct 1 - Last day to schedule your defense
    • Oct 15 - Last day to defend your thesis
    • Nov 1 - Deadline to submit thesis and thesis exam results for Dec. graduation
    • Nov 30 - Last day for final edits to be submitted and accepted as requested by the Graduate School
    • Dec 10 - Graduate Student Commencement

 

Sanger yeast sequencing project

 

https://www.sanger.ac.uk/research/projects/genomeinformatics/sgrp.html

Where is the SNP matrix?



Wednesday, August 25, 2021

R heatmap references

 

https://jokergoo.github.io/ComplexHeatmap-reference/book/index.html 

https://jokergoo.github.io/ComplexHeatmap-reference/book/a-single-heatmap.html#colors



cpsc4180 data science programming, Day 4, August 25, R Covid, Chapter 2

 == pre-class to do: 

calendar email invitation: done

socrative questions (DataCamp; have you checked into the course Canvas site?; basic R and data bias questions from last lecture): done

update Canvas course materials, update learning objectives. assignments as needed. 

Test-run code: Rmd -> HTML report with content

== In-class to do: 

clean up desktop space, calendars

ZOOM, live transcript (start video recording). 

Socrative sign in 

Review Chapter 1

R-COVID19  Chapter 2. Qin went through this chapter in about 30 minutes, then asked students to modify the code to look at different counties for about 20 minutes.

In the remaining 20 minutes, Qin posed some more challenging questions: how to find weird counties and weird states. This question was not clear to the students, because they picked counties with weird names like "whitefield". So the students interpreted weirdness in the literal sense, not in the computer science or data science sense.

A student reminded me that homework assignments on Canvas do not automatically show up in the modules, so I have manually put them there, even though they show up in the Canvas calendar.




Tuesday, August 24, 2021

SFS jobs

 

https://www.ziprecruiter.com/c/Tennessee-Valley-Authority/Job/Analyst,-Cybersecurity-511763/-in-Chattanooga,TN?jid=813d5b8dc243b15f&lvk=b7FPSZvocsofEH8OGaoYaA.--M9GzCimh- 

https://www.teksystems.com/en/locations


Term II, Fall 2021 course for incoming PhD student

TERM 2  fall 2021 courses, 


Special Topics
Lecture
CPSC 5910, R01, 3, 44824, Fall 2021 (Primary)
None
 - Type: Class Building: None Room: None Start Date: 10/05/2021 End Date: 11/29/2021
UT Chattanooga. 2 of 2 seats remain.
CECS Differential Course Fee

Repeatable Course


Individual Studies
Lecture 5
CPSC 5997, R05, 1 TO 9, 44926, Fall 2021 (Primary)
None
 - Type: Class Building: None Room: None Start Date: 10/05/2021 End Date: 11/29/2021
UT Chattanooga. 1 of 1 seats remain.
CECS Differential Course Fee

Repeatable Course


Doctoral Research
Lecture
CPSC 7950, R07, 1 TO 12, 44930, Fall 2021 (Primary)
None
 - Type: Class Building: None Room: None Start Date: 10/05/2021 End Date: 11/29/2021
UT Chattanooga. 1 of 1 seats remain.
CECS Differential Course Fee

Repeatable Course


Research
Lecture
CPSC 5998, R01, 1 TO 9, 44825, Fall 2021 (Primary)
None
 - Type: Class Building: None Room: None Start Date: 10/05/2021 End Date: 11/29/2021
UT Chattanooga. 5 of 5 seats remain.
CECS Differential Course Fee

Repeatable Course

 



 


Term II starts on Oct 4. 10 days before: Sep 23.

44745 ESC 5997 R07 C 1.000-6.000 Individual Studies TBA 02-2000 A K M Azad Hossain (P)




Monday, August 23, 2021

WORD problem

 

Someone inadvertently added some blank lines to the header in Word, probably in the online version, which pushed my text down.


cpsc 4180 8/23 Day 3, R-covid chapter 1

== pre-class to do: 

calendar email invitation 

socrative questions (datacamp, R RStudio installation, datacamp registration and assignment)

update Canvas course materials, update learning objectives. assignments as needed. 

Test-run code: Rmd -> HTML report with content

== In-class to do: 

ZOOM, live transcript (start video recording). 

Socrative sign in 

Review simple R on Colab and RStudio Cloud.

R-COVID19 Rmd code. Prepare to finish Chapter 1. 




Sunday, August 22, 2021

sites of conferences

IEEE conferences

https://cis.ieee.org/conferences/conference-calendar 

https://www.allconferencealert.com/las-vegas.html?page=2

https://conferenceindex.org/conferences



CSCI2021 Las Vegas

 

Las Vegas computer science conference

https://www.american-cse.org/csci2021/paper_submission



GA advertisement



Graduate researcher positions are available to apply data science and machine learning to predict new coronavirus variants and to develop interpretable deep learning methods to predict biological clocks and diseases. Candidates are expected to code in R and/or Python and have good writing skills. Our group’s recent publications have appeared in Scientific Reports, GeroScience, and BMC Bioinformatics. Students from all backgrounds are welcome, as long as they have a strong interest in the multi-disciplinary research projects and have sufficient skills. To apply, please send your resume, transcripts, and relevant supporting materials such as past coding projects or essays to hong-qin@utc.edu


Thursday, August 19, 2021

ratg13 vs RmYN sequences

 




hqin@ECS323GPUStation:~$ mummer -maxmatch -n -l 100 ratg13.fasta RmYn.fasta 

# reading input file "ratg13.fasta" of length 29855

# construct suffix tree for sequence of length 29855

# (maximum reference length is 536870908)

# (maximum query length is 4294967295)

# CONSTRUCTIONTIME mummer ratg13.fasta 0.01

# reading input file "RmYn.fasta" of length 146512

# matching query-file "RmYn.fasta"

# against subject-file "ratg13.fasta"

> hCoV-19/bat/Yunnan/RmYN05/2020|EPI_ISL_1699445|2020-05-25

   29642     29492       177

> hCoV-19/bat/Yunnan/RmYN07/2020|EPI_ISL_1699447|2020-06-03

   15034     14968       119

   26254     26098       126

   29670     29542       149

> hCoV-19/bat/Yunnan/RmYN08/2020|EPI_ISL_1699448|2020-07-14

   29642     29492       177

> hCoV-19/bat/Yunnan/RmYN01/2019|EPI_ISL_412976|2019-06-25

> hCoV-19/bat/Yunnan/RmYN02/2019|EPI_ISL_412977|2019-06-25

     175       163       150

     782       770       116

    2938      2926       102

    4211      4196       113

    5666      5651       104

    6356      6341       153

    7166      7151       119

    7301      7286       107

    7469      7454       122

    8207      8192       110

    9572      9557       123

   10854     10839       112

   11540     11525       107

   12017     12002       173

   12608     12593       164

   12821     12806       149

   13199     13184       125

   13325     13310       193

   13621     13606       113

   13735     13720       107

   13918     13903       111

   14380     14365       104

   14846     14831       133

   15001     14986       116

   15401     15386       115

   15595     15580       116

   16084     16069       122

   16564     16549       158

   16766     16751       118

   16937     16922       214

   17452     17437       194

   17770     17755       116

   19093     19078       125

   20347     20332       104

   25304     25162       108

   25413     25271       118

   25702     25560       104

   26089     25947       164

   26254     26112       155

   28639     28488       122

   29284     29133       104

   29461     29310       180

   29648     29497       171
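
To tally listings like the one above, the text can be parsed into records. With a single reference sequence, each `>` line names a query sequence, and each row under it gives the position in the reference, the position in the query, and the match length. Below is a minimal Python sketch under that assumption. The function names `parse_mummer` and `summarize` are made up for this note, and the sample is a short excerpt of the output above.

```python
def parse_mummer(text):
    """Parse three-column mummer output.

    Returns {query_name: [(ref_pos, qry_pos, length), ...]}.
    Hypothetical helper; assumes a single reference sequence.
    """
    matches = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and mummer's "#" status lines
        if line.startswith(">"):
            current = line[1:].strip()
            matches.setdefault(current, [])
            continue
        fields = line.split()
        if current is None or len(fields) != 3:
            continue  # not a match row (e.g. a pasted shell prompt)
        try:
            ref_pos, qry_pos, length = (int(f) for f in fields)
        except ValueError:
            continue
        matches[current].append((ref_pos, qry_pos, length))
    return matches


def summarize(matches):
    """Return {query_name: (number_of_matches, summed_match_length)}."""
    return {
        name: (len(hits), sum(length for _, _, length in hits))
        for name, hits in matches.items()
    }


SAMPLE = """\
> hCoV-19/bat/Yunnan/RmYN05/2020|EPI_ISL_1699445|2020-05-25
   29642     29492       177
> hCoV-19/bat/Yunnan/RmYN07/2020|EPI_ISL_1699447|2020-06-03
   15034     14968       119
   26254     26098       126
   29670     29542       149
> hCoV-19/bat/Yunnan/RmYN01/2019|EPI_ISL_412976|2019-06-25
"""

if __name__ == "__main__":
    for name, (count, total) in summarize(parse_mummer(SAMPLE)).items():
        print(name, count, total)
```

With -maxmatch, matches may overlap, so the summed length is a rough indicator rather than a count of distinct matched bases.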




RaTG13

Bat coronavirus RaTG13, complete genome

GenBank: MN996532.2 

https://www.ncbi.nlm.nih.gov/nuccore/1916859392


align seq mummer on ecs323gpustation

Had trouble compiling mummer4.

 Installed mummer3.23 from its binary

this worked:

$ mummer -maxmatch -n -l 100 ratg13.fasta prC31.fasta > ratg13-prc31.mumm

hqin@ECS323GPUStation:~$ cat ratg13-prc31.mumm 

> hCoV-19/bat/Yunnan/PrC31/2018|EPI_ISL_1098866|2018-08

    1103      1065       140

    4211      4173       158

    5666      5628       104

    6399      6361       103

    7160      7122       113

    7301      7263       107

    7469      7431       122

    7607      7569       119

   10854     10816       142

   11957     11919       233

   12320     12282       131

   26317     26209       115

   27959     27858       109

   28480     28379       101

   28666     28565       110

   29266     29165       124

   29461     29360       155

   29708     29606       111



> hCoV-19/Wuhan/WH01/2019|EPI_ISL_406798|2019-12-26

     191       166       134

    4291      4269       120

    5728      5706       105

    6734      6712       146

    7160      7138       104

    7739      7717       101

    8159      8137       104

   10484     10462       106

   10854     10832       124

   11522     11500       107

   12320     12298       116

   12608     12586       204

   12813     12791       157

   13325     13303       193

   13621     13599       113

   13735     13713       107

   13918     13896       111

   14263     14241       116

   14485     14463       111

   14626     14604       158

   14845     14823       134

   15001     14979       203

   15373     15351       143

   15595     15573       116

   15811     15789       113

   16132     16110       179

   16342     16320       137

   16723     16701       140

   16903     16881       104

   17152     17130       146

   19093     19071       335

   20938     20916       116

   21908     21886       131

   25840     25830       119

   26038     26028       115

   26154     26144       162

   26317     26307       200

   27117     27110       132

   27742     27736       139

   28151     28145       127

   28480     28474       101

   28585     28579       203

   29500     29494       141
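
One way to read these listings is to look at the offset (reference position minus query position) of each match. Consecutive matches with the same offset lie on the same diagonal, and a change in offset is consistent with a net insertion or deletion somewhere between them. Below is a minimal Python sketch using the PrC31 rows above. The helper name `offset_runs` is made up for this note.

```python
from itertools import groupby

# (ref_pos, qry_pos, length) rows copied from the PrC31 record above.
PRC31 = [
    (1103, 1065, 140), (4211, 4173, 158), (5666, 5628, 104),
    (6399, 6361, 103), (7160, 7122, 113), (7301, 7263, 107),
    (7469, 7431, 122), (7607, 7569, 119), (10854, 10816, 142),
    (11957, 11919, 233), (12320, 12282, 131), (26317, 26209, 115),
    (27959, 27858, 109), (28480, 28379, 101), (28666, 28565, 110),
    (29266, 29165, 124), (29461, 29360, 155), (29708, 29606, 111),
]


def offset_runs(matches):
    """Collapse consecutive matches that share the same ref-minus-query offset.

    Returns a list of (offset, first_ref_pos, last_ref_pos, n_matches).
    Hypothetical helper for illustration.
    """
    runs = []
    for offset, group in groupby(matches, key=lambda m: m[0] - m[1]):
        group = list(group)
        runs.append((offset, group[0][0], group[-1][0], len(group)))
    return runs


if __name__ == "__main__":
    for run in offset_runs(PRC31):
        print(run)
```

For PrC31, the offset stays at 38 through reference position 12320, then reads 108, 101, and 102 in the last part of the genome.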



hqin@ECS323GPUStation:~$ mummer -maxmatch -n -l 200 ratg13.fasta ncbi-ref.fasta > ratg-ncib.mumm

# reading input file "ratg13.fasta" of length 29855

# construct suffix tree for sequence of length 29855

# (maximum reference length is 536870908)

# (maximum query length is 4294967295)

# CONSTRUCTIONTIME mummer ratg13.fasta 0.01

# reading input file "ncbi-ref.fasta" of length 29903

# matching query-file "ncbi-ref.fasta"

# against subject-file "ratg13.fasta"

# COMPLETETIME mummer ratg13.fasta 0.01

# SPACE mummer ratg13.fasta 0.06

hqin@ECS323GPUStation:~$ cat ratg-ncib.mumm 

> NC_045512.2

   12608     12611       204

   15001     15004       203

   19093     19096       335

   26317     26332       200

   28585     28604       203



hqin@ECS323GPUStation:~$ mummer -maxmatch -n -l 200 ncbi-ref.fasta ratg13.fasta > ncbi-ratg.mumm

# reading input file "ncbi-ref.fasta" of length 29903

# construct suffix tree for sequence of length 29903

# (maximum reference length is 536870908)

# (maximum query length is 4294967295)

# CONSTRUCTIONTIME mummer ncbi-ref.fasta 0.01

# reading input file "ratg13.fasta" of length 29855

# matching query-file "ratg13.fasta"

# against subject-file "ncbi-ref.fasta"

# COMPLETETIME mummer ncbi-ref.fasta 0.01

# SPACE mummer ncbi-ref.fasta 0.06

hqin@ECS323GPUStation:~$ cat ncbi-ratg.mumm 

> hCoV-19/bat/Yunnan/RaTG13/2013|EPI_ISL_402131|2013-07-24

   12611     12608       204

   15004     15001       203

   19096     19093       335

   26332     26317       200

   28604     28585       203
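
The last two runs swap the reference and the query, and the listings mirror each other: same match lengths, with the first two columns exchanged. This is what one would expect for exact matches on the forward strand. Below is a small Python check of that observation using the rows above. The helper name `swap_roles` is made up for this note.

```python
# (ref_pos, qry_pos, length) rows copied from the two listings above.
RATG13_AS_REF = [
    (12608, 12611, 204), (15001, 15004, 203), (19093, 19096, 335),
    (26317, 26332, 200), (28585, 28604, 203),
]
NCBI_AS_REF = [
    (12611, 12608, 204), (15004, 15001, 203), (19096, 19093, 335),
    (26332, 26317, 200), (28604, 28585, 203),
]


def swap_roles(matches):
    """Exchange reference and query positions, sorted by the new reference position."""
    return sorted((qry, ref, length) for ref, qry, length in matches)


if __name__ == "__main__":
    print(swap_roles(RATG13_AS_REF) == NCBI_AS_REF)
```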