Thursday, March 31, 2022

CPSC 4900, Ch8 reliable programming, part 2

  

== In-class to do: 

For the ECS Room 220 podium computer, use channel 4 on the Extron controller. 

Restart the computer to make sure Zoom works. Make sure the Bluetooth mouse is turned off. 

Clean up desktop space and calendars. 

Zoom: live transcript (start video recording). Turn the computer speaker on. 

* Ch8 reliable programming, part 2

* Tech symposium submission, poster prep

For your poster preparation, I suggest changing the "Results" section to "Results and Progress", in which you can provide:

  • software architecture,
  • product backlogs and Scrum activities,
  • screenshots of software prototypes as they currently are,
  • sample characters or sample pages,
  • a diagram or picture of the intended software, etc.


Tuesday, March 29, 2022

Twitter sentiment analysis

 

https://github.com/QinLab/Covid19Politics_TwitterSentiment 

? Cannot find seaborn after conda install. 

? Cannot find the us package. 



CPSC 4900, Ch8, reliable programming

 



== In-class to do: 

For the ECS Room 220 podium computer, use channel 4 on the Extron controller. 

Restart the computer to make sure Zoom works. Make sure the Bluetooth mouse is turned off. 

Clean up desktop space and calendars. 

Zoom: live transcript (start video recording). Turn the computer speaker on. 

* tech symposium submission

* Start Ch8 Socrative.


Sunday, March 27, 2022

Qin lab funding acknowledgements, spring and fall 2022

For COVID-19:
HQ acknowledges the support of NSF PIPP #2200138, BD Spoke #1761839, SFS #1663105, REU #1852042 and #2149956, an internal CEACSE award, and the Office of the Vice Chancellor of Research at the University of Tennessee at Chattanooga.


For REU: 
We thank NSF REU #1852042 and #2149956.


For image-related work:
We acknowledge the support of NSF Career award #1453078 and #1720215, BD Spoke #1761839, and internal support from the University of Tennessee at Chattanooga. 


For yeast aging:

We acknowledge the support of NSF Career award #1453078 and #1720215, BD Spoke #1761839, REU #1852042, and internal support from the University of Tennessee at Chattanooga. 


For machine learning:
We acknowledge the support of NSF Career award #1453078 and #1720215, BD Spoke #1761839, and internal support from the University of Tennessee at Chattanooga. TP and DM acknowledge the support of a DoD capacity-building grant. 




Saturday, March 26, 2022

R table() names are chr, not integer

 







Rerun with ADF test on mobility

After filtering mobility by ADF p-value, the number of available mobility tests is much smaller. 

On ECS323 Lambda server 
K02, 
18:28, auto_county_cajo_20200501-20210215-01percent-K02.R
18:33 auto_county_cajo_20200501-20210215-05percent-K02.R
18:39 auto_county_cajo_20200501-20210215-10percent-K02.R
19:02 K02 summary and plot updated. Similar figures. 

K03
19:20 auto_county_cajo_20200501-20210215-01percent-K03.R
19:27 auto_county_cajo_20200501-20210215-05percent-K03.R
19:32, auto_county_cajo_20200501-20210215-10percent-K03.R

K04
19:42 auto_county_cajo_20200501-20210215-01percent-K04.R
19:47 auto_county_cajo_20200501-20210215-05percent-K04.R
19:52 auto_county_cajo_20200501-20210215-10percent-K04.R

K05
20:15 auto_county_cajo_20200501-20210215-01percent-K05.R
20:32 auto_county_cajo_20200501-20210215-05percent-K05.R
20:44 auto_county_cajo_20200501-20210215-10percent-K05.R

K06
20:56 auto_county_cajo_20200501-20210215-01percent-K06.R
21:05 auto_county_cajo_20200501-20210215-05percent-K06.R
21:11 auto_county_cajo_20200501-20210215-10percent-K06.R

21:42, redo all plots.

21:43 - 23:11, Rmd to generate the summary table based on the cval=1% cajo test. There was an indexing error when looking up the appropriate numerators: the index should be chr '2' or chr '3', not integer 2 or 3. This is a very tricky error. 
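The pitfall above can be reproduced with base R's `table()`: its cells are labeled by character names, so an integer index selects by position while a character index selects by label. The `ranks` vector below is made-up data standing in for per-county ca.jo ranks, and `count_rank()` is a hypothetical helper, not the code in the author's Rmd.

```r
# Made-up per-county ca.jo ranks (illustrative data only)
ranks <- c(0, 2, 2, 1, 3, 2, 0)
tab <- table(ranks)          # cell names are character: "0" "1" "2" "3"

# Integer index = position: tab[2] is the SECOND cell, i.e. rank "1"
by_position <- as.integer(tab[2])
# Character index = name: tab["2"] is the cell labeled rank 2
by_name <- as.integer(tab["2"])

# Hypothetical helper: look up a rank by its name and return 0 when that
# rank never occurs, instead of relying on how `[` treats an absent name
count_rank <- function(tab, r) {
  key <- as.character(r)
  if (key %in% names(tab)) as.integer(tab[key]) else 0L
}

print(c(by_position = by_position, by_name = by_name))
print(count_rank(tab, 2))
print(count_rank(table(c(0, 1, 1)), 2))
```

Converting the rank to a string once, inside a helper, keeps every lookup name-based, which is what a summary of "how many counties have rank 2" actually means.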


Counties with ADF p-value > 0.2 for dewpoint (county, state):

McKenzie, North Dakota
Williams, North Dakota
Fresno, California
Flathead, Montana
Madera, California
Mountrail, North Dakota
Stark, North Dakota
Juneau, Alaska
Pennington, South Dakota
Santa Cruz, Arizona
Meade, South Dakota
Lewis and Clark, Montana
Morton, North Dakota
Cascade, Montana
Tulare, California
Ward, North Dakota
Keith, Nebraska
Pima, Arizona
Mariposa, California
Douglas, Washington
Burleigh, North Dakota
Lake, Montana
Stutsman, North Dakota

 




Thursday, March 24, 2022

attention in deep learning

In computer science, a sequence usually refers to time-varying data. 

A biological sequence is not time-varying data, but it is similar to a sentence in NLP. 

Ref: 

https://theaisummer.com/attention/



CPSC 4900, Ch7, part 2

   


== In-class to do: 

For the ECS Room 220 podium computer, use channel 4 on the Extron controller. 

Restart the computer to make sure Zoom works. Make sure the Bluetooth mouse is turned off. 

Clean up desktop space and calendars. 

Zoom: live transcript (start video recording). Turn the computer speaker on. 

* transcard recruitment (next Tuesday, 3/29)

* Ch7 Socrative: started with #16 and finished. Next time, Chapter 8. 


Tuesday, March 22, 2022

federal interview tips

Critical thinking and writing skills are important.

How to address problems with multiple solutions

At some federal agencies, people are expected to condense 50 pages of material into a 2-page report. 



K05

13:33 10% run on lambda

13:40 5% cval run on lambda

13:49 1% cval run


plots done. 






K03

12:35 10% run on lambda

 12:42, 5% run

12:53, 1% run









auto mnemonic software for student projects

 auto mnemonic software


Monday, March 21, 2022

K01

Chose the 1% critical value for the plot, since the results are so good. Too good to be true? 






 




K02









 



K=4 amazing results

 







CPSC 4900, Ch7

  


== In-class to do: 

For the ECS Room 220 podium computer, use channel 4 on the Extron controller. 

Restart the computer to make sure Zoom works. Make sure the Bluetooth mouse is turned off. 

Clean up desktop space and calendars. 

Zoom: live transcript (start video recording). Turn the computer speaker on. 

* transcard recruitment

College tech symposium pre-registration: title, team members, faculty name and course number, poster. Let students work in class to pre-register. 

* Start Ch7 Socrative; stopped at #15. Next time, start with #16.


K06 run

Rt - workplace: good results


Rt - T2m: poor results




 




K08

K08 shows good cointegration of Rt - workplace mobility, but not of Rt - T2m.











 


K17

19:17 10% run

19:21, 5% run

19:33, 1% run

19:50, generate table and plot

















K15 run

K14 - K16 seem to show good cointegration of Rt - T2m.  





K20 run

K20: the Rt - T2m trend weakens. It seems to peak at K14-16, so I should try K15.

17:00 lambda 10% run, 17:10 done

17:23, 5% run done. 

17:31, 1% run done.

generate table and plots. Add limits=c(0, 0.6) to plots. 












K18 run

16:00 lambda, 5% R code run. HQ could not find the output file. 

16:25, 10% run on lambda. 







K16 cointegration

It seems K=16 days is the optimal lag between Rt and T2m. 










 




K14 cointegration


auto_county_cajo_20200501-20210215-01percent-K14.R 













Sunday, March 20, 2022

15-month time window: no Rt-T2m rank of 2

 


I finished another cointegration analysis for a 15-month time window, 2020 May 1 - 2021 August 15. No rank of two was found for cajo Rt ~ T2m. So either vaccination really altered the correlation, or the second winter has to be included. 




Nat. Biotechnol. | Using deep learning to annotate the protein universe

 

https://mp.weixin.qq.com/s/iJcS3AP2VXeJSAp_7ftNyA


Friday, March 18, 2022

unit root test

 https://en.wikipedia.org/wiki/Dickey%E2%80%93Fuller_test






cajo Rt - T2m pattern

After filtering by ADF p-value, the NE cluster disappeared. Instead, Alaska and the West Coast form the new cluster. 

tb <- tb %>% filter(adf_p_Rt > 0.01 & adf_p_t2m > 0.05)
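The filter line above keeps only counties where the ADF test does not reject a unit root for either series, which matches the usual premise that Johansen cointegration is applied to non-stationary series. The base-R sketch below uses an invented four-county table (column names copied from that line, values made up) to show two details: rows whose p-value is NA, for example from a failed test, are dropped by both `subset()` and `dplyr::filter()`, and since `tseries::adf.test` never reports a p-value below 0.01, `adf_p_Rt > 0.01` drops every series that hit that floor.

```r
# Invented ADF p-values for four hypothetical counties
tb <- data.frame(
  county    = c("A", "B", "C", "D"),
  adf_p_Rt  = c(0.45, 0.01, 0.30, NA),
  adf_p_t2m = c(0.60, 0.70, 0.03, 0.50)
)

# Keep rows where neither unit-root null is rejected at the chosen levels.
# B is at the 0.01 floor, C has a stationary-looking T2m, and D has an NA
# p-value; subset() treats an NA condition as FALSE, so D is dropped too.
kept <- subset(tb, adf_p_Rt > 0.01 & adf_p_t2m > 0.05)
print(as.character(kept$county))
```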



Thursday, March 17, 2022

cajo 2020 May 1 to 2021 Feb 15

 

(base) hqin@CS313BQin _run-20200501-20210215 % pwd

/Users/hqin/github/COVID19_transmission_MS/_run-20200501-20210215

(base) hqin@CS313BQin _run-20200501-20210215 % ll

total 1496

-rw-r--r--  1 hqin  staff   735K Mar 17 22:49 _autocajo_1-3151-2020-05-01-to-2021-02-15-cval=1percent-K=12trend-longrun.csv

-rw-r--r--  1 hqin  staff    11K Mar 17 22:34 auto_county_cajo_2020501-20210215-1percent-K12.R

The run can stall at random. After killing and restarting the R code three times, the output was complete. 




2020 May 1 - Sep 1 COVID-19 peak analysis

 pwd: 

(base) hqin@CS313BQin _run-2020501-20200901 % pwd

/Users/hqin/github/COVID19_transmission_MS/_run-2020501-20200901





















Rt ~ T2m is very strong!!!


In the summer, T2m series are mostly non-stationary!!!


















adf.test error

tryCatch fixed this runtime error. 

i = 2677
Error in if (k < 0) stop("k negative") : 
  missing value where TRUE/FALSE needed
Calls: adf.test
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted

get_adf_pvalue(tb_local$era5_t2m)


Loving, Texas: there are zero cases in the time window, so adf.test is meaningless. 
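The tryCatch fix and the Loving, Texas case above can be combined into one defensive helper. This is a sketch only: the name `get_adf_pvalue` comes from the notes, but its body here is hypothetical, and `min_n` and the `test_fun` argument are assumptions added so the helper can be exercised without the tseries package. As far as I can tell, `tseries::adf.test` computes its default lag order as `trunc((length(x) - 1)^(1/3))`, which is NaN for an empty series, so an empty input (for example after removing NAs) is one plausible trigger for the `missing value where TRUE/FALSE needed` error above.

```r
# Hypothetical body for get_adf_pvalue. test_fun defaults to
# tseries::adf.test (assumes tseries is installed); the default is only
# evaluated when it is actually used, so a stand-in can be passed instead.
get_adf_pvalue <- function(x, test_fun = tseries::adf.test, min_n = 10) {
  x <- as.numeric(x)
  x <- x[!is.na(x)]
  # Too short, or constant (e.g. a county with zero cases in the window):
  # a unit-root test is meaningless, so report NA instead of failing
  if (length(x) < min_n || var(x) == 0) {
    return(NA_real_)
  }
  # Any remaining error inside the test becomes NA plus a message,
  # so one bad county does not halt a loop over all counties
  tryCatch(
    test_fun(x)$p.value,
    error = function(e) {
      message("ADF test failed: ", conditionMessage(e))
      NA_real_
    }
  )
}

# Usage inside the county loop, e.g.
# p_t2m <- get_adf_pvalue(tb_local$era5_t2m)
```

Returning NA rather than stopping keeps the loop going; the NA rows then fall out naturally when the results table is later filtered on p-values.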

Reinstall R and RStudio to remove the segfault "memory not mapped" error.

Tried reinstalling R and packages.

https://stackoverflow.com/questions/49190251/caught-segfault-memory-not-mapped-error-in-r


cajo, Carter, TN,

 Carter, TN

time window April 1 to September 1, 2020. 

adf.test(Rt) --> p-value <=0.01. !!!!









This means Rt is stationary and leads to numerical errors in further analysis. 



Turgeson defense

 

https://www.biorxiv.org/content/10.1101/2022.01.26.477967v1.abstract 


3 forms of FadL channel in Vibrio

docking studies

energetic analysis of transport

https://en.wikipedia.org/wiki/Molecular_dynamics 



Saturday, March 12, 2022

Augmented Dickey–Fuller test

Null H0: a unit root is present in the time series sample, i.e., the series is non-stationary. 
Alternative H1: the series is stationary. 

Stationary means that the statistical properties of the series (mean, variance, autocovariance) do not change over time. 
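For reference, the hypotheses above are statements about the coefficient gamma in the ADF regression. In the constant-plus-trend form that `tseries::adf.test` fits, with k lagged differences, the regression and test statistic are:

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t

H_0:\ \gamma = 0 \quad (\text{unit root, non-stationary})
H_1:\ \gamma < 0 \quad (\text{stationary})

\mathrm{DF}_\tau = \hat{\gamma} \,/\, \mathrm{SE}(\hat{\gamma})
```

The statistic is compared with Dickey-Fuller critical values rather than Student-t ones; in the constant-plus-trend case the 5% value is about -3.41 in large samples, so more negative statistics reject H0.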


Reference: 
https://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test
https://en.wikipedia.org/wiki/Stationary_process

two independent random walks, r=0 for Johansen test


one random walk and one stationary series, r=1 for Johansen test. 
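These cases can be simulated in base R to sanity-check a Johansen setup, together with the full-rank case. The sketch below builds three made-up pairs; the `urca::ca.jo` call at the end only runs if the urca package is installed, and its settings (`type = "trace"`, `ecdet = "const"`, `K = 2`) are illustrative choices, not the settings used in these notes. In the standard reading, rank r is the number of independent stationary linear combinations, so r = 2 for two series means both series are stationary on their own.

```r
set.seed(42)
n <- 300

# Expected r = 0: two independent random walks (each I(1), not cointegrated)
rw_a <- cumsum(rnorm(n))
rw_b <- cumsum(rnorm(n))
pair_r0 <- cbind(rw_a, rw_b)

# Expected r = 1: one random walk and one stationary series; the stationary
# series by itself is the single stationary combination
wn <- rnorm(n)
pair_r1 <- cbind(rw_a, wn)

# Expected r = 2 (full rank): both series are already stationary
ar_a <- as.numeric(arima.sim(list(ar = 0.5), n))
ar_b <- as.numeric(arima.sim(list(ar = 0.3), n))
pair_r2 <- cbind(ar_a, ar_b)

# Optional: run the Johansen trace test on each pair if urca is available
if (requireNamespace("urca", quietly = TRUE)) {
  for (pair in list(pair_r0, pair_r1, pair_r2)) {
    jo <- urca::ca.jo(pair, type = "trace", ecdet = "const", K = 2)
    print(summary(jo))
  }
}
```

Because the true rank is known by construction, these pairs give a quick check that a rank-counting script reads the ca.jo output in the right direction before it is run on the county data.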



*** update cajo with new interpretation of rank r

HQ realized that rank=2 for two variables is actually a good combination.  

Moderate cointegration of Rt and Apple driving. 











Northern states are more likely to have Rt ~ T2m









Confirmed by Rt, T2m, Apple-driving










21:30. HQ added adf.test and found many small p-values for T2m, dewpoint, and even Rt. A small ADF p-value means the null hypothesis of a unit root is rejected in favor of the stationary alternative. This means the dewpoint series are mostly stationary. 

> adftest
Augmented Dickey-Fuller Test
data:  x_nona[, 1]
Dickey-Fuller = -5.2384, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary
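To see what the `Dickey-Fuller = -5.2384` line in the output above measures, the sketch below reruns the ADF regression by hand with `lm()`, using the constant-plus-trend form and the default lag order `trunc((n - 1)^(1/3))` that, to my knowledge, `tseries::adf.test` uses. `df_stat()` is a hypothetical helper and both simulated series are made up. Note also that `tseries::adf.test` interpolates its p-value from a small table and never reports a value below 0.01 (it warns that the true p-value is smaller), so a printed `p-value = 0.01` should be read as "at most 0.01".

```r
set.seed(1)
n <- 200
rw  <- cumsum(rnorm(n))                          # unit root: non-stationary
ar1 <- as.numeric(arima.sim(list(ar = 0.5), n))  # stationary AR(1)

# Hypothetical helper: t-statistic on gamma in
#   dy_t = a + b*t + gamma*y_{t-1} + sum_{i=1..k} d_i*dy_{t-i} + e_t
df_stat <- function(y, k = trunc((length(y) - 1)^(1/3))) {
  dy <- diff(y)
  m <- embed(dy, k + 1)                  # col 1 = dy_t, cols 2..k+1 = lags
  d <- data.frame(
    dy_t  = m[, 1],
    y_lag = y[(k + 1):(length(y) - 1)],  # lagged level y_{t-1}
    trend = (k + 2):length(y)            # time index t
  )
  if (k > 0) {
    lags <- m[, -1, drop = FALSE]
    colnames(lags) <- paste0("dlag", seq_len(k))
    d <- cbind(d, lags)
  }
  fit <- lm(dy_t ~ ., data = d)
  summary(fit)$coefficients["y_lag", "t value"]
}

print(c(random_walk = df_stat(rw), ar1 = df_stat(ar1)))
# More negative values favor stationarity; with constant and trend, the 5%
# Dickey-Fuller critical value is roughly -3.4 in large samples.
```

The stationary AR(1) series should give a much more negative statistic than the random walk, which is the same pattern behind the small p-values for T2m, dewpoint, and Rt noted above.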



Friday, March 11, 2022

cajo results summary for publication, error found!


Working code: _generate_tables_for_publication.Rmd

10% critical value


Rt ~T2m 





# 35.6% of counties have a positive ca.jo Rt ~ T2m result at the 10% critical value, 28.2% at 5%, and 14.9% at 1%. 


cajo Rt - Apple driving






cajo Rt - Google workplace 

# cajo Rt, Apple driving, T2m




ERROR DISCOVERED. 

Based on https://www.quantstart.com/articles/Johansen-Test-for-Cointegrating-Time-Series-Analysis-in-R/,  when three time series are used, r = 3 means three time series are needed to generate a stationary series!!! 

So it seems eigenvalues can be used to evaluate which combinations of time series are stationary.
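One way to make the eigenvalue-based decision explicit: Johansen's procedure tests H0 "rank <= r" for r = 0, 1, ..., p - 1 and stops at the first null it cannot reject. In the standard reading, the resulting r is the number of independent stationary linear combinations, and r = p means every series is already stationary. The helper below is a hypothetical sketch that takes test statistics and critical values ordered from r = 0 upward; the numbers in the example are invented. With `urca::ca.jo`, the values live in the `@teststat` and `@cval` slots, which `summary()` prints from the highest r down to r = 0, so they would need reversing (worth checking against your urca version).

```r
# Hypothetical helper: sequential Johansen rank selection.
# teststat[i] and cval[i] belong to H0 "rank <= i - 1" (r = 0 first).
johansen_rank <- function(teststat, cval) {
  stopifnot(length(teststat) == length(cval))
  p <- length(teststat)
  for (r in seq_len(p) - 1) {
    # The first null that is NOT rejected gives the rank
    if (teststat[r + 1] <= cval[r + 1]) return(r)
  }
  p  # every null rejected: full rank, all series stationary
}

# Invented statistics for three series with invented critical values
print(johansen_rank(c(40.1, 12.0, 2.1), c(29.7, 15.4, 3.8)))  # 1

# With urca installed, e.g. at the 1% level:
# jo <- urca::ca.jo(x, type = "trace", ecdet = "const", K = 2)
# johansen_rank(rev(jo@teststat), rev(jo@cval[, "1pct"]))
```

Passing the critical-value column explicitly makes the cval=1%, 5%, and 10% runs differ only in one argument.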