Tuesday, November 29, 2016

*** Control systems engineering, control theory, Laplace transform, observability

A control system has an input, a process, and an output. It can be open loop or closed loop. Open loop systems do not monitor or correct the output. Closed loop systems can monitor output and make adjustments.

linear time-invariant differential equation


A transfer function is another way of mathematically modeling a system. It can be derived from the linear, time-invariant differential equation using the Laplace transform, and it applies only to linear, time-invariant systems. (The Laplace transform was developed as a technique for solving differential equations.)

State-space representation is another way to model systems, and it is also suitable for non-linear systems.
Essentially, a state-space model turns an nth-order differential equation into n simultaneous first-order equations. It seems to me that the state-space model is the most widely used ODE modeling method in systems biology.
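
A minimal sketch of that conversion, assuming a second-order equation x'' + 2*zeta*omega*x' + omega^2*x = u (the equation, the parameter values, and the forward-Euler integrator are illustrative choices, not from any particular textbook example). Setting x1 = x and x2 = dx/dt gives two first-order equations, which can then be stepped forward in time to get a step response:

```python
def derivatives(state, u, omega=2.0, zeta=0.5):
    """State-space form of x'' + 2*zeta*omega*x' + omega**2 * x = u."""
    x1, x2 = state                      # x1 = x, x2 = dx/dt
    dx1 = x2                            # first first-order equation
    dx2 = -omega**2 * x1 - 2.0 * zeta * omega * x2 + u   # second one
    return (dx1, dx2)


def simulate(t_end=20.0, dt=0.001, u=1.0):
    """Forward-Euler integration from rest with a constant (step) input u."""
    state = (0.0, 0.0)
    for _ in range(int(round(t_end / dt))):
        d = derivatives(state, u)
        state = (state[0] + dt * d[0], state[1] + dt * d[1])
    return state


final_x, final_v = simulate()
# For a unit step the output should settle near u / omega**2 = 0.25.
print(final_x, final_v)
```

Forward Euler is used only to keep the sketch short; a real analysis would more likely use a proper ODE solver.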

Test signals with different waveforms can be used to study systems.

The basic analysis of a system is to evaluate the time response of a system.

A sensitivity analysis can yield the percentage of change in a specification as a function of a change in a system parameter.
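
A small sketch of such a calculation, assuming the specification is the steady-state gain K/(1+K) of a unity-feedback loop and the parameter is the forward gain K (both are illustrative assumptions). The sensitivity is taken as the ratio of the fractional change in the specification to the fractional change in the parameter, estimated with a central difference:

```python
def closed_loop_gain(k):
    """Steady-state gain of a unity-feedback loop with forward gain k (illustrative)."""
    return k / (1.0 + k)


def sensitivity(func, p, rel_step=1e-6):
    """Approximate (dF/F) / (dP/P) at parameter value p using a central difference."""
    h = p * rel_step
    dfdp = (func(p + h) - func(p - h)) / (2.0 * h)
    return dfdp * p / func(p)


s = sensitivity(closed_loop_gain, 9.0)
# Analytically the sensitivity is 1 / (1 + K), i.e. 0.1 at K = 9.
print(f"sensitivity of closed-loop gain to K at K=9: {s:.4f}")
```

A value of 0.1 means a 1% change in K changes the gain by roughly 0.1%.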

In biology, many ODEs have nonlinear terms involving products of variables. So transfer functions cannot be applied, but the state-space method can be used.

Controllability and observability are well understood for continuous-time, linear time-invariant state-space models; see https://en.wikipedia.org/wiki/State-space_representation#State_variables

Stability: a system is stable (in the bounded-input, bounded-output sense) if every bounded input yields a bounded output. So, does aging change a stable gene network into an unstable network?

Observability: if the initial state vector x(t0) can be determined from the input u(t) and output y(t) over a finite time interval starting at t0, the system is observable; otherwise it is unobservable.
In other words, observability is the ability to deduce the state variables from knowledge of the input u(t) and the output y(t).
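
For a linear time-invariant model x' = Ax + Bu, y = Cx + Du, the standard test is the Kalman rank condition: the system is observable when the matrix that stacks C, CA, ..., CA^(n-1) has rank n. The sketch below checks this in plain Python; the two example systems and the helper names are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rank(matrix, tol=1e-9):
    """Matrix rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue                      # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            for j in range(c, cols):
                m[i][j] -= f * m[r][j]
        r += 1
    return r


def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) row-wise."""
    blocks, current = [], [row[:] for row in C]
    for _ in range(len(A)):
        blocks.extend(current)
        current = matmul(current, A)
    return blocks


def is_observable(A, C):
    return rank(observability_matrix(A, C)) == len(A)


# Output measures x1, and x1 depends on x2: both states can be recovered.
print(is_observable([[0.0, 1.0], [-4.0, -2.0]], [[1.0, 0.0]]))   # True
# Two decoupled states, only x1 measured: x2 never shows up in the output.
print(is_observable([[-1.0, 0.0], [0.0, -2.0]], [[1.0, 0.0]]))   # False
```

Note that this rank test applies to linear models only; nonlinear ODE models of gene networks need other observability notions.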

cpsc 5210

RSA
project novelty



genome compression


https://en.wikipedia.org/wiki/Compression_of_Genomic_Re-Sequencing_Data

Number theory, data compression for NGS data

Can RSA or other methods be used for NGS sequence compression?

RSA


https://en.wikipedia.org/wiki/RSA_(cryptosystem)#Operation

lab meeting

1a) DE gene lists for RNAseq project
TODO: there are various time points between control and treatment. Should we use the consensus DEG list?

It seems that the "GeneID" values in the BGI report are NCBI Gene IDs (for example, 57573).

So, "Gene ID" is a standard NCBI number.

1b) Pathway analysis plan for DE gene lists
TODO: There are different sources of human gene/protein networks. We should try several for comparisons.
TODO: We should try different clustering methods, such as hclust, mcl, etc. (refer to Qin's previous paper for clustering analysis).

2) Time-lapse image analysis for yeast replicative lifespan
   We can use ImageJ, MATLAB or R.

Saturday, November 26, 2016

simcenter qinlab tools

"module load qinlab" can add these to $PATH

hqin@ridgeside[~/demo.lgf/RNAseq.hisat2]->ls /usr/local/qinlab/
bin                            samtools-1.3.1.tar.bz2
hisat2                         share
hisat2-2.0.5                   stringtie
hisat2-2.0.5-Linux_x86_64.zip  stringtie-1.3.1c.Linux_x86_64
samtools-1.3.1                 stringtie-1.3.1c.Linux_x86_64.tar.gz

Monday, November 21, 2016

R library on ridgeside (simcenter)


Global
ls /usr/local/lib/R/site-library/

Local

SimCenter mailing address


University of Tennessee at Chattanooga
701 E. ML King Blvd
Chattanooga, TN 37403

UTC teaching evaluations


http://www.utc.edu/planning-evaluation-institutional-research/student-rating-of-faculty/index.php

RNAseq software installation on qbert or Simcenter clusters

====================For hisat2 and supporting programs
Install hisat2
ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/downloads/hisat2-2.0.5-Linux_x86_64.zip

Install stringtie 1.3.1c
http://ccb.jhu.edu/software/stringtie/dl/stringtie-1.3.1c.Linux_x86_64.tar.gz

Install samtools
https://github.com/samtools/samtools/releases/download/1.3.1/samtools-1.3.1.tar.bz2
The above link is from http://www.htslib.org/download/
See also https://github.com/samtools/samtools/releases/


====================For R packages
Under shell, run R

Inside of R:
 source("https://bioconductor.org/biocLite.R")
 biocLite('ballgown')

 install.packages('devtools') #A USA mirror site may be chosen

 library(devtools)
 devtools::install_github('alyssafrazee/RSkittleBrewer') 


========== Testing the installation
Download the test files and codes from
ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/

under shell
$ ./rnaseq_pipeline.config.sh
$ ./rnaseq_pipeline.sh out

=========Additional R packages
#Please also run the following code to install all packages in R. This may take 10-12 hours.
install.packages(new.packages())

#A prompt will ask for a mirror site. Any site from USA should work.

big java sites



http://bcs.wiley.com/he-bcs/Books?action=index&bcsId=9799&itemId=1119056446

Wiley rep


Rep Name / Contact Details
MARY VANN - 0065
Phone: 6175041370
Email:  MVANN@WILEY.COM

Friday, November 18, 2016

bibtex doi bug


In qin_network.bib, I added a reference with a DOI field. This field generates an error in the *.bbl file when running bibtex. I removed the DOI fields and the bug disappeared.


Wednesday, November 16, 2016

toread, Graph Metrics for Temporal Networks - Springer


http://www.springer.com/cda/content/document/cda_downloaddocument/9783642364600-c1.pdf?SGWID=0-0-45-1393604-p174915729

toread: An Introduction to Temporal Graph Data Management


https://www.cs.umd.edu/sites/default/files/scholarly_papers/Khurana_SchPaper_1.pdf

toread Path Problems in Temporal Graphs


http://www.vldb.org/pvldb/vol7/p721-wu.pdf

Path Problems in Temporal Graphs
Huanhuan Wu, James Cheng, Silu Huang, Yiping Ke, Yi Lu, Yanyan Xu
Department of Computer Science and Engineering, The Chinese University of Hong Kong ({hhwu,jcheng,slhuang,ylu,yyxu}@cse.cuhk.edu.hk); Institute of High Performance Computing, Singapore (Yiping Ke)

safety training, UTC

hazardous materials

Gasoline ignites easily, but diesel does not.

universal waste:
fluorescent lamps should be recycled.
computer batteries.
motor batteries

DOT hazard marking
Globally Harmonized System (GHS) container markings
NFPA rating explanation guide, NFPA 704, HMIS

423 425 HELP


Tuesday, November 15, 2016

integrating gene expression and network, a reference collection


Convert p-values of differential expression into Z-scores using the inverse Gaussian (normal) CDF.
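
A short sketch of this conversion, assuming the convention z = Phi^-1(1 - p), so that small p-values map to large positive Z-scores (the function name is made up, and other sign conventions exist). It uses only the Python standard library:

```python
from statistics import NormalDist


def p_to_z(p):
    """Map a p-value in (0, 1) to a Z-score via the inverse standard normal CDF."""
    if not 0.0 < p < 1.0:
        raise ValueError("p-value must lie strictly between 0 and 1")
    # By symmetry, inv_cdf(1 - p) == -inv_cdf(p); the second form avoids
    # rounding 1 - p to exactly 1.0 when p is very small.
    return -NormalDist().inv_cdf(p)


for p in (0.5, 0.025, 1e-6, 0.9):
    print(p, round(p_to_z(p), 3))
```

With this convention, p-values above 0.5 give negative Z-scores, which is consistent with both positive and negative scores being calculated.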


Maybe because Ideker02 is looking for 'active subnetworks', only positive Z-scores were used? No, both positive and negative Z-scores were calculated.
Ideker02 seems to combine K-means and simulated annealing for network clustering.


Tornow,S. and Mewes,H.W. (2003) Functional modules by relating protein interaction networks and gene expression. Nucleic Acids Res., 31, 6283–6289.

Segal,E. et al. (2003) Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics, 19, 264–272.

Morrison,J.L. et al. (2005) GeneRank: using search engine technology for the analysis of microarray experiments. BMC Bioinformatics, 6, 233.



Ma,X. et al. (2007) CGI: a new approach for prioritizing genes by combining gene expression and protein–protein interaction data. Bioinformatics, 23, 215–221.


Integrating gene expression and protein-protein interaction network to prioritize cancer-associated genes, Chao Wu, Jun Zhu and Xuegong Zhang

http://www.biomedcentral.com/1471-2105/13/182
http://scholar.google.com/scholar?cites=14200881095439672925&as_sdt=5,43&sciodt=0,43&hl=en


Li et al. BMC Medical Genomics 2014, 7(Suppl 2):S4 Prediction of disease-related genes based on weighted tissue-specific networks by using DNA methylation

http://www.biomedcentral.com/1755-8794/7/S2/S4

http://hongqinlab.blogspot.com/2014/12/li14-bmc-medical-genomics-predict.html

http://hongqinlab.blogspot.com/2014/12/integrating-gene-expression-data-into.html

http://hongqinlab.blogspot.com/2014/03/build-human-gene-network-reliability.html

From Ma, 2007 Bioinformatics CGI paper:
Gene expression data and protein interaction data have been integrated for gene function prediction. For example, Ideker et al. (2002) used protein interaction data and gene expression data to screen for differentially expressed subnetworks between different conditions. In Tornow and Mewes (2003) and Segal et al. (2003), gene expression data and protein interactions are used to group genes into functional modules. These methods provide insights into the regulatory modules of the whole networks at the systems biology level. However, it is not clear how to adapt their methods to identify genes contributing to the phenotype of interest. Morrison et al. (2005) adapted the Google search engine to prioritize genes for a phenotype by integrating gene expression profiles and protein interaction data. However, the algorithm ignores the information from proteins linked to the target protein through other intermediate proteins, referred to in the rest of this paper as indirect neighbors.

Qin: Did the previous methods use human pathogenic genes? Seems not if they did not cite dbSNP or OMIM. 

X. Zhou, M.-C. J. Kao, and W. H. Wong. Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci U S A, 99(20):12783–12788, Oct 2002


WGCNA: an R package for weighted correlation network analysis.




Monday, November 14, 2016

RNAseq demo (hisat2, stringtie) error at GBitVec: index 7 out of bounds (size 7) (osX and linux)



Data and code were downloaded from ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/



Byte-5:sva hqin$ ps
  PID TTY           TIME CMD
49035 ttys000    0:00.08 -bash
51361 ttys000    0:00.01 bash ./rnaseq_pipeline.sh out
51377 ttys000    0:00.03 bash ./rnaseq_pipeline.sh out
51378 ttys000    0:00.00 tee ./run.log
52036 ttys000    0:00.07 perl /Users/hqin/bin/hisat2 -p 8 --dta -x /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/indexes/
52044 ttys000    0:00.49 perl /Users/hqin/bin/hisat2 -p 8 --dta -x /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/indexes/
52045 ttys000    0:00.49 perl /Users/hqin/bin/hisat2 -p 8 --dta -x /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/indexes/
52046 ttys000    3:02.29 /Users/hqin/bin/hisat2-align-s --wrapper basic-0 -p 8 --dta -x /Users/hqin/demo.lgf/RNAseq.hisa
52047 ttys000    0:00.31 gzip -dc /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/samples/ERR188401_chrX_2.fastq.gz
52048 ttys000    0:00.31 gzip -dc /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/samples/ERR188401_chrX_1.fastq.gz
  492 ttys001    0:00.21 -bash
51812 ttys002    0:00.02 -bash

51867 ttys002    0:11.01 tar xvfz hg38_tran.tar.gz




Byte-5:samtools hqin$ cd
Byte-5:~ hqin$ cd demo.lgf/
Byte-5:demo.lgf hqin$ cd RNAseq.hisat2/
Byte-5:RNAseq.hisat2 hqin$ ./rnaseq_pipeline.sh out
ERROR: samtools program not found, please edit the configuration script.
Byte-5:RNAseq.hisat2 hqin$ source /Users/hqin/.bash_profile
Byte-5:RNAseq.hisat2 hqin$ ./rnaseq_pipeline.sh out
[2016-11-14 15:07:24] #> START:  ./rnaseq_pipeline.sh out
[2016-11-14 15:07:24] Processing sample: ERR188044_chrX
[2016-11-14 15:07:24]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:07:56]    * Alignments conversion (SAMTools)
[2016-11-14 15:08:40]    * Assemble transcripts (StringTie)
[2016-11-14 15:08:51] Processing sample: ERR188104_chrX
[2016-11-14 15:08:51]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:09:37]    * Alignments conversion (SAMTools)
[2016-11-14 15:10:24]    * Assemble transcripts (StringTie)
[2016-11-14 15:10:36] Processing sample: ERR188234_chrX
[2016-11-14 15:10:36]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:11:11]    * Alignments conversion (SAMTools)
[2016-11-14 15:12:23]    * Assemble transcripts (StringTie)
[2016-11-14 15:12:45] Processing sample: ERR188245_chrX
[2016-11-14 15:12:45]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:13:45]    * Alignments conversion (SAMTools)
[2016-11-14 15:14:40]    * Assemble transcripts (StringTie)
[2016-11-14 15:14:51] Processing sample: ERR188257_chrX
[2016-11-14 15:14:51]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:15:42]    * Alignments conversion (SAMTools)
[2016-11-14 15:16:50]    * Assemble transcripts (StringTie)
[2016-11-14 15:17:04] Processing sample: ERR188273_chrX
[2016-11-14 15:17:04]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:17:50]    * Alignments conversion (SAMTools)
[2016-11-14 15:18:34]    * Assemble transcripts (StringTie)
[2016-11-14 15:18:44] Processing sample: ERR188337_chrX
[2016-11-14 15:18:44]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:21:18]    * Alignments conversion (SAMTools)
[2016-11-14 15:22:44]    * Assemble transcripts (StringTie)
[2016-11-14 15:23:09] Processing sample: ERR188383_chrX
[2016-11-14 15:23:09]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:25:31]    * Alignments conversion (SAMTools)
[2016-11-14 15:27:13]    * Assemble transcripts (StringTie)
[2016-11-14 15:27:36] Processing sample: ERR188401_chrX
[2016-11-14 15:27:36]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:31:58]    * Alignments conversion (SAMTools)
[2016-11-14 15:33:53]    * Assemble transcripts (StringTie)
[2016-11-14 15:34:12] Processing sample: ERR188428_chrX
[2016-11-14 15:34:12]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:35:46]    * Alignments conversion (SAMTools)
[2016-11-14 15:36:44]    * Assemble transcripts (StringTie)
[2016-11-14 15:36:59] Processing sample: ERR188454_chrX
[2016-11-14 15:36:59]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:39:12]    * Alignments conversion (SAMTools)
[2016-11-14 15:40:34]    * Assemble transcripts (StringTie)
[2016-11-14 15:40:50] Processing sample: ERR204916_chrX
[2016-11-14 15:40:50]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:41:50]    * Alignments conversion (SAMTools)
[2016-11-14 15:43:03]    * Assemble transcripts (StringTie)
[2016-11-14 15:43:18] #> Merge all transcripts (StringTie)
[2016-11-14 15:43:29] #> Estimate abundance for each sample (StringTie)
Error at GBitVec: index 7 out of bounds (size 7)
Byte-5:RNAseq.hisat2 hqin$ 
Rerunning the shell script on Linux (ridgeside) gives the same error:
Error at GBitVec: index 7 out of bounds (size 7)
./rnaseq_pipeline.sh: line 82:  7126 Segmentation fault      (core dumped) $STRINGTIE -e -B -p $NUMCPUS -G ${BALLGOWNLOC}/stringtie_merged.gtf -o ${BALLGOWNLOC}/${dsample}/${dsample}.gtf ${ALIGNLOC}/${sample}.bam



Downloaded StringTie v1.3.1b and reran the shell script on OS X:
... ... 
[2016-11-15 14:04:53] #> Merge all transcripts (StringTie)
[2016-11-15 14:04:57] #> Estimate abundance for each sample (StringTie)
Error at GBitVec: index 9 out of bounds (size 9)
./rnaseq_pipeline.sh: line 82:  2422 Abort trap: 6           $STRINGTIE -e -B -p $NUMCPUS -G ${BALLGOWNLOC}/stringtie_merged.gtf -o ${BALLGOWNLOC}/${dsample}/${dsample}.gtf ${ALIGNLOC}/${sample}.bam

Byte-5:RNAseq.hisat2 hqin$ stringtie -v
Command line was:
stringtie -v

StringTie v1.3.1b usage:





Hisat2 demo

install hisat2
install stringtie
install samtools
#make sure all programs are in $PATH
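
A quick way to check the $PATH requirement before running the pipeline, sketched with Python's standard `shutil.which` (the helper names are made up; the tool names are the three listed above):

```python
import shutil


def find_tools(names):
    """Return a dict mapping each program name to its resolved path, or None."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """List the programs that cannot be found on $PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


missing = missing_tools(["hisat2", "stringtie", "samtools"])
if missing:
    print("Not found in $PATH: " + ", ".join(missing))
else:
    print("All tools found in $PATH")
```

This only confirms that each executable can be located; it does not check versions.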



Packages installation in R on ridgeside

Under shell, run R

Inside of R:
 source("https://bioconductor.org/biocLite.R")
 biocLite('ballgown')


#Please also run the following code to install all packages in R. This may take 10-12 hours.
install.packages(new.packages())
#A prompt will ask for a mirror site. Any site from USA should work. 

R install package on ridgeside (failed)

new.packages(repos="http://cran.us.r-projects.org")

install.packages( new.packages(repos="http://cran.us.r-projects.org") ) # failed

install.packages( new.packages(), lib.loc="/home/hqin/R/x86_64-pc-linux-gnu-library/3.3") # still failed


biocLite( "ballgown", lib.loc="/home/hqin/R/x86_64-pc-linux-gnu-library/3.3") # failed


source("https://bioconductor.org/biocLite.R")
biocLite()
install.packages("XML", lib.loc="/home/hqin/R/x86_64-pc-linux-gnu-library/3.3") # failed

# Possible causes (not verified on ridgeside): the CRAN host is spelled r-project.org, not r-projects.org,
# and install.packages() takes the target library as lib=, not lib.loc=.

Install samtools locally on Linux (ridgeside), osX (byte)

Install htslib locally to /home/lib/

Install bcftools. 
Edit .cshrc

              
Biostar suggestion works: 

Download samtools-1.3.1 from www.htslib.org/download
https://www.biostars.org/p/173832/
make prefix=/home/hqin/bin
make prefix=/home/hqin/bin install

Official Samtools installation guide (Qin could not follow this one due to user limitation on ridgeside)
https://github.com/samtools/samtools/blob/develop/README.md



==========
osX, byte install

http://www.htslib.org/download/
Download samtools-1.3.1

cd /Downloads/samtools-1.3.1

Byte-5:samtools-1.3.1 hqin$ make prefix=/Users/hqin/bin/samtools

Byte-5:samtools-1.3.1 hqin$ make prefix=/Users/hqin/bin/samtools install
mkdir -p -m 755 /Users/hqin/bin/samtools/bin /Users/hqin/bin/samtools/share/man/man1
install -p samtools misc/ace2sam misc/maq2sam-long misc/maq2sam-short misc/md5fa misc/md5sum-lite misc/wgsim misc/blast2sam.pl misc/bowtie2sam.pl misc/export2sam.pl misc/interpolate_sam.pl misc/novo2sam.pl misc/plot-bamstats misc/psl2sam.pl misc/sam2vcf.pl misc/samtools.pl misc/seq_cache_populate.pl misc/soap2sam.pl misc/varfilter.py misc/wgsim_eval.pl misc/zoom2sam.pl /Users/hqin/bin/samtools/bin
install -p -m 644 samtools.1 misc/wgsim.1 /Users/hqin/bin/samtools/share/man/man1



GFF GTF genome annotation file format

http://useast.ensembl.org/info/website/upload/gff.html
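
As a reminder of the layout described on that page, a GTF line has nine tab-separated fields (seqname, source, feature, start, end, score, strand, frame, attribute), with the last field holding `key "value";` pairs. A minimal parsing sketch follows; the example line is made up and the function name is hypothetical:

```python
GTF_COLUMNS = ("seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attribute")


def parse_gtf_line(line):
    """Split one GTF line into a dict, with attributes parsed into a sub-dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields))
    record["start"] = int(record["start"])    # coordinates are 1-based
    record["end"] = int(record["end"])
    attributes = {}
    for item in record["attribute"].split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    record["attributes"] = attributes
    return record


# Made-up example line, only to show the shape of the result.
example = 'chrX\tStringTie\texon\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "G1.1";'
rec = parse_gtf_line(example)
print(rec["seqname"], rec["feature"], rec["start"], rec["end"], rec["attributes"])
```

Comment lines starting with `#` would need to be skipped before calling the parser.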


Friday, November 11, 2016

sh make_grch38.sh

hqin@ridgeside[~/tools/hisat2/scripts]->sh make_grch38.sh  
/home/hqin/tools/hisat2/hisat2-build
--2016-11-11 10:38:25--  ftp://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
          => ‘Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz’
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.203.85
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.203.85|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/release-84/fasta/homo_sapiens/dna ... done.
==> SIZE Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz ... 881214344
==> PASV ... done.    ==> RETR Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz ... done.
Length: 881214344 (840M) (unauthoritative)

Homo_sapiens.GRCh38.dna.primary_asse 100%[===================================================================>] 840.39M  22.6MB/s    in 60s   

2016-11-11 10:39:32 (13.9 MB/s) - ‘Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz’ saved [881214344]

Running /home/hqin/tools/hisat2/hisat2-build -p 4 genome.fa genome
Settings:
 Output files: "genome.*.ht2"
 Line rate: 6 (line is 64 bytes)
 Lines per side: 1 (side is 64 bytes)
 Offset rate: 4 (one in 16)
 FTable chars: 10
 Strings: unpacked
 Local offset rate: 3 (one in 8)
 Local fTable chars: 6
 Local sequence length: 57344
 Local sequence overlap between two consecutive indexes: 1024
 Endianness: little
 Actual local endianness: little
 Sanity checking: disabled
 Assertions: disabled
 Random seed: 0
 Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
 genome.fa
Reading reference sizes
 Time reading reference sizes: 00:00:41
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
 Time to join reference sequences: 00:00:17
 Time to read SNPs and splice sites: 00:00:00
Using parameters --bmax 552346700 --dcv 1024
 Doing ahead-of-time memory usage test
 Passed!  Constructing with these parameters: --bmax 552346700 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
 Building sPrime
 Building sPrimeOrder
 V-Sorting samples
 V-Sorting samples time: 00:00:24
 Allocating rank array
 Ranking v-sort output
 Ranking v-sort output time: 00:00:14
 Invoking Larsson-Sadakane on ranks
 Invoking Larsson-Sadakane on ranks time: 00:00:29
 Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
 (Using difference cover)
 Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
 Splitting and merging time: 00:00:00
Split 1, merged 6; iterating...
Splitting and merging
 Splitting and merging time: 00:00:00
Split 1, merged 0; iterating...
Splitting and merging
 Splitting and merging time: 00:00:00
Avg bucket size: 3.68231e+08 (target: 552346699)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering GFM loop
Getting block 1 of 8
 Reserving size (552346700) for bucket 1
 Calculating Z arrays for bucket 1
 Entering block accumulator loop for bucket 1:
Getting block 2 of 8
Getting block 3 of 8
 Reserving size (552346700) for bucket 3
Getting block 4 of 8
 Reserving size (552346700) for bucket 4
 Reserving size (552346700) for bucket 2
 Calculating Z arrays for bucket 3
 Calculating Z arrays for bucket 4
 Calculating Z arrays for bucket 2
 Entering block accumulator loop for bucket 4:
 Entering block accumulator loop for bucket 3:
 Entering block accumulator loop for bucket 2:
 bucket 1: 10%
 bucket 2: 10%
 bucket 3: 10%
 bucket 4: 10%
 bucket 1: 20%
 bucket 2: 20%
 bucket 1: 30%
 bucket 3: 20%
 bucket 4: 20%
 bucket 1: 40%
 bucket 2: 30%
 bucket 1: 50%
 bucket 3: 30%
 bucket 2: 40%
 bucket 4: 30%
 bucket 1: 60%
 bucket 2: 50%
 bucket 3: 40%
 bucket 1: 70%
 bucket 4: 40%
 bucket 2: 60%
 bucket 1: 80%
 bucket 3: 50%
 bucket 1: 90%
 bucket 2: 70%
 bucket 4: 50%
 bucket 1: 100%
 Sorting block of length 291744419 for bucket 1
 (Using difference cover)
 bucket 3: 60%
 bucket 2: 80%
 bucket 4: 60%
 bucket 3: 70%
 bucket 2: 90%
 bucket 4: 70%
 bucket 2: 100%
 Sorting block of length 399816717 for bucket 2
 (Using difference cover)
 bucket 3: 80%
 bucket 4: 80%
 bucket 3: 90%
 bucket 3: 100%
 Sorting block of length 424570505 for bucket 3
 (Using difference cover)
 bucket 4: 90%
 bucket 4: 100%
 Sorting block of length 480190664 for bucket 4
 (Using difference cover)
 Sorting block time: 00:01:40
Returning block of 291744420 for bucket 1
Getting block 5 of 8
 Reserving size (552346700) for bucket 5
 Calculating Z arrays for bucket 5
 Entering block accumulator loop for bucket 5:
 bucket 5: 10%
 bucket 5: 20%
 bucket 5: 30%
 Sorting block time: 00:02:23
Returning block of 399816718 for bucket 2
 bucket 5: 40%
 bucket 5: 50%
 bucket 5: 60%
 Sorting block time: 00:02:29
Returning block of 424570506 for bucket 3
 bucket 5: 70%
 bucket 5: 80%
Getting block 6 of 8
 Reserving size (552346700) for bucket 6
 Calculating Z arrays for bucket 6
 Entering block accumulator loop for bucket 6:
 bucket 5: 90%
 bucket 6: 10%
 bucket 5: 100%
 Sorting block of length 398074230 for bucket 5
 (Using difference cover)
 bucket 6: 20%
 Sorting block time: 00:02:56
Returning block of 480190665 for bucket 4
 bucket 6: 30%
Getting block 7 of 8
 Reserving size (552346700) for bucket 7
 Calculating Z arrays for bucket 7
 Entering block accumulator loop for bucket 7:
 bucket 6: 40%
 bucket 7: 10%
 bucket 6: 50%
 bucket 7: 20%
 bucket 6: 60%
 bucket 7: 30%
 bucket 6: 70%
 bucket 7: 40%
Getting block 8 of 8
 Reserving size (552346700) for bucket 8
 Calculating Z arrays for bucket 8
 Entering block accumulator loop for bucket 8:
 bucket 6: 80%
 bucket 8: 10%
 bucket 7: 50%
 bucket 8: 20%
 bucket 6: 90%
 bucket 7: 60%
 bucket 8: 30%
 bucket 6: 100%
 Sorting block of length 241117192 for bucket 6
 (Using difference cover)
 bucket 8: 40%
 bucket 7: 70%
 bucket 8: 50%
 bucket 7: 80%
 bucket 8: 60%
 bucket 8: 70%
 bucket 7: 90%
 bucket 8: 80%
 bucket 7: 100%
 Sorting block of length 547672632 for bucket 7
 (Using difference cover)
 bucket 8: 90%
 bucket 8: 100%
 Sorting block of length 162662701 for bucket 8
 (Using difference cover)
 Sorting block time: 00:02:21
Returning block of 398074231 for bucket 5
 Sorting block time: 00:01:25
Returning block of 241117193 for bucket 6
 Sorting block time: 00:01:00
Returning block of 162662702 for bucket 8
 Sorting block time: 00:03:07
Returning block of 547672633 for bucket 7
Exited GFM loop
fchr[A]: 0
fchr[C]: 869653843
fchr[G]: 1470243264
fchr[T]: 2073417374
fchr[$]: 2945849067
Exiting GFM::buildToDisk()
Returning from initFromVector
Wrote 986172031 bytes to primary GFM file: genome.1.ht2
Wrote 736462272 bytes to secondary GFM file: genome.2.ht2
Re-opening _in1 and _in2 as input streams
Returning from GFM constructor
Returning from initFromVector
Wrote 1295322177 bytes to primary GFM file: genome.5.ht2
Wrote 749943562 bytes to secondary GFM file: genome.6.ht2
Re-opening _in5 and _in5 as input streams
Returning from HierEbwt constructor
Headers:
   len: 2945849067
   gbwtLen: 2945849068
   nodes: 2945849068
   sz: 736462267
   gbwtSz: 736462268
   lineRate: 6
   offRate: 4
   offMask: 0xfffffff0
   ftabChars: 10
   eftabLen: 0
   eftabSz: 0
   ftabLen: 1048577
   ftabSz: 4194308
   offsLen: 184115567
   offsSz: 736462268
   lineSz: 64
   sideSz: 64
   sideGbwtSz: 48
   sideGbwtLen: 192
   numSides: 15342964
   numLines: 15342964
   gbwtTotLen: 981949696
   gbwtTotSz: 981949696
   reverse: 0
   linearFM: Yes
Total time for call to driver() for forward index: 00:18:09
genome index built; you may remove fasta files




/* Qin: genome.1.ht2 etc are saved in scripts/ directory */

hisat2 use log

cd scripts
sh make_grch38.sh 

Thursday, November 10, 2016

physical activity, Cochrane public health

Michel 2014: money incentives increase attendance, but not physical activity.

Smart apps, wearable devices: do not increase physical activity.

Qui 2015: goal setting can be effective.

Brown 2016: social support is likely effective.

Evidence-informed decision making


health evidence


healthevidence.org


Install R locally on Linux


http://unix.stackexchange.com/questions/149451/install-r-in-my-own-directory

todo: Elastic net method

33 Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J Roy Stat Soc B 67, 301-320, (2005).
34 Zou, H. & Zhang, H. H. On the Adaptive Elastic-Net with a Diverging Number of Parameters. Ann Stat 37, 1733-1751, (2009).

https://www.r-bloggers.com/kickin-it-with-elastic-net-regression/
"Ridge regression is a really effective technique for thwarting overfitting. It does this by penalizing the L2 norm (euclidean distance) of the coefficient vector which results in “shrinking” the beta coefficients. The aggressiveness of the penalty is controlled by a parameter lambda."

"Lasso regression is a related regularization method. Instead of using the L2 norm, though, it penalizes the L1 norm (manhattan distance) of the coefficient vector."

"Elastic net regression is a hybrid approach that blends both penalization of the L2 and L1 norms."
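
To make the three quotes concrete, here is a minimal sketch of the combined penalty term. The function name and the `lam`/`alpha` parameterization are my assumptions (this follows the glmnet-style convention, where `alpha = 1` gives lasso and `alpha = 0` gives ridge); the blog post above does not give a formula. Note that ridge is usually written with the squared L2 norm, which is what the sketch uses.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Return lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2).

    beta  : list of regression coefficients
    lam   : overall penalty strength (lambda)
    alpha : mixing weight, assumed glmnet-style (1 = lasso, 0 = ridge)
    """
    l1 = sum(abs(b) for b in beta)         # Manhattan (L1) norm
    l2_squared = sum(b * b for b in beta)  # squared Euclidean (L2) norm
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


beta = [3.0, -4.0, 0.0]
lasso_only = elastic_net_penalty(beta, lam=1.0, alpha=1.0)  # L1 part only
ridge_only = elastic_net_penalty(beta, lam=1.0, alpha=0.0)  # L2 part only
blend = elastic_net_penalty(beta, lam=1.0, alpha=0.5)       # half of each
```

This is only the penalty; a fitting routine would add it to the residual sum of squares and minimize over beta.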

Barretina 2012 CCL enables predictive modeling of anticancer drug sensitivity

Barretina 2012 Nature. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity

There are 8-point dose-response data across 479 cell lines. A logistic sigmoidal function was fitted, with maximal effect A_max, the concentration at half-maximal activity of the compound (EC50), a Hill coefficient representing the sigmoidal transition, and the concentration at which an absolute inhibition of 50% is reached (IC50).

947 cell lines were profiled at the genomic and expression levels.

Amazingly, Barretina12 used the same logistic model as Qin08PONE:
All dose-response data was reduced to a fitted model using a decision tree
methodology based on the NIH/NCGC assay guidelines
(http://assay.nih.gov/assay/index.php/Table_of_Contents). Models were generated for the
duplicate data points generated for each cell line run day. In brief, dose-response data was
fitted to one of three models depending on the statistical quality of the fits measured
using a Chi-squared test. One approach was the 4 parameter sigmoid model shown
below:



Alternatively, a constant model y = Ainf was employed; or a non-parametric spline
interpolation of the data points was performed (note that this last model represents less
than 5% of models). In these models, A0 and Ainf are the top and bottom asymptotes of the
response; EC50 is the inflection point of the curve; and Hill is the Hill slope, which
describes the steepness of the curve. Other key parameters derived from the models
include the IC50, the concentration where the fitted curve crosses -50%; and Amax, which
is the maximal activity value reached within a model. For the spline interpolation model,
IC50 and EC50 parameters were both set to the concentration where the fitted model first
crosses -50%. Additionally, we calculated two forms of the Activity area for each curve,
defined as the area between the response curve and a fixed reference Aref = 0 or a variable
reference Aref = max(0, Alow) where Alow is the activity at the lowest concentration, up to
the maximum tested concentration. In practice, the Activity area was calculated as the
sum of differences between the measured Ai at concentration i and the reference level.
Thus, using the fixed reference, Activity area = 0 corresponds to an inactive compound,
and 8 corresponds to a compound which had A = -100% at all eight concentration
points. The variable reference form was introduced to adjust for curves with large
positive activities close to zero concentration, which are usually artifacts of imperfectly
corrected variations on the assay plate. For this measure, the median of all replicate
activity values was used regardless of cell line run day. To prevent confusion, the Activity
Area was calculated using Aref = 0 unless otherwise noted. 
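
The equation for the 4 parameter sigmoid did not survive in the text above, so the sketch below uses a standard four-parameter form that matches the quoted definitions (A0 and Ainf as asymptotes, EC50 as the inflection point, Hill as the slope). That exact form is my assumption, not a copy of the paper's equation. The function names, the choice to return `None` when the curve never reaches -50%, and the division by 100 in the Activity area (inferred from "8 corresponds to A = -100% at all eight points") are likewise assumptions.

```python
def sigmoid_response(x, a0, ainf, ec50, hill):
    """Assumed form: y = Ainf + (A0 - Ainf) / (1 + (x / EC50)^Hill)."""
    return ainf + (a0 - ainf) / (1.0 + (x / ec50) ** hill)


def ic50_from_sigmoid(a0, ainf, ec50, hill, level=-50.0):
    """Concentration where the assumed sigmoid crosses `level` (-50% by default).

    Returns None when the curve never reaches that level, i.e. unless
    Ainf < level < A0.
    """
    if not (ainf < level < a0):
        return None
    # Solve Ainf + (A0 - Ainf) / (1 + r) = level for r = (x / EC50)^Hill.
    ratio = (a0 - level) / (level - ainf)
    return ec50 * ratio ** (1.0 / hill)


def activity_area(activities, variable_reference=False):
    """Sum of (Aref - Ai) / 100 over the measured activities (in percent).

    activities[0] is assumed to be the activity at the lowest concentration.
    Aref is 0 (fixed) or max(0, Alow) (variable reference).
    """
    aref = max(0.0, activities[0]) if variable_reference else 0.0
    return sum((aref - a) / 100.0 for a in activities)


# A fully active compound: -100% at all eight concentrations.
full_kill_area = activity_area([-100.0] * 8)
# With A0 = 0 and Ainf = -100 the curve is symmetric, so IC50 equals EC50.
ic50 = ic50_from_sigmoid(a0=0.0, ainf=-100.0, ec50=1.0, hill=2.0)
```

With a bottom asymptote above -50% (for example Ainf = -40), `ic50_from_sigmoid` returns `None` while EC50 is still defined, which is one reason the two parameters are reported separately.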


Friday, November 4, 2016

UTC printing

To better serve you, we request that you use our online TRAC system to submit your job. If you have previously submitted a job, you should already be in our system. Most usernames are your UTC ID, and the password is the one you entered when you set up the account. Most users leave the default password, which was “password”; both the username and the password are case sensitive. If you have not yet used our online website to submit a job, you can create a new user account by visiting https://utc.ricohtrac.com, or you can reach our page through the UTC website: click on the search box and type in Graphic and Mail Services, then click on our link and you will be redirected to our website. Once on our site, click the tab named “Support”, one of the blue and gold icons on the left side of the screen. Scroll down to the middle of the page and click on “Submit a job-Go to Trac now”; this takes you to another screen that allows you to create a new user account. Once you have entered your new user info, we will receive the request for a new user account. We can then approve your request electronically, and you will receive an email confirming the approval.

If you have used our system before, you should already have your username and password to log in. If you are a first-time user or a returning customer and have questions on how to use the online job submission tool, please call us and we can walk you through the process. If you have any questions, you can call me at x4092 and we will gladly guide you through completing the job ticket. If you have files that are too large to upload, you have two options: send us the file via email at rcd061@mocs.utc.edu, or email the file to one of our associates.

We are trying to get everyone used to going through our Trac system, because it’s easy to lose an email. This way everything stays in one place. If you need any assistance, don’t hesitate to call and we can walk you through it.

ZenHub support



This is a quick note to let you know that our support team is working as quickly as possible to answer your question. We'll have a non-autogenerated answer (from a real live person!) for you within the same business day. :)

In the meantime, you can see if we've addressed your request here:

ZenHub Blog: https://www.zenhub.com/blog/
ZenHub's Public Repo on GitHub: http://github.com/zenhubio/support

Finally, get all our real-time service updates by following us on Twitter:

https://twitter.com/zenhubhq

Thursday, November 3, 2016

blackboard scroll bars, mac

The Scroll Bars have Disappeared While Using Blackboard Grade Center with a Mac. How do I fix this?

When using Blackboard Grade Center on a Mac, you may notice that the horizontal scroll bars have disappeared, preventing you from viewing the rest of your grade columns. This issue tends to occur only on Mac OS X 10.7 and above. To fix this, follow these steps:
On Mac OS X: 
  1. Open System Preferences, either from the Dock or from the Apple menu.
  2. In the System Preferences, select the General preference pane.
  3. The middle section of the General preference pane controls when scroll bars appear.
  4. Select "Always" from the Show Scroll Bars options.
Enabling the above feature will keep the scroll bar from automatically hiding. 
Search key words: scroll bars scrollbars scrolling scroll can't see all columns gradecenter grade center

Tuesday, November 1, 2016

*** github education discount



https://education.github.com/contact

https://education.github.com/discount_requests/new

Illumina iGenome FTP

Illumina Provided Genomes

Illumina provides a number of commonly used genomes at ftp.illumina.com along with a reference annotation:
Arabidopsis_thaliana
Bos_taurus
Caenorhabditis_elegans
Canis_familiaris
Drosophila_melanogaster
Equus_caballus
Escherichia_coli_K_12_DH10B
Escherichia_coli_K_12_MG1655
Gallus_gallus
Homo_sapiens
Mus_musculus
Mycobacterium_tuberculosis_H37RV
Pan_troglodytes
PhiX
Rattus_norvegicus
Saccharomyces_cerevisiae
Sus_scrofa
You can login using the following credentials:
Username: igenome
Password: G3nom3s4u
For example, download the FASTA, annotation, and bowtie index files for the human hg18 genome from the iGenomes repository with the following commands:
wget --ftp-user=igenome --ftp-password=G3nom3s4u ftp://ftp.illumina.com/Homo_sapiens/UCSC/hg18/Homo_sapiens_UCSC_hg18.tar.gz
Unpack the tar file:
tar xvzf Homo_sapiens_UCSC_hg18.tar.gz
Unpacking will make its own folder
Homo_sapiens/UCSC/hg18
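
For other genomes in the list, the archive URL can probably be built the same way. This is a small sketch with a hypothetical helper, `igenomes_url`; the path pattern is inferred from the single hg18 example above and may not hold for every species, source, or build on the server.

```python
def igenomes_url(species, source, build, host="ftp.illumina.com"):
    """Build an iGenomes archive URL.

    Pattern assumed from the hg18 example:
    ftp://<host>/<species>/<source>/<build>/<species>_<source>_<build>.tar.gz
    """
    archive = f"{species}_{source}_{build}.tar.gz"
    return f"ftp://{host}/{species}/{source}/{build}/{archive}"


# Reproduces the URL used in the wget command above.
hg18_url = igenomes_url("Homo_sapiens", "UCSC", "hg18")
```

The resulting string can be passed to the same wget command with the credentials listed above.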


CR expression and protein abundance data

Dang lab: RNAseq and protein-mass data