## Wednesday, November 30, 2016

### linear and nonlinear ODEs

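In essence, a system of linear ODEs can be written in matrix form as dX/dt = A X, where X is the vector of state variables x_i and A is a coefficient matrix that does not depend on X.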
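
As a small illustration of the matrix form, the sketch below integrates dX/dt = A X with forward Euler. The 2x2 diagonal matrix, the step size, and the function names are all assumptions chosen for illustration; a diagonal A is used only because its exact solution x_i(t) = x_i(0) exp(a_ii t) is available for comparison.

```python
import math

def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def euler_linear(A, x0, t_end, dt=1e-4):
    """Integrate dx/dt = A x from t = 0 to t_end with forward Euler."""
    x = list(x0)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        dx = matvec(A, x)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x

# Hypothetical decoupled system: x1' = -1 * x1, x2' = -2 * x2
A = [[-1.0, 0.0],
     [0.0, -2.0]]
x0 = [1.0, 1.0]

x_num = euler_linear(A, x0, t_end=1.0)
x_exact = [math.exp(-1.0), math.exp(-2.0)]
print(x_num, x_exact)
```

Forward Euler is used only to keep the sketch short; a real analysis would more likely use an adaptive solver.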

nonlinear ODEs
http://eqworld.ipmnet.ru/en/solutions/ode/ode-toc3.htm

https://en.wikipedia.org/wiki/Linear_differential_equation

## Tuesday, November 29, 2016

### *** Control systems engineering, control theory, Laplace transform, observability

A control system has an input, a process, and an output. It can be open loop or closed loop. Open loop systems do not monitor or correct the output. Closed loop systems can monitor output and make adjustments.

linear time-invariant differential equation

A transfer function is another way of mathematically modeling a system. It can be derived from the linear, time-invariant differential equation using the Laplace transform. Transfer functions can only be used for linear systems. (The Laplace transform was developed as a technique to solve differential equations.)
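
As a worked example (a generic first-order system, not one taken from these notes), take a linear time-invariant equation with hypothetical constants a and b and zero initial conditions. Applying the Laplace transform to both sides and solving for the output-to-input ratio gives the transfer function G(s):

```latex
\begin{aligned}
\frac{dy}{dt} + a\,y(t) &= b\,u(t), \qquad y(0) = 0 \\
s\,Y(s) + a\,Y(s) &= b\,U(s) \\
G(s) = \frac{Y(s)}{U(s)} &= \frac{b}{s + a}
\end{aligned}
```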

State-space representation is another model for systems and is also suitable for nonlinear systems.
Essentially, a state-space model changes an nth-order differential equation into n simultaneous first-order equations. It seems to me that the state-space model is the most commonly used ODE modeling method in systems biology.
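
A minimal sketch of that conversion, assuming the textbook second-order equation y'' + y = 0 as the example: with states x1 = y and x2 = y', it becomes the two first-order equations x1' = x2 and x2' = -x1. The RK4 integrator and all function names here are illustrative choices, not part of any particular package.

```python
import math

def rk4_step(f, x, t, dt):
    """One classical Runge-Kutta (RK4) step for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + dt / 2, [xi + dt / 2 * ki for xi, ki in zip(x, k1)])
    k3 = f(t + dt / 2, [xi + dt / 2 * ki for xi, ki in zip(x, k2)])
    k4 = f(t + dt, [xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def oscillator(t, x):
    """State-space form of y'' + y = 0 with x1 = y and x2 = y'."""
    x1, x2 = x
    return [x2, -x1]

def simulate(f, x0, t_end, n_steps):
    """Integrate from t = 0 to t_end in n_steps equal RK4 steps."""
    x, t = list(x0), 0.0
    dt = t_end / n_steps
    for _ in range(n_steps):
        x = rk4_step(f, x, t, dt)
        t += dt
    return x

# y(0) = 1, y'(0) = 0 has the exact solution y(t) = cos(t)
y_end, dy_end = simulate(oscillator, [1.0, 0.0], math.pi, 1000)
print(y_end, dy_end)
```

The same `simulate` function works unchanged for a nonlinear right-hand side, which is the practical appeal of the state-space form.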

Test signals with different waveforms can be used to study systems.

The basic analysis of a system is to evaluate the time response of a system.

A sensitivity analysis can yield the percentage of change in a specification as a function of a change in a system parameter.
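
One way to make this concrete is the relative sensitivity (dF/F)/(dP/P), estimated by finite differences. The sketch below assumes a hypothetical specification, the closed-loop gain T = K/(1 + K) of a unity-feedback loop, for which the analytic sensitivity is 1/(1 + K); the function names are made up for illustration.

```python
def relative_sensitivity(spec, p, rel_step=1e-6):
    """Estimate (dF/F)/(dP/P) at parameter value p by central differences."""
    dp = p * rel_step
    f0 = spec(p)
    df = (spec(p + dp) - spec(p - dp)) / 2.0
    return (df / f0) / (dp / p)

def closed_loop_gain(k):
    """Hypothetical unity-feedback closed-loop gain T = K / (1 + K)."""
    return k / (1.0 + k)

# At K = 9, a 1% change in K changes T by about 0.1%
s = relative_sensitivity(closed_loop_gain, 9.0)
print(s)
```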

In biology, many ODEs have nonlinear terms with products of variables. So, transfer functions cannot be applied, but the state-space method can be used.
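
To see why a product term breaks linearity, the sketch below compares a made-up linear right-hand side with a mass-action reaction A + B -> C, whose rate k*A*B is a product of variables. A linear system satisfies superposition, f(u + v) = f(u) + f(v); the mass-action system does not. The coefficients and state values are arbitrary, and the check covers a single pair of states, so it can refute linearity but not prove it.

```python
def linear_rhs(x):
    """dx/dt for a linear system: each term is a constant times one variable."""
    x1, x2 = x
    return [-0.5 * x1 + 0.1 * x2, 0.2 * x1 - 0.3 * x2]

def mass_action_rhs(x, k=1.0):
    """dx/dt for A + B -> C under mass action; the rate k*A*B is a product term."""
    a, b, c = x
    rate = k * a * b
    return [-rate, -rate, rate]

def add(u, v):
    """Elementwise sum of two state vectors."""
    return [ui + vi for ui, vi in zip(u, v)]

def obeys_superposition(f, u, v, tol=1e-12):
    """Check f(u + v) == f(u) + f(v) for one pair of states."""
    lhs = f(add(u, v))
    rhs = add(f(u), f(v))
    return all(abs(l - r) <= tol for l, r in zip(lhs, rhs))

lin_ok = obeys_superposition(linear_rhs, [1.0, 2.0], [3.0, 4.0])
nonlin_ok = obeys_superposition(mass_action_rhs, [1.0, 2.0, 0.0], [3.0, 4.0, 0.0])
print(lin_ok, nonlin_ok)
```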

Controllability and observability are well understood for continuous time-invariant linear state-space models; see https://en.wikipedia.org/wiki/State-space_representation#State_variables

Stability: a system is stable if every bounded input yields a bounded output. So, does aging change a stable gene network into an unstable network?
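
For a continuous-time linear time-invariant model dx/dt = A x + B u, a common check is whether every eigenvalue of A has a negative real part (asymptotic stability, which is sufficient for bounded-input bounded-output stability). The sketch below is limited to 2x2 matrices so that it needs only the standard library; both example matrices are hypothetical.

```python
import cmath

def eigenvalues_2x2(A):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = A
    tr = a + d
    det = a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def is_asymptotically_stable(A):
    """True if every eigenvalue of A has a negative real part."""
    return all(ev.real < 0 for ev in eigenvalues_2x2(A))

stable_A = [[-1.0, 2.0], [0.0, -3.0]]    # eigenvalues -1 and -3
unstable_A = [[0.5, 0.0], [1.0, -2.0]]   # eigenvalues 0.5 and -2

print(is_asymptotically_stable(stable_A), is_asymptotically_stable(unstable_A))
```

For a nonlinear gene network, the same test would apply only to a linearization around a steady state, which is an extra modeling assumption.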

Observability: If the initial state vector x(t0) can be found from input u(t) and output y(t) over a finite interval of time from t0, the system is observable; otherwise it is unobservable.
Observability is the ability to deduce state variables from knowledge of input u(t) and output y(t).
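
For linear time-invariant models dx/dt = A x, y = C x, observability can be tested with the Kalman rank condition: the stacked matrix [C; CA; ...; CA^(n-1)] must have rank n. The sketch below implements that test with exact rational arithmetic; the example matrices and helper names are assumptions for illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Matrix product of X and Y, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def rank(M):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in M]
    r = 0
    n_rows, n_cols = len(M), len(M[0])
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, n_rows):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == n_rows:
            break
    return r

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) for an n-state system."""
    n = len(A)
    blocks, current = [], C
    for _ in range(n):
        blocks.extend(current)
        current = matmul(current, A)
    return blocks

def is_observable(A, C):
    """Kalman rank test: observable if the stacked matrix has full rank n."""
    return rank(observability_matrix(A, C)) == len(A)

C = [[1, 0]]                      # only the first state is measured
A_coupled = [[0, 1], [-2, -3]]    # state 2 feeds into state 1
A_decoupled = [[-1, 0], [0, -2]]  # state 2 never reaches the output

print(is_observable(A_coupled, C), is_observable(A_decoupled, C))
```

In the decoupled case the second state leaves no trace in the output, so no amount of input and output data can recover it.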

RSA
project novelty,

### genome compression

https://en.wikipedia.org/wiki/Compression_of_Genomic_Re-Sequencing_Data

Number theory, data compression for NGS data

Can RSA or other methods be used for NGS sequence compression?
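
As a baseline for what sequence compression means, the sketch below packs each base into 2 bits, so four bases fit in one byte. It is only a toy under strong assumptions: the input contains just A, C, G, and T (no N calls), and quality scores and read names, which make up much of real NGS data, are ignored. The function names are invented for illustration.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def pack(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out), len(seq)

def unpack(data, n_bases):
    """Recover the first n_bases bases from 2-bit packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:n_bases])

seq = "GATTACAGATTACA"
packed, n = pack(seq)
print(len(seq), "bases ->", len(packed), "bytes")
```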

### lab meeting

1a) DE gene lists for RNAseq project
TODO: there are various time points between control and treatment. Should we use the consensus DEG list?

It seems that the "GeneID" values in the BGI report are from NCBI (for example, 57573).

So, "GeneID" is a standard NCBI gene number.

1b) Pathway analysis plan for DE gene lists
TODO: There are different sources of human gene/protein networks. We should try several for comparisons.
TODO: We should try different clustering methods, such as hclust, mcl, etc. (refer to Qin's previous paper for clustering analysis).

2) time-lapsed image analysis for yeast replicative lifespan
We can use ImageJ, MATLAB, or R.

## Saturday, November 26, 2016

$ ./rnaseq_pipeline.sh out

========= Additional R packages

#Please also run the following code to install all packages in R. This may take 10-12 hours.
install.packages(new.packages())
#A prompt will ask for a mirror site. Any site from USA should work.

### big java sites

### Wiley rep

Rep Name: MARY VANN - 0065
Contact Details: Phone: 6175041370 Email:

## Friday, November 18, 2016

### bibtex doi bug

In qin_network.bib, I added a reference with a DOI field. This field generates an error in the *.bbl file when running bibtex. I removed the DOI fields and the bug disappeared.

## Wednesday, November 16, 2016

### toread, Graph Metrics for Temporal Networks - Springer

http://www.springer.com/cda/content/document/cda_downloaddocument/9783642364600-c1.pdf?SGWID=0-0-45-1393604-p174915729

### toread: An Introduction to Temporal Graph Data Management

### toread Path Problems in Temporal Graphs

http://www.vldb.org/pvldb/vol7/p721-wu.pdf

Path Problems in Temporal Graphs
Huanhuan Wu∗, James Cheng∗, Silu Huang∗, Yiping Ke#, Yi Lu∗, Yanyan Xu∗
∗Department of Computer Science and Engineering, The Chinese University of Hong Kong
{hhwu,jcheng,slhuang,ylu,yyxu}@cse.cuhk.edu.hk
#Institute of High Performance Computing, Singapore

### safety training, UTC hazardous materials

Gasoline can be easily ignited, but diesel cannot.

Universal waste: fluorescent lamps should be recycled; computer batteries; motor batteries.

DOT hazard marking

Global harmonization container markings

NFPA rating explanation guide, NFPA 704, HMIS

423 425 HELP

## Tuesday, November 15, 2016

### integrating gene expression and network, a reference collection

Convert p-values of differential expression into Z-scores using the inverse of the Gaussian (normal) CDF. Maybe because Ideker02 is looking for 'active subnetworks', only positive Z-scores were used. No, both positive and negative Z-scores were calculated.

Ideker02 seems to combine K-means and simulated annealing for network clustering.

Tornow, S. and Mewes, H.W. (2003) Functional modules by relating protein interaction networks and gene expression. Nucleic Acids Res., 31, 6283–6289.

Segal, E. et al. (2003) Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics, 19, 264–272.

Morrison, J.L. et al. (2005) GeneRank: using search engine technology for the analysis of microarray experiments. BMC Bioinformatics, 6, 233.

Ma, X., Lee, H., Wang, L. and Sun, F. (2007) CGI: a new approach for prioritizing genes by combining gene expression and protein–protein interaction data. Bioinformatics, 23, 215–221.

Integrating gene expression and protein-protein interaction network to prioritize cancer-associated genes, Chao Wu, Jun Zhu and Xuegong Zhang
http://www.biomedcentral.com/1471-2105/13/182
http://scholar.google.com/scholar?cites=14200881095439672925&as_sdt=5,43&sciodt=0,43&hl=en

Li et al. BMC Medical Genomics 2014, 7(Suppl 2):S4. Prediction of disease-related genes based on weighted tissue-specific networks by using DNA methylation
http://www.biomedcentral.com/1755-8794/7/S2/S4

http://hongqinlab.blogspot.com/2014/12/li14-bmc-medical-genomics-predict.html
http://hongqinlab.blogspot.com/2014/12/integrating-gene-expression-data-into.html
http://hongqinlab.blogspot.com/2014/03/build-human-gene-network-reliability.html

From Ma, 2007 Bioinformatics CGI paper:
Gene expression data and protein interaction data have been integrated for gene function prediction. For example, Ideker et al. (2002) used protein interaction data and gene expression data to screen for differentially expressed subnetworks between different conditions. In Tornow and Mewes (2003) and Segal et al. (2003), gene expression data and protein interactions are used to group genes into functional modules. These methods provide insights into the regulatory modules of the whole networks at the systems biology level. However, it is not clear how to adapt their methods to identify genes contributing to the phenotype of interest.
Morrison et al. (2005) adapted the Google search engine to prioritize genes for a phenotype by integrating gene expression profiles and protein interaction data. However, the algorithm ignores the information from proteins linked to the target protein through other intermediate proteins, referred to in the rest of this paper as indirect neighbors.

Qin: Did the previous methods use human pathogenic genes? Seems not if they did not cite dbSNP or OMIM.

X. Zhou, M.-C. J. Kao, and W. H. Wong. Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci U S A, 99(20):12783–12788, Oct 2002

WGCNA: an R package for weighted correlation network analysis.

## Monday, November 14, 2016

### RNAseq demo (hisat2, stringtie) error at GBitVec: index 7 out of bounds (size 7) (osX and linux)

Data and codes are downloaded from ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/

Byte-5:sva hqin$ ps
PID TTY           TIME CMD
49035 ttys000    0:00.08 -bash
51361 ttys000    0:00.01 bash ./rnaseq_pipeline.sh out
51377 ttys000    0:00.03 bash ./rnaseq_pipeline.sh out
51378 ttys000    0:00.00 tee ./run.log
52036 ttys000    0:00.07 perl /Users/hqin/bin/hisat2 -p 8 --dta -x /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/indexes/
52044 ttys000    0:00.49 perl /Users/hqin/bin/hisat2 -p 8 --dta -x /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/indexes/
52045 ttys000    0:00.49 perl /Users/hqin/bin/hisat2 -p 8 --dta -x /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/indexes/
52046 ttys000    3:02.29 /Users/hqin/bin/hisat2-align-s --wrapper basic-0 -p 8 --dta -x /Users/hqin/demo.lgf/RNAseq.hisa
52047 ttys000    0:00.31 gzip -dc /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/samples/ERR188401_chrX_2.fastq.gz
52048 ttys000    0:00.31 gzip -dc /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/samples/ERR188401_chrX_1.fastq.gz
492 ttys001    0:00.21 -bash
51812 ttys002    0:00.02 -bash

51867 ttys002    0:11.01 tar xvfz hg38_tran.tar.gz

Byte-5:samtools hqin$ cd
Byte-5:~ hqin$ cd demo.lgf/
Byte-5:demo.lgf hqin$ cd RNAseq.hisat2/
Byte-5:RNAseq.hisat2 hqin$ ./rnaseq_pipeline.sh out
Byte-5:RNAseq.hisat2 hqin$ source /Users/hqin/.bash_profile
Byte-5:RNAseq.hisat2 hqin$ ./rnaseq_pipeline.sh out
[2016-11-14 15:07:24] #> START:  ./rnaseq_pipeline.sh out
[2016-11-14 15:07:24] Processing sample: ERR188044_chrX
[2016-11-14 15:07:24]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:07:56]    * Alignments conversion (SAMTools)
[2016-11-14 15:08:40]    * Assemble transcripts (StringTie)
[2016-11-14 15:08:51] Processing sample: ERR188104_chrX
[2016-11-14 15:08:51]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:09:37]    * Alignments conversion (SAMTools)
[2016-11-14 15:10:24]    * Assemble transcripts (StringTie)
[2016-11-14 15:10:36] Processing sample: ERR188234_chrX
[2016-11-14 15:10:36]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:11:11]    * Alignments conversion (SAMTools)
[2016-11-14 15:12:23]    * Assemble transcripts (StringTie)
[2016-11-14 15:12:45] Processing sample: ERR188245_chrX
[2016-11-14 15:12:45]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:13:45]    * Alignments conversion (SAMTools)
[2016-11-14 15:14:40]    * Assemble transcripts (StringTie)
[2016-11-14 15:14:51] Processing sample: ERR188257_chrX
[2016-11-14 15:14:51]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:15:42]    * Alignments conversion (SAMTools)
[2016-11-14 15:16:50]    * Assemble transcripts (StringTie)
[2016-11-14 15:17:04] Processing sample: ERR188273_chrX
[2016-11-14 15:17:04]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:17:50]    * Alignments conversion (SAMTools)
[2016-11-14 15:18:34]    * Assemble transcripts (StringTie)
[2016-11-14 15:18:44] Processing sample: ERR188337_chrX
[2016-11-14 15:18:44]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:21:18]    * Alignments conversion (SAMTools)
[2016-11-14 15:22:44]    * Assemble transcripts (StringTie)
[2016-11-14 15:23:09] Processing sample: ERR188383_chrX
[2016-11-14 15:23:09]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:25:31]    * Alignments conversion (SAMTools)
[2016-11-14 15:27:13]    * Assemble transcripts (StringTie)
[2016-11-14 15:27:36] Processing sample: ERR188401_chrX
[2016-11-14 15:27:36]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:31:58]    * Alignments conversion (SAMTools)
[2016-11-14 15:33:53]    * Assemble transcripts (StringTie)
[2016-11-14 15:34:12] Processing sample: ERR188428_chrX
[2016-11-14 15:34:12]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:35:46]    * Alignments conversion (SAMTools)
[2016-11-14 15:36:44]    * Assemble transcripts (StringTie)
[2016-11-14 15:36:59] Processing sample: ERR188454_chrX
[2016-11-14 15:36:59]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:39:12]    * Alignments conversion (SAMTools)
[2016-11-14 15:40:34]    * Assemble transcripts (StringTie)
[2016-11-14 15:40:50] Processing sample: ERR204916_chrX
[2016-11-14 15:40:50]    * Alignment of reads to genome (HISAT2)
[2016-11-14 15:41:50]    * Alignments conversion (SAMTools)
[2016-11-14 15:43:03]    * Assemble transcripts (StringTie)
[2016-11-14 15:43:18] #> Merge all transcripts (StringTie)
[2016-11-14 15:43:29] #> Estimate abundance for each sample (StringTie)
Error at GBitVec: index 7 out of bounds (size 7)
Byte-5:RNAseq.hisat2 hqin$

Rerun the shell script on Linux (ridgeside), same error:

Error at GBitVec: index 7 out of bounds (size 7)
./rnaseq_pipeline.sh: line 82: 7126 Segmentation fault (core dumped) $STRINGTIE -e -B -p $NUMCPUS -G ${BALLGOWNLOC}/stringtie_merged.gtf -o ${BALLGOWNLOC}/${dsample}/${dsample}.gtf ${ALIGNLOC}/${sample}.bam

Download v.1.3.1b, rerun the shell script on OS X
... ...
[2016-11-15 14:04:53] #> Merge all transcripts (StringTie)
[2016-11-15 14:04:57] #> Estimate abundance for each sample (StringTie)
Error at GBitVec: index 9 out of bounds (size 9)
./rnaseq_pipeline.sh: line 82: 2422 Abort trap: 6 $STRINGTIE -e -B -p $NUMCPUS -G ${BALLGOWNLOC}/stringtie_merged.gtf -o ${BALLGOWNLOC}/${dsample}/${dsample}.gtf ${ALIGNLOC}/${sample}.bam

Byte-5:RNAseq.hisat2 hqin$ stringtie -v
Command line was:
stringtie -v

StringTie v1.3.1b usage:

### Hisat2 demo

install hisat2
install stringtie
install samtools

Byte-5:samtools-1.3.1 hqin$ make prefix=/Users/hqin/bin/samtools install
mkdir -p -m 755 /Users/hqin/bin/samtools/bin /Users/hqin/bin/samtools/share/man/man1
install -p samtools misc/ace2sam misc/maq2sam-long misc/maq2sam-short misc/md5fa misc/md5sum-lite misc/wgsim misc/blast2sam.pl misc/bowtie2sam.pl misc/export2sam.pl misc/interpolate_sam.pl misc/novo2sam.pl misc/plot-bamstats misc/psl2sam.pl misc/sam2vcf.pl misc/samtools.pl misc/seq_cache_populate.pl misc/soap2sam.pl misc/varfilter.py misc/wgsim_eval.pl misc/zoom2sam.pl /Users/hqin/bin/samtools/bin
install -p -m 644 samtools.1 misc/wgsim.1 /Users/hqin/bin/samtools/share/man/man1

### GFF GTF genome annotation file format

## Friday, November 11, 2016

### sh make_grch38.sh

hqin@ridgeside[~/tools/hisat2/scripts]->sh make_grch38.sh
/home/hqin/tools/hisat2/hisat2-build
--2016-11-11 10:38:25--  ftp://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
=> ‘Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz’
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.203.85
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.203.85|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/release-84/fasta/homo_sapiens/dna ... done.
==> SIZE Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz ... 881214344
==> PASV ... done.    ==> RETR Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz ... done.
Length: 881214344 (840M) (unauthoritative)

Homo_sapiens.GRCh38.dna.primary_asse 100%[===================================================================>] 840.39M  22.6MB/s    in 60s

2016-11-11 10:39:32 (13.9 MB/s) - ‘Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz’ saved [881214344]

Running /home/hqin/tools/hisat2/hisat2-build -p 4 genome.fa genome
Settings:
Output files: "genome.*.ht2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Local offset rate: 3 (one in 8)
Local fTable chars: 6
Local sequence length: 57344
Local sequence overlap between two consecutive indexes: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
genome.fa
Calculating joined length
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:17
Time to read SNPs and splice sites: 00:00:00
Using parameters --bmax 552346700 --dcv 1024
Passed!  Constructing with these parameters: --bmax 552346700 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:24
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:14
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Split 1, merged 6; iterating...
Splitting and merging
Splitting and merging time: 00:00:00
Split 1, merged 0; iterating...
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 3.68231e+08 (target: 552346699)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering GFM loop
Getting block 1 of 8
Reserving size (552346700) for bucket 1
Calculating Z arrays for bucket 1
Entering block accumulator loop for bucket 1:
Getting block 2 of 8
Getting block 3 of 8
Reserving size (552346700) for bucket 3
Getting block 4 of 8
Reserving size (552346700) for bucket 4
Reserving size (552346700) for bucket 2
Calculating Z arrays for bucket 3
Calculating Z arrays for bucket 4
Calculating Z arrays for bucket 2
Entering block accumulator loop for bucket 4:
Entering block accumulator loop for bucket 3:
Entering block accumulator loop for bucket 2:
bucket 1: 10%
bucket 2: 10%
bucket 3: 10%
bucket 4: 10%
bucket 1: 20%
bucket 2: 20%
bucket 1: 30%
bucket 3: 20%
bucket 4: 20%
bucket 1: 40%
bucket 2: 30%
bucket 1: 50%
bucket 3: 30%
bucket 2: 40%
bucket 4: 30%
bucket 1: 60%
bucket 2: 50%
bucket 3: 40%
bucket 1: 70%
bucket 4: 40%
bucket 2: 60%
bucket 1: 80%
bucket 3: 50%
bucket 1: 90%
bucket 2: 70%
bucket 4: 50%
bucket 1: 100%
Sorting block of length 291744419 for bucket 1
(Using difference cover)
bucket 3: 60%
bucket 2: 80%
bucket 4: 60%
bucket 3: 70%
bucket 2: 90%
bucket 4: 70%
bucket 2: 100%
Sorting block of length 399816717 for bucket 2
(Using difference cover)
bucket 3: 80%
bucket 4: 80%
bucket 3: 90%
bucket 3: 100%
Sorting block of length 424570505 for bucket 3
(Using difference cover)
bucket 4: 90%
bucket 4: 100%
Sorting block of length 480190664 for bucket 4
(Using difference cover)
Sorting block time: 00:01:40
Returning block of 291744420 for bucket 1
Getting block 5 of 8
Reserving size (552346700) for bucket 5
Calculating Z arrays for bucket 5
Entering block accumulator loop for bucket 5:
bucket 5: 10%
bucket 5: 20%
bucket 5: 30%
Sorting block time: 00:02:23
Returning block of 399816718 for bucket 2
bucket 5: 40%
bucket 5: 50%
bucket 5: 60%
Sorting block time: 00:02:29
Returning block of 424570506 for bucket 3
bucket 5: 70%
bucket 5: 80%
Getting block 6 of 8
Reserving size (552346700) for bucket 6
Calculating Z arrays for bucket 6
Entering block accumulator loop for bucket 6:
bucket 5: 90%
bucket 6: 10%
bucket 5: 100%
Sorting block of length 398074230 for bucket 5
(Using difference cover)
bucket 6: 20%
Sorting block time: 00:02:56
Returning block of 480190665 for bucket 4
bucket 6: 30%
Getting block 7 of 8
Reserving size (552346700) for bucket 7
Calculating Z arrays for bucket 7
Entering block accumulator loop for bucket 7:
bucket 6: 40%
bucket 7: 10%
bucket 6: 50%
bucket 7: 20%
bucket 6: 60%
bucket 7: 30%
bucket 6: 70%
bucket 7: 40%
Getting block 8 of 8
Reserving size (552346700) for bucket 8
Calculating Z arrays for bucket 8
Entering block accumulator loop for bucket 8:
bucket 6: 80%
bucket 8: 10%
bucket 7: 50%
bucket 8: 20%
bucket 6: 90%
bucket 7: 60%
bucket 8: 30%
bucket 6: 100%
Sorting block of length 241117192 for bucket 6
(Using difference cover)
bucket 8: 40%
bucket 7: 70%
bucket 8: 50%
bucket 7: 80%
bucket 8: 60%
bucket 8: 70%
bucket 7: 90%
bucket 8: 80%
bucket 7: 100%
Sorting block of length 547672632 for bucket 7
(Using difference cover)
bucket 8: 90%
bucket 8: 100%
Sorting block of length 162662701 for bucket 8
(Using difference cover)
Sorting block time: 00:02:21
Returning block of 398074231 for bucket 5
Sorting block time: 00:01:25
Returning block of 241117193 for bucket 6
Sorting block time: 00:01:00
Returning block of 162662702 for bucket 8
Sorting block time: 00:03:07
Returning block of 547672633 for bucket 7
Exited GFM loop
fchr[A]: 0
fchr[C]: 869653843
fchr[G]: 1470243264
fchr[T]: 2073417374
fchr[\$]: 2945849067
Exiting GFM::buildToDisk()
Returning from initFromVector
Wrote 986172031 bytes to primary GFM file: genome.1.ht2
Wrote 736462272 bytes to secondary GFM file: genome.2.ht2
Re-opening _in1 and _in2 as input streams
Returning from GFM constructor
Returning from initFromVector
Wrote 1295322177 bytes to primary GFM file: genome.5.ht2
Wrote 749943562 bytes to secondary GFM file: genome.6.ht2
Re-opening _in5 and _in5 as input streams
Returning from HierEbwt constructor
len: 2945849067
gbwtLen: 2945849068
nodes: 2945849068
sz: 736462267
gbwtSz: 736462268
lineRate: 6
offRate: 4
ftabChars: 10
eftabLen: 0
eftabSz: 0
ftabLen: 1048577
ftabSz: 4194308
offsLen: 184115567
offsSz: 736462268
lineSz: 64
sideSz: 64
sideGbwtSz: 48
sideGbwtLen: 192
numSides: 15342964
numLines: 15342964
gbwtTotLen: 981949696
gbwtTotSz: 981949696
reverse: 0
linearFM: Yes
Total time for call to driver() for forward index: 00:18:09
genome index built; you may remove fasta files

/* Qin: genome.1.ht2 etc are saved in scripts/ directory */

### hisat2 use log

cd scripts
sh make_grch38.sh

## Thursday, November 10, 2016

### physical activity, Cochrane public health

Michel 2014: money increases attendance, but not physical activity.

Smart apps and wearable devices do not increase physical activity.

Qui 2015, goal setting can be effective.

Brown 2016: social support is likely effective.

Evidence-informed decision making

### todo: Elastic net method

Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J Roy Stat Soc B 67, 301-320 (2005).

Zou, H. & Zhang, H. H. On the Adaptive Elastic-Net with a Diverging Number of Parameters. Ann Stat 37, 1733-1751 (2009).

https://www.r-bloggers.com/kickin-it-with-elastic-net-regression/
"Ridge regression is a really effective technique for thwarting overfitting. It does this by penalizing the L2 norm (euclidean distance) of the coefficient vector which results in “shrinking” the beta coefficients. The aggressiveness of the penalty is controlled by a parameter λ."

"Lasso regression is a related regularization method. Instead of using the L2 norm, though, it penalizes the L1 norm (manhattan distance) of the coefficient vector."

"Elastic net regression is a hybrid approach that blends both penalization of the L2 and L1 norms."

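To make the three penalties concrete, here is a minimal sketch using the parameterization that the glmnet package documents, lambda * ((1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1), where alpha = 0 gives ridge and alpha = 1 gives lasso. Note that this form squares the L2 norm. The coefficient vector and the lambda value are arbitrary illustration values.

```python
def l1_norm(beta):
    """Sum of absolute coefficient values (Manhattan distance)."""
    return sum(abs(b) for b in beta)

def l2_norm_squared(beta):
    """Sum of squared coefficient values (squared Euclidean distance)."""
    return sum(b * b for b in beta)

def elastic_net_penalty(beta, lam, alpha):
    """glmnet-style penalty: lam * ((1 - alpha)/2 * ||b||_2^2 + alpha * ||b||_1)."""
    return lam * ((1.0 - alpha) / 2.0 * l2_norm_squared(beta) + alpha * l1_norm(beta))

beta = [3.0, -4.0, 0.0]
ridge = elastic_net_penalty(beta, lam=0.1, alpha=0.0)  # pure L2 penalty
lasso = elastic_net_penalty(beta, lam=0.1, alpha=1.0)  # pure L1 penalty
mixed = elastic_net_penalty(beta, lam=0.1, alpha=0.5)  # blend of both
print(ridge, lasso, mixed)
```
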
### Barretina 2012 CCLE enables predictive modeling of anticancer drug sensitivity

Barretina 2012 Nature. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity

There are 8-point dose-response curves across 479 cell lines. They are summarized with a logistic sigmoidal function with maximal effect A_max, the concentration at half-maximal activity of the compound (EC50), a Hill coefficient representing the sigmoidal transition, and the concentration at which an absolute inhibition of 50% is reached (IC50).

947 cell lines were profiled at the genomic and expression levels.

Amazingly, Barretina12 used the same logistic model as Qin08PONE:
All dose-response data was reduced to a fitted model using a decision tree
methodology based on the NIH/NCGC assay guidelines
(http://assay.nih.gov/assay/index.php/Table_of_Contents). Models were generated for the
duplicate data points generated for each cell line run day. In brief, dose-response data was
fitted to one of three models depending on the statistical quality of the fits measured
using a Chi-squared test. One approach was the 4 parameter sigmoid model shown
below:
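
The equation itself is missing from these notes. A common four-parameter sigmoid that matches the parameter definitions in the following paragraph (A0 and Ainf as the top and bottom asymptotes, EC50 as the inflection point, Hill as the slope) is sketched here as an assumption; it should be checked against the paper's supplementary methods:

```latex
y = A_{\inf} + \frac{A_0 - A_{\inf}}{1 + \left( x / EC_{50} \right)^{Hill}}
```

In this form y approaches A0 at low concentration x, approaches Ainf at high concentration, and sits halfway between them at x = EC50.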

Alternatively, a constant model y = Ainf was employed; or a non-parametric spline
interpolation of the data points was performed (note that this last model represents less
than 5% of models). In these models, A0 and Ainf are the top and bottom asymptotes of the
response; EC50 is the inflection point of the curve; and Hill is the Hill slope, which
describes the steepness of the curve. Other key parameters derived from the models
include the IC50, the concentration where the fitted curve crosses -50%; and Amax, which
is the maximal activity value reached within a model. For the spline interpolation model,
IC50 and EC50 parameters were both set to the concentration where the fitted model first
crosses -50%. Additionally, we calculated two forms of the Activity area for each curve,
defined as the area between the response curve and a fixed reference Aref = 0 or a variable
reference Aref = max(0, Alow) where Alow is the activity at the lowest concentration, up to
the maximum tested concentration. In practice, the Activity area was calculated as the
sum of differences between the measured Ai at concentration i and the reference level.
Thus, using the fixed reference, Activity area = 0 corresponds to an inactive compound,
and 8 corresponds to a compound which had A = -100% at all eight concentration
points. The variable reference form was introduced to adjust for curves with large
positive activities close to zero concentration, which are usually artifacts of imperfectly
corrected variations on the assay plate. For this measure, the median of all replicate
activity values was used regardless of cell line run day. To prevent confusion, the Activity
Area was calculated using Aref = 0 unless otherwise noted.
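
The quantities described above can be sketched in Python. The equation itself is missing from the pasted text, so the form used here, y = Ainf + (A0 - Ainf) / (1 + (x/EC50)^Hill), is the standard four-parameter sigmoid and is an assumption consistent with the parameter definitions rather than a copy of the paper's formula. The division by 100 in the Activity area is inferred from the statement that A = -100% at all eight points gives 8, and the function names are hypothetical.

```python
def sigmoid4(x, a0, ainf, ec50, hill):
    """Assumed four-parameter sigmoid: a0 at low dose, ainf at high dose."""
    return ainf + (a0 - ainf) / (1.0 + (x / ec50) ** hill)


def ic50(a0, ainf, ec50, hill, level=-50.0):
    """Concentration where the fitted curve crosses `level` (-50% by default).

    Returns None when the curve never reaches that level.
    """
    if not (min(a0, ainf) < level < max(a0, ainf)):
        return None
    # Solve level = ainf + (a0 - ainf) / (1 + (x / ec50) ** hill) for x.
    ratio = (a0 - ainf) / (level - ainf) - 1.0
    return ec50 * ratio ** (1.0 / hill)


def activity_area(activities, variable_reference=False):
    """Sum of (reference - measured activity) over the tested concentrations.

    `activities` are measured values in percent, ordered from the lowest to
    the highest concentration. The fixed reference is Aref = 0; the variable
    reference is Aref = max(0, Alow), with Alow the activity at the lowest
    concentration. Differences are not clipped (an assumption).
    """
    aref = max(0.0, activities[0]) if variable_reference else 0.0
    return sum(aref - a for a in activities) / 100.0


if __name__ == "__main__":
    # A compound with A = -100% at all eight points has Activity area 8.
    print(activity_area([-100.0] * 8))
    # When the bottom asymptote is -100%, IC50 coincides with EC50.
    print(ic50(a0=0.0, ainf=-100.0, ec50=2.0, hill=1.5))
```

Note that IC50 and EC50 differ whenever the bottom asymptote is not -100%, and IC50 is undefined for curves that never reach -50%, which is why the two parameters are reported separately.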

## Friday, November 4, 2016

### UTC printing

If you have used our system before you should already have your username and password to log in. If you are a first-time user or a returning customer and you have questions on how to use the online job submission tool, please call us and we can walk you through the process. If you have any questions, you can call me at x4092 and we will gladly guide you through completing the job ticket. If you have files that are too large to upload, you have two options: send us the file via email (rcd061@mocs.utc.edu), or email the file to one of our associates.

We are trying to get everyone used to going through our Trac system, because it’s easy to lose an email. This way everything stays in one place. If you need any assistance, don’t hesitate to call and we can walk you through it.

### ZenHub support

This is a quick note to let you know that our support team is working as quickly as possible to answer your question. We'll have a non-autogenerated answer (from a real live person!) for you within the same business day. :)

In the meantime, you can see if we've addressed your request here:

ZenHub Blog: https://www.zenhub.com/blog/
ZenHub's Public Repo on GitHub: http://github.com/zenhubio/support

Finally, get all our real-time service updates by following us on Twitter:

### The Scroll Bars have Disappeared While Using Blackboard Grade Center with a Mac. How do I fix this?

When using Blackboard Grade Center on a Mac, you may notice that the horizontal scroll bars have disappeared, preventing you from viewing the rest of your grade columns. This issue tends to occur only on Mac OS X 10.7 and above. To fix this, follow these steps on Mac OS X:

1. Open System Preferences, either from the Dock or from the Apple menu.
2. In System Preferences, select the General preference pane.
3. The middle section of the General preference pane controls when scroll bars appear.
4. Select "Always" from the Show Scroll Bars options.

Enabling the above feature will keep the scroll bar from automatically hiding.

## Tuesday, November 1, 2016

### Illumina iGenome FTP

Illumina Provided Genomes

Illumina provides a number of commonly used genomes at ftp.illumina.com along with a reference annotation:

- Arabidopsis_thaliana
- Bos_taurus
- Caenorhabditis_elegans
- Canis_familiaris
- Drosophila_melanogaster
- Equus_caballus
- Escherichia_coli_K_12_DH10B
- Escherichia_coli_K_12_MG1655
- Gallus_gallus
- Homo_sapiens
- Mus_musculus
- Mycobacterium_tuberculosis_H37RV
- Pan_troglodytes
- PhiX
- Rattus_norvegicus
- Saccharomyces_cerevisiae
- Sus_scrofa

You can log in using the following credentials: