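In essence, a system of linear ODEs can be written in matrix form as dX/dt = A X, where X is the vector of state variables x_i and A is a coefficient matrix (constant for a time-invariant system).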
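A minimal sketch of what dX/dt = A X means in practice, assuming a constant 2x2 matrix A chosen purely for illustration (it is not taken from any model in these notes). The code integrates the system with a hand-written classic Runge-Kutta (RK4) step and compares the result with the closed-form solution built from the eigenvalues of this A (-1 and -3).

```python
import math

def mat_vec(A, x):
    """Matrix-vector product A @ x for plain Python lists."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def rk4_linear(A, x0, t_end, n_steps=1000):
    """Integrate dX/dt = A X from t = 0 to t_end with the classic 4th-order Runge-Kutta method."""
    h = t_end / n_steps
    x = list(x0)
    for _ in range(n_steps):
        k1 = mat_vec(A, x)
        k2 = mat_vec(A, [xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = mat_vec(A, [xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = mat_vec(A, [xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x

# Hypothetical example: two coupled variables, each decaying and feeding the other.
A = [[-2.0, 1.0],
     [1.0, -2.0]]
x_num = rk4_linear(A, [1.0, 0.0], t_end=1.0)

# Closed form: x(0) = [1, 0] splits evenly onto the eigenvectors [1, 1] (eigenvalue -1)
# and [1, -1] (eigenvalue -3).
t = 1.0
x_exact = [0.5 * (math.exp(-t) + math.exp(-3 * t)),
           0.5 * (math.exp(-t) - math.exp(-3 * t))]
print(x_num, x_exact)
```

Nothing here is specific to 2x2: the same rk4_linear call works for any n x n matrix A and length-n initial state.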
nonlinear ODEs
http://eqworld.ipmnet.ru/en/solutions/ode/ode-toc3.htm
https://en.wikipedia.org/wiki/Linear_differential_equation
This site serves as my notebook and as a way to communicate effectively with my students and collaborators. Every now and then, a post may be of interest to other researchers or teachers. Views in this blog are my own. All rights to research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
Wednesday, November 30, 2016
Tuesday, November 29, 2016
*** Control systems engineering, control theory, Laplace transform, observability
A control system has an input, a process, and an output. It can be open loop or closed loop. Open loop systems do not monitor or correct the output. Closed loop systems can monitor output and make adjustments.
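To make the open-loop vs. closed-loop distinction concrete, here is a toy simulation. Everything in it is a hypothetical assumption: a first-order process dy/dt = -y + u + d, a setpoint r = 1, a constant disturbance d = 0.5 that the controller never measures directly, and a proportional gain K = 9. The open-loop controller picks u from the setpoint alone, so the disturbance shifts the output to r + d. The closed-loop controller also feeds back the measured error, which shrinks the steady-state offset to d/(1 + K).

```python
def simulate(controller, r=1.0, d=0.5, t_end=20.0, dt=0.001):
    """Forward-Euler integration of the process dy/dt = -y + u + d; returns y at t_end.

    r is the desired output (setpoint); d is a constant, unmeasured disturbance.
    """
    y = 0.0
    for _ in range(round(t_end / dt)):
        u = controller(r, y)
        y += dt * (-y + u + d)
    return y

def open_loop(r, y):
    # Input depends on the setpoint only; the measured output y is ignored.
    return r

def closed_loop(r, y, K=9.0):
    # Same input plus a proportional correction based on the measured error r - y.
    return r + K * (r - y)

y_open = simulate(open_loop)
y_closed = simulate(closed_loop)
print(f"open loop: {y_open:.3f}, closed loop: {y_closed:.3f}")
```

With these numbers the open loop settles near 1.5 and the closed loop near 1.05; a larger K shrinks the offset further, and adding integral action would remove a constant offset entirely.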
linear time-invariant differential equation
A transfer function is another way of mathematically modeling a system. It can be derived from a linear, time-invariant differential equation using the Laplace transform, and it can only be used for linear, time-invariant systems. (The Laplace transform is a classic technique for solving differential equations.)
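As a worked illustration of that derivation (a generic second-order equation, not a model from these notes): assuming zero initial conditions, the Laplace transform turns each derivative into a power of s, L{y^(k)(t)} = s^k Y(s), so the differential equation becomes an algebraic one and the ratio of output to input is the transfer function G(s).

```latex
\begin{aligned}
a_2\,\ddot{y}(t) + a_1\,\dot{y}(t) + a_0\,y(t) &= b_0\,u(t) \\
\left(a_2 s^2 + a_1 s + a_0\right) Y(s) &= b_0\,U(s)
  \qquad \text{(Laplace transform, zero initial conditions)} \\
G(s) = \frac{Y(s)}{U(s)} &= \frac{b_0}{a_2 s^2 + a_1 s + a_0}
\end{aligned}
```

The factorization works only because every term is linear with constant coefficients; a product term such as x(t)y(t) cannot be pulled apart this way, which is why transfer functions are limited to linear time-invariant systems.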
State-space representation is another way to model systems, and it is also suitable for nonlinear systems.
Essentially, the state-space model turns an nth-order differential equation into n simultaneous first-order equations. It seems to me that the state-space model is the most widely used ODE modeling method in systems biology.
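A minimal sketch of that conversion for an nth-order linear ODE in monic form, y^(n) + a_{n-1} y^(n-1) + ... + a_1 y' + a_0 y = u. Choosing the states x_1 = y, x_2 = y', ..., x_n = y^(n-1) gives the companion (phase-variable) form dX/dt = A X + B u. The helper name companion_state_space is my own, used only for illustration.

```python
def companion_state_space(coeffs):
    """Build (A, B) for y^(n) + a_{n-1} y^(n-1) + ... + a_1 y' + a_0 y = u.

    coeffs = [a_0, a_1, ..., a_{n-1}]; states are x_1 = y, x_2 = y', ..., x_n = y^(n-1).
    """
    n = len(coeffs)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = 1.0              # x_i' = x_{i+1}
    A[n - 1] = [-a for a in coeffs]    # x_n' = -a_0 x_1 - ... - a_{n-1} x_n + u
    B = [0.0] * (n - 1) + [1.0]        # the input enters only the last equation
    return A, B

A, B = companion_state_space([2.0, 3.0])   # y'' + 3y' + 2y = u
print(A)  # [[0.0, 1.0], [-2.0, -3.0]]
print(B)  # [0.0, 1.0]
```

One second-order equation thus becomes two first-order equations, x_1' = x_2 and x_2' = -2 x_1 - 3 x_2 + u.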
Test signals with different waveforms (e.g. impulse, step, ramp, and sinusoidal inputs) can be used to study systems.
The most basic analysis of a system is to evaluate its time response.
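One way to see what "time response" means: simulate the unit-step response of a first-order system tau*dy/dt = -y + gain*u and read a specification off the curve. This is a sketch under assumed values (tau = 2 s, gain = 1, forward-Euler integration). For this system the 10%-to-90% rise time should come out close to tau*ln 9 (about 2.2*tau), and the response reaches about 63.2% of its final value at t = tau.

```python
import math

def first_order_step_response(tau, gain=1.0, t_end=10.0, dt=1e-4):
    """Euler-simulate tau*dy/dt = -y + gain*u for a unit step u = 1; return (times, outputs)."""
    times, ys = [0.0], [0.0]
    y = 0.0
    for k in range(1, round(t_end / dt) + 1):
        y += dt * (-y + gain) / tau
        times.append(k * dt)
        ys.append(y)
    return times, ys

def first_crossing(times, ys, level):
    """Return the first time at which the response reaches `level` (None if it never does)."""
    for t, y in zip(times, ys):
        if y >= level:
            return t
    return None

tau, gain = 2.0, 1.0
times, ys = first_order_step_response(tau, gain)
rise_time = first_crossing(times, ys, 0.9 * gain) - first_crossing(times, ys, 0.1 * gain)
print(f"rise time ~ {rise_time:.3f} s (theory: tau*ln 9 = {tau * math.log(9):.3f} s)")
```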
A sensitivity analysis can yield the percentage of change in a specification as a function of a change in a system parameter.
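A small sketch of that sensitivity, S = (ΔF/F)/(ΔP/P), i.e. the fractional change in a specification F per fractional change in a parameter P, estimated here by a central finite difference. The model is an assumption for illustration: a species produced at rate k1 and degraded at rate k2*x, whose steady state x* = k1/k2 plays the role of the specification. Analytically the sensitivities are +1 to k1 and -1 to k2, so a 1% increase in degradation lowers the steady state by about 1%.

```python
def relative_sensitivity(f, params, name, rel_step=1e-6):
    """Approximate S = (dF/F) / (dP/P) for parameter `name` by a central finite difference."""
    p = params[name]
    up = dict(params, **{name: p * (1 + rel_step)})
    down = dict(params, **{name: p * (1 - rel_step)})
    dF = f(**up) - f(**down)
    # The total relative change in the parameter is 2 * rel_step.
    return (dF / f(**params)) / (2 * rel_step)

def steady_state(k1, k2):
    # Hypothetical model: dx/dt = k1 - k2 * x, so the steady state is x* = k1 / k2.
    return k1 / k2

params = {"k1": 5.0, "k2": 0.5}
for name in params:
    print(name, round(relative_sensitivity(steady_state, params, name), 4))
```

The same relative_sensitivity call works for any specification that can be computed from keyword parameters, e.g. a peak time or a steady state obtained by simulation.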
In biology, many ODEs have nonlinear terms, such as products of variables. So the transfer function cannot be applied, but the state-space method can be used.
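As an illustration of such a product term, here is a hypothetical mass-action reaction A + B -> C with rate k*[A]*[B], written in the general state-space form dx/dt = f(x) and integrated with forward Euler. No transfer function exists for it, but the state-space form is straightforward. The reaction, rate constant, and initial amounts are all assumptions; for k = 1, [A](0) = 1, [B](0) = 2 the closed form is [A](t) = e^(-t) / (2 - e^(-t)), which the script prints for comparison.

```python
import math

def mass_action_rhs(state, k=1.0):
    """f(x) for A + B -> C with mass-action rate k*[A]*[B], a product of two state variables."""
    a, b, c = state
    rate = k * a * b
    return [-rate, -rate, rate]

def euler(f, x0, t_end, dt=1e-4):
    """Forward-Euler integration of the generic nonlinear state-space model dx/dt = f(x)."""
    x = list(x0)
    for _ in range(round(t_end / dt)):
        dx = f(x)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x

a, b, c = euler(mass_action_rhs, [1.0, 2.0, 0.0], t_end=5.0)
a_exact = math.exp(-5.0) / (2.0 - math.exp(-5.0))   # closed form for k=1, a0=1, b0=2
print(a, a_exact, a + c)   # a + c stays at 1.0 because A is either free or bound in C
```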
Controllability and observability are well understood for continuous, linear time-invariant state-space models; see https://en.wikipedia.org/wiki/State-space_representation#State_variables
Stability: a system is bounded-input, bounded-output (BIBO) stable if every bounded input yields a bounded output. So, does aging change a stable gene network into an unstable one?
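For a linear time-invariant model dX/dt = A X + B u, a common check is whether every eigenvalue of A has a negative real part: if so, the system is asymptotically stable and hence also BIBO stable (the converse needs the extra assumption that no unstable mode is hidden from the input or output). The sketch below assumes a 2x2 matrix, so the eigenvalues follow from the trace and determinant; both example matrices are hypothetical. For a nonlinear gene-network model, the same test applied to the Jacobian at a steady state only speaks to local stability near that point.

```python
import cmath

def eigenvalues_2x2(A):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def is_asymptotically_stable(A):
    """True if every eigenvalue of A has a strictly negative real part."""
    return all(lam.real < 0 for lam in eigenvalues_2x2(A))

print(is_asymptotically_stable([[0, 1], [-2, -3]]))   # True  (eigenvalues -1, -2)
print(is_asymptotically_stable([[0.5, 1], [0, -1]]))  # False (eigenvalue +0.5)
```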
Observability: If the initial state vector x(t0) can be found from input u(t) and output y(t) over a finite interval of time from t0, the system is observable; otherwise it is unobservable.
Observability is the ability to deduce state variables from knowledge of input u(t) and output y(t).
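A minimal sketch of the standard rank tests for a linear time-invariant model dX/dt = A X + B u, y = C X with n states: the system is observable if the observability matrix [C; CA; ...; CA^(n-1)] has rank n, and controllable if [B, AB, ..., A^(n-1)B] has rank n (the Kalman rank conditions). Matrices are plain Python lists, the example A, B, C are hypothetical, and the rank is computed with exact fractions to avoid floating-point tolerance issues.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def rank(M):
    """Matrix rank by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) vertically."""
    blocks, CAk = [], C
    for _ in range(len(A)):
        blocks.extend(CAk)
        CAk = mat_mul(CAk, A)
    return blocks

def controllability_matrix(A, B):
    """Place B, AB, ..., A^(n-1)B side by side."""
    cols, AkB = [], B
    for _ in range(len(A)):
        cols.append(AkB)
        AkB = mat_mul(A, AkB)
    return [sum((blk[i] for blk in cols), []) for i in range(len(A))]

A = [[0, 1], [-2, -3]]
print(rank(observability_matrix(A, [[1, 0]])))      # 2 -> observable
print(rank(observability_matrix(A, [[1, 1]])))      # 1 -> not observable
print(rank(controllability_matrix(A, [[0], [1]])))  # 2 -> controllable
```

With C = [[1, 1]] the output is blind to the mode with eigenvalue -1 (eigenvector [1, -1]), which is why that choice fails the observability test.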
genome compression
https://en.wikipedia.org/wiki/Compression_of_Genomic_Re-Sequencing_Data
Number theory, data compression for NGS data
Can RSA or other methods be used for NGS sequence compression?
lab meeting
1a) DE gene lists for RNAseq project
TODO: there are various time points between control and treatment. Should we use the consensus DEG list?
It seems that the "GeneID" values in the BGI report are from NCBI. For example, 57573 is
So "GeneID" is a standard NCBI Gene ID.
1b) Pathway analysis plan for DE gene lists
TODO: There are different sources of human gene/protein networks. We should try several for comparisons.
TODO: We should try different clustering methods, such as hclust, mcl, etc. (refer to Qin's previous paper for clustering analysis).
2) time-lapse image analysis for yeast replicative lifespan
We can use ImageJ, MATLAB or R.
Monday, November 28, 2016
Sunday, November 27, 2016
Saturday, November 26, 2016
simcenter qinlab tools
Running "module load qinlab" adds these tools to $PATH.
hqin@ridgeside[~/demo.lgf/RNAseq.hisat2]->ls /usr/local/qinlab/
bin samtools-1.3.1.tar.bz2
hisat2 share
hisat2-2.0.5 stringtie
hisat2-2.0.5-Linux_x86_64.zip stringtie-1.3.1c.Linux_x86_64
samtools-1.3.1 stringtie-1.3.1c.Linux_x86_64.tar.gz
Monday, November 21, 2016
SimCenter mailing address
University of Tennessee at Chattanooga
701 E. ML King Blvd
Chattanooga, TN 37403
RNAseq software installation on qbert or Simcenter clusters
====================For hisat2 and supporting programs
Install hisat2
ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/downloads/hisat2-2.0.5-Linux_x86_64.zip
Install stringtie 1.3.1c
http://ccb.jhu.edu/software/stringtie/dl/stringtie-1.3.1c.Linux_x86_64.tar.gz
Install samtools
https://github.com/samtools/samtools/releases/download/1.3.1/samtools-1.3.1.tar.bz2
The above link is from http://www.htslib.org/download/
See also https://github.com/samtools/samtools/releases/
====================For R packages
Under shell, run R
Inside of R:
source("https://bioconductor.org/biocLite.R")
biocLite('ballgown')
install.packages('devtools') #A USA mirror site may be chosen
library(devtools)
devtools::install_github('alyssafrazee/RSkittleBrewer')
========== Testing the installation
Download the test files and codes from
ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/
under shell
$ ./rnaseq_pipeline.config.sh
$ ./rnaseq_pipeline.sh out
=========Additional R packages
#Please also run the following code to install all packages in R. This may take 10-12 hours.
install.packages(new.packages())
#A prompt will ask for a mirror site. Any site from USA should work.
Friday, November 18, 2016
bibtex doi bug
In qin_network.bib, I added a reference with a DOI field. This field generated an error in the *.bbl file when running bibtex. I removed the DOI fields and the bug disappeared.
Wednesday, November 16, 2016
toread, Graph Metrics for Temporal Networks - Springer
http://www.springer.com/cda/content/document/cda_downloaddocument/9783642364600-c1.pdf?SGWID=0-0-45-1393604-p174915729
toread Path Problems in Temporal Graphs
http://www.vldb.org/pvldb/vol7/p721-wu.pdf
Path Problems in Temporal Graphs
Huanhuan Wu∗, James Cheng∗ , Silu Huang∗, Yiping Ke#, Yi Lu∗, Yanyan Xu∗ ∗Department of Computer Science and Engineering, The Chinese University of Hong Kong {hhwu,jcheng,slhuang,ylu,yyxu}@cse.cuhk.edu.hk #Institute of High Performance Computing, Singapore
safety training, UTC
hazardous materials
Gasoline ignites easily, but diesel does not.
Universal waste:
fluorescent lamps (should be recycled)
computer batteries
motor batteries
DOT hazard markings
Globally Harmonized System (GHS) container markings
NFPA rating explanation guide, NFPA 704, HMIS
423-425-HELP
Tuesday, November 15, 2016
integrating gene expression and network, a reference collection
Convert differential-expression p-values into Z-scores using the inverse Gaussian (standard normal) CDF.
Maybe because Ideker02 is looking for 'active subnetworks', only positive Z-scores were used. No, both positive and negative Z-scores were calculated.
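A minimal sketch of the p-value to Z-score conversion, z = Phi^-1(1 - p), using Python's standard-library NormalDist. The gene names and p-values are made up. The aggregation step, summing the k member Z-scores and dividing by sqrt(k), is how I understand Ideker02 scores a candidate subnetwork; treat it as an assumption rather than a faithful reimplementation of their method. Note that p > 0.5 gives a negative Z-score, consistent with both signs being calculated.

```python
import math
from statistics import NormalDist

def p_to_z(p):
    """Z-score for a p-value via the inverse standard-normal CDF: z = Phi^-1(1 - p).

    For extremely small p (below about 1e-16), 1 - p rounds to 1.0 and inv_cdf raises;
    the mathematically equivalent -NormalDist().inv_cdf(p) avoids that.
    """
    return NormalDist().inv_cdf(1.0 - p)

def aggregate_z(z_scores):
    """Assumed subnetwork score: sum of the k member Z-scores divided by sqrt(k)."""
    z_scores = list(z_scores)
    return sum(z_scores) / math.sqrt(len(z_scores))

# Hypothetical p-values for four genes.
p_values = {"geneA": 0.001, "geneB": 0.05, "geneC": 0.5, "geneD": 0.9}
z = {gene: p_to_z(p) for gene, p in p_values.items()}
for gene, score in z.items():
    print(gene, round(score, 3))
print("subnetwork score:", round(aggregate_z(z.values()), 3))
```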
Ideker02 seems to combine K-means and simulated annealing for network clustering.
Tornow,S. and Mewes,H.W. (2003) Functional modules by relating protein interaction networks and gene expression. Nucleic Acids Res., 31, 6283–6289.
Segal,E. et al. (2003) Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics, 19, 264–272.
Morrison,J.L. et al. (2005) GeneRank: using search engine technology for the analysis of microarray experiments. BMC Bioinformatics, 6, 233.
Ma,X., Lee,H., Wang,L. and Sun,F. (2007) CGI: a new approach for prioritizing genes by combining gene expression and protein–protein interaction data. Bioinformatics, 23, 215–221.
Integrating gene expression and protein-protein interaction network to prioritize cancer-associated genes, Chao Wu, Jun Zhu and Xuegong Zhang
http://www.biomedcentral.com/1471-2105/13/182
http://scholar.google.com/scholar?cites=14200881095439672925&as_sdt=5,43&sciodt=0,43&hl=en
Li et al. BMC Medical Genomics 2014, 7(Suppl 2):S4 Prediction of disease-related genes based on weighted tissue-specific networks by using DNA methylation
http://www.biomedcentral.com/1755-8794/7/S2/S4
http://hongqinlab.blogspot.com/2014/12/li14-bmc-medical-genomics-predict.html
http://hongqinlab.blogspot.com/2014/12/integrating-gene-expression-data-into.html
http://hongqinlab.blogspot.com/2014/03/build-human-gene-network-reliability.html
From Ma, 2007 Bioinformatics CGI paper:
Gene expression data and protein interaction data have been integrated for gene function prediction. For example, Ideker et al. (2002) used protein interaction data and gene expression data to screen for differentially expressed subnetworks between different conditions. In Tornow and Mewes (2003) and Segal et al. (2003), gene expression data and protein interactions are used to group genes into functional modules. These methods provide insights into the regulatory modules of the whole networks at the systems biology level. However, it is not clear how to adapt their methods to identify genes contributing to the phenotype of interest. Morrison et al. (2005) adapted the Google search engine to prioritize genes for a phenotype by integrating gene expression profiles and protein interaction data. However, the algorithm ignores the information from proteins linked to the target protein through other intermediate proteins, referred to in the rest of this paper as indirect neighbors.
Qin: Did the previous methods use known human disease genes? It seems not, since they did not cite dbSNP or OMIM.
X. Zhou, M.-C. J. Kao, and W. H. Wong. Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci U S A, 99(20):12783–12788, Oct 2002
WGCNA: an R package for weighted correlation network analysis.
Monday, November 14, 2016
RNAseq demo (hisat2, stringtie) error at GBitVec: index 7 out of bounds (size 7) (OS X and Linux)
Data and codes are downloaded from ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/
Byte-5:sva hqin$ ps
PID TTY TIME CMD
49035 ttys000 0:00.08 -bash
51361 ttys000 0:00.01 bash ./rnaseq_pipeline.sh out
51377 ttys000 0:00.03 bash ./rnaseq_pipeline.sh out
51378 ttys000 0:00.00 tee ./run.log
52036 ttys000 0:00.07 perl /Users/hqin/bin/hisat2 -p 8 --dta -x /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/indexes/
52044 ttys000 0:00.49 perl /Users/hqin/bin/hisat2 -p 8 --dta -x /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/indexes/
52045 ttys000 0:00.49 perl /Users/hqin/bin/hisat2 -p 8 --dta -x /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/indexes/
52046 ttys000 3:02.29 /Users/hqin/bin/hisat2-align-s --wrapper basic-0 -p 8 --dta -x /Users/hqin/demo.lgf/RNAseq.hisa
52047 ttys000 0:00.31 gzip -dc /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/samples/ERR188401_chrX_2.fastq.gz
52048 ttys000 0:00.31 gzip -dc /Users/hqin/demo.lgf/RNAseq.hisat2/chrX_data/samples/ERR188401_chrX_1.fastq.gz
492 ttys001 0:00.21 -bash
51812 ttys002 0:00.02 -bash
51867 ttys002 0:11.01 tar xvfz hg38_tran.tar.gz
Byte-5:samtools hqin$ cd
Byte-5:~ hqin$ cd demo.lgf/
Byte-5:demo.lgf hqin$ cd RNAseq.hisat2/
Byte-5:RNAseq.hisat2 hqin$ ./rnaseq_pipeline.sh out
ERROR: samtools program not found, please edit the configuration script.
Byte-5:RNAseq.hisat2 hqin$ source /Users/hqin/.bash_profile
Byte-5:RNAseq.hisat2 hqin$ ./rnaseq_pipeline.sh out
[2016-11-14 15:07:24] #> START: ./rnaseq_pipeline.sh out
[2016-11-14 15:07:24] Processing sample: ERR188044_chrX
[2016-11-14 15:07:24] * Alignment of reads to genome (HISAT2)
[2016-11-14 15:07:56] * Alignments conversion (SAMTools)
[2016-11-14 15:08:40] * Assemble transcripts (StringTie)
[2016-11-14 15:08:51] Processing sample: ERR188104_chrX
[2016-11-14 15:08:51] * Alignment of reads to genome (HISAT2)
[2016-11-14 15:09:37] * Alignments conversion (SAMTools)
[2016-11-14 15:10:24] * Assemble transcripts (StringTie)
[2016-11-14 15:10:36] Processing sample: ERR188234_chrX
[2016-11-14 15:10:36] * Alignment of reads to genome (HISAT2)
[2016-11-14 15:11:11] * Alignments conversion (SAMTools)
[2016-11-14 15:12:23] * Assemble transcripts (StringTie)
[2016-11-14 15:12:45] Processing sample: ERR188245_chrX
[2016-11-14 15:12:45] * Alignment of reads to genome (HISAT2)
[2016-11-14 15:13:45] * Alignments conversion (SAMTools)
[2016-11-14 15:14:40] * Assemble transcripts (StringTie)
[2016-11-14 15:14:51] Processing sample: ERR188257_chrX
[2016-11-14 15:14:51] * Alignment of reads to genome (HISAT2)
[2016-11-14 15:15:42] * Alignments conversion (SAMTools)
[2016-11-14 15:16:50] * Assemble transcripts (StringTie)
[2016-11-14 15:17:04] Processing sample: ERR188273_chrX
[2016-11-14 15:17:04] * Alignment of reads to genome (HISAT2)
[2016-11-14 15:17:50] * Alignments conversion (SAMTools)
[2016-11-14 15:18:34] * Assemble transcripts (StringTie)
[2016-11-14 15:18:44] Processing sample: ERR188337_chrX
[2016-11-14 15:18:44] * Alignment of reads to genome (HISAT2)
[2016-11-14 15:21:18] * Alignments conversion (SAMTools)
[2016-11-14 15:22:44] * Assemble transcripts (StringTie)
[2016-11-14 15:23:09] Processing sample: ERR188383_chrX
[2016-11-14 15:23:09] * Alignment of reads to genome (HISAT2)
[2016-11-14 15:25:31] * Alignments conversion (SAMTools)
[2016-11-14 15:27:13] * Assemble transcripts (StringTie)
[2016-11-14 15:27:36] Processing sample: ERR188401_chrX
[2016-11-14 15:27:36] * Alignment of reads to genome (HISAT2)
[2016-11-14 15:31:58] * Alignments conversion (SAMTools)
[2016-11-14 15:33:53] * Assemble transcripts (StringTie)
[2016-11-14 15:34:12] Processing sample: ERR188428_chrX
[2016-11-14 15:34:12] * Alignment of reads to genome (HISAT2)
[2016-11-14 15:35:46] * Alignments conversion (SAMTools)
[2016-11-14 15:36:44] * Assemble transcripts (StringTie)
[2016-11-14 15:36:59] Processing sample: ERR188454_chrX
[2016-11-14 15:36:59] * Alignment of reads to genome (HISAT2)
[2016-11-14 15:39:12] * Alignments conversion (SAMTools)
[2016-11-14 15:40:34] * Assemble transcripts (StringTie)
[2016-11-14 15:40:50] Processing sample: ERR204916_chrX
[2016-11-14 15:40:50] * Alignment of reads to genome (HISAT2)
[2016-11-14 15:41:50] * Alignments conversion (SAMTools)
[2016-11-14 15:43:03] * Assemble transcripts (StringTie)
[2016-11-14 15:43:18] #> Merge all transcripts (StringTie)
[2016-11-14 15:43:29] #> Estimate abundance for each sample (StringTie)
Error at GBitVec: index 7 out of bounds (size 7)
Byte-5:RNAseq.hisat2 hqin$
Reran the shell script on Linux (ridgeside); same error: Error at GBitVec: index 7 out of bounds (size 7)
./rnaseq_pipeline.sh: line 82: 7126 Segmentation fault (core dumped) $STRINGTIE -e -B -p $NUMCPUS -G ${BALLGOWNLOC}/stringtie_merged.gtf -o ${BALLGOWNLOC}/${dsample}/${dsample}.gtf ${ALIGNLOC}/${sample}.bam
Downloaded StringTie v1.3.1b and reran the shell script on OS X:
... ...
[2016-11-15 14:04:53] #> Merge all transcripts (StringTie)
[2016-11-15 14:04:57] #> Estimate abundance for each sample (StringTie)
Error at GBitVec: index 9 out of bounds (size 9)
./rnaseq_pipeline.sh: line 82: 2422 Abort trap: 6 $STRINGTIE -e -B -p $NUMCPUS -G ${BALLGOWNLOC}/stringtie_merged.gtf -o ${BALLGOWNLOC}/${dsample}/${dsample}.gtf ${ALIGNLOC}/${sample}.bam
Byte-5:RNAseq.hisat2 hqin$ stringtie -v
Command line was:
stringtie -v
StringTie v1.3.1b usage:
Hisat2 demo
install hisat2
install stringtie
install samtools
#make sure all programs are in $PATH
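A small helper for this check, sketched in Python (the function names report and missing_tools are my own; they are not part of the RNAseq protocol scripts). It uses shutil.which, which returns the full path of an executable found on $PATH or None, to list where each program resolves and which ones are missing.

```python
import shutil

def missing_tools(tools):
    """Return the tools that cannot be found on $PATH (shutil.which gives None for those)."""
    return [tool for tool in tools if shutil.which(tool) is None]

def report(tools):
    """Print where each tool resolves on $PATH and return the list of missing ones."""
    for tool in tools:
        print(f"{tool:10s} -> {shutil.which(tool) or 'NOT FOUND'}")
    return missing_tools(tools)
```

For example, call report(["hisat2", "stringtie", "samtools"]) before running ./rnaseq_pipeline.sh. In the run logged above, the pipeline stopped with "ERROR: samtools program not found" until .bash_profile was sourced; a check like this flags that up front.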
Packages installation in R on ridgeside
Under shell, run R
Inside of R:
source("https://bioconductor.org/biocLite.R")
biocLite('ballgown')
#Please also run the following code to install all packages in R. This may take 10-12 hours.
install.packages(new.packages())
#A prompt will ask for a mirror site. Any site from USA should work.
R package installation on ridgeside (failed)
new.packages(repos="http://cran.us.r-projects.org")
install.packages( new.packages(repos="http://cran.us.r-projects.org") ) # failed
install.packages( new.packages(), lib.loc="/home/hqin/R/x86_64-pc-linux-gnu-library/3.3") # still failed
biocLite( "ballgown", lib.loc="/home/hqin/R/x86_64-pc-linux-gnu-library/3.3") # failed
source("https://bioconductor.org/biocLite.R")
biocLite()
install.packages("XML", lib.loc="/home/hqin/R/x86_64-pc-linux-gnu-library/3.3") # failed
# Possible causes (not verified): the US CRAN mirror is http://cran.us.r-project.org (no "s"),
# and install.packages() takes the target library as lib=, not lib.loc=.
Install samtools locally on Linux (ridgeside), osX (byte)
Install htslib locally to /home/lib/
Install bcftools.
Edit .cshrc
Biostar suggestion works:
Download samtools-1.3.1 from www.htslib.org/download
https://www.biostars.org/p/173832/
make prefix=/home/hqin/bin
make prefix=/home/hqin/bin install
Official Samtools installation guide (Qin could not follow this one due to user permission limits on ridgeside)
https://github.com/samtools/samtools/blob/develop/README.md
==========
osX, byte install
http://www.htslib.org/download/
Download samtools-1.3.1
cd ~/Downloads/samtools-1.3.1
Byte-5:samtools-1.3.1 hqin$ make prefix=/Users/hqin/bin/samtools
Byte-5:samtools-1.3.1 hqin$ make prefix=/Users/hqin/bin/samtools install
mkdir -p -m 755 /Users/hqin/bin/samtools/bin /Users/hqin/bin/samtools/share/man/man1
install -p samtools misc/ace2sam misc/maq2sam-long misc/maq2sam-short misc/md5fa misc/md5sum-lite misc/wgsim misc/blast2sam.pl misc/bowtie2sam.pl misc/export2sam.pl misc/interpolate_sam.pl misc/novo2sam.pl misc/plot-bamstats misc/psl2sam.pl misc/sam2vcf.pl misc/samtools.pl misc/seq_cache_populate.pl misc/soap2sam.pl misc/varfilter.py misc/wgsim_eval.pl misc/zoom2sam.pl /Users/hqin/bin/samtools/bin
install -p -m 644 samtools.1 misc/wgsim.1 /Users/hqin/bin/samtools/share/man/man1
Friday, November 11, 2016
sh make_grch38.sh
hqin@ridgeside[~/tools/hisat2/scripts]->sh make_grch38.sh
/home/hqin/tools/hisat2/hisat2-build
--2016-11-11 10:38:25-- ftp://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
=> ‘Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz’
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.203.85
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.203.85|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/release-84/fasta/homo_sapiens/dna ... done.
==> SIZE Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz ... 881214344
==> PASV ... done. ==> RETR Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz ... done.
Length: 881214344 (840M) (unauthoritative)
Homo_sapiens.GRCh38.dna.primary_asse 100%[===================================================================>] 840.39M 22.6MB/s in 60s
2016-11-11 10:39:32 (13.9 MB/s) - ‘Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz’ saved [881214344]
Running /home/hqin/tools/hisat2/hisat2-build -p 4 genome.fa genome
Settings:
Output files: "genome.*.ht2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Local offset rate: 3 (one in 8)
Local fTable chars: 6
Local sequence length: 57344
Local sequence overlap between two consecutive indexes: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
genome.fa
Reading reference sizes
Time reading reference sizes: 00:00:41
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:17
Time to read SNPs and splice sites: 00:00:00
Using parameters --bmax 552346700 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 552346700 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:24
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:14
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:29
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Split 1, merged 6; iterating...
Splitting and merging
Splitting and merging time: 00:00:00
Split 1, merged 0; iterating...
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 3.68231e+08 (target: 552346699)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering GFM loop
Getting block 1 of 8
Reserving size (552346700) for bucket 1
Calculating Z arrays for bucket 1
Entering block accumulator loop for bucket 1:
Getting block 2 of 8
Getting block 3 of 8
Reserving size (552346700) for bucket 3
Getting block 4 of 8
Reserving size (552346700) for bucket 4
Reserving size (552346700) for bucket 2
Calculating Z arrays for bucket 3
Calculating Z arrays for bucket 4
Calculating Z arrays for bucket 2
Entering block accumulator loop for bucket 4:
Entering block accumulator loop for bucket 3:
Entering block accumulator loop for bucket 2:
bucket 1: 10%
bucket 2: 10%
bucket 3: 10%
bucket 4: 10%
bucket 1: 20%
bucket 2: 20%
bucket 1: 30%
bucket 3: 20%
bucket 4: 20%
bucket 1: 40%
bucket 2: 30%
bucket 1: 50%
bucket 3: 30%
bucket 2: 40%
bucket 4: 30%
bucket 1: 60%
bucket 2: 50%
bucket 3: 40%
bucket 1: 70%
bucket 4: 40%
bucket 2: 60%
bucket 1: 80%
bucket 3: 50%
bucket 1: 90%
bucket 2: 70%
bucket 4: 50%
bucket 1: 100%
Sorting block of length 291744419 for bucket 1
(Using difference cover)
bucket 3: 60%
bucket 2: 80%
bucket 4: 60%
bucket 3: 70%
bucket 2: 90%
bucket 4: 70%
bucket 2: 100%
Sorting block of length 399816717 for bucket 2
(Using difference cover)
bucket 3: 80%
bucket 4: 80%
bucket 3: 90%
bucket 3: 100%
Sorting block of length 424570505 for bucket 3
(Using difference cover)
bucket 4: 90%
bucket 4: 100%
Sorting block of length 480190664 for bucket 4
(Using difference cover)
Sorting block time: 00:01:40
Returning block of 291744420 for bucket 1
Getting block 5 of 8
Reserving size (552346700) for bucket 5
Calculating Z arrays for bucket 5
Entering block accumulator loop for bucket 5:
bucket 5: 10%
bucket 5: 20%
bucket 5: 30%
Sorting block time: 00:02:23
Returning block of 399816718 for bucket 2
bucket 5: 40%
bucket 5: 50%
bucket 5: 60%
Sorting block time: 00:02:29
Returning block of 424570506 for bucket 3
bucket 5: 70%
bucket 5: 80%
Getting block 6 of 8
Reserving size (552346700) for bucket 6
Calculating Z arrays for bucket 6
Entering block accumulator loop for bucket 6:
bucket 5: 90%
bucket 6: 10%
bucket 5: 100%
Sorting block of length 398074230 for bucket 5
(Using difference cover)
bucket 6: 20%
Sorting block time: 00:02:56
Returning block of 480190665 for bucket 4
bucket 6: 30%
Getting block 7 of 8
Reserving size (552346700) for bucket 7
Calculating Z arrays for bucket 7
Entering block accumulator loop for bucket 7:
bucket 6: 40%
bucket 7: 10%
bucket 6: 50%
bucket 7: 20%
bucket 6: 60%
bucket 7: 30%
bucket 6: 70%
bucket 7: 40%
Getting block 8 of 8
Reserving size (552346700) for bucket 8
Calculating Z arrays for bucket 8
Entering block accumulator loop for bucket 8:
bucket 6: 80%
bucket 8: 10%
bucket 7: 50%
bucket 8: 20%
bucket 6: 90%
bucket 7: 60%
bucket 8: 30%
bucket 6: 100%
Sorting block of length 241117192 for bucket 6
(Using difference cover)
bucket 8: 40%
bucket 7: 70%
bucket 8: 50%
bucket 7: 80%
bucket 8: 60%
bucket 8: 70%
bucket 7: 90%
bucket 8: 80%
bucket 7: 100%
Sorting block of length 547672632 for bucket 7
(Using difference cover)
bucket 8: 90%
bucket 8: 100%
Sorting block of length 162662701 for bucket 8
(Using difference cover)
Sorting block time: 00:02:21
Returning block of 398074231 for bucket 5
Sorting block time: 00:01:25
Returning block of 241117193 for bucket 6
Sorting block time: 00:01:00
Returning block of 162662702 for bucket 8
Sorting block time: 00:03:07
Returning block of 547672633 for bucket 7
Exited GFM loop
fchr[A]: 0
fchr[C]: 869653843
fchr[G]: 1470243264
fchr[T]: 2073417374
fchr[$]: 2945849067
Exiting GFM::buildToDisk()
Returning from initFromVector
Wrote 986172031 bytes to primary GFM file: genome.1.ht2
Wrote 736462272 bytes to secondary GFM file: genome.2.ht2
Re-opening _in1 and _in2 as input streams
Returning from GFM constructor
Returning from initFromVector
Wrote 1295322177 bytes to primary GFM file: genome.5.ht2
Wrote 749943562 bytes to secondary GFM file: genome.6.ht2
Re-opening _in5 and _in5 as input streams
Returning from HierEbwt constructor
Headers:
len: 2945849067
gbwtLen: 2945849068
nodes: 2945849068
sz: 736462267
gbwtSz: 736462268
lineRate: 6
offRate: 4
offMask: 0xfffffff0
ftabChars: 10
eftabLen: 0
eftabSz: 0
ftabLen: 1048577
ftabSz: 4194308
offsLen: 184115567
offsSz: 736462268
lineSz: 64
sideSz: 64
sideGbwtSz: 48
sideGbwtLen: 192
numSides: 15342964
numLines: 15342964
gbwtTotLen: 981949696
gbwtTotSz: 981949696
reverse: 0
linearFM: Yes
Total time for call to driver() for forward index: 00:18:09
genome index built; you may remove fasta files
/* Qin: genome.1.ht2 etc. are saved in the scripts/ directory */
sh make_grch38.sh
hqin@ridgeside[~/tools/hisat2/scripts]->sh make_grch38.sh
/home/hqin/tools/hisat2/hisat2-build
--2016-11-11 10:38:25-- ftp://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
=> ‘Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz’
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.203.85
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.203.85|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/release-84/fasta/homo_sapiens/dna ... done.
==> SIZE Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz ... 881214344
==> PASV ... done. ==> RETR Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz ... done.
Length: 881214344 (840M) (unauthoritative)
Homo_sapiens.GRCh38.dna.primary_asse 100%[===================================================================>] 840.39M 22.6MB/s in 60s
2016-11-11 10:39:32 (13.9 MB/s) - ‘Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz’ saved [881214344]
Running /home/hqin/tools/hisat2/hisat2-build -p 4 genome.fa genome
Settings:
Output files: "genome.*.ht2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Local offset rate: 3 (one in 8)
Local fTable chars: 6
Local sequence length: 57344
Local sequence overlap between two consecutive indexes: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
genome.fa
Reading reference sizes
Time reading reference sizes: 00:00:41
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:17
Time to read SNPs and splice sites: 00:00:00
Using parameters --bmax 552346700 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 552346700 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:24
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:14
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:29
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Split 1, merged 6; iterating...
Splitting and merging
Splitting and merging time: 00:00:00
Split 1, merged 0; iterating...
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 3.68231e+08 (target: 552346699)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering GFM loop
Getting block 1 of 8
Reserving size (552346700) for bucket 1
Calculating Z arrays for bucket 1
Entering block accumulator loop for bucket 1:
Getting block 2 of 8
Getting block 3 of 8
Reserving size (552346700) for bucket 3
Getting block 4 of 8
Reserving size (552346700) for bucket 4
Reserving size (552346700) for bucket 2
Calculating Z arrays for bucket 3
Calculating Z arrays for bucket 4
Calculating Z arrays for bucket 2
Entering block accumulator loop for bucket 4:
Entering block accumulator loop for bucket 3:
Entering block accumulator loop for bucket 2:
bucket 1: 10%
bucket 2: 10%
bucket 3: 10%
bucket 4: 10%
bucket 1: 20%
bucket 2: 20%
bucket 1: 30%
bucket 3: 20%
bucket 4: 20%
bucket 1: 40%
bucket 2: 30%
bucket 1: 50%
bucket 3: 30%
bucket 2: 40%
bucket 4: 30%
bucket 1: 60%
bucket 2: 50%
bucket 3: 40%
bucket 1: 70%
bucket 4: 40%
bucket 2: 60%
bucket 1: 80%
bucket 3: 50%
bucket 1: 90%
bucket 2: 70%
bucket 4: 50%
bucket 1: 100%
Sorting block of length 291744419 for bucket 1
(Using difference cover)
bucket 3: 60%
bucket 2: 80%
bucket 4: 60%
bucket 3: 70%
bucket 2: 90%
bucket 4: 70%
bucket 2: 100%
Sorting block of length 399816717 for bucket 2
(Using difference cover)
bucket 3: 80%
bucket 4: 80%
bucket 3: 90%
bucket 3: 100%
Sorting block of length 424570505 for bucket 3
(Using difference cover)
bucket 4: 90%
bucket 4: 100%
Sorting block of length 480190664 for bucket 4
(Using difference cover)
Sorting block time: 00:01:40
Returning block of 291744420 for bucket 1
Getting block 5 of 8
Reserving size (552346700) for bucket 5
Calculating Z arrays for bucket 5
Entering block accumulator loop for bucket 5:
bucket 5: 10%
bucket 5: 20%
bucket 5: 30%
Sorting block time: 00:02:23
Returning block of 399816718 for bucket 2
bucket 5: 40%
bucket 5: 50%
bucket 5: 60%
Sorting block time: 00:02:29
Returning block of 424570506 for bucket 3
bucket 5: 70%
bucket 5: 80%
Getting block 6 of 8
Reserving size (552346700) for bucket 6
Calculating Z arrays for bucket 6
Entering block accumulator loop for bucket 6:
bucket 5: 90%
bucket 6: 10%
bucket 5: 100%
Sorting block of length 398074230 for bucket 5
(Using difference cover)
bucket 6: 20%
Sorting block time: 00:02:56
Returning block of 480190665 for bucket 4
bucket 6: 30%
Getting block 7 of 8
Reserving size (552346700) for bucket 7
Calculating Z arrays for bucket 7
Entering block accumulator loop for bucket 7:
bucket 6: 40%
bucket 7: 10%
bucket 6: 50%
bucket 7: 20%
bucket 6: 60%
bucket 7: 30%
bucket 6: 70%
bucket 7: 40%
Getting block 8 of 8
Reserving size (552346700) for bucket 8
Calculating Z arrays for bucket 8
Entering block accumulator loop for bucket 8:
bucket 6: 80%
bucket 8: 10%
bucket 7: 50%
bucket 8: 20%
bucket 6: 90%
bucket 7: 60%
bucket 8: 30%
bucket 6: 100%
Sorting block of length 241117192 for bucket 6
(Using difference cover)
bucket 8: 40%
bucket 7: 70%
bucket 8: 50%
bucket 7: 80%
bucket 8: 60%
bucket 8: 70%
bucket 7: 90%
bucket 8: 80%
bucket 7: 100%
Sorting block of length 547672632 for bucket 7
(Using difference cover)
bucket 8: 90%
bucket 8: 100%
Sorting block of length 162662701 for bucket 8
(Using difference cover)
Sorting block time: 00:02:21
Returning block of 398074231 for bucket 5
Sorting block time: 00:01:25
Returning block of 241117193 for bucket 6
Sorting block time: 00:01:00
Returning block of 162662702 for bucket 8
Sorting block time: 00:03:07
Returning block of 547672633 for bucket 7
Exited GFM loop
fchr[A]: 0
fchr[C]: 869653843
fchr[G]: 1470243264
fchr[T]: 2073417374
fchr[$]: 2945849067
Exiting GFM::buildToDisk()
Returning from initFromVector
Wrote 986172031 bytes to primary GFM file: genome.1.ht2
Wrote 736462272 bytes to secondary GFM file: genome.2.ht2
Re-opening _in1 and _in2 as input streams
Returning from GFM constructor
Returning from initFromVector
Wrote 1295322177 bytes to primary GFM file: genome.5.ht2
Wrote 749943562 bytes to secondary GFM file: genome.6.ht2
Re-opening _in5 and _in5 as input streams
Returning from HierEbwt constructor
Headers:
len: 2945849067
gbwtLen: 2945849068
nodes: 2945849068
sz: 736462267
gbwtSz: 736462268
lineRate: 6
offRate: 4
offMask: 0xfffffff0
ftabChars: 10
eftabLen: 0
eftabSz: 0
ftabLen: 1048577
ftabSz: 4194308
offsLen: 184115567
offsSz: 736462268
lineSz: 64
sideSz: 64
sideGbwtSz: 48
sideGbwtLen: 192
numSides: 15342964
numLines: 15342964
gbwtTotLen: 981949696
gbwtTotSz: 981949696
reverse: 0
linearFM: Yes
Total time for call to driver() for forward index: 00:18:09
genome index built; you may remove fasta files
/* Qin: genome.1.ht2 etc are saved in scripts/ directory */
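The `fchr` lines near the end of the log can be read as the FM-index C array. Assuming `fchr[c]` counts the indexed characters that sort before `c`, the gap between consecutive entries is the number of A, C, G and T bases that went into the index. Below is a minimal Python sketch that uses only values copied from the log above. The helper names are hypothetical, and the readings of `fchr` and `offRate` are my assumptions rather than documented hisat2 behaviour.

```python
import math
import re

# Header lines copied from the hisat2-build log above.
LOG = """\
fchr[A]: 0
fchr[C]: 869653843
fchr[G]: 1470243264
fchr[T]: 2073417374
fchr[$]: 2945849067
gbwtLen: 2945849068
offRate: 4
offsLen: 184115567
"""


def parse_fchr(log_text):
    """Return {symbol: offset} for every 'fchr[X]: N' line in the log text."""
    return {sym: int(n) for sym, n in re.findall(r"fchr\[(.)\]:\s*(\d+)", log_text)}


def base_counts(fchr, order="ACGT$"):
    """Assumption: fchr[c] is the number of indexed characters sorting before c
    (the FM-index C array), so each count is the gap to the next symbol's offset."""
    return {order[i]: fchr[order[i + 1]] - fchr[order[i]] for i in range(len(order) - 1)}


def header_value(log_text, key):
    """Read an integer header field such as 'offRate: 4' (must start a line)."""
    match = re.search(rf"^{key}:\s*(\d+)$", log_text, re.MULTILINE)
    return int(match.group(1))


counts = base_counts(parse_fchr(LOG))
total = sum(counts.values())
gc_fraction = (counts["C"] + counts["G"]) / total
print(counts)
print(f"indexed bases = {total}, GC fraction = {gc_fraction:.3f}")

# Assumption: offRate 4 means one suffix-array offset is sampled every 2**4 = 16 rows.
expected_offs = math.ceil(header_value(LOG, "gbwtLen") / 2 ** header_value(LOG, "offRate"))
print(expected_offs == header_value(LOG, "offsLen"))
```

On these values the four base counts add up to `len` (2945849067), and the GC fraction comes out at about 0.409. The last line checks that `offsLen` matches one sampled offset every 16 rows, which is consistent with "Offset rate: 4 (one in 16)" in the settings block.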
Thursday, November 10, 2016
physical activity, Cochrane public health
Michel 2014: money increases attendance, but not physical activity.
Smart apps and wearable devices do not increase physical activity.
Qui 2015: goal setting can be effective.
Brown 2016: social support is likely effective.
Evidence-informed decision making
todo: Elastic net method
33 Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J Roy Stat Soc B 67, 301-320, (2005).
34 Zou, H. & Zhang, H. H. On the Adaptive Elastic-Net with a Diverging Number of Parameters. Ann Stat 37, 1733-1751, (2009).
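To make the L1/L2 blend concrete, here is a small coordinate-descent sketch in Python with NumPy. It assumes the glmnet-style parametrization, where `lam` sets the overall penalty strength and `alpha` mixes the two norms (`alpha=1` is the lasso, `alpha=0` is ridge). The function names are hypothetical, and this is not the code from the references or the r-bloggers post.

```python
import numpy as np


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0): the L1 (lasso) shrinkage step."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def elastic_net_cd(X, y, lam, alpha, max_iter=1000, tol=1e-10):
    """Minimise (1/2n)||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2)
    by cyclic coordinate descent. alpha=1 gives the lasso, alpha=0 gives ridge."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n          # (1/n) x_j'x_j for each column
    residual = y.astype(float).copy()          # y - X @ beta with beta = 0
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual (beta_j's own term added back).
            z = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(z, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual -= X[:, j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Toy data: y depends on the first two columns only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.0])

print(elastic_net_cd(X, y, lam=0.0, alpha=0.5))   # no penalty: close to least squares
print(elastic_net_cd(X, y, lam=0.1, alpha=1.0))   # lasso
print(elastic_net_cd(X, y, lam=0.1, alpha=0.0))   # ridge
print(elastic_net_cd(X, y, lam=0.1, alpha=0.5))   # elastic net
```

The soft-thresholding step is what lets the L1 part set coefficients exactly to zero. The `lam * (1 - alpha)` term in the denominator is the ridge-style shrinkage that pulls all coefficients toward zero without zeroing them.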
https://www.r-bloggers.com/kickin-it-with-elastic-net-regression/
"Elastic net regression is a hybrid approach that blends both penalization of the L2 and L1 norms."
"Ridge regression is a really effective technique for thwarting overfitting. It does this by penalizing the L2 norm (euclidean distance) of the coefficient vector which results in “shrinking” the beta coefficients. The aggressiveness of the penalty is controlled by a parameter λ."
"Lasso regression is a related regularization method. Instead of using the L2 norm, though, it penalizes the L1 norm (manhattan distance) of the coefficient vector."
Barretina 2012 CCLE enables predictive modeling of anticancer drug sensitivity
Barretina 2012 Nature. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity
There are 8-point dose-response curves across 479 cell lines. They were fitted with a logistic sigmoidal function characterized by the maximal effect A_max, the concentration at half-maximal activity of the compound (EC50), a Hill coefficient representing the sigmoidal transition, and the concentration giving an absolute inhibition of 50% (IC50).
947 cell lines were profiled at the genome and expression levels.
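A minimal sketch of a logistic dose-response curve and its IC50 follows. It assumes the common four-parameter form y = A0 + (Ainf - A0) / (1 + (EC50/x)^Hill), with responses in percent (0 = inactive, -100 = full inhibition) and Hill > 0. The exact parametrization used in Barretina 2012 and Qin08PONE is not reproduced here, and the function names and parameter values are hypothetical.

```python
def four_pl(x, a0, a_inf, ec50, hill):
    """Assumed four-parameter logistic: tends to a0 as x -> 0 and to a_inf as x -> infinity
    (for hill > 0), with the inflection point at x = ec50."""
    return a0 + (a_inf - a0) / (1.0 + (ec50 / x) ** hill)


def ic50_from_4pl(a0, a_inf, ec50, hill, level=-50.0):
    """Concentration at which the curve crosses `level` (an absolute -50% activity by default).
    Solves four_pl(x) = level in closed form; returns None if the curve never reaches it."""
    if not (min(a0, a_inf) < level < max(a0, a_inf)):
        return None
    ratio = (level - a0) / (a_inf - level)
    return ec50 * ratio ** (1.0 / hill)


# Hypothetical compound: inactive at low dose, plateaus at -80% activity.
params = dict(a0=0.0, a_inf=-80.0, ec50=2.0, hill=1.0)
ic50 = ic50_from_4pl(**params)
print(ic50, four_pl(ic50, **params))
print(four_pl(params["ec50"], **params))          # half-way between a0 and a_inf
print(ic50_from_4pl(0.0, -40.0, 2.0, 1.0))        # never reaches -50%: None
```

Because IC50 is defined by an absolute level (-50%), it differs from EC50 whenever the bottom asymptote is not -100%. In this example the curve bottoms out at -80%, so IC50 (about 3.33) is larger than EC50 (2).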
Amazingly, Barretina12 used the same logistic model as Qin08PONE:
All dose-response data was reduced to a fitted model using a decision tree methodology based on the NIH/NCGC assay guidelines (http://assay.nih.gov/assay/index.php/Table_of_Contents). Models were generated for the duplicate data points generated for each cell line run day. In brief, dose-response data was fitted to one of three models depending on the statistical quality of the fits measured using a Chi-squared test. One approach was the 4 parameter sigmoid model shown below:
Alternatively, a constant model y = Ainf was employed; or a non-parametric spline interpolation of the data points was performed (note that this last model represents less than 5% of models). In these models, A0 and Ainf are the top and bottom asymptotes of the response; EC50 is the inflection point of the curve; and Hill is the Hill slope, which describes the steepness of the curve. Other key parameters derived from the models include the IC50, the concentration where the fitted curve crosses -50%; and Amax, which is the maximal activity value reached within a model. For the spline interpolation model, IC50 and EC50 parameters were both set to the concentration where the fitted model first crosses -50%. Additionally, we calculated two forms of the Activity area for each curve, defined as the area between the response curve and a fixed reference Aref = 0 or a variable reference Aref = max(0, Alow) where Alow is the activity at the lowest concentration, up to the maximum tested concentration. In practice, the Activity area was calculated as the sum of differences between the measured Ai at concentration i and the reference level. Thus, using the fixed reference, Activity area = 0 corresponds to an inactive compound, and 8 corresponds to a compound which had A = -100% at all eight concentration points. The variable reference form was introduced to adjust for curves with large positive activities close to zero concentration, which are usually artifacts of imperfectly corrected variations on the assay plate. For this measure, the median of all replicate activity values was used regardless of cell line run day. To prevent confusion, the Activity Area was calculated using Aref = 0 unless otherwise noted.
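The Activity area described above reduces to a short sum. This sketch assumes activities in percent, one already-summarized value per concentration ordered from lowest to highest, and no clipping of positive activities. The function name is hypothetical and the 8-point curve is made-up illustration data.

```python
def activity_area(responses_pct, variable_reference=False):
    """Sum over tested concentrations of (Aref - A_i), with activities in percent
    (0 = inactive, -100 = complete inhibition) ordered from lowest to highest
    concentration. Differences are divided by 100 so that -100% at all eight
    points gives an area of 8, as in the description above."""
    if variable_reference:
        a_ref = max(0.0, responses_pct[0])  # Aref = max(0, Alow), Alow at the lowest dose
    else:
        a_ref = 0.0                         # fixed reference Aref = 0
    return sum((a_ref - a) / 100.0 for a in responses_pct)


# Hypothetical 8-point curve with a positive artifact at the lowest concentration.
curve = [20, 10, 0, -10, -50, -80, -100, -100]
print(activity_area([-100] * 8))                       # fully active at every dose
print(activity_area(curve))                            # fixed reference
print(activity_area(curve, variable_reference=True))   # variable reference
```

With the made-up curve, the fixed reference gives 3.1 and the variable reference (Aref = 20) gives 4.7. This shows how a positive activity at the lowest concentration raises the variable-reference area relative to the fixed one.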
Friday, November 4, 2016
UTC printing
To better serve you, we request that you use our online TRAC system to submit your job. If you have previously submitted a job, you should already be in our system. Most usernames are your UTC ID, and the password is the one you entered when you set up the account; most users kept the default password, “password”. Both the username and password are case sensitive. If you have not yet used our online website to submit a job, you can create a new user account at https://utc.ricohtrac.com, or you can reach our page through the UTC website: search for Graphic and Mail Services and click our link, which will redirect you to our website. Once there, click the tab named “Support” (one of the blue and gold icons on the left side of the screen), scroll down to the middle of the page, and click “Submit a job-Go to Trac now”. This takes you to a screen where you can create a new user account. Once you have entered your new user info, we will receive the request, approve it electronically, and you will receive an email confirming the approval.
If you have used our system before, you should already have your username and password. If you are a first-time user or a returning customer and have questions about using the online job submission tool, call me at x4092 and we will gladly walk you through completing the job ticket. If your files are too large to upload, you can email them to rcd061@mocs.utc.edu or to one of our associates.
We are trying to get everyone used to going through our Trac system, because it’s easy to lose an email. This way everything stays in one place. If you need any assistance, don’t hesitate to call and we can walk you through it.
ZenHub support
This is a quick note to let you know that our support team is working as quickly as possible to answer your question. We'll have a non-autogenerated answer (from a real live person!) for you within the same business day. :)
In the meantime, you can see if we've addressed your request here:
ZenHub Blog: https://www.zenhub.com/blog/
ZenHub's Public Repo on GitHub: http://github.com/zenhubio/
Finally, get all our real-time service updates by following us on Twitter:
https://twitter.com/zenhubhq
Thursday, November 3, 2016
Blackboard scrolling bar, Mac
The Scroll Bars have Disappeared While Using Blackboard Grade Center with a Mac. How do I fix this?
When using Blackboard Grade Center on a Mac, you may notice that the horizontal scroll bars have disappeared, preventing you from viewing the rest of your grade columns. This issue tends to only exist on versions of Mac OSX 10.7 and above. To fix this follow these steps:
On Mac OS X:
- Open System Preferences, either from the Dock or from the Apple menu.
- In the System Preferences, select the General preference pane.
- The middle section of the General preference pane controls when scroll bars appear.
- Select "Always" from the Show Scroll Bars options.
Enabling the above feature will keep the scroll bar from automatically hiding.
Search key words: scroll bars scrollbars scrolling scroll can't see all columns gradecenter grade center
Tuesday, November 1, 2016
Illumina iGenome FTP
Illumina Provided Genomes
Illumina provides a number of commonly used genomes at ftp.illumina.com along with a reference annotation:
• Arabidopsis_thaliana
• Bos_taurus
• Caenorhabditis_elegans
• Canis_familiaris
• Drosophila_melanogaster
• Equus_caballus
• Escherichia_coli_K_12_DH10B
• Escherichia_coli_K_12_MG1655
• Gallus_gallus
• Homo_sapiens
• Mus_musculus
• Mycobacterium_tuberculosis_H37RV
• Pan_troglodytes
• PhiX
• Rattus_norvegicus
• Saccharomyces_cerevisiae
• Sus_scrofa
You can login using the following credentials:
• Username: igenome
• Password: G3nom3s4u
For example, download the FASTA, annotation, and bowtie index files for the human hg18 genome from the iGenomes repository with the following commands:
>wget --ftp-user=igenome --ftp-password=G3nom3s4u ftp://ftp.illumina.com/Homo_sapiens/UCSC/hg18/Homo_sapiens_UCSC_hg18.tar.gz
Unpack the tar file:
tar xvzf Homo_sapiens_UCSC_hg18.tar.gz
Unpacking creates its own folder:
Homo_sapiens/UCSC/hg18
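The same download-and-unpack steps can be scripted in Python with the standard library. This is a sketch that assumes the FTP layout and credentials listed above are still valid (they date from 2016). The helper names are hypothetical, and the network download is left commented out.

```python
import tarfile
import urllib.request
from pathlib import Path

# Credentials as listed in the note above.
IGENOME_USER = "igenome"
IGENOME_PASSWORD = "G3nom3s4u"


def igenome_url(species, source, build):
    """Rebuild the wget URL above, with user:password embedded for urllib's FTP handler."""
    return (f"ftp://{IGENOME_USER}:{IGENOME_PASSWORD}@ftp.illumina.com/"
            f"{species}/{source}/{build}/{species}_{source}_{build}.tar.gz")


def download(url, dest):
    """Fetch the archive to dest (network access required)."""
    urllib.request.urlretrieve(url, dest)
    return Path(dest)


def unpack(archive, target_dir="."):
    """Python equivalent of `tar xvzf archive`; returns the extracted member names."""
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters; "data" rejects unsafe paths.
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
        return tar.getnames()


# Usage (not run here, needs the FTP server):
# archive = download(igenome_url("Homo_sapiens", "UCSC", "hg18"), "Homo_sapiens_UCSC_hg18.tar.gz")
# unpack(archive)  # creates Homo_sapiens/UCSC/hg18
```

Embedding the credentials in the URL mirrors the `--ftp-user`/`--ftp-password` flags of the wget command. The extraction filter is only used when the running Python provides it.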