Friday, November 11, 2016

sh make_grch38.sh

hqin@ridgeside[~/tools/hisat2/scripts]->sh make_grch38.sh  
/home/hqin/tools/hisat2/hisat2-build
--2016-11-11 10:38:25--  ftp://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
          => ‘Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz’
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.203.85
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.203.85|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/release-84/fasta/homo_sapiens/dna ... done.
==> SIZE Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz ... 881214344
==> PASV ... done.    ==> RETR Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz ... done.
Length: 881214344 (840M) (unauthoritative)

Homo_sapiens.GRCh38.dna.primary_asse 100%[===================================================================>] 840.39M  22.6MB/s    in 60s    

2016-11-11 10:39:32 (13.9 MB/s) - ‘Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz’ saved [881214344]

Running /home/hqin/tools/hisat2/hisat2-build -p 4 genome.fa genome
Settings:
 Output files: "genome.*.ht2"
 Line rate: 6 (line is 64 bytes)
 Lines per side: 1 (side is 64 bytes)
 Offset rate: 4 (one in 16)
 FTable chars: 10
 Strings: unpacked
 Local offset rate: 3 (one in 8)
 Local fTable chars: 6
 Local sequence length: 57344
 Local sequence overlap between two consecutive indexes: 1024
 Endianness: little
 Actual local endianness: little
 Sanity checking: disabled
 Assertions: disabled
 Random seed: 0
 Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
 genome.fa
Reading reference sizes
 Time reading reference sizes: 00:00:41
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
 Time to join reference sequences: 00:00:17
 Time to read SNPs and splice sites: 00:00:00
Using parameters --bmax 552346700 --dcv 1024
 Doing ahead-of-time memory usage test
 Passed!  Constructing with these parameters: --bmax 552346700 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
 Building sPrime
 Building sPrimeOrder
 V-Sorting samples
 V-Sorting samples time: 00:00:24
 Allocating rank array
 Ranking v-sort output
 Ranking v-sort output time: 00:00:14
 Invoking Larsson-Sadakane on ranks
 Invoking Larsson-Sadakane on ranks time: 00:00:29
 Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
 (Using difference cover)
 Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
 Splitting and merging time: 00:00:00
Split 1, merged 6; iterating...
Splitting and merging
 Splitting and merging time: 00:00:00
Split 1, merged 0; iterating...
Splitting and merging
 Splitting and merging time: 00:00:00
Avg bucket size: 3.68231e+08 (target: 552346699)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering GFM loop
Getting block 1 of 8
 Reserving size (552346700) for bucket 1
 Calculating Z arrays for bucket 1
 Entering block accumulator loop for bucket 1:
Getting block 2 of 8
Getting block 3 of 8
 Reserving size (552346700) for bucket 3
Getting block 4 of 8
 Reserving size (552346700) for bucket 4
 Reserving size (552346700) for bucket 2
 Calculating Z arrays for bucket 3
 Calculating Z arrays for bucket 4
 Calculating Z arrays for bucket 2
 Entering block accumulator loop for bucket 4:
 Entering block accumulator loop for bucket 3:
 Entering block accumulator loop for bucket 2:
 bucket 1: 10%
 bucket 2: 10%
 bucket 3: 10%
 bucket 4: 10%
 bucket 1: 20%
 bucket 2: 20%
 bucket 1: 30%
 bucket 3: 20%
 bucket 4: 20%
 bucket 1: 40%
 bucket 2: 30%
 bucket 1: 50%
 bucket 3: 30%
 bucket 2: 40%
 bucket 4: 30%
 bucket 1: 60%
 bucket 2: 50%
 bucket 3: 40%
 bucket 1: 70%
 bucket 4: 40%
 bucket 2: 60%
 bucket 1: 80%
 bucket 3: 50%
 bucket 1: 90%
 bucket 2: 70%
 bucket 4: 50%
 bucket 1: 100%
 Sorting block of length 291744419 for bucket 1
 (Using difference cover)
 bucket 3: 60%
 bucket 2: 80%
 bucket 4: 60%
 bucket 3: 70%
 bucket 2: 90%
 bucket 4: 70%
 bucket 2: 100%
 Sorting block of length 399816717 for bucket 2
 (Using difference cover)
 bucket 3: 80%
 bucket 4: 80%
 bucket 3: 90%
 bucket 3: 100%
 Sorting block of length 424570505 for bucket 3
 (Using difference cover)
 bucket 4: 90%
 bucket 4: 100%
 Sorting block of length 480190664 for bucket 4
 (Using difference cover)
 Sorting block time: 00:01:40
Returning block of 291744420 for bucket 1
Getting block 5 of 8
 Reserving size (552346700) for bucket 5
 Calculating Z arrays for bucket 5
 Entering block accumulator loop for bucket 5:
 bucket 5: 10%
 bucket 5: 20%
 bucket 5: 30%
 Sorting block time: 00:02:23
Returning block of 399816718 for bucket 2
 bucket 5: 40%
 bucket 5: 50%
 bucket 5: 60%
 Sorting block time: 00:02:29
Returning block of 424570506 for bucket 3
 bucket 5: 70%
 bucket 5: 80%
Getting block 6 of 8
 Reserving size (552346700) for bucket 6
 Calculating Z arrays for bucket 6
 Entering block accumulator loop for bucket 6:
 bucket 5: 90%
 bucket 6: 10%
 bucket 5: 100%
 Sorting block of length 398074230 for bucket 5
 (Using difference cover)
 bucket 6: 20%
 Sorting block time: 00:02:56
Returning block of 480190665 for bucket 4
 bucket 6: 30%
Getting block 7 of 8
 Reserving size (552346700) for bucket 7
 Calculating Z arrays for bucket 7
 Entering block accumulator loop for bucket 7:
 bucket 6: 40%
 bucket 7: 10%
 bucket 6: 50%
 bucket 7: 20%
 bucket 6: 60%
 bucket 7: 30%
 bucket 6: 70%
 bucket 7: 40%
Getting block 8 of 8
 Reserving size (552346700) for bucket 8
 Calculating Z arrays for bucket 8
 Entering block accumulator loop for bucket 8:
 bucket 6: 80%
 bucket 8: 10%
 bucket 7: 50%
 bucket 8: 20%
 bucket 6: 90%
 bucket 7: 60%
 bucket 8: 30%
 bucket 6: 100%
 Sorting block of length 241117192 for bucket 6
 (Using difference cover)
 bucket 8: 40%
 bucket 7: 70%
 bucket 8: 50%
 bucket 7: 80%
 bucket 8: 60%
 bucket 8: 70%
 bucket 7: 90%
 bucket 8: 80%
 bucket 7: 100%
 Sorting block of length 547672632 for bucket 7
 (Using difference cover)
 bucket 8: 90%
 bucket 8: 100%
 Sorting block of length 162662701 for bucket 8
 (Using difference cover)
 Sorting block time: 00:02:21
Returning block of 398074231 for bucket 5
 Sorting block time: 00:01:25
Returning block of 241117193 for bucket 6
 Sorting block time: 00:01:00
Returning block of 162662702 for bucket 8
 Sorting block time: 00:03:07
Returning block of 547672633 for bucket 7
Exited GFM loop
fchr[A]: 0
fchr[C]: 869653843
fchr[G]: 1470243264
fchr[T]: 2073417374
fchr[$]: 2945849067
Exiting GFM::buildToDisk()
Returning from initFromVector
Wrote 986172031 bytes to primary GFM file: genome.1.ht2
Wrote 736462272 bytes to secondary GFM file: genome.2.ht2
Re-opening _in1 and _in2 as input streams
Returning from GFM constructor
Returning from initFromVector
Wrote 1295322177 bytes to primary GFM file: genome.5.ht2
Wrote 749943562 bytes to secondary GFM file: genome.6.ht2
Re-opening _in5 and _in5 as input streams
Returning from HierEbwt constructor
Headers:
   len: 2945849067
   gbwtLen: 2945849068
   nodes: 2945849068
   sz: 736462267
   gbwtSz: 736462268
   lineRate: 6
   offRate: 4
   offMask: 0xfffffff0
   ftabChars: 10
   eftabLen: 0
   eftabSz: 0
   ftabLen: 1048577
   ftabSz: 4194308
   offsLen: 184115567
   offsSz: 736462268
   lineSz: 64
   sideSz: 64
   sideGbwtSz: 48
   sideGbwtLen: 192
   numSides: 15342964
   numLines: 15342964
   gbwtTotLen: 981949696
   gbwtTotSz: 981949696
   reverse: 0
   linearFM: Yes
Total time for call to driver() for forward index: 00:18:09
genome index built; you may remove fasta files
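
Several values in the "Headers:" block can be cross-checked with integer arithmetic. The sketch below is a sanity check only: the formulas are my reading of how the logged numbers relate to each other (ceiling divisions and a 4^ftabChars table with one extra entry), not something taken from the HISAT2 source, and the shell variable names are made up for illustration.

```shell
#!/bin/sh
# Values copied from the "Headers:" section of the log above.
gbwt_len=2945849068     # gbwtLen
off_rate=4              # offRate ("one in 16")
ftab_chars=10           # ftabChars
side_sz=64              # sideSz
side_gbwt_sz=48         # sideGbwtSz
side_gbwt_len=192       # sideGbwtLen

# Assumption: one offset is stored every 2^off_rate rows, rounded up.
step=$((1 << off_rate))
offs_len=$(( (gbwt_len + step - 1) / step ))

# Assumption: the ftab holds 4^ftab_chars entries plus one extra entry.
ftab_len=$(( (1 << (2 * ftab_chars)) + 1 ))

# 2 bits per base gives 4 bases per byte of a side.
bases_per_side=$(( side_gbwt_sz * 4 ))

# Assumption: the number of sides is gbwt_len / side_gbwt_len, rounded up,
# and each side occupies side_sz bytes.
num_sides=$(( (gbwt_len + side_gbwt_len - 1) / side_gbwt_len ))
gbwt_tot_sz=$(( num_sides * side_sz ))

echo "offsLen=$offs_len ftabLen=$ftab_len numSides=$num_sides gbwtTotSz=$gbwt_tot_sz"
```

Under these assumptions the computed values agree with the logged offsLen, ftabLen, sideGbwtLen, numSides and gbwtTotSz.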




/* Qin: genome.1.ht2 and the other genome.*.ht2 index files are saved in the scripts/ directory */
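
A quick way to confirm that the index files named in the log are in place is sketched below. The function name check_index is hypothetical, and it checks only the four files the log explicitly reports writing (genome.1.ht2, genome.2.ht2, genome.5.ht2, genome.6.ht2); the build may produce further genome.*.ht2 files that this check does not cover.

```shell
#!/bin/sh
# check_index PREFIX
# Returns 0 if PREFIX.1.ht2, PREFIX.2.ht2, PREFIX.5.ht2 and PREFIX.6.ht2
# all exist and are non-empty; otherwise reports the first missing or
# empty file on stderr and returns 1.
check_index() {
    prefix=$1
    for n in 1 2 5 6; do
        f="$prefix.$n.ht2"
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    echo "index files for '$prefix' look present"
}
```

Run from the scripts/ directory as `check_index genome`. The same prefix, genome, is what would later be given to hisat2 as the index basename with -x.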
