Open Notebook: grch38


Friday, November 11, 2016

sh make_grch38.sh
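
The session below has two visible steps. First it downloads the Ensembl release-84 GRCh38 primary assembly. Then it builds a HISAT2 index with 4 threads. Those steps can be approximated by the POSIX sh sketch that follows. It is not the actual make_grch38.sh. The URL, thread count and build command are copied from the log. The variable and function names and the `--run` flag are hypothetical, and the sketch assumes `wget`, `gunzip` and `hisat2-build` are on PATH.

```sh
#!/bin/sh
# Sketch of the two steps shown in the log: fetch the Ensembl release-84 GRCh38
# primary assembly, then build a HISAT2 index from it. NOT the real make_grch38.sh.
set -eu

ENSEMBL_RELEASE=84
BASE_URL="ftp://ftp.ensembl.org/pub/release-${ENSEMBL_RELEASE}/fasta/homo_sapiens/dna"
FASTA="Homo_sapiens.GRCh38.dna.primary_assembly.fa"
# Assumption: hisat2-build is on PATH (the log used /home/hqin/tools/hisat2/hisat2-build).
HISAT2_BUILD="${HISAT2_BUILD:-hisat2-build}"
THREADS=4

# Download (~840 MB compressed) and decompress to genome.fa, unless it already exists.
fetch_genome() {
    if [ ! -f genome.fa ]; then
        wget "${BASE_URL}/${FASTA}.gz"
        gunzip "${FASTA}.gz"
        mv "${FASTA}" genome.fa
    fi
}

# Print the build command; this matches the "Running ..." line in the log.
build_command() {
    echo "${HISAT2_BUILD} -p ${THREADS} genome.fa genome"
}

# Build the index; hisat2-build writes genome.*.ht2 into the current directory.
build_index() {
    echo "Running $(build_command)"
    "${HISAT2_BUILD}" -p "${THREADS}" genome.fa genome
    echo "genome index built; you may remove fasta files"
}

# Hypothetical --run flag: the slow download and build only happen when asked for.
if [ "${1:-}" = "--run" ]; then
    fetch_genome
    build_index
fi
```

Saved as, say, `grch38_sketch.sh`, it would be started with `sh grch38_sketch.sh --run`. Without the flag it only defines the functions.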

hqin@ridgeside[~/tools/hisat2/scripts]->sh make_grch38.sh
/home/hqin/tools/hisat2/hisat2-build
--2016-11-11 10:38:25-- ftp://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
          => ‘Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz’
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.203.85
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.203.85|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/release-84/fasta/homo_sapiens/dna ... done.
==> SIZE Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz ... 881214344
==> PASV ... done.    ==> RETR Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz ... done.
Length: 881214344 (840M) (unauthoritative)

Homo_sapiens.GRCh38.dna.primary_asse 100%[===================================================================>] 840.39M 22.6MB/s    in 60s

2016-11-11 10:39:32 (13.9 MB/s) - ‘Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz’ saved [881214344]
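
The build that follows takes about 18 minutes, so it can pay to confirm the download is complete before it starts. One way is to compare the local byte count with the SIZE the FTP server reported above (881214344). This only helps while the `.gz` file is still on disk, for example when fetching it by hand. The script decompresses and renames it right away. Below is a minimal sketch, assuming a POSIX shell. The `check_size` helper is hypothetical. `gzip -t` is gzip's standard integrity test.

```sh
#!/bin/sh
# check_size FILE EXPECTED_BYTES: hypothetical helper comparing a file's size
# with the byte count reported by the server (SIZE ... 881214344 in the log).
check_size() {
    if [ ! -f "$1" ]; then
        echo "no such file: $1" >&2
        return 1
    fi
    # wc -c may pad its output with spaces on some systems, so strip them.
    actual=$(wc -c < "$1" | tr -d ' ')
    if [ "$actual" -eq "$2" ]; then
        echo "OK: $1 ($actual bytes)"
    else
        echo "MISMATCH: $1 has $actual bytes, expected $2" >&2
        return 1
    fi
}

# Usage, with the file name and size from the log; gzip -t then tests the
# compressed stream itself without writing the decompressed data to disk:
#   check_size Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz 881214344 &&
#       gzip -t Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
```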

Running /home/hqin/tools/hisat2/hisat2-build -p 4 genome.fa genome
Settings:
Output files: "genome.*.ht2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Local offset rate: 3 (one in 8)
Local fTable chars: 6
Local sequence length: 57344
Local sequence overlap between two consecutive indexes: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
genome.fa
Reading reference sizes
Time reading reference sizes: 00:00:41
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:17
Time to read SNPs and splice sites: 00:00:00
Using parameters --bmax 552346700 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 552346700 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:24
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:14
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:29
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Split 1, merged 6; iterating...
Splitting and merging
Splitting and merging time: 00:00:00
Split 1, merged 0; iterating...
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 3.68231e+08 (target: 552346699)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering GFM loop
Getting block 1 of 8
Reserving size (552346700) for bucket 1
Calculating Z arrays for bucket 1
Entering block accumulator loop for bucket 1:
Getting block 2 of 8
Getting block 3 of 8
Reserving size (552346700) for bucket 3
Getting block 4 of 8
Reserving size (552346700) for bucket 4
Reserving size (552346700) for bucket 2
Calculating Z arrays for bucket 3
Calculating Z arrays for bucket 4
Calculating Z arrays for bucket 2
Entering block accumulator loop for bucket 4:
Entering block accumulator loop for bucket 3:
Entering block accumulator loop for bucket 2:
bucket 1: 10%
bucket 2: 10%
bucket 3: 10%
bucket 4: 10%
bucket 1: 20%
bucket 2: 20%
bucket 1: 30%
bucket 3: 20%
bucket 4: 20%
bucket 1: 40%
bucket 2: 30%
bucket 1: 50%
bucket 3: 30%
bucket 2: 40%
bucket 4: 30%
bucket 1: 60%
bucket 2: 50%
bucket 3: 40%
bucket 1: 70%
bucket 4: 40%
bucket 2: 60%
bucket 1: 80%
bucket 3: 50%
bucket 1: 90%
bucket 2: 70%
bucket 4: 50%
bucket 1: 100%
Sorting block of length 291744419 for bucket 1
(Using difference cover)
bucket 3: 60%
bucket 2: 80%
bucket 4: 60%
bucket 3: 70%
bucket 2: 90%
bucket 4: 70%
bucket 2: 100%
Sorting block of length 399816717 for bucket 2
(Using difference cover)
bucket 3: 80%
bucket 4: 80%
bucket 3: 90%
bucket 3: 100%
Sorting block of length 424570505 for bucket 3
(Using difference cover)
bucket 4: 90%
bucket 4: 100%
Sorting block of length 480190664 for bucket 4
(Using difference cover)
Sorting block time: 00:01:40
Returning block of 291744420 for bucket 1
Getting block 5 of 8
Reserving size (552346700) for bucket 5
Calculating Z arrays for bucket 5
Entering block accumulator loop for bucket 5:
bucket 5: 10%
bucket 5: 20%
bucket 5: 30%
Sorting block time: 00:02:23
Returning block of 399816718 for bucket 2
bucket 5: 40%
bucket 5: 50%
bucket 5: 60%
Sorting block time: 00:02:29
Returning block of 424570506 for bucket 3
bucket 5: 70%
bucket 5: 80%
Getting block 6 of 8
Reserving size (552346700) for bucket 6
Calculating Z arrays for bucket 6
Entering block accumulator loop for bucket 6:
bucket 5: 90%
bucket 6: 10%
bucket 5: 100%
Sorting block of length 398074230 for bucket 5
(Using difference cover)
bucket 6: 20%
Sorting block time: 00:02:56
Returning block of 480190665 for bucket 4
bucket 6: 30%
Getting block 7 of 8
Reserving size (552346700) for bucket 7
Calculating Z arrays for bucket 7
Entering block accumulator loop for bucket 7:
bucket 6: 40%
bucket 7: 10%
bucket 6: 50%
bucket 7: 20%
bucket 6: 60%
bucket 7: 30%
bucket 6: 70%
bucket 7: 40%
Getting block 8 of 8
Reserving size (552346700) for bucket 8
Calculating Z arrays for bucket 8
Entering block accumulator loop for bucket 8:
bucket 6: 80%
bucket 8: 10%
bucket 7: 50%
bucket 8: 20%
bucket 6: 90%
bucket 7: 60%
bucket 8: 30%
bucket 6: 100%
Sorting block of length 241117192 for bucket 6
(Using difference cover)
bucket 8: 40%
bucket 7: 70%
bucket 8: 50%
bucket 7: 80%
bucket 8: 60%
bucket 8: 70%
bucket 7: 90%
bucket 8: 80%
bucket 7: 100%
Sorting block of length 547672632 for bucket 7
(Using difference cover)
bucket 8: 90%
bucket 8: 100%
Sorting block of length 162662701 for bucket 8
(Using difference cover)
Sorting block time: 00:02:21
Returning block of 398074231 for bucket 5
Sorting block time: 00:01:25
Returning block of 241117193 for bucket 6
Sorting block time: 00:01:00
Returning block of 162662702 for bucket 8
Sorting block time: 00:03:07
Returning block of 547672633 for bucket 7
Exited GFM loop
fchr[A]: 0
fchr[C]: 869653843
fchr[G]: 1470243264
fchr[T]: 2073417374
fchr[$]: 2945849067
Exiting GFM::buildToDisk()
Returning from initFromVector
Wrote 986172031 bytes to primary GFM file: genome.1.ht2
Wrote 736462272 bytes to secondary GFM file: genome.2.ht2
Re-opening _in1 and _in2 as input streams
Returning from GFM constructor
Returning from initFromVector
Wrote 1295322177 bytes to primary GFM file: genome.5.ht2
Wrote 749943562 bytes to secondary GFM file: genome.6.ht2
Re-opening _in5 and _in5 as input streams
Returning from HierEbwt constructor
Headers:
   len: 2945849067
   gbwtLen: 2945849068
   nodes: 2945849068
   sz: 736462267
   gbwtSz: 736462268
   lineRate: 6
   offRate: 4
   offMask: 0xfffffff0
   ftabChars: 10
   eftabLen: 0
   eftabSz: 0
   ftabLen: 1048577
   ftabSz: 4194308
   offsLen: 184115567
   offsSz: 736462268
   lineSz: 64
   sideSz: 64
   sideGbwtSz: 48
   sideGbwtLen: 192
   numSides: 15342964
   numLines: 15342964
   gbwtTotLen: 981949696
   gbwtTotSz: 981949696
   reverse: 0
   linearFM: Yes
Total time for call to driver() for forward index: 00:18:09
genome index built; you may remove fasta files
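
Several numbers in the build log are tied together by simple arithmetic, which makes them handy for sanity checks. The sketch below recomputes them from values copied out of the log. The relations are inferred from the printed numbers rather than taken from HISAT2's source. One example is offsLen = ceil(gbwtLen / 16) for offRate 4. Treat the sketch as a consistency check, not a specification of the index format.

```sh
#!/bin/sh
# Recompute some hisat2-build log values from other log values (copied by hand).
# The relations are inferred from the printed numbers, not from HISAT2's source.

LEN=2945849067        # Headers: len
GBWT_LEN=2945849068   # Headers: gbwtLen (len + 1)

# "Returning block of N for bucket k", k = 1..8; these should sum to gbwtLen.
block_sum=0
for b in 291744420 399816718 424570506 480190665 398074231 241117193 162662702 547672633; do
    block_sum=$((block_sum + b))
done

# ftabChars 10: one entry per 10-mer (4^10 = 2^20) plus one, 4 bytes each.
ftab_len=$(( (1 << 20) + 1 ))
ftab_sz=$(( ftab_len * 4 ))

# offRate 4 ("one in 16"): offsLen = ceil(gbwtLen / 16), 4 bytes per offset.
offs_len=$(( (GBWT_LEN + 15) / 16 ))
offs_sz=$(( offs_len * 4 ))

# sideGbwtLen 192 characters per 64-byte side: numSides = ceil(gbwtLen / 192).
num_sides=$(( (GBWT_LEN + 191) / 192 ))
gbwt_tot_len=$(( num_sides * 64 ))

# fchr[x] is the number of indexed characters sorting before x, so differences
# between consecutive entries are per-base counts (fchr[A] = 0, fchr[$] = len).
a=869653843
c=$(( 1470243264 - 869653843 ))
g=$(( 2073417374 - 1470243264 ))
t=$(( LEN - 2073417374 ))
gc=$(awk -v c="$c" -v g="$g" -v n="$LEN" 'BEGIN { printf "%.4f", (c + g) / n }')

echo "blocks=$block_sum ftabLen=$ftab_len ftabSz=$ftab_sz offsLen=$offs_len offsSz=$offs_sz"
echo "numSides=$num_sides gbwtTotLen=$gbwt_tot_len A=$a C=$c G=$g T=$t GC=$gc"
```

With these values the eight returned blocks add up exactly to gbwtLen (2945849068). One eighth of that is consistent with the "Avg bucket size: 3.68231e+08" line. The recomputed ftabLen, offsLen, numSides and gbwtTotLen match the Headers section. The fchr gaps give A 869653843, C 600589421, G 603174110 and T 872431693, which sum to len. That is a GC fraction of about 40.9% of the indexed bases.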

/* Qin: the index files (genome.1.ht2, genome.2.ht2, etc.) are saved in the directory the script was run from, i.e. ~/tools/hisat2/scripts/ */
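
To use the index from somewhere other than scripts/, move it as a set and pass its basename (here `genome`) to `hisat2 -x`. Below is a small sketch, assuming the usual eight-file HISAT2 index layout (genome.1.ht2 through genome.8.ht2). The `check_index` helper, the target directory and the read file names are hypothetical.

```sh
#!/bin/sh
# check_index BASENAME: hypothetical helper that reports whether all eight
# HISAT2 index files BASENAME.1.ht2 ... BASENAME.8.ht2 exist and are non-empty.
check_index() {
    missing=0
    for i in 1 2 3 4 5 6 7 8; do
        if [ ! -s "$1.$i.ht2" ]; then
            echo "missing or empty: $1.$i.ht2" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (paths and read files are hypothetical): move the index out of
# scripts/, inspect it, then align paired-end reads against it.
#   INDEX_DIR="$HOME/genomes/grch38_hisat2"
#   mkdir -p "$INDEX_DIR"
#   mv ~/tools/hisat2/scripts/genome.*.ht2 "$INDEX_DIR"/
#   check_index "$INDEX_DIR/genome" && hisat2-inspect -s "$INDEX_DIR/genome" | head
#   hisat2 -p 4 -x "$INDEX_DIR/genome" -1 reads_1.fq.gz -2 reads_2.fq.gz -S sample.sam
```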