Goal: get the code from GitHub onto blacklight.
On Byte:
Download zip file from https://github.com/hongqin/mactower-network-failure-simulation
$ scp mactower-network-failure-simulation-master.zip hqin2@data.psc.xsede.org:./.
... ...
mactower-network-failure-simulation-master.zip 7% 56MB 3.5MB/s 03:25 ETA
On blacklight
hqin2@tg-login1:/brashear/hqin2> pwd
/brashear/hqin2
hqin2@tg-login1:/brashear/hqin2> which unzip
/usr/bin/unzip
hqin2@tg-login1:/brashear/hqin2> unzip /arc/users/hqin2/mactower-network-failure-simulation-master.zip
Archive: /arc/users/hqin2/mactower-network-failure-simulation-master.zip
... ... 1:55pm. This zip file is not a git repository, so I tried to git clone from the command line on blacklight. See https://help.github.com/articles/importing-a-git-repository-using-the-command-line/
git clone --bare https://github.com/hongqin/mactower-network-failure-simulation.git
/*this does not work on blacklight, even though it works on Byte*/
/* Next, test which directory qsub uses for the input file */
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> ll
-rw-r--r-- 1 hqin2 mc48o9p 199 2015-06-23 14:20 R.pbs
-rw-r--r-- 1 hqin2 mc48o9p 1193 2015-06-23 14:18 test1.R
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> cat R.pbs
#!/bin/bash
#PBS -q batch
#PBS -l ncpus=16
#PBS -l walltime=0:03:00
source /usr/share/modules/init/bash
module load R
cd $PBS_O_WORKDIR
echo hostname
ja
R --slave CMD BATCH test1.R /*not right?*/
ja -chlst
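Note on R.pbs: the line `echo hostname` prints the literal word "hostname", not the name of the compute node. A minimal sketch of the difference is below. `uname -n` is used here as a portable stand-in for `hostname`, and the variable names are only illustrative. Separately, R documents its batch form as `R CMD BATCH [options] infile [outfile]`, so moving `--slave` after `BATCH` may be the fix for the `/*not right?*/` line; that is untested here.

```shell
#!/bin/bash
# `echo hostname` just echoes its argument, so this captures the word "hostname".
literal="$(echo hostname)"

# Command substitution runs the command and captures its output.
# `uname -n` is the POSIX way to get the node name; `hostname` usually prints the same.
node="$(uname -n)"

echo "literal text: $literal"
echo "node name:    $node"
```

In a PBS file, writing `hostname` on its own line (without `echo`) would be enough to record the node in the job's output file.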
2:27pm
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> qsub R.pbs
461387.tg-login1.blacklight.psc.teragrid.org
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> qstat -u hqin2
tg-login1.blacklight.psc.teragrid.org:
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
461387.tg-login1     hqin2    batch_r  R.pbs      --      --   16   --     00:03 Q --
A second submission file, R2.pbs, which changes to $SCRATCH instead of $PBS_O_WORKDIR:
#!/bin/bash
#PBS -q batch
#PBS -l ncpus=16
#PBS -l walltime=0:03:00
source /usr/share/modules/init/bash
module load R
cd $SCRATCH
ja
R --slave CMD BATCH ./test1.R
ja -chlst
/*It took about 28 minutes for the job to finish */
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> ll
total 496
-rw-r--r-- 1 hqin2 mc48o9p 176 2015-06-23 14:45 R2.pbs
-rw------- 1 hqin2 mc48o9p 0 2015-06-23 15:02 R2.pbs.e461389
-rw------- 1 hqin2 mc48o9p 5002 2015-06-23 15:03 R2.pbs.o461389
-rw-r--r-- 1 hqin2 mc48o9p 195 2015-06-23 14:24 R.pbs
-rw------- 1 hqin2 mc48o9p 0 2015-06-23 15:02 R.pbs.e461387
-rw------- 1 hqin2 mc48o9p 5206 2015-06-23 15:03 R.pbs.o461387
-rw-r--r-- 1 hqin2 mc48o9p 160 2015-06-23 18:18 test1.R
-rw------- 1 hqin2 mc48o9p 986 2015-06-23 15:03 test1.Rout
I then used test1.R, test2.R, and test3.R to generate more ms02 network models.
I need to pass these parameters to R on the command line.
I forgot to change the wall time for two of the job submissions.
How to delete a qsub job?
hqin2@tg-login1:/brashear/hqin2/blacklight> qstat -u hqin2
tg-login1.blacklight.psc.teragrid.org:
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
461426.tg-login1     hqin2    batch_r  test1.R    --      --   16   --     00:10 Q --
461427.tg-login1     hqin2    batch_r  test2.pbs  --      --   16   --     00:03 Q --
461435.tg-login1     hqin2    batch_r  test3.pbs  --      --   16   --     01:00 Q --
hqin2@tg-login1:/brashear/hqin2/blacklight> qdel 461426.tg-login1
qdel: illegally formed job identifier: 461426.tg-login1
hqin2@tg-login1:/brashear/hqin2/blacklight> qdel 461426
hqin2@tg-login1:/brashear/hqin2/blacklight> qstat -u hqin2
tg-login1.blacklight.psc.teragrid.org:
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
461427.tg-login1     hqin2    batch_r  test2.pbs  --      --   16   --     00:03 Q --
461435.tg-login1     hqin2    batch_r  test3.pbs  --      --   16   --     01:00 Q --
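Note on qdel: in the session above, qdel rejected the ID as shown in the qstat listing (`461426.tg-login1`) but accepted the bare number. A minimal sketch of cutting the ID down to its numeric part is below. The variable names are illustrative, and the command is only printed, not sent to the scheduler.

```shell
#!/bin/bash
# Job ID as printed in the first column of `qstat -u` (copied from the listing above).
full_id="461426.tg-login1"

# Remove everything from the first dot onward, leaving the numeric sequence number.
short_id="${full_id%%.*}"

# Printed only; nothing is submitted to or deleted from the queue here.
echo "qdel $short_id"
```

The same expansion works on the longer ID that qsub prints, such as `461387.tg-login1.blacklight.psc.teragrid.org`.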
/* I then changed the wall time and resubmitted the first 2 jobs */
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> qstat -u hqin2
tg-login1.blacklight.psc.teragrid.org:
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
461435.tg-login1     hqin2    batch_r  test3.pbs  --      --   16   --     01:00 Q --
461438.tg-login1     hqin2    batch_r  test1.pbs  --      --   16   --     01:00 Q --
461439.tg-login1     hqin2    batch_r  test2.pbs  --      --   16   --     01:00 Q --
There are problems with write.csv(): a relative directory did not work under qsub.
00:40am. I added an explicit path for the output file in test2.R.
00:44am. qsub test2.pbs
3am. The jobs were run after 1.5 hours in the queue.
hqin2@tg-login1:/brashear/hqin2> ll
total 32
-rw------- 1 hqin2 mc48o9p 69 2015-06-23 23:15 test1.Rout
-rw------- 1 hqin2 mc48o9p 69 2015-06-24 03:35 test2.Rout
-rw------- 1 hqin2 mc48o9p 69 2015-06-24 03:35 test3.Rout
hqin2@tg-login1:/brashear/hqin2> cat test2.Rout
Fatal error: cannot open file './test2.R': No such file or directory
/* So, my pbs job submission file has path problems */
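Note on the path problem: the error above means R was started in a directory that did not contain the script. A minimal sketch of a check that could go before the R line in a pbs file is below. The helper name `script_is_reachable` is hypothetical, and the fallback value for `$SCRATCH` is an assumption based on the prompts in these notes.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the directory exists and contains the script.
script_is_reachable() {
    local run_dir="$1" script="$2"
    [ -d "$run_dir" ] && [ -f "$run_dir/$script" ]
}

# Assumed layout: the project directory sits directly under $SCRATCH.
# The fallback path is a guess taken from the shell prompts in these notes.
run_dir="${SCRATCH:-/brashear/hqin2}/mactower-network-failure-simulation-master/ms02GINPPI"

if script_is_reachable "$run_dir" "test2.R"; then
    echo "ok: would cd to $run_dir and run R -f test2.R"
else
    echo "missing: $run_dir/test2.R (check the cd line in the pbs file)" >&2
fi
```

Failing early with a clear message is cheaper than waiting hours in the queue to find a one-line .Rout file.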
4pm. Still no output files in my intended directory.
4:36pm
/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> cat test1.pbs
#!/bin/bash
#PBS -q batch
#PBS -l ncpus=16
#PBS -l walltime=0:05:00
source /usr/share/modules/init/bash
module load R
echo hostname
pwd
cd $SCRATCH/mactower-network-failure-simulation-master/ms02GINPPI
pwd
ja
R -f test1.R > test1.dump.txt
ja -chlst
4:38pm
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> qsub test1.pbs
461579.tg-login1.blacklight.psc.teragrid.org
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> qstat -u hqin2
tg-login1.blacklight.psc.teragrid.org:
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
461579.tg-login1     hqin2    batch_r  test1.pbs  --      --   16   --     00:05 Q --
This worked.
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> ll -ht
total 532K
drwxr-xr-x 102 hqin2 mc48o9p 4.0K 2015-06-24 17:47 dipgin.ms02.output
-rw------- 1 hqin2 mc48o9p 1.9K 2015-06-24 17:47 test1.dump.txt
-rw------- 1 hqin2 mc48o9p 4.8K 2015-06-24 17:47 test1.pbs.o461579
-rw------- 1 hqin2 mc48o9p 39 2015-06-24 17:47 test1.pbs.e461579
-rw-r--r-- 1 hqin2 mc48o9p 255 2015-06-24 16:36 test1.pbs
-rw-r--r-- 1 hqin2 mc48o9p 784 2015-06-24 16:31 test1.R
June 25, 2015
I wrote a new ms02 script that takes its parameters on the command line, and copied it to blacklight with scp.
-rw-r--r-- 1 hqin2 mc48o9p 1.9K 2015-06-25 00:39 ms02-2015June24.R
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> cat ms02.pbs
#!/bin/bash
#PBS -q batch
#PBS -l ncpus=16
#PBS -l walltime=0:30:00
source /usr/share/modules/init/bash
module load R
echo hostname
pwd
cd $SCRATCH/mactower-network-failure-simulation-master/ms02GINPPI
pwd
ja
R -f ms02-2015June24.R --args 302 500
ja -chlst
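Note on the hard-coded `--args 302 500`: one pbs file could serve several runs if the two numbers came from environment variables, which `qsub -v` can pass to a job. A minimal sketch is below. The names `START`, `END`, and `build_r_cmd` are hypothetical, and reading the two arguments as first and last model index is an assumption. The command is printed rather than executed.

```shell
#!/bin/bash
# Hypothetical helper: assemble the R command line for one range of models.
build_r_cmd() {
    # $1 and $2 are assumed to be the first and last model index.
    echo "R -f ms02-2015June24.R --args $1 $2"
}

# START and END would arrive through the environment, for example via
#   qsub -v START=302,END=500 ms02.pbs
# The defaults mirror the values hard-coded in ms02.pbs.
START="${START:-302}"
END="${END:-500}"

# Printed rather than executed, so the sketch runs without R installed.
build_r_cmd "$START" "$END"
```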
00:49am
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> qsub ms02.pbs
461610.tg-login1.blacklight.psc.teragrid.org
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> qstat -u hqin2
tg-login1.blacklight.psc.teragrid.org:
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
461610.tg-login1     hqin2    batch_r  ms02.pbs   177131  --   16   --     00:30 R --
Total cpus requested from running jobs: 16
I also created two more submission files, ms02b.pbs and ms02c.pbs.
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> ll -th
total 564K
-rw------- 1 hqin2 mc48o9p 94 2015-06-25 03:31 ms02c.pbs.e461613
-rw------- 1 hqin2 mc48o9p 94 2015-06-25 03:31 ms02b.pbs.e461612
drwxr-xr-x 210 hqin2 mc48o9p 4.0K 2015-06-25 03:30 dipgin.ms02.output
-rw------- 1 hqin2 mc48o9p 94 2015-06-25 01:20 ms02.pbs.e461610
-rw------- 1 hqin2 mc48o9p 2.7K 2015-06-25 01:00 ms02c.pbs.o461613
-rw-r--r-- 1 hqin2 mc48o9p 263 2015-06-25 01:00 ms02c.pbs
-rw------- 1 hqin2 mc48o9p 2.7K 2015-06-25 01:00 ms02b.pbs.o461612
-rw-r--r-- 1 hqin2 mc48o9p 262 2015-06-25 00:59 ms02b.pbs
-rw------- 1 hqin2 mc48o9p 2.7K 2015-06-25 00:49 ms02.pbs.o461610
-rw-r--r-- 1 hqin2 mc48o9p 262 2015-06-25 00:47 ms02.pbs
-rw-r--r-- 1 hqin2 mc48o9p 1.9K 2015-06-25 00:45 ms02-2015June24.R
It looks like my wall time is too short.
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> cat ms02.pbs.e461610
[Previously saved workspace restored]
=>> PBS: job killed: walltime 1865 exceeded limit 1800
My estimates:
30 minutes for 15 ms02 models.
150 minutes for 100 ms02 models.
Based on these estimates, I submitted 8 jobs, each requesting 4 hours of walltime.
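Note on walltime values: the pbs files above give walltime as hours:minutes:seconds, and the killed job shows the limit is enforced in seconds (1800 for 0:30:00). A minimal sketch for turning an estimate in minutes into that form is below. The function name `to_walltime` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: convert whole minutes to an HH:MM:SS walltime string.
to_walltime() {
    printf '%02d:%02d:00\n' "$(( $1 / 60 ))" "$(( $1 % 60 ))"
}

to_walltime 150   # the 100-model estimate: prints 02:30:00
to_walltime 240   # the 4 hours requested: prints 04:00:00
```

Requesting 240 minutes against a 150-minute estimate leaves 90 minutes of headroom, which matters because a job that runs even a minute over is killed.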
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> grep args *pbs
ms02b.pbs:R -f ms02-2015June24.R --args 310 400
ms02c.pbs:R -f ms02-2015June24.R --args 401 500
ms02d.pbs:R -f ms02-2015June24.R --args 501 600
ms02e.pbs:R -f ms02-2015June24.R --args 601 700
ms02f.pbs:R -f ms02-2015June24.R --args 799 800
ms02g.pbs:R -f ms02-2015June24.R --args 800 900
ms02h.pbs:R -f ms02-2015June24.R --args 900 1000
ms02.pbs:R -f ms02-2015June24.R --args 100 200
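Note on the ranges: if the two arguments are an inclusive first and last index (an assumption), then some neighbouring jobs above share a boundary value, for example 800 in ms02f and ms02g, and 900 in ms02g and ms02h. A minimal sketch that generates non-overlapping ranges is below. The function name `make_ranges` and the example bounds 101 to 1000 are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: split [first, last] into consecutive inclusive chunks.
make_ranges() {
    # $1 = first index, $2 = last index, $3 = models per job
    local first="$1" last="$2" step="$3" lo hi
    lo="$first"
    while [ "$lo" -le "$last" ]; do
        hi=$(( lo + step - 1 ))
        # The final chunk may be shorter than the others.
        if [ "$hi" -gt "$last" ]; then
            hi="$last"
        fi
        echo "$lo $hi"
        lo=$(( hi + 1 ))
    done
}

# Print one R command per chunk; nothing is executed or submitted.
make_ranges 101 1000 100 | while read -r lo hi; do
    echo "R -f ms02-2015June24.R --args $lo $hi"
done
```

Each printed line could become the R line of its own pbs file, so no model index is computed twice and none is skipped.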