Showing posts with label blacklight. Show all posts

Monday, July 20, 2015

backup $SCRATCH from blacklight to byte

log into blacklight through xsede

hqin2@tg-login1:/brashear/hqin2> ls /arc/users/hqin2

0.ginppi.tar.gz  0.tar  mactower-network-failure-simulation-master.zip
/*I moved these files into a new folder /old */

hqin2@tg-login1:/arc/users/hqin2> cd $SCRATCH
hqin2@tg-login1:/brashear/hqin2> pwd
/brashear/hqin2
hqin2@tg-login1:/brashear/hqin2> tar cvf mactower-network-failure-simulation-master.20150720.tar mactower-network-failure-simulation-master/ &

cp mactower-network-failure-simulation-master.20150720.tar /arc/users/hqin2/.
/* this seems to freeze my terminal. */

On byte:
Byte-2:blacklight hqin$ pwd

/Users/hqin/github/mactower-network-failure-simulation/blacklight
scp "hqin2@data.psc.xsede.org:mactower-network-failure-simulation-master.20150720.tar.gz" .
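The tar command above writes an uncompressed .tar, while the scp command asks for a .tar.gz, so a gzip step is presumably missing from these notes. Below is a minimal sketch of one way to build the compressed archive and confirm it is readable before copying it anywhere. The function name make_backup and the demo directory are hypothetical, and this has not been run on blacklight.

```shell
# make_backup SRC_DIR ARCHIVE
# Packs SRC_DIR into a gzip-compressed tarball, then lists it to confirm
# the archive is readable. ARCHIVE should be an absolute path.
make_backup() {
    src_dir=$1
    archive=$2
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
    # Listing reads through the archive, so a truncated file should fail here.
    tar -tzf "$archive" > /dev/null || return 1
    # Print a checksum to compare against the copy made by cp or scp.
    cksum "$archive"
}

# Demo on a throwaway directory (a hypothetical stand-in for the real one).
demo=$(mktemp -d)
mkdir "$demo/sim-master"
echo "placeholder" > "$demo/sim-master/README"
make_backup "$demo/sim-master" "$demo/sim-master.20150720.tar.gz"
```

Running cksum on the file again after the transfer and comparing the two lines is a cheap way to catch an interrupted copy.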







References:
http://hongqinlab.blogspot.com/2015/06/20150623tue-0624wed-0625thu-blacklight.html

blacklight --> greenfield --> bridges

Notice email:

PSC is preparing to introduce its next-generation XSEDE-allocated system. Bridges is planned to enter production in January 2016 (see http://psc.edu/bridges). 

Blacklight will be decommissioned on August 15, 2015.

For the transition period, PSC will provide Greenfield, a new resource that, like Blacklight, features large shared memory. We are developing the user guide at http://www.psc.edu/index.php/resources-for-users/computing-resources/greenfield. Note that the content of this document is evolving.

While the computational capacity of Greenfield is less than that of Blacklight, we believe that your project can make good use of Greenfield and prepare you to continue on Bridges. Your accounts on Blacklight will remain active until 11pm EDT on August 15. Any files left on Blacklight’s $SCRATCH filesystem after August 15 will be lost. If you have an allocation on the Data Supercell (DSC), it will remain active for the remainder of your current XSEDE grant. DSC will be accessible from Greenfield and then from Bridges.

If you wish to discuss other options, or if you have any questions, please contact remarks@psc.edu at your earliest convenience.

Friday, June 26, 2015

Tuesday, June 23, 2015

test github on blacklight (does not seem to work)



hqin2@tg-login1:/brashear/hqin2> mkdir blacklight
hqin2@tg-login1:/brashear/hqin2> cd blacklight/
hqin2@tg-login1:/brashear/hqin2/blacklight> touch blacklight-test.txt
hqin2@tg-login1:/brashear/hqin2/blacklight> git init
Initialized empty Git repository in /brashear/hqin2/blacklight/.git/
hqin2@tg-login1:/brashear/hqin2/blacklight> git add *
hqin2@tg-login1:/brashear/hqin2/blacklight> git commit -m "blacklight first"
[master (root-commit) f6be4d0] blacklight first
 Committer: Hong Qin <hqin2@tg-login1.blacklight.psc.teragrid.org>
Your name and email address were configured automatically based
on your username and hostname. Please check that they are accurate.
You can suppress this message by setting them explicitly:

    git config --global user.name "Your Name"
    git config --global user.email you@example.com

After doing this, you may fix the identity used for this commit with:

    git commit --amend --reset-author

 1 file changed, 0 insertions(+), 0 deletions(-)

 create mode 100644 blacklight-test.txt


hqin2@tg-login1:/brashear/hqin2/blacklight> git remote add origin https://github.com/hongqin/blacklight.git
hqin2@tg-login1:/brashear/hqin2/blacklight> git push --force origin master 
/* nothing happens after running the above lines on blacklight */
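One way to tell a git problem from a network problem is to push to a bare repository on the same filesystem. If that succeeds, the hang is more likely in the HTTPS connection from the login node, although the log above does not confirm this. A minimal sketch follows; the function name check_local_push and the email address are placeholders.

```shell
# check_local_push
# Creates a scratch repository, commits one file with an explicit identity,
# pushes to a local bare repository, and prints how many commits arrived.
check_local_push() {
    work=$(mktemp -d) || return 1
    git init --quiet --bare "$work/remote.git" || return 1
    git init --quiet "$work/repo" || return 1
    (
        cd "$work/repo" || exit 1
        touch blacklight-test.txt
        git add blacklight-test.txt || exit 1
        # -c sets the identity for this command only, so no global config is touched.
        git -c user.name="Hong Qin" -c user.email="you@example.com" \
            commit --quiet -m "blacklight first" || exit 1
        git remote add origin "$work/remote.git" || exit 1
        # HEAD pushes whatever the current branch is called.
        git push --quiet origin HEAD || exit 1
    ) || return 1
    git --git-dir="$work/remote.git" rev-list --all --count
}

check_local_push
```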




20150623Tue, 0624Wed, 0625Thu blacklight ms02 network permutation runs

Plan: Edit files locally, push to github.
          At blacklight, pull from github.

On Byte:
Download zip file from  https://github.com/hongqin/mactower-network-failure-simulation

$ scp mactower-network-failure-simulation-master.zip hqin2@data.psc.xsede.org:./.
... ...
mactower-network-failure-simulation-master.zip                                   7%   56MB   3.5MB/s   03:25 ETA

On blacklight
hqin2@tg-login1:/brashear/hqin2> pwd
/brashear/hqin2
hqin2@tg-login1:/brashear/hqin2> which unzip
/usr/bin/unzip
hqin2@tg-login1:/brashear/hqin2>  unzip /arc/users/hqin2/mactower-network-failure-simulation-master.zip

Archive:  /arc/users/hqin2/mactower-network-failure-simulation-master.zip
... ...

1:55pm. This zip file is not a git repository, so I tried to git clone from the command line on blacklight.  See https://help.github.com/articles/importing-a-git-repository-using-the-command-line/

git clone --bare https://github.com/hongqin/mactower-network-failure-simulation.git 
/*this does not work on blacklight, even though it works on Byte*/

/*try directory for input file through qsub */
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> ll
-rw-r--r--   1 hqin2 mc48o9p    199 2015-06-23 14:20 R.pbs
-rw-r--r--   1 hqin2 mc48o9p   1193 2015-06-23 14:18 test1.R

hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> cat R.pbs 
#!/bin/bash
#PBS -q batch
#PBS -l ncpus=16
#PBS -l walltime=0:03:00

source /usr/share/modules/init/bash
module load R
cd $PBS_O_WORKDIR

echo hostname

ja
R --slave CMD BATCH test1.R /*not right?*/
ja -chlst
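Two lines in R.pbs look suspect. `echo hostname` prints the literal word "hostname" rather than the node name, and the documented form of batch mode is `R CMD BATCH [options] infile [outfile]`, with CMD first. The sketch below writes a corrected script under those assumptions. The function name write_pbs is hypothetical, the other lines are copied from R.pbs above, and the result has not been submitted on blacklight.

```shell
# write_pbs FILE
# Writes a PBS script to FILE. The quoted 'EOF' keeps $PBS_O_WORKDIR literal
# so it is expanded by the job, not by the shell that writes the file.
write_pbs() {
    cat > "$1" <<'EOF'
#!/bin/bash
#PBS -q batch
#PBS -l ncpus=16
#PBS -l walltime=0:03:00

source /usr/share/modules/init/bash
module load R
cd "$PBS_O_WORKDIR"

# Run the command itself; "echo hostname" would only print the word.
hostname

ja
R CMD BATCH --slave test1.R
ja -chlst
EOF
}

pbs_file=$(mktemp)
write_pbs "$pbs_file"
cat "$pbs_file"
```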

2:27pm
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> qsub R.pbs 
461387.tg-login1.blacklight.psc.teragrid.org

hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> qstat -u hqin2

tg-login1.blacklight.psc.teragrid.org: 
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
461387.tg-login1     hqin2    batch_r  R.pbs          --   --    16    --  00:03 Q   -- 


R2.pbs
#!/bin/bash
#PBS -q batch
#PBS -l ncpus=16
#PBS -l walltime=0:03:00

source /usr/share/modules/init/bash
module load R

cd $SCRATCH
ja
R --slave CMD BATCH ./test1.R
ja -chlst

/*It took about 28 minutes for the job to finish */
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> ll
total 496
-rw-r--r--   1 hqin2 mc48o9p    176 2015-06-23 14:45 R2.pbs
-rw-------   1 hqin2 mc48o9p      0 2015-06-23 15:02 R2.pbs.e461389
-rw-------   1 hqin2 mc48o9p   5002 2015-06-23 15:03 R2.pbs.o461389
-rw-r--r--   1 hqin2 mc48o9p    195 2015-06-23 14:24 R.pbs
-rw-------   1 hqin2 mc48o9p      0 2015-06-23 15:02 R.pbs.e461387
-rw-------   1 hqin2 mc48o9p   5206 2015-06-23 15:03 R.pbs.o461387
-rw-r--r--   1 hqin2 mc48o9p    160 2015-06-23 18:18 test1.R
-rw-------   1 hqin2 mc48o9p    986 2015-06-23 15:03 test1.Rout


I then use test1.R, test2.R, and test3.R to generate more ms02 network models.
I need to pass these parameters to R on the command line.

I forgot to change the wall time for the two job submissions.
How do I delete a qsub job?

hqin2@tg-login1:/brashear/hqin2/blacklight> qstat -u hqin2

tg-login1.blacklight.psc.teragrid.org: 
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
461426.tg-login1     hqin2    batch_r  test1.R        --   --    16    --  00:10 Q   -- 
461427.tg-login1     hqin2    batch_r  test2.pbs      --   --    16    --  00:03 Q   -- 
461435.tg-login1     hqin2    batch_r  test3.pbs      --   --    16    --  01:00 Q   -- 
hqin2@tg-login1:/brashear/hqin2/blacklight> qdel 461426.tg-login1
qdel: illegally formed job identifier: 461426.tg-login1
hqin2@tg-login1:/brashear/hqin2/blacklight> qdel 461426
hqin2@tg-login1:/brashear/hqin2/blacklight> qstat -u hqin2

tg-login1.blacklight.psc.teragrid.org: 
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
461427.tg-login1     hqin2    batch_r  test2.pbs      --   --    16    --  00:03 Q   -- 
461435.tg-login1     hqin2    batch_r  test3.pbs      --   --    16    --  01:00 Q   -- 
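In the transcript above, qdel rejected the shortened ID printed by qstat (461426.tg-login1) but accepted the bare number. A small sketch of stripping the suffix with shell parameter expansion, so an ID can be copied from qstat as is; the function name job_number is hypothetical.

```shell
# job_number JOB_ID
# Drops everything from the first dot onward: 461426.tg-login1 -> 461426
job_number() {
    echo "${1%%.*}"
}

job_number 461426.tg-login1
# On the cluster this could feed qdel directly (not run here):
#   qdel "$(job_number 461426.tg-login1)"
```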


/*I then changed the wall time and resubmitted the first 2 jobs */
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> qstat -u hqin2

tg-login1.blacklight.psc.teragrid.org: 
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
461435.tg-login1     hqin2    batch_r  test3.pbs      --   --    16    --  01:00 Q   -- 
461438.tg-login1     hqin2    batch_r  test1.pbs      --   --    16    --  01:00 Q   -- 
461439.tg-login1     hqin2    batch_r  test2.pbs      --   --    16    --  01:00 Q   -- 

There are problems with write.csv(): relative paths did not work under qsub.
00:40am. I added an explicit path for the output file in test2.R.
00:44am. qsub test2.pbs

3am. The jobs ran after about 1.5 hours in the queue.
hqin2@tg-login1:/brashear/hqin2> ll
total 32
-rw-------  1 hqin2 mc48o9p   69 2015-06-23 23:15 test1.Rout
-rw-------  1 hqin2 mc48o9p   69 2015-06-24 03:35 test2.Rout
-rw-------  1 hqin2 mc48o9p   69 2015-06-24 03:35 test3.Rout
hqin2@tg-login1:/brashear/hqin2> cat test2.Rout 
Fatal error: cannot open file './test2.R': No such file or directory
/* So, my pbs job submission file has path problems */
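The error shows that R looked for ./test2.R in the job's current directory, which was not the directory holding the script. PBS sets PBS_O_WORKDIR to the directory where qsub was invoked, so a job script can change there first and fail early with a clear message if the input is missing. A minimal sketch follows; the function name run_checked is hypothetical, and the R call is replaced by an echo so the snippet runs anywhere.

```shell
# run_checked SCRIPT
# Changes to the submission directory (PBS_O_WORKDIR when PBS sets it,
# otherwise the current directory) and stops if SCRIPT is not there.
run_checked() {
    script=$1
    cd "${PBS_O_WORKDIR:-$PWD}" || return 1
    if [ ! -f "$script" ]; then
        echo "cannot find $script in $PWD" >&2
        return 1
    fi
    # Placeholder for the real call, for example: R -f "$script"
    echo "would run: R -f $PWD/$script"
}
```

Inside a .pbs file the same check could sit right before the R line, so a path mistake shows up in the .e file instead of only in an .Rout file in some other directory.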

4pm. Still no output files in my intended directory:

/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI

4:36pm
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> cat test1.pbs
#!/bin/bash
#PBS -q batch
#PBS -l ncpus=16
#PBS -l walltime=0:05:00

source /usr/share/modules/init/bash
module load R

echo hostname

pwd
cd $SCRATCH/mactower-network-failure-simulation-master/ms02GINPPI
pwd

ja
R -f  test1.R > test1.dump.txt
ja -chlst

4:38pm
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> qsub test1.pbs
461579.tg-login1.blacklight.psc.teragrid.org
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> qstat -u hqin2

tg-login1.blacklight.psc.teragrid.org: 
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
461579.tg-login1     hqin2    batch_r  test1.pbs      --   --    16    --  00:05 Q   --

This worked.
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> ll -ht
total 532K
drwxr-xr-x 102 hqin2 mc48o9p 4.0K 2015-06-24 17:47 dipgin.ms02.output
-rw-------   1 hqin2 mc48o9p 1.9K 2015-06-24 17:47 test1.dump.txt
-rw-------   1 hqin2 mc48o9p 4.8K 2015-06-24 17:47 test1.pbs.o461579
-rw-------   1 hqin2 mc48o9p   39 2015-06-24 17:47 test1.pbs.e461579
-rw-r--r--   1 hqin2 mc48o9p  255 2015-06-24 16:36 test1.pbs
-rw-r--r--   1 hqin2 mc48o9p  784 2015-06-24 16:31 test1.R

June 25, 2015
I wrote a new ms02 script that takes parameters on the command line, and copied it to blacklight with scp.
-rw-r--r--   1 hqin2 mc48o9p 1.9K 2015-06-25 00:39 ms02-2015June24.R

hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> cat ms02.pbs 
#!/bin/bash
#PBS -q batch
#PBS -l ncpus=16
#PBS -l walltime=0:30:00

source /usr/share/modules/init/bash
module load R

echo hostname

pwd
cd $SCRATCH/mactower-network-failure-simulation-master/ms02GINPPI
pwd

ja
R -f ms02-2015June24.R --args 302 500

ja -chlst

00:49am 
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> qsub ms02.pbs 

461610.tg-login1.blacklight.psc.teragrid.org

hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> qstat -u hqin2

tg-login1.blacklight.psc.teragrid.org: 
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
461610.tg-login1     hqin2    batch_r  ms02.pbs    177131  --    16    --  00:30 R   -- 


Total cpus requested from running jobs: 16

I also created two more submission files, ms02b.pbs and ms02c.pbs.

hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> ll -th 
total 564K
-rw-------   1 hqin2 mc48o9p   94 2015-06-25 03:31 ms02c.pbs.e461613
-rw-------   1 hqin2 mc48o9p   94 2015-06-25 03:31 ms02b.pbs.e461612
drwxr-xr-x 210 hqin2 mc48o9p 4.0K 2015-06-25 03:30 dipgin.ms02.output
-rw-------   1 hqin2 mc48o9p   94 2015-06-25 01:20 ms02.pbs.e461610
-rw-------   1 hqin2 mc48o9p 2.7K 2015-06-25 01:00 ms02c.pbs.o461613
-rw-r--r--   1 hqin2 mc48o9p  263 2015-06-25 01:00 ms02c.pbs
-rw-------   1 hqin2 mc48o9p 2.7K 2015-06-25 01:00 ms02b.pbs.o461612
-rw-r--r--   1 hqin2 mc48o9p  262 2015-06-25 00:59 ms02b.pbs
-rw-------   1 hqin2 mc48o9p 2.7K 2015-06-25 00:49 ms02.pbs.o461610
-rw-r--r--   1 hqin2 mc48o9p  262 2015-06-25 00:47 ms02.pbs
-rw-r--r--   1 hqin2 mc48o9p 1.9K 2015-06-25 00:45 ms02-2015June24.R

It looks like my wall time is too short. 





hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> cat ms02.pbs.e461610
[Previously saved workspace restored]


=>> PBS: job killed: walltime 1865 exceeded limit 1800

My estimations: 
30 minutes is for 15 ms02 models.
150 minutes is for 100 ms02 models  
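Taking the first figure at face value, 15 models in 30 minutes is 2 minutes per model, so 100 models would need about 200 minutes if the cost per model stays constant, which is somewhat more than the 150 minutes estimated here. A small sketch of that arithmetic in shell; the function names are hypothetical and the 20 percent safety margin is an assumption, not something from the logs.

```shell
# estimate_minutes MEASURED_MODELS MEASURED_MINUTES TARGET_MODELS
# Scales the measured rate linearly and rounds up to a whole minute.
estimate_minutes() {
    echo $(( ($3 * $2 + $1 - 1) / $1 ))
}

# add_margin MINUTES PERCENT
# Pads an estimate by PERCENT, rounding up.
add_margin() {
    echo $(( ($1 * (100 + $2) + 99) / 100 ))
}

# to_walltime MINUTES
# Formats minutes as H:MM:SS for a "#PBS -l walltime=" line.
to_walltime() {
    printf '%d:%02d:00\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

minutes=$(estimate_minutes 15 30 100)   # 200
padded=$(add_margin "$minutes" 20)      # 240
to_walltime "$padded"                   # prints 4:00:00
```

With these inputs the padded request comes out at 4 hours, which covers either the 150-minute or the 200-minute estimate.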

Based on these estimations, I submitted 8 jobs, with each requesting 4 hours of walltime. 
hqin2@tg-login1:/brashear/hqin2/mactower-network-failure-simulation-master/ms02GINPPI> grep args *pbs
ms02b.pbs:R -f ms02-2015June24.R --args 310 400
ms02c.pbs:R -f ms02-2015June24.R --args 401 500
ms02d.pbs:R -f ms02-2015June24.R --args 501 600
ms02e.pbs:R -f ms02-2015June24.R --args 601 700
ms02f.pbs:R -f ms02-2015June24.R --args 799 800
ms02g.pbs:R -f ms02-2015June24.R --args 800 900
ms02h.pbs:R -f ms02-2015June24.R --args 900 1000
ms02.pbs:R -f ms02-2015June24.R --args 100 200
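The ranges in the grep output were typed by hand, and some share an endpoint (800 and 900 each appear in two jobs), which would repeat a model if both bounds are inclusive. A sketch of generating non-overlapping ranges and one submission file per range is given below. The function names and the output file names are hypothetical, the script body is copied from ms02.pbs with a 4-hour walltime, and none of this was run on blacklight.

```shell
# make_ranges FIRST LAST STEP
# Prints "start end" pairs that cover FIRST..LAST without overlap.
make_ranges() {
    start=$1
    while [ "$start" -le "$2" ]; do
        end=$(( start + $3 - 1 ))
        if [ "$end" -gt "$2" ]; then end=$2; fi
        echo "$start $end"
        start=$(( end + 1 ))
    done
}

# write_jobs OUTDIR FIRST LAST STEP
# Writes one PBS file per range into OUTDIR.
write_jobs() {
    outdir=$1
    shift
    make_ranges "$@" | while read -r start end; do
        # \$SCRATCH stays literal in the file; $start and $end are filled in now.
        cat > "$outdir/ms02_${start}_${end}.pbs" <<EOF
#!/bin/bash
#PBS -q batch
#PBS -l ncpus=16
#PBS -l walltime=4:00:00

source /usr/share/modules/init/bash
module load R
cd \$SCRATCH/mactower-network-failure-simulation-master/ms02GINPPI

ja
R -f ms02-2015June24.R --args $start $end
ja -chlst
EOF
    done
}

make_ranges 101 400 100
```

Each generated file could then be submitted in a loop with qsub, one call per file.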


transfer files to blacklight 20150607 and 20150608

Sunday 20150607

After VPN into the Spelman network, scp from helen.spelman.edu to data.psc.edu using my xsede login works.

At helen: scp test.txt hqin2@blacklight.psc.xsede.org:./.

6pm. Somehow, "mv" and "cp" from the login node to $SCRATCH freeze my shell.


================
Monday 20150608

From http://www.psc.edu/index.php/resources-for-users/computing-resources/blacklight
A sample set of commands on your local machine would be
    tar cf sourcedir.tar sourcedir
    scp sourcedir.tar joeuser@data.psc.xsede.org:
For 'joeuser' you substitute your PSC userid. You can compress your tarball before you transfer it to speed up your transfer times. Then you could login to blacklight and issue the commands
    cd $SCRATCH
    tar xf /arc/users/joeuser/sourcedir.tar
Again for 'joeuser' you substitute your userid. This will unroll your tar file in your scratch directory.


On my byte laptop without VPN, I tried an scp transfer, which takes about 3.5 minutes.
Byte-2:projects hqin$ scp 0.ginppi.tar.gz hqin2@blacklight.psc.xsede.org:./.
hqin2@blacklight.psc.xsede.org's password:
0.ginppi.tar.gz                                                                                               100%  386MB   1.9MB/s   03:23

Byte-2:projects hqin$ scp 0.ginppi.tar.gz hqin2@data.psc.edu:./.
hqin2@data.psc.edu's password:
0.ginppi.tar.gz                                                                                               100%  386MB   2.1MB/s   03:02



On blacklight, I found that data.psc.edu (or data.psc.xsede.org) is linked to /arc/users/hqin2
hqin2@tg-login1:/brashear/hqin2> ls -lh /arc/users/hqin2
total 2.6G
-rw-r--r-- 1 hqin2 mc48o9p 3.6G 2015-06-07 16:50 0.tar
So, this is an entry and exit point for transferring data between my computer and blacklight.

hqin2@tg-login1:~> cd $SCRATCH  (This is probably a key step)
hqin2@tg-login1:/brashear/hqin2> ls
qin  test.txt
hqin2@tg-login1:/brashear/hqin2> mkdir tmp
hqin2@tg-login1:/brashear/hqin2> cd tmp/
hqin2@tg-login1:/brashear/hqin2/tmp> tar xf /arc/users/hqin2/0.tar

hqin2@tg-login1:/brashear/hqin2/tmp> ll
total 4
drwxr-xr-x 15 hqin2 mc48o9p 4096 2014-07-20 10:39 0.ginppi.reliability.simulation
hqin2@tg-login1:/brashear/hqin2/tmp> du -sh
3.5G    .

hqin2@tg-login1:/brashear/hqin2> tar xvfz /arc/users/hqin2/0.ginppi.tar.gz

OK, I now know how to transfer files to blacklight.




Sunday, June 7, 2015

qsub PBS on blacklight

$ qsub -I
/* this can take a while for the job to start. Just have to wait it out. */