Instructions:
"Once you login you will be in your $HOME directory
(/usr/users/1/hqin2) which is backed up but has a quota of 5 Gbytes.
You also have access to a $SCRATCH directory (/brashear/hqin2) which
has essentially unlimited storage and is not backed up. Files in
$SCRATCH may be removed, oldest first, to make room when needed,
though we try to keep them for 2-weeks at least.
There is a file archiver, you can access it as the directory /arc/users/hqin2/ from the login node, where you can store whatever you need to keep long-term (while your allocation is active, of course). You can also connect to the archiver via sftp, at data.psc.edu. You can use Fugu or any other graphical user interface if you prefer. This is the simplest way to transfer files to PSC, you can see them in the /arc directory from the login node and copy them to/from the $HOME or $SCRATCH directory as needed.
When you run and write data, we prefer that you write to $SCRATCH, which is a distributed file system and can handle the load, and not to $HOME."
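The storage layout described above suggests a simple pattern: keep long-term copies under /arc, copy inputs to $SCRATCH before a run, and copy results back afterwards. Below is a minimal sketch of that pattern, assuming plain cp is adequate for the file sizes involved. The helper name stage_dir and the subdirectory names in the usage comments are hypothetical, not part of the PSC instructions.

```shell
#!/bin/bash
# stage_dir SRC DEST: copy the contents of SRC into DEST, creating DEST if needed.
# Hypothetical helper for illustration; not a PSC-provided tool.
stage_dir() {
    local src=$1 dest=$2
    if [ ! -d "$src" ]; then
        echo "stage_dir: no such directory: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # -p keeps timestamps and modes; "$src"/. copies the contents, dotfiles included
    cp -Rp "$src"/. "$dest"/
}

# Example usage (the project1 paths are placeholders):
#   stage_dir /arc/users/hqin2/project1 "$SCRATCH/project1"                  # before the run
#   stage_dir "$SCRATCH/project1/results" /arc/users/hqin2/project1/results  # after the run
```

Since $SCRATCH is not backed up and old files may be removed, copying results back to /arc soon after a job finishes seems the safer habit.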
hqin2@tg-login1:~> echo $HOME
/usr/users/1/hqin2
hqin2@tg-login1:~> echo $SCRATCH
/brashear/hqin2
hqin2@tg-login1:~> du /arc/users/hqin2
2 /arc/users/hqin2
hqin2@tg-login1:~> df /arc/users/hqin2
Filesystem 1K-blocks Used Available Use% Mounted on
/arc 3656882477312 2021505932032 1635376545280 56% /arc
hqin2@tg-login1:~> df -h /arc/users/hqin2
Filesystem Size Used Avail Use% Mounted on
/arc 3.4P 1.9P 1.5P 56% /arc
Instructions:
"Look at this webpage:
http://www.psc.edu/index.php/computing-resources/blacklight
it has examples of scripts for running batch jobs, in particular I think you will want to run an 'interactive batch job' to check that your code works.
qsub -I -l ncpus=16 -l walltime=0:30:00 -q debug
once you get a prompt, you are on the 'backend', or 'compute node', i.e. Blacklight proper, and everything runs there, not on the login node.
Let's say I have a trivial R example:
y <- rnorm(10)
print(y)
this is saved in a file (example.R), and I want to run it. So I type the 'qsub ....' command above, and after I got an interactive prompt, enter the following;
source /usr/share/modules/init/bash
module load R
R --slave CMD BATCH ./example.R
and the output appears in 'example.Rout'. OK, so I'm done. To get out of the 'compute node', I type 'exit' and press enter.
The first line (source ...) loads the definition of the 'module' command, the second uses the module command to put (a version of) R in my path, and the last executes the R script in batch mode.
Once I have figured out that everything is working, I can run the script in full batch mode (non-interactively) by putting this into a PBS script, i.e. a file, let's call it 'R.pbs':
#!/bin/bash
#PBS -q batch
#PBS -l ncpus=16
#PBS -l walltime=0:03:00
source /usr/share/modules/init/bash
module load R
cd $PBS_O_WORKDIR
ja
R --slave CMD BATCH ./example.R
ja -chlst
So you are just entering the commands you typed interactively, after a line that indicates what 'shell' you want to run under, and some options to the batch scheduler (the number of cores, and the minutes, which you had entered on the command line before). What is new is the "cd $PBS_O_WORKDIR" which makes the script start on whatever directory you were when you submitted the command. Also, the couple of lines "ja" and "ja -chlst" surrounding the call to R. They are not essential, but collect useful information on the job (maximum amount of memory, time spent, cpu time used, etc.)
So you have this script called 'R.pbs', and you can submit it to the scheduler with the command
qsub R.pbs
The scheduler will reply with something like:
394363.tg-login1.blacklight.psc.teragrid.org
the number is the 'job ID' of your PBS job, which you can use to ask for more information from the scheduler. You can always ask it 'what jobs do I have in the queue' like this:
qstat -u hqin2
and it will list them all, together with the state (R means running, Q means it is still in the queue). If it lists nothing, it means all your jobs completed. After the job has completed, a couple of files should appear in the directory where you put the script. Since I didn't use any option to give the job a name, the files would be named {script name}.e#### and {script name}.o####, in the example that would be R.pbs.o####### and R.pbs.e#######. The 'o' file has any output that the job would write to the standard output, the 'e' file anything that would normally go to the standard error file. You can also redirect output from any command in the job script to a file."
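The file naming rule at the end of the quoted instructions can be written down as a small sketch. It assumes, as in the examples in this post, that the numeric job ID is everything before the first dot in the scheduler's reply and that no job name option was given. The function names are hypothetical.

```shell
#!/bin/bash
# job_number REPLY: print the numeric part of a qsub reply such as
# "394363.tg-login1.blacklight.psc.teragrid.org" (everything before the first dot).
job_number() {
    printf '%s\n' "${1%%.*}"
}

# pbs_output_files SCRIPT REPLY: print the expected stdout and stderr file names,
# assuming the default naming {script name}.o#### and {script name}.e####.
pbs_output_files() {
    local script=$1 num
    num=$(job_number "$2")
    printf '%s.o%s\n%s.e%s\n' "$script" "$num" "$script" "$num"
}

# Example usage on a system with PBS (not run here):
#   reply=$(qsub R.pbs)
#   pbs_output_files R.pbs "$reply"
```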
source /usr/share/modules/init/bash
module load R
R --slave CMD BATCH ./example.R
hqin2@tg-login1:~> ll example.R* #output is example.Rout
-rw-r--r-- 1 hqin2 mc48o9p 24 2015-01-13 20:47 example.R
-rw-r--r-- 1 hqin2 mc48o9p 942 2015-01-13 20:48 example.Rout
hqin2@tg-login1:~> nano -w R.pbs
hqin2@tg-login1:~> pwd
/usr/users/1/hqin2
hqin2@tg-login1:~> qsub R.pbs
418673.tg-login1.blacklight.psc.teragrid.org
hqin2@tg-login1:~> qstat -u hqin2
tg-login1.blacklight.psc.teragrid.org:
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
418673.tg-login1     hqin2    batch_r  R.pbs      --      --   16   --     00:03 Q --
hqin2@tg-login1:~>
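To check a job's state from a script rather than by eye, the qstat listing above can be filtered with awk. This is only a sketch: it assumes the column layout shown in this session, where the state is the second-to-last field of each job row. job_state is a hypothetical helper name.

```shell
#!/bin/bash
# job_state JOBNUM: read `qstat -u USER` output on stdin and print the state
# column (e.g. Q or R) for the job whose ID starts with JOBNUM followed by a dot.
# Prints nothing if the job is not listed.
job_state() {
    awk -v id="$1" 'index($1, id ".") == 1 { print $(NF-1) }'
}

# Example usage on a system with PBS (not run here):
#   qstat -u hqin2 | job_state 418673
```

An empty result would correspond to the case in the instructions where qstat lists nothing because the job has completed.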
Nothing was in the output file, so I changed the run line to "R -f example.R".
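The change is a single line in the job script. R CMD BATCH writes its output to a .Rout file (example.Rout here) rather than to standard output, which may be why the PBS output file was empty, whereas R -f prints to standard output, which PBS collects in the .o file. The sketch below writes the job script with the R command as a parameter so the two variants can be compared. write_pbs is a hypothetical helper; the queue, core count, and walltime are copied from the script in this post.

```shell
#!/bin/bash
# write_pbs FILE RUN_LINE: write a PBS job script to FILE that runs RUN_LINE
# between the two job-accounting calls. Hypothetical helper for illustration.
write_pbs() {
    local file=$1 run_line=$2
    # The quoted 'EOF' keeps $PBS_O_WORKDIR literal so it expands when the job runs.
    cat > "$file" <<'EOF'
#!/bin/bash
#PBS -q batch
#PBS -l ncpus=16
#PBS -l walltime=0:03:00
source /usr/share/modules/init/bash
module load R
cd $PBS_O_WORKDIR
ja
EOF
    printf '%s\n' "$run_line" >> "$file"
    printf '%s\n' 'ja -chlst' >> "$file"
}

# Example usage:
#   write_pbs R2.pbs 'R -f example.R'
#   qsub R2.pbs        # only on a system with PBS
```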
hqin2@tg-login1:~/test> ls
example.R R2.pbs
hqin2@tg-login1:~/test> ll
total 8
-rw-r--r-- 1 hqin2 mc48o9p 24 2015-01-13 22:33 example.R
-rw-r--r-- 1 hqin2 mc48o9p 199 2015-01-13 22:33 R2.pbs
hqin2@tg-login1:~/test> qsub R2.pbs
418692.tg-login1.blacklight.psc.teragrid.org
hqin2@tg-login1:~/test> qstat -u hqin2
tg-login1.blacklight.psc.teragrid.org:
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
418692.tg-login1     hqin2    batch_r  R2.pbs     --      --   16   --     00:03 Q --
hqin2@tg-login1:~/test> cat R2.pbs
#!/bin/bash
#PBS -q batch
#PBS -l ncpus=16
#PBS -l walltime=0:03:00
source /usr/share/modules/init/bash
module load R
cd $PBS_O_WORKDIR
ja
#R --slave CMD BATCH ./example.R
R -f example.R
ja -chlst
hqin2@tg-login1:~/test> ll
total 16
-rw-r--r-- 1 hqin2 mc48o9p 24 2015-01-13 22:33 example.R
-rw-r--r-- 1 hqin2 mc48o9p 199 2015-01-13 22:33 R2.pbs
-rw------- 1 hqin2 mc48o9p 0 2015-01-13 23:13 R2.pbs.e418692
-rw------- 1 hqin2 mc48o9p 4905 2015-01-13 23:13 R2.pbs.o418692
hqin2@tg-login1:~/test> cat R2.pbs.o418692
R version 2.15.3 (2013-03-01) -- "Security Blanket"
Copyright (C) 2013 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> y = rnorm(10)
> print (y)
[1] -0.46271891 0.34547494 -0.97556883 -0.64659599 0.01052027 0.06472313
[7] 0.43858725 0.83961732 -0.74945123 0.15012829
>
Job Accounting - Command Report
===============================
Command         Started  Elapsed    User CPU   Sys CPU    CPU        Block I/O  Swap In    CPU MEM    Characters          Logical I/O       CoreMem  VirtMem  Ex
Name            At       Seconds    Seconds    Seconds    Delay Secs Delay Secs Delay Secs Avg Mbytes Read      Written   Read     Write    HiValue  HiValue  St  Ni  Fl SBU's
=============== ======== ========== ========== ========== ========== ========== ========== ========== ========= ========= ======== ======== ======== ======== === === == =======
# CFG ON( 1) ( 7) 23:13:32 01/13/2015 System: Linux bl0.psc.teragrid.org 2.6.32.49-0.3-default #1 SMP 2011-12-02 11:28:04 +0100 x86_64
ja              23:13:32 0.31       0.00       0.00       0.00       0.00       0.00       0.85       0.019     0.000     19       3        1064     23780    0   0      0.00
uname           23:13:32 0.00       0.00       0.00       0.00       0.00       0.00       12.64      0.004     0.000     8        1        664      5316     0   0      0.00
R               23:13:32 0.00       0.00       0.01       0.00       0.00       0.00       0.00       0.000     0.000     0        1        884      12616    0   0   F  0.00
sed             23:13:32 0.00       0.00       0.01       0.00       0.00       0.00       0.00       0.004     0.000     10       1        816      5396     0   0      0.00
R               23:13:32 0.00       0.00       0.01       0.00       0.00       0.00       0.00       0.000     0.000     0        1        888      12616    0   0   F  0.00
sed             23:13:32 0.00       0.00       0.01       0.00       0.00       0.00       0.00       0.004     0.000     10       1        812      5396     0   0      0.00
R               23:13:32 0.00       0.00       0.01       0.00       0.00       0.00       0.00       0.000     0.000     0        0        856      12612    0   0   F  0.00
rm              23:13:33 0.01       0.00       0.00       0.00       0.00       0.00       0.96       0.012     0.000     20       0        712      5336     0   0      0.00
R               23:13:33 0.35       0.22       0.08       0.00       0.00       0.00       70.16      4.166     0.001     190      25       32412    75240    0   0      0.00
Job CSA Accounting - Summary Report
====================================
Job Accounting File Name : /dev/tmpfs/418692/.jacct65df3
Operating System : Linux bl0.psc.teragrid.org 2.6.32.49-0.3-default #1 SMP 2011-12-02 11:28:04 +0100 x86_64
User Name (ID) : hqin2 (51231)
Group Name (ID) : mc48o9p (15132)
Project Name (ID) : ? (0)
Job ID : 0x65df3
Report Starts : 01/13/15 23:13:32
Report Ends : 01/13/15 23:13:33
Elapsed Time : 1 Seconds
User CPU Time : 0.2200 Seconds
System CPU Time : 0.1090 Seconds
CPU Time Core Memory Integral : 5.2741 Mbyte-seconds
CPU Time Virtual Memory Integral : 15.2699 Mbyte-seconds
Maximum Core Memory Used : 31.6523 Mbytes
Maximum Virtual Memory Used : 73.4766 Mbytes
Characters Read : 4.2103 Mbytes
Characters Written : 0.0012 Mbytes
Logical I/O Read Requests : 257
Logical I/O Write Requests : 33
CPU Delay : 0.0030 Seconds
Block I/O Delay : 0.0002 Seconds
Swap In Delay : 0.0000 Seconds
Number of Commands : 9
System Billing Units : 0.0000
hqin2@tg-login1:~/test>
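The summary report is a list of "label : value" lines, so individual figures such as peak memory can be pulled out of the .o file, for example to size later jobs. This is a sketch that assumes the report format shown above; ja_field is a hypothetical helper name.

```shell
#!/bin/bash
# ja_field LABEL: read a job's .o file on stdin and print the value that follows
# "LABEL :" in the ja summary report (e.g. "31.6523 Mbytes").
ja_field() {
    awk -v label="$1" '
        {
            pos = index($0, " : ")
            if (pos == 0) next                  # not a "label : value" line
            key = substr($0, 1, pos - 1)
            val = substr($0, pos + 3)
            gsub(/^[ \t]+|[ \t]+$/, "", key)    # trim padding around the label
            gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim padding around the value
            if (key == label) print val
        }'
}

# Example usage:
#   ja_field "Maximum Core Memory Used" < R2.pbs.o418692
```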
Note: I compared today's R.pbs with job1.sh from 20150112.
The line "source /usr/share/modules/init/bash" seems to be critical: it makes sure that the "module" command is recognized.
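A job script can guard against a missing "module" command explicitly. The sketch below checks whether "module" is already defined and sources the init script only if it is not. The default path is the one used in this post; ensure_module is a hypothetical helper, and other systems may keep the init script elsewhere.

```shell
#!/bin/bash
# ensure_module [INIT_FILE]: make sure the `module` command is defined in this
# shell, sourcing INIT_FILE if it is not. Hypothetical helper for illustration.
ensure_module() {
    local init=${1:-/usr/share/modules/init/bash}
    if type module >/dev/null 2>&1; then
        return 0                      # already defined, nothing to do
    fi
    if [ -r "$init" ]; then
        source "$init"
    fi
    type module >/dev/null 2>&1       # succeed only if `module` now exists
}

# Example usage at the top of a job script:
#   ensure_module || { echo "module command not available" >&2; exit 1; }
#   module load R
```

Failing early with a clear message is easier to diagnose than a "module: command not found" line buried in the .e file.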
See: XSEDE note on 20150111 http://hongqinlab.blogspot.com/2015/01/qsub-usage.html