====== Submitting jobs on the batch system of Tier3 ======
This page summarizes the basic commands to submit jobs on the batch system of Tier3.
Find more information on the PSI CMS Tier3 TWiki:
* [[https://wiki.chipp.ch/twiki/bin/view/CmsTier3/HowToSubmitJobs| How to submit jobs ]]
* [[https://wiki.chipp.ch/twiki/bin/view/CmsTier3/HowToDebugJobs| How to debug jobs ]]
* [[https://wiki.chipp.ch/twiki/bin/view/CmsTier3/HowToAccessSe| How to access the storage element for large files]]
===== Simple example =====
Say you have a [[computing:batch:scriptexample|python script]] in ''~/test'' that requires a ''CMSSW'' environment and some input ''foo'', and that you would normally execute from the command line with
python myAnalysis.py foo
to produce some output file ''tree_foo.root''. Then the most straightforward way to submit it as a job is with a shell script (called ''submitAnalysis.sh'' below) of the form
#!/bin/bash
# set variables
INPUT=$1
OUTFILE="tree_${INPUT}.root"
JOBDIR=$OUTFILE
BASEDIR=$HOME/test
OUTDIR=$BASEDIR/$JOBDIR
TOPWORKDIR=/scratch/$USER
WORKDIR=$TOPWORKDIR/$JOBDIR
# set CMSSW environment for script
source /afs/cern.ch/cms/cmsset_default.sh
cd $BASEDIR/CMSSW_5_3_24/src
eval `scram runtime -sh`
# run script on working node's scratch
mkdir -p $WORKDIR
cd $WORKDIR
python $BASEDIR/myAnalysis.py $INPUT
# copy output back
mkdir -p $OUTDIR
cp $WORKDIR/$OUTFILE $OUTDIR/$OUTFILE
rm -rf $WORKDIR
exit 0
which you submit to run on the batch system with
qsub submitAnalysis.sh foo
__Note for beginners__: ''shell'' variables are assigned //without// spaces around the ''='' sign, and can be accessed with the ''$'' sign. Arguments passed to a shell script are accessed with ''$1'', ''$2'', ... Example of basic syntax:
VAR="Hello"
echo $VAR
echo "$VAR, World"
echo "${VAR}_World"
MY_NAME=`whoami`
echo "$MY_NAME, you are here: `pwd`"
__Note for beginners__: You need the ''CMSSW'' environment to have ''ROOT''. To get a release, e.g. ''CMSSW_5_3_24'', and initialize it, do
source $VO_CMS_SW_DIR/cmsset_default.sh
cmsrel CMSSW_5_3_24
cd CMSSW_5_3_24/src
cmsenv
===== Queues =====
Depending on how long your script needs to run, you might need to choose a different [[https://wiki.chipp.ch/twiki/bin/view/CmsTier3/HowToSubmitJobs#Queues|queue]]. Specify the queue with the ''-q'' option. The default queue is ''all.q''.
qsub -q long.q submitAnalysis.sh foo
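If you are not sure which queues exist or what their limits are, ''qconf'' from the same grid engine can show you (queue names and limits depend on the site configuration):
qconf -sql            # list the names of all available queues
qconf -sq long.q      # print the full configuration of one queue, including run time limits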
===== Managing jobs =====
Jobs can be monitored with the command ''qstat''; in its output, the state codes mean:
* ''r'' means running,
* ''qw'' means waiting in queue,
* ''E'' means in error state.
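For illustration, the ''qstat'' output looks roughly like this (the job IDs, names and hostnames below are made up):
job-ID  prior    name      user    state  submit/start at      queue                slots
-----------------------------------------------------------------------------------------
 123456 0.55500  tree_foo  myuser  r      01/01/2016 12:00:00  all.q@t3wn10.psi.ch  1
 123457 0.00000  tree_bar  myuser  qw     01/01/2016 12:00:05                       1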
More details on a job can be found with ''qstat -j <jobid>''. If your jobs are named, you can also use ''qstat -j <jobname>'', which may even contain a wildcard ''*''.
==== Change order of jobs ====
Change the order of submission of the jobs waiting in the queue with ''qalter -js <jobshare> <jobid>''. The default job share is ''0'' and any integer value (e.g. ''100'') will give the specified job a higher priority. The higher the value, the higher the priority.
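For example, to prioritize a job that is still waiting in the queue (the job ID below is made up):
qalter -js 100 123457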
==== Delete ====
Jobs can be deleted with ''qdel <jobid>''. To delete //all// your jobs, use ''qdel -u <username>''.
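For example (again with a made-up job ID; ''$USER'' expands to your own username):
qdel 123456
qdel -u $USER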
__Protip__: Quickly count the number of jobs that are running or waiting in the queue with ''grep'':
qstat | grep " r " | wc -l
qstat | grep " qw " | wc -l
To learn more details about one specific job, use ''qstat -j <jobid>''. In case it is in an error state, use ''qstat -explain E -j <jobid>''.
===== Debugging =====
==== Debugging jobs interactively ====
The T3 TWiki has a page with information on [[https://wiki.chipp.ch/twiki/bin/view/CmsTier3/HowToDebugJobs|debugging jobs interactively]] with the ''qlogin'' command:
qlogin -q debug.q -l hostname=t3wn22 -l h_vmem=400M
==== Redirecting standard output and error streams ====
If you want to isolate and save the standard output and standard error streams (//stdout// and //stderr//) of your main script, which would normally be printed in the terminal window, you can redirect them as usual with ''>>'' and ''2>>'':
python $BASEDIR/myAnalysis.py $INPUT >> myout.txt 2>> myerr.txt
and then copy the text files back to where you want them.
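If you follow the simple example above, this can be done in the same step as copying the output file, reusing the ''$WORKDIR'' and ''$OUTDIR'' variables defined there:
cp $WORKDIR/myout.txt $WORKDIR/myerr.txt $OUTDIR/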
==== Log files location ====
By default, ''qsub'' will create two log files: one with the stdout and one with the stderr of the submission script. They will be saved in your home directory ''~/'' with names of the form ''<jobname>.o<jobid>'' and ''<jobname>.e<jobid>'', respectively.
To save these log files in a custom location, use ''-o <path>'' and ''-e <path>'', for example:
qsub -o $HOME/jobs_reports -e $HOME/jobs_reports submitAnalysis.sh foo
To keep your submission command short, you can also put the ''-o'' and ''-e'' options in your submit script instead, using the following syntax with ''#$'':
#$ -o /shome/myusername/jobs_reports/
#$ -e /shome/myusername/jobs_reports/
__Nota bene!__
Make sure these directories already exist before running the script with ''qsub''. Also, use absolute paths without anything that needs to be expanded or evaluated (like ''~/'', ''$HOME'' or ''`whoami`'').
==== Job name ====
Another ''qsub'' option that is very useful to keep track of your log files is ''-N <jobname>''. This flag sets the name of each job, which shows up in the ''qstat'' output and in the names of the log files described above. By default, the job name is the name of the shell script, ''submitAnalysis.sh'' in the example above.
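For example, with the script from the simple example above:
qsub -N myjob_foo submitAnalysis.sh foo
The log files are then named ''myjob_foo.o<jobid>'' and ''myjob_foo.e<jobid>'', and all such jobs can be matched at once with ''qstat -j "myjob_*"''.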
===== Storage element =====
You can also copy large output files to the storage element using the copy command **''lcg-cp''** or the recommended **''xrdcp''** from XROOTD, see [[https://wiki.chipp.ch/twiki/bin/view/CmsTier3/HowToAccessSe|the TWiki on how to access the SE]] or [[computing:storage|our page on the storage element]].
USER_SE_HOME="srm://t3se01.psi.ch:8443/srm/managerv2?SFN=/pnfs/psi.ch/cms/trivcat/store/user/$USER"
SERESULTDIR=$USER_SE_HOME/"analysis"
lcg-cp -b -D srmv2 file:$WORKDIR/$OUTFILE $SERESULTDIR/$OUTFILE
or
USER_SE_HOME="root://t3dcachedb.psi.ch:1094//pnfs/psi.ch/cms/trivcat/store/user/$USER"
SERESULTDIR=$USER_SE_HOME/"analysis"
xrdcp -f $WORKDIR/$OUTFILE $SERESULTDIR/$OUTFILE
Note that in these examples you might need to create the necessary parent directories (''analysis'' in this example) in your SE home if they don't exist yet:
gfal-mkdir -p gsiftp://t3se01.psi.ch//pnfs/psi.ch/cms/trivcat/store/user/$USER/analysis
or more generally
gfal-mkdir -p gsiftp://t3se01.psi.ch/`echo $SERESULTDIR | grep -o '/pnfs/psi.ch/.*'`
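To check that a file has actually arrived on the SE, you can list the target directory, for instance with ''xrdfs'' from XROOTD (a sketch, assuming the same dCache door as in the ''xrdcp'' example above):
xrdfs t3dcachedb.psi.ch:1094 ls /pnfs/psi.ch/cms/trivcat/store/user/$USER/analysis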
===== Complete example =====
A complete example of a bash script for submitting jobs is [[computing:batch:jobscriptexample|submitExample.sh]].
Variables to be configured:
* **SEOUTFILES** : name of the output file. It has to be the same as in the python script that you run with ''cmsRun''
* **HN_NAME** : your username as set in your PSI account
* **CMSSW_DIR** : CMSSW directory
* **CMSSW_CONFIG_FILE** : the path and the name of the python script you want to run with cmsRun
The bash script can be run with **''qsub''**:
qsub example_job.sh
===== Splitting jobs =====
If you want to split the events over several jobs, you can do it manually as in the example [[computing:batch:splitjobexample|example_splitjobs.py]] and run it with **python**. In this example the command-line inputs are maxEvents, firstEvent, inputFileNames and the seed for the PU simulation. This works only if you first make the CMSSW python script configurable, which can be done following [[https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCommandLineParsing|Command line option parsing]].
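A minimal sketch of how such split jobs could be submitted in a loop, assuming a hypothetical wrapper script ''submitSplit.sh'' that forwards maxEvents, firstEvent, the input file name and the seed to the python script:
# all names below are illustrative: 10 jobs of 1000 events each
NEVENTS=1000
for i in $(seq 0 9); do
  qsub -N split_$i submitSplit.sh $NEVENTS $((i*NEVENTS+1)) input.root $i
done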
===== Monitoring busyness on the batch system =====
__Protip__: You can see how busy the batch system is with other users' jobs using this command:
qstat -u \* | tail -n +3 | awk '{if($5=="r"){r[$4]++} j[$4]++} END { for(n in j){ if(r[n]==""){ r[n]=0 } printf "%7s / %-5s - %s\n",r[n],j[n],n }}'