====== Getting started with Ganga ======
===== Introduction =====
Ganga is a system for job submission to different backends.
===== Where to run =====
Ganga can be run on any lxplus node. The preferred option, however, is to run it locally from Zurich on grid-ui, as the disk quota constraints there are less restrictive.
===== Ganga configuration on Zurich Grid Cluster =====
Before starting Ganga for the first time on the Zurich cluster, download the following [[http://lhcb.physik.uzh.ch/wiki/.gangarc|.gangarc]] file and install it in your home directory. This will create the correct Ganga environment for you.
===== Starting Ganga on lxplus / grid-ui =====
To set the environment:
SetupProject Ganga
The following line is optional. Only needed if you want a fresh .gangarc file.
ganga -g
To start Ganga:
ganga
You can also start Ganga with a GUI:
ganga --gui
===== Create JobTemplate in Ganga =====
To create a JobTemplate object called "t", specifying the DaVinci version you are using:
t = JobTemplate( application = DaVinci( version = "v20r3" ))
The next lines are optional, in case you want to check out packages:
t.application.getpack( "Tutorial/Analysis v7r6" )
t.application.getpack( "Phys/DaVinci v20r3" )
Define your masterpackage:
t.application.masterpackage = "Phys/DaVinci"
Define your options file:
t.application.optsfile = [ "~/cmtuser/DaVinci_v20r3/Tutorial/Analysis/options/BsKK2Feb09.py" ]
Compile the application (only needed if you need to compile the libraries for a different architecture or you have uncompiled code):
t.application.make()
===== Create a job in Ganga =====
Create a job from your JobTemplate "t" and specify the backend (where you want to run your job):
* Interactive() to run interactively
* Local() to run on the local lxplus node
* LSF() to run on lxbatch -> specify a queue, otherwise your job may eventually get killed before it finishes
* PBS() to run on the batch system in Zurich
* Dirac() to run on the GRID
possible LSF queues:
* 8nm (8 normalised minutes)
* 1nh (1 normalised hour)
* 8nh (8 normalised hours)
* 1nd (1 normalised day)
* 1nw (1 normalised week)
j = Job( t, backend = LSF( queue = '1nd' ) )
submit the job:
j.submit()
===== Forcing a job into status completed or failed =====
For example for job 396:
jobs(396).force_status('failed', force=True)
Instead of 'failed' one can choose 'completed'.
===== How to resubmit a stalled job or subjob =====
Copy the job or subjob and submit the copy:
j = jobs(jobNumber).subjobs(subJobNumber).copy()
j.submit()
===== How to resubmit all failed subjobs =====
Sometimes it is also useful to ban a site and resubmit the jobs.
for js in jobs(24).subjobs.select(status='failed'):
    js.backend.settings['BannedSites'] = ['LCG.CERN.ch'] # optional
    js.resubmit()
or
for job in jobs.select(5, 11):
    for js in job.subjobs:
        if js.status == 'failed':
            js.backend.settings['BannedSites'] = ['LCG.CERN.ch'] # optional
            js.resubmit()
===== How to reset the monitoring =====
Sometimes the monitoring can get stuck and your job stays (for example) in the ''completing'' state forever. You can reset the monitoring for a specific job with:
jobs(x).subjobs(y).backend.reset()
===== Writing a script for Ganga =====
Instead of typing the above lines in the interactive mode in Ganga, you can also write a Python script that does all this for you, e.g.:
#This is gangasub.py
t = JobTemplate( application = DaVinci( version = "v20r3" ))
t.application.masterpackage = "Phys/DaVinci"
t.application.optsfile = [ "~/cmtuser/DaVinci_v20r3/Tutorial/Analysis/options/BsKK2Feb09.py" ]
t.application.make()
j = Job( t, backend = LSF( queue = '1nd' ) )
j.submit()
You can then execute the script from the command line by typing:
ganga -i gangasub.py
The ''-i'' is for interactive and will start the interactive ganga shell after the job submission.
Alternatively you can execute the script from within a running ganga session:
%run gangasub.py
(This sometimes may lead to trouble...)
===== Using inputdata =====
The recommended way of specifying the data files to run over is by using inputdata.
If you want to select the data in the bookkeeping use:
j.inputdata = browseBK()
Alternatively you can load data from a file:
%run inputData.py
j.inputdata = inputPFN
# or
#j.inputdata = inputLFN
With
# for Physical File Names
inputPFN = [
"PFN:castor://castorlhcb.cern.ch:9002//castor/cern.ch/grid/lhcb/MC/MC09/DST/00004831/0000/00004831_00000001_1.dst",
"PFN:castor://castorlhcb.cern.ch:9002//castor/cern.ch/grid/lhcb/MC/MC09/DST/00004831/0000/00004831_00000002_1.dst",
]
# or for Logical File Names
inputLFN = [
"LFN:/lhcb/MC/MC09/DST/00004831/0000/00004831_00000001_1.dst",
"LFN:/lhcb/MC/MC09/DST/00004831/0000/00004831_00000001_1.dst",
]
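For completeness: a PFN entry is just the matching LFN with a ''PFN:'' prefix and the site-specific access path in front. A minimal plain-Python sketch for splitting the prefix off such an entry (the helper name ''bare_name'' is our own, not a Ganga function):

```python
# Hypothetical helper, not part of Ganga: split the "PFN:"/"LFN:" prefix
# off a file-name entry as used in the lists above.
def bare_name(entry):
    kind, _, path = entry.partition(':')
    return kind, path

# bare_name("LFN:/lhcb/MC/MC09/DST/.../file.dst") gives ("LFN", "/lhcb/MC/MC09/DST/.../file.dst")
```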
===== Input Sandbox =====
If you need to send a file with the job (f.ex. a new private database), you can specify these files in the input sandbox.
j.inputsandbox = ['myfile1.txt','myfile2.py']
where j is your job. You can access these files in your job-options without giving a subdirectory, for example:
myclass.input = "myfile1.txt"
Note: You don't need to put your private libraries (if you have modified a package) in the input sandbox. Ganga does this automatically for you.
===== Output =====
There are two sorts of output locations:
* **Sandbox**: Used for small output (N-tuples, text files). These files are normally sent back to the user no matter where the job was executed.
* **Output**: Used for larger files, e.g. DSTs.
If the output of your job is a root-file created with NTupleSvc (in a Gaudi (DaVinci, Gauss) job), it will be automatically stored in the Output sandbox. This is also true for all files created with HistogramPersistencySvc and MicroDSTStream. Files created with GaussTape, DigiWriter and DstWriter are automatically added to the output.
To change this list, open your .gangarc-file in your home-directory, uncomment the corresponding line and edit the list.
**Note**: In DIRAC, all files larger than 10 MB are put into the output, no matter what was specified before.
**Note**: With backend = LSF, if you want to have e.g. the nTuple **only** in the output directory, remove ''NTupleSvc'' from the list ''outputsandbox_types'' in your .gangarc file. (This should work, but did not yet as of 10 May 2010; as a workaround, set ''job.outputdata = ["*.root"]'', then the root file always goes to castor, no matter how small it is.)
===== Additional Output =====
If you want to save output that is not written by the services mentioned above (e.g. a txt file written by a self-written configurable), you have to add it either to the output or the sandbox. For instance:
j.outputdata = [ 'mytextfile1.txt' ] # if you want to return data via the outputdir
j.outputsandbox = [ 'mytextfile2.txt' ] # if you want to return data via the outputsandbox
**Note**: If you have a list of files in outputdata and not all of them exist, Dirac will not let you download any of the existing ones.
* From Ganga v505r14 on you can use wildcards in the outputdata, f.ex.
j.outputdata = ["*.root", "*.dst"]
===== Setting the Output Directory =====
When running on the **LSF**, your files will be stored under ''$CASTOR_HOME/gangadir//outputdata/''. You can change this by modifying the corresponding line in the .gangarc file.
When running with **Dirac**, your files will be registered in the LFC (logical file catalog) and are stored under LFN:/lhcb/user////.
The Dirac job ID can be obtained by typing ''jobs(jobnumber).backend.id''.
More information on how to retrieve your file having a logical filename [[grid:findOnGrid | here]].
===== Where to find the output =====
The sandbox output is by default in ''~/gangadir/workspace/<username>/LocalAMGA/<jobID>/output/''.
Here is an example for user abuechle and jobID = 120:
~/gangadir/workspace/abuechle/LocalAMGA/120/output/
In the output folder you find typically the following things:
* (optional) foo.root: some root file which was created by your job for example holding an NTuple
* jobstatus: contains the information about the start, stop, the queue and the exit code of your job
* stderr: lists the errors
* stdout: lists the output (not when you run with Interactive backend)
===== Retrieving the output from Dirac =====
In ganga, you can get the output from Dirac in the following way:
jobs(yourJobNumber).backend.X
where X can be:
* getOutputData(): Retrieve data stored on SE and download it to the job output workspace (in your gangadir-folder).
* getOutputDataLFNs().getReplicas(): Get a list of outputdata that has been uploaded by Dirac.
* getOutputSandbox(): Retrieve output sandbox and download it to the job output workspace.
This information can also be found when typing help(Dirac) in the ganga prompt.
To store all the LFNs of the data in a file, write the following in a script (f.ex. ''myScript.py''):
import sys
jobNumber = int(sys.argv[1])
outFile = open('LFNs.txt', 'w')
for sj in jobs(jobNumber).subjobs:
    output = sj.backend.getOutputDataLFNs()
    if output.hasLFNs():
        s = str(output.getReplicas().keys())
        print s
        outFile.write(s)
        outFile.write('\n')
outFile.close()
and start ganga with:
ganga -i myScript.py jobnumber
which will write the output in the file ''LFNs.txt''.
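Each line of ''LFNs.txt'' is the printed Python list of replica keys of one subjob. If you later want the LFNs back as one flat list, a small plain-Python reader can parse the file (this is our own sketch, not a Ganga utility):

```python
import ast

# Hypothetical reader, not part of Ganga: parse LFNs.txt, where every
# non-empty line is a printed Python list of LFN strings, into a flat list.
def read_lfns(lines):
    lfns = []
    for line in lines:
        line = line.strip()
        if line:
            lfns.extend(ast.literal_eval(line))
    return lfns
```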
You can then download the files with the bash script copyGrid2.sh, which allows you to download multiple files per job (/home/hep/decianm/scripts/copyGrid2.sh).
You can also download the output directly in ganga using:
import os
for sj in jobs(jobnumber).subjobs:
    if sj.status == "completed":
        if False in map(lambda x: os.path.exists(sj.outputdir + x), sj.outputdata.files):
            sj.backend.getOutputData()
===== Copying output from Grid-jobs to CASTOR =====
For every job / subjob, do:
ds=j.backend.getOutputDataLFNs()
ds.replicate('CERN-USER')
The files will then be under ''/castor/cern.ch/grid/lhcb/user/initial/name/''
===== User defined methods =====
You can create a file called ''.ganga.py'', place it in your home directory and define methods therein. For example
def resubmit_jobs(jobnumber, BannedSites):
    for js in jobs(jobnumber).subjobs:
        if js.status == 'failed':
            js.backend.settings['BannedSites'] = [ BannedSites ] # optional
            js.resubmit()
will resubmit all failed subjobs, avoiding the banned site. In ganga, you can call the method with:
resubmit_jobs(123, 'LCG.CERN.CH')
If you want to be really fancy, you can also overwrite predefined functions of ganga...
===== More useful commands =====
To execute a shell command from the ganga command line just put a **!** in front of your command:
! ls ~/gangadir/workspace/abuechle/LocalAMGA/120/output/
To look what your jobs are doing:
print jobs
For getting all the information about one particular job (i.e. job 120):
full_print(jobs(120))
To peek at stdout or stderr of a job (i.e. job 120) from the Ganga command line:
jobs(120).peek('stdout')
To find out what valid possibilities exist, for example for backends:
plugins("backends")
To select a range of jobs, rather than only one, use (the ranges are inclusive):
jobs.select(lowerrange, upperrange)
===== Extra Options =====
You can also specify some extra options which override the options in your python options file.
j.application.extraopts='ApplicationMgr().EvtMax = 1000'
or multiple extraopts, for example:
t.application.extraopts = 'DaVinci().EvtMax = 300; DaVinci().PrintFreq = 1;'
===== Extra Option Files =====
You can specify more than one options file:
j.application.optsfile = [ File( "~/example/myOpts.py"), File( "~/example/moreOpts.py") ]
j.application.optsfile.append( File( "~/example/evenMoreOpts.py") )
j.application.optsfile[3:] = [ File( "~/example/optsFour.py"), File( "~/example/optsFive.py") ]
===== Show CMT dependencies =====
To show in which directories ganga looks to check the packages the job depends on, type:
j.application.cmt('show uses')
===== Clearing the jobs()-list =====
If you have a lot of old jobs displayed when typing jobs() at the ganga prompt, you can remove them with ''jobs(jobnumber).remove()''. To do this for a large list, you can use standard Python, e.g.:
mylist = range(firstnumber, lastnumber)
for i in mylist: jobs(i).remove()
Of course, you can also do:
jobs.select(firstnumber,lastnumber).remove()
===== Splitting Jobs =====
When you have to process a large amount of data, it is possible to split your job into several subjobs (rather than submitting several jobs with different input).
To split a job, you can use:
j.splitter=SplitByFiles(filesPerJob=25, maxFiles=500)
which will process 500 input files in total, with every subjob processing 25 input files. If ''maxFiles'' is not specified, all available files will be used.
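The arithmetic behind this splitting can be sketched in plain Python (this mimics the parameters of ''SplitByFiles''; it is not the Ganga splitter itself):

```python
# Sketch of the SplitByFiles arithmetic: cap the dataset at maxFiles,
# then cut it into chunks of filesPerJob (the last chunk may be shorter).
def split_by_files(files, filesPerJob, maxFiles=None):
    if maxFiles is not None:
        files = files[:maxFiles]
    return [files[i:i + filesPerJob] for i in range(0, len(files), filesPerJob)]
```

With 600 available files, ''filesPerJob=25'' and ''maxFiles=500'', this yields 20 chunks of 25 files each.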
===== Submitting a Job on the local batch system in Zurich (PBS) =====
You can also submit a job to the local batch system (SLC5) in Zurich. The following settings are mandatory:
j = Job( t, backend = PBS() )
j.backend.extraopts = "-l cput=hh:mm:ss"
where cput is the CPU time in hours:minutes:seconds. The other options should be identical to submitting to the LSF.
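If you prefer to think in seconds, the extraopts string can be built with a small helper (our own convenience function, not part of Ganga or PBS):

```python
# Hypothetical helper: turn a CPU-time budget in seconds into the
# "-l cput=hh:mm:ss" string expected by PBS.
def pbs_cput_opt(seconds):
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return "-l cput=%02d:%02d:%02d" % (h, m, s)
```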
You can also pass a variable to your job script (which may be more useful when submitting directly via qsub, but anyway):
j.backend.extraopts = "-v myVariable=13"
where you set the variable "myVariable" in your job script to 13.
===== Submitting a Job with DIRAC =====
To submit a job with DIRAC on the GRID, you have to add:
t.application.platform = "slc4_ia32_gcc34"
for SLC4, or
t.application.platform = "x86_64-slc5-gcc43-opt"
for SLC5.
**NOTE:** Always make sure that you have compiled your own private libraries (for modified packages) for the correct platform, or do a ''t.application.make()'' in ganga. Otherwise your job will be submitted without complaint, but it will not find the libraries and therefore crash quite spectacularly.
Also **note**: There are two types of SLC4 libraries: ''slc4_ia32_gcc34'' and ''slc4_amd64_gcc34''. They are neither the same nor compatible; with the wrong one, your job won't run.
When submitting the job:
*restrict your dataset to 100 files.
*use LFNs (instead of PFNs) for your input files.
You may also add the minimum CPU time (in seconds!) that your job will use. For this, write:
j = Job( t, backend = Dirac( CPUTime=600 ) )
Jobs with a lower CPUTime should start faster than other ones.
Dirac does not use ''SplitByFiles'' but a ''DiracSplitter''.
j.splitter=DiracSplitter(filesPerJob=20)
which will take 20 files per job.
===== Submitting a ROOT-job =====
It can be useful to submit ROOT jobs to the Grid via Dirac. You can submit ROOT ''.C'' scripts or Python scripts. If your script needs external libraries, they can be added to the inputsandbox like:
j = Job()
j.name = 'my great ROOT job'
j.inputsandbox = [ File ( name = '/somePath/lib1.so'), File ( name = '/somePath2/lib2.so')]
You can also pass arguments to a script. Suppose you have a script ''myScript.C'' like:
void myScript(int a, int b){
  std::cout << "a: " << a << " b: " << b << std::endl;
  // -- Do something useful
}
and you would like to pass input arguments for a and b. This can be done with the following piece of code:
j.application = Root (
version = '5.34.00' ,
usepython = False ,
args = [1,2],
script = File ( name = '/somePath/myScript.C' ) )
Furthermore it is very useful to run subjobs with different arguments for the script. For this you need the ArgSplitter:
argList = [[1,2], [2,3]]
argSplit = ArgSplitter( args = argList )
j.splitter = argSplit
which will generate two subjobs with the arguments (1,2) and (2,3).
A complete example therefore would look like:
argList = []
for ia in range(1,5):
    for ib in range(1,7):
        argList.append([ia, ib])
j = Job()
j.name = 'my great ROOT job'
j.inputsandbox = [ File ( name = '/somePath/lib1.so'), File ( name = '/somePath2/lib2.so')]
j.application = Root (
version = '5.34.00' ,
usepython = False ,
script = File ( name = '/somePath/myScript.C' ) )
argSplit = ArgSplitter( args = argList )
j.splitter = argSplit
j.submit()
*Note: Due to a weird bug (1.8.2012) 0 is not allowed as an argument...
*Note: When submitting a ROOT-job to the LSF, make sure the Ganga version in the shell you submit the ganga-job from and the one requested in ''j.application'' agree. You can set the ROOT version for Ganga in the shell with (for example): ''SetupProject Ganga v508r6 ROOT -v 5.34.00''
===== Forcing a job to run on a specific site =====
For this, do:
j.backend.diracOpts = 'j.setDestination("LCG.CERN.ch")'
Note that this is not the desired behaviour, as it is not what grid computing is about, but it can be useful sometimes.
===== Job fails after many reschedulings =====
If your grid-certificate is close to expiring, it is possible that no proxy can be created for the full job duration. If you already have a new grid cert, you have to upload it to Dirac. This is done via:
SetupProject LHCbDirac
dirac-proxy-init -g lhcb_user
===== Bookkeeping information within Ganga =====
A simple method that can be added to the ~/.ganga.py to access information directly from the BK can be seen below:
import os
from subprocess import Popen, PIPE

def getBKInfo(evttype):
    serr = open('/dev/null')
    pipe = Popen(['get_bookkeeping_info', str(evttype)],
                 env=os.environ,
                 stdout=PIPE,
                 stderr=serr)
    result = {}
    for line in pipe.stdout:
        try:
            value = eval(line)
        except:
            continue
        if not isinstance(value, tuple): continue
        if not isinstance(value[0], str): continue
        if not isinstance(value[1], str): continue
        if not isinstance(value[2], str): continue
        if value[0] in result: continue
        result[value[0]] = value[1:]
    return result
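The parsing step of this method can be tried standalone on canned text, which is handy for checking the filter logic without a real bookkeeping query (sketch only; the sample tuples below are made up):

```python
# Standalone sketch of the line filter used in getBKInfo above: keep only
# lines that evaluate to a tuple of at least three strings; the first
# occurrence of an event type wins on duplicates.
def parse_bk_lines(lines):
    result = {}
    for line in lines:
        try:
            value = eval(line)
        except Exception:
            continue
        if not (isinstance(value, tuple) and len(value) >= 3):
            continue
        if not all(isinstance(v, str) for v in value[:3]):
            continue
        if value[0] in result:
            continue
        result[value[0]] = value[1:]
    return result
```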
In this case two additional files 'get_bookkeeping_info' and 'dirac-bookkeeping-get-prodinfo-eventtype.py' are required to be saved locally in your ~/bin/ directory.
More info can be found here: [[https://groups.cern.ch/group/lhcb-bender/Lists/Archive/DispForm.aspx?ID=551]]
and here: [[https://groups.cern.ch/group/lhcb-bender/Lists/Archive/DispForm.aspx?ID=695]]
===== Help =====
To see the documentation, type ''help()'' in the interactive mode. To get help about a specific topic, e.g. Dirac, type ''help(Dirac)'' in the interactive mode.
===== The N Commandments when working with Ganga =====
*Thou shall be patient.
*Thou shall never use PFNs when thou needest LFNs.
*Thou shall write an email to ''lhcb-distributed-analysis@cern.ch'', if thou art desperate. Thou shall wait patiently for a reply and thy wisdom will flourish.
*Thou shall start a test-job before running the full job. The ways of making mistakes are manifold.
*Thou shall always check if thy output will be bearable for thy quota.
===== Links =====
Information for using ganga in LHCb (and where I stole most of the information from): [[http://ganga.web.cern.ch/ganga/user/html/LHCb/]]
Ganga/Dirac mailing list archive: [[https://groups.cern.ch/group/lhcb-distributed-analysis/Lists/Archive/100.aspx]]