Ganga is a system for job submission to different backends.
Ganga can be run on any lxplus node. The preferred option, however, is to run it locally from Zurich on grid-ui, as the disk quota constraints there are less restrictive.
Before starting Ganga for the first time on the Zurich cluster, download and install the following .gangarc file in your home directory. This will create the correct Ganga environment for you.
To set the environment:
SetupProject Ganga
The following line is optional; it is only needed if you want a fresh .gangarc file.
ganga -g
To start Ganga:
ganga
You can also start Ganga with a GUI:
ganga --gui
To create a JobTemplate object called “t”, and enter the DaVinci version you are using:
t = JobTemplate( application = DaVinci( version = "v20r3" ))
The next lines are optional; they are only needed if you want to get packages:
t.application.getpack( "Tutorial/Analysis v7r6" )
t.application.getpack( "Phys/DaVinci v20r3" )
Define your masterpackage:
t.application.masterpackage = "Phys/DaVinci"
Define your options file:
t.application.optsfile = [ "~/cmtuser/DaVinci_v20r3/Tutorial/Analysis/options/BsKK2Feb09.py" ]
Compile the application (only needed if you need to compile the libraries for a different architecture or you have uncompiled code):
t.application.make()
Create a job from your JobTemplate “t” and specify the backend (where you want to run your job), choosing one of the possible LSF queues (e.g. '1nd'):
j = Job( t, backend = LSF( queue = '1nd' ) )
Submit the job:
j.submit()
If a job is stuck in the wrong state, you can force its status. For example, for job 396:
jobs(396).force_status('failed', force=True)
Instead of 'failed' one can choose 'completed'.
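For instance, to force the same job to the 'completed' state:
jobs(396).force_status('completed', force=True)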
Copy the job or subjob and submit it.
j = jobs(jobNumber).subjobs(subJobNumber).copy()
j.submit()
Sometimes it is also useful to ban a site and resubmit the jobs.
for js in jobs(24).subjobs.select(status='failed'):
    js.backend.settings['BannedSites'] = ['LCG.CERN.ch']  # optional
    js.resubmit()
or
for job in jobs.select(5,11):
    for js in job.subjobs:
        if js.status == 'failed':
            js.backend.settings['BannedSites'] = ['LCG.CERN.ch']  # optional
            js.resubmit()
Sometimes the monitoring can get stuck and your job stays (e.g.) in the 'completing' state forever. You can reset the monitoring for a specific job with:
jobs(x).subjobs(y).backend.reset()
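A minimal sketch, using the same select() syntax as in the site-banning examples above, that resets the monitoring for all subjobs of job x stuck in the 'completing' state:
for sj in jobs(x).subjobs.select(status='completing'):
    sj.backend.reset()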
Instead of typing the above lines in the interactive Ganga session, you can also write a Python script that does all this for you, e.g.:
# This is gangasub.py
t = JobTemplate( application = DaVinci( version = "v20r3" ) )
t.application.masterpackage = "Phys/DaVinci"
t.application.optsfile = [ "~/cmtuser/DaVinci_v20r3/Tutorial/Analysis/options/BsKK2Feb09.py" ]
t.application.make()
j = Job( t, backend = LSF( queue = '1nd' ) )
j.submit()
You then can execute the script from commandline by typing:
ganga -i gangasub.py
The -i flag is for interactive mode and will start the interactive Ganga shell after the job submission.
Alternatively you can execute the script from within a running ganga session:
%run gangasub.py
(This may sometimes lead to trouble…)
The recommended way of specifying the data files to run over is by using inputdata. If you want to select the data in the bookkeeping use:
j.inputdata = browseBK()
Alternatively you can load data from a file:
%run inputData.py
j.inputdata = inputPFN
# or
# j.inputdata = inputLFN
with inputData.py containing:
# for Physical File Names
inputPFN = [
    "PFN:castor://castorlhcb.cern.ch:9002//castor/cern.ch/grid/lhcb/MC/MC09/DST/00004831/0000/00004831_00000001_1.dst",
    "PFN:castor://castorlhcb.cern.ch:9002//castor/cern.ch/grid/lhcb/MC/MC09/DST/00004831/0000/00004831_00000002_1.dst",
]
# or for Logical File Names
inputLFN = [
    "LFN:/lhcb/MC/MC09/DST/00004831/0000/00004831_00000001_1.dst",
    "LFN:/lhcb/MC/MC09/DST/00004831/0000/00004831_00000002_1.dst",
]
If you need to send a file with the job (e.g. a new private database), you can specify these files in the input sandbox.
j.inputsandbox = ['myfile1.txt','myfile2.py']
where j is your job. You can access these files in your job-options without giving a subdirectory, for example:
myclass.input = "myfile1.txt"
Note: You don't need to put your private libraries (if you have modified a package) in the input sandbox. Ganga does this automatically for you.
There are two sorts of output locations:
If the output of your job is a root-file created with NTupleSvc (in a Gaudi (DaVinci, Gauss) job), it will be automatically stored in the Output sandbox. This is also true for all files created with HistogramPersistencySvc and MicroDSTStream. Files created with GaussTape, DigiWriter and DstWriter are automatically added to the output. To change this list, open your .gangarc-file in your home-directory, uncomment the corresponding line and edit the list.
Note: In DIRAC, all files larger than 10 MB are put into the output, no matter what was specified before.
Note: With backend = LSF, if you want to have e.g. the nTuple only in the output directory, remove NTupleSvc from the list outputsandbox_types in your .gangarc file. (It is not clear whether this already works as expected (as of 10 May 2010); as a workaround, set job.outputdata = ["*.root"], in which case the ROOT file always goes to castor no matter how small it is.)
If you want to save an output file that is not written by one of the services mentioned above (e.g. a txt file written by a self-written configurable), you have to add it either to the output data or the output sandbox. For instance:
j.outputdata    = [ 'mytextfile1.txt' ]  # if you want to return data via the outputdir
j.outputsandbox = [ 'mytextfile2.txt' ]  # if you want to return data via the outputsandbox
Note: If you have a list of files in outputdata and not all of them exist, Dirac will not let you download any of the existing ones.
j.outputdata = ["*.root", "*.dst"]
When running on the LSF, your files will be stored under $CASTOR_HOME/gangadir/<j.id>/outputdata/
. You can change this by modifying the corresponding line in the .gangarc file.
When running with Dirac, your files will be registered in the LFC (logical file catalog) and are stored under LFN:/lhcb/user/<initial>/<username>/<diracid>/<filename>.
The <diracid> can be obtained by typing jobs(jobnumber).backend.id
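As an illustration, a hypothetical helper (the name and the username argument are assumptions, not part of Ganga) that assembles the expected LFN directory from this convention:
def userLFNDir( username, jobnumber ):
    # hypothetical helper: build LFN:/lhcb/user/<initial>/<username>/<diracid>/
    diracid = jobs(jobnumber).backend.id
    return "LFN:/lhcb/user/%s/%s/%s/" % ( username[0], username, diracid )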
More information on how to retrieve your file having a logical filename here.
The sandbox output is by default in ~/gangadir/workspace/username/LocalAMGA/jobID/output/. Here an example for user abuechle and jobID = 120:
~/gangadir/workspace/abuechle/LocalAMGA/120/output/
In the output folder you typically find the following things:
In ganga, you can get the output from Dirac the following way:
jobs(yourJobNumber).backend.X
where X can be:
This information can also be found when typing help(Dirac) in the ganga prompt.
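As a hedged illustration, a few backend methods and attributes that appear elsewhere on this page:
print jobs(yourJobNumber).backend.id                    # the Dirac job id
jobs(yourJobNumber).backend.getOutputData()             # download the output data
lfns = jobs(yourJobNumber).backend.getOutputDataLFNs()  # the LFNs of the output data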
To store all the LFNs of the data in a file, write the following in a script (e.g. myScript.py):
# This is myScript.py
import sys

jobNumber = int(sys.argv[1])
length = len(jobs(jobNumber).subjobs)
file = open('LFNs.txt', 'w')
for i in range(0, length):
    output = jobs(jobNumber).subjobs(i).backend.getOutputDataLFNs()
    if output.hasLFNs():
        s = str(output.getReplicas().keys())
        print s
        file.write(s)
        file.write('\n')
file.close()
and start ganga with:
ganga -i myScript.py jobnumber
which will write the output to the file LFNs.txt.
You can then download the files with the bash script copyGrid2.sh, which allows downloading multiple files per job (/home/hep/decianm/scripts/copyGrid2.sh).
You can also download the output directly in ganga using:
import os
for sj in jobs(jobnumber).subjobs:
    if sj.status == "completed":
        if False in map(lambda x: os.path.exists(sj.outputdir + x), sj.outputdata.files):
            sj.backend.getOutputData()
To replicate the output files to CERN, do the following for every job / subjob:
ds = j.backend.getOutputDataLFNs()
ds.replicate('CERN-USER')
The files will then be under /castor/cern.ch/grid/lhcb/user/<initial>/<name>/
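A minimal sketch (assuming a job whose subjobs have completed) that replicates the output LFNs of every completed subjob:
for sj in jobs(jobnumber).subjobs.select(status='completed'):
    ds = sj.backend.getOutputDataLFNs()
    ds.replicate('CERN-USER')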
You can create a file called .ganga.py, place it in your home directory and define methods therein. For example,
def resubmit_jobs(jobnumber, BannedSites):
    for js in jobs(jobnumber).subjobs:
        if js.status == 'failed':
            js.backend.settings['BannedSites'] = [ BannedSites ]  # optional
            js.resubmit()
will resubmit all failed jobs, but not to the banned site. In ganga, you can call the method with:
resubmit_jobs(123, 'LCG.CERN.ch')
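A small sketch of how one might call such a helper for a whole range of jobs (jobs.select is described further below; its ranges are inclusive):
for job in jobs.select(120, 125):
    resubmit_jobs(job.id, 'LCG.CERN.ch')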
If you want to be really fancy, you can also override predefined Ganga functions…
To execute a shell command from the ganga command line just put a ! in front of your command:
! ls ~/gangadir/workspace/abuechle/LocalAMGA/120/output/
To see what your jobs are doing:
print jobs
For getting all the information about one particular job (i.e. job 120):
full_print(jobs(120))
To peek at stdout or stderr of a job (i.e. job 120) from the Ganga command line:
jobs(120).peek('stdout')
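This should also work for subjobs and for stderr, e.g.:
jobs(120).subjobs(3).peek('stderr')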
To find out what valid possibilities exist, for example for backends:
plugins("backends")
To select a range of jobs, rather than only one, use (the ranges are inclusive):
jobs.select(lowerrange, upperrange)
You can also specify some extra options which override the options in your python options file.
j.application.extraopts='ApplicationMgr().EvtMax = 1000'
or multiple extraopts for example:
t.application.extraopts = 'DaVinci().EvtMax = 300; DaVinci().PrintFreq = 1;'
You can specify more than one options file:
j.application.optsfile = [ File( "~/example/myOpts.py" ), File( "~/example/moreOpts.py" ) ]
j.application.optsfile.append( File( "~/example/evenMoreOpts.py" ) )
j.application.optsfile[3:] = [ File( "~/example/optsFour.py" ), File( "~/example/optsFive.py" ) ]
To show in which directories ganga looks to check the packages the job depends on, type:
j.application.cmt('show uses')
If you have a lot of old jobs displayed when typing jobs() at the ganga prompt, you can remove them with jobs(jobnumber).remove(). To do this for a large list, you can use standard Python, e.g.:
mylist = range(firstnumber, lastnumber)
for i in mylist:
    jobs(i).remove()
Of course, you can also do:
jobs.select(firstnumber,lastnumber).remove()
When you have to process a large amount of data, it is possible to split your job into several subjobs (rather than submitting several jobs with different input). To split a job, you can use:
j.splitter=SplitByFiles(filesPerJob=25, maxFiles=500)
which will process 500 input files in total, where every subjob processes 25 input files. If the second argument is not specified, all available files will be used.
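A short sketch putting the splitter together with a job (reusing the template t and the inputLFN list from above; after submission the pieces appear under j.subjobs):
j = Job( t, backend = LSF( queue = '1nd' ) )
j.inputdata = inputLFN
j.splitter = SplitByFiles( filesPerJob = 25, maxFiles = 500 )
j.submit()
print len(j.subjobs)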
You can also submit a job to the local batch system (SLC5) in Zurich. The following settings are mandatory:
j = Job( t, backend = PBS() )
j.backend.extraopts = "-l cput=hh:mm:ss"
where cput is the CPU time in hours:minutes:seconds. The other options should be identical to submitting to the LSF.
You can also pass a variable to your job script (which may be more useful when submitting directly via qsub, but anyway):
j.backend.extraopts = "-v myVariable=13"
where you set the variable “myVariable” in your job script to 13.
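As a hedged sketch, the two extraopts shown above can be combined in a single string (standard qsub option syntax; myVariable is just the example name used above):
j = Job( t, backend = PBS() )
j.backend.extraopts = "-l cput=02:00:00 -v myVariable=13"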
To submit a job with DIRAC on the Grid, you have to set the correct platform:
t.application.platform = "slc4_ia32_gcc34"
for SLC4, or
t.application.platform = "x86_64-slc5-gcc43-opt"
for SLC5
NOTE: Always make sure that you have compiled your own private libraries (for modified packages) for the correct platform, or do a t.application.make() in ganga. Otherwise your job will be submitted without complaint, but it will not find the libraries and therefore crash quite spectacularly.
Also note: There are two types of SLC4 libraries: slc4_ia32_gcc34 and slc4_amd64_gcc34. They are neither the same nor compatible, and your job won't run if you mix them up.
When submitting the job, use the Dirac backend instead of LSF or PBS.
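A minimal sketch, reusing the JobTemplate t from above:
j = Job( t, backend = Dirac() )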
You may also add the minimum CPU time (in seconds!) that your job will be using. For this, write:
j = Job( t, backend = Dirac( CPUTime=600 ) )
Jobs with a lower CPUTime should start faster than other ones.
Dirac does not use SplitByFiles but a DiracSplitter:
j.splitter=DiracSplitter(filesPerJob=20)
which will take 20 files per job.
It can be useful to submit ROOT jobs to the Grid via Dirac. You can submit .C ROOT scripts or Python scripts. If your script needs external libraries, they can be added to the inputsandbox like:
j = Job()
j.name = 'my great ROOT job'
j.inputsandbox = [ File( name = '/somePath/lib1.so'), File( name = '/somePath2/lib2.so') ]
You can also pass arguments to a script. Suppose you have a script myScript.C like:
void myScript(int a, int b){
  std::cout << "a: " << a << " b: " << b << std::endl;
  // -- Do something useful
}
and you would like to pass input arguments for a and b. This can be done with the following piece of code:
j.application = Root ( version = '5.34.00' , usepython = False , args = [1,2], script = File ( name = '/somePath/myScript.C' ) )
Furthermore it is very useful to run subjobs with different arguments for the script. For this you need the ArgSplitter:
argList = [ [1,2], [2,3] ]
argSplit = ArgSplitter( args = argList )
j.splitter = argSplit
which will generate two subjobs with the arguments (1,2) and (2,3).
A complete example would therefore look like:
argList = []
for ia in range(1,5):
    for ib in range(1,7):
        argList.append([ ia, ib ])

j = Job()
j.name = 'my great ROOT job'
j.inputsandbox = [ File( name = '/somePath/lib1.so'), File( name = '/somePath2/lib2.so') ]
j.application = Root( version = '5.34.00', usepython = False, script = File( name = '/somePath/myScript.C' ) )
argSplit = ArgSplitter( args = argList )
j.splitter = argSplit
j.submit()
Note that the ROOT version given in j.application and the ROOT version of your Ganga session should agree. You can set the ROOT version for Ganga in the shell with (for example): SetupProject Ganga v508r6 ROOT -v 5.34.00
To force a Dirac job to run at a specific site, do:
j.backend.diracOpts = 'j.setDestination("LCG.CERN.ch")'
Note that this is not the desired behaviour, as it is not what grid computing is about, but can be useful sometimes.
If your grid-certificate is close to expiring, it is possible that no proxy can be created for the full job duration. If you already have a new grid cert, you have to upload it to Dirac. This is done via:
SetupProject LHCbDirac
dirac-proxy-init -g lhcb_user
A simple method that can be added to the ~/.ganga.py to access information directly from the BK can be seen below:
import os  # needed for os.environ

def getBKInfo ( evttype ) :
    from subprocess import Popen, PIPE
    serr = open ( '/dev/null' )
    pipe = Popen ( [ 'get_bookkeeping_info' , str(evttype) ] ,
                   env    = os.environ ,
                   stdout = PIPE ,
                   stderr = serr )
    result = {}
    for line in pipe.stdout :
        try :
            value = eval ( line )
        except :
            continue
        if not isinstance ( value    , tuple ) : continue
        if not isinstance ( value[0] , str   ) : continue
        if not isinstance ( value[1] , str   ) : continue
        if not isinstance ( value[2] , str   ) : continue
        if result.has_key ( value[0] )         : continue
        result [ value[0] ] = value[1:]
    return result
In this case two additional files 'get_bookkeeping_info' and 'dirac-bookkeeping-get-prodinfo-eventtype.py' are required to be saved locally in your ~/bin/ directory.
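A possible usage from the Ganga prompt (the event type number here is just a hypothetical example):
info = getBKInfo( 13104012 )  # hypothetical event type
for path in info:
    print path, info[path]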
More info can be found here: https://groups.cern.ch/group/lhcb-bender/Lists/Archive/DispForm.aspx?ID=551 and here: https://groups.cern.ch/group/lhcb-bender/Lists/Archive/DispForm.aspx?ID=695
To see the documentation, type help() in the interactive mode. To get help about a specific topic, e.g. Dirac, type help(Dirac) in the interactive mode.
Write to lhcb-distributed-analysis@cern.ch if thou art desperate. Thou shalt wait patiently for a reply and thy wisdom will flourish.
Information for using Ganga in LHCb (and where I stole most of the information from): http://ganga.web.cern.ch/ganga/user/html/LHCb/
Ganga/Dirac mailing list archive: https://groups.cern.ch/group/lhcb-distributed-analysis/Lists/Archive/100.aspx