Getting started with Ganga

Introduction

Ganga is a system for job submission to different backends.

Where to run

Ganga can be run on every lxplus node. The preferred option, however, is to run it locally in Zurich on grid-ui, as there are fewer (if any) disk quota constraints.

Ganga configuration on Zurich Grid Cluster

Before starting Ganga for the first time on the Zurich cluster, download the following .gangarc file and install it in your home directory. This will create the correct Ganga environment for you.

Starting Ganga on lxplus / grid-ui

To set the environment:

SetupProject Ganga

The following line is optional. Only needed if you want a fresh .gangarc file.

ganga -g

To start Ganga:

ganga

You can also start Ganga with a GUI:

ganga --gui

Create JobTemplate in Ganga

To create a JobTemplate object called “t”, and enter the DaVinci version you are using:

t = JobTemplate( application = DaVinci( version = "v20r3" ))

The next code is optional in case you want to get packages:

t.application.getpack( "Tutorial/Analysis v7r6" )
t.application.getpack( "Phys/DaVinci v20r3" )

Define your masterpackage:

t.application.masterpackage = "Phys/DaVinci"

Define your options file:

t.application.optsfile = [ "~/cmtuser/DaVinci_v20r3/Tutorial/Analysis/options/BsKK2Feb09.py" ]

Compile the application (only needed if you need to compile the libraries for a different architecture or you have uncompiled code):

t.application.make()

Create a job in Ganga

Create a job with your JobTemplate “t”, specify the backend (where you want to run your job):

Interactive() to run interactively
Local() to run on the local lxplus node
LSF() to run on lxbatch → specify a queue, otherwise your job may be killed before it has finished
PBS() to run on the batch system in Zurich
Dirac() to run on the GRID

Possible LSF queues:

8nm (8 normalised minutes)
1nh (1 normalised hour)
8nh (8 normalised hours)
1nd (1 normalised day)
1nw (1 normalised week)

j = Job( t, backend = LSF( queue = '1nd' ) )

Submit the job:

j.submit()
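
Choosing the queue can also be left to a small helper. The following is only a sketch and not part of Ganga: the function name pick_lsf_queue and the rule "take the shortest queue that covers the estimated run time" are assumptions made for illustration; the queue names and their limits (in normalised minutes) are the ones listed above.

```python
# Hypothetical helper (not part of Ganga): choose the shortest LSF queue
# whose limit covers an estimated run time given in normalised minutes.
LSF_QUEUES = [
    ("8nm", 8),
    ("1nh", 60),
    ("8nh", 8 * 60),
    ("1nd", 24 * 60),
    ("1nw", 7 * 24 * 60),
]


def pick_lsf_queue(estimated_minutes):
    """Return the name of the shortest queue that fits the estimate."""
    for name, limit in LSF_QUEUES:
        if estimated_minutes <= limit:
            return name
    raise ValueError("no LSF queue is long enough for %s minutes" % estimated_minutes)


print(pick_lsf_queue(90))
```

Inside a Ganga session the result could then be passed on, e.g. LSF( queue = pick_lsf_queue(90) ).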

Forcing a job into status completed or failed

For example for job 396:

jobs(396).force_status('failed', force=True)

Instead of 'failed' one can choose 'completed'.

How to resubmit a stalled job or subjob

Copy the job or subjob and submit it.

j = jobs(jobNumber).subjobs(subJobNumber).Copy()
j.submit()

How to resubmit all failed subjobs

Sometimes it is also useful to ban a site and resubmit the jobs.

for js in jobs(24).subjobs.select(status='failed'):
   js.backend.settings['BannedSites'] = ['LCG.CERN.ch'] #optional
   js.resubmit()

for job in jobs.select(5,11):
   for js in job.subjobs:
      if js.status=='failed':
         js.backend.settings['BannedSites'] = ['LCG.CERN.ch'] #optional
         js.resubmit()

How to reset the monitoring

Sometimes the monitoring can get stuck and your jobs stay (e.g.) in the completing state forever. You can reset the monitoring for a specific job with:

jobs(x).subjobs(y).backend.reset()

Writing a script for Ganga

Instead of typing the above lines in Ganga's interactive mode, you can also write a Python script that does all this for you, e.g.:

gangasub.py

#This is gangasub.py
t = JobTemplate( application = DaVinci( version = "v20r3" ))
t.application.masterpackage = "Phys/DaVinci"
t.application.optsfile = [ "~/cmtuser/DaVinci_v20r3/Tutorial/Analysis/options/BsKK2Feb09.py" ]
t.application.make()
j = Job( t, backend = LSF( queue = '1nd' ) )
j.submit()

You then can execute the script from commandline by typing:

ganga -i gangasub.py

The -i is for interactive and will start the interactive ganga shell after the job submission.

Alternatively you can execute the script from within a running ganga session:

%run gangasub.py

(This sometimes may lead to trouble…)

Using inputdata

The recommended way of specifying the data files to run over is by using inputdata. If you want to select the data in the bookkeeping use:

j.inputdata = browseBK()

Alternatively you can load data from a file:

%run inputData.py
j.inputdata = inputPFN
# or 
#j.inputdata = inputLFN

With

inputData.py

# for Physical File Names
inputPFN = [
  "PFN:castor://castorlhcb.cern.ch:9002//castor/cern.ch/grid/lhcb/MC/MC09/DST/00004831/0000/00004831_00000001_1.dst",
  "PFN:castor://castorlhcb.cern.ch:9002//castor/cern.ch/grid/lhcb/MC/MC09/DST/00004831/0000/00004831_00000002_1.dst",
]
# or for Logical File Names
inputLFN = [
  "LFN:/lhcb/MC/MC09/DST/00004831/0000/00004831_00000001_1.dst",
  "LFN:/lhcb/MC/MC09/DST/00004831/0000/00004831_00000002_1.dst",
]
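
Hand-written file lists easily pick up a repeated entry or a missing prefix. The sketch below is one possible way to tidy a list before assigning it to j.inputdata. The helper name normalise_lfns is hypothetical and not part of Ganga; it only adds the "LFN:" prefix where it is missing and drops repeated entries.

```python
# Hypothetical helper (not part of Ganga): prefix bare paths with "LFN:"
# and drop repeated entries while keeping the original order.
def normalise_lfns(paths):
    seen = set()
    result = []
    for path in paths:
        path = path.strip()
        if not path:
            continue  # skip empty entries
        lfn = path if path.startswith("LFN:") else "LFN:" + path
        if lfn not in seen:
            seen.add(lfn)
            result.append(lfn)
    return result


inputLFN = normalise_lfns([
    "/lhcb/MC/MC09/DST/00004831/0000/00004831_00000001_1.dst",
    "LFN:/lhcb/MC/MC09/DST/00004831/0000/00004831_00000001_1.dst",
    "/lhcb/MC/MC09/DST/00004831/0000/00004831_00000002_1.dst",
])
print(inputLFN)
```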

Input Sandbox

If you need to send a file with the job (e.g. a new private database), you can specify it in the input sandbox:

j.inputsandbox = ['myfile1.txt','myfile2.py']

where j is your job. You can access these files in your job-options without giving a subdirectory, for example:

myclass.input = "myfile1.txt"

Note: You don't need to put your private libraries (if you have modified a package) in the input sandbox. Ganga does this automatically for you.

Output

There are two kinds of output locations:

Sandbox: Used for small output (N-tuples, text-files). These files are normally sent back to the user no matter where the job was executed.
Output: Used for larger files, e.g. DSTs.

If the output of your job is a root-file created with NTupleSvc (in a Gaudi (DaVinci, Gauss) job), it will be automatically stored in the Output sandbox. This is also true for all files created with HistogramPersistencySvc and MicroDSTStream. Files created with GaussTape, DigiWriter and DstWriter are automatically added to the output. To change this list, open your .gangarc-file in your home-directory, uncomment the corresponding line and edit the list.

Note: In DIRAC, all files larger than 10 MB are put into the output, no matter what was specified before.

Note: With backend = LSF, if you want to have e.g. the nTuple only in the output directory, removing NTupleSvc from the list outputsandbox_types in your .gangarc file should do this. As of 10 May 2010 this did not yet work. As a workaround, set job.outputdata = ["*.root"]; the ROOT file then always goes to CASTOR, no matter how small it is.

Additional Output

If you want to save an output that is not written by the mentioned services above (e.g. a txt-file, written by a self-written configurable), you have to add it either to the output or the sandbox. For instance:

j.outputdata = [ 'mytextfile1.txt' ] # if you want to return data via the outputdir
j.outputsandbox = [ 'mytextfile2.txt' ] # if you want to return data via the outputsandbox

Note: If you have a list of files in outputdata and not all of them exist, Dirac will not let you download any of the existing ones.

From Ganga v505r14 on, you can use wildcards in outputdata, e.g.

j.outputdata = ["*.root", "*.dst"]

Setting the Output Directory

When running on the LSF, your files will be stored under $CASTOR_HOME/gangadir/<j.id>/outputdata/. You can change this by modifying the corresponding line in the .gangarc-file.

When running with Dirac, your files will be registered in the LFC (logical file catalog) and are stored under LFN:/lhcb/user/<initial>/<username>/<diracid>/<filename>. The <diracid> can be obtained by typing jobs(jobnumber).backend.id
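
The LFN pattern above can be put together with a few lines of Python. This is a minimal sketch: the function name user_lfn is hypothetical, it assumes that <initial> is the first letter of the username, and the Dirac id used in the example is made up.

```python
# Hypothetical helper: build the LFN of a user output file from the
# pattern LFN:/lhcb/user/<initial>/<username>/<diracid>/<filename>.
def user_lfn(username, dirac_id, filename):
    initial = username[0]  # assumption: <initial> is the first letter of the username
    return "LFN:/lhcb/user/%s/%s/%s/%s" % (initial, username, dirac_id, filename)


# 1234567 is an invented Dirac id, used only for illustration
print(user_lfn("abuechle", 1234567, "foo.root"))
```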

More information on how to retrieve your file having a logical filename here.

Where to find the output

The sandbox output is by default in ~/gangadir/workspace/username/LocalAMGA/jobID/output/. Here is an example for user abuechle and jobID = 120:

~/gangadir/workspace/abuechle/LocalAMGA/120/output/

In the output folder you find typically the following things:

(optional) foo.root: some root file which was created by your job for example holding an NTuple
jobstatus: having the information about the start, stop, the queue and the exit code of your job
stderr: lists the errors
stdout: lists the output (not when you run with Interactive backend)
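
To check quickly whether a job left the usual files behind, something like the following sketch can be used outside of Ganga. Both function names are hypothetical; the directory layout and the file names are the ones described above, and the default layout is assumed not to have been changed in the .gangarc file.

```python
import os


# Hypothetical helper: default sandbox output directory of a job.
def sandbox_output_dir(username, job_id, home="~"):
    return os.path.join(os.path.expanduser(home), "gangadir", "workspace",
                        username, "LocalAMGA", str(job_id), "output")


# Hypothetical helper: names of the standard files that are not present.
def missing_standard_files(output_dir, expected=("jobstatus", "stderr", "stdout")):
    return [name for name in expected
            if not os.path.exists(os.path.join(output_dir, name))]


out = sandbox_output_dir("abuechle", 120)
print(out)
print(missing_standard_files(out))
```

Keep in mind that stdout is expected to be missing for jobs run with the Interactive backend.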

Retrieving the output from Dirac

In ganga, you can get the output from Dirac the following way:

jobs(yourJobNumber).backend.X

where X can be:

getOutputData(): Retrieve data stored on SE and download it to the job output workspace (in your gangadir-folder).
getOutputDataLFNs().getReplicas(): Get a list of outputdata that has been uploaded by Dirac.
getOutputSandbox(): Retrieve output sandbox and download it to the job output workspace.

This information can also be found when typing help(Dirac) in the ganga prompt.

To store all the LFNs of the data in a file, write the following in a script (e.g. myScript.py):

import sys
jobNumber = int(sys.argv[1])
lfnFile = open('LFNs.txt','w')
for sj in jobs(jobNumber).subjobs:
    output = sj.backend.getOutputDataLFNs()
    if output.hasLFNs():
        # the keys of getReplicas() are the LFNs
        s = str(list(output.getReplicas().keys()))
        print(s)
        lfnFile.write(s + '\n')
lfnFile.close()

and start ganga with:

ganga -i myScript.py jobnumber

which will write the output in the file LFNs.txt.
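
Each line of LFNs.txt is the printed form of a Python list, so it can be read back without parsing it by hand. The sketch below assumes exactly this format (one list literal per line); the function name read_lfn_file is hypothetical and the example entries are made up.

```python
import ast


# Hypothetical helper: turn the lines of LFNs.txt back into one flat list.
def read_lfn_file(lines):
    lfns = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        # literal_eval only accepts Python literals, it does not run code
        lfns.extend(ast.literal_eval(line))
    return lfns


# Invented example content, in the format written by the script above
example_lines = [
    "['/lhcb/user/a/abuechle/1234567/foo.root']\n",
    "['/lhcb/user/a/abuechle/1234568/foo.root', '/lhcb/user/a/abuechle/1234568/bar.dst']\n",
]
print(read_lfn_file(example_lines))
```

For the real file, pass the open file object instead of the example list, e.g. read_lfn_file(open('LFNs.txt')).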

You can then download the files with the bash-script copyGrid2.sh, which allows downloading multiple files per job (/home/hep/decianm/scripts/copyGrid2.sh)

You can also download the output directly in ganga using:

import os
for sj in jobs(jobnumber).subjobs:
    if sj.status == "completed":
        # download only if at least one expected output file is missing locally
        if not all(os.path.exists(sj.outputdir + x) for x in sj.outputdata.files):
            sj.backend.getOutputData()

Copying output from Grid-jobs to CASTOR

For every job / subjob, do:

ds=j.backend.getOutputDataLFNs()
ds.replicate('CERN-USER')

The files will then be under /castor/cern.ch/grid/lhcb/user/initial/name/

User defined methods

You can create a file called .ganga.py, place it in your home directory and define methods therein. For example

def resubmit_jobs(jobnumber, BannedSites):
    for js in jobs(jobnumber).subjobs:
        if js.status=='failed':
            js.backend.settings['BannedSites'] = [ BannedSites ] #optional
            js.resubmit()

will resubmit all failed subjobs, but not to the banned site. In ganga, you can call the method with:

resubmit_jobs(123, 'LCG.CERN.ch')

If you want to be really fancy, you can also overwrite predefined functions of ganga…

More useful commands

To execute a shell command from the ganga command line just put a ! in front of your command:

! ls ~/gangadir/workspace/abuechle/LocalAMGA/120/output/

To look what your jobs are doing:

print(jobs)

To get all the information about one particular job (e.g. job 120):

full_print(jobs(120))

To peek at stdout or stderr of a job (e.g. job 120) from the Ganga command line:

jobs(120).peek('stdout')

To find out which valid options exist, for example for backends:

plugins("backends")

To select a range of jobs, rather than only one, use (the ranges are inclusive):

jobs.select(lowerrange, upperrange)

Extra Options

You can also specify some extra options which override the options in your python options file.

j.application.extraopts='ApplicationMgr().EvtMax = 1000'

or multiple extraopts for example:

t.application.extraopts = 'DaVinci().EvtMax = 300; DaVinci().PrintFreq = 1;'
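
If the extra options are collected in a list, they can be joined into one string before assigning them. This is only a sketch: the helper name join_extraopts is hypothetical, and it assumes that statements separated by semicolons are accepted, as in the example above.

```python
# Hypothetical helper: join several option statements into one
# semicolon-separated string suitable for extraopts.
def join_extraopts(statements):
    cleaned = [s.strip().rstrip(";").strip() for s in statements if s.strip()]
    return "; ".join(cleaned)


opts = join_extraopts(["DaVinci().EvtMax = 300;", "DaVinci().PrintFreq = 1"])
print(opts)
```

In Ganga the result would then be assigned with t.application.extraopts = opts.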

Extra Option Files

You can specify more than one options file:

j.application.optsfile = [ File( "~/example/myOpts.py"), File( "~/example/moreOpts.py") ]
j.application.optsfile.append( File( "~/example/evenMoreOpts.py") )
j.application.optsfile[3:] = [ File( "~/example/optsFour.py"), File( "~/example/optsFive.py") ]

Show CMT dependencies

To show in which directories ganga looks to check the packages the job depends on, type:

j.application.cmt('show uses')

Clearing the jobs()-list

If you have a lot of old jobs displayed when typing jobs() at the ganga prompt, you can remove them with jobs(jobnumber).remove(). To do this for a large list, you can use standard Python, e.g.:

mylist = range(firstnumber, lastnumber + 1)  # +1 because range() excludes its upper bound
for i in mylist: jobs(i).remove()

Of course, you can also do:

jobs.select(firstnumber,lastnumber).remove()

Splitting Jobs

When you have to process a large amount of data, it is possible to split your job into several subjobs (rather than submitting several jobs with different input). To split a job, you can use:

j.splitter=SplitByFiles(filesPerJob=25, maxFiles=500)

which will process 500 input files in total, where every subjob processes 25 input files. If the second argument is not specified, all available files will be used.
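
The arithmetic behind the two arguments can be illustrated in plain Python. This is not how Ganga implements SplitByFiles, just a sketch of the resulting grouping; the function name split_by_files is hypothetical, and it assumes that the first maxFiles files of the list are the ones that get used.

```python
# Illustration only (not Ganga's implementation): group a file list
# the way filesPerJob and maxFiles are described above.
def split_by_files(files, files_per_job, max_files=None):
    selected = list(files)
    if max_files is not None:
        selected = selected[:max_files]  # assumption: the first files are taken
    return [selected[i:i + files_per_job]
            for i in range(0, len(selected), files_per_job)]


files = ["file_%04d.dst" % i for i in range(600)]  # invented file names
chunks = split_by_files(files, files_per_job=25, max_files=500)
print(len(chunks))
```

With 600 available files, 25 files per subjob and a maximum of 500 files, this gives 20 subjobs; without the maximum it would give 24.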

Submitting a Job on the local batch system in Zurich (PBS)

You can also submit a job to the local batch system (SLC5) in Zurich. Mandatory are:

j = Job( t, backend = PBS() )
j.backend.extraopts = "-l cput=hh:mm:ss"

where cput is the CPU time in hours:minutes:seconds. The other options should be identical to submitting to the LSF.
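
If you know the CPU time in seconds, the option string can be generated instead of typed. A minimal sketch, with the hypothetical helper name cput_option:

```python
# Hypothetical helper: format a CPU time in seconds as the
# "-l cput=hh:mm:ss" option string described above.
def cput_option(seconds):
    if seconds < 0:
        raise ValueError("CPU time must not be negative")
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return "-l cput=%02d:%02d:%02d" % (hours, minutes, secs)


print(cput_option(2 * 3600 + 30 * 60))
```

In Ganga this would be used as j.backend.extraopts = cput_option(9000).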

You can also pass a variable to your job script (which may be more useful when submitting directly via qsub, but anyway):

j.backend.extraopts = "-v myVariable=13"

where you set the variable “myVariable” in your job script to 13.

Submitting a Job with DIRAC

To submit a job with DIRAC on the GRID, you have to add:

t.application.platform = "slc4_ia32_gcc34"

for SLC4, or

t.application.platform = "x86_64-slc5-gcc43-opt"

for SLC5

NOTE: Always make sure that you have compiled your own private libraries (for modified packages) for the correct platform, or do a t.application.make() in ganga. Otherwise your job will be submitted without complaint, but it will not find the libraries and therefore crash quite spectacularly.

Also note: There are two types of SLC4 libraries: slc4_ia32_gcc34 and slc4_amd64_gcc34. They are neither the same nor compatible, and with the wrong one your job won't run.

When submitting the job:

Restrict your dataset to 100 files.
Use LFNs (instead of PFNs) for your input files.

You may also add the minimum CPU time (in seconds!) that your job will be using. For this, write:

j = Job( t, backend = Dirac( CPUTime=600 ) )

Jobs with a lower CPUTime should start faster than other ones.

Dirac does not use SplitByFiles but a DiracSplitter.

j.splitter=DiracSplitter(filesPerJob=20)

which will take 20 files per job.

Submitting a ROOT-job

It can be useful to submit ROOT-jobs to the Grid via Dirac. You can submit .C-ROOT-scripts or python scripts. If your script needs external libraries, they can be added to the inputsandbox like this:

j = Job()
j.name = 'my great ROOT job'
j.inputsandbox = [ File ( name = '/somePath/lib1.so'), File ( name = '/somePath2/lib2.so')]

You can also pass arguments to a script. Suppose you have a script myScript.C like:

void myScript(int a, int b){
     std::cout << "a: " << a << " b: " << b << std::endl;
     // -- Do something useful
}

and you would like to pass input arguments for a and b. This can be done with the following piece of code:

j.application = Root (
    version = '5.34.00' ,
    usepython = False ,
    args = [1,2],
    script = File ( name = '/somePath/myScript.C' ) )

Furthermore it is very useful to run subjobs with different arguments for the script. For this you need the ArgSplitter:

argList = [[1,2], [2,3]]
argSplit = ArgSplitter( args = argList )
j.splitter = argSplit

which will generate two subjobs with the arguments (1,2) and (2,3).

A complete example therefore would look like:

argList=[]
for ia in range(1,5): 
    for ib in range(1,7):
       argList.append([ ia, ib])
 
j = Job()
j.name = 'my great ROOT job'
j.inputsandbox = [ File ( name = '/somePath/lib1.so'), File ( name = '/somePath2/lib2.so')]
j.application = Root (
    version = '5.34.00' ,
    usepython = False ,
    script = File ( name = '/somePath/myScript.C' ) )
argSplit = ArgSplitter( args = argList )
j.splitter = argSplit
j.submit()

Note: Due to a weird bug (1.8.2012) 0 is not allowed as an argument…
Note: When submitting a ROOT-job to the LSF, make sure the ROOT version in the shell you submit the ganga-job from and the one requested in j.application agree. You can set the ROOT version for Ganga in the shell with (for example): SetupProject Ganga v508r6 ROOT -v 5.34.00
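
The nested loops that fill argList can also be written with itertools.product, and the list can be checked for the forbidden 0 before submitting. This is a sketch in plain Python; the helper name build_arg_list is hypothetical, and the check simply mirrors the note about 0 above.

```python
from itertools import product


# Hypothetical helper: build the list of [a, b] argument pairs for the
# ArgSplitter and refuse pairs containing 0 (see the note above).
def build_arg_list(a_values, b_values):
    arg_list = [[a, b] for a, b in product(a_values, b_values)]
    for pair in arg_list:
        if 0 in pair:
            raise ValueError("0 is not accepted as an argument: %r" % (pair,))
    return arg_list


argList = build_arg_list(range(1, 5), range(1, 7))
print(len(argList))
```

The resulting argList has the same 24 pairs as the loops in the complete example and can be passed to ArgSplitter( args = argList ).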

Forcing a job to run on a specific site

For this, do:

j.backend.diracOpts = 'j.setDestination("LCG.CERN.ch")'

Note that this is not the desired behaviour, as it is not what grid computing is about, but can be useful sometimes.

Job fails after many reschedulings

If your grid-certificate is close to expiring, it is possible that no proxy can be created for the full job duration. If you already have a new grid cert, you have to upload it to Dirac. This is done via:

SetupProject LHCbDirac
dirac-proxy-init -g lhcb_user

Bookkeeping information within Ganga

A simple method that can be added to the ~/.ganga.py to access information directly from the BK can be seen below:

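def getBKInfo ( evttype )  :
    import os
    from subprocess import Popen, PIPE

    # discard anything the helper script writes to stderr
    serr = open ( os.devnull , 'w' )
    pipe = Popen ( [ 'get_bookkeeping_info'  , str(evttype) ] ,
                   env    = os.environ ,
                   stdout = PIPE       ,
                   stderr = serr       )

    result = {}

    for line in pipe.stdout :

        # each useful line is expected to be a Python tuple;
        # note that eval() executes whatever the helper script prints
        try :
            value = eval ( line )
        except Exception :
            continue

        # keep only tuples whose first three entries are strings
        if not isinstance ( value , tuple ) or len ( value ) < 3 : continue
        if not isinstance ( value[0] , str   ) : continue
        if not isinstance ( value[1] , str   ) : continue
        if not isinstance ( value[2] , str   ) : continue

        # the first occurrence of a key wins
        if value[0] in result : continue
        result [ value[0] ] = value[1:]

    pipe.wait ()
    serr.close ()
    return result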

In this case two additional files 'get_bookkeeping_info' and 'dirac-bookkeeping-get-prodinfo-eventtype.py' are required to be saved locally in your ~/bin/ directory.
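
Since the method above runs eval() on every line the helper script prints, a more cautious variant of the parsing step may be preferable. The sketch below uses ast.literal_eval, which only accepts Python literals. The function name parse_bk_lines is hypothetical, and the sample lines are invented: the real output format of get_bookkeeping_info is only assumed to be one tuple of strings per line, as the checks in getBKInfo suggest.

```python
import ast


# Hypothetical, more cautious variant of the parsing loop in getBKInfo.
def parse_bk_lines(lines):
    result = {}
    for line in lines:
        try:
            value = ast.literal_eval(line.strip())
        except (ValueError, SyntaxError, TypeError):
            continue  # not a Python literal, e.g. a log message
        if not isinstance(value, tuple) or len(value) < 3:
            continue
        if not all(isinstance(item, str) for item in value[:3]):
            continue
        # the first occurrence of a key wins, as in getBKInfo
        result.setdefault(value[0], value[1:])
    return result


# Invented sample lines, for illustration only
sample = [
    "some log message\n",
    "('key-a', 'first', 'second')\n",
    "('key-a', 'ignored', 'duplicate')\n",
    "(1, 2, 3)\n",
]
print(parse_bk_lines(sample))
```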

More info can be found here: https://groups.cern.ch/group/lhcb-bender/Lists/Archive/DispForm.aspx?ID=551 and here: https://groups.cern.ch/group/lhcb-bender/Lists/Archive/DispForm.aspx?ID=695

Help

To see the documentation, type help() in the interactive mode. To get help about a specific topic, e.g. Dirac, type help(Dirac) in the interactive mode.

The N Commandments when working with Ganga

Thou shall be patient.
Thou shall never use PFNs when thou needest LFNs.
Thou shall write an email to lhcb-distributed-analysis@cern.ch, if thou art desperate. Thou shall wait patiently for a reply and thy wisdom will flourish.
Thou shall start a test-job before running the full job. The ways of making mistakes are manifold.
Thou shall always check if thy output will be bearable for thy quota.

Links

Information for using ganga in LHCb (and where I stole most of the information from): http://ganga.web.cern.ch/ganga/user/html/LHCb/

Ganga/Dirac mailing list archive: https://groups.cern.ch/group/lhcb-distributed-analysis/Lists/Archive/100.aspx
