CRAB2 has been superseded by CRAB3.

Setup local environment

To submit jobs to the Grid, you must have access to an LCG User Interface (LCG UI), which lets you access WLCG-affiliated resources transparently. Then set up the CMSSW software first and source the CRAB environment afterwards, in this order. Remember to create a proxy certificate for CMS:

# lxplus
source /afs/
cmsenv
source /afs/
voms-proxy-init -voms cms

# tier3
source /swshare/psit3/etc/profile.d/
cmsenv
source /swshare/CRAB/CRAB_2_9_1/
voms-proxy-init -voms cms

By default your environment is set up as a neutral Grid User Interface, so that tasks that might be affected by a contaminated environment can still run. To load the standard CMS environment, source the standard CMS setup file (this line works at any LCG Grid site):

source $VO_CMS_SW_DIR/

If you only want to use ROOT, without setting up a CMSSW environment:

source /swshare/ROOT/

To access your CERN AFS account, you need to obtain an AFS token from the CERN server. Execute:

kinit YourCERNAFSName@CERN.CH
aklog

CRAB setup

A CMSSW configuration file is needed to tell CMSSW what the job should do. For example:

import FWCore.ParameterSet.Config as cms

process = cms.Process('Slurp')
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring()
)
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(10) )
process.options = cms.untracked.PSet( wantSummary = cms.untracked.bool(True) )
process.output = cms.OutputModule("PoolOutputModule",
    # drop everything, then keep only the listed products
    outputCommands = cms.untracked.vstring("drop *", "keep ..."),
    fileName = cms.untracked.string('outfile.root'),
)
process.out_step = cms.EndPath(process.output)

CRAB configuration file for Monte Carlo data

The CRAB configuration file (default name crab.cfg) should be placed in the same directory as the CMSSW parameter set used by CRAB, with the following content:

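[CMSSW]
total_number_of_events  = 10
number_of_jobs          = 5
datasetpath             = /RelValProdTTbar/JobRobot-
output_file             = outfile.root

[USER]
return_data             = 0
copy_data               = 1
storage_element         = T2_IT_Legnaro
user_remote_dir         = TutGridSchool

[CRAB]
scheduler               = remoteGlidein
jobtype                 = cmssw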
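Because crab.cfg uses an INI-like layout (sections plus key = value lines), it can be sanity-checked before submission with Python's standard configparser module. The sketch below is a hypothetical helper, not part of CRAB. The set of keys it treats as required and the pset file name analysis_cfg.py are assumptions for illustration.

```python
import configparser

# Keys this sketch treats as required in each section (an assumption, not CRAB's own list).
REQUIRED = {
    "CRAB": ["jobtype", "scheduler"],
    "CMSSW": ["pset", "datasetpath", "output_file"],
}


def check_crab_cfg(text: str) -> list[str]:
    """Return human-readable problems found in crab.cfg content (empty list if none)."""
    # interpolation=None so values containing '%' are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    problems = []
    for section, keys in REQUIRED.items():
        if not parser.has_section(section):
            problems.append(f"missing section [{section}]")
            continue
        for key in keys:
            if not parser.get(section, key, fallback="").strip():
                problems.append(f"[{section}] {key} is missing or empty")
    return problems


# A trimmed version of the example above, with a hypothetical pset file name added.
example = """
[CMSSW]
total_number_of_events  = 10
pset                    = analysis_cfg.py
datasetpath             = /RelValProdTTbar/JobRobot-
output_file             = outfile.root

[CRAB]
scheduler               = remoteGlidein
jobtype                 = cmssw
"""

print(check_crab_cfg(example) or "crab.cfg looks complete")
```

The check only looks at presence, not at whether values are valid for CRAB; it is meant to catch a forgotten key or section early.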

Analyse published results

To analyse results that have been published in a local DBS, you can use a CRAB configuration identical to any other, except that you must also specify where the data were published: set datasetpath to the name of your dataset and dbs_url to the DBS instance to which it was published. Both go in the [CMSSW] section of your CRAB configuration file, e.g.


Note: set dbs_url to the URL of the DBS instance used for publication. dbs_url should point to a read-only URL, which has a slightly different syntax from the one used to write to DBS. E.g. Writing: rvlet Reading: 
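As a sketch of the [CMSSW] change, the hypothetical helper below sets datasetpath and dbs_url using Python's configparser. The dataset name and URL are placeholders, not real DBS endpoints. Use the read-only URL of the DBS instance your data were published to. Note that configparser.write() does not preserve comments, so keep a copy of the original file.

```python
import configparser
import io


def point_to_published(cfg_text: str, datasetpath: str, dbs_url: str) -> str:
    """Return crab.cfg text with [CMSSW] datasetpath and dbs_url set (sketch)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("CMSSW"):
        parser.add_section("CMSSW")
    parser.set("CMSSW", "datasetpath", datasetpath)
    # Must be the read-only (reader) URL, not the one used for writing.
    parser.set("CMSSW", "dbs_url", dbs_url)
    buf = io.StringIO()
    parser.write(buf)  # comments in cfg_text are not carried over
    return buf.getvalue()


# Placeholder values only: substitute your published dataset and DBS reader URL.
base = "[CRAB]\njobtype = cmssw\nscheduler = remoteGlidein\n\n[CMSSW]\npset = analysis_cfg.py\n"
print(point_to_published(base, "/MyPrimary/MyProcessed/USER", "https://dbs-reader.example.invalid/"))
```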

Local jobs

CRAB jobs can run on the T3 only if the dataset is also stored on the T3. In this case set “scheduler = sge” instead of “scheduler = remoteGlidein”. Include an [SGE] section only when you actually use the sge scheduler.
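The two rules stated above can be checked mechanically before submitting. First, sge works only for datasets on the T3. Second, an [SGE] section belongs only in a configuration that uses scheduler = sge. This is a minimal sketch with a hypothetical helper name. Whether the dataset is on the T3 must be supplied by the caller, because the sketch does not query any data catalogue.

```python
import configparser


def scheduler_warnings(cfg_text: str, dataset_on_t3: bool) -> list[str]:
    """Check the local-jobs rules for scheduler and [SGE] (sketch)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    scheduler = parser.get("CRAB", "scheduler", fallback="").strip()
    warnings = []
    # Rule 1: local (sge) jobs only make sense for datasets stored on the T3.
    if scheduler == "sge" and not dataset_on_t3:
        warnings.append("scheduler = sge only works for datasets stored on the T3")
    # Rule 2: an [SGE] section goes together with scheduler = sge, nothing else.
    if parser.has_section("SGE") and scheduler != "sge":
        warnings.append("remove the [SGE] section unless scheduler = sge")
    return warnings


remote_with_sge = "[CRAB]\njobtype = cmssw\nscheduler = remoteGlidein\n\n[SGE]\n"
print(scheduler_warnings(remote_with_sge, dataset_on_t3=False))
```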

Example of local jobs

Note: this type of job cannot be used to process a dataset that is not on the Tier 3. The network connection to the T3 is too slow to sustain a useful write speed during the stage-out step, so the jobs fail at the very end, i.e. when they try to copy the results.

Non local jobs

To run “normal” CRAB jobs anywhere, remove the [SGE] section and stage out to the T2 ().

Example of remote jobs (change the storage_element variable)

Note: this is the recommended solution if the dataset is not stored on the Tier 3: stage out to the T2 and then copy the files to the T3 (using lcg-cp or data_replica). If there are only a few jobs (say fewer than about 50) with small output files, you can also stage out directly to the T3 (just set storage_element = T3_CH_PSI).
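The stage-out advice in this note can be summarised as a small decision function. This is a sketch only. The threshold of 50 jobs mirrors the rough "say <50 or so", and whether the output counts as small is left to the caller's judgement. The T2 site name is passed in as a parameter. The site name used in the usage line is hypothetical.

```python
# Rough rule of thumb from the note above ("say <50 or so"), not a hard limit.
FEW_JOBS = 50


def choose_stage_out(dataset_on_t3: bool, n_jobs: int, small_output: bool, t2_site: str) -> dict[str, str]:
    """Suggest scheduler/storage_element settings following the notes on this page (sketch)."""
    if dataset_on_t3:
        # Local jobs on the T3 via the sge scheduler; configure the [SGE] section separately.
        return {"scheduler": "sge"}
    if n_jobs < FEW_JOBS and small_output:
        # Few jobs with small outputs: direct stage-out to the T3 is acceptable.
        return {"scheduler": "remoteGlidein", "storage_element": "T3_CH_PSI"}
    # Recommended default: stage out to the T2, then copy to the T3 (lcg-cp or data_replica).
    return {"scheduler": "remoteGlidein", "storage_element": t2_site}


print(choose_stage_out(dataset_on_t3=False, n_jobs=200, small_output=True, t2_site="T2_XX_Example"))
```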

# CRAB cfg file used for tH analysis
[CRAB]
jobtype = cmssw
scheduler = remoteGlidein

[CMSSW]
datasetpath = /TToBLNuHTo2B_t-channel-NC_8TeV-madgraph-tauola_LHEv3/dknowlto-TToBLNuHTo2B_t-channel-NC_8TeV-madgraph-tauola_AODSIMv3-836fdeb709619a3677b02c7f68269\
dbs_url =
pset =
events_per_job = 2
total_number_of_events = 10
# dbs_url =

[USER]
return_data = 0
copy_data = 1
storage_element =
storage_path = /srm/managerv2?SFN=/pnfs/
#user_remote_dir = /prova
publish_data = 0

List of configuration parameters

The main parameters to specify in your crab.cfg are:

  • pset: the CMSSW configuration file name;
  • output_file: the name of the output file produced by your pset; if the output is defined via TFileService in the CMSSW pset, CRAB handles the file automatically and there is no need to list it in this parameter;

  • datasetpath: the full dataset name you want to analyze;
  • Jobs splitting:
      • By event (only for MC data): specify 2 of these 3 parameters: total_number_of_events, number_of_jobs, events_per_job.
          • total_number_of_events and number_of_jobs: each job is assigned total_number_of_events/number_of_jobs events;
          • total_number_of_events and events_per_job: each job processes events_per_job events, and the number of jobs is total_number_of_events/events_per_job;
          • or number_of_jobs and events_per_job.
      • By lumi (required for real data): specify 2 of these 3 parameters: total_number_of_lumis, lumis_per_job, number_of_jobs.
          • Because jobs in split-by-lumi mode process entire files rather than partial ones, you will often end up with fewer jobs processing more lumis than expected. In addition, a single job cannot analyze files from multiple blocks in DBS. These parameters are therefore “advice” to CRAB rather than exact settings.
          • lumis_per_job and number_of_jobs: the total number of lumis processed will be number_of_jobs × lumis_per_job;
          • or total_number_of_lumis and number_of_jobs.
      • lumi_mask: the filename of a JSON file that describes which runs and lumis to process. CRAB skips luminosity blocks not listed in the file.

  • return_data: 0 or 1; if 1, your output files are retrieved to your local working area;
  • copy_data: 0 or 1; if 1, your output files are copied to a remote Storage Element;
  • local_stage_out: 0 or 1; if 1, your output is copied to the close SE when the copy to the SE specified in your crab.cfg fails;
  • publish_data: 0 or 1; if 1, you can publish your output data to a local DBS;
  • use_server: use of the CRAB server is now deprecated, so this parameter defaults to 0;
  • scheduler: the name of the scheduler you want to use;
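The split-by-event arithmetic and the lumi_mask lookup described in this list can be sketched in Python. The rounding choices are assumptions: integer division for events per job, and rounding up for the number of jobs. CRAB's own handling of remainders may differ. The mask is assumed to use the usual CMS JSON layout, with run numbers as string keys mapped to lists of inclusive [first, last] lumi ranges. The run numbers below are made up.

```python
import json
import math


def split_by_event(total_number_of_events=None, number_of_jobs=None, events_per_job=None):
    """Given exactly two of the three split-by-event parameters, derive the third.

    Returns (number_of_jobs, events_per_job, total_number_of_events).
    Rounding is an assumption of this sketch, not CRAB's documented behaviour.
    """
    given = [v is not None for v in (total_number_of_events, number_of_jobs, events_per_job)]
    if sum(given) != 2:
        raise ValueError("specify exactly 2 of total_number_of_events, number_of_jobs, events_per_job")
    if events_per_job is None:
        # total_number_of_events / number_of_jobs events per job (integer division assumed)
        events_per_job = total_number_of_events // number_of_jobs
    elif number_of_jobs is None:
        # total_number_of_events / events_per_job jobs, rounded up so no events are left over
        number_of_jobs = math.ceil(total_number_of_events / events_per_job)
    else:
        total_number_of_events = number_of_jobs * events_per_job
    return number_of_jobs, events_per_job, total_number_of_events


def in_lumi_mask(mask, run, lumi):
    """True if (run, lumi) falls in one of the inclusive [first, last] ranges listed for run."""
    return any(first <= lumi <= last for first, last in mask.get(str(run), []))


# Hypothetical lumi_mask content: run numbers as string keys, lists of lumi ranges as values.
mask = json.loads('{"100001": [[1, 20], [25, 40]], "100002": [[5, 5]]}')

print(split_by_event(total_number_of_events=10, events_per_job=2))  # settings from the tH example
print(in_lumi_mask(mask, 100001, 22))
```

A luminosity block missing from the mask, or belonging to a run that is not listed, is treated as "skip", matching the lumi_mask description above.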
computing/crab.txt · Last modified: 2019/10/24 17:17 by iwn