

CRAB3

CRAB2

CRAB2 has been superseded by CRAB3.

Setup local environment

In order to submit jobs to the Grid, you must have access to an LCG User Interface (LCG UI), which lets you reach WLCG-affiliated resources in a fully transparent way. Then set up the CMSSW software and source the CRAB environment, in this order. Remember to create a proxy certificate for CMS:

# lxplus
source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.sh
cmsenv
source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.sh
voms-proxy-init -voms cms

# tier3
source /swshare/psit3/etc/profile.d/cms_ui_env.sh
cmsenv
source /swshare/CRAB/CRAB_2_9_1/crab.sh
voms-proxy-init -voms cms

By default your environment is set up as a neutral Grid User Interface (to allow running of tasks which may suffer from a contaminated environment). To load the standard CMS environment, you just source the standard CMS setup file (and this line is valid on any LCG Grid site in the world):

source $VO_CMS_SW_DIR/cmsset_default.sh

If you don't want to set up a CMSSW environment but want to use ROOT:

source /swshare/ROOT/thisroot.sh

To access your CERN AFS account you need to obtain an AFS token from the CERN server. Execute

kinit YourCERNAFSName@CERN.CH
aklog cern.ch



CRAB setup

A CMSSW configuration file is necessary to tell the job what it should do. For example, tutorial.py:

import FWCore.ParameterSet.Config as cms

process = cms.Process('Slurp')

process.source = cms.Source("PoolSource", fileNames = cms.untracked.vstring())
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(10) )
process.options = cms.untracked.PSet( wantSummary = cms.untracked.bool(True) )

process.output = cms.OutputModule("PoolOutputModule",
    outputCommands = cms.untracked.vstring("drop *", "keep recoTracks_*_*_*"),
    fileName = cms.untracked.string('outfile.root'),
)

process.out_step = cms.EndPath(process.output)

CRAB configuration file for Monte Carlo data

The CRAB configuration file (default name crab.cfg) should be located in the same directory as the CMSSW parameter set to be used by CRAB, with the following content:

[CMSSW]
total_number_of_events = 10
number_of_jobs         = 5
pset                   = tutorial.py
datasetpath            = /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
output_file            = outfile.root

[USER]
return_data     = 0
copy_data       = 1
storage_element = T2_IT_Legnaro
user_remote_dir = TutGridSchool

[CRAB]
scheduler = remoteGlidein
jobtype   = cmssw
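
With crab.cfg and tutorial.py in place, the task is created, submitted, monitored, and retrieved with the crab command-line tool. The sequence below is a minimal sketch; the job range in the submit step is only an example:

# create the task (reads crab.cfg and the pset, performs the job splitting)
crab -create

# submit all created jobs (or a subset, e.g. crab -submit 1-3)
crab -submit

# check the status of the task
crab -status

# once the jobs have finished, retrieve the output and log files locally
crab -getoutput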

Analyse published results

To analyse results that have been published in a local DBS you may use a CRAB configuration identical to any other, except that you must specify the DBS instance to which the data were published: set datasetpath to the name of your dataset and dbs_url to the URL of that DBS instance. To do this, modify the [CMSSW] section of your CRAB configuration file, e.g.

[CMSSW]
....
datasetpath=your_dataset_path
dbs_url=url_local_dbs

Note: dbs_url is the URL of the DBS instance used for the publication. It should point to the read-only URL, whose syntax differs slightly from the one used to write to DBS, e.g.

Writing: https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
Reading: http://cmsdbsprod.cern.ch/cms_dbs_ph_analysis_02/servlet/DBSServlet

Local jobs

It is possible to run CRAB jobs directly on the T3, but only if the dataset to be processed is also stored on the T3. In this case use “scheduler = sge” instead of “scheduler = remoteGlidein”, and only include an [SGE] section when you actually use the 'sge' scheduler, as in the sketch below.
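
For illustration, the scheduler-related lines of such a local configuration might look as follows (a minimal sketch; the [SGE] section parameters are site-specific and therefore not spelled out here):

[CRAB]
jobtype   = cmssw
scheduler = sge          # submit to the local T3 batch system instead of the Grid

# [SGE]
# add this section only when scheduler = sge; its content is site-specific
# (see the Tier-3 example linked below)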

Example of local jobs

https://wiki.chipp.ch/twiki/bin/view/CmsTier3/HowToManageJobsWithCRAB#Example_of_crab_cfg

Note: Jobs of this type cannot be used to process a dataset that is not on the tier3. The network connection to the T3 is not fast enough to sustain a useful write speed in the stage-out step, and the jobs will fail at the very end, i.e. when trying to copy the results.

Non-local jobs

To run “normal” CRAB jobs anywhere, the [SGE] section needs to be removed and it is necessary to stage out to the T2.

Example of remote jobs (change the ''storage_element'' variable)

https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookCRABTutorial#CRAB_configuration_file_for_CMS

Note: This is the recommended solution when the dataset is not stored on the tier3: stage out to the T2 and then copy the files (using lcg-cp or data_replica) to the T3. If there are only very few jobs (say <50 or so) with small output files, it is also possible to stage out directly to the T3 (just put storage_element = T3_CH_PSI).

# CRAB cfg file used for tH analysis
[CRAB]
jobtype   = cmssw
scheduler = remoteGlidein

[CMSSW]
datasetpath = /TToBLNuHTo2B_t-channel-NC_8TeV-madgraph-tauola_LHEv3/dknowlto-TToBLNuHTo2B_t-channel-NC_8TeV-madgraph-tauola_AODSIMv3-836fdeb709619a3677b02c7f682694c9/USER
dbs_url = http://cmsdbsprod.cern.ch/cms_dbs_ph_analysis_02/servlet/DBSServlet
pset = SingleTopSkim_TChannel_cfg.py
events_per_job = 2
total_number_of_events = 10
get_edm_output = 1
# dbs_url = http://cmsdbsprod.cern.ch/cms_dbs_ph_analysis_01/servlet/DBSServlet

[USER]
return_data = 0
copy_data = 1
storage_element = T2_CH_CSCS
storage_path = /srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/dpinna/
#user_remote_dir = /prova
publish_data = 0
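
After the stage-out to the T2 has completed, the files can be copied to the T3 with lcg-cp between the two SRM endpoints. The command below is a minimal sketch: the SE hostnames, ports, and the user directory/file name are assumptions and have to be adapted to your own user area:

# copy one output file from the CSCS T2 to the PSI T3 (endpoints and paths are assumptions)
lcg-cp -b -D srmv2 \
  "srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/<username>/outfile.root" \
  "srm://t3se01.psi.ch:8443/srm/managerv2?SFN=/pnfs/psi.ch/cms/trivcat/store/user/<username>/outfile.root"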

List of configuration parameters

The list of the main parameters you need to specify in your crab.cfg:

  • pset: the CMSSW configuration file name;
  • output_file: the output file name produced by your pset; if the output is defined via TFileService in the CMSSW pset, the file is handled automatically by CRAB and there is no need to specify it in this parameter;
  • datasetpath: the full dataset name you want to analyze;
  • Job splitting:
    • By event: only for MC data. You need to specify 2 of these parameters: total_number_of_events, number_of_jobs, events_per_job
      • specify total_number_of_events and number_of_jobs: this will assign to each job a number of events equal to total_number_of_events/number_of_jobs
      • specify total_number_of_events and events_per_job: this will assign events_per_job events to each job and will calculate the number of jobs as total_number_of_events/events_per_job
      • or you can specify number_of_jobs and events_per_job
    • By lumi: required for real data. You need to specify 2 of these parameters: total_number_of_lumis, lumis_per_job, number_of_jobs
      • because jobs in split-by-lumi mode process entire rather than partial files, you will often end up with fewer jobs processing more lumis than expected; additionally, a single job cannot analyze files from multiple blocks in DBS, so these parameters are “advice” to CRAB rather than determinative
      • specify lumis_per_job and number_of_jobs: the total number of lumis processed will be number_of_jobs x lumis_per_job
      • or you can specify total_number_of_lumis and number_of_jobs
    • lumi_mask: the filename of a JSON file that describes which runs and lumis to process. CRAB will skip luminosity blocks not listed in the file.
  • return_data: this can be 0 or 1; if it is 1 your output files will be retrieved to your local working area;
  • copy_data: this can be 0 or 1; if it is 1 your output files will be copied to a remote Storage Element;
  • local_stage_out: this can be 0 or 1; if it is 1 your produced output is copied to the closeSE in case the copy to the SE specified in your crab.cfg fails;
  • publish_data: this can be 0 or 1; if it is 1 you can publish your produced data to a local DBS;
  • use_server: usage of the CRAB server is deprecated now, so by default this parameter is set to 0;
  • scheduler: the name of the scheduler you want to use;