See https://github.com/IzaakWN/CRAB for an example with CRAB3.
In order to submit jobs to the Grid, you must have access to an LCG User Interface (LCG UI). It allows you to access WLCG-affiliated resources in a fully transparent way. Then set up the CMSSW software and source the CRAB environment, in this order. Remember to create a proxy certificate for CMS:
# lxplus
source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.sh
cmsenv
source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.sh
voms-proxy-init -voms cms

# tier3
source /swshare/psit3/etc/profile.d/cms_ui_env.sh
cmsenv
source /swshare/CRAB/CRAB_2_9_1/crab.sh
voms-proxy-init -voms cms
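If you want to verify that the proxy was created correctly before submitting anything, you can inspect it with the standard VOMS client (this check is generic and not specific to this setup):

voms-proxy-info -all        # show subject, VO attributes and remaining lifetime
voms-proxy-info -timeleft   # remaining lifetime in seconds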
By default your environment is set up as a neutral Grid User Interface (so that tasks which would suffer from a contaminated environment can still run). To load the standard CMS environment, just source the standard CMS setup file (this line is valid on any LCG Grid site in the world):
source $VO_CMS_SW_DIR/cmsset_default.sh
If you don't want to set up a CMSSW environment and only want to use ROOT:
source /swshare/ROOT/thisroot.sh
To access your CERN AFS account you need to obtain an AFS token from the CERN server. Execute:

kinit YourCERNAFSName@CERN.CH
aklog cern.ch
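To confirm that the Kerberos ticket and the AFS token were actually obtained (a generic check, not specific to this setup), you can list them:

klist     # shows the Kerberos ticket for CERN.CH
tokens    # shows the AFS tokens, including the one for cern.ch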

A CMSSW configuration file is necessary to tell the job what it should do. For example, tutorial.py:

import FWCore.ParameterSet.Config as cms

process = cms.Process('Slurp')

process.source = cms.Source("PoolSource", fileNames = cms.untracked.vstring())
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(10) )
process.options = cms.untracked.PSet( wantSummary = cms.untracked.bool(True) )
process.output = cms.OutputModule("PoolOutputModule",
    outputCommands = cms.untracked.vstring("drop *", "keep recoTracks_*_*_*"),
    fileName = cms.untracked.string('outfile.root'),
)
process.out_step = cms.EndPath(process.output)
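Before handing the configuration to CRAB, it is worth running it interactively on a few events to catch configuration errors early (a standard CMSSW check, not a CRAB requirement):

cmsRun tutorial.py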
The CRAB configuration file (default name crab.cfg) should be located in the same directory as the CMSSW parameter set to be used by CRAB, with the following content:
[CMSSW]
total_number_of_events = 10
number_of_jobs = 5
pset = tutorial.py
datasetpath = /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
output_file = outfile.root

[USER]
return_data = 0
copy_data = 1
storage_element = T2_IT_Legnaro
user_remote_dir = TutGridSchool

[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
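Once crab.cfg and tutorial.py are in place, a typical CRAB2 task cycle looks like this (standard CRAB2 commands, run from the directory containing crab.cfg):

crab -create      # read crab.cfg and split the task into jobs
crab -submit      # submit the created jobs to the Grid
crab -status      # check the progress of the jobs
crab -getoutput   # retrieve the output of finished jobs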
To analyse results that have been published in a local DBS you may use a CRAB configuration identical to any other, except that you must specify the DBS instance to which the data was published: the datasetpath (the name of your dataset) and the dbs_url. To do this, modify the [CMSSW] section of your CRAB configuration file, e.g.
[CMSSW]
....
datasetpath = your_dataset_path
dbs_url = url_local_dbs
Note: dbs_url is the URL of the DBS instance used for publication. It should point to a read-only URL, which has a slightly different syntax from the one used to write to DBS. E.g.
Writing: https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
Reading: http://cmsdbsprod.cern.ch/cms_dbs_ph_analysis_02/servlet/DBSServlet
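Putting the two pieces together, the [CMSSW] section for reading a dataset published to this DBS instance would look like the following (the datasetpath shown here is only a placeholder for your own published dataset):

[CMSSW]
datasetpath = /YourPrimaryDataset/your-publication-name/USER
dbs_url = http://cmsdbsprod.cern.ch/cms_dbs_ph_analysis_02/servlet/DBSServlet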

It is possible to run CRAB jobs on the T3 only if the dataset used is also present on the T3. In this case you need “scheduler = sge” instead of “scheduler = remoteGlidein”. Only have an [SGE] section when you actually use the 'sge' scheduler.
https://wiki.chipp.ch/twiki/bin/view/CmsTier3/HowToManageJobsWithCRAB#Example_of_crab_cfg
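A minimal sketch of the scheduler change for such local T3 jobs, assuming the dataset is already hosted on the T3 (the contents of the [SGE] section itself are site-specific and are described in the wiki page linked above):

[CRAB]
jobtype = cmssw
scheduler = sge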
Note: this type of job cannot be used to process a dataset that is not on the tier3. The network connection to the T3 is not fast enough to sustain a useful write speed in the stage-out step, and the jobs will fail at the very end, i.e. when trying to copy the results.
To run “normal” CRAB jobs anywhere, the “[SGE]” section needs to be removed and it is necessary to stage out to the T2. Example of remote jobs (change the 'storage_element' variable): https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookCRABTutorial#CRAB_configuration_file_for_CMS
Note: this is the recommended solution in case the dataset is not stored on the tier3, i.e. stage out to the T2 and then copy the files (using lcg-cp or data_replica) to the T3. If it is only very few jobs (say <50 or so) with small output files it is also possible to stage out directly to the T3 (just put 'storage_element = T3_CH_PSI').
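As an illustration of the copy step after staging out to the T2, a single output file could be transferred to the T3 with lcg-cp roughly as follows. The SRM endpoints and user paths below are examples only and must be adapted to your own storage area (the CSCS path matches the storage_path used in the crab.cfg example further down):

lcg-cp -b -D srmv2 \
  "srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/<username>/outfile.root" \
  "srm://t3se01.psi.ch:8443/srm/managerv2?SFN=/pnfs/psi.ch/cms/trivcat/store/user/<username>/outfile.root"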
# CRAB cfg file used for tH analysis
[CRAB]
jobtype = cmssw
scheduler = remoteGlidein

[CMSSW]
datasetpath = /TToBLNuHTo2B_t-channel-NC_8TeV-madgraph-tauola_LHEv3/dknowlto-TToBLNuHTo2B_t-channel-NC_8TeV-madgraph-tauola_AODSIMv3-836fdeb709619a3677b02c7f682694c9/USER
dbs_url = http://cmsdbsprod.cern.ch/cms_dbs_ph_analysis_02/servlet/DBSServlet
pset = SingleTopSkim_TChannel_cfg.py
events_per_job = 2
total_number_of_events = 10
get_edm_output = 1
# dbs_url = http://cmsdbsprod.cern.ch/cms_dbs_ph_analysis_01/servlet/DBSServlet

[USER]
return_data = 0
copy_data = 1
storage_element = T2_CH_CSCS
storage_path = /srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/dpinna/
#user_remote_dir = /prova
publish_data = 0
The list of the main parameters you need to specify in your crab.cfg:

• pset: the CMSSW configuration file name;
• output_file: the output file name produced by your pset; if the output is defined in a TFileService in the CMSSW pset, the file is automatically handled by CRAB and there is no need to specify it in this parameter;
• datasetpath: the full dataset name you want to analyze;
• Job splitting (see the short example after this list):
  By event: only for MC data. You need to specify 2 of these parameters: total_number_of_events, number_of_jobs, events_per_job
    o specify total_number_of_events and number_of_jobs: this will assign to each job a number of events equal to total_number_of_events/number_of_jobs
    o specify total_number_of_events and events_per_job: this will assign events_per_job events to each job and calculate the number of jobs as total_number_of_events/events_per_job
    o or you can specify number_of_jobs and events_per_job
  By lumi: required for real data. You need to specify 2 of these parameters: total_number_of_lumis, lumis_per_job, number_of_jobs
    o because jobs in split-by-lumi mode process entire rather than partial files, you will often end up with fewer jobs processing more lumis than expected; additionally, a single job cannot analyze files from multiple blocks in DBS, so these parameters are “advice” to CRAB rather than determinative.
• lumi_mask: the file with the list of lumis to process; CRAB will skip luminosity blocks not listed in the file;
• ui_working_dir: the local working area;
• storage_element: the remote Storage Element;
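As a concrete illustration of the event-based splitting arithmetic (the numbers here are arbitrary), the following [CMSSW] settings would make CRAB create 10000/500 = 20 jobs of 500 events each:

[CMSSW]
total_number_of_events = 10000
events_per_job = 500
# CRAB derives number_of_jobs = 10000 / 500 = 20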