====== CRAB3 ======

See
  * Tutorial: https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookCRAB3Tutorial
  * Configuration: https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3ConfigurationFile
  * Commands: https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3Commands
  * Example: https://github.com/IzaakWN/CRAB

====== CRAB2 ======

CRAB2 has been superseded by CRAB3.

===== Setup local environment =====

In order to submit jobs to the Grid, you must have access to an LCG User Interface (LCG UI). It allows you to access WLCG-affiliated resources in a fully transparent way. Then set up the CMSSW software and source the CRAB environment, in this order. Remember to create a proxy certificate for CMS.

<code bash>
# lxplus
source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.sh
cmsenv
source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.sh
voms-proxy-init -voms cms

# tier3
source /swshare/psit3/etc/profile.d/cms_ui_env.sh
cmsenv
source /swshare/CRAB/CRAB_2_9_1/crab.sh
voms-proxy-init -voms cms
</code>

By default your environment is set up as a neutral Grid User Interface (to allow running of tasks which may suffer from a contaminated environment). To load the standard CMS environment, just source the standard CMS setup file (this line is valid on any LCG Grid site in the world):

<code bash>
source $VO_CMS_SW_DIR/cmsset_default.sh
</code>

If you don't want to set up a CMSSW environment and only want to use ROOT:

<code bash>
source /swshare/ROOT/thisroot.sh
</code>

To access your CERN AFS account you need to obtain an AFS token from the CERN server. Execute

<code bash>
kinit YourCERNAFSName@CERN.CH
aklog cern.ch
</code>

===== CRAB setup =====

A CMSSW configuration file is necessary to specify what the job should do. For example, ''tutorial.py'':

<code python>
import FWCore.ParameterSet.Config as cms

process = cms.Process('Slurp')

process.source = cms.Source("PoolSource", fileNames = cms.untracked.vstring())
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(10) )
process.options = cms.untracked.PSet( wantSummary = cms.untracked.bool(True) )

process.output = cms.OutputModule("PoolOutputModule",
    outputCommands = cms.untracked.vstring("drop *", "keep recoTracks_*_*_*"),
    fileName = cms.untracked.string('outfile.root'),
)
process.out_step = cms.EndPath(process.output)
</code>

===== CRAB configuration file for Monte Carlo data =====

The CRAB configuration file (default name ''crab.cfg'') should be located in the same place as the CMSSW parameter-set to be used by CRAB, with the following content:

<code ini>
[CMSSW]
total_number_of_events = 10
number_of_jobs = 5
pset = tutorial.py
datasetpath = /RelValProdTTbar/JobRobot-MC_3XY_V24_JobRobot-v1/GEN-SIM-DIGI-RECO
output_file = outfile.root

[USER]
return_data = 0
copy_data = 1
storage_element = T2_IT_Legnaro
user_remote_dir = TutGridSchool

[CRAB]
scheduler = remoteGlidein
jobtype = cmssw
</code>
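With ''crab.cfg'' and ''tutorial.py'' in place, a task is typically created, submitted and harvested with the CRAB2 command-line cycle. The sketch below is not part of the original examples and only shows the usual sequence; see the CRAB documentation linked at the top for the full set of options.

<code bash>
crab -create       # create the task from crab.cfg in the current directory
crab -submit       # submit the created jobs
crab -status       # monitor the jobs
crab -getoutput    # retrieve output and log files of finished jobs
crab -report       # print a summary report of the task
</code>

''crab -getoutput'' only retrieves jobs that have already finished, so rerun ''crab -status'' until all jobs are done.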
===== Analyse published results =====

To analyse results that have been published in a local DBS, you may use a CRAB configuration identical to any other, with the addition that you must specify the DBS instance to which the data was published, the datasetpath of your dataset and the ''dbs_url''. To do this, modify the ''[CMSSW]'' section of your CRAB configuration file, e.g.

<code ini>
[CMSSW]
....
datasetpath = your_dataset_path
dbs_url = url_local_dbs
</code>

Note: as ''dbs_url'', use the URL of the DBS instance used for publication. Note that ''dbs_url'' should point to a read-only URL, which has a slightly different syntax from the one used to write to DBS, e.g.

<code>
Writing: https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
Reading: http://cmsdbsprod.cern.ch/cms_dbs_ph_analysis_02/servlet/DBSServlet
</code>

===== Local jobs =====

It is possible to run CRAB jobs on the T3 only if the dataset used is also on the T3. In this case you need ''scheduler = sge'' instead of ''scheduler = remoteGlidein''. Only have an ''[SGE]'' section when you actually use the ''sge'' scheduler.

==== Example of local jobs ====

https://wiki.chipp.ch/twiki/bin/view/CmsTier3/HowToManageJobsWithCRAB#Example_of_crab_cfg

Note: this type of job cannot be used to process a dataset that is not on the tier3. The network connection to the T3 is not fast enough to sustain a useful write speed in the stage-out step, and the jobs will fail at the very end, i.e. when trying to copy the results.

===== Non local jobs =====

To run "normal" CRAB jobs anywhere, the ''[SGE]'' section needs to be removed and it is necessary to stage out to the T2.

==== Example of remote jobs (change the ''storage_element'' variable) ====

https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookCRABTutorial#CRAB_configuration_file_for_CMS

Note: this is the recommended solution in case the dataset is not stored on the tier3: stage out to the T2 and then copy the files (using ''lcg-cp'' or ''data_replica'') to the T3, as sketched after the example below. If there are only very few jobs (say <50 or so) with small output files, it is also possible to stage out directly to the T3 (just put ''storage_element = T3_CH_PSI'').

<code ini>
# CRAB cfg file used for tH analysis
[CRAB]
jobtype = cmssw
scheduler = remoteGlidein

[CMSSW]
datasetpath = /TToBLNuHTo2B_t-channel-NC_8TeV-madgraph-tauola_LHEv3/dknowlto-TToBLNuHTo2B_t-channel-NC_8TeV-madgraph-tauola_AODSIMv3-836fdeb709619a3677b02c7f682694c9/USER
dbs_url = http://cmsdbsprod.cern.ch/cms_dbs_ph_analysis_02/servlet/DBSServlet
pset = SingleTopSkim_TChannel_cfg.py
events_per_job = 2
total_number_of_events = 10
get_edm_output = 1
# dbs_url = http://cmsdbsprod.cern.ch/cms_dbs_ph_analysis_01/servlet/DBSServlet

[USER]
return_data = 0
copy_data = 1
storage_element = T2_CH_CSCS
storage_path = /srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/dpinna/
#user_remote_dir = /prova
publish_data = 0
</code>
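The T2 to T3 copy mentioned above can be done file by file with ''lcg-cp''. The following is only a sketch: the SE hostnames, ports and user path are assumptions and must be adapted to the actual T2/T3 storage areas; for whole directories of outputs, ''data_replica'' may be more convenient.

<code bash>
# Sketch: copy one staged-out file from the CSCS T2 to the PSI T3.
# Hostnames and paths below are examples only; replace <username> and the file name.
lcg-cp -b -D srmv2 \
  "srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/<username>/outfile.root" \
  "srm://t3se01.psi.ch:8443/srm/managerv2?SFN=/pnfs/psi.ch/cms/trivcat/store/user/<username>/outfile.root"
</code>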
===== List of configuration parameters =====

The main parameters you need to specify in your ''crab.cfg'':

  * ''pset'': the CMSSW configuration file name;
  * ''output_file'': the output file name produced by your pset; if the output is defined in a TFileService in the CMSSW pset, the file is handled automatically by CRAB and there is no need to specify it in this parameter;
  * ''datasetpath'': the full name of the dataset you want to analyze;
  * Job splitting:
    * By event (only for MC data). You need to specify two of these parameters: ''total_number_of_events'', ''number_of_jobs'', ''events_per_job'':
      * specify ''total_number_of_events'' and ''number_of_jobs'': this assigns to each job a number of events equal to ''total_number_of_events/number_of_jobs'';
      * specify ''total_number_of_events'' and ''events_per_job'': this assigns ''events_per_job'' events to each job and calculates the number of jobs as ''total_number_of_events/events_per_job'';
      * or you can specify ''number_of_jobs'' and ''events_per_job''.
    * By lumi (required for real data). You need to specify two of these parameters: ''total_number_of_lumis'', ''lumis_per_job'', ''number_of_jobs'' (see the sketch after this list):
      * because jobs in split-by-lumi mode process entire files rather than partial files, you will often end up with fewer jobs processing more lumis than expected; additionally, a single job cannot analyze files from multiple blocks in DBS, so these parameters are "advice" to CRAB rather than determinative;
      * specify ''lumis_per_job'' and ''number_of_jobs'': the total number of lumis processed will be ''number_of_jobs'' x ''lumis_per_job'';
      * or you can specify ''total_number_of_lumis'' and ''number_of_jobs''.
    * ''lumi_mask'': the filename of a JSON file that describes which runs and lumis to process. CRAB will skip luminosity blocks not listed in the file.
  * ''return_data'': this can be 0 or 1; if it is 1, your output files are retrieved to your local working area;
  * ''copy_data'': this can be 0 or 1; if it is 1, your output files are copied to a remote Storage Element;
  * ''local_stage_out'': this can be 0 or 1; if it is 1, your produced output is copied to the closeSE in case the copy to the SE specified in your ''crab.cfg'' fails;
  * ''publish_data'': this can be 0 or 1; if it is 1, you can publish your produced data to a local DBS;
  * ''use_server'': the usage of the CRAB server is deprecated now, so by default this parameter is set to 0;
  * ''scheduler'': the name of the scheduler you want to use.
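To illustrate the lumi-based splitting parameters, here is a minimal, hypothetical ''[CMSSW]'' section for a real-data task. The dataset name, pset and JSON file name are placeholders, not taken from the examples above, and the ''[USER]''/''[CRAB]'' sections would be the same as in the remote-job example.

<code ini>
# Hypothetical real-data example: split by lumi sections using a certification JSON
[CMSSW]
pset = your_analysis_cfg.py
datasetpath = /YourPrimaryDataset/Run2012X-SomeReco-v1/AOD
lumi_mask = Cert_GoodRuns_JSON.txt
lumis_per_job = 50
number_of_jobs = 10
output_file = outfile.root
</code>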