computing:crab [2019/10/24 17:16] – iwn
====== CRAB3 ======

See:
  * Tutorial: https://
  * Configuration:
  * Commands: https://
  * Example: https://

\\ \\
====== CRAB2 ======

CRAB2 has been superseded by CRAB3.

===== Setup local environment =====
In order to submit jobs to the Grid, you must have access to an LCG User Interface (LCG UI), which allows you to access WLCG-affiliated resources in a fully transparent way. Then set up the CMSSW software and source the CRAB environment, in this order. Remember to create a proxy certificate for CMS.
<code>
# lxplus
source /
source /

# tier3
source /
source /
</code>
By default your environment is set up as a neutral Grid User Interface (to allow running of tasks which may suffer from a contaminated environment). To load the standard CMS environment:
<code>
source $VO_CMS_SW_DIR/
</code>
If you don't want to set up a CMSSW environment and want to use ROOT:
<code>
source /
</code>
To access your CERN AFS account you need to obtain an AFS token from the CERN server. Execute:
<code>
kinit YourCERNAFSName@CERN.CH
aklog cern.ch
</code>
===== CRAB setup =====
<code>
import FWCore.ParameterSet.Config as cms
</code>
A CMSSW configuration file is necessary to tell the job what it should do. For example:
<code>
process = cms.Process('
process.source = cms.Source("
...
fileName = cms.untracked.string('
process.out_step = cms.EndPath(process.output)
</code>
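Since the snippet above is cut off, here is a minimal, self-contained sketch of such a CMSSW configuration. The process label, input file, and output file name are illustrative placeholders, not values from this page:

```python
# Minimal CMSSW configuration sketch; all concrete values are assumptions.
import FWCore.ParameterSet.Config as cms

process = cms.Process('Analysis')  # hypothetical process label
process.source = cms.Source("PoolSource",
    fileNames=cms.untracked.vstring('file:input.root'))  # placeholder input
process.maxEvents = cms.untracked.PSet(input=cms.untracked.int32(-1))
process.output = cms.OutputModule("PoolOutputModule",
    fileName=cms.untracked.string('output.root'))  # placeholder output
process.out_step = cms.EndPath(process.output)
```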

===== CRAB configuration file for Monte Carlo data =====

The CRAB configuration file (default name ''crab.cfg'') should be located in the same directory as the CMSSW parameter-set to be used by CRAB, with the following content:
<code>
[CMSSW]
total_number_of_events
...
scheduler = remoteGlidein
jobtype
</code>
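Most values in the fragment above are cut off, so here is a sketch of what a complete Monte Carlo ''crab.cfg'' could look like, assembled with Python's ''configparser'' so the structure is explicit. Every concrete value (dataset path, event counts, parameter-set file name) is an illustrative assumption:

```python
# Sketch: assemble a hypothetical crab.cfg for a Monte Carlo task.
# All values below are placeholders, not taken from this page.
import configparser
import io

cfg = configparser.ConfigParser()
cfg.optionxform = str  # keep parameter names exactly as written

cfg['CRAB'] = {
    'jobtype': 'cmssw',
    'scheduler': 'remoteGlidein',  # scheduler named in the text
}
cfg['CMSSW'] = {
    'pset': 'my_cmssw_config.py',                # CMSSW parameter-set file
    'datasetpath': '/MyMC/Sample/GEN-SIM-RECO',  # placeholder dataset
    'total_number_of_events': '10000',
    'number_of_jobs': '10',
    'output_file': 'output.root',
}
cfg['USER'] = {
    'return_data': '0',
    'copy_data': '1',
    'publish_data': '0',
}

buf = io.StringIO()
cfg.write(buf)
print(buf.getvalue())
```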
+ | |||
+ | ===== Analyse published results | ||
To analyse results that have been published in a local DBS you may use a CRAB configuration identical to any other, except that you must specify the DBS instance to which the data was published: the ''datasetpath'' name of your dataset and the ''dbs_url''. To do this, modify the [CMSSW] section of your CRAB configuration file, e.g.:
<code>
[CMSSW]
....
datasetpath=your_dataset_path
dbs_url=url_local_dbs
</code>
Note: As ''dbs_url'', use the URL of the DBS instance used in publication. Note that ''dbs_url'' should point to a read-only URL, which has a slightly different syntax from the one used to write to DBS, e.g.:
Writing: https://
Reading: http://
===== Local jobs =====

It is possible to run CRAB jobs on the T3 only if the dataset used is also on the T3. In this case you need ...

==== Example of local jobs ====

https://

Note: This type of job cannot be used to process a dataset that is not on the tier3. The network connection to the T3 is not fast enough to sustain a useful write speed in the stage-out step, and the jobs will fail at the very end, i.e. when trying to copy the results.

===== Non local jobs =====

To run "non local" jobs ...

==== Example of remote jobs (change the ''...'') ====

https://

Note: This is the recommended solution in case the dataset is not stored on the tier3: stage out to the T2 and then copy the files (using lcg-cp or data_replica) to the T3.
If there are only very few jobs (say <50 or so) with small output files, it is also possible to stage out directly to the T3 (just put ...).
<code>
# CRAB cfg file used for tH analysis
[CRAB]
jobtype = cmssw
...
/
publish_data = 0
</code>

===== List of configuration parameters =====

The list of the main parameters you need to specify in your ''crab.cfg'':
  * ''pset'': the CMSSW configuration file name;
  * ''output_file'': the output file name; if the output is defined in TFileService, there is no need to specify it in this parameter;
  * ''datasetpath'': the name of your dataset;
  * Job splitting:
    * By event (only for MC data). You need to specify 2 of these parameters: total_number_of_events, number_of_jobs, events_per_job:
      * specify the ''total_number_of_events'' and the ''number_of_jobs'': CRAB will assign to each job total_number_of_events/number_of_jobs events;
      * specify the ''total_number_of_events'' and the ''events_per_job'': CRAB will assign to each job events_per_job events and will calculate the number of jobs as total_number_of_events/events_per_job;
      * or you can specify the ''number_of_jobs'' and the ''events_per_job'';
    * By lumi (real data requires it). You need to specify 2 of these parameters: total_number_of_lumis, lumis_per_job, number_of_jobs. Because jobs in split-by-lumi mode process entire rather than partial files, you will often end up with fewer jobs processing more lumis than expected:
      * specify the ''lumis_per_job'' and the ''number_of_jobs'';
      * or you can specify the ''total_number_of_lumis'' and the ''number_of_jobs'';
  * ''lumi_mask'': the filename of a JSON file that describes which runs and lumis to process;
  * ... local working area;
  * ''copy_data'': this can be 0 or 1; if it is 1 you will copy your output files to a remote Storage Element;
  * ''local_stage_out'':
  * ''publish_data'':
  * ''use_server'': the usage of the CRAB server is deprecated now, so by default this parameter is set to 0;
  * ''scheduler'': the name of the scheduler you want to use;
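The by-event splitting rules above reduce to simple arithmetic. A minimal sketch, with the caveat that the helper name is mine (not a CRAB API) and the round-up behaviour when deriving the number of jobs is an assumption:

```python
# Sketch of CRAB2 by-event job splitting: given two of the three
# parameters, derive the third. Helper name and rounding are assumptions.
import math

def split_by_events(total_number_of_events, number_of_jobs=None,
                    events_per_job=None):
    if number_of_jobs is not None and events_per_job is None:
        # each job gets total_number_of_events / number_of_jobs events
        events_per_job = total_number_of_events // number_of_jobs
    elif events_per_job is not None and number_of_jobs is None:
        # number of jobs = total_number_of_events / events_per_job,
        # rounded up here so no events are dropped (an assumption)
        number_of_jobs = math.ceil(total_number_of_events / events_per_job)
    return number_of_jobs, events_per_job

# 10000 events split into 10 jobs -> 1000 events per job
print(split_by_events(10000, number_of_jobs=10))
# 10000 events at 3000 events per job -> 4 jobs
print(split_by_events(10000, events_per_job=3000))
```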