====== CRAB3 ======
See:
  * Tutorial: https://
  * Configuration: https://
  * Commands: https://
  * Example:
\\
====== CRAB2 ======
CRAB2 has been superseded by CRAB3.
===== Setup local environment =====
In order to submit jobs to the Grid, you must have access to an LCG User Interface (LCG UI); it allows you to use WLCG-affiliated resources in a fully transparent way. Then set up the CMSSW software and source the CRAB environment, in this order. Remember to create a proxy certificate for CMS.
<code>
kinit YourCERNAFSName@CERN.CH
aklog cern.ch
</code>
- | |||
===== CRAB setup =====
===== CRAB configuration file for Monte Carlo data =====
The CRAB configuration file (default name ''crab.cfg'') should be located in the same directory as the CMSSW parameter set to be used by CRAB, with the following content:
<code>
[CMSSW]
</code>
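For orientation, a minimal ''crab.cfg'' for Monte Carlo might look like the sketch below. Every value is an invented placeholder (the dataset, parameter-set, and scheduler names are not taken from this page); adapt them to your analysis.

```ini
[CRAB]
jobtype   = cmssw
# placeholder: use the scheduler appropriate for your site
scheduler = glite

[CMSSW]
# placeholder dataset and CMSSW parameter set
datasetpath            = /MyPrimaryDataset/MyProcessedDataset/GEN-SIM-RECO
pset                   = my_cmssw_cfg.py
total_number_of_events = 10000
number_of_jobs         = 10
output_file            = output.root

[USER]
return_data = 0
copy_data   = 1
```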
===== Analyse published results =====
To analyse results that have been published in a local DBS, you may use a CRAB configuration identical to any other, with the addition that you must specify the DBS instance to which the data was published: the ''datasetpath'' name of your dataset and the ''dbs_url''. To do this, modify the ''[CMSSW]'' section of your CRAB configuration file, e.g.
<code>
[CMSSW]
dbs_url=url_local_dbs
</code>
Note: As ''dbs_url'' use:
  * Writing: https://
  * Reading: http://
 |  | ||
===== Local jobs =====
It is possible to run CRAB jobs on the T3 only if the dataset used is also on the T3. In this case you need ''…''.

==== Example of local jobs ====
https://

Note: This type of job cannot be used to process a dataset that is not on the Tier-3: the network connection to the T3 is not fast enough to sustain a useful write speed in the stage-out step, and the jobs will fail at the very end, i.e. when trying to copy the results.
===== Non local jobs =====
To run "non local" jobs, …

==== Example of remote jobs (change the ''…'') ====
https://

Note: This is the recommended solution in case the dataset is not stored on the Tier-3: stage out to the T2 and then copy the files (using lcg-cp or data_replica) to the T3.

If there are only very few jobs (say <50 or so) with small output files, it is also possible to stage out directly to the T3 (just put ''…''):
<code>
# CRAB cfg file used for tH analysis
/
publish_data = 0
</code>
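The stage-out switches live in the ''[USER]'' section of ''crab.cfg''. A hedged sketch follows; the site name and remote directory are placeholders, not this site's actual values, so check your site's documentation before using them.

```ini
[USER]
copy_data       = 1
# placeholder: the destination site/SE for direct stage-out
storage_element = T3_CH_PSI
# placeholder remote output directory
user_remote_dir = my_analysis
publish_data    = 0
```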
===== List of configuration parameters =====
The list of the main parameters you need to specify in your ''crab.cfg'':
  * ''pset'': the CMSSW configuration file name;
  * ''output_file'': the output file names; if the output is defined in TFileService, there is no need to specify it in this parameter;
  * ''datasetpath'': the name of the dataset to be analysed, as registered in DBS;
  * Job splitting:
    * by event (only for MC data); you need to specify 2 of these parameters: ''total_number_of_events'', ''number_of_jobs'', ''events_per_job'':
      * specify the ''total_number_of_events'' and the ''number_of_jobs'': CRAB will assign to each job total_number_of_events/number_of_jobs events;
      * specify the ''total_number_of_events'' and the ''events_per_job'': CRAB will assign to each job ''events_per_job'' events and will calculate the number of jobs as total_number_of_events/events_per_job;
      * or you can specify the ''number_of_jobs'' and the ''events_per_job'';
    * by lumi (required for real data); you need to specify 2 of these parameters: ''total_number_of_lumis'', ''lumis_per_job'', ''number_of_jobs'':
      * because jobs in split-by-lumi mode process entire rather than partial files, you will often end up with fewer jobs processing more lumis than expected;
      * specify the ''lumis_per_job'' and the ''number_of_jobs'';
      * or you can specify the ''total_number_of_lumis'' and the ''number_of_jobs'';
  * ''lumi_mask'': the filename of a JSON file that describes which runs and lumis to process;
  * ''copy_data'': this can be 0 or 1; if it is 1, your output files will be copied to a remote Storage Element;
  * ''local_stage_out'': …;
  * ''publish_data'': …;
  * ''use_server'': the usage of the CRAB server is deprecated now, so by default this parameter is set to 0;
  * ''scheduler'': the name of the scheduler you want to use;
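The by-event splitting arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not CRAB code; CRAB performs the equivalent calculation internally when it reads ''crab.cfg''.

```python
import math

def split_by_events(total_number_of_events, number_of_jobs=None, events_per_job=None):
    """Derive the missing by-event splitting parameter from the two given,
    mirroring the crab.cfg rules described above (illustrative sketch only)."""
    if number_of_jobs is not None and events_per_job is None:
        # total_number_of_events + number_of_jobs -> events assigned to each job
        events_per_job = total_number_of_events // number_of_jobs
    elif events_per_job is not None and number_of_jobs is None:
        # total_number_of_events + events_per_job -> number of jobs (rounded up)
        number_of_jobs = math.ceil(total_number_of_events / events_per_job)
    return number_of_jobs, events_per_job

print(split_by_events(10000, number_of_jobs=8))     # -> (8, 1250)
print(split_by_events(10000, events_per_job=3000))  # -> (4, 3000)
```

Note that rounding up in the second case matches the observation above that you can end up with more (or fewer) jobs than a naive division suggests.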