====== CRAB3 ======
  
See:
  * Tutorial: https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookCRAB3Tutorial
  * Configuration: https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3ConfigurationFile
  * Commands: https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3Commands
  * Example: https://github.com/IzaakWN/CRAB
  
\\
====== CRAB2 ======
  
CRAB2 has been superseded by CRAB3.
===== Setup local environment =====
In order to submit jobs to the Grid, you must have access to an LCG User Interface (LCG UI), which allows you to use WLCG-affiliated resources in a fully transparent way. Then set up the CMSSW software and source the CRAB environment, in this order. Remember to create a proxy certificate for CMS.
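A minimal sketch of this sequence, assuming a CVMFS-based UI and the standalone CRAB2 setup script at CERN (paths and the release version are examples; adapt them to your site):
<code>
# set up the CMS environment from CVMFS (path assumes a CVMFS-based UI)
source /cvmfs/cms.cern.ch/cmsset_default.sh

# create and enter a CMSSW working area (release version is an example)
cmsrel CMSSW_5_3_22
cd CMSSW_5_3_22/src
cmsenv

# source the CRAB2 environment (location varies by site/installation)
source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.sh

# create a VOMS proxy certificate for CMS
voms-proxy-init -voms cms
</code>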
  
===== CRAB configuration file for Monte Carlo data =====
The CRAB configuration file (default name ''crab.cfg'') should be located in the same directory as the CMSSW parameter-set to be used by CRAB, with the following content:
<code>
[CMSSW]
...
</code>
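Only the section header of this block survives above. As a minimal sketch (not the original contents), using the parameters described under "List of configuration parameters" below, with hypothetical file, dataset, and directory names:
<code>
[CRAB]
jobtype   = cmssw
scheduler = remoteGlidein         # example scheduler choice

[CMSSW]
pset                   = mc_analysis_cfg.py                 # hypothetical CMSSW parameter-set
datasetpath            = /SomeSample/SomeCampaign/AODSIM    # hypothetical MC dataset
output_file            = output.root
total_number_of_events = -1       # -1 = all events
number_of_jobs         = 100

[USER]
return_data     = 0
copy_data       = 1
storage_element = T3_CH_PSI       # stage out to the T3 (see "Non-local jobs" below)
user_remote_dir = mc_test         # hypothetical output directory
publish_data    = 0
</code>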
===== Analyse published results =====
To analyse results that have been published in a local DBS you may use a CRAB configuration identical to any other, with the addition that you must specify the DBS instance to which the data was published, the ''datasetpath'' of your dataset and the ''dbs_url''. To do this, modify the ''[CMSSW]'' section of your CRAB configuration file, e.g.
<code>
[CMSSW]
...
dbs_url=url_local_dbs
</code>
Note: As ''dbs_url'', give the URL of the DBS instance used in publication. It should point to a read-only URL, which has a slightly different syntax from the one used to write to DBS, e.g.
  * Writing: https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
  * Reading: http://cmsdbsprod.cern.ch/cms_dbs_ph_analysis_02/servlet/DBSServlet
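Putting this together, a hedged example of the resulting ''[CMSSW]'' section (the dataset name is hypothetical; the read-only URL is the one quoted above):
<code>
[CMSSW]
# hypothetical published dataset in the /PrimaryDataset/user-PublishName-hash/USER form
datasetpath = /MyPrimaryDataset/myname-MyPublishName-0123456789abcdef/USER
# read-only URL of the DBS instance used in publication
dbs_url     = http://cmsdbsprod.cern.ch/cms_dbs_ph_analysis_02/servlet/DBSServlet
</code>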
==== Example of local jobs ====
https://wiki.chipp.ch/twiki/bin/view/CmsTier3/HowToManageJobsWithCRAB#Example_of_crab_cfg
Note: This type of job cannot be used to process a dataset that is not on the Tier-3. The network connection to the T3 is not fast enough to sustain a useful write speed during the stage-out step, so the jobs will fail at the very end, i.e. when trying to copy back the results.
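For orientation, a minimal sketch of what makes a configuration "local" in this setup, assuming the SGE scheduler support described at the link above (the section contents are placeholders, not verified values):
<code>
[CRAB]
jobtype   = cmssw
scheduler = sge        # submit to the local T3 batch system instead of the Grid

[SGE]
# site-specific SGE settings go here (see the T3 wiki page linked above);
# for non-local (Grid) jobs this whole section must be removed
</code>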
===== Non-local jobs =====
  
-To run "normal" CRAB jobs anywhere the "[SGE]section needs to be removed and it is necessary tostage out to the T2 (). +To run "normal" CRAB jobs anywhere the ''[SGE]'' section needs to be removed and it is necessary tostage out to the T2 (). 
-Example of remote jobs (change the 'storage_element' variable): + 
-https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookCRABTutorial#CRAB_configu ration_file_for_CMS +==== Example of remote jobs (change the ''storage_element'' variable) ==== 
-note: This is the recommended solution in case the dataset is not stored on the tier3. So the recommended solution is to stage-out to the T2 and then copy the files (using lcg-cp or data_replica) to the T3.+ 
 +https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookCRABTutorial#CRAB_configuration_file_for_CMS 
 + 
 +Note: This is the recommended solution in case the dataset is not stored on the tier3. So the recommended solution is to stage-out to the T2 and then copy the files (using lcg-cp or data_replica) to the T3.
 If it is only very few jobs (say <50 or so) with small output files it is also possible to If it is only very few jobs (say <50 or so) with small output files it is also possible to
-stage out directly to the T3 (just put 'storage_element = T3_CH_PSI').+stage out directly to the T3 (just put ''storage_element = T3_CH_PSI'').
<code>
# CRAB cfg file used for tH analysis
...
/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/dpinna/ #user_remote_dir = /prova
publish_data = 0
</code>
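The block above survives only in fragments, so here is a minimal sketch of a complete remote-job configuration along the same lines (the T2 site name is an assumption based on the ''lcg.cscs.ch'' path fragment above; parameter-set, dataset, and directory names are hypothetical):
<code>
[CRAB]
jobtype   = cmssw
scheduler = remoteGlidein         # example Grid scheduler

[CMSSW]
pset                   = th_analysis_cfg.py                 # hypothetical parameter-set
datasetpath            = /SomeDataset/SomeCampaign/AODSIM   # hypothetical dataset
output_file            = output.root
total_number_of_events = -1
number_of_jobs         = 200

[USER]
copy_data       = 1
storage_element = T2_CH_CSCS      # assumed T2 name; stage out here, then copy to the T3
user_remote_dir = th_test         # hypothetical directory under your user area
publish_data    = 0
</code>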
  
  
===== List of configuration parameters =====
  
The main parameters you need to specify in your ''crab.cfg'':
  * ''pset'': the CMSSW configuration file name;
  * ''output_file'': the output file name produced by your pset; if the output is defined in the CMSSW pset via ''TFileService'', the file is automatically handled by CRAB and there is no need to specify it in this parameter;
  * ''datasetpath'': the full dataset name you want to analyze;
  * Job splitting:
    * By event (only for MC data). You need to specify 2 of these parameters: ''total_number_of_events'', ''number_of_jobs'', ''events_per_job'':
      * specify ''total_number_of_events'' and ''number_of_jobs'': this will assign to each job a number of events equal to ''total_number_of_events/number_of_jobs'';
      * specify ''total_number_of_events'' and ''events_per_job'': this will assign to each job ''events_per_job'' events and calculate the number of jobs as ''total_number_of_events/events_per_job'';
      * or you can specify ''number_of_jobs'' and ''events_per_job''.
    * By lumi (required for real data). You need to specify 2 of these parameters: ''total_number_of_lumis'', ''lumis_per_job'', ''number_of_jobs''. Because jobs in split-by-lumi mode process entire rather than partial files, you will often end up with fewer jobs processing more lumis than expected; additionally, a single job cannot analyze files from multiple blocks in DBS, so these parameters are "advice" to CRAB rather than determinative:
      * specify ''lumis_per_job'' and ''number_of_jobs'': the total number of lumis processed will be ''number_of_jobs'' x ''lumis_per_job'';
      * or you can specify ''total_number_of_lumis'' and ''number_of_jobs''.
  * ''lumi_mask'': the filename of a JSON file that describes which runs and luminosity sections to process;
  * ''copy_data'': this can be 0 or 1; if it is 1, your output files will be copied to a remote Storage Element;
  * ''local_stage_out'': this can be 0 or 1; if it is 1, your produced output is copied to the close SE in case the copy to the SE specified in your ''crab.cfg'' fails;
  * ''publish_data'': this can be 0 or 1; if it is 1, you can publish your produced data to a local DBS;
  * ''use_server'': usage of the CRAB server is deprecated now, so by default this parameter is set to 0;
  * ''scheduler'': the name of the scheduler you want to use;
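As a worked splitting example (all numbers arbitrary), the following ''[CMSSW]'' fragment illustrates the two modes; with 100000 events and 200 jobs, each job gets 100000/200 = 500 events:
<code>
[CMSSW]
# MC, split by event: each job gets 100000/200 = 500 events
total_number_of_events = 100000
number_of_jobs         = 200

# Real data, split by lumi (use instead of the event-based parameters above);
# the totals are advisory, since whole files/blocks are processed per job
#lumis_per_job  = 50
#number_of_jobs = 100
#lumi_mask      = Cert_GoodRuns.json   # hypothetical JSON file name
</code>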