computing:crab [2019/10/24 17:16] – iwn
====== CRAB3 ======

See:
  * Tutorial: https://
  * Configuration:
  * Commands: https://
  * Example: https://

\\ \\
====== CRAB2 ======

CRAB2 has been superseded by CRAB3.

===== Setup local environment =====
In order to submit jobs to the Grid, you must have access to an LCG User Interface (LCG UI), which allows you to access WLCG-affiliated resources in a fully transparent way. Then set up the CMSSW software and source the CRAB environment, in this order. Remember to create a proxy certificate for CMS.
<code>
# lxplus
source /
source /

# tier3
source /
source /
</code>
By default your environment is set up as a neutral Grid User Interface (to allow running of tasks which may suffer from a contaminated environment). To load the standard CMS environment:
<code>
source $VO_CMS_SW_DIR/
</code>
If you don't want to set up a CMSSW environment and want to use ROOT:
<code>
source /
</code>
To access your CERN AFS account you need to obtain an AFS token from the CERN server. Execute:
<code>
kinit YourCERNAFSName@CERN.CH
aklog cern.ch
</code>
===== CRAB setup =====
<code>
import FWCore.ParameterSet.Config as cms
</code>
A CMSSW configuration file is necessary to tell the job what it should do. For example:
<code>
process = cms.Process('
process.source = cms.Source("
...
fileName = cms.untracked.string('
process.out_step = cms.EndPath(process.output)
</code>
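Since the snippet above is cut off, here is a minimal, self-contained sketch of such a CMSSW configuration. The process label, input file, and output file name are illustrative placeholders, not values from this page:

```python
# Minimal CMSSW configuration sketch; all concrete values are assumptions.
import FWCore.ParameterSet.Config as cms

process = cms.Process('Analysis')  # hypothetical process label
process.source = cms.Source("PoolSource",
    fileNames=cms.untracked.vstring('file:input.root'))  # placeholder input
process.maxEvents = cms.untracked.PSet(input=cms.untracked.int32(-1))
process.output = cms.OutputModule("PoolOutputModule",
    fileName=cms.untracked.string('output.root'))  # placeholder output
process.out_step = cms.EndPath(process.output)
```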

===== CRAB configuration file for Monte Carlo data =====

The CRAB configuration file (default name ''crab.cfg'') should be located in the same directory as the CMSSW parameter-set to be used by CRAB, with the following content:
<code>
[CMSSW]
total_number_of_events
...
scheduler = remoteGlidein
jobtype
</code>
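Most values in the fragment above are cut off, so here is a sketch of what a complete Monte Carlo ''crab.cfg'' could look like, assembled with Python's ''configparser'' so the structure is explicit. Every concrete value (dataset path, event counts, parameter-set file name) is an illustrative assumption:

```python
# Sketch: assemble a hypothetical crab.cfg for a Monte Carlo task.
# All values below are placeholders, not taken from this page.
import configparser
import io

cfg = configparser.ConfigParser()
cfg.optionxform = str  # keep parameter names exactly as written

cfg['CRAB'] = {
    'jobtype': 'cmssw',
    'scheduler': 'remoteGlidein',  # scheduler named in the text
}
cfg['CMSSW'] = {
    'pset': 'my_cmssw_config.py',                # CMSSW parameter-set file
    'datasetpath': '/MyMC/Sample/GEN-SIM-RECO',  # placeholder dataset
    'total_number_of_events': '10000',
    'number_of_jobs': '10',
    'output_file': 'output.root',
}
cfg['USER'] = {
    'return_data': '0',
    'copy_data': '1',
    'publish_data': '0',
}

buf = io.StringIO()
cfg.write(buf)
print(buf.getvalue())
```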
+ | |||
+ | ===== Analyse published results | ||
To analyse results that have been published in a local DBS you may use a CRAB configuration identical to any other, except that you must specify the DBS instance to which the data was published: the ''datasetpath'' name of your dataset and the ''dbs_url''. To do this, modify the [CMSSW] section of your CRAB configuration file, e.g.:
<code>
[CMSSW]
....
datasetpath=your_dataset_path
dbs_url=url_local_dbs
</code>
Note: As ''dbs_url'', use the URL of the DBS instance used in publication. Note that ''dbs_url'' should point to a read-only URL, which has a slightly different syntax from the one used to write to DBS, e.g.:
Writing: https://
Reading: http://
===== Local jobs =====

It is possible to run CRAB jobs on the T3 only if the dataset used is also on the T3. In this case you need ...

==== Example of local jobs ====

https://

Note: This type of job cannot be used to process a dataset that is not on the tier3. The network connection to the T3 is not fast enough to sustain a useful write speed in the stage-out step, and the jobs will fail at the very end, i.e. when trying to copy the results.

===== Non local jobs =====

To run "non local" jobs ...

==== Example of remote jobs (change the ''...'') ====

https://

Note: This is the recommended solution in case the dataset is not stored on the tier3: stage out to the T2 and then copy the files (using lcg-cp or data_replica) to the T3.
If there are only very few jobs (say <50 or so) with small output files, it is also possible to stage out directly to the T3 (just put ...).
<code>
# CRAB cfg file used for tH analysis
[CRAB]
jobtype = cmssw
...
/
publish_data = 0
</code>

===== List of configuration parameters =====

The list of the main parameters you need to specify in your ''crab.cfg'':
  * ''pset'': the CMSSW configuration file name;
  * ''output_file'': the output file name; if the output is defined in TFileService, there is no need to specify it in this parameter;
  * ''datasetpath'': the name of your dataset;
  * Job splitting:
    * By event (only for MC data). You need to specify 2 of these parameters: total_number_of_events, number_of_jobs, events_per_job:
      * specify the ''total_number_of_events'' and the ''number_of_jobs'': CRAB will assign to each job total_number_of_events/number_of_jobs events;
      * specify the ''total_number_of_events'' and the ''events_per_job'': CRAB will assign to each job events_per_job events and will calculate the number of jobs as total_number_of_events/events_per_job;
      * or you can specify the ''number_of_jobs'' and the ''events_per_job'';
    * By lumi (real data requires it). You need to specify 2 of these parameters: total_number_of_lumis, lumis_per_job, number_of_jobs. Because jobs in split-by-lumi mode process entire rather than partial files, you will often end up with fewer jobs processing more lumis than expected:
      * specify the ''lumis_per_job'' and the ''number_of_jobs'';
      * or you can specify the ''total_number_of_lumis'' and the ''number_of_jobs'';
  * ''lumi_mask'': the filename of a JSON file that describes which runs and lumis to process;
  * ... local working area;
  * ''copy_data'': this can be 0 or 1; if it is 1 you will copy your output files to a remote Storage Element;
  * ''local_stage_out'':
  * ''publish_data'':
  * ''use_server'': the usage of the CRAB server is deprecated now, so by default this parameter is set to 0;
  * ''scheduler'': the name of the scheduler you want to use;
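The by-event splitting rules above reduce to simple arithmetic. A minimal sketch, with the caveat that the helper name is mine (not a CRAB API) and the round-up behaviour when deriving the number of jobs is an assumption:

```python
# Sketch of CRAB2 by-event job splitting: given two of the three
# parameters, derive the third. Helper name and rounding are assumptions.
import math

def split_by_events(total_number_of_events, number_of_jobs=None,
                    events_per_job=None):
    if number_of_jobs is not None and events_per_job is None:
        # each job gets total_number_of_events / number_of_jobs events
        events_per_job = total_number_of_events // number_of_jobs
    elif events_per_job is not None and number_of_jobs is None:
        # number of jobs = total_number_of_events / events_per_job,
        # rounded up here so no events are dropped (an assumption)
        number_of_jobs = math.ceil(total_number_of_events / events_per_job)
    return number_of_jobs, events_per_job

# 10000 events split into 10 jobs -> 1000 events per job
print(split_by_events(10000, number_of_jobs=10))
# 10000 events at 3000 events per job -> 4 jobs
print(split_by_events(10000, events_per_job=3000))
```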