btag:tmva [2014/12/17 14:17] – vlambert | btag:tmva [2014/12/17 14:43] – vlambert |
---|
Below are the subsequent steps for preparing the training samples for the TMVA: | Below are the subsequent steps for preparing the training samples for the TMVA: |
| |
**1)** Make the trees really flat without vectors and set variables that are not defined for a given vertex category to a default value. For this, run your ntuples through **createNewTree.py** which will produce sets of new flat ntuples split in event range such as //CombinedSVV2NoVertex_DUSG_0_249999.root// with the shared tree name //"tree"//. | **1)** The samples most likely need to be skimmed so that the TMVA training does not run into a memory allocation error. First skim the samples, selecting 20,000 events in each pt/eta bin for each flavour/vertex ROOT file, so that every pt/eta bin has enough events for the training. This is performed by **TrainingFilter.py**. |
| |
**2)** | **2)** Make the trees really flat without vectors and set variables that are not defined for a given vertex category to a default value. For this, run your ntuples through **createNewTree.py** which will produce sets of new flat ntuples split in event range such as //CombinedSVV2NoVertex_DUSG_0_249999.root// with the shared tree name //"tree"//. |
| |
| *For the training, one can either combine these ntuples with hadd or leave them as is for the rest of the processing. |
| |
| **3)** Produce the category normalization weights for the training sample with **Normalization_Weights.C** and save the output to a text file such as //QCD_normweights.txt//. These will be added as a weight branch "weight_norm", which flattens the vertex category distribution for the training sample. |
| |
| **4)** Assuming the evaluation sample vertex category weights have been produced (look at procedures for Evaluation Samples), add the normalization and category weight branches to the flat ntuples with **addWeightBranch.py**. The combination of these weights will remove the training sample vertex category information and match it with that of the evaluation sample. |
| |
| **5)** Create 2D Pt/Eta histograms for the weighted ntuples with **createEtaPtWeightHists.py** (make sure "weight_norm*weight_category" is set as the weight in Draw() for the histograms). There will be 12 histograms created: 9 for the individual flavour/category files and 3 combined histograms, one for each flavour. |
| |
| **6)** Make the final weighted ntuples, making sure that **addWeightBranch.py** points to the new Pt/Eta histogram files. There should be six new branches created:\\ |
| -**weight_etaPt** : the Pt/Eta weight, specific for a flavour/category file (for category dedicated training)\\ |
| -**weight_etaPtInc**: the Pt/Eta weight, inclusive for the flavour\\ |
| -**weight_category**: the category weight from the evaluation sample\\ |
| -**weight_norm** : the normalization weight from the training sample\\ |
| -**weight_flavour** : the ratio of the flavour prevalences in the evaluation process\\ |
| -**weight** : (weight_etaPtInc) x (weight_norm x weight_category) x (weight_flavour) //-- this can be used for combined trainings//\\ |
| |
| |
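
The weight bookkeeping in steps 3 to 6 of the newer revision can be illustrated with a small, ROOT-free sketch. It is not the code in **Normalization_Weights.C** or **addWeightBranch.py**: the flattening formula (total events divided by the number of categories times the events in that category), the function names, and the toy category labels are all assumptions made for illustration. Only the final product follows the definition of the "weight" branch given above.

```python
from collections import Counter


def flattening_weights(categories):
    """Per-category weights that equalise the weighted category counts.

    Assumed definition (not taken from Normalization_Weights.C):
        w_c = N_total / (n_categories * N_c)
    """
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: total / (len(counts) * n) for cat, n in counts.items()}


def combined_weight(weight_etaPtInc, weight_norm, weight_category, weight_flavour):
    """Product described for the "weight" branch (combined trainings)."""
    return weight_etaPtInc * (weight_norm * weight_category) * weight_flavour


# Toy training sample: one vertex category label per jet (labels are illustrative).
toy = ["RecoVertex"] * 6 + ["PseudoVertex"] * 3 + ["NoVertex"] * 1

weight_norm = flattening_weights(toy)

# After weighting, every category carries the same total weight.
weighted = {cat: toy.count(cat) * w for cat, w in weight_norm.items()}
print(weighted)

# Example of the per-jet combined weight with made-up input values.
print(combined_weight(2.0, 0.5, 3.0, 1.0))
```

In the actual workflow these numbers would be read from the weight text files and the Pt/Eta histograms and written to the ntuples as branches; the sketch only shows how the individual factors relate to each other.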