Differences

This shows you the differences between two versions of the page.

--- btag:tmva [2014/12/17 14:17] – vlambert
+++ btag:tmva [2014/12/17 14:43] – vlambert
@@ Line 6: / Line 6: @@
 Below are the subsequent steps for preparing the training samples for the TMVA:
-**1)** Make the trees really flat without vectors and set variables that are not defined for a given vertex category to a default value. For this, run your ntuples through **createNewTree.py** which will produce sets of new flat ntuples split in event range such as //CombinedSVV2NoVertex_DUSG_0_249999.root// with the shared tree name //"tree"//.
+**1)** The samples will most likely need to be skimmed to avoid a memory allocation error during the TMVA training. First skim the samples, selecting 20,000 events in each pt/eta bin for each flavour/vertex root file, so that every pt/eta bin has enough events for the training. This is done with **TrainingFilter.py**.
-**2)**
+**2)** Make the trees completely flat (no vector branches) and set variables that are not defined for a given vertex category to a default value. For this, run your ntuples through **createNewTree.py**, which will produce sets of new flat ntuples split by event range, such as //CombinedSVV2NoVertex_DUSG_0_249999.root//, with the shared tree name //"tree"//.
+*For the training, one can either combine these ntuples with hadd or leave them as is for the rest of the processing.
+**3)** Produce the category normalization weights for the training sample with **Normalization_Weights.C** and save the output to a text file such as //QCD_normweights.txt//. These will be added as a weight branch "weight_norm", which flattens the vertex category distribution of the training sample.
+**4)** Assuming the evaluation sample vertex category weights have been produced (see the procedures for Evaluation Samples), add the normalization and category weight branches to the flat ntuples with **addWeightBranch.py**. Together, these weights remove the vertex category composition of the training sample and match it to that of the evaluation sample.
+**5)** Create 2D Pt/Eta histograms for the weighted ntuples with **createEtaPtWeightHists.py** (make sure "weight_norm*weight_category" is set as the weight in Draw() for the histograms). Twelve histograms will be created: 9 for the individual flavour/category files and 3 combined histograms, one for each flavour.
+**6)** Make the final weighted ntuples, making sure that **addWeightBranch.py** points to the new Pt/Eta histogram files. Six new branches should be created:\\
+-**weight_etaPt**   : the Pt/Eta weight, specific for a flavour/category file (for category dedicated training)\\
+-**weight_etaPtInc**: the Pt/Eta weight, inclusive for the flavour\\
+-**weight_category**: the category weight from the evaluation sample\\
+-**weight_norm**    : the normalization weight from the training sample\\
+-**weight_flavour** : the ratio of the flavour prevalences in the evaluation process\\
+-**weight**         : (weight_etaPtInc) x (weight_norm x weight_category) x (weight_flavour) //-- this can be used for combined trainings//\\
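
The weight bookkeeping in steps 3 to 6 can be illustrated with a small plain-Python sketch. It is not the code in **Normalization_Weights.C** or **addWeightBranch.py**: the function names, the category names other than NoVertex, the event counts, and the particular normalization (total / (number of categories x events in category)) are assumptions chosen only to show how a flattening weight and the final product might be formed. Only the branch names and the product formula are taken from the list above.

```python
def category_norm_weights(counts):
    """Return one weight per vertex category so that the weighted
    category yields are all equal (a flat category distribution).

    Assumed normalization: total / (n_categories * n_events_in_category).
    Every category must contain at least one event.
    """
    total = sum(counts.values())
    n_cat = len(counts)
    return {cat: total / (n_cat * n) for cat, n in counts.items()}


def combined_weight(weight_etaPtInc, weight_norm, weight_category, weight_flavour):
    """Per-event weight for combined trainings, following the formula
    (weight_etaPtInc) x (weight_norm x weight_category) x (weight_flavour)."""
    return weight_etaPtInc * (weight_norm * weight_category) * weight_flavour


# Hypothetical event counts per vertex category in a training sample
counts = {"NoVertex": 100, "PseudoVertex": 50, "RecoVertex": 50}
norm = category_norm_weights(counts)

# Hypothetical per-event inputs for one jet in the NoVertex category
weight = combined_weight(
    weight_etaPtInc=2.0,
    weight_norm=norm["NoVertex"],
    weight_category=1.5,
    weight_flavour=1.0,
)
```

With these weights, each category contributes the same weighted yield, which is what "flattening" means here; the category weight from the evaluation sample then reshapes that flat distribution.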

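The skim described in step 1 amounts to capping the number of events kept per pt/eta bin. The sketch below shows that idea on plain (pt, eta) tuples. It is not **TrainingFilter.py**: the bin edges, the function names, and the choice of keeping the first events encountered in each bin are assumptions made for illustration, and a real skim would read and write ROOT trees.

```python
import bisect
from collections import defaultdict

# Hypothetical bin edges, for illustration only
PT_EDGES = [15.0, 40.0, 60.0, 90.0, 150.0, 400.0, 600.0]
ETA_EDGES = [0.0, 1.2, 2.1, 2.4]


def find_bin(value, edges):
    """Index i of the bin [edges[i], edges[i+1]) containing value,
    or None if value lies outside the binned range."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect.bisect_right(edges, value) - 1


def skim(events, max_per_bin=20000):
    """Keep at most max_per_bin events in each (pt, |eta|) bin.

    events is an iterable of (pt, eta) tuples; events outside the
    binned range are dropped.
    """
    kept_per_bin = defaultdict(int)
    kept = []
    for pt, eta in events:
        key = (find_bin(pt, PT_EDGES), find_bin(abs(eta), ETA_EDGES))
        if None in key:
            continue
        if kept_per_bin[key] < max_per_bin:
            kept_per_bin[key] += 1
            kept.append((pt, eta))
    return kept
```

Taking the first events in each bin is the simplest choice; if the input files are ordered in a way that could bias the selection, a random selection per bin would be a reasonable alternative.
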
Physik-Institut

CMS Wiki Pages
