====== How to run TMVA ======
  
**TMVA** is a tool to run a multivariate analysis in ROOT. Among many other methods, it includes boosted decision trees (BDT) and neural networks (MLP). More information can be found at:
  
  * [[http://tmva.sourceforge.net/|Main website]]
  * [[http://tmva.sourceforge.net/docu/TMVAUsersGuide.pdf|Manual]]
  * [[https://root.cern.ch/doc/v606/namespaceTMVA.html|TMVA Class Reference]]
  * [[https://root.cern.ch/doc/v606/classTMVA_1_1Factory.html|Factory Class Reference]]
  * [[https://root.cern.ch/doc/v606/classTMVA_1_1Reader.html|Reader Class Reference]]
  * [[https://root.cern.ch/doc/master/group__tutorial__tmva.html|Official examples (C++)]]
  
When using the neural network method MLP, you might need ROOT 5.34.00 or newer to have a larger buffer for the XML reader, for example:
<code bash>
. /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.26/x86_64-slc6-gcc48-opt/root/bin/thisroot.sh
</code>


===== Training =====

To run the MVA in python on a signal tree ''treeS'' and background tree ''treeB'' with a list of variables, ''varNames'', that are available in the trees, you first need:

<code python>
from ROOT import TFile, TTree, TMVA, TCut

f_out = TFile("MVA.root","RECREATE")   # output file for weights and evaluation results
TMVA.Tools.Instance()                  # load the TMVA library
factory = TMVA.Factory( "TMVAClassification", f_out, "" )
for name in varNames:
    factory.AddVariable(name,'F')      # register each input variable as a float
factory.AddSignalTree(treeS)
factory.AddBackgroundTree(treeB)
cut_S = TCut("")                       # optional preselection on signal
cut_B = TCut("")                       # optional preselection on background
factory.PrepareTrainingAndTestTree( cut_S, cut_B, "" )
</code>

In the empty quote marks you can add cuts or options. More information can be found in section 3.1 of the [[http://tmva.sourceforge.net/docu/TMVAUsersGuide.pdf|manual]] and in the [[https://root.cern.ch/doc/v606/classTMVA_1_1Factory.html|Factory Class Reference]].
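For example, a minimal sketch with an illustrative preselection and explicit event-splitting options (the variables ''pt'' and ''eta'' in the cuts are hypothetical):

<code python>
cut_S = TCut("pt > 20 && abs(eta) < 2.4")   # hypothetical signal preselection
cut_B = TCut("pt > 20 && abs(eta) < 2.4")   # hypothetical background preselection
factory.PrepareTrainingAndTestTree( cut_S, cut_B,
    "nTrain_Signal=5000:nTrain_Background=5000:SplitMode=Random:NormMode=NumEvents:!V" )
</code>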

Then you can book multiple methods, like the BDT and MLP, with different parameters:

 +<code python>
 +factory.BookMethod( TMVA.Types.kBDT, "BDT", "!H:!V" )
 +factory.BookMethod( TMVA.Types.kBDT, "BDTTuned",
 +                    "!H:!V:NTrees=2000:MaxDepth=4:BoostType=AdaBoost"+\
 +                    "AdaBoostBeta=0.1:SeparationType=GiniIndex:nCuts=80" )
 +factory.BookMethod( TMVA.Types.kMLP, "MLPTanh",
 +                    "!H:!V:LearningRate=0.01:NCycles=200:NeuronType=tanh"+\
 +                    "VarTransform=N:HiddenLayers=N,N:UseRegulator" )
 +</code>

Finally, train, test and evaluate all the booked methods:

<code python>
factory.TrainAllMethods()
factory.TestAllMethods()
factory.EvaluateAllMethods()
f_out.Close()
</code>
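
The factory writes its evaluation output into ''MVA.root''. To browse it interactively, ROOT 6 exposes the TMVA GUI as a function callable from python (in ROOT 5 the equivalent is the ''TMVAGui.C'' macro shipped with the TMVA examples); a minimal sketch:

<code python>
from ROOT import TMVA
# Open the TMVA GUI on the factory output file (ROOT 6 only).
TMVA.TMVAGui("MVA.root")
</code>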


===== Applying the output to a tree =====

The factory will output the trained weights in an XML file, which you can then apply to any tree that contains the same variables.

<code python>
from array import array
from ROOT import TMVA, TH1F

reader = TMVA.Reader()
vars = [ ]
for name in varNames:
    vars.append(array('f',[0.]))        # one float buffer per input variable
    reader.AddVariable(name,vars[-1])
method = "BDT"                          # the method name used when booking
# By default the factory writes "weights/<JobName>_<MethodName>.weights.xml":
reader.BookMVA(method,"weights/TMVAClassification_BDT.weights.xml")
hist = TH1F("hMVA","MVA response",100,-1,1)
# 'tree' is the TTree you want to evaluate, with the same branches:
for i in range(len(varNames)):
    tree.SetBranchAddress(varNames[i],vars[i])
for evt in range(tree.GetEntries()):
    tree.GetEntry(evt)                  # load the event into the buffers
    hist.Fill( reader.EvaluateMVA(method) )
</code>
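
Instead of only filling a histogram, you may want to store the response as a new branch; a minimal sketch, assuming the file holding ''tree'' was opened in ''UPDATE'' mode:

<code python>
from ROOT import TObject
mva_val = array('f',[0.])
branch = tree.Branch("BDT", mva_val, "BDT/F")   # new float branch
for evt in range(tree.GetEntries()):
    tree.GetEntry(evt)
    mva_val[0] = reader.EvaluateMVA(method)
    branch.Fill()                                # fill only the new branch
tree.Write("", TObject.kOverwrite)               # overwrite instead of adding a new cycle
</code>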
  
  
===== Parameters to tune =====
  
The parameters and options of the MVA method can be optimized away from the default settings for better performance; check out the [[http://tmva.sourceforge.net/optionRef.html|option reference page]].
  
In particular, for the **BDT**, important parameters are the learning rate, the number of boost steps, and the maximal tree depth (see the sketch after this list):
  
  * ''AdaBoostBeta=0.5'': [[https://en.wikipedia.org/wiki/AdaBoost|learning rate for AdaBoost]], smaller (~0.1) is better, but takes longer
  * ''nTrees=800'': number of boost steps, too large mainly costs time and can cause overtraining
  * ''MaxDepth=3'': maximum tree depth, ~2-5 depending on interaction of the variables
  * ''nCuts=20'': grid points in variable range to find the optimal cut in node splitting
  * ''SeparationType=GiniIndex'': separation criterion at each splitting node to select the best variable; the [[https://en.wikipedia.org/wiki/Gini_coefficient|Gini index]] is one often-used measure
  * ''MinNodeSize=5%'': minimum percentage of training events required in a leaf node
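
As a sketch, a booking string that sets all of the options above explicitly, using the default values quoted in the list (the method name ''BDTDefaults'' is illustrative):

<code python>
# A sketch with the default values from the list above; not a tuned setup.
factory.BookMethod( TMVA.Types.kBDT, "BDTDefaults",
                    "!H:!V:NTrees=800:MaxDepth=3:BoostType=AdaBoost:"+\
                    "AdaBoostBeta=0.5:nCuts=20:SeparationType=GiniIndex:"+\
                    "MinNodeSize=5%" )
</code>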
  
Important **MLP** parameters to tune are the number of neurons in each hidden layer, the learning rate, and the activation function (see the sketch after this list).
  * ''HiddenLayers'': layout of the hidden layers, where ''N'' stands for the number of input variables:
      * ''N,N'' = two hidden layers
      * ''N+2,N'' = two hidden layers, with N+2 nodes in the first
  * ''LearningRate=0.02''
  * ''NeuronType=sigmoid'': another available neuron activation function is ''tanh''
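
And an equivalent sketch for the MLP, combining the options above (the method name ''MLPSigmoid'' is illustrative):

<code python>
# A sketch combining the MLP options above; sigmoid is the default activation.
factory.BookMethod( TMVA.Types.kMLP, "MLPSigmoid",
                    "!H:!V:HiddenLayers=N,N:LearningRate=0.02:"+\
                    "NeuronType=sigmoid:VarTransform=N" )
</code>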
  
  
===== Tutorials and examples =====

  * [[https://indico.cern.ch/event/395374/other-view?view=standard#20151109.detailed|TMVA tutorial from the 2015 Data Science Workshop on Indico]]
  * [[https://aholzner.wordpress.com/2011/08/27/a-tmva-example-in-pyroot/|TMVA in PyRoot tutorial]]
  * [[mva:mvaexample|This python example]] shows how you can run over multiple trees and different sets of variables. It also includes functions to apply the MVA output to trees and make background rejection vs. signal efficiency plots and correlation plots.
  * [[https://root.cern/doc/master/group__tutorial__tmva.html|ROOT official TMVA tutorials]]
  
===== Other information =====

  * The working principles of a [[https://www.physik.uzh.ch/~grazzini/teaching/higgsnotes/lecture6.pdf|neural network]] and a [[https://www.physik.uzh.ch/~grazzini/teaching/higgsnotes/lecture10.pdf|BDT]] are visually explained in the UZH [[https://www.physik.uzh.ch/~grazzini/teaching/higgs.html|Higgs Physics course]] by Mauro Donegà.
  * [[https://arogozhnikov.github.io/2016/07/05/gradient_boosting_playground.html|Gradient Boosting Interactive Playground]] with interactive visuals
  * TMVA BibTeX reference:
<code latex>
@article{TMVA,
  title         = {TMVA: Toolkit for Multivariate Data Analysis},
  author        = {Hoecker, Andreas and Speckmayer, Peter and
                   Stelzer, Joerg and Therhaag, Jan and
                   von Toerne, Eckhard and Voss, Helge},
  journal       = {PoS},
  volume        = {ACAT},
  year          = {2007},
  month         = {Mar},
  pages         = {040},
  url           = {http://inspirehep.net/record/746087/},
  reportNumber  = {CERN-OPEN-2007-007},
  eprint        = {physics/0703039},
  archivePrefix = {arXiv},
  primaryClass  = {physics},
  SLACcitation  = {%%CITATION = arXiv:physics/0703039;%%},
}
</code>