TMVA is a tool for running multivariate analyses in ROOT. Among many other methods, it includes boosted decision trees (BDT) and neural networks (MLP). More information can be found at:
When using the neural network method MLP, you might need ROOT 5.34/00 or newer to have a larger buffer for the XML reader, for example:
. /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.26/x86_64-slc6-gcc48-opt/root/bin/thisroot.sh
To run the MVA in Python on a signal tree treeS and a background tree treeB, with a list of variables varNames that are available in the trees, you first need:
from ROOT import TFile, TTree, TMVA, TCut

f_out = TFile("MVA.root", "RECREATE")
TMVA.Tools.Instance()
factory = TMVA.Factory("TMVAClassification", f_out, "")
for name in varNames:
    factory.AddVariable(name, 'F')
factory.AddSignalTree(treeS)
factory.AddBackgroundTree(treeB)
cut_S = TCut("")
cut_B = TCut("")
factory.PrepareTrainingAndTestTree(cut_S, cut_B, "")
In the empty quote marks you can add cuts or options. More information can be found in Section 3.1 of the manual and in the Factory class reference.
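As an illustration, the option string for PrepareTrainingAndTestTree could look like the one below. The option names follow the TMVA manual; the event counts are placeholders, not a recommendation:

```python
# Illustrative splitting options to pass as the third argument of
# factory.PrepareTrainingAndTestTree(cut_S, cut_B, split_opts):
# take 1000 training events per class, split randomly, normalise by
# event count, and suppress verbose output.
split_opts = ("nTrain_Signal=1000:nTrain_Background=1000:"
              "SplitMode=Random:NormMode=NumEvents:!V")
```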
Then you can book multiple methods, such as the BDT and MLP, with different parameters:
factory.BookMethod(TMVA.Types.kBDT, "BDT", "!H:!V")
factory.BookMethod(TMVA.Types.kBDT, "BDTTuned",
                   "!H:!V:NTrees=2000:MaxDepth=4:BoostType=AdaBoost:" +
                   "AdaBoostBeta=0.1:SeparationType=GiniIndex:nCuts=80")
factory.BookMethod(TMVA.Types.kMLP, "MLPTanh",
                   "!H:!V:LearningRate=0.01:NCycles=200:NeuronType=tanh:" +
                   "VarTransform=N:HiddenLayers=N,N:UseRegulator")
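Hand-concatenated option strings are easy to get wrong: a missing ":" between two fragments silently merges two options. A small helper along these lines (the function name is my own, not part of TMVA) can assemble them safely:

```python
def tmva_options(flags, **params):
    """Join flag strings and key=value pairs with ':' as TMVA expects."""
    parts = list(flags) + ["%s=%s" % (k, v) for k, v in params.items()]
    return ":".join(parts)

# Reproduces the BDTTuned option string from above.
opts = tmva_options(["!H", "!V"], NTrees=2000, MaxDepth=4,
                    BoostType="AdaBoost", AdaBoostBeta=0.1)
```

The resulting string can be passed directly as the third argument of factory.BookMethod.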
Finally train, test and evaluate all the booked methods:
factory.TrainAllMethods()
factory.TestAllMethods()
factory.EvaluateAllMethods()
f_out.Close()
The factory writes the weights to an XML file, which you can use to apply the trained method to any tree that contains the same variables:
from array import array

reader = TMVA.Reader()
vars = []
for name in config.varNames:
    vars.append(array('f', [0.]))
    reader.AddVariable(name, vars[-1])
method = "BDT"  # must match the name of the booked method you want to apply
reader.BookMVA(method, "TMVAClassification.weights.xml")
for i in range(len(config.varNames)):
    tree.SetBranchAddress(config.varNames[i], vars[i])
for evt in range(tree.GetEntries()):
    tree.GetEntry(evt)
    hist.Fill(reader.EvaluateMVA(method))
The parameters and options of the MVA methods can be tuned away from the default settings for better performance; check out the option reference page.
In particular, for the BDT, important parameters are the learning rate, the number of boost steps and the maximal tree depth:
AdaBoostBeta=0.5
: learning rate for AdaBoost; smaller (~0.1) is better, but takes longer
NTrees=800
: number of boost steps; too large mainly costs time and can cause overtraining
MaxDepth=3
: maximum tree depth, ~2-5 depending on the interaction of the variables
nCuts=20
: grid points in the variable range to find the optimal cut in node splitting
SeparationType=GiniIndex
: separation criterion at each splitting node to select the best variable; the Gini index is one often-used measure
MinNodeSize=5%
: minimum percentage of training events required in a leaf node
Important MLP parameters to tune are the number of neurons in each hidden layer, the learning rate and the activation function.
HiddenLayers=N,N-1
: number of nodes in each hidden layer, for N input variables:
N = one hidden layer with N nodes
N,N = two hidden layers with N nodes each
N+2,N = two hidden layers, with N+2 nodes in the first
LearningRate=0.02
NeuronType=sigmoid
: the other available neuron activation function is tanh
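The parameter lists above can be turned into concrete booking strings; a common pattern is to book several variants of one method and compare them in the evaluation step. A sketch in pure string assembly (the scan values and method names are illustrative, my own choices):

```python
# Build option strings for a small AdaBoostBeta scan; in a real script
# each entry would be booked via
#   factory.BookMethod(TMVA.Types.kBDT, name, opts)
base = "!H:!V:NTrees=800:MaxDepth=3:SeparationType=GiniIndex:nCuts=20"
scan = {"BDT_beta%g" % b: base + ":AdaBoostBeta=%g" % b
        for b in (0.1, 0.3, 0.5)}
```

After training, the evaluation output lets you pick the variant with the best background rejection at your chosen signal efficiency.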
@article{TMVA,
  title         = {TMVA: Toolkit for Multivariate Data Analysis},
  author        = {Hoecker, Andreas and Speckmayer, Peter and Stelzer, Joerg and Therhaag, Jan and von Toerne, Eckhard and Voss, Helge},
  journal       = {PoS},
  volume        = {ACAT},
  year          = {2007},
  month         = {Mar},
  pages         = {040},
  url           = {http://inspirehep.net/record/746087/},
  reportNumber  = {CERN-OPEN-2007-007},
  eprint        = {physics/0703039},
  archivePrefix = {arXiv},
  primaryClass  = {physics},
  SLACcitation  = {%%CITATION = arXiv:physics/0703039;%%},
}