In moving from the standard CSV algorithm applied to jet substructure to a dedicated boosted-jet b-tagging algorithm, the current plan is to perform a training using fat H→bb jets containing two reconstructed secondary vertices.
We will start off by performing a dedicated training that separates jets containing one b quark from jets containing two. To do this we create two flat trees using fat jets (starting with CA8, then perhaps moving to AK8): one with jets containing 1 RecoVertex matched to 1 true B hadron (RecoVertex_B) and one with jets containing 2 RecoVertices matched to 2 true B hadrons (RecoVertex_BB). We will then perform a dedicated training to try to distinguish b from bb jets.
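The selection of jets for the two flat trees can be sketched as below. This is only an illustration in plain Python, not the CMSSW implementation: the FatJet class, its field names, and the helper functions are hypothetical, and the vertex-to-B-hadron matching is assumed to have been done already.

```python
from dataclasses import dataclass


@dataclass
class FatJet:
    # Hypothetical container; field names are assumptions for illustration.
    n_reco_vertices: int      # reconstructed secondary vertices in the jet
    n_matched_b_hadrons: int  # reco vertices matched to a true B hadron


def categorize(jet):
    """Return the tree name for a jet, or None if it fits neither tree."""
    if jet.n_reco_vertices == 1 and jet.n_matched_b_hadrons == 1:
        return "RecoVertex_B"
    if jet.n_reco_vertices == 2 and jet.n_matched_b_hadrons == 2:
        return "RecoVertex_BB"
    return None


def split_into_trees(jets):
    """Sort jets into the two training samples, dropping all other jets."""
    trees = {"RecoVertex_B": [], "RecoVertex_BB": []}
    for jet in jets:
        label = categorize(jet)
        if label is not None:
            trees[label].append(jet)
    return trees
```

Jets with any other combination of vertex and match counts are left out of both samples in this sketch.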
The obvious next step will then be to look at the cases where a jet does not necessarily have two RecoVertices. This implies performing the training in 5 different vertex categories.
Petra Van Mulders + group are currently working on creating these different categories.
You then have the question of whether or not to also perform dedicated BB vs UDSG, BB vs CC, BB vs CB etc. trainings. In the end this requires several different training categories. One might also consider a 2D discriminant where the user can choose the required efficiency vs purity cut in 2D space; here b vs light will be on the x axis and b vs bb on the y axis.
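A minimal sketch of how a working point in such a 2D discriminant plane could be evaluated is given below. It assumes a simple rectangular cut and assumes that both scores are oriented so that higher values are more signal-like; neither choice is fixed by the plan above, and the function names are hypothetical.

```python
def passes_2d_cut(score_x, score_y, cut_x, cut_y):
    """Rectangular cut: x is the b vs light score, y the b vs bb score."""
    return score_x > cut_x and score_y > cut_y


def efficiency_and_purity(signal, background, cut_x, cut_y):
    """Compute efficiency and purity for one (cut_x, cut_y) working point.

    signal and background are lists of (score_x, score_y) pairs.
    """
    n_sig = sum(passes_2d_cut(x, y, cut_x, cut_y) for x, y in signal)
    n_bkg = sum(passes_2d_cut(x, y, cut_x, cut_y) for x, y in background)
    # Efficiency: fraction of signal jets that survive the cut.
    efficiency = n_sig / len(signal) if signal else 0.0
    # Purity: fraction of selected jets that are signal.
    selected = n_sig + n_bkg
    purity = n_sig / selected if selected else 0.0
    return efficiency, purity
```

Scanning cut_x and cut_y over a grid and recording the resulting pairs would give the efficiency vs purity map from which a user picks a working point.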
Steps for the future:
A Combined Secondary Vertex Based B-Tagging Algorithm in CMS
Algorithms for b Jet Identification in CMS
Performance of b tagging at sqrt(s) = 8 TeV in multijet, ttbar and boosted topology events
Performance Measurement of b-tagging Algorithms Using Data containing Muons within Jets
Implementation and training of the Combined Secondary Vertex MVA b-tagging algorithm in CMSSW