====== Limits with the Higgs Combined Tool ======

The **Combined Tool** by the [[https://twiki.cern.ch/twiki/bin/view/CMS/HiggsWG/HiggsCombination|Higgs Combination Group]] provides the ''combine'' command, which runs various RooStats-based statistical methods on an input such as a text file (the //datacard//) describing the processes, the observed data and the systematic uncertainties, [[limits:limits#Simple example: Running the combine tool|like the example below]]. It is documented extensively in the following places:
  * TWiki documentation: https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWGuideHiggsAnalysisCombinedLimit
  * GitBooks documentation: https://cms-hcomb.gitbooks.io/combine/content/
  * Hypernews forum: https://hypernews.cern.ch/HyperNews/CMS/get/higgs-combination

[[https://indico.cern.ch/event/577649/#day-2016-11-30|These]] are useful **tutorials** on the Combine Tool from a 2016 CERN workshop (slides plus video). It may be interesting to do the exercises on [[https://indico.cern.ch/event/577649/sessions/212214/attachments/1378209/2093750/combine_SWAN_instructions.pdf|SWAN]]. Otherwise, take a look at [[https://github.com/nucleosynthesis/HiggsAnalysis-CombinedLimit/blob/combine_tutorial_SWAN/combine_tutorials_2016/combine_intro/python_CombineIntro.ipynb|these examples on GitHub]].

Many other **examples** can be found on [[https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookExercisesCMSDataAnalysisSchool|this TWiki page for the CMSDAS school]].

Description of the statistical methods at CMS:
  * CMS Collaboration (2012), "Observation of a new boson with a mass near 125 GeV", [[http://cds.cern.ch/record/1460438/files/HIG-12-020-pas.pdf|CMS-PAS-HIG-12-020]].
  * Cowan et al. (2011), "Asymptotic formulae for likelihood-based tests of new physics", [[https://arxiv.org/abs/1007.1727|arXiv:1007.1727]].
  * [[https://indico.cern.ch/event/173726/|This four-part lecture]] by Glen Cowan covers a more **theoretical overview** of statistical methods used in Particle Physics.

**CombineHarvester** is a tool to manage datacards: [[https://github.com/cms-analysis/CombineHarvester|its GitHub repository]] and [[http://cms-analysis.github.io/CombineHarvester/index.html|the full documentation]].
 ===== Quick setup =====

The latest instructions to get ''combine'' can be found [[https://cms-hcomb.gitbooks.io/combine/content/part1/#for-end-users-that-dont-need-to-commit-or-do-any-development|in the manual]] and [[https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWGuideHiggsAnalysisCombinedLimit#ROOT606_SLC6_CMSSW_8_1_X|on the TWiki]]. In summary, first set up a **CMSSW** release for the Combined Tool:

<code bash>
export SCRAM_ARCH=slc6_amd64_gcc530
cmsrel CMSSW_8_1_0
cd CMSSW_8_1_0/src 
cmsenv
</code>

The ''combine'' setup based on ''CMSSW_7_4_7'' is outdated. Then get the **Combined Tool**:

<code bash>
git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
cd HiggsAnalysis/CombinedLimit
git fetch origin
git checkout v7.0.8
scramv1 b clean; scramv1 b
</code>

In every new session on the PSI Tier3, set up the environment with

<code bash>
source $VO_CMS_SW_DIR/cmsset_default.sh
export SCRAM_ARCH=slc6_amd64_gcc530
cd ~/phase2/CMSSW_8_1_0/src/   # adjust to the directory where you installed CMSSW
cmsenv
</code>
 ===== Simple example: Running the combine tool =====

To run the tool you need a datacard that specifies the channels, the observed data, the expected signal and background rates, and the nuisance parameters describing their systematic uncertainties. For a simple counting experiment it looks like this example from the [[https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWGuideHiggsAnalysisCombinedLimit#Tutorials|official tutorial]]:

<file txt HHdatacard.txt>

# Simple counting experiment, with one signal and a few background processes 
# Simplified version H->WW analysis for mH = 160 GeV
imax 1  number of channels
jmax 3  number of backgrounds
kmax 4  number of nuisance parameters (sources of systematical uncertainties)
--------------------------------------------------------------------------------------
# one channel, 0 observed events
bin 1
observation 0
--------------------------------------------------------------------------------------
bin              1     1     1     1
process         ggH  qqWW  ggWW  others
process          0     1     2     3
rate           1.47  0.63  0.06  0.22
--------------------------------------------------------------------------------------
lumi      lnN  1.11    -   1.11    -    luminosity; lnN = lognormal
xs_ggH    lnN  1.16    -     -     -    gg->H cross section + signal efficiency
xs_ggWW   lnN    -     -   1.50    -    gg->WW cross section
bg_others lnN    -     -     -   1.30   30% uncertainty on the rest of the backgrounds

</file>
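
Each ''lnN'' entry above acts as a multiplicative log-normal factor: for a value κ (e.g. 1.11 for ''lumi''), the affected yield is scaled by κ<sup>θ</sup>, where the nuisance parameter θ follows a unit Gaussian. The minimal Python sketch below shows how a rate from the datacard responds to shifts of θ; the function name ''lnN_scale'' is hypothetical and not part of ''combine'', and asymmetric ''κ_down/κ_up'' entries are not covered.

<code python>
def lnN_scale(rate: float, kappa: float, theta: float) -> float:
    """Return the yield after a log-normal (lnN) shift.

    rate  -- nominal yield from the 'rate' line of the datacard
    kappa -- lnN value from the datacard, e.g. 1.11 for an 11% uncertainty
    theta -- nuisance parameter value in units of standard deviations
    """
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    return rate * kappa ** theta


# ggH rate and luminosity uncertainty taken from the example datacard
nominal = 1.47
for theta in (-1.0, 0.0, 1.0):
    print(f"theta = {theta:+.0f}: ggH yield = {lnN_scale(nominal, 1.11, theta):.3f}")
</code>

Because κ<sup>θ</sup> is always positive, the yield cannot become negative even for large downward shifts, which is one reason to prefer ''lnN'' over a Gaussian for normalization uncertainties.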

To get the **expected (and observed) 95% CL upper limits on the signal strength with the asymptotic CLs method**, run

<code bash>
combine -M AsymptoticLimits -m 160 -n .HH HHdatacard.txt
</code>

(in ''combine'' versions before v7, such as the one used with ''CMSSW_7_4_7'', this method is called ''Asymptotic''). This prints the upper limits on the signal strength ''r'' and writes all results to the ROOT file ''higgsCombine.HH.AsymptoticLimits.mH160.root''. For the expected limits, 50.0% stands for the median, 16.0% and 84.0% for the ±1 standard deviation (σ) band, and 2.5% and 97.5% for the ±2σ band. The option ''-m <M>'' sets the particle mass ''M'', which here only serves as a label in the output tree and file name, since you might vary this parameter at a later stage. The option ''-n <name>'' adds a custom tag to the output file name.
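
The output file contains a tree called ''limit'' with one entry per quantile: the branch ''limit'' holds the value of ''r'' and the branch ''quantileExpected'' the quantile, with -1 marking the observed limit. The sketch below (helper names are hypothetical) turns such (quantile, limit) pairs into labelled values and shows that the band edges are Gaussian quantiles: 16% and 84% lie at about ±1σ, while 2.5% and 97.5% lie at ±1.96σ, i.e. approximately ±2σ. How you read the tree (e.g. with PyROOT or uproot) is left open here.

<code python>
from statistics import NormalDist

# Labels for the quantileExpected values stored by the asymptotic method;
# -1 marks the observed limit.
BAND_LABELS = {
    -1.0: "observed",
    0.025: "expected -2 sigma",
    0.16: "expected -1 sigma",
    0.5: "expected median",
    0.84: "expected +1 sigma",
    0.975: "expected +2 sigma",
}


def label_limits(rows):
    """Turn (quantileExpected, limit) pairs into a {label: limit} dict.

    Quantiles are matched with a small tolerance because they are stored
    as single-precision floats in the output tree.
    """
    labelled = {}
    for quantile, limit in rows:
        for q, label in BAND_LABELS.items():
            if abs(quantile - q) < 1e-3:
                labelled[label] = limit
                break
    return labelled


# The band edges are quantiles of a unit normal distribution:
for q in (0.025, 0.16, 0.5, 0.84, 0.975):
    print(f"{q:5.3f} -> {NormalDist().inv_cdf(q):+.2f} sigma")
</code>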

 ===== Creating datacards =====

If you have many different processes, channels and nuisance parameters, it can be hard to keep track of the datacards manually. **CombineHarvester** is a tool that allows you to quickly create one or more datacards with a single Python or C++ script. For instructions and examples, please take a look at the [[https://github.com/cms-analysis/CombineHarvester|GitHub repository]] and the [[http://cms-analysis.github.io/CombineHarvester/index.html|full documentation]].

In the [[limits:limits#Simple example: Running the combine tool| simple example above]] all nuisance parameters are assumed to be log-normally distributed ''lnN''. The [[https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWGuideHiggsAnalysisCombinedLimit#How_to_prepare_the_datacard|tutorial]] lists other options, like the Gamma (''gmN'') and log-uniform (''lnU'') distributions, and the motivations for them.
Some systematic uncertainties, however, also change the shape of the final discriminating variable, not just the yields. Say you use a tau to calculate the final discriminating variable, like the invariant mass. Then the analysis has to be run again with the tau energy scale (TES) shifted up and down by one standard deviation, and the shifts propagated to this variable. The resulting **shapes** (histograms) can then be included in the datacard as a nuisance parameter by using ''shape'' as its distribution. The datacard must also point to the ROOT file containing the nominal histogram of each process plus the histograms where the relevant shift has been applied.
Shapes can also be **probability density functions** (PDFs), which can be parametrized and stored as a ''RooAbsPdf'' in a ''RooWorkspace'' in a ROOT file.
Read more [[https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWGuideHiggsAnalysisCombinedLimit#Datacard_for_Shape_analyses|here]].
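
In a shape datacard, a ''shapes'' line tells ''combine'' where to find the histograms, e.g. ''shapes * * shapes.root $CHANNEL/$PROCESS $CHANNEL/$PROCESS_$SYSTEMATIC'', and for every ''shape'' nuisance ''combine'' looks for an ''Up'' and a ''Down'' histogram of each affected process. The sketch below lists the histogram names implied by such patterns, which can be handy for checking a ROOT file before running ''combine''. The channel, process and nuisance names (''mutau'', ''ZTT'', ''CMS_scale_t'') and the function name are hypothetical.

<code python>
def expected_shape_names(nominal_pattern, syst_pattern, channel, process, systematics):
    """List the histogram names combine would look up for one process.

    The patterns use combine's placeholders $CHANNEL, $PROCESS and
    $SYSTEMATIC; for each shape nuisance, $SYSTEMATIC is replaced by
    '<name>Up' and '<name>Down'.
    """
    def fill(pattern, syst=""):
        return (pattern.replace("$CHANNEL", channel)
                       .replace("$PROCESS", process)
                       .replace("$SYSTEMATIC", syst))

    names = [fill(nominal_pattern)]
    for syst in systematics:
        names.append(fill(syst_pattern, syst + "Up"))
        names.append(fill(syst_pattern, syst + "Down"))
    return names


# Hypothetical channel, process and nuisance names, for illustration only
for name in expected_shape_names("$CHANNEL/$PROCESS",
                                 "$CHANNEL/$PROCESS_$SYSTEMATIC",
                                 "mutau", "ZTT", ["CMS_scale_t"]):
    print(name)
</code>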

Note that there are general datacard guidelines and naming conventions you should follow; they can be found [[https://twiki.cern.ch/twiki/bin/view/CMS/HiggsWG/HiggsPAGPreapprovalChecks|here]].
 ===== Other methods and options =====

The combine tool has many more [[https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWGuideHiggsAnalysisCombinedLimit#How_to_run_the_tool|methods]] and options (type ''%%combine --help%%'').
If you are interested in the best-fit signal strength and its uncertainty, e.g. to extract the **observed cross section**, or in post-fit diagnostics such as nuisance-parameter pulls, use ''[[https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWGuideHiggsAnalysisCombinedLimit#Maximum_likelihood_fits_and_diag|MaxLikelihoodFit]]'' (''FitDiagnostics'' in ''CMSSW_8_1_0'').
The **expected significance** and a quick //approximate// upper limit on ''r'' (i.e. on the signal yield in units of the expected one) can be obtained with ''[[https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWGuideHiggsAnalysisCombinedLimit#Computing_the_observed_limit_wit|ProfileLikelihood]]''; for the significance add the option ''%%--significance%%'' (in ''CMSSW_8_1_0'' this is done with ''-M Significance''). For the **upper limits**, however, you should always use the asymptotic CLs method (''Asymptotic'', renamed ''AsymptoticLimits'' in v7) instead.

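For an analysis with blinded data, you can use the option ''-t -1'', which replaces the observed data with an **Asimov dataset** built from the expected (Monte Carlo) yields in the datacard; by default this is background-only, unless a signal is injected, e.g. with ''%%--expectSignal=1%%''. Alternatively, you can generate pseudo-datasets (**toys**) using ''-t <N>'' for ''N'' toys.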
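
For a single-bin counting experiment, the Asimov dataset simply sets the observation equal to the expected yield, and Cowan et al. (listed at the top of this page) derive a closed-form median discovery significance Z<sub>A</sub> = sqrt(2((s+b) ln(1+s/b) - s)). The sketch below applies this formula to the rates of the example datacard. It ignores all nuisance parameters, so it is only a rough cross-check under that assumption and not a substitute for running ''combine''; the function names are hypothetical.

<code python>
import math


def asimov_observation(background_rates, signal_rate=0.0, mu=0.0):
    """Asimov 'observation' for one counting bin: expected b + mu * s."""
    return sum(background_rates) + mu * signal_rate


def asimov_significance(s, b):
    """Median discovery significance Z_A (Cowan et al.), no systematics."""
    if s < 0 or b <= 0:
        raise ValueError("need s >= 0 and b > 0")
    q0 = 2.0 * ((s + b) * math.log1p(s / b) - s)
    return math.sqrt(max(q0, 0.0))  # guard against tiny negative rounding


# Rates from the example datacard: ggH signal and three backgrounds
s = 1.47
b = asimov_observation([0.63, 0.06, 0.22])
print(f"Asimov observation (background only): {b:.2f}")
print(f"Z_A = {asimov_significance(s, b):.2f}")
</code>

For s much smaller than b the formula reduces to the familiar s/sqrt(b).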

 ===== Analysis of the systematic uncertainties =====

[[https://indico.cern.ch/event/577649/contributions/2388797/attachments/1380376/2098158/HComb-Tutorial-Nov16-Impacts.pdf|These slides]] from the aforementioned workshop explain how to separate the **systematic** from the **statistical uncertainties**, and how to analyse the **impact** and **pull** of each parameter. They make use of the ''combine'' method ''MultiDimFit'' and [[https://github.com/cms-analysis/CombineHarvester/tree/master/CombineTools/scripts|scripts]] provided by the ''CombineHarvester'' tool.
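
A common way to quote such a split is to repeat the fit with all nuisance parameters frozen, which gives the statistical-only uncertainty on ''r'', and to obtain the systematic part by subtracting in quadrature from the total. The sketch below shows only that last step, assuming symmetric uncertainties that add in quadrature; the numbers are hypothetical placeholders, not results.

<code python>
import math


def split_uncertainty(total, stat_only):
    """Return the systematic component, assuming total^2 = stat^2 + syst^2.

    total     -- uncertainty on r from the full fit
    stat_only -- uncertainty on r with all nuisance parameters frozen
    """
    if stat_only > total:
        raise ValueError("stat-only uncertainty exceeds the total; check the fits")
    return math.sqrt(total ** 2 - stat_only ** 2)


# Hypothetical numbers, for illustration only
total, stat = 0.50, 0.30
syst = split_uncertainty(total, stat)
print(f"uncertainty on r: {stat:.2f} (stat) and {syst:.2f} (syst)")
</code>

For asymmetric uncertainties, the up and down variations would have to be treated separately.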


 ===== Brazilian plot =====

To make a Brazilian plot, you can write a simple Python script that automatically creates the datacards for different values of some parameter, like the invariant mass. [[limits:brazilianplotexample|This example]] shows a scan over the background uncertainty. Guidelines for a CMS-style limit plot can be found [[https://ghm.web.cern.ch/ghm/plots/|here]] and [[https://twiki.cern.ch/twiki/bin/view/CMS/Internal/FigGuidelines#Convention_for_Brazilian_flag_pl|here]].
{{ limits:upperlimit_brazilianplot.png?400 }}
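
A minimal sketch of such a script, assuming the counting datacard from the simple example above and a scan of the relative uncertainty on the ''others'' background. The template, file names and function names are hypothetical. Each card can then be passed to ''combine'' as in the simple example, and the observed limit and the five expected quantiles collected for the plot.

<code python>
# Datacard template based on the simple example; only the lnN value of
# the 'others' background is varied.
TEMPLATE = """\
imax 1  number of channels
jmax 3  number of backgrounds
kmax 4  number of nuisance parameters
------------------------------------------------------------
bin 1
observation 0
------------------------------------------------------------
bin              1     1     1     1
process         ggH  qqWW  ggWW  others
process          0     1     2     3
rate           1.47  0.63  0.06  0.22
------------------------------------------------------------
lumi      lnN  1.11    -   1.11    -
xs_ggH    lnN  1.16    -     -     -
xs_ggWW   lnN    -     -   1.50    -
bg_others lnN    -     -     -   {kappa:.2f}
"""


def make_datacard(bg_uncertainty):
    """Return datacard text for a relative uncertainty (0.30 means 30%)."""
    return TEMPLATE.format(kappa=1.0 + bg_uncertainty)


def write_scan(uncertainties, prefix="datacard_bg"):
    """Write one datacard per scan point and return the file names."""
    names = []
    for unc in uncertainties:
        name = f"{prefix}{round(unc * 100):03d}.txt"  # e.g. datacard_bg030.txt
        with open(name, "w") as f:
            f.write(make_datacard(unc))
        names.append(name)
    return names


if __name__ == "__main__":
    for name in write_scan([0.10, 0.20, 0.30, 0.50]):
        print("wrote", name)
</code>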

 ===== Other =====

  * BibTeX reference for the asymptotic CLs method:

<code latex>
@article{CLs,
  author        = {Cowan, Glen and Cranmer, Kyle and Gross, Eilam and Vitells, Ofer},
  title         = {Asymptotic formulae for likelihood-based tests of new physics},
  journal       = {The European Physical Journal C},
  volume        = {71},
  number        = {2},
  issn          = {1434-6052},
  year          = {2011},
  month         = {Feb},
  doi           = {10.1140/epjc/s10052-011-1554-0},
  eprint        = {1007.1727},
  archivePrefix = {arXiv},
  primaryClass  = {hep-ph},
  SLACcitation  = {%%CITATION = arXiv:hep-ph/1007.1727;%%},
}
</code>