# TMVA Reader Example

#### Example of applying a trained TMVA classifier to a data set and evaluating the classifier

TMVA applies a trained classifier through weights stored in XML format. These weight files can be read back and applied to any other dataset.


author: [Lorenzo Moneta](https://github.com/lmoneta/tmva-tutorial)

In [1]:
import ROOT
from ROOT import TMVA

Welcome to JupyROOT 6.15/01


In [2]:
##%jsroot on

In [3]:
import os
os.environ["KERAS_BACKEND"] = "tensorflow"

### Specify the input file


In [4]:
inputFile = ROOT.TFile("Higgs_data.root")

In [5]:
inputFile.ls()
#inputFile.sig_tree.Print()

TFile**		Higgs_data.root	
 TFile*		Higgs_data.root	
  KEY: TTree	sig_tree;1	Signal Tree
  KEY: TTree	bkg_tree;1	Background Tree


### Declare the Output File and Data Loader

In [6]:
TMVA.Tools.Instance()

outputFile = ROOT.TFile.Open("Higgs_CrossValidationOutput.root", "RECREATE")
        
loader = ROOT.TMVA.DataLoader("dataset_cv")
## Define input variables 

loader.AddVariable("m_jj")
loader.AddVariable("m_jjj")
loader.AddVariable("m_lv")
loader.AddVariable("m_jlv")
loader.AddVariable("m_bb")
loader.AddVariable("m_wbb")
loader.AddVariable("m_wwbb")

In [7]:
signalTree     = inputFile.Get("sig_tree")
backgroundTree = inputFile.Get("bkg_tree")



### Global event weights per tree (each weight scales every event in that tree equally)
signalWeight     = 1.0
backgroundWeight = 1.0
   
### You can add an arbitrary number of signal or background trees
loader.AddSignalTree    ( signalTree,     signalWeight     )
loader.AddBackgroundTree( backgroundTree, backgroundWeight )


## Apply additional cuts on the signal and background samples (can be different)
mycuts = ROOT.TCut("")   ## for example: TCut mycuts = "abs(var1)<0.5 && abs(var2-0.5)<1";
mycutb = ROOT.TCut("")   ## for example: TCut mycutb = "abs(var1)<0.5";


loader.PrepareTrainingAndTestTree( mycuts, mycutb,
                                  "nTrain_Signal=7000:nTrain_Background=7000:SplitMode=Random:"
                                   "NormMode=NumEvents:!V" )

DataSetInfo              : [dataset_cv] : Added class "Signal"
                         : Add Tree sig_tree of type Signal with 10000 events
DataSetInfo              : [dataset_cv] : Added class "Background"
                         : Add Tree bkg_tree of type Background with 10000 events


## Run Cross Validation
First define a string that specifies the options for the cross validation. This functionality is new and, at the time of writing, is available only in the ROOT master (6.13.03).

In [8]:
cvOptions = "!V:!Silent:ModelPersistence:AnalysisType=Classification:NumFolds=5:SplitExpr="
cv = ROOT.TMVA.CrossValidation("TMVACrossValidation",loader,outputFile,cvOptions)
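
To make the `NumFolds=5` option concrete, here is a minimal pure-Python sketch of k-fold splitting. It is not TMVA's internal implementation: the helper `kfold_indices` and the fixed seed are illustrative assumptions. With the 14000 training events requested earlier (7000 signal plus 7000 background), each fold trains on four fifths of them.

```python
import random

def kfold_indices(n_events, n_folds, seed=0):
    """Yield (train, test) index lists for each fold (hypothetical helper)."""
    indices = list(range(n_events))
    random.Random(seed).shuffle(indices)      # random assignment of events to folds
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]                       # fold k is held out for testing
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

n_events = 7000 + 7000                        # nTrain_Signal + nTrain_Background
splits = list(kfold_indices(n_events, n_folds=5))
for k, (train, test) in enumerate(splits, start=1):
    print("fold {}: train on {} events, test on {}".format(k, len(train), len(test)))
```

Each fold's training size in this sketch is 11200, which is the number of events reported in the training log printed by `cv.Evaluate()`.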

### Book methods that will be used for cross validation

In [9]:
cv.BookMethod(TMVA.Types.kBDT, "BDT",
              "!V:NTrees=200:MinNodeSize=2.5%:MaxDepth=2:"
              "BoostType=AdaBoost:AdaBoostBeta=0.5:"
              "UseBaggedBoost:BaggedSampleFraction=0.5:"
              "SeparationType=GiniIndex:nCuts=20")

# Multi-Layer Perceptron (Neural Network); uncomment to book it as well
#cv.BookMethod(TMVA.Types.kMLP, "MLP",
#              "!H:!V:NeuronType=tanh:VarTransform=N:NCycles=100:HiddenLayers=N+5:TestRate=5:!UseRegulator")
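
TMVA option strings are colon-separated lists of `key=value` settings and boolean flags, where a leading `!` switches a flag off. The sketch below splits the BDT option string from the cell above into a dictionary so the individual settings are easier to read. The helper `parse_options` is hypothetical and not part of TMVA.

```python
def parse_options(option_string):
    """Split a TMVA-style option string into a dict (hypothetical helper)."""
    options = {}
    for token in option_string.split(":"):
        if not token:
            continue                          # skip empty pieces such as a trailing ':'
        if "=" in token:
            key, value = token.split("=", 1)
            options[key] = value              # values are kept as strings
        elif token.startswith("!"):
            options[token[1:]] = False        # "!V" switches the V flag off
        else:
            options[token] = True             # a bare name switches a flag on
    return options

bdt_options = parse_options(
    "!V:NTrees=200:MinNodeSize=2.5%:MaxDepth=2:BoostType=AdaBoost:"
    "AdaBoostBeta=0.5:UseBaggedBoost:BaggedSampleFraction=0.5:"
    "SeparationType=GiniIndex:nCuts=20")
for key in sorted(bdt_options):
    print("{:22s} {}".format(key, bdt_options[key]))
```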

### Perform the Cross Validation: Train/Test the booked methods

In [10]:
cv.Evaluate()

                         : Evaluate method: BDT
<HEADER> Factory                  : Booking method: BDT_fold1
                         : 
<HEADER> BDT_fold1                : #events: (reweighted) sig: 5600 bkg: 5600
                         : #events: (unweighted) sig: 5641 bkg: 5559
                         : Training 200 Decision Trees ... patience please
                         : Elapsed time for training with 11200 events: 0.655 sec         
<HEADER> BDT_fold1                : [dataset_cv] : Evaluation of BDT_fold1 on training sample (11200 events)
                         : Elapsed time for evaluation of 11200 events: 0.0735 sec       
                         : Creating xml weight file: dataset_cv/weights/TMVACrossValidation_BDT_fold1.weights.xml
                         : Creating standalone class: dataset_cv/weights/TMVACrossValidation_BDT_fold1.class.C
<HEADER> Factory                  : Test all methods
<HEADER> Factory                  : Test method: BDT_fold1 for Classific

## Cross Validation Result

In [11]:
result = cv.GetResults()[0]
result.Print()

<HEADER> CrossValidation          :  ==== Results ====
                         : Fold  0 ROC-Int : 0.7496
                         : Fold  1 ROC-Int : 0.7619
                         : Fold  2 ROC-Int : 0.7785
                         : Fold  3 ROC-Int : 0.7612
                         : Fold  4 ROC-Int : 0.7564
                         : ------------------------
                         : Average ROC-Int : 0.7615
                         : Std-Dev ROC-Int : 0.0107
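
The `ROC-Int` value reported for each fold is the area under that fold's ROC curve. As a rough illustration of what the number measures, the sketch below computes the same quantity as the fraction of signal/background pairs in which the signal event gets the higher score (ties count one half). The helper `roc_integral` is hypothetical and the toy scores are made up; they are not taken from the Higgs dataset.

```python
def roc_integral(signal_scores, background_scores):
    """Area under the ROC curve via pairwise comparison (hypothetical helper)."""
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0                   # signal correctly ranked above background
            elif s == b:
                wins += 0.5                   # a tie counts as half a win
    return wins / (len(signal_scores) * len(background_scores))

toy_signal = [0.9, 0.8, 0.55, 0.4]            # made-up classifier responses
toy_background = [0.7, 0.5, 0.3, 0.2]
print(roc_integral(toy_signal, toy_background))
```

For these toy lists, 13 of the 16 pairs are ordered correctly, giving 0.8125. A value of 0.5 corresponds to random guessing and 1.0 to perfect separation.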


## Plot ROC Curve
We enable JavaScript visualisation for the plots.

In [12]:
%jsroot on

In [13]:
c = ROOT.TCanvas()
result.GetROCCurves().Draw("AL")
c.BuildLegend()
c.Draw()

In [17]:
import math
print("Average ROC Integral = {} +/- {}".format(result.GetROCAverage(),
                                                 result.GetROCStandardDeviation()/math.sqrt(cv.GetNumFolds())))

Average ROC Integral = 0.761524438858 +/- 0.00477527421981
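
The uncertainty quoted here is the standard deviation over folds divided by the square root of the number of folds, that is, the standard error of the mean. As a cross-check, the sketch below recomputes it from the per-fold values printed in the results table, using only the Python standard library. The inputs are rounded to four decimals, so agreement with the ROOT output is only expected to about that precision.

```python
import math
import statistics

fold_roc = [0.7496, 0.7619, 0.7785, 0.7612, 0.7564]   # copied from the results table

average = statistics.mean(fold_roc)
std_dev = statistics.stdev(fold_roc)                   # sample standard deviation (n - 1)
std_error = std_dev / math.sqrt(len(fold_roc))         # standard error of the mean

print("Average ROC Integral = {:.4f} +/- {:.4f}".format(average, std_error))
```

The sample (n - 1) form of the standard deviation reproduces the printed `Std-Dev ROC-Int` of 0.0107; the population form would give roughly 0.0096 instead.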


In [18]:
# Close the output file so that its contents are written to disk
outputFile.Close()
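
The introduction notes that the XML weights can be read back and applied to another dataset. A minimal sketch of that step with `TMVA.Reader` follows. It assumes a working PyROOT installation and a tree containing the same seven input variables used in training. The function name `score_tree` is hypothetical, and `ROOT` is imported inside the function so that the definition itself loads without ROOT.

```python
from array import array

# Input variables, in the same order as they were added to the DataLoader
VARIABLES = ["m_jj", "m_jjj", "m_lv", "m_jlv", "m_bb", "m_wbb", "m_wwbb"]

def score_tree(tree, weight_file, method_name="BDT", variables=VARIABLES):
    """Return the classifier response for every entry of `tree` (hypothetical helper)."""
    import ROOT  # imported lazily; assumes PyROOT is available when called

    reader = ROOT.TMVA.Reader("!Color:!Silent")
    # One single-float buffer per variable; the Reader reads from these buffers
    buffers = {name: array("f", [0.0]) for name in variables}
    for name in variables:
        reader.AddVariable(name, buffers[name])
    reader.BookMVA(method_name, weight_file)

    scores = []
    for event in tree:
        for name in variables:
            buffers[name][0] = getattr(event, name)    # copy event values into the buffers
        scores.append(reader.EvaluateMVA(method_name))
    return scores
```

A call might look like `score_tree(inputFile.Get("sig_tree"), "dataset_cv/weights/TMVACrossValidation_BDT_fold1.weights.xml")`, using the weight file path reported in the training log. Note that each fold's weight file was trained on only part of the training sample.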