ROOT 6.10/02
Reference Guide
TMVAClassificationCategory.C File Reference

Detailed Description

This macro provides examples for the training and testing of the TMVA classifiers in categorisation mode.

The input data is a toy Monte Carlo sample consisting of four Gaussian-distributed, linearly correlated input variables whose properties depend on the category (eta).
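To make "category-dependent properties" concrete, here is a minimal sketch of one hypothetical way such a sample could arise: each variable is an independent unit Gaussian, shifted by a common offset whose sign depends on whether abs(eta) <= 1.3. Everything here is an assumption made for illustration (the names `ToyEvent`, `generateToyEvents`, `selectCategory` and `correlation`, the uniform eta range of [-2.5, 2.5), and the offset value); it is not the procedure used to produce the shipped toy file. It does reproduce the qualitative pattern seen in the correlation matrices of the log below, where correlations of roughly 0.38 in the full sample drop to nearly zero within each eta category.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical event record: four input variables plus the category variable.
struct ToyEvent {
    std::array<double, 4> vars;
    double eta;
};

// Same category boundary as the cut strings used in the macro.
bool isCentral(const ToyEvent& ev) { return std::fabs(ev.eta) <= 1.3; }

// Generate n events; all four variables share a category-dependent shift.
std::vector<ToyEvent> generateToyEvents(std::size_t n, double offset, unsigned seed)
{
    std::mt19937 rng(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::uniform_real_distribution<double> etaDist(-2.5, 2.5); // assumed range
    std::vector<ToyEvent> events;
    events.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        ToyEvent ev;
        ev.eta = etaDist(rng);
        const double shift = isCentral(ev) ? +offset : -offset;
        for (double& v : ev.vars) v = gauss(rng) + shift;
        events.push_back(ev);
    }
    return events;
}

// Keep only the events of one category.
std::vector<ToyEvent> selectCategory(const std::vector<ToyEvent>& events, bool central)
{
    std::vector<ToyEvent> out;
    for (const ToyEvent& ev : events)
        if (isCentral(ev) == central) out.push_back(ev);
    return out;
}

// Pearson correlation coefficient between vars[i] and vars[j].
double correlation(const std::vector<ToyEvent>& events, std::size_t i, std::size_t j)
{
    assert(!events.empty());
    const double n = static_cast<double>(events.size());
    double si = 0, sj = 0, sii = 0, sjj = 0, sij = 0;
    for (const ToyEvent& ev : events) {
        const double a = ev.vars[i], b = ev.vars[j];
        si += a; sj += b; sii += a * a; sjj += b * b; sij += a * b;
    }
    const double cov = sij / n - (si / n) * (sj / n);
    const double vi  = sii / n - (si / n) * (si / n);
    const double vj  = sjj / n - (sj / n) * (sj / n);
    return cov / std::sqrt(vi * vj);
}
```

Calling `correlation` on the full sample and then on the output of `selectCategory` shows the effect: the shared shift induces a correlation in the pooled sample that disappears once the events are split by category, which is why a classifier trained per category can do better than one trained on the pooled sample.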

For this example, only Fisher and Likelihood are used. Run via:

root -l TMVAClassificationCategory.C

The output file "TMVA.root" can be analysed with dedicated macros (run: root -l <macro.C>), which can be conveniently invoked through a GUI that appears once this macro has finished running.

Processing /builddir/build/BUILD/root-6.10.02/tutorials/tmva/TMVAClassificationCategory.C...
==> Start TMVAClassificationCategory
--- TMVAClassificationCategory: Accessing /builddir/build/BUILD/root-6.10.02/tutorials/tmva/data/toy_sigbkg_categ_offset.root
<HEADER> DataSetInfo : [dataset] : Added class "Signal"
: Add Tree TreeS of type Signal with 10000 events
<HEADER> DataSetInfo : [dataset] : Added class "Background"
: Add Tree TreeB of type Background with 10000 events
<HEADER> Factory : Booking method: Fisher
:
<HEADER> Factory : Booking method: Likelihood
:
<HEADER> Factory : Booking method: FisherCat
:
: Adding sub-classifier: Fisher::Category_Fisher_1
<HEADER> DataSetInfo : [Category_Fisher_1_dsi] : Added class "Signal"
<HEADER> DataSetInfo : [Category_Fisher_1_dsi] : Added class "Background"
: Adding sub-classifier: Fisher::Category_Fisher_2
<HEADER> DataSetInfo : [Category_Fisher_2_dsi] : Added class "Signal"
<HEADER> DataSetInfo : [Category_Fisher_2_dsi] : Added class "Background"
<HEADER> Factory : Booking method: LikelihoodCat
:
: Adding sub-classifier: Likelihood::Category_Likelihood_1
<HEADER> DataSetInfo : [Category_Likelihood_1_dsi] : Added class "Signal"
<HEADER> DataSetInfo : [Category_Likelihood_1_dsi] : Added class "Background"
: Adding sub-classifier: Likelihood::Category_Likelihood_2
<HEADER> DataSetInfo : [Category_Likelihood_2_dsi] : Added class "Signal"
<HEADER> DataSetInfo : [Category_Likelihood_2_dsi] : Added class "Background"
<HEADER> Factory : Train all methods
<HEADER> DataSetFactory : [dataset] : Number of events in input trees
:
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 5000
: Signal -- testing events : 5000
: Signal -- training and testing events: 10000
: Background -- training events : 5000
: Background -- testing events : 5000
: Background -- training and testing events: 10000
:
<HEADER> DataSetInfo : Correlation matrix (Signal):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 +0.371 +0.379 +0.384
: var2: +0.371 +1.000 +0.376 +0.391
: var3: +0.379 +0.376 +1.000 +0.385
: var4: +0.384 +0.391 +0.385 +1.000
: ----------------------------------------
<HEADER> DataSetInfo : Correlation matrix (Background):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 +0.372 +0.378 +0.382
: var2: +0.372 +1.000 +0.382 +0.394
: var3: +0.378 +0.382 +1.000 +0.381
: var4: +0.382 +0.394 +0.381 +1.000
: ----------------------------------------
<HEADER> DataSetFactory : [dataset] :
:
<HEADER> Factory : Train method: Fisher for Classification
:
<HEADER> Fisher : Results for Fisher coefficients:
: -----------------------
: Variable: Coefficient:
: -----------------------
: var1: -0.059
: var2: -0.006
: var3: +0.096
: var4: +0.219
: (offset): -0.024
: -----------------------
: Elapsed time for training with 10000 events: 0.0104 sec
<HEADER> Fisher : [dataset] : Evaluation of Fisher on training sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.00404 sec
: Creating xml weight file: dataset/weights/TMVAClassificationCategory_Fisher.weights.xml
: Creating standalone class: dataset/weights/TMVAClassificationCategory_Fisher.class.C
<HEADER> Factory : Training finished
:
<HEADER> Factory : Train method: Likelihood for Classification
:
: Filling reference histograms
: Building PDF out of reference histograms
: Elapsed time for training with 10000 events: 0.103 sec
<HEADER> Likelihood : [dataset] : Evaluation of Likelihood on training sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0385 sec
: Creating xml weight file: dataset/weights/TMVAClassificationCategory_Likelihood.weights.xml
: Creating standalone class: dataset/weights/TMVAClassificationCategory_Likelihood.class.C
: TMVA.root:/dataset/Method_Likelihood/Likelihood
<HEADER> Factory : Training finished
:
<HEADER> Factory : Train method: FisherCat for Classification
:
: Train all sub-classifiers for Classification ...
<HEADER> DataSetFactory : [Category_Fisher_1_dsi] : Number of events in input trees
: Dataset[Category_Fisher_1_dsi] : Signal requirement: "abs(eta)<=1.3"
: Dataset[Category_Fisher_1_dsi] : Signal -- number of events passed: 5123 / sum of weights: 5123
: Dataset[Category_Fisher_1_dsi] : Signal -- efficiency : 0.5123
: Dataset[Category_Fisher_1_dsi] : Background requirement: "abs(eta)<=1.3"
: Dataset[Category_Fisher_1_dsi] : Background -- number of events passed: 5134 / sum of weights: 5134
: Dataset[Category_Fisher_1_dsi] : Background -- efficiency : 0.5134
: Dataset[Category_Fisher_1_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Dataset[Category_Fisher_1_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 2561
: Signal -- testing events : 2561
: Signal -- training and testing events: 5122
: Dataset[Category_Fisher_1_dsi] : Signal -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.5123
: Background -- training events : 2567
: Background -- testing events : 2567
: Background -- training and testing events: 5134
: Dataset[Category_Fisher_1_dsi] : Background -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.5134
:
<HEADER> DataSetInfo : Correlation matrix (Signal):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.023 +0.001 +0.009
: var2: -0.023 +1.000 +0.007 +0.014
: var3: +0.001 +0.007 +1.000 -0.007
: var4: +0.009 +0.014 -0.007 +1.000
: ----------------------------------------
<HEADER> DataSetInfo : Correlation matrix (Background):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.029 -0.015 +0.019
: var2: -0.029 +1.000 +0.005 +0.003
: var3: -0.015 +0.005 +1.000 -0.019
: var4: +0.019 +0.003 -0.019 +1.000
: ----------------------------------------
<HEADER> DataSetFactory : [Category_Fisher_1_dsi] :
:
: Train method: Category_Fisher_1 for Classification
<HEADER> Category_Fisher_1 : Results for Fisher coefficients:
: -----------------------
: Variable: Coefficient:
: -----------------------
: var1: +0.096
: var2: +0.135
: var3: +0.237
: var4: +0.382
: (offset): +0.626
: -----------------------
: Elapsed time for training with 5128 events: 0.00509 sec
<HEADER> Category_Fisher_1 : [Category_Fisher_1_dsi] : Evaluation of Category_Fisher_1 on training sample (5128 events)
: Elapsed time for evaluation of 5128 events: 0.00214 sec
: Training finished
<HEADER> DataSetFactory : [Category_Fisher_2_dsi] : Number of events in input trees
: Dataset[Category_Fisher_2_dsi] : Signal requirement: "abs(eta)>1.3"
: Dataset[Category_Fisher_2_dsi] : Signal -- number of events passed: 4877 / sum of weights: 4877
: Dataset[Category_Fisher_2_dsi] : Signal -- efficiency : 0.4877
: Dataset[Category_Fisher_2_dsi] : Background requirement: "abs(eta)>1.3"
: Dataset[Category_Fisher_2_dsi] : Background -- number of events passed: 4866 / sum of weights: 4866
: Dataset[Category_Fisher_2_dsi] : Background -- efficiency : 0.4866
: Dataset[Category_Fisher_2_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Dataset[Category_Fisher_2_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 2438
: Signal -- testing events : 2438
: Signal -- training and testing events: 4876
: Dataset[Category_Fisher_2_dsi] : Signal -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.4877
: Background -- training events : 2433
: Background -- testing events : 2433
: Background -- training and testing events: 4866
: Dataset[Category_Fisher_2_dsi] : Background -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.4866
:
<HEADER> DataSetInfo : Correlation matrix (Signal):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.021 -0.010 +0.011
: var2: -0.021 +1.000 +0.043 +0.001
: var3: -0.010 +0.043 +1.000 -0.003
: var4: +0.011 +0.001 -0.003 +1.000
: ----------------------------------------
<HEADER> DataSetInfo : Correlation matrix (Background):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.026 +0.006 +0.016
: var2: -0.026 +1.000 +0.004 +0.044
: var3: +0.006 +0.004 +1.000 -0.027
: var4: +0.016 +0.044 -0.027 +1.000
: ----------------------------------------
<HEADER> DataSetFactory : [Category_Fisher_2_dsi] :
:
: Train method: Category_Fisher_2 for Classification
<HEADER> Category_Fisher_2 : Results for Fisher coefficients:
: -----------------------
: Variable: Coefficient:
: -----------------------
: var1: +0.107
: var2: +0.125
: var3: +0.249
: var4: +0.375
: (offset): -0.733
: -----------------------
: Elapsed time for training with 4871 events: 0.00479 sec
<HEADER> Category_Fisher_2 : [Category_Fisher_2_dsi] : Evaluation of Category_Fisher_2 on training sample (4871 events)
: Elapsed time for evaluation of 4871 events: 0.00197 sec
: Training finished
: Begin ranking of input variables...
<HEADER> Category_Fisher_1 : Ranking result (top variable is best ranked)
: -------------------------------
: Rank : Variable : Discr. power
: -------------------------------
: 1 : var4 : 2.224e-01
: 2 : var3 : 9.802e-02
: 3 : var2 : 3.679e-02
: 4 : var1 : 1.825e-02
: -------------------------------
<HEADER> Category_Fisher_2 : Ranking result (top variable is best ranked)
: -------------------------------
: Rank : Variable : Discr. power
: -------------------------------
: 1 : var4 : 2.177e-01
: 2 : var3 : 1.102e-01
: 3 : var2 : 3.583e-02
: 4 : var1 : 2.281e-02
: -------------------------------
: Elapsed time for training with 10000 events: 0.15 sec
<HEADER> FisherCat : [dataset] : Evaluation of FisherCat on training sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0131 sec
: Creating xml weight file: dataset/weights/TMVAClassificationCategory_FisherCat.weights.xml
<HEADER> Factory : Training finished
:
<HEADER> Factory : Train method: LikelihoodCat for Classification
:
: Train all sub-classifiers for Classification ...
<HEADER> DataSetFactory : [Category_Likelihood_1_dsi] : Number of events in input trees
: Dataset[Category_Likelihood_1_dsi] : Signal requirement: "abs(eta)<=1.3"
: Dataset[Category_Likelihood_1_dsi] : Signal -- number of events passed: 5123 / sum of weights: 5123
: Dataset[Category_Likelihood_1_dsi] : Signal -- efficiency : 0.5123
: Dataset[Category_Likelihood_1_dsi] : Background requirement: "abs(eta)<=1.3"
: Dataset[Category_Likelihood_1_dsi] : Background -- number of events passed: 5134 / sum of weights: 5134
: Dataset[Category_Likelihood_1_dsi] : Background -- efficiency : 0.5134
: Dataset[Category_Likelihood_1_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Dataset[Category_Likelihood_1_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 2561
: Signal -- testing events : 2561
: Signal -- training and testing events: 5122
: Dataset[Category_Likelihood_1_dsi] : Signal -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.5123
: Background -- training events : 2567
: Background -- testing events : 2567
: Background -- training and testing events: 5134
: Dataset[Category_Likelihood_1_dsi] : Background -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.5134
:
<HEADER> DataSetInfo : Correlation matrix (Signal):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.023 +0.001 +0.009
: var2: -0.023 +1.000 +0.007 +0.014
: var3: +0.001 +0.007 +1.000 -0.007
: var4: +0.009 +0.014 -0.007 +1.000
: ----------------------------------------
<HEADER> DataSetInfo : Correlation matrix (Background):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.029 -0.015 +0.019
: var2: -0.029 +1.000 +0.005 +0.003
: var3: -0.015 +0.005 +1.000 -0.019
: var4: +0.019 +0.003 -0.019 +1.000
: ----------------------------------------
<HEADER> DataSetFactory : [Category_Likelihood_1_dsi] :
:
: Train method: Category_Likelihood_1 for Classification
: Filling reference histograms
: Building PDF out of reference histograms
: Elapsed time for training with 5128 events: 0.0635 sec
<HEADER> Category_Likelihood_1 : [Category_Likelihood_1_dsi] : Evaluation of Category_Likelihood_1 on training sample (5128 events)
: Elapsed time for evaluation of 5128 events: 0.0202 sec
: TMVA.root:/dataset/Method_LikelihoodCat/LikelihoodCat/Method_Likelihood/Category_Likelihood_1
: Training finished
<HEADER> DataSetFactory : [Category_Likelihood_2_dsi] : Number of events in input trees
: Dataset[Category_Likelihood_2_dsi] : Signal requirement: "abs(eta)>1.3"
: Dataset[Category_Likelihood_2_dsi] : Signal -- number of events passed: 4877 / sum of weights: 4877
: Dataset[Category_Likelihood_2_dsi] : Signal -- efficiency : 0.4877
: Dataset[Category_Likelihood_2_dsi] : Background requirement: "abs(eta)>1.3"
: Dataset[Category_Likelihood_2_dsi] : Background -- number of events passed: 4866 / sum of weights: 4866
: Dataset[Category_Likelihood_2_dsi] : Background -- efficiency : 0.4866
: Dataset[Category_Likelihood_2_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Dataset[Category_Likelihood_2_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 2438
: Signal -- testing events : 2438
: Signal -- training and testing events: 4876
: Dataset[Category_Likelihood_2_dsi] : Signal -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.4877
: Background -- training events : 2433
: Background -- testing events : 2433
: Background -- training and testing events: 4866
: Dataset[Category_Likelihood_2_dsi] : Background -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.4866
:
<HEADER> DataSetInfo : Correlation matrix (Signal):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.021 -0.010 +0.011
: var2: -0.021 +1.000 +0.043 +0.001
: var3: -0.010 +0.043 +1.000 -0.003
: var4: +0.011 +0.001 -0.003 +1.000
: ----------------------------------------
<HEADER> DataSetInfo : Correlation matrix (Background):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.026 +0.006 +0.016
: var2: -0.026 +1.000 +0.004 +0.044
: var3: +0.006 +0.004 +1.000 -0.027
: var4: +0.016 +0.044 -0.027 +1.000
: ----------------------------------------
<HEADER> DataSetFactory : [Category_Likelihood_2_dsi] :
:
: Train method: Category_Likelihood_2 for Classification
: Filling reference histograms
: Building PDF out of reference histograms
: Elapsed time for training with 4871 events: 0.0602 sec
<HEADER> Category_Likelihood_2 : [Category_Likelihood_2_dsi] : Evaluation of Category_Likelihood_2 on training sample (4871 events)
: Elapsed time for evaluation of 4871 events: 0.0197 sec
: TMVA.root:/dataset/Method_LikelihoodCat/LikelihoodCat/Method_Likelihood/Category_Likelihood_2
: Training finished
: Begin ranking of input variables...
<HEADER> Category_Likelihood_1 : Ranking result (top variable is best ranked)
: -----------------------------------
: Rank : Variable : Delta Separation
: -----------------------------------
: 1 : var4 : 1.349e-01
: 2 : var3 : 2.514e-02
: 3 : var1 : 9.827e-03
: 4 : var2 : 2.960e-03
: -----------------------------------
<HEADER> Category_Likelihood_2 : Ranking result (top variable is best ranked)
: -----------------------------------
: Rank : Variable : Delta Separation
: -----------------------------------
: 1 : var4 : 1.656e-01
: 2 : var3 : 8.002e-02
: 3 : var1 : 2.459e-02
: 4 : var2 : -3.762e-04
: -----------------------------------
: Elapsed time for training with 10000 events: 0.737 sec
<HEADER> LikelihoodCat : [dataset] : Evaluation of LikelihoodCat on training sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0476 sec
: Creating xml weight file: dataset/weights/TMVAClassificationCategory_LikelihoodCat.weights.xml
<HEADER> Factory : Training finished
:
: Ranking input variables (method specific)...
<HEADER> Fisher : Ranking result (top variable is best ranked)
: -------------------------------
: Rank : Variable : Discr. power
: -------------------------------
: 1 : var4 : 1.488e-01
: 2 : var3 : 7.387e-02
: 3 : var2 : 2.853e-02
: 4 : var1 : 1.148e-02
: -------------------------------
<HEADER> Likelihood : Ranking result (top variable is best ranked)
: -----------------------------------
: Rank : Variable : Delta Separation
: -----------------------------------
: 1 : var4 : 1.108e-01
: 2 : var3 : 5.508e-02
: 3 : var2 : 3.017e-02
: 4 : var1 : 2.291e-02
: -----------------------------------
: No variable ranking supplied by classifier: FisherCat
: No variable ranking supplied by classifier: LikelihoodCat
<HEADER> Factory : === Destroy and recreate all methods via weight files for testing ===
:
: Recreating sub-classifiers from XML-file
<HEADER> DataSetInfo : [Category_Fisher_1_dsi] : Added class "Signal"
<HEADER> DataSetInfo : [Category_Fisher_1_dsi] : Added class "Background"
<HEADER> DataSetInfo : [Category_Fisher_2_dsi] : Added class "Signal"
<HEADER> DataSetInfo : [Category_Fisher_2_dsi] : Added class "Background"
: Recreating sub-classifiers from XML-file
<HEADER> DataSetInfo : [Category_Likelihood_1_dsi] : Added class "Signal"
<HEADER> DataSetInfo : [Category_Likelihood_1_dsi] : Added class "Background"
<HEADER> DataSetInfo : [Category_Likelihood_2_dsi] : Added class "Signal"
<HEADER> DataSetInfo : [Category_Likelihood_2_dsi] : Added class "Background"
<HEADER> Factory : Test all methods
<HEADER> Factory : Test method: Fisher for Classification performance
:
<HEADER> Fisher : [dataset] : Evaluation of Fisher on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0048 sec
<HEADER> Factory : Test method: Likelihood for Classification performance
:
<HEADER> Likelihood : [dataset] : Evaluation of Likelihood on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0383 sec
<HEADER> Factory : Test method: FisherCat for Classification performance
:
<HEADER> FisherCat : [dataset] : Evaluation of FisherCat on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0114 sec
<HEADER> Factory : Test method: LikelihoodCat for Classification performance
:
<HEADER> LikelihoodCat : [dataset] : Evaluation of LikelihoodCat on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0462 sec
<HEADER> Factory : Evaluate all methods
<HEADER> Factory : Evaluate classifier: Fisher
:
<HEADER> Fisher : [dataset] : Loop over test events and fill histograms with classifier response...
:
<HEADER> TFHandler_Fisher : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: -0.028190 1.2905 [ -4.3323 4.5609 ]
: var2: -0.025496 1.3165 [ -4.7537 4.6723 ]
: var3: -0.025183 1.3669 [ -5.2892 4.7007 ]
: var4: 0.12022 1.4790 [ -4.6497 5.1415 ]
: -----------------------------------------------------------
<HEADER> Factory : Evaluate classifier: Likelihood
:
<HEADER> Likelihood : [dataset] : Loop over test events and fill histograms with classifier response...
:
<HEADER> TFHandler_Likelihood : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: -0.028190 1.2905 [ -4.3323 4.5609 ]
: var2: -0.025496 1.3165 [ -4.7537 4.6723 ]
: var3: -0.025183 1.3669 [ -5.2892 4.7007 ]
: var4: 0.12022 1.4790 [ -4.6497 5.1415 ]
: -----------------------------------------------------------
<HEADER> Factory : Evaluate classifier: FisherCat
:
<HEADER> FisherCat : [dataset] : Loop over test events and fill histograms with classifier response...
:
<HEADER> TFHandler_FisherCat : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: -0.028190 1.2905 [ -4.3323 4.5609 ]
: var2: -0.025496 1.3165 [ -4.7537 4.6723 ]
: var3: -0.025183 1.3669 [ -5.2892 4.7007 ]
: var4: 0.12022 1.4790 [ -4.6497 5.1415 ]
: -----------------------------------------------------------
<HEADER> Factory : Evaluate classifier: LikelihoodCat
:
<HEADER> LikelihoodCat : [dataset] : Loop over test events and fill histograms with classifier response...
:
<HEADER> TFHandler_LikelihoodCat : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: -0.028190 1.2905 [ -4.3323 4.5609 ]
: var2: -0.025496 1.3165 [ -4.7537 4.6723 ]
: var3: -0.025183 1.3669 [ -5.2892 4.7007 ]
: var4: 0.12022 1.4790 [ -4.6497 5.1415 ]
: -----------------------------------------------------------
:
: Evaluation results ranked by best signal efficiency and purity (area)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA
: Name: Method: ROC-integ
: dataset FisherCat : 0.914
: dataset LikelihoodCat : 0.912
: dataset Fisher : 0.803
: dataset Likelihood : 0.763
: -------------------------------------------------------------------------------------------------------------------
:
: Testing efficiency compared to training efficiency (overtraining check)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA Signal efficiency: from test sample (from training sample)
: Name: Method: @B=0.01 @B=0.10 @B=0.30
: -------------------------------------------------------------------------------------------------------------------
: dataset FisherCat : 0.340 (0.352) 0.741 (0.738) 0.920 (0.916)
: dataset LikelihoodCat : 0.332 (0.357) 0.740 (0.739) 0.918 (0.916)
: dataset Fisher : 0.172 (0.173) 0.475 (0.474) 0.729 (0.739)
: dataset Likelihood : 0.187 (0.220) 0.437 (0.441) 0.599 (0.607)
: -------------------------------------------------------------------------------------------------------------------
:
<HEADER> Dataset:dataset : Created tree 'TestTree' with 10000 events
:
<HEADER> Dataset:dataset : Created tree 'TrainTree' with 10000 events
:
<HEADER> Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
==> Wrote root file: TMVA.root
==> TMVAClassificationCategory is done!
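The overtraining table in the log above quotes the signal efficiency at fixed background efficiencies (@B=0.01, @B=0.10, @B=0.30). As a rough illustration of what such a number means, the sketch below places a cut on the classifier score so that at most the requested fraction of background events passes, then counts the fraction of signal events above that cut. The function name `signalEfficiencyAtBackground` is hypothetical, and this direct counting approach is an assumption for illustration; it is not taken from the TMVA implementation, which may derive the values differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Fraction of signal scores above the cut that lets at most the requested
// fraction of background scores pass. Hypothetical helper, not a TMVA API.
double signalEfficiencyAtBackground(const std::vector<double>& signalScores,
                                    std::vector<double> backgroundScores,
                                    double backgroundEfficiency)
{
    assert(backgroundEfficiency >= 0.0 && backgroundEfficiency <= 1.0);
    if (signalScores.empty() || backgroundScores.empty()) return 0.0;

    // Order background scores from most to least signal-like.
    std::sort(backgroundScores.begin(), backgroundScores.end(), std::greater<double>());

    // Number of background events allowed to pass the cut.
    const std::size_t nPass =
        static_cast<std::size_t>(backgroundEfficiency * backgroundScores.size());
    if (nPass >= backgroundScores.size()) return 1.0; // no cut applied

    // The cut is the score of the first rejected background event;
    // only scores strictly above it are accepted.
    const double cut = backgroundScores[nPass];
    const auto nSignal = std::count_if(signalScores.begin(), signalScores.end(),
                                       [cut](double s) { return s > cut; });
    return static_cast<double>(nSignal) / static_cast<double>(signalScores.size());
}
```

Comparing the value obtained from the test sample with the one from the training sample, as the table does, gives a quick overtraining check: a training efficiency that is markedly higher than the test efficiency suggests the classifier has adapted to statistical fluctuations.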
#include <cstdlib>
#include <iostream>
#include <map>
#include <string>

#include "TChain.h"
#include "TCut.h"
#include "TFile.h"
#include "TTree.h"
#include "TString.h"
#include "TObjString.h"
#include "TSystem.h"
#include "TROOT.h"

#include "TMVA/DataLoader.h"
#include "TMVA/Factory.h"
#include "TMVA/MethodCategory.h"
#include "TMVA/Tools.h"
#include "TMVA/TMVAGui.h"

// two types of category methods are implemented
Bool_t UseOffsetMethod = kTRUE;

void TMVAClassificationCategory()
{
   //---------------------------------------------------------------
   // Example for usage of different event categories with classifiers
   std::cout << std::endl << "==> Start TMVAClassificationCategory" << std::endl;

   // This loads the library
   TMVA::Tools::Instance();

   bool batchMode = false;

   // Create a new root output file.
   TString outfileName( "TMVA.root" );
   TFile* outputFile = TFile::Open( outfileName, "RECREATE" );

   // Create the factory object (see TMVAClassification.C for more information)
   std::string factoryOptions( "!V:!Silent:Transformations=I;D;P;G,D" );
   if (batchMode) factoryOptions += ":!Color:!DrawProgressBar";
   TMVA::Factory *factory = new TMVA::Factory( "TMVAClassificationCategory", outputFile, factoryOptions );

   // Create DataLoader
   TMVA::DataLoader *dataloader=new TMVA::DataLoader("dataset");

   // Define the input variables used for the MVA training
   dataloader->AddVariable( "var1", 'F' );
   dataloader->AddVariable( "var2", 'F' );
   dataloader->AddVariable( "var3", 'F' );
   dataloader->AddVariable( "var4", 'F' );

   // You can add so-called "Spectator variables", which are not used in the MVA training,
   // but will appear in the final "TestTree" produced by TMVA. This TestTree will contain the
   // input variables, the response values of all trained MVAs, and the spectator variables
   dataloader->AddSpectator( "eta" );

   // Load the signal and background event samples from ROOT trees
   TFile *input(0);
   TString fname = TString(gSystem->DirName(__FILE__) ) + "/data/";
   if (gSystem->AccessPathName( fname + "toy_sigbkg_categ_offset.root")) {
      // if directory data not found try using tutorials dir
      fname = gROOT->GetTutorialDir() + "/tmva/data/";
   }
   if (UseOffsetMethod) fname += "toy_sigbkg_categ_offset.root";
   else                 fname += "toy_sigbkg_categ_varoff.root";
   if (!gSystem->AccessPathName( fname )) {
      // the data file is accessible (AccessPathName returns false on success), so open it
      std::cout << "--- TMVAClassificationCategory: Accessing " << fname << std::endl;
      input = TFile::Open( fname );
   }
   if (!input) {
      std::cout << "ERROR: could not open data file: " << fname << std::endl;
      exit(1);
   }

   TTree *signalTree = (TTree*)input->Get("TreeS");
   TTree *background = (TTree*)input->Get("TreeB");

   // Global event weights per tree (see below for setting event-wise weights)
   Double_t signalWeight     = 1.0;
   Double_t backgroundWeight = 1.0;

   // You can add an arbitrary number of signal or background trees
   dataloader->AddSignalTree    ( signalTree, signalWeight );
   dataloader->AddBackgroundTree( background, backgroundWeight );

   // Apply additional cuts on the signal and background samples (can be different)
   TCut mycuts = ""; // for example: TCut mycuts = "abs(var1)<0.5 && abs(var2-0.5)<1";
   TCut mycutb = ""; // for example: TCut mycutb = "abs(var1)<0.5";

   // Tell the dataloader how to use the training and testing events
   dataloader->PrepareTrainingAndTestTree( mycuts, mycutb,
                                           "nTrain_Signal=0:nTrain_Background=0:SplitMode=Random:NormMode=NumEvents:!V" );

   // Book MVA methods

   // Fisher discriminant
   factory->BookMethod( dataloader, TMVA::Types::kFisher, "Fisher", "!H:!V:Fisher" );

   // Likelihood
   factory->BookMethod( dataloader, TMVA::Types::kLikelihood, "Likelihood",
                        "!H:!V:TransformOutput:PDFInterpol=Spline2:NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmoothBkg[1]=10:NSmooth=1:NAvEvtPerBin=50" );

   // Categorised classifier
   TMVA::MethodCategory* mcat = 0;

   // The variable sets
   TString theCat1Vars = "var1:var2:var3:var4";
   TString theCat2Vars = (UseOffsetMethod ? "var1:var2:var3:var4" : "var1:var2:var3");

   // Fisher with categories
   TMVA::MethodBase* fiCat = factory->BookMethod( dataloader, TMVA::Types::kCategory, "FisherCat","" );
   mcat = dynamic_cast<TMVA::MethodCategory*>(fiCat);
   mcat->AddMethod( "abs(eta)<=1.3", theCat1Vars, TMVA::Types::kFisher, "Category_Fisher_1","!H:!V:Fisher" );
   mcat->AddMethod( "abs(eta)>1.3",  theCat2Vars, TMVA::Types::kFisher, "Category_Fisher_2","!H:!V:Fisher" );

   // Likelihood with categories
   TMVA::MethodBase* liCat = factory->BookMethod( dataloader, TMVA::Types::kCategory, "LikelihoodCat","" );
   mcat = dynamic_cast<TMVA::MethodCategory*>(liCat);
   mcat->AddMethod( "abs(eta)<=1.3",theCat1Vars, TMVA::Types::kLikelihood,
                    "Category_Likelihood_1","!H:!V:TransformOutput:PDFInterpol=Spline2:NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmoothBkg[1]=10:NSmooth=1:NAvEvtPerBin=50" );
   mcat->AddMethod( "abs(eta)>1.3", theCat2Vars, TMVA::Types::kLikelihood,
                    "Category_Likelihood_2","!H:!V:TransformOutput:PDFInterpol=Spline2:NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmoothBkg[1]=10:NSmooth=1:NAvEvtPerBin=50" );

   // Now you can tell the factory to train, test, and evaluate the MVAs

   // Train MVAs using the set of training events
   factory->TrainAllMethods();

   // Evaluate all MVAs using the set of test events
   factory->TestAllMethods();

   // Evaluate and compare performance of all configured MVAs
   factory->EvaluateAllMethods();

   // --------------------------------------------------------------

   // Save the output
   outputFile->Close();

   std::cout << "==> Wrote root file: " << outputFile->GetName() << std::endl;
   std::cout << "==> TMVAClassificationCategory is done!" << std::endl;

   // Clean up
   delete factory;
   delete dataloader;

   // Launch the GUI for the root macros
   if (!gROOT->IsBatch()) TMVA::TMVAGui( outfileName );
}

int main( int argc, char** argv )
{
   TMVAClassificationCategory();
   return 0;
}
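The macro books FisherCat with two sub-classifiers selected by the cuts "abs(eta)<=1.3" and "abs(eta)>1.3". The following standalone sketch illustrates what that routing amounts to at evaluation time, assuming the Fisher response is the offset plus the coefficient-weighted sum of the input variables. The coefficients are copied from the Category_Fisher_1 and Category_Fisher_2 blocks of the log above; the names `FisherCoefficients`, `fisherResponse` and `categorisedFisherResponse` are hypothetical and are not part of the TMVA API.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Linear Fisher discriminant: one weight per input variable plus an offset.
struct FisherCoefficients {
    std::array<double, 4> weights; // var1..var4
    double offset;
};

// Values copied from the training log (Category_Fisher_1: abs(eta) <= 1.3).
const FisherCoefficients kCategory1{{{+0.096, +0.135, +0.237, +0.382}}, +0.626};
// Values copied from the training log (Category_Fisher_2: abs(eta) > 1.3).
const FisherCoefficients kCategory2{{{+0.107, +0.125, +0.249, +0.375}}, -0.733};

// Assumed form of the response: offset + sum_i weight_i * var_i.
double fisherResponse(const FisherCoefficients& c, const std::array<double, 4>& vars)
{
    double response = c.offset;
    for (std::size_t i = 0; i < vars.size(); ++i) response += c.weights[i] * vars[i];
    return response;
}

// Route the event to the sub-classifier whose category cut it satisfies.
double categorisedFisherResponse(double eta, const std::array<double, 4>& vars)
{
    // A NaN eta would fail the comparison and silently land in category 2.
    assert(!std::isnan(eta));
    const FisherCoefficients& c = (std::fabs(eta) <= 1.3) ? kCategory1 : kCategory2;
    return fisherResponse(c, vars);
}
```

The two categories have similar weights but offsets of opposite sign, which a single Fisher discriminant trained on the pooled sample cannot represent. This is in line with the higher ROC integrals reported in the log for FisherCat and LikelihoodCat than for the uncategorised methods.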
Author
Andreas Hoecker

Definition in file TMVAClassificationCategory.C.