ROOT 6.10/02
Reference Guide
TMVAClassification.C File Reference

Detailed Description

This macro provides examples for the training and testing of the TMVA classifiers.

The input data is a toy Monte Carlo sample consisting of four Gaussian-distributed and linearly correlated input variables. The methods to be used can be switched on and off by means of booleans, or via the prompt command, for example:

root -l ./TMVAClassification.C\(\"Fisher,Likelihood\"\)

(note that the backslashes are mandatory). If no method is given, a default set of classifiers is used. The output file "TMVA.root" can be analysed with dedicated macros (simply say: root -l <macro.C>), which can be conveniently invoked through a GUI that appears at the end of the run of this macro. Launch the GUI via the command:

root -l ./TMVAGui.C

You can also compile and run the example with the following commands:

make
./TMVAClassification <Methods>

where <Methods> = "method1 method2" is a space-separated list of TMVA classifier names, for example:

./TMVAClassification Fisher LikelihoodPCA BDT

If no method is given, a default set of classifiers is used.

Processing /builddir/build/BUILD/root-6.10.02/tutorials/tmva/TMVAClassification.C...
==> Start TMVAClassification
--- TMVAClassification : Using input file: ./files/tmva_class_example.root
DataSetInfo : [dataset] : Added class "Signal"
: Add Tree TreeS of type Signal with 6000 events
DataSetInfo : [dataset] : Added class "Background"
: Add Tree TreeB of type Background with 6000 events
Factory : Booking method: Cuts
:
: Use optimization method: "Monte Carlo"
: Use efficiency computation method: "Event Selection"
: Use "FSmart" cuts for variable: 'myvar1'
: Use "FSmart" cuts for variable: 'myvar2'
: Use "FSmart" cuts for variable: 'var3'
: Use "FSmart" cuts for variable: 'var4'
Factory : Booking method: CutsD
:
CutsD : [dataset] : Create Transformation "Decorrelate" with events from all classes.
:
: Transformation, Variable selection :
: Input : variable 'myvar1' <---> Output : variable 'myvar1'
: Input : variable 'myvar2' <---> Output : variable 'myvar2'
: Input : variable 'var3' <---> Output : variable 'var3'
: Input : variable 'var4' <---> Output : variable 'var4'
: Use optimization method: "Monte Carlo"
: Use efficiency computation method: "Event Selection"
: Use "FSmart" cuts for variable: 'myvar1'
: Use "FSmart" cuts for variable: 'myvar2'
: Use "FSmart" cuts for variable: 'var3'
: Use "FSmart" cuts for variable: 'var4'
Factory : Booking method: Likelihood
:
Factory : Booking method: LikelihoodPCA
:
LikelihoodPCA : [dataset] : Create Transformation "PCA" with events from all classes.
:
: Transformation, Variable selection :
: Input : variable 'myvar1' <---> Output : variable 'myvar1'
: Input : variable 'myvar2' <---> Output : variable 'myvar2'
: Input : variable 'var3' <---> Output : variable 'var3'
: Input : variable 'var4' <---> Output : variable 'var4'
Factory : Booking method: PDERS
:
Factory : Booking method: PDEFoam
:
Factory : Booking method: KNN
:
Factory : Booking method: LD
:
DataSetFactory : [dataset] : Number of events in input trees
:
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 1000
: Signal -- testing events : 5000
: Signal -- training and testing events: 6000
: Background -- training events : 1000
: Background -- testing events : 5000
: Background -- training and testing events: 6000
:
DataSetInfo : Correlation matrix (Signal):
: ----------------------------------------------
:             var1+var2  var1-var2       var3       var4
:  var1+var2:    +1.000     -0.039     +0.778     +0.931
:  var1-var2:    -0.039     +1.000     -0.111     +0.033
:       var3:    +0.778     -0.111     +1.000     +0.860
:       var4:    +0.931     +0.033     +0.860     +1.000
: ----------------------------------------------
DataSetInfo : Correlation matrix (Background):
: ----------------------------------------------
:             var1+var2  var1-var2       var3       var4
:  var1+var2:    +1.000     +0.032     +0.784     +0.931
:  var1-var2:    +0.032     +1.000     -0.014     +0.111
:       var3:    +0.784     -0.014     +1.000     +0.863
:       var4:    +0.931     +0.111     +0.863     +1.000
: ----------------------------------------------
DataSetFactory : [dataset] :
:
Factory : Booking method: FDA_GA
:
: Create parameter interval for parameter 0 : [-1,1]
: Create parameter interval for parameter 1 : [-10,10]
: Create parameter interval for parameter 2 : [-10,10]
: Create parameter interval for parameter 3 : [-10,10]
: Create parameter interval for parameter 4 : [-10,10]
: User-defined formula string : "(0)+(1)*x0+(2)*x1+(3)*x2+(4)*x3"
: TFormula-compatible formula string: "[0]+[1]*[5]+[2]*[6]+[3]*[7]+[4]*[8]"
Factory : Booking method: MLPBNN
:
MLPBNN : [dataset] : Create Transformation "N" with events from all classes.
:
: Transformation, Variable selection :
: Input : variable 'myvar1' <---> Output : variable 'myvar1'
: Input : variable 'myvar2' <---> Output : variable 'myvar2'
: Input : variable 'var3' <---> Output : variable 'var3'
: Input : variable 'var4' <---> Output : variable 'var4'
MLPBNN : Building Network.
: Initializing weights
Factory : Booking method: SVM
:
SVM : [dataset] : Create Transformation "Norm" with events from all classes.
:
: Transformation, Variable selection :
: Input : variable 'myvar1' <---> Output : variable 'myvar1'
: Input : variable 'myvar2' <---> Output : variable 'myvar2'
: Input : variable 'var3' <---> Output : variable 'var3'
: Input : variable 'var4' <---> Output : variable 'var4'
Factory : Booking method: BDT
:
Factory : Booking method: RuleFit
:
Factory : Train all methods
Factory : [dataset] : Create Transformation "I" with events from all classes.
:
: Transformation, Variable selection :
: Input : variable 'myvar1' <---> Output : variable 'myvar1'
: Input : variable 'myvar2' <---> Output : variable 'myvar2'
: Input : variable 'var3' <---> Output : variable 'var3'
: Input : variable 'var4' <---> Output : variable 'var4'
Factory : [dataset] : Create Transformation "D" with events from all classes.
:
: Transformation, Variable selection :
: Input : variable 'myvar1' <---> Output : variable 'myvar1'
: Input : variable 'myvar2' <---> Output : variable 'myvar2'
: Input : variable 'var3' <---> Output : variable 'var3'
: Input : variable 'var4' <---> Output : variable 'var4'
Factory : [dataset] : Create Transformation "P" with events from all classes.
:
: Transformation, Variable selection :
: Input : variable 'myvar1' <---> Output : variable 'myvar1'
: Input : variable 'myvar2' <---> Output : variable 'myvar2'
: Input : variable 'var3' <---> Output : variable 'var3'
: Input : variable 'var4' <---> Output : variable 'var4'
Factory : [dataset] : Create Transformation "G" with events from all classes.
:
: Transformation, Variable selection :
: Input : variable 'myvar1' <---> Output : variable 'myvar1'
: Input : variable 'myvar2' <---> Output : variable 'myvar2'
: Input : variable 'var3' <---> Output : variable 'var3'
: Input : variable 'var4' <---> Output : variable 'var4'
Factory : [dataset] : Create Transformation "D" with events from all classes.
:
: Transformation, Variable selection :
: Input : variable 'myvar1' <---> Output : variable 'myvar1'
: Input : variable 'myvar2' <---> Output : variable 'myvar2'
: Input : variable 'var3' <---> Output : variable 'var3'
: Input : variable 'var4' <---> Output : variable 'var4'
TFHandler_Factory : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.015467 1.7574 [ -8.1442 7.2697 ]
: myvar2: -0.032440 1.1075 [ -3.9664 4.0259 ]
: var3: 0.0077507 1.0833 [ -5.0373 4.2785 ]
: var4: 0.15186 1.2738 [ -5.9505 4.6404 ]
: -----------------------------------------------------------
: Preparing the Decorrelation transformation...
TFHandler_Factory : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: -0.10647 1.0000 [ -4.3326 4.1397 ]
: myvar2: -0.048178 1.0000 [ -3.6769 3.6584 ]
: var3: -0.11237 1.0000 [ -3.8235 3.8765 ]
: var4: 0.31310 1.0000 [ -4.0754 3.2826 ]
: -----------------------------------------------------------
: Preparing the Principle Component (PCA) transformation...
TFHandler_Factory : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: -0.23314 2.3249 [ -11.198 8.9858 ]
: myvar2: 0.019827 1.1139 [ -4.0542 3.9417 ]
: var3: -0.0063033 0.59141 [ -2.0115 1.9675 ]
: var4: 0.0014833 0.33924 [ -0.99775 1.0218 ]
: -----------------------------------------------------------
: Preparing the Gaussian transformation...
: Preparing the Decorrelation transformation...
TFHandler_Factory : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: -0.036139 1.0000 [ -3.5532 7.9087 ]
: myvar2: 0.018981 1.0000 [ -3.2726 5.5734 ]
: var3: 0.053788 1.0000 [ -3.3619 7.7733 ]
: var4: -0.029204 1.0000 [ -3.4334 5.2857 ]
: -----------------------------------------------------------
: Ranking input variables (method unspecific)...
IdTransformation : Ranking result (top variable is best ranked)
: -------------------------------------
: Rank : Variable : Separation
: -------------------------------------
: 1 : Variable 4 : 2.790e-01
: 2 : Variable 3 : 1.458e-01
: 3 : myvar1 : 8.452e-02
: 4 : Expression 2 : 2.852e-02
: -------------------------------------
Factory : Train method: Cuts for Classification
:
FitterBase : <MCFitter> Sampling, please be patient ...
: Elapsed time: 11.7 sec
: ------------------------------------------
Cuts : Cut values for requested signal efficiency: 0.1
: Corresponding background efficiency : 0.00632442
: Transformation applied to input variables : None
: ------------------------------------------
: Cut[ 0]: -5.92112 < myvar1 <= 1e+30
: Cut[ 1]: -1e+30 < myvar2 <= 0.372788
: Cut[ 2]: -1.03103 < var3 <= 1e+30
: Cut[ 3]: 1.92872 < var4 <= 1e+30
: ------------------------------------------
: ------------------------------------------
Cuts : Cut values for requested signal efficiency: 0.2
: Corresponding background efficiency : 0.0195583
: Transformation applied to input variables : None
: ------------------------------------------
: Cut[ 0]: -2.39694 < myvar1 <= 1e+30
: Cut[ 1]: -1e+30 < myvar2 <= 0.10647
: Cut[ 2]: -3.38371 < var3 <= 1e+30
: Cut[ 3]: 1.25336 < var4 <= 1e+30
: ------------------------------------------
: ------------------------------------------
Cuts : Cut values for requested signal efficiency: 0.3
: Corresponding background efficiency : 0.036122
: Transformation applied to input variables : None
: ------------------------------------------
: Cut[ 0]: -7.99661 < myvar1 <= 1e+30
: Cut[ 1]: -1e+30 < myvar2 <= 0.516878
: Cut[ 2]: -2.4921 < var3 <= 1e+30
: Cut[ 3]: 1.01508 < var4 <= 1e+30
: ------------------------------------------
: ------------------------------------------
Cuts : Cut values for requested signal efficiency: 0.4
: Corresponding background efficiency : 0.0682383
: Transformation applied to input variables : None
: ------------------------------------------
: Cut[ 0]: -7.92486 < myvar1 <= 1e+30
: Cut[ 1]: -1e+30 < myvar2 <= 2.14952
: Cut[ 2]: -2.18407 < var3 <= 1e+30
: Cut[ 3]: 1.06279 < var4 <= 1e+30
: ------------------------------------------
: ------------------------------------------
Cuts : Cut values for requested signal efficiency: 0.5
: Corresponding background efficiency : 0.114033
: Transformation applied to input variables : None
: ------------------------------------------
: Cut[ 0]: -5.79159 < myvar1 <= 1e+30
: Cut[ 1]: -1e+30 < myvar2 <= 2.13777
: Cut[ 2]: -3.31357 < var3 <= 1e+30
: Cut[ 3]: 0.773468 < var4 <= 1e+30
: ------------------------------------------
: ------------------------------------------
Cuts : Cut values for requested signal efficiency: 0.6
: Corresponding background efficiency : 0.175228
: Transformation applied to input variables : None
: ------------------------------------------
: Cut[ 0]: -4.44635 < myvar1 <= 1e+30
: Cut[ 1]: -1e+30 < myvar2 <= 1.39479
: Cut[ 2]: -1.08281 < var3 <= 1e+30
: Cut[ 3]: 0.343334 < var4 <= 1e+30
: ------------------------------------------
: ------------------------------------------
Cuts : Cut values for requested signal efficiency: 0.7
: Corresponding background efficiency : 0.250904
: Transformation applied to input variables : None
: ------------------------------------------
: Cut[ 0]: -4.12819 < myvar1 <= 1e+30
: Cut[ 1]: -1e+30 < myvar2 <= 3.1382
: Cut[ 2]: -2.7095 < var3 <= 1e+30
: Cut[ 3]: 0.2305 < var4 <= 1e+30
: ------------------------------------------
: ------------------------------------------
Cuts : Cut values for requested signal efficiency: 0.8
: Corresponding background efficiency : 0.381613
: Transformation applied to input variables : None
: ------------------------------------------
: Cut[ 0]: -3.65399 < myvar1 <= 1e+30
: Cut[ 1]: -1e+30 < myvar2 <= 4.05488
: Cut[ 2]: -2.42379 < var3 <= 1e+30
: Cut[ 3]: -0.08696 < var4 <= 1e+30
: ------------------------------------------
: ------------------------------------------
Cuts : Cut values for requested signal efficiency: 0.9
: Corresponding background efficiency : 0.501033
: Transformation applied to input variables : None
: ------------------------------------------
: Cut[ 0]: -7.95481 < myvar1 <= 1e+30
: Cut[ 1]: -1e+30 < myvar2 <= 2.87258
: Cut[ 2]: -3.51779 < var3 <= 1e+30
: Cut[ 3]: -0.564128 < var4 <= 1e+30
: ------------------------------------------
: Elapsed time for training with 2000 events: 11.7 sec
Cuts : [dataset] : Evaluation of Cuts on training sample (2000 events)
: Elapsed time for evaluation of 2000 events: 0.000965 sec
: Creating xml weight file: dataset/weights/TMVAClassification_Cuts.weights.xml
: Creating standalone class: dataset/weights/TMVAClassification_Cuts.class.C
: TMVA.root:/dataset/Method_Cuts/Cuts
Factory : Training finished
:
Factory : Train method: CutsD for Classification
:
: Preparing the Decorrelation transformation...
TFHandler_CutsD : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: -0.10647 1.0000 [ -4.3326 4.1397 ]
: myvar2: -0.048178 1.0000 [ -3.6769 3.6584 ]
: var3: -0.11237 1.0000 [ -3.8235 3.8765 ]
: var4: 0.31310 1.0000 [ -4.0754 3.2826 ]
: -----------------------------------------------------------
TFHandler_CutsD : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: -0.10647 1.0000 [ -4.3326 4.1397 ]
: myvar2: -0.048178 1.0000 [ -3.6769 3.6584 ]
: var3: -0.11237 1.0000 [ -3.8235 3.8765 ]
: var4: 0.31310 1.0000 [ -4.0754 3.2826 ]
: -----------------------------------------------------------
FitterBase : <MCFitter> Sampling, please be patient ...
: Elapsed time: 8.52 sec
: ------------------------------------------------------------------------------------------------------------------------
CutsD : Cut values for requested signal efficiency: 0.1
: Corresponding background efficiency : 0
: Transformation applied to input variables : "Deco"
: ------------------------------------------------------------------------------------------------------------------------
: Cut[ 0]: -1e+30 < + 1.1153*[myvar1] + 0.035656*[myvar2] - 0.19996*[var3] - 0.79684*[var4] <= -0.776575
: Cut[ 1]: -1e+30 < + 0.035656*[myvar1] + 0.91607*[myvar2] + 0.11642*[var3] - 0.13113*[var4] <= 2.87483
: Cut[ 2]: -0.420335 < - 0.19996*[myvar1] + 0.11642*[myvar2] + 1.7869*[var3] - 0.78589*[var4] <= 1e+30
: Cut[ 3]: 0.851491 < - 0.79684*[myvar1] - 0.13113*[myvar2] - 0.78589*[var3] + 2.155*[var4] <= 1e+30
: ------------------------------------------------------------------------------------------------------------------------
: ------------------------------------------------------------------------------------------------------------------------
CutsD : Cut values for requested signal efficiency: 0.2
: Corresponding background efficiency : 0.00122458
: Transformation applied to input variables : "Deco"
: ------------------------------------------------------------------------------------------------------------------------
: Cut[ 0]: -1e+30 < + 1.1153*[myvar1] + 0.035656*[myvar2] - 0.19996*[var3] - 0.79684*[var4] <= 0.396766
: Cut[ 1]: -1e+30 < + 0.035656*[myvar1] + 0.91607*[myvar2] + 0.11642*[var3] - 0.13113*[var4] <= 0.275176
: Cut[ 2]: -2.02873 < - 0.19996*[myvar1] + 0.11642*[myvar2] + 1.7869*[var3] - 0.78589*[var4] <= 1e+30
: Cut[ 3]: 1.10575 < - 0.79684*[myvar1] - 0.13113*[myvar2] - 0.78589*[var3] + 2.155*[var4] <= 1e+30
: ------------------------------------------------------------------------------------------------------------------------
: ------------------------------------------------------------------------------------------------------------------------
CutsD : Cut values for requested signal efficiency: 0.3
: Corresponding background efficiency : 0.00423588
: Transformation applied to input variables : "Deco"
: ------------------------------------------------------------------------------------------------------------------------
: Cut[ 0]: -1e+30 < + 1.1153*[myvar1] + 0.035656*[myvar2] - 0.19996*[var3] - 0.79684*[var4] <= 3.08016
: Cut[ 1]: -1e+30 < + 0.035656*[myvar1] + 0.91607*[myvar2] + 0.11642*[var3] - 0.13113*[var4] <= 0.714855
: Cut[ 2]: -1.89111 < - 0.19996*[myvar1] + 0.11642*[myvar2] + 1.7869*[var3] - 0.78589*[var4] <= 1e+30
: Cut[ 3]: 1.21039 < - 0.79684*[myvar1] - 0.13113*[myvar2] - 0.78589*[var3] + 2.155*[var4] <= 1e+30
: ------------------------------------------------------------------------------------------------------------------------
: ------------------------------------------------------------------------------------------------------------------------
CutsD : Cut values for requested signal efficiency: 0.4
: Corresponding background efficiency : 0.0104513
: Transformation applied to input variables : "Deco"
: ------------------------------------------------------------------------------------------------------------------------
: Cut[ 0]: -1e+30 < + 1.1153*[myvar1] + 0.035656*[myvar2] - 0.19996*[var3] - 0.79684*[var4] <= 1.90636
: Cut[ 1]: -1e+30 < + 0.035656*[myvar1] + 0.91607*[myvar2] + 0.11642*[var3] - 0.13113*[var4] <= 1.54225
: Cut[ 2]: -2.16041 < - 0.19996*[myvar1] + 0.11642*[myvar2] + 1.7869*[var3] - 0.78589*[var4] <= 1e+30
: Cut[ 3]: 1.14096 < - 0.79684*[myvar1] - 0.13113*[myvar2] - 0.78589*[var3] + 2.155*[var4] <= 1e+30
: ------------------------------------------------------------------------------------------------------------------------
: ------------------------------------------------------------------------------------------------------------------------
CutsD : Cut values for requested signal efficiency: 0.5
: Corresponding background efficiency : 0.0191782
: Transformation applied to input variables : "Deco"
: ------------------------------------------------------------------------------------------------------------------------
: Cut[ 0]: -1e+30 < + 1.1153*[myvar1] + 0.035656*[myvar2] - 0.19996*[var3] - 0.79684*[var4] <= 3.43619
: Cut[ 1]: -1e+30 < + 0.035656*[myvar1] + 0.91607*[myvar2] + 0.11642*[var3] - 0.13113*[var4] <= 1.91064
: Cut[ 2]: -1.7203 < - 0.19996*[myvar1] + 0.11642*[myvar2] + 1.7869*[var3] - 0.78589*[var4] <= 1e+30
: Cut[ 3]: 0.98712 < - 0.79684*[myvar1] - 0.13113*[myvar2] - 0.78589*[var3] + 2.155*[var4] <= 1e+30
: ------------------------------------------------------------------------------------------------------------------------
: ------------------------------------------------------------------------------------------------------------------------
CutsD : Cut values for requested signal efficiency: 0.6
: Corresponding background efficiency : 0.0359533
: Transformation applied to input variables : "Deco"
: ------------------------------------------------------------------------------------------------------------------------
: Cut[ 0]: -1e+30 < + 1.1153*[myvar1] + 0.035656*[myvar2] - 0.19996*[var3] - 0.79684*[var4] <= 2.38751
: Cut[ 1]: -1e+30 < + 0.035656*[myvar1] + 0.91607*[myvar2] + 0.11642*[var3] - 0.13113*[var4] <= 3.08498
: Cut[ 2]: -2.28981 < - 0.19996*[myvar1] + 0.11642*[myvar2] + 1.7869*[var3] - 0.78589*[var4] <= 1e+30
: Cut[ 3]: 0.869049 < - 0.79684*[myvar1] - 0.13113*[myvar2] - 0.78589*[var3] + 2.155*[var4] <= 1e+30
: ------------------------------------------------------------------------------------------------------------------------
: ------------------------------------------------------------------------------------------------------------------------
CutsD : Cut values for requested signal efficiency: 0.7
: Corresponding background efficiency : 0.0668718
: Transformation applied to input variables : "Deco"
: ------------------------------------------------------------------------------------------------------------------------
: Cut[ 0]: -1e+30 < + 1.1153*[myvar1] + 0.035656*[myvar2] - 0.19996*[var3] - 0.79684*[var4] <= 3.6647
: Cut[ 1]: -1e+30 < + 0.035656*[myvar1] + 0.91607*[myvar2] + 0.11642*[var3] - 0.13113*[var4] <= 2.57117
: Cut[ 2]: -2.05772 < - 0.19996*[myvar1] + 0.11642*[myvar2] + 1.7869*[var3] - 0.78589*[var4] <= 1e+30
: Cut[ 3]: 0.60765 < - 0.79684*[myvar1] - 0.13113*[myvar2] - 0.78589*[var3] + 2.155*[var4] <= 1e+30
: ------------------------------------------------------------------------------------------------------------------------
: ------------------------------------------------------------------------------------------------------------------------
CutsD : Cut values for requested signal efficiency: 0.8
: Corresponding background efficiency : 0.110299
: Transformation applied to input variables : "Deco"
: ------------------------------------------------------------------------------------------------------------------------
: Cut[ 0]: -1e+30 < + 1.1153*[myvar1] + 0.035656*[myvar2] - 0.19996*[var3] - 0.79684*[var4] <= 2.55808
: Cut[ 1]: -1e+30 < + 0.035656*[myvar1] + 0.91607*[myvar2] + 0.11642*[var3] - 0.13113*[var4] <= 2.76353
: Cut[ 2]: -2.14693 < - 0.19996*[myvar1] + 0.11642*[myvar2] + 1.7869*[var3] - 0.78589*[var4] <= 1e+30
: Cut[ 3]: 0.420759 < - 0.79684*[myvar1] - 0.13113*[myvar2] - 0.78589*[var3] + 2.155*[var4] <= 1e+30
: ------------------------------------------------------------------------------------------------------------------------
: ------------------------------------------------------------------------------------------------------------------------
CutsD : Cut values for requested signal efficiency: 0.9
: Corresponding background efficiency : 0.255091
: Transformation applied to input variables : "Deco"
: ------------------------------------------------------------------------------------------------------------------------
: Cut[ 0]: -1e+30 < + 1.1153*[myvar1] + 0.035656*[myvar2] - 0.19996*[var3] - 0.79684*[var4] <= 3.23653
: Cut[ 1]: -1e+30 < + 0.035656*[myvar1] + 0.91607*[myvar2] + 0.11642*[var3] - 0.13113*[var4] <= 3.06894
: Cut[ 2]: -3.65126 < - 0.19996*[myvar1] + 0.11642*[myvar2] + 1.7869*[var3] - 0.78589*[var4] <= 1e+30
: Cut[ 3]: 0.0991743 < - 0.79684*[myvar1] - 0.13113*[myvar2] - 0.78589*[var3] + 2.155*[var4] <= 1e+30
: ------------------------------------------------------------------------------------------------------------------------
: Elapsed time for training with 2000 events: 8.53 sec
CutsD : [dataset] : Evaluation of CutsD on training sample (2000 events)
: Elapsed time for evaluation of 2000 events: 0.00528 sec
: Creating xml weight file: dataset/weights/TMVAClassification_CutsD.weights.xml
: Creating standalone class: dataset/weights/TMVAClassification_CutsD.class.C
: TMVA.root:/dataset/Method_CutsD/CutsD
Factory : Training finished
:
Factory : Train method: Likelihood for Classification
:
:
: ================================================================
: H e l p f o r M V A m e t h o d [ Likelihood ] :
:
: --- Short description:
:
: The maximum-likelihood classifier models the data with probability
: density functions (PDF) reproducing the signal and background
: distributions of the input variables. Correlations among the
: variables are ignored.
:
: --- Performance optimisation:
:
: Required for good performance are decorrelated input variables
: (PCA transformation via the option "VarTransform=Decorrelate"
: may be tried). Irreducible non-linear correlations may be reduced
: by precombining strongly correlated input variables, or by simply
: removing one of the variables.
:
: --- Performance tuning via configuration options:
:
: High fidelity PDF estimates are mandatory, i.e., sufficient training
: statistics is required to populate the tails of the distributions
: It would be a surprise if the default Spline or KDE kernel parameters
: provide a satisfying fit to the data. The user is advised to properly
: tune the events per bin and smooth options in the spline cases
: individually per variable. If the KDE kernel is used, the adaptive
: Gaussian kernel may lead to artefacts, so please always also try
: the non-adaptive one.
:
: All tuning parameters must be adjusted individually for each input
: variable!
:
: <Suppress this message by specifying "!H" in the booking option>
: ================================================================
:
: Filling reference histograms
: Building PDF out of reference histograms
: Elapsed time for training with 2000 events: 0.0362 sec
Likelihood : [dataset] : Evaluation of Likelihood on training sample (2000 events)
: Elapsed time for evaluation of 2000 events: 0.00936 sec
: Creating xml weight file: dataset/weights/TMVAClassification_Likelihood.weights.xml
: Creating standalone class: dataset/weights/TMVAClassification_Likelihood.class.C
: TMVA.root:/dataset/Method_Likelihood/Likelihood
Factory : Training finished
:
Factory : Train method: LikelihoodPCA for Classification
:
: Preparing the Principle Component (PCA) transformation...
TFHandler_LikelihoodPCA : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: -0.23314 2.3249 [ -11.198 8.9858 ]
: myvar2: 0.019827 1.1139 [ -4.0542 3.9417 ]
: var3: -0.0063033 0.59141 [ -2.0115 1.9675 ]
: var4: 0.0014833 0.33924 [ -0.99775 1.0218 ]
: -----------------------------------------------------------
: Filling reference histograms
: Building PDF out of reference histograms
: Elapsed time for training with 2000 events: 0.0484 sec
LikelihoodPCA : [dataset] : Evaluation of LikelihoodPCA on training sample (2000 events)
: Elapsed time for evaluation of 2000 events: 0.0166 sec
: Creating xml weight file: dataset/weights/TMVAClassification_LikelihoodPCA.weights.xml
: Creating standalone class: dataset/weights/TMVAClassification_LikelihoodPCA.class.C
: TMVA.root:/dataset/Method_LikelihoodPCA/LikelihoodPCA
Factory : Training finished
:
Factory : Train method: PDERS for Classification
:
: Elapsed time for training with 2000 events: 0.0126 sec
PDERS : [dataset] : Evaluation of PDERS on training sample (2000 events)
: Elapsed time for evaluation of 2000 events: 1.09 sec
: Creating xml weight file: dataset/weights/TMVAClassification_PDERS.weights.xml
: Creating standalone class: dataset/weights/TMVAClassification_PDERS.class.C
Factory : Training finished
:
Factory : Train method: PDEFoam for Classification
:
PDEFoam : NormMode=NUMEVENTS chosen. Note that only NormMode=EqualNumEvents ensures that Discriminant values correspond to signal probabilities.
: Build up discriminator foam
: Elapsed time: 1.08 sec
: Elapsed time for training with 2000 events: 1.2 sec
PDEFoam : [dataset] : Evaluation of PDEFoam on training sample (2000 events)
: Elapsed time for evaluation of 2000 events: 0.0931 sec
: Creating xml weight file: dataset/weights/TMVAClassification_PDEFoam.weights.xml
: writing foam DiscrFoam to file
: Foams written to file: dataset/weights/TMVAClassification_PDEFoam.weights_foams.root
: Creating standalone class: dataset/weights/TMVAClassification_PDEFoam.class.C
Factory : Training finished
:
Factory : Train method: KNN for Classification
:
:
: ================================================================
: H e l p f o r M V A m e t h o d [ KNN ] :
:
: --- Short description:
:
: The k-nearest neighbor (k-NN) algorithm is a multi-dimensional classification
: and regression algorithm. Similarly to other TMVA algorithms, k-NN uses a set of
: training events for which a classification category/regression target is known.
: The k-NN method compares a test event to all training events using a distance
: function, which is an Euclidean distance in a space defined by the input variables.
: The k-NN method, as implemented in TMVA, uses a kd-tree algorithm to perform a
: quick search for the k events with shortest distance to the test event. The method
: returns a fraction of signal events among the k neighbors. It is recommended
: that a histogram which stores the k-NN decision variable is binned with k+1 bins
: between 0 and 1.
:
: --- Performance tuning via configuration options: 
:
: The k-NN method estimates a density of signal and background events in a
: neighborhood around the test event. The method assumes that the density of the
: signal and background events is uniform and constant within the neighborhood.
: k is an adjustable parameter and it determines an average size of the
: neighborhood. Small k values (less than 10) are sensitive to statistical
: fluctuations and large (greater than 100) values might not sufficiently capture
: local differences between events in the training set. The speed of the k-NN
: method also increases with larger values of k.
:
: The k-NN method assigns equal weight to all input variables. Different scales
: among the input variables is compensated using ScaleFrac parameter: the input
: variables are scaled so that the widths for central ScaleFrac*100% events are
: equal among all the input variables.
:
: --- Additional configuration options: 
:
: The method includes an option to use a Gaussian kernel to smooth out the k-NN
: response. The kernel re-weights events using a distance to the test event.
:
: <Suppress this message by specifying "!H" in the booking option>
: ================================================================
:
KNN : <Train> start...
: Reading 2000 events
: Number of signal events 1000
: Number of background events 1000
: Creating kd-tree with 2000 events
: Computing scale factor for 1d distributions: (ifrac, bottom, top) = (80%, 10%, 90%)
ModulekNN : Optimizing tree for 4 variables with 2000 values
: <Fill> Class 1 has 1000 events
: <Fill> Class 2 has 1000 events
: Elapsed time for training with 2000 events: 0.0104 sec
KNN : [dataset] : Evaluation of KNN on training sample (2000 events)
: Elapsed time for evaluation of 2000 events: 0.139 sec
: Creating xml weight file: dataset/weights/TMVAClassification_KNN.weights.xml
: Creating standalone class: dataset/weights/TMVAClassification_KNN.class.C
Factory : Training finished
:
Factory : Train method: LD for Classification
:
:
: ================================================================
: H e l p f o r M V A m e t h o d [ LD ] :
:
: --- Short description:
:
: Linear discriminants select events by distinguishing the mean
: values of the signal and background distributions in a trans-
: formed variable space where linear correlations are removed.
: The LD implementation here is equivalent to the "Fisher" discriminant
: for classification, but also provides linear regression.
:
: (More precisely: the "linear discriminator" determines
: an axis in the (correlated) hyperspace of the input
: variables such that, when projecting the output classes
: (signal and background) upon this axis, they are pushed
: as far as possible away from each other, while events
: of a same class are confined in a close vicinity. The
: linearity property of this classifier is reflected in the
: metric with which "far apart" and "close vicinity" are
: determined: the covariance matrix of the discriminating
: variable space.)
:
: --- Performance optimisation:
:
: Optimal performance for the linear discriminant is obtained for
: linearly correlated Gaussian-distributed variables. Any deviation
: from this ideal reduces the achievable separation power. In
: particular, no discrimination at all is achieved for a variable
: that has the same sample mean for signal and background, even if
: the shapes of the distributions are very different. Thus, the linear
: discriminant often benefits from a suitable transformation of the
: input variables. For example, if a variable x in [-1,1] has a
: parabolic signal distribution and a uniform background
: distribution, the mean value is zero in both cases, leading
: to no separation. The simple transformation x -> |x| renders this
: variable powerful for use in a linear discriminant.
:
: --- Performance tuning via configuration options:
:
: <None>
:
: <Suppress this message by specifying "!H" in the booking option>
: ================================================================
:
LD : Results for LD coefficients:
: -----------------------
: Variable: Coefficient:
: -----------------------
: myvar1: -0.326
: myvar2: -0.080
: var3: -0.195
: var4: +0.758
: (offset): -0.056
: -----------------------
: Elapsed time for training with 2000 events: 0.00269 sec
LD : [dataset] : Evaluation of LD on training sample (2000 events)
: Elapsed time for evaluation of 2000 events: 0.00149 sec
: <CreateMVAPdfs> Separation from histogram (PDF): 0.578 (0.000)
: Dataset[dataset] : Evaluation of LD on training sample
: Creating xml weight file: dataset/weights/TMVAClassification_LD.weights.xml
: Creating standalone class: dataset/weights/TMVAClassification_LD.class.C
Factory : Training finished
:
Factory : Train method: FDA_GA for Classification
:
:
: ================================================================
: H e l p f o r M V A m e t h o d [ FDA_GA ] :
:
: --- Short description:
:
: The function discriminant analysis (FDA) is a classifier suitable
: to solve linear or simple nonlinear discrimination problems.
:
: The user provides the desired function with adjustable parameters
: via the configuration option string, and FDA fits the parameters to
: it, requiring the signal (background) function value to be as close
: as possible to 1 (0). Its advantage over the more involved and
: automatic nonlinear discriminators is the simplicity and transparency
: of the discrimination expression. A shortcoming is that FDA will
: underperform for involved problems with complicated, phase space
: dependent nonlinear correlations.
:
: Please consult the Users Guide for the format of the formula string
: and the allowed parameter ranges:
: http://tmva.sourceforge.net/docu/TMVAUsersGuide.pdf
:
: --- Performance optimisation:
:
: The FDA performance depends on the complexity and fidelity of the
: user-defined discriminator function. As a general rule, it should
: be able to reproduce the discrimination power of any linear
: discriminant analysis. To reach into the nonlinear domain, it is
: useful to inspect the correlation profiles of the input variables,
: and add quadratic and higher polynomial terms between variables as
: necessary. Comparison with more involved nonlinear classifiers can
: be used as a guide.
:
: --- Performance tuning via configuration options:
:
: Depending on the function used, the choice of "FitMethod" is
: crucial for getting valuable solutions with FDA. As a guideline it
: is recommended to start with "FitMethod=MINUIT". When more complex
: functions are used where MINUIT does not converge to reasonable
: results, the user should switch to non-gradient FitMethods such
: as GeneticAlgorithm (GA) or Monte Carlo (MC). It might prove to be
: useful to combine GA (or MC) with MINUIT by setting the option
: "Converger=MINUIT". GA (MC) will then set the starting parameters
: for MINUIT such that the strength of GA (MC) at finding global
: minima is combined with the efficiency of MINUIT at finding local
: minima.
:
: <Suppress this message by specifying "!H" in the booking option>
: ================================================================
:
FitterBase : <GeneticFitter> Optimisation, please be patient ... (inaccurate progress timing for GA)
: Elapsed time: 2.14 sec
FDA_GA : Results for parameter fit using "GA" fitter:
: -----------------------
: Parameter: Fit result:
: -----------------------
: Par(0): 0.371412
: Par(1): 0
: Par(2): 0
: Par(3): 0
: Par(4): 0.169203
: -----------------------
: Discriminator expression: "(0)+(1)*x0+(2)*x1+(3)*x2+(4)*x3"
: Value of estimator at minimum: 0.394096
: Elapsed time for training with 2000 events: 2.24 sec
FDA_GA : [dataset] : Evaluation of FDA_GA on training sample (2000 events)
: Elapsed time for evaluation of 2000 events: 0.00148 sec
: Creating xml weight file: dataset/weights/TMVAClassification_FDA_GA.weights.xml
: Creating standalone class: dataset/weights/TMVAClassification_FDA_GA.class.C
Factory : Training finished
:
Factory : Train method: MLPBNN for Classification
:
:
: ================================================================
: H e l p f o r M V A m e t h o d [ MLPBNN ] :
:
: --- Short description:
:
: The MLP artificial neural network (ANN) is a traditional feed-
: forward multilayer perceptron implementation. The MLP has a user-
: defined hidden layer architecture, while the number of input (output)
: nodes is determined by the input variables (output classes, i.e.,
: signal and one background).
:
: --- Performance optimisation:
:
: Neural networks are stable and performing for a large variety of
: linear and non-linear classification problems. However, in contrast
: to (e.g.) boosted decision trees, the user is advised to reduce the
: number of input variables that have only little discrimination power.
:
: In the tests we have carried out so far, the MLP and ROOT networks
: (TMlpANN, interfaced via TMVA) performed equally well, with however
: a clear speed advantage for the MLP. The Clermont-Ferrand neural
: net (CFMlpANN) exhibited worse classification performance in these
: tests, which is partly due to the slow convergence of its training
: (at least 10k training cycles are required to achieve approximately
: competitive results).
:
: Overtraining: only the TMlpANN performs an explicit separation of the
: full training sample into independent training and validation samples.
: We have found that in most high-energy physics applications the
: available degrees of freedom (training events) are sufficient to
: constrain the weights of the relatively simple architectures required
: to achieve good performance. Hence no overtraining should occur, and
: the use of validation samples would only reduce the available training
: information. However, if the performance on the training sample is
: found to be significantly better than the one found with the inde-
: pendent test sample, caution is needed. The results for these samples
: are printed to standard output at the end of each training job.
:
: --- Performance tuning via configuration options:
:
: The hidden layer architecture for all ANNs is defined by the option
: "HiddenLayers=N+1,N,...", where the first hidden layer has N+1
: neurons and the second N neurons (and so on), and where N is the number
: of input variables. Excessive numbers of hidden layers should be avoided,
: in favour of more neurons in the first hidden layer.
:
: The number of cycles should be above 500. As said, if the number of
: adjustable weights is small compared to the training sample size,
: using a large number of training samples should not lead to overtraining.
:
: <Suppress this message by specifying "!H" in the booking option>
: ================================================================
:
TFHandler_MLPBNN : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.058742 0.22803 [ -1.0000 1.0000 ]
: myvar2: -0.015557 0.27714 [ -1.0000 1.0000 ]
: var3: 0.083122 0.23258 [ -1.0000 1.0000 ]
: var4: 0.15238 0.24054 [ -1.0000 1.0000 ]
: -----------------------------------------------------------
: Training Network
:
: Finalizing handling of Regulator terms, trainE=0.670373 testE=0.739958
: Done with handling of Regulator terms
: Elapsed time for training with 2000 events: 7.57 sec
MLPBNN : [dataset] : Evaluation of MLPBNN on training sample (2000 events)
: Elapsed time for evaluation of 2000 events: 0.012 sec
: Creating xml weight file: dataset/weights/TMVAClassification_MLPBNN.weights.xml
: Creating standalone class: dataset/weights/TMVAClassification_MLPBNN.class.C
: Write special histos to file: TMVA.root:/dataset/Method_MLPBNN/MLPBNN
Factory : Training finished
:
Factory : Train method: SVM for Classification
:
TFHandler_SVM : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.058742 0.22803 [ -1.0000 1.0000 ]
: myvar2: -0.015557 0.27714 [ -1.0000 1.0000 ]
: var3: 0.083122 0.23258 [ -1.0000 1.0000 ]
: var4: 0.15238 0.24054 [ -1.0000 1.0000 ]
: -----------------------------------------------------------
: Building SVM Working Set...with 2000 event instances
: Elapsed time for Working Set build: 0.371 sec
: Sorry, no computing time forecast available for SVM, please wait ...
: Elapsed time: 3.33 sec
: Elapsed time for training with 2000 events: 3.72 sec
SVM : [dataset] : Evaluation of SVM on training sample (2000 events)
: Elapsed time for evaluation of 2000 events: 0.323 sec
: Creating xml weight file: dataset/weights/TMVAClassification_SVM.weights.xml
: Creating standalone class: dataset/weights/TMVAClassification_SVM.class.C
Factory : Training finished
:
Factory : Train method: BDT for Classification
:
BDT : #events: (reweighted) sig: 1000 bkg: 1000
: #events: (unweighted) sig: 1000 bkg: 1000
: Training 850 Decision Trees ... patience please
: Elapsed time for training with 2000 events: 1.98 sec
BDT : [dataset] : Evaluation of BDT on training sample (2000 events)
: Elapsed time for evaluation of 2000 events: 0.561 sec
: Creating xml weight file: dataset/weights/TMVAClassification_BDT.weights.xml
: Creating standalone class: dataset/weights/TMVAClassification_BDT.class.C
: TMVA.root:/dataset/Method_BDT/BDT
Factory : Training finished
:
Factory : Train method: RuleFit for Classification
:
:
: ================================================================
: H e l p f o r M V A m e t h o d [ RuleFit ] :
:
: --- Short description:
:
: This method uses a collection of so-called rules to create a
: discriminating scoring function. Each rule consists of a series
: of cuts in parameter space. The ensemble of rules is created
: from a forest of decision trees, trained using the training data.
: Each node (apart from the root) corresponds to one rule.
: The scoring function is then obtained by linearly combining
: the rules. A fitting procedure is applied to find the optimum
: set of coefficients. The goal is to find a model with few rules
: but with a strong discriminating power.
:
: --- Performance optimisation:
:
: There are two important considerations to make when optimising:
:
: 1. Topology of the decision tree forest
: 2. Fitting of the coefficients
:
: The maximum complexity of the rules is defined by the size of
: the trees. Large trees will yield many complex rules and capture
: higher order correlations. On the other hand, small trees will
: lead to a smaller ensemble with simple rules, only capable of
: modeling simple structures.
: Several parameters exist for controlling the complexity of the
: rule ensemble.
:
: The fitting procedure searches for a minimum using a gradient
: directed path. Apart from step size and number of steps, the
: evolution of the path is defined by a cut-off parameter, tau.
: This parameter is unknown and depends on the training data.
: A large value will tend to give large weights to a few rules.
: Similarly, a small value will lead to a large set of rules
: with similar weights.
:
: A final point is the model used; rules and/or linear terms.
: For a given training sample, the result may improve by adding
: linear terms. If best performance is obtained using only linear
: terms, it is very likely that the Fisher discriminant would be
: a better choice. Ideally the fitting procedure should be able to
: make this choice by giving appropriate weights to either term.
:
: --- Performance tuning via configuration options:
:
: I. TUNING OF RULE ENSEMBLE:
:
: ForestType : Recommended is to use the default "AdaBoost".
: nTrees : More trees lead to more rules but also slower
: performance. With too few trees the risk is
: that the rule ensemble becomes too simple.
: fEventsMin 
: fEventsMax : With a lower min, more large trees will be generated
: leading to more complex rules.
: With a higher max, more small trees will be
: generated, leading to simpler rules.
: By changing this range, the average complexity
: of the rule ensemble can be controlled.
: RuleMinDist : By increasing the minimum distance between
: rules, fewer and more diverse rules will remain.
: Initially it is a good idea to keep this small
: or zero and let the fitting do the selection of
: rules. In order to reduce the ensemble size,
: the value can then be increased.
:
: II. TUNING OF THE FITTING:
:
: GDPathEveFrac : fraction of events in path evaluation
: Increasing this fraction will improve the path
: finding. However, too high a value will leave few
: unique events available for error estimation.
: It is recommended to use the default = 0.5.
: GDTau : cutoff parameter tau
: By default this value is set to -1.0.
: This means that the cut-off parameter is
: automatically estimated. In most cases
: this should be fine. However, you may want
: to fix this value if you already know it
: and want to reduce the training time.
: GDTauPrec : precision of estimated tau
: Increase this precision to find a more
: optimal cut-off parameter.
: GDNStep : number of steps in path search
: If the number of steps is too small, then
: the program will give a warning message.
:
: III. WARNING MESSAGES
:
: Risk(i+1)>=Risk(i) in path
: Chaotic behaviour of risk evolution.
: By construction the Risk should always decrease.
: However, if the training sample is too small or
: the model is overtrained, such warnings can
: occur.
: The warnings can safely be ignored if only a
: few (<3) occur. If more warnings are generated,
: the fitting fails.
: A remedy may be to increase the value
: GDValidEveFrac to 1.0 (or a larger value).
: In addition, if GDPathEveFrac is too high
: the same warnings may occur since the events
: used for error estimation are also used for
: path estimation.
: Another possibility is to modify the model -
: See above on tuning the rule ensemble.
:
: The error rate was still decreasing at the end of the path
: Too few steps in path! Increase GDNSteps.
:
: Reached minimum early in the search
: Minimum was found early in the fitting. This
: may indicate that the step size GDStep used
: was too large. Reduce it and rerun.
: If the results are still not OK, modify the
: model, either by modifying the rule ensemble
: or by adding/removing linear terms.
:
: <Suppress this message by specifying "!H" in the booking option>
: ================================================================
:
RuleFit : -------------------RULE ENSEMBLE SUMMARY------------------------
: Tree training method : AdaBoost
: Number of events per tree : 2000
: Number of trees : 20
: Number of generated rules : 188
: Idem, after cleanup : 65
: Average number of cuts per rule : 2.71
: Spread in number of cuts per rules : 1.11
: ----------------------------------------------------------------
:
: GD path scan - the scan stops when the max num. of steps is reached or a min is found
: Estimating the cutoff parameter tau. The estimated time is a pessimistic maximum.
: Best path found with tau = 0.0400 after 4.84 sec
: Fitting model...
<WARNING> : Risk(i+1)>=Risk(i) in path
: Risk(i+1)>=Risk(i) in path
: Risk(i+1)>=Risk(i) in path
:
: Minimisation elapsed time : 0.408 sec
: ----------------------------------------------------------------
: Found minimum at step 300 with error = 0.545798
: Reason for ending loop: clear minima found
: ----------------------------------------------------------------
<WARNING> : Reached minimum early in the search
: Check results and maybe decrease GDStep size
: Removed 13 out of a total of 65 rules with importance < 0.001
:
: ================================================================
: M o d e l
: ================================================================
RuleFit : Offset (a0) = 1.81844
: ------------------------------------
: Linear model (weights unnormalised)
: ------------------------------------
: Variable : Weights : Importance
: ------------------------------------
: myvar1 : -2.303e-01 : 0.901
: myvar2 : -3.967e-02 : 0.097
: var3 : 7.026e-03 : 0.017
: var4 : 3.519e-01 : 1.000
: ------------------------------------
: Number of rules = 52
: Printing the first 10 rules, ordered by importance.
: Rule 1 : Importance = 0.5224
: Cut 1 : var3 < 0.286
: Cut 2 : var4 < -0.615
: Rule 2 : Importance = 0.3544
: Cut 1 : 0.664 < myvar1
: Cut 2 : var4 < 1.85
: Rule 3 : Importance = 0.3517
: Cut 1 : -0.804 < myvar1
: Rule 4 : Importance = 0.3250
: Cut 1 : var3 < 0.286
: Cut 2 : -0.615 < var4
: Rule 5 : Importance = 0.3101
: Cut 1 : myvar1 < 1.4
: Cut 2 : -0.725 < myvar2
: Cut 3 : var4 < -0.672
: Rule 6 : Importance = 0.2823
: Cut 1 : myvar1 < 2.87
: Cut 2 : var4 < 1.21
: Rule 7 : Importance = 0.2738
: Cut 1 : var4 < 1.11
: Rule 8 : Importance = 0.2264
: Cut 1 : myvar1 < 1.36
: Cut 2 : myvar2 < 0.22
: Cut 3 : var4 < -0.412
: Rule 9 : Importance = 0.2121
: Cut 1 : 0.22 < myvar2
: Rule 10 : Importance = 0.1985
: Cut 1 : -0.0703 < myvar1
: Cut 2 : 0.358 < myvar2
: Skipping the next 42 rules
: ================================================================
:
<WARNING> : No input variable directory found - BUG?
: Elapsed time for training with 2000 events: 5.39 sec
RuleFit : [dataset] : Evaluation of RuleFit on training sample (2000 events)
: Elapsed time for evaluation of 2000 events: 0.00755 sec
: Creating xml weight file: dataset/weights/TMVAClassification_RuleFit.weights.xml
: Creating standalone class: dataset/weights/TMVAClassification_RuleFit.class.C
: TMVA.root:/dataset/Method_RuleFit/RuleFit
Factory : Training finished
:
: Ranking input variables (method specific)...
: No variable ranking supplied by classifier: Cuts
: No variable ranking supplied by classifier: CutsD
Likelihood : Ranking result (top variable is best ranked)
: -------------------------------------
: Rank : Variable : Delta Separation
: -------------------------------------
: 1 : var4 : 4.895e-02
: 2 : myvar1 : 1.429e-02
: 3 : var3 : 5.082e-03
: 4 : myvar2 : -9.248e-03
: -------------------------------------
LikelihoodPCA : Ranking result (top variable is best ranked)
: -------------------------------------
: Rank : Variable : Delta Separation
: -------------------------------------
: 1 : var4 : 3.692e-01
: 2 : myvar1 : 7.230e-02
: 3 : var3 : 3.123e-02
: 4 : myvar2 : -1.272e-03
: -------------------------------------
: No variable ranking supplied by classifier: PDERS
PDEFoam : Ranking result (top variable is best ranked)
: ----------------------------------------
: Rank : Variable : Variable Importance
: ----------------------------------------
: 1 : var4 : 3.036e-01
: 2 : myvar1 : 2.500e-01
: 3 : var3 : 2.500e-01
: 4 : myvar2 : 1.964e-01
: ----------------------------------------
: No variable ranking supplied by classifier: KNN
LD : Ranking result (top variable is best ranked)
: ---------------------------------
: Rank : Variable : Discr. power
: ---------------------------------
: 1 : var4 : 7.576e-01
: 2 : myvar1 : 3.263e-01
: 3 : var3 : 1.948e-01
: 4 : myvar2 : 8.026e-02
: ---------------------------------
: No variable ranking supplied by classifier: FDA_GA
MLPBNN : Ranking result (top variable is best ranked)
: -------------------------------
: Rank : Variable : Importance
: -------------------------------
: 1 : var4 : 1.896e+00
: 2 : myvar1 : 1.295e+00
: 3 : myvar2 : 4.347e-01
: 4 : var3 : 4.243e-01
: -------------------------------
: No variable ranking supplied by classifier: SVM
BDT : Ranking result (top variable is best ranked)
: ----------------------------------------
: Rank : Variable : Variable Importance
: ----------------------------------------
: 1 : var4 : 2.748e-01
: 2 : myvar1 : 2.593e-01
: 3 : var3 : 2.336e-01
: 4 : myvar2 : 2.322e-01
: ----------------------------------------
RuleFit : Ranking result (top variable is best ranked)
: -------------------------------
: Rank : Variable : Importance
: -------------------------------
: 1 : var4 : 1.000e+00
: 2 : myvar1 : 8.310e-01
: 3 : myvar2 : 5.152e-01
: 4 : var3 : 4.272e-01
: -------------------------------
Factory : === Destroy and recreate all methods via weight files for testing ===
:
: Read cuts optimised using sample of MC events
: Reading 100 signal efficiency bins for 4 variables
: Read cuts optimised using sample of MC events
: Reading 100 signal efficiency bins for 4 variables
: signal and background scales: 0.001 0.001
: Read foams from file: dataset/weights/TMVAClassification_PDEFoam.weights_foams.root
: Creating kd-tree with 2000 events
: Computing scale factor for 1d distributions: (ifrac, bottom, top) = (80%, 10%, 90%)
ModulekNN : Optimizing tree for 4 variables with 2000 values
: <Fill> Class 1 has 1000 events
: <Fill> Class 2 has 1000 events
: User-defined formula string : "(0)+(1)*x0+(2)*x1+(3)*x2+(4)*x3"
: TFormula-compatible formula string: "[0]+[1]*[5]+[2]*[6]+[3]*[7]+[4]*[8]"
MLPBNN : Building Network.
: Initializing weights
Factory : Test all methods
Factory : Test method: Cuts for Classification performance
:
Cuts : [dataset] : Evaluation of Cuts on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.00184 sec
Factory : Test method: CutsD for Classification performance
:
CutsD : [dataset] : Evaluation of CutsD on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0236 sec
Factory : Test method: Likelihood for Classification performance
:
Likelihood : [dataset] : Evaluation of Likelihood on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0392 sec
Factory : Test method: LikelihoodPCA for Classification performance
:
LikelihoodPCA : [dataset] : Evaluation of LikelihoodPCA on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0786 sec
Factory : Test method: PDERS for Classification performance
:
PDERS : [dataset] : Evaluation of PDERS on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 3.83 sec
Factory : Test method: PDEFoam for Classification performance
:
PDEFoam : [dataset] : Evaluation of PDEFoam on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.467 sec
Factory : Test method: KNN for Classification performance
:
KNN : [dataset] : Evaluation of KNN on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.611 sec
Factory : Test method: LD for Classification performance
:
LD : [dataset] : Evaluation of LD on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.00687 sec
: Dataset[dataset] : Evaluation of LD on testing sample
Factory : Test method: FDA_GA for Classification performance
:
FDA_GA : [dataset] : Evaluation of FDA_GA on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.00517 sec
Factory : Test method: MLPBNN for Classification performance
:
MLPBNN : [dataset] : Evaluation of MLPBNN on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0525 sec
Factory : Test method: SVM for Classification performance
:
SVM : [dataset] : Evaluation of SVM on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 1.53 sec
Factory : Test method: BDT for Classification performance
:
BDT : [dataset] : Evaluation of BDT on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 2.18 sec
Factory : Test method: RuleFit for Classification performance
:
RuleFit : [dataset] : Evaluation of RuleFit on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0421 sec
Factory : Evaluate all methods
Factory : Evaluate classifier: Cuts
:
<WARNING> : You have asked for histogram MVA_EFF_BvsS which does not seem to exist in *Results* .. better don't use it
<WARNING> : You have asked for histogram EFF_BVSS_TR which does not seem to exist in *Results* .. better don't use it
TFHandler_Cuts : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.20304 1.7143 [ -9.8605 7.9024 ]
: myvar2: -0.048739 1.1049 [ -4.0854 4.0291 ]
: var3: 0.15976 1.0530 [ -5.3563 4.6430 ]
: var4: 0.42794 1.2213 [ -6.9675 5.0307 ]
: -----------------------------------------------------------
Factory : Evaluate classifier: CutsD
:
<WARNING> : You have asked for histogram MVA_EFF_BvsS which does not seem to exist in *Results* .. better don't use it
TFHandler_CutsD : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: -0.14823 0.98425 [ -5.4957 4.9064 ]
: myvar2: -0.074925 0.99280 [ -3.6989 3.6412 ]
: var3: -0.097101 1.0022 [ -4.4664 4.4326 ]
: var4: 0.64124 0.94684 [ -3.6874 3.7410 ]
: -----------------------------------------------------------
<WARNING> : You have asked for histogram EFF_BVSS_TR which does not seem to exist in *Results* .. better don't use it
TFHandler_CutsD : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: -0.10647 1.0000 [ -4.3326 4.1397 ]
: myvar2: -0.048178 1.0000 [ -3.6769 3.6584 ]
: var3: -0.11237 1.0000 [ -3.8235 3.8765 ]
: var4: 0.31310 1.0000 [ -4.0754 3.2826 ]
: -----------------------------------------------------------
TFHandler_CutsD : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: -0.14823 0.98425 [ -5.4957 4.9064 ]
: myvar2: -0.074925 0.99280 [ -3.6989 3.6412 ]
: var3: -0.097101 1.0022 [ -4.4664 4.4326 ]
: var4: 0.64124 0.94684 [ -3.6874 3.7410 ]
: -----------------------------------------------------------
Factory : Evaluate classifier: Likelihood
:
Likelihood : [dataset] : Loop over test events and fill histograms with classifier response...
:
TFHandler_Likelihood : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.20304 1.7143 [ -9.8605 7.9024 ]
: myvar2: -0.048739 1.1049 [ -4.0854 4.0291 ]
: var3: 0.15976 1.0530 [ -5.3563 4.6430 ]
: var4: 0.42794 1.2213 [ -6.9675 5.0307 ]
: -----------------------------------------------------------
Factory : Evaluate classifier: LikelihoodPCA
:
TFHandler_LikelihoodPCA : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.74178 2.2502 [ -12.862 10.364 ]
: myvar2: -0.14040 1.1170 [ -4.0632 3.9888 ]
: var3: -0.18967 0.58222 [ -2.2874 1.9945 ]
: var4: -0.32089 0.33116 [ -1.4056 0.88219 ]
: -----------------------------------------------------------
LikelihoodPCA : [dataset] : Loop over test events and fill histograms with classifier response...
:
TFHandler_LikelihoodPCA : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.74178 2.2502 [ -12.862 10.364 ]
: myvar2: -0.14040 1.1170 [ -4.0632 3.9888 ]
: var3: -0.18967 0.58222 [ -2.2874 1.9945 ]
: var4: -0.32089 0.33116 [ -1.4056 0.88219 ]
: -----------------------------------------------------------
Factory : Evaluate classifier: PDERS
:
PDERS : [dataset] : Loop over test events and fill histograms with classifier response...
:
TFHandler_PDERS : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.20304 1.7143 [ -9.8605 7.9024 ]
: myvar2: -0.048739 1.1049 [ -4.0854 4.0291 ]
: var3: 0.15976 1.0530 [ -5.3563 4.6430 ]
: var4: 0.42794 1.2213 [ -6.9675 5.0307 ]
: -----------------------------------------------------------
Factory : Evaluate classifier: PDEFoam
:
PDEFoam : [dataset] : Loop over test events and fill histograms with classifier response...
:
TFHandler_PDEFoam : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.20304 1.7143 [ -9.8605 7.9024 ]
: myvar2: -0.048739 1.1049 [ -4.0854 4.0291 ]
: var3: 0.15976 1.0530 [ -5.3563 4.6430 ]
: var4: 0.42794 1.2213 [ -6.9675 5.0307 ]
: -----------------------------------------------------------
Factory : Evaluate classifier: KNN
:
KNN : [dataset] : Loop over test events and fill histograms with classifier response...
:
TFHandler_KNN : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.20304 1.7143 [ -9.8605 7.9024 ]
: myvar2: -0.048739 1.1049 [ -4.0854 4.0291 ]
: var3: 0.15976 1.0530 [ -5.3563 4.6430 ]
: var4: 0.42794 1.2213 [ -6.9675 5.0307 ]
: -----------------------------------------------------------
Factory : Evaluate classifier: LD
:
LD : [dataset] : Loop over test events and fill histograms with classifier response...
:
: Also filling probability and rarity histograms (on request)...
TFHandler_LD : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.20304 1.7143 [ -9.8605 7.9024 ]
: myvar2: -0.048739 1.1049 [ -4.0854 4.0291 ]
: var3: 0.15976 1.0530 [ -5.3563 4.6430 ]
: var4: 0.42794 1.2213 [ -6.9675 5.0307 ]
: -----------------------------------------------------------
Factory : Evaluate classifier: FDA_GA
:
FDA_GA : [dataset] : Loop over test events and fill histograms with classifier response...
:
TFHandler_FDA_GA : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.20304 1.7143 [ -9.8605 7.9024 ]
: myvar2: -0.048739 1.1049 [ -4.0854 4.0291 ]
: var3: 0.15976 1.0530 [ -5.3563 4.6430 ]
: var4: 0.42794 1.2213 [ -6.9675 5.0307 ]
: -----------------------------------------------------------
Factory : Evaluate classifier: MLPBNN
:
TFHandler_MLPBNN : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.083080 0.22243 [ -1.2227 1.0821 ]
: myvar2: -0.019635 0.27648 [ -1.0298 1.0008 ]
: var3: 0.11576 0.22607 [ -1.0685 1.0783 ]
: var4: 0.20452 0.23064 [ -1.1921 1.0737 ]
: -----------------------------------------------------------
MLPBNN : [dataset] : Loop over test events and fill histograms with classifier response...
:
TFHandler_MLPBNN : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.083080 0.22243 [ -1.2227 1.0821 ]
: myvar2: -0.019635 0.27648 [ -1.0298 1.0008 ]
: var3: 0.11576 0.22607 [ -1.0685 1.0783 ]
: var4: 0.20452 0.23064 [ -1.1921 1.0737 ]
: -----------------------------------------------------------
Factory : Evaluate classifier: SVM
:
TFHandler_SVM : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.083080 0.22243 [ -1.2227 1.0821 ]
: myvar2: -0.019635 0.27648 [ -1.0298 1.0008 ]
: var3: 0.11576 0.22607 [ -1.0685 1.0783 ]
: var4: 0.20452 0.23064 [ -1.1921 1.0737 ]
: -----------------------------------------------------------
SVM : [dataset] : Loop over test events and fill histograms with classifier response...
:
TFHandler_SVM : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.083080 0.22243 [ -1.2227 1.0821 ]
: myvar2: -0.019635 0.27648 [ -1.0298 1.0008 ]
: var3: 0.11576 0.22607 [ -1.0685 1.0783 ]
: var4: 0.20452 0.23064 [ -1.1921 1.0737 ]
: -----------------------------------------------------------
Factory : Evaluate classifier: BDT
:
BDT : [dataset] : Loop over test events and fill histograms with classifier response...
:
TFHandler_BDT : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.20304 1.7143 [ -9.8605 7.9024 ]
: myvar2: -0.048739 1.1049 [ -4.0854 4.0291 ]
: var3: 0.15976 1.0530 [ -5.3563 4.6430 ]
: var4: 0.42794 1.2213 [ -6.9675 5.0307 ]
: -----------------------------------------------------------
Factory : Evaluate classifier: RuleFit
:
RuleFit : [dataset] : Loop over test events and fill histograms with classifier response...
:
TFHandler_RuleFit : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: myvar1: 0.20304 1.7143 [ -9.8605 7.9024 ]
: myvar2: -0.048739 1.1049 [ -4.0854 4.0291 ]
: var3: 0.15976 1.0530 [ -5.3563 4.6430 ]
: var4: 0.42794 1.2213 [ -6.9675 5.0307 ]
: -----------------------------------------------------------
:
: Evaluation results ranked by best signal efficiency and purity (area)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA
: Name: Method: ROC-integ
: dataset LD : 0.919
: dataset MLPBNN : 0.916
: dataset LikelihoodPCA : 0.913
: dataset SVM : 0.911
: dataset CutsD : 0.901
: dataset BDT : 0.874
: dataset RuleFit : 0.869
: dataset KNN : 0.842
: dataset PDEFoam : 0.812
: dataset PDERS : 0.806
: dataset FDA_GA : 0.795
: dataset Cuts : 0.792
: dataset Likelihood : 0.761
: -------------------------------------------------------------------------------------------------------------------
:
: Testing efficiency compared to training efficiency (overtraining check)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA Signal efficiency: from test sample (from training sample)
: Name: Method: @B=0.01 @B=0.10 @B=0.30
: -------------------------------------------------------------------------------------------------------------------
: dataset LD : 0.351 (0.426) 0.763 (0.762) 0.922 (0.925)
: dataset MLPBNN : 0.350 (0.409) 0.754 (0.756) 0.918 (0.922)
: dataset LikelihoodPCA : 0.317 (0.370) 0.751 (0.763) 0.913 (0.929)
: dataset SVM : 0.357 (0.369) 0.744 (0.752) 0.912 (0.919)
: dataset CutsD : 0.242 (0.394) 0.727 (0.788) 0.903 (0.922)
: dataset BDT : 0.225 (0.418) 0.640 (0.724) 0.854 (0.904)
: dataset RuleFit : 0.215 (0.258) 0.637 (0.692) 0.865 (0.899)
: dataset KNN : 0.185 (0.257) 0.554 (0.630) 0.819 (0.847)
: dataset PDEFoam : 0.000 (0.192) 0.458 (0.494) 0.775 (0.773)
: dataset PDERS : 0.170 (0.155) 0.466 (0.447) 0.754 (0.733)
: dataset FDA_GA : 0.119 (0.107) 0.454 (0.465) 0.741 (0.718)
: dataset Cuts : 0.108 (0.145) 0.446 (0.479) 0.741 (0.746)
: dataset Likelihood : 0.082 (0.085) 0.385 (0.383) 0.684 (0.661)
: -------------------------------------------------------------------------------------------------------------------
:
Dataset:dataset : Created tree 'TestTree' with 10000 events
:
Dataset:dataset : Created tree 'TrainTree' with 2000 events
:
Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
==> Wrote root file: TMVA.root
==> TMVAClassification is done!
(int) 0
#include <cstdlib>
#include <iostream>
#include <map>
#include <string>
#include "TChain.h"
#include "TFile.h"
#include "TTree.h"
#include "TString.h"
#include "TObjString.h"
#include "TSystem.h"
#include "TROOT.h"
#include "TMVA/Factory.h"
#include "TMVA/DataLoader.h"
#include "TMVA/Tools.h"
#include "TMVA/TMVAGui.h"
int TMVAClassification( TString myMethodList = "" )
{
// The explicit loading of the shared libTMVA is done in TMVAlogon.C, defined in .rootrc
// if you use your private .rootrc, or run from a different directory, please copy the
// corresponding lines from .rootrc
// Methods to be processed can be given as an argument; use format:
//
// mylinux~> root -l TMVAClassification.C\(\"myMethod1,myMethod2,myMethod3\"\)
//---------------------------------------------------------------
// This loads the library
TMVA::Tools::Instance();
// Default MVA methods to be trained + tested
std::map<std::string,int> Use;
// Cut optimisation
Use["Cuts"] = 1;
Use["CutsD"] = 1;
Use["CutsPCA"] = 0;
Use["CutsGA"] = 0;
Use["CutsSA"] = 0;
//
// 1-dimensional likelihood ("naive Bayes estimator")
Use["Likelihood"] = 1;
Use["LikelihoodD"] = 0; // the "D" extension indicates decorrelated input variables (see option strings)
Use["LikelihoodPCA"] = 1; // the "PCA" extension indicates PCA-transformed input variables (see option strings)
Use["LikelihoodKDE"] = 0;
Use["LikelihoodMIX"] = 0;
//
// Multidimensional likelihood and Nearest-Neighbour methods
Use["PDERS"] = 1;
Use["PDERSD"] = 0;
Use["PDERSPCA"] = 0;
Use["PDEFoam"] = 1;
Use["PDEFoamBoost"] = 0; // uses generalised MVA method boosting
Use["KNN"] = 1; // k-nearest neighbour method
//
// Linear Discriminant Analysis
Use["LD"] = 1; // Linear Discriminant identical to Fisher
Use["Fisher"] = 0;
Use["FisherG"] = 0;
Use["BoostedFisher"] = 0; // uses generalised MVA method boosting
Use["HMatrix"] = 0;
//
// Function Discriminant analysis
Use["FDA_GA"] = 1; // minimisation of user-defined function using Genetic Algorithm
Use["FDA_SA"] = 0;
Use["FDA_MC"] = 0;
Use["FDA_MT"] = 0;
Use["FDA_GAMT"] = 0;
Use["FDA_MCMT"] = 0;
//
// Neural Networks (all are feed-forward Multilayer Perceptrons)
Use["MLP"] = 0; // Recommended ANN
Use["MLPBFGS"] = 0; // Recommended ANN with optional training method
Use["MLPBNN"] = 1; // Recommended ANN with BFGS training method and Bayesian regulator
Use["CFMlpANN"] = 0; // Deprecated ANN from ALEPH
Use["TMlpANN"] = 0; // ROOT's own ANN
Use["DNN_GPU"] = 0; // CUDA-accelerated DNN training.
Use["DNN_CPU"] = 0; // Multi-core accelerated DNN.
//
// Support Vector Machine
Use["SVM"] = 1;
//
// Boosted Decision Trees
Use["BDT"] = 1; // uses Adaptive Boost
Use["BDTG"] = 0; // uses Gradient Boost
Use["BDTB"] = 0; // uses Bagging
Use["BDTD"] = 0; // decorrelation + Adaptive Boost
Use["BDTF"] = 0; // allow usage of fisher discriminant for node splitting
//
// Friedman's RuleFit method, i.e., an optimised series of cuts ("rules")
Use["RuleFit"] = 1;
// ---------------------------------------------------------------
std::cout << std::endl;
std::cout << "==> Start TMVAClassification" << std::endl;
// Select methods (don't look at this code - not of interest)
if (myMethodList != "") {
for (std::map<std::string,int>::iterator it = Use.begin(); it != Use.end(); it++) it->second = 0;
std::vector<TString> mlist = TMVA::gTools().SplitString( myMethodList, ',' );
for (UInt_t i=0; i<mlist.size(); i++) {
std::string regMethod(mlist[i]);
if (Use.find(regMethod) == Use.end()) {
std::cout << "Method \"" << regMethod << "\" not known in TMVA under this name. Choose among the following:" << std::endl;
for (std::map<std::string,int>::iterator it = Use.begin(); it != Use.end(); it++) std::cout << it->first << " ";
std::cout << std::endl;
return 1;
}
Use[regMethod] = 1;
}
}
// --------------------------------------------------------------------------------------------------
// Here the preparation phase begins
// Read training and test data
// (it is also possible to use ASCII format as input -> see TMVA Users Guide)
TFile *input(0);
TString fname = "./tmva_class_example.root";
if (!gSystem->AccessPathName( fname )) {
input = TFile::Open( fname ); // check if file in local directory exists
}
else {
input = TFile::Open("http://root.cern.ch/files/tmva_class_example.root", "CACHEREAD");
}
if (!input) {
std::cout << "ERROR: could not open data file" << std::endl;
exit(1);
}
std::cout << "--- TMVAClassification : Using input file: " << input->GetName() << std::endl;
// Register the training and test trees
TTree *signalTree = (TTree*)input->Get("TreeS");
TTree *background = (TTree*)input->Get("TreeB");
// Create a ROOT output file where TMVA will store ntuples, histograms, etc.
TString outfileName( "TMVA.root" );
TFile* outputFile = TFile::Open( outfileName, "RECREATE" );
// Create the factory object. Later you can choose the methods
// whose performance you'd like to investigate. The factory is
// the only TMVA object you have to interact with
//
// The first argument is the base of the name of all the
// weightfiles in the directory weights/
//
// The second argument is the output file for the training results
// All TMVA output can be suppressed by removing the "!" (not) in
// front of the "Silent" argument in the option string
TMVA::Factory *factory = new TMVA::Factory( "TMVAClassification", outputFile,
"!V:!Silent:Color:DrawProgressBar:Transformations=I;D;P;G,D:AnalysisType=Classification" );
// If you wish to modify default settings
// (please check "src/Config.h" to see all available global options)
//
// (TMVA::gConfig().GetVariablePlotting()).fTimesRMS = 8.0;
// (TMVA::gConfig().GetIONames()).fWeightFileDir = "myWeightDirectory";
TMVA::DataLoader *dataloader = new TMVA::DataLoader("dataset");
// Define the input variables that shall be used for the MVA training
// note that you may also use variable expressions, such as: "3*var1/var2*abs(var3)"
// [all types of expressions that can also be parsed by TTree::Draw( "expression" )]
dataloader->AddVariable( "myvar1 := var1+var2", 'F' );
dataloader->AddVariable( "myvar2 := var1-var2", "Expression 2", "", 'F' );
dataloader->AddVariable( "var3", "Variable 3", "units", 'F' );
dataloader->AddVariable( "var4", "Variable 4", "units", 'F' );
// You can add so-called "Spectator variables", which are not used in the MVA training,
// but will appear in the final "TestTree" produced by TMVA. This TestTree will contain the
// input variables, the response values of all trained MVAs, and the spectator variables
dataloader->AddSpectator( "spec1 := var1*2", "Spectator 1", "units", 'F' );
dataloader->AddSpectator( "spec2 := var1*3", "Spectator 2", "units", 'F' );
// global event weights per tree (see below for setting event-wise weights)
Double_t signalWeight = 1.0;
Double_t backgroundWeight = 1.0;
// You can add an arbitrary number of signal or background trees
dataloader->AddSignalTree ( signalTree, signalWeight );
dataloader->AddBackgroundTree( background, backgroundWeight );
// To give different trees for training and testing, do as follows:
//
// dataloader->AddSignalTree( signalTrainingTree, signalTrainWeight, "Training" );
// dataloader->AddSignalTree( signalTestTree, signalTestWeight, "Test" );
// Use the following code instead of the above two or four lines to add signal and background
// training and test events "by hand"
// NOTE that in this case one should not give expressions (such as "var1+var2") in the input
// variable definition, but simply compute the expression before adding the event
// ```cpp
// // --- begin ----------------------------------------------------------
// std::vector<Double_t> vars( 4 ); // vector has size of number of input variables
// Float_t treevars[4], weight;
//
// // Signal
// for (UInt_t ivar=0; ivar<4; ivar++) signalTree->SetBranchAddress( Form( "var%i", ivar+1 ), &(treevars[ivar]) );
// for (UInt_t i=0; i<signalTree->GetEntries(); i++) {
// signalTree->GetEntry(i);
// for (UInt_t ivar=0; ivar<4; ivar++) vars[ivar] = treevars[ivar];
// // add training and test events; here: first half is training, second is testing
// // note that the weight can also be event-wise
// if (i < signalTree->GetEntries()/2.0) dataloader->AddSignalTrainingEvent( vars, signalWeight );
// else dataloader->AddSignalTestEvent ( vars, signalWeight );
// }
//
// // Background (has event weights)
// background->SetBranchAddress( "weight", &weight );
// for (UInt_t ivar=0; ivar<4; ivar++) background->SetBranchAddress( Form( "var%i", ivar+1 ), &(treevars[ivar]) );
// for (UInt_t i=0; i<background->GetEntries(); i++) {
// background->GetEntry(i);
// for (UInt_t ivar=0; ivar<4; ivar++) vars[ivar] = treevars[ivar];
// // add training and test events; here: first half is training, second is testing
// // note that the weight can also be event-wise
// if (i < background->GetEntries()/2) dataloader->AddBackgroundTrainingEvent( vars, backgroundWeight*weight );
// else dataloader->AddBackgroundTestEvent ( vars, backgroundWeight*weight );
// }
// // --- end ------------------------------------------------------------
// ```
// End of tree registration
// Set individual event weights (the variables must exist in the original TTree)
// - for signal : `dataloader->SetSignalWeightExpression ("weight1*weight2");`
// - for background: `dataloader->SetBackgroundWeightExpression("weight1*weight2");`
dataloader->SetBackgroundWeightExpression( "weight" );
// Apply additional cuts on the signal and background samples (can be different)
TCut mycuts = ""; // for example: TCut mycuts = "abs(var1)<0.5 && abs(var2-0.5)<1";
TCut mycutb = ""; // for example: TCut mycutb = "abs(var1)<0.5";
// Tell the dataloader how to use the training and testing events
//
// If no numbers of events are given, half of the events in the tree are used
// for training, and the other half for testing:
//
// dataloader->PrepareTrainingAndTestTree( mycut, "SplitMode=random:!V" );
//
// To also specify the number of testing events, use:
//
// dataloader->PrepareTrainingAndTestTree( mycut,
// "NSigTrain=3000:NBkgTrain=3000:NSigTest=3000:NBkgTest=3000:SplitMode=Random:!V" );
dataloader->PrepareTrainingAndTestTree( mycuts, mycutb,
"nTrain_Signal=1000:nTrain_Background=1000:SplitMode=Random:NormMode=NumEvents:!V" );
// ### Book MVA methods
//
// Please look up the various method configuration options in the corresponding cxx files, e.g.
// src/MethodCuts.cxx, etc., or here: http://tmva.sourceforge.net/optionRef.html
// It is possible to preset ranges in the option string in which the cut optimisation should be done:
// "...:CutRangeMin[2]=-1:CutRangeMax[2]=1:...", where [2] is the third input variable
// Cut optimisation
if (Use["Cuts"])
factory->BookMethod( dataloader, TMVA::Types::kCuts, "Cuts",
"!H:!V:FitMethod=MC:EffSel:SampleSize=200000:VarProp=FSmart" );
if (Use["CutsD"])
factory->BookMethod( dataloader, TMVA::Types::kCuts, "CutsD",
"!H:!V:FitMethod=MC:EffSel:SampleSize=200000:VarProp=FSmart:VarTransform=Decorrelate" );
if (Use["CutsPCA"])
factory->BookMethod( dataloader, TMVA::Types::kCuts, "CutsPCA",
"!H:!V:FitMethod=MC:EffSel:SampleSize=200000:VarProp=FSmart:VarTransform=PCA" );
if (Use["CutsGA"])
factory->BookMethod( dataloader, TMVA::Types::kCuts, "CutsGA",
"H:!V:FitMethod=GA:CutRangeMin[0]=-10:CutRangeMax[0]=10:VarProp[1]=FMax:EffSel:Steps=30:Cycles=3:PopSize=400:SC_steps=10:SC_rate=5:SC_factor=0.95" );
if (Use["CutsSA"])
factory->BookMethod( dataloader, TMVA::Types::kCuts, "CutsSA",
"!H:!V:FitMethod=SA:EffSel:MaxCalls=150000:KernelTemp=IncAdaptive:InitialTemp=1e+6:MinTemp=1e-6:Eps=1e-10:UseDefaultScale" );
// Likelihood ("naive Bayes estimator")
if (Use["Likelihood"])
factory->BookMethod( dataloader, TMVA::Types::kLikelihood, "Likelihood",
"H:!V:TransformOutput:PDFInterpol=Spline2:NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmoothBkg[1]=10:NSmooth=1:NAvEvtPerBin=50" );
// Decorrelated likelihood
if (Use["LikelihoodD"])
factory->BookMethod( dataloader, TMVA::Types::kLikelihood, "LikelihoodD",
"!H:!V:TransformOutput:PDFInterpol=Spline2:NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmooth=5:NAvEvtPerBin=50:VarTransform=Decorrelate" );
// PCA-transformed likelihood
if (Use["LikelihoodPCA"])
factory->BookMethod( dataloader, TMVA::Types::kLikelihood, "LikelihoodPCA",
"!H:!V:!TransformOutput:PDFInterpol=Spline2:NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmooth=5:NAvEvtPerBin=50:VarTransform=PCA" );
// Use a kernel density estimator to approximate the PDFs
if (Use["LikelihoodKDE"])
factory->BookMethod( dataloader, TMVA::Types::kLikelihood, "LikelihoodKDE",
"!H:!V:!TransformOutput:PDFInterpol=KDE:KDEtype=Gauss:KDEiter=Adaptive:KDEFineFactor=0.3:KDEborder=None:NAvEvtPerBin=50" );
// Use a variable-dependent mix of splines and kernel density estimator
if (Use["LikelihoodMIX"])
factory->BookMethod( dataloader, TMVA::Types::kLikelihood, "LikelihoodMIX",
"!H:!V:!TransformOutput:PDFInterpolSig[0]=KDE:PDFInterpolBkg[0]=KDE:PDFInterpolSig[1]=KDE:PDFInterpolBkg[1]=KDE:PDFInterpolSig[2]=Spline2:PDFInterpolBkg[2]=Spline2:PDFInterpolSig[3]=Spline2:PDFInterpolBkg[3]=Spline2:KDEtype=Gauss:KDEiter=Nonadaptive:KDEborder=None:NAvEvtPerBin=50" );
// Test the multi-dimensional probability density estimator
// here are the option strings for the MinMax and RMS methods, respectively:
//
// "!H:!V:VolumeRangeMode=MinMax:DeltaFrac=0.2:KernelEstimator=Gauss:GaussSigma=0.3" );
// "!H:!V:VolumeRangeMode=RMS:DeltaFrac=3:KernelEstimator=Gauss:GaussSigma=0.3" );
if (Use["PDERS"])
factory->BookMethod( dataloader, TMVA::Types::kPDERS, "PDERS",
"!H:!V:NormTree=T:VolumeRangeMode=Adaptive:KernelEstimator=Gauss:GaussSigma=0.3:NEventsMin=400:NEventsMax=600" );
if (Use["PDERSD"])
factory->BookMethod( dataloader, TMVA::Types::kPDERS, "PDERSD",
"!H:!V:VolumeRangeMode=Adaptive:KernelEstimator=Gauss:GaussSigma=0.3:NEventsMin=400:NEventsMax=600:VarTransform=Decorrelate" );
if (Use["PDERSPCA"])
factory->BookMethod( dataloader, TMVA::Types::kPDERS, "PDERSPCA",
"!H:!V:VolumeRangeMode=Adaptive:KernelEstimator=Gauss:GaussSigma=0.3:NEventsMin=400:NEventsMax=600:VarTransform=PCA" );
// Multi-dimensional likelihood estimator using self-adapting phase-space binning
if (Use["PDEFoam"])
factory->BookMethod( dataloader, TMVA::Types::kPDEFoam, "PDEFoam",
"!H:!V:SigBgSeparate=F:TailCut=0.001:VolFrac=0.0666:nActiveCells=500:nSampl=2000:nBin=5:Nmin=100:Kernel=None:Compress=T" );
if (Use["PDEFoamBoost"])
factory->BookMethod( dataloader, TMVA::Types::kPDEFoam, "PDEFoamBoost",
"!H:!V:Boost_Num=30:Boost_Transform=linear:SigBgSeparate=F:MaxDepth=4:UseYesNoCell=T:DTLogic=MisClassificationError:FillFoamWithOrigWeights=F:TailCut=0:nActiveCells=500:nBin=20:Nmin=400:Kernel=None:Compress=T" );
// K-Nearest Neighbour classifier (KNN)
if (Use["KNN"])
factory->BookMethod( dataloader, TMVA::Types::kKNN, "KNN",
"H:nkNN=20:ScaleFrac=0.8:SigmaFact=1.0:Kernel=Gaus:UseKernel=F:UseWeight=T:!Trim" );
// H-Matrix (chi-squared) method
if (Use["HMatrix"])
factory->BookMethod( dataloader, TMVA::Types::kHMatrix, "HMatrix", "!H:!V:VarTransform=None" );
// Linear discriminant (same as Fisher discriminant)
if (Use["LD"])
factory->BookMethod( dataloader, TMVA::Types::kLD, "LD", "H:!V:VarTransform=None:CreateMVAPdfs:PDFInterpolMVAPdf=Spline2:NbinsMVAPdf=50:NsmoothMVAPdf=10" );
// Fisher discriminant (same as LD)
if (Use["Fisher"])
factory->BookMethod( dataloader, TMVA::Types::kFisher, "Fisher", "H:!V:Fisher:VarTransform=None:CreateMVAPdfs:PDFInterpolMVAPdf=Spline2:NbinsMVAPdf=50:NsmoothMVAPdf=10" );
// Fisher with Gauss-transformed input variables
if (Use["FisherG"])
factory->BookMethod( dataloader, TMVA::Types::kFisher, "FisherG", "H:!V:VarTransform=Gauss" );
// Composite classifier: ensemble (tree) of boosted Fisher classifiers
if (Use["BoostedFisher"])
factory->BookMethod( dataloader, TMVA::Types::kFisher, "BoostedFisher",
"H:!V:Boost_Num=20:Boost_Transform=log:Boost_Type=AdaBoost:Boost_AdaBoostBeta=0.2:!Boost_DetailedMonitoring" );
// Function discriminant analysis (FDA): test of various fitters; the recommended one is Minuit (or GA or SA)
if (Use["FDA_MC"])
factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_MC",
"H:!V:Formula=(0)+(1)*x0+(2)*x1+(3)*x2+(4)*x3:ParRanges=(-1,1);(-10,10);(-10,10);(-10,10);(-10,10):FitMethod=MC:SampleSize=100000:Sigma=0.1" );
if (Use["FDA_GA"]) // can also use Simulated Annealing (SA) algorithm (see Cuts_SA options)
factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_GA",
"H:!V:Formula=(0)+(1)*x0+(2)*x1+(3)*x2+(4)*x3:ParRanges=(-1,1);(-10,10);(-10,10);(-10,10);(-10,10):FitMethod=GA:PopSize=100:Cycles=2:Steps=5:Trim=True:SaveBestGen=1" );
if (Use["FDA_SA"]) // can also use Simulated Annealing (SA) algorithm (see Cuts_SA options)
factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_SA",
"H:!V:Formula=(0)+(1)*x0+(2)*x1+(3)*x2+(4)*x3:ParRanges=(-1,1);(-10,10);(-10,10);(-10,10);(-10,10):FitMethod=SA:MaxCalls=15000:KernelTemp=IncAdaptive:InitialTemp=1e+6:MinTemp=1e-6:Eps=1e-10:UseDefaultScale" );
if (Use["FDA_MT"])
factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_MT",
"H:!V:Formula=(0)+(1)*x0+(2)*x1+(3)*x2+(4)*x3:ParRanges=(-1,1);(-10,10);(-10,10);(-10,10);(-10,10):FitMethod=MINUIT:ErrorLevel=1:PrintLevel=-1:FitStrategy=2:UseImprove:UseMinos:SetBatch" );
if (Use["FDA_GAMT"])
factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_GAMT",
"H:!V:Formula=(0)+(1)*x0+(2)*x1+(3)*x2+(4)*x3:ParRanges=(-1,1);(-10,10);(-10,10);(-10,10);(-10,10):FitMethod=GA:Converger=MINUIT:ErrorLevel=1:PrintLevel=-1:FitStrategy=0:!UseImprove:!UseMinos:SetBatch:Cycles=1:PopSize=5:Steps=5:Trim" );
if (Use["FDA_MCMT"])
factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_MCMT",
"H:!V:Formula=(0)+(1)*x0+(2)*x1+(3)*x2+(4)*x3:ParRanges=(-1,1);(-10,10);(-10,10);(-10,10);(-10,10):FitMethod=MC:Converger=MINUIT:ErrorLevel=1:PrintLevel=-1:FitStrategy=0:!UseImprove:!UseMinos:SetBatch:SampleSize=20" );
// TMVA ANN: MLP (recommended ANN) -- all ANNs in TMVA are Multilayer Perceptrons
if (Use["MLP"])
factory->BookMethod( dataloader, TMVA::Types::kMLP, "MLP", "H:!V:NeuronType=tanh:VarTransform=N:NCycles=600:HiddenLayers=N+5:TestRate=5:!UseRegulator" );
if (Use["MLPBFGS"])
factory->BookMethod( dataloader, TMVA::Types::kMLP, "MLPBFGS", "H:!V:NeuronType=tanh:VarTransform=N:NCycles=600:HiddenLayers=N+5:TestRate=5:TrainingMethod=BFGS:!UseRegulator" );
if (Use["MLPBNN"])
factory->BookMethod( dataloader, TMVA::Types::kMLP, "MLPBNN", "H:!V:NeuronType=tanh:VarTransform=N:NCycles=60:HiddenLayers=N+5:TestRate=5:TrainingMethod=BFGS:UseRegulator" ); // BFGS training with bayesian regulators
// Multi-architecture DNN implementation.
if (Use["DNN_CPU"] or Use["DNN_GPU"]) {
// General layout.
TString layoutString ("Layout=TANH|128,TANH|128,TANH|128,LINEAR");
// Training strategies.
TString training0("LearningRate=1e-1,Momentum=0.9,Repetitions=1,"
"ConvergenceSteps=20,BatchSize=256,TestRepetitions=10,"
"WeightDecay=1e-4,Regularization=L2,"
"DropConfig=0.0+0.5+0.5+0.5, Multithreading=True");
TString training1("LearningRate=1e-2,Momentum=0.9,Repetitions=1,"
"ConvergenceSteps=20,BatchSize=256,TestRepetitions=10,"
"WeightDecay=1e-4,Regularization=L2,"
"DropConfig=0.0+0.0+0.0+0.0, Multithreading=True");
TString training2("LearningRate=1e-3,Momentum=0.0,Repetitions=1,"
"ConvergenceSteps=20,BatchSize=256,TestRepetitions=10,"
"WeightDecay=1e-4,Regularization=L2,"
"DropConfig=0.0+0.0+0.0+0.0, Multithreading=True");
TString trainingStrategyString ("TrainingStrategy=");
trainingStrategyString += training0 + "|" + training1 + "|" + training2;
// General Options.
TString dnnOptions ("!H:V:ErrorStrategy=CROSSENTROPY:VarTransform=N:"
"WeightInitialization=XAVIERUNIFORM");
dnnOptions.Append (":"); dnnOptions.Append (layoutString);
dnnOptions.Append (":"); dnnOptions.Append (trainingStrategyString);
// Cuda implementation.
if (Use["DNN_GPU"]) {
TString gpuOptions = dnnOptions + ":Architecture=GPU";
factory->BookMethod(dataloader, TMVA::Types::kDNN, "DNN_GPU", gpuOptions);
}
// Multi-core CPU implementation.
if (Use["DNN_CPU"]) {
TString cpuOptions = dnnOptions + ":Architecture=CPU";
factory->BookMethod(dataloader, TMVA::Types::kDNN, "DNN_CPU", cpuOptions);
}
}
// CF(Clermont-Ferrand)ANN
if (Use["CFMlpANN"])
factory->BookMethod( dataloader, TMVA::Types::kCFMlpANN, "CFMlpANN", "!H:!V:NCycles=200:HiddenLayers=N+1,N" ); // n_cycles:#nodes:#nodes:...
// Tmlp(Root)ANN
if (Use["TMlpANN"])
factory->BookMethod( dataloader, TMVA::Types::kTMlpANN, "TMlpANN", "!H:!V:NCycles=200:HiddenLayers=N+1,N:LearningMethod=BFGS:ValidationFraction=0.3" ); // n_cycles:#nodes:#nodes:...
// Support Vector Machine
if (Use["SVM"])
factory->BookMethod( dataloader, TMVA::Types::kSVM, "SVM", "Gamma=0.25:Tol=0.001:VarTransform=Norm" );
// Boosted Decision Trees
if (Use["BDTG"]) // Gradient Boost
factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDTG",
"!H:!V:NTrees=1000:MinNodeSize=2.5%:BoostType=Grad:Shrinkage=0.10:UseBaggedBoost:BaggedSampleFraction=0.5:nCuts=20:MaxDepth=2" );
if (Use["BDT"]) // Adaptive Boost
factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDT",
"!H:!V:NTrees=850:MinNodeSize=2.5%:MaxDepth=3:BoostType=AdaBoost:AdaBoostBeta=0.5:UseBaggedBoost:BaggedSampleFraction=0.5:SeparationType=GiniIndex:nCuts=20" );
if (Use["BDTB"]) // Bagging
factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDTB",
"!H:!V:NTrees=400:BoostType=Bagging:SeparationType=GiniIndex:nCuts=20" );
if (Use["BDTD"]) // Decorrelation + Adaptive Boost
factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDTD",
"!H:!V:NTrees=400:MinNodeSize=5%:MaxDepth=3:BoostType=AdaBoost:SeparationType=GiniIndex:nCuts=20:VarTransform=Decorrelate" );
if (Use["BDTF"]) // allow using the Fisher discriminant in node splitting for (strongly) linearly correlated variables
factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDTF",
"!H:!V:NTrees=50:MinNodeSize=2.5%:UseFisherCuts:MaxDepth=3:BoostType=AdaBoost:AdaBoostBeta=0.5:SeparationType=GiniIndex:nCuts=20" );
// RuleFit -- TMVA implementation of Friedman's method
if (Use["RuleFit"])
factory->BookMethod( dataloader, TMVA::Types::kRuleFit, "RuleFit",
"H:!V:RuleFitModule=RFTMVA:Model=ModRuleLinear:MinImp=0.001:RuleMinDist=0.001:NTrees=20:fEventsMin=0.01:fEventsMax=0.5:GDTau=-1.0:GDTauPrec=0.01:GDStep=0.01:GDNSteps=10000:GDErrScale=1.02" );
// For an example of the category classifier usage, see: TMVAClassificationCategory
//
// --------------------------------------------------------------------------------------------------
// Now you can optimize the setting (configuration) of the MVAs using the set of training events
// STILL EXPERIMENTAL and only implemented for BDTs!
//
// factory->OptimizeAllMethods("SigEffAt001","Scan");
// factory->OptimizeAllMethods("ROCIntegral","FitGA");
//
// --------------------------------------------------------------------------------------------------
// Now you can tell the factory to train, test, and evaluate the MVAs
//
// Train MVAs using the set of training events
factory->TrainAllMethods();
// Evaluate all MVAs using the set of test events
factory->TestAllMethods();
// Evaluate and compare performance of all configured MVAs
factory->EvaluateAllMethods();
// --------------------------------------------------------------
// Save the output
outputFile->Close();
std::cout << "==> Wrote root file: " << outputFile->GetName() << std::endl;
std::cout << "==> TMVAClassification is done!" << std::endl;
delete factory;
delete dataloader;
// Launch the GUI for the root macros
if (!gROOT->IsBatch()) TMVA::TMVAGui( outfileName );
return 0;
}
int main( int argc, char** argv )
{
// Select methods (don't look at this code - not of interest)
TString methodList;
for (int i=1; i<argc; i++) {
TString regMethod(argv[i]);
if(regMethod=="-b" || regMethod=="--batch") continue;
if (!methodList.IsNull()) methodList += TString(",");
methodList += regMethod;
}
return TMVAClassification(methodList);
}
Author
Andreas Hoecker

Definition in file TMVAClassification.C.