TMVARegression.C File Reference

Detailed Description

This macro provides examples for the training and testing of the TMVA methods for regression.

The input data is a toy Monte Carlo sample consisting of Gaussian-distributed and linearly correlated input variables, together with a regression target.

The methods to be used can be switched on and off by means of booleans in the macro, or via the prompt command, for example:

root -l TMVARegression.C\(\"LD,MLP\"\)

(note that the backslashes are mandatory). If no method is given, a default set is used.

The output file "TMVAReg.root" can be analysed with the use of dedicated macros (simply say: root -l <macro.C>), which can be conveniently invoked through a GUI that will appear at the end of the run of this macro.

Processing /builddir/build/BUILD/root-6.10.02/tutorials/tmva/TMVARegression.C...
==> Start TMVARegression
--- TMVARegression : Using input file: ./files/tmva_reg_example.root
DataSetInfo : [dataset] : Added class "Regression"
: Add Tree TreeR of type Regression with 10000 events
: Dataset[dataset] : Class index : 0 name : Regression
Factory : Booking method: PDEFoam
:
DataSetFactory : [dataset] : Number of events in input trees
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Regression -- training events : 1000
: Regression -- testing events : 9000
: Regression -- training and testing events: 10000
:
DataSetInfo : Correlation matrix (Regression):
: ------------------------
: var1 var2
: var1: +1.000 -0.017
: var2: -0.017 +1.000
: ------------------------
DataSetFactory : [dataset] :
:
Factory : Booking method: KNN
:
Factory : Booking method: LD
:
Factory : Booking method: FDA_GA
:
FDA_GA : [dataset] : Create Transformation "Norm" with events from all classes.
:
: Transformation, Variable selection :
: Input : variable 'var1' <---> Output : variable 'var1'
: Input : variable 'var2' <---> Output : variable 'var2'
: Input : target 'fvalue' <---> Output : target 'fvalue'
: Create parameter interval for parameter 0 : [-100,100]
: Create parameter interval for parameter 1 : [-100,100]
: Create parameter interval for parameter 2 : [-100,100]
: User-defined formula string : "(0)+(1)*x0+(2)*x1"
: TFormula-compatible formula string: "[0]+[1]*[3]+[2]*[4]"
Factory : Booking method: MLP
:
MLP : [dataset] : Create Transformation "Norm" with events from all classes.
:
: Transformation, Variable selection :
: Input : variable 'var1' <---> Output : variable 'var1'
: Input : variable 'var2' <---> Output : variable 'var2'
: Input : target 'fvalue' <---> Output : target 'fvalue'
MLP : Building Network.
: Initializing weights
Factory : Booking method: BDTG
:
<WARNING> : Value for option maxdepth was previously set to 3
: the option *InverseBoostNegWeights* does not exist for BoostType=Grad --> change
: to new default for GradBoost *Pray*
Factory : Train all methods
Factory : [dataset] : Create Transformation "I" with events from all classes.
:
: Transformation, Variable selection :
: Input : variable 'var1' <---> Output : variable 'var1'
: Input : variable 'var2' <---> Output : variable 'var2'
TFHandler_Factory : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.4152 1.1962 [ 0.0026062 4.9957 ]
: var2: 2.4350 1.4125 [ 0.0092062 4.9990 ]
: fvalue: 164.97 82.189 [ 1.7144 391.23 ]
: -----------------------------------------------------------
: Ranking input variables (method unspecific)...
IdTransformation : Ranking result (top variable is best ranked)
: --------------------------------------------
: Rank : Variable : |Correlation with target|
: --------------------------------------------
: 1 : var2 : 7.418e-01
: 2 : var1 : 5.999e-01
: --------------------------------------------
IdTransformation : Ranking result (top variable is best ranked)
: -------------------------------------
: Rank : Variable : Mutual information
: -------------------------------------
: 1 : var2 : 2.029e+00
: 2 : var1 : 1.950e+00
: -------------------------------------
IdTransformation : Ranking result (top variable is best ranked)
: ------------------------------------
: Rank : Variable : Correlation Ratio
: ------------------------------------
: 1 : var1 : 6.546e+00
: 2 : var2 : 2.461e+00
: ------------------------------------
IdTransformation : Ranking result (top variable is best ranked)
: ----------------------------------------
: Rank : Variable : Correlation Ratio (T)
: ----------------------------------------
: 1 : var2 : 9.176e-01
: 2 : var1 : 2.977e-01
: ----------------------------------------
Factory : Train method: PDEFoam for Regression
:
: Build mono target regression foam
: Elapsed time: 2.79 sec
: Elapsed time for training with 1000 events: 2.82 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of PDEFoam on training sample
: Dataset[dataset] : Elapsed time for evaluation of 1000 events: 0.0279 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVARegression_PDEFoam.weights.xml
: writing foam MonoTargetRegressionFoam to file
: Foams written to file: dataset/weights/TMVARegression_PDEFoam.weights_foams.root
Factory : Training finished
:
Factory : Train method: KNN for Regression
:
KNN : <Train> start...
: Reading 1000 events
: Number of signal events 1000
: Number of background events 0
: Creating kd-tree with 1000 events
: Computing scale factor for 1d distributions: (ifrac, bottom, top) = (80%, 10%, 90%)
ModulekNN : Optimizing tree for 2 variables with 1000 values
: <Fill> Class 1 has 1000 events
: Elapsed time for training with 1000 events: 0.00566 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of KNN on training sample
: Dataset[dataset] : Elapsed time for evaluation of 1000 events: 0.0335 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVARegression_KNN.weights.xml
Factory : Training finished
:
Factory : Train method: LD for Regression
:
LD : Results for LD coefficients:
: -----------------------
: Variable: Coefficient:
: -----------------------
: var1: +42.090
: var2: +44.604
: (offset): -87.385
: -----------------------
: Elapsed time for training with 1000 events: 0.001 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of LD on training sample
: Dataset[dataset] : Elapsed time for evaluation of 1000 events: 0.00597 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVARegression_LD.weights.xml
Factory : Training finished
:
Factory : Train method: FDA_GA for Regression
:
TFHandler_FDA_GA : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 0.36693 0.47914 [ -1.0000 1.0000 ]
: var2: -0.027695 0.56614 [ -1.0000 1.0000 ]
: fvalue: -0.16174 0.42200 [ -1.0000 1.0000 ]
: -----------------------------------------------------------
FitterBase : <GeneticFitter> Optimisation, please be patient ... (inaccurate progress timing for GA)
: Elapsed time: 62.9 sec
FDA_GA : Results for parameter fit using "GA" fitter:
: -----------------------
: Parameter: Fit result:
: -----------------------
: Par(0): -0.345151
: Par(1): 0.539931
: Par(2): 0.569534
: -----------------------
: Discriminator expression: "(0)+(1)*x0+(2)*x1"
: Value of estimator at minimum: 0.0094457
: Elapsed time for training with 1000 events: 63.7 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of FDA_GA on training sample
: Dataset[dataset] : Elapsed time for evaluation of 1000 events: 0.012 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVARegression_FDA_GA.weights.xml
Factory : Training finished
:
Factory : Train method: MLP for Regression
:
TFHandler_MLP : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 0.36693 0.47914 [ -1.0000 1.0000 ]
: var2: -0.027695 0.56614 [ -1.0000 1.0000 ]
: fvalue: -0.16174 0.42200 [ -1.0000 1.0000 ]
: -----------------------------------------------------------
: Training Network
:
: Inaccurate progress timing for MLP...
<WARNING> : Line search increased error! Something is wrong.fLastAlpha=2.5046al123=0.986836 2.96051 8.88152 err1=0.00131353 errfinal=0.00131354
<WARNING> : Line search increased error! Something is wrong.fLastAlpha=5.07736al123=2 6 18 err1=0.00131344 errfinal=0.00131344
: Elapsed time for training with 1000 events: 29.5 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of MLP on training sample
: Dataset[dataset] : Elapsed time for evaluation of 1000 events: 0.0165 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVARegression_MLP.weights.xml
: Write special histos to file: TMVAReg.root:/dataset/Method_MLP/MLP
Factory : Training finished
:
Factory : Train method: BDTG for Regression
:
: Regression Loss Function: Huber
: Training 2000 Decision Trees ... patience please
: Elapsed time for training with 1000 events: 3.9 sec
: Dataset[dataset] : Create results for training
: Dataset[dataset] : Evaluation of BDTG on training sample
: Dataset[dataset] : Elapsed time for evaluation of 1000 events: 1.3 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
: Creating xml weight file: dataset/weights/TMVARegression_BDTG.weights.xml
: TMVAReg.root:/dataset/Method_BDTG/BDTG
Factory : Training finished
:
Factory : === Destroy and recreate all methods via weight files for testing ===
:
: Read foams from file: dataset/weights/TMVARegression_PDEFoam.weights_foams.root
: Creating kd-tree with 1000 events
: Computing scale factor for 1d distributions: (ifrac, bottom, top) = (80%, 10%, 90%)
ModulekNN : Optimizing tree for 2 variables with 1000 values
: <Fill> Class 1 has 1000 events
: User-defined formula string : "(0)+(1)*x0+(2)*x1"
: TFormula-compatible formula string: "[0]+[1]*[3]+[2]*[4]"
MLP : Building Network.
: Initializing weights
Factory : Test all methods
Factory : Test method: PDEFoam for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of PDEFoam on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 9000 events: 0.241 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
Factory : Test method: KNN for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of KNN on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 9000 events: 0.296 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
Factory : Test method: LD for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of LD on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 9000 events: 0.048 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
Factory : Test method: FDA_GA for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of FDA_GA on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 9000 events: 0.0925 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
Factory : Test method: MLP for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of MLP on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 9000 events: 0.126 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
Factory : Test method: BDTG for Regression performance
:
: Dataset[dataset] : Create results for testing
: Dataset[dataset] : Evaluation of BDTG on testing sample
: Dataset[dataset] : Elapsed time for evaluation of 9000 events: 7.75 sec
: Create variable histograms
: Create regression target histograms
: Create regression average deviation
: Results created
Factory : Evaluate all methods
: Evaluate regression method: PDEFoam
TFHandler_PDEFoam : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.3308 1.1858 [ 0.00020069 5.0000 ]
: var2: 2.4914 1.4394 [ 0.00071490 5.0000 ]
: fvalue: 164.02 83.934 [ 1.6186 394.84 ]
: -----------------------------------------------------------
: Evaluate regression method: KNN
TFHandler_KNN : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.3308 1.1858 [ 0.00020069 5.0000 ]
: var2: 2.4914 1.4394 [ 0.00071490 5.0000 ]
: fvalue: 164.02 83.934 [ 1.6186 394.84 ]
: -----------------------------------------------------------
: Evaluate regression method: LD
TFHandler_LD : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.3308 1.1858 [ 0.00020069 5.0000 ]
: var2: 2.4914 1.4394 [ 0.00071490 5.0000 ]
: fvalue: 164.02 83.934 [ 1.6186 394.84 ]
: -----------------------------------------------------------
: Evaluate regression method: FDA_GA
TFHandler_FDA_GA : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 0.33312 0.47497 [ -1.0010 1.0017 ]
: var2: -0.0050675 0.57694 [ -1.0034 1.0004 ]
: fvalue: -0.16662 0.43096 [ -1.0005 1.0185 ]
: -----------------------------------------------------------
TFHandler_FDA_GA : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 0.33312 0.47497 [ -1.0010 1.0017 ]
: var2: -0.0050675 0.57694 [ -1.0034 1.0004 ]
: fvalue: -0.16662 0.43096 [ -1.0005 1.0185 ]
: -----------------------------------------------------------
: Evaluate regression method: MLP
TFHandler_MLP : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 0.33312 0.47497 [ -1.0010 1.0017 ]
: var2: -0.0050675 0.57694 [ -1.0034 1.0004 ]
: fvalue: -0.16662 0.43096 [ -1.0005 1.0185 ]
: -----------------------------------------------------------
TFHandler_MLP : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 0.33312 0.47497 [ -1.0010 1.0017 ]
: var2: -0.0050675 0.57694 [ -1.0034 1.0004 ]
: fvalue: -0.16662 0.43096 [ -1.0005 1.0185 ]
: -----------------------------------------------------------
: Evaluate regression method: BDTG
TFHandler_BDTG : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: 3.3308 1.1858 [ 0.00020069 5.0000 ]
: var2: 2.4914 1.4394 [ 0.00071490 5.0000 ]
: fvalue: 164.02 83.934 [ 1.6186 394.84 ]
: -----------------------------------------------------------
:
: Evaluation results ranked by smallest RMS on test sample:
: ("Bias" quotes the mean deviation of the regression from true target.
: "MutInf" is the "Mutual Information" between regression and target.
: Indicated by "_T" are the corresponding "truncated" quantities ob-
: tained when removing events deviating more than 2sigma from average.)
: --------------------------------------------------------------------------------------------------
: DataSet Name: MVA Method: <Bias> <Bias_T> RMS RMS_T | MutInf MutInf_T
: --------------------------------------------------------------------------------------------------
: dataset MLP : -0.0166 -0.0160 0.309 0.298 | 3.435 3.438
: dataset BDTG : 0.213 0.173 2.38 1.87 | 3.119 3.184
: dataset KNN : -0.511 0.432 5.76 3.78 | 2.871 2.903
: dataset PDEFoam : -1.23 -0.960 9.63 7.88 | 2.263 2.341
: dataset FDA_GA : -0.302 1.39 19.8 17.9 | 1.990 1.980
: dataset LD : -0.0832 1.61 19.7 17.9 | 1.987 1.981
: --------------------------------------------------------------------------------------------------
:
: Evaluation results ranked by smallest RMS on training sample:
: (overtraining check)
: --------------------------------------------------------------------------------------------------
: DataSet Name: MVA Method: <Bias> <Bias_T> RMS RMS_T | MutInf MutInf_T
: --------------------------------------------------------------------------------------------------
: dataset MLP :-0.000756 0.00366 0.305 0.294 | 3.435 3.432
: dataset BDTG : 0.0297 0.00807 0.487 0.261 | 3.430 3.462
: dataset KNN : -0.525 0.296 5.55 3.82 | 2.936 2.951
: dataset PDEFoam : 1.97e-07 0.344 7.90 6.18 | 2.517 2.583
: dataset FDA_GA : -0.208 1.57 18.9 16.9 | 2.113 2.105
: dataset LD : 2.81e-06 1.76 18.9 16.9 | 2.106 2.104
: --------------------------------------------------------------------------------------------------
:
Dataset:dataset : Created tree 'TestTree' with 9000 events
:
Dataset:dataset : Created tree 'TrainTree' with 1000 events
:
Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
==> Wrote root file: TMVAReg.root
==> TMVARegression is done!
#include <cstdlib>
#include <iostream>
#include <map>
#include <string>
#include "TChain.h"
#include "TFile.h"
#include "TTree.h"
#include "TString.h"
#include "TObjString.h"
#include "TSystem.h"
#include "TROOT.h"
#include "TMVA/Tools.h"
#include "TMVA/Factory.h"
using namespace TMVA;
void TMVARegression( TString myMethodList = "" )
{
// The explicit loading of the shared libTMVA is done in TMVAlogon.C, defined in .rootrc
// if you use your private .rootrc, or run from a different directory, please copy the
// corresponding lines from .rootrc
// methods to be processed can be given as an argument; use format:
//
// mylinux~> root -l TMVARegression.C\(\"myMethod1,myMethod2,myMethod3\"\)
//
//---------------------------------------------------------------
// This loads the library
TMVA::Tools::Instance();
// Default MVA methods to be trained + tested
std::map<std::string,int> Use;
// Mutidimensional likelihood and Nearest-Neighbour methods
Use["PDERS"] = 0;
Use["PDEFoam"] = 1;
Use["KNN"] = 1;
//
// Linear Discriminant Analysis
Use["LD"] = 1;
//
// Function Discriminant analysis
Use["FDA_GA"] = 1;
Use["FDA_MC"] = 0;
Use["FDA_MT"] = 0;
Use["FDA_GAMT"] = 0;
//
// Neural Network
Use["MLP"] = 1;
Use["DNN_CPU"] = 0;
//
// Support Vector Machine
Use["SVM"] = 0;
//
// Boosted Decision Trees
Use["BDT"] = 0;
Use["BDTG"] = 1;
// ---------------------------------------------------------------
std::cout << std::endl;
std::cout << "==> Start TMVARegression" << std::endl;
// Select methods (don't look at this code - not of interest)
if (myMethodList != "") {
for (std::map<std::string,int>::iterator it = Use.begin(); it != Use.end(); it++) it->second = 0;
std::vector<TString> mlist = gTools().SplitString( myMethodList, ',' );
for (UInt_t i=0; i<mlist.size(); i++) {
std::string regMethod(mlist[i]);
if (Use.find(regMethod) == Use.end()) {
std::cout << "Method \"" << regMethod << "\" not known in TMVA under this name. Choose among the following:" << std::endl;
for (std::map<std::string,int>::iterator it = Use.begin(); it != Use.end(); it++) std::cout << it->first << " ";
std::cout << std::endl;
return;
}
Use[regMethod] = 1;
}
}
// --------------------------------------------------------------------------------------------------
// Here the preparation phase begins
// Create a new root output file
TString outfileName( "TMVAReg.root" );
TFile* outputFile = TFile::Open( outfileName, "RECREATE" );
// Create the factory object. Later you can choose the methods
// whose performance you'd like to investigate. The factory will
// then run the performance analysis for you.
//
// The first argument is the base of the name of all the
// weight files in the directory dataset/weights/
//
// The second argument is the output file for the training results
// All TMVA output can be suppressed by removing the "!" (not) in
// front of the "Silent" argument in the option string
TMVA::Factory *factory = new TMVA::Factory( "TMVARegression", outputFile,
"!V:!Silent:Color:DrawProgressBar:AnalysisType=Regression" );
TMVA::DataLoader *dataloader=new TMVA::DataLoader("dataset");
// If you wish to modify default settings
// (please check "src/Config.h" to see all available global options)
//
// (TMVA::gConfig().GetVariablePlotting()).fTimesRMS = 8.0;
// (TMVA::gConfig().GetIONames()).fWeightFileDir = "myWeightDirectory";
// Define the input variables that shall be used for the MVA training
// note that you may also use variable expressions, such as: "3*var1/var2*abs(var3)"
// [all types of expressions that can also be parsed by TTree::Draw( "expression" )]
dataloader->AddVariable( "var1", "Variable 1", "units", 'F' );
dataloader->AddVariable( "var2", "Variable 2", "units", 'F' );
// You can add so-called "Spectator variables", which are not used in the MVA training,
// but will appear in the final "TestTree" produced by TMVA. This TestTree will contain the
// input variables, the response values of all trained MVAs, and the spectator variables
dataloader->AddSpectator( "spec1:=var1*2", "Spectator 1", "units", 'F' );
dataloader->AddSpectator( "spec2:=var1*3", "Spectator 2", "units", 'F' );
// Add the variable carrying the regression target
dataloader->AddTarget( "fvalue" );
// It is also possible to declare additional targets for multi-dimensional regression, ie:
// factory->AddTarget( "fvalue2" );
// BUT: this is currently ONLY implemented for MLP
// Read training and test data (see TMVAClassification for reading ASCII files)
// load the signal and background event samples from ROOT trees
TFile *input(0);
TString fname = "./tmva_reg_example.root";
if (!gSystem->AccessPathName( fname )) {
input = TFile::Open( fname ); // check if file in local directory exists
}
else {
input = TFile::Open("http://root.cern.ch/files/tmva_reg_example.root", "CACHEREAD"); // if not: download from ROOT server
}
if (!input) {
std::cout << "ERROR: could not open data file" << std::endl;
exit(1);
}
std::cout << "--- TMVARegression : Using input file: " << input->GetName() << std::endl;
// Register the regression tree
TTree *regTree = (TTree*)input->Get("TreeR");
// global event weights per tree (see below for setting event-wise weights)
Double_t regWeight = 1.0;
// You can add an arbitrary number of regression trees
dataloader->AddRegressionTree( regTree, regWeight );
// This would set individual event weights (the variables defined in the
// expression need to exist in the original TTree)
dataloader->SetWeightExpression( "var1", "Regression" );
// Apply additional cuts on the signal and background samples (can be different)
TCut mycut = ""; // for example: TCut mycut = "abs(var1)<0.5 && abs(var2-0.5)<1";
// tell the DataLoader to use all remaining events in the trees after training for testing:
dataloader->PrepareTrainingAndTestTree( mycut,
"nTrain_Regression=1000:nTest_Regression=0:SplitMode=Random:NormMode=NumEvents:!V" );
//
// dataloader->PrepareTrainingAndTestTree( mycut,
// "nTrain_Regression=0:nTest_Regression=0:SplitMode=Random:NormMode=NumEvents:!V" );
// If no numbers of events are given, half of the events in the tree are used
// for training, and the other half for testing:
//
// dataloader->PrepareTrainingAndTestTree( mycut, "SplitMode=random:!V" );
// Book MVA methods
//
// Please look up the various method configuration options in the corresponding cxx files, e.g.:
// src/MethodCuts.cxx, etc., or here: http://tmva.sourceforge.net/optionRef.html
// it is possible to preset ranges in the option string in which the cut optimisation should be done:
// "...:CutRangeMin[2]=-1:CutRangeMax[2]=1"...", where [2] is the third input variable
// PDE - RS method
if (Use["PDERS"])
factory->BookMethod( dataloader, TMVA::Types::kPDERS, "PDERS",
"!H:!V:NormTree=T:VolumeRangeMode=Adaptive:KernelEstimator=Gauss:GaussSigma=0.3:NEventsMin=40:NEventsMax=60:VarTransform=None" );
// And the options strings for the MinMax and RMS methods, respectively:
//
// "!H:!V:VolumeRangeMode=MinMax:DeltaFrac=0.2:KernelEstimator=Gauss:GaussSigma=0.3" );
// "!H:!V:VolumeRangeMode=RMS:DeltaFrac=3:KernelEstimator=Gauss:GaussSigma=0.3" );
if (Use["PDEFoam"])
factory->BookMethod( dataloader, TMVA::Types::kPDEFoam, "PDEFoam",
"!H:!V:MultiTargetRegression=F:TargetSelection=Mpv:TailCut=0.001:VolFrac=0.0666:nActiveCells=500:nSampl=2000:nBin=5:Compress=T:Kernel=None:Nmin=10:VarTransform=None" );
// K-Nearest Neighbour classifier (KNN)
if (Use["KNN"])
factory->BookMethod( dataloader, TMVA::Types::kKNN, "KNN",
"nkNN=20:ScaleFrac=0.8:SigmaFact=1.0:Kernel=Gaus:UseKernel=F:UseWeight=T:!Trim" );
// Linear discriminant
if (Use["LD"])
factory->BookMethod( dataloader, TMVA::Types::kLD, "LD",
"!H:!V:VarTransform=None" );
// Function discrimination analysis (FDA) -- test of various fitters - the recommended one is Minuit (or GA or SA)
if (Use["FDA_MC"])
factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_MC",
"!H:!V:Formula=(0)+(1)*x0+(2)*x1:ParRanges=(-100,100);(-100,100);(-100,100):FitMethod=MC:SampleSize=100000:Sigma=0.1:VarTransform=D" );
if (Use["FDA_GA"]) // can also use Simulated Annealing (SA) algorithm (see Cuts_SA options) .. the formula of this example is good for parabolas
factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_GA",
"!H:!V:Formula=(0)+(1)*x0+(2)*x1:ParRanges=(-100,100);(-100,100);(-100,100):FitMethod=GA:PopSize=100:Cycles=3:Steps=30:Trim=True:SaveBestGen=1:VarTransform=Norm" );
if (Use["FDA_MT"])
factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_MT",
"!H:!V:Formula=(0)+(1)*x0+(2)*x1:ParRanges=(-100,100);(-100,100);(-100,100);(-10,10):FitMethod=MINUIT:ErrorLevel=1:PrintLevel=-1:FitStrategy=2:UseImprove:UseMinos:SetBatch" );
if (Use["FDA_GAMT"])
factory->BookMethod( dataloader, TMVA::Types::kFDA, "FDA_GAMT",
"!H:!V:Formula=(0)+(1)*x0+(2)*x1:ParRanges=(-100,100);(-100,100);(-100,100):FitMethod=GA:Converger=MINUIT:ErrorLevel=1:PrintLevel=-1:FitStrategy=0:!UseImprove:!UseMinos:SetBatch:Cycles=1:PopSize=5:Steps=5:Trim" );
// Neural network (MLP)
if (Use["MLP"])
factory->BookMethod( dataloader, TMVA::Types::kMLP, "MLP", "!H:!V:VarTransform=Norm:NeuronType=tanh:NCycles=20000:HiddenLayers=N+20:TestRate=6:TrainingMethod=BFGS:Sampling=0.3:SamplingEpoch=0.8:ConvergenceImprove=1e-6:ConvergenceTests=15:!UseRegulator" );
if (Use["DNN_CPU"]) {
/*
TString layoutString ("Layout=TANH|(N+100)*2,LINEAR");
TString layoutString ("Layout=SOFTSIGN|100,SOFTSIGN|50,SOFTSIGN|20,LINEAR");
TString layoutString ("Layout=RELU|300,RELU|100,RELU|30,RELU|10,LINEAR");
TString layoutString ("Layout=SOFTSIGN|50,SOFTSIGN|30,SOFTSIGN|20,SOFTSIGN|10,LINEAR");
TString layoutString ("Layout=TANH|50,TANH|30,TANH|20,TANH|10,LINEAR");
TString layoutString ("Layout=SOFTSIGN|50,SOFTSIGN|20,LINEAR");
TString layoutString ("Layout=TANH|100,TANH|30,LINEAR");
*/
TString layoutString("Layout=TANH|100,LINEAR");
TString training0("LearningRate=1e-5,Momentum=0.5,Repetitions=1,ConvergenceSteps=500,BatchSize=50,"
"TestRepetitions=7,WeightDecay=0.01,Regularization=NONE,DropConfig=0.5+0.5+0.5+0.5,"
"DropRepetitions=2");
TString training1("LearningRate=1e-5,Momentum=0.9,Repetitions=1,ConvergenceSteps=170,BatchSize=30,"
"TestRepetitions=7,WeightDecay=0.01,Regularization=L2,DropConfig=0.1+0.1+0.1,DropRepetitions="
"1");
TString training2("LearningRate=1e-5,Momentum=0.3,Repetitions=1,ConvergenceSteps=150,BatchSize=40,"
"TestRepetitions=7,WeightDecay=0.01,Regularization=NONE");
TString training3("LearningRate=1e-6,Momentum=0.1,Repetitions=1,ConvergenceSteps=500,BatchSize=100,"
"TestRepetitions=7,WeightDecay=0.0001,Regularization=NONE");
TString trainingStrategyString("TrainingStrategy=");
trainingStrategyString += training0 + "|" + training1 + "|" + training2 + "|" + training3;
// TString trainingStrategyString
// ("TrainingStrategy=LearningRate=1e-1,Momentum=0.3,Repetitions=3,ConvergenceSteps=20,BatchSize=30,TestRepetitions=7,WeightDecay=0.0,L1=false,DropFraction=0.0,DropRepetitions=5");
TString nnOptions(
"!H:V:ErrorStrategy=SUMOFSQUARES:VarTransform=G:WeightInitialization=XAVIERUNIFORM:Architecture=CPU");
// TString nnOptions ("!H:V:VarTransform=Normalize:ErrorStrategy=CHECKGRADIENTS");
nnOptions.Append(":");
nnOptions.Append(layoutString);
nnOptions.Append(":");
nnOptions.Append(trainingStrategyString);
factory->BookMethod(dataloader, TMVA::Types::kDNN, "DNN_CPU", nnOptions); // NN
}
// Support Vector Machine
if (Use["SVM"])
factory->BookMethod( dataloader, TMVA::Types::kSVM, "SVM", "Gamma=0.25:Tol=0.001:VarTransform=Norm" );
// Boosted Decision Trees
if (Use["BDT"])
factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDT",
"!H:!V:NTrees=100:MinNodeSize=1.0%:BoostType=AdaBoostR2:SeparationType=RegressionVariance:nCuts=20:PruneMethod=CostComplexity:PruneStrength=30" );
if (Use["BDTG"])
factory->BookMethod( dataloader, TMVA::Types::kBDT, "BDTG",
"!H:!V:NTrees=2000::BoostType=Grad:Shrinkage=0.1:UseBaggedBoost:BaggedSampleFraction=0.5:nCuts=20:MaxDepth=3:MaxDepth=4" );
// --------------------------------------------------------------------------------------------------
// Now you can tell the factory to train, test, and evaluate the MVAs
// Train MVAs using the set of training events
factory->TrainAllMethods();
// Evaluate all MVAs using the set of test events
factory->TestAllMethods();
// Evaluate and compare performance of all configured MVAs
factory->EvaluateAllMethods();
// --------------------------------------------------------------
// Save the output
outputFile->Close();
std::cout << "==> Wrote root file: " << outputFile->GetName() << std::endl;
std::cout << "==> TMVARegression is done!" << std::endl;
delete factory;
delete dataloader;
// Launch the GUI for the root macros
if (!gROOT->IsBatch()) TMVA::TMVARegGui( outfileName );
}
int main( int argc, char** argv )
{
// Select methods (don't look at this code - not of interest)
TString methodList;
for (int i=1; i<argc; i++) {
TString regMethod(argv[i]);
if(regMethod=="-b" || regMethod=="--batch") continue;
if (!methodList.IsNull()) methodList += TString(",");
methodList += regMethod;
}
TMVARegression(methodList);
return 0;
}
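Not part of this macro: the weight files written to dataset/weights/ can later be applied to new data with TMVA::Reader. Below is a minimal application sketch, assuming the MLP weight file produced by the run above; the function name ApplyRegression and the example input values are purely illustrative (the full recipe is given in the separate TMVARegressionApplication tutorial).

#include <iostream>
#include "TMVA/Reader.h"

void ApplyRegression()
{
   // Declare the variables and spectators exactly as they were given to the DataLoader
   Float_t var1, var2, spec1, spec2;

   TMVA::Reader reader( "!Color:!Silent" );
   reader.AddVariable ( "var1", &var1 );
   reader.AddVariable ( "var2", &var2 );
   reader.AddSpectator( "spec1:=var1*2", &spec1 );
   reader.AddSpectator( "spec2:=var1*3", &spec2 );

   // Book the trained regression method from its weight file (path assumed from the run above)
   reader.BookMVA( "MLP method", "dataset/weights/TMVARegression_MLP.weights.xml" );

   // Evaluate the regression target for one illustrative event
   var1 = 2.0;  var2 = 3.0;
   spec1 = var1*2;  spec2 = var1*3;
   Float_t fvalue = reader.EvaluateRegression( "MLP method" )[0];
   std::cout << "Estimated fvalue = " << fvalue << std::endl;
}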
Author
Andreas Hoecker

Definition in file TMVARegression.C.