MethodDT.cxx
// @(#)root/tmva $Id$
// Author: Andreas Hoecker, Joerg Stelzer, Helge Voss, Kai Voss

/**********************************************************************************
 * Project: TMVA - a Root-integrated toolkit for multivariate data analysis      *
 * Package: TMVA                                                                  *
 * Class  : MethodDT (DT = Decision Trees)                                        *
 * Web    : http://tmva.sourceforge.net                                           *
 *                                                                                *
 * Description:                                                                   *
 *      Analysis of Boosted Decision Trees                                        *
 *                                                                                *
 * Authors (alphabetical):                                                        *
 *      Andreas Hoecker <Andreas.Hocker@cern.ch> - CERN, Switzerland              *
 *      Helge Voss      <Helge.Voss@cern.ch>     - MPI-K Heidelberg, Germany      *
 *      Or Cohen        <orcohenor@gmail.com>    - Weizmann Inst., Israel         *
 *                                                                                *
 * Copyright (c) 2005:                                                            *
 *      CERN, Switzerland                                                         *
 *      MPI-K Heidelberg, Germany                                                 *
 *                                                                                *
 * Redistribution and use in source and binary forms, with or without            *
 * modification, are permitted according to the terms listed in LICENSE          *
 * (http://tmva.sourceforge.net/LICENSE)                                          *
 **********************************************************************************/

//_______________________________________________________________________
//
// Analysis of Boosted Decision Trees
//
// Boosted decision trees have been successfully used in High Energy
// Physics analysis, for example by the MiniBooNE experiment
// (Yang-Roe-Zhu, physics/0508045). In Boosted Decision Trees, the
// selection is done by a majority vote on the result of several decision
// trees, which are all derived from the same training sample by
// supplying different event weights during the training.
//
// Decision trees:
//
// Successive decision nodes are used to categorize the
// events out of the sample as either signal or background. Each node
// uses only a single discriminating variable to decide if the event is
// signal-like ("goes right") or background-like ("goes left"). This
// forms a tree-like structure with "baskets" at the end (leaf nodes),
// and an event is classified as either signal or background according to
// whether the basket where it ends up has been classified as signal or
// background during the training. Training of a decision tree is the
// process of defining the "cut criteria" for each node. The training
// starts with the root node. Here one takes the full training event
// sample and selects the variable and corresponding cut value that give
// the best separation between signal and background at this stage. Using
// this cut criterion, the sample is then divided into two subsamples, a
// signal-like (right) and a background-like (left) sample. Two new nodes
// are then created for each of the two sub-samples and they are
// constructed using the same mechanism as described for the root
// node. The division is stopped once a certain node has reached either a
// minimum number of events or a minimum or maximum signal purity. These
// leaf nodes are then called "signal" or "background" depending on
// whether they contain more signal or background events, respectively,
// from the training sample.
//
// Boosting:
//
// The idea behind boosting is that signal events from the training
// sample that end up in a background node (and vice versa) are given a
// larger weight than events that are in the correct leaf node. This
// results in a re-weighted training event sample, with which a new
// decision tree can then be developed. The boosting can be applied
// several times (typically 100-500 times) and one ends up with a set of
// decision trees (a forest).
//
// Bagging:
//
// In this particular variant of the Boosted Decision Trees the boosting
// is not done on the basis of previous training results, but by a simple
// stochastic re-sampling of the initial training event sample.
//
// Analysis:
//
// Applying an individual decision tree to a test event results in a
// classification of the event as either signal or background. For the
// boosted decision tree selection, an event is successively subjected to
// the whole set of decision trees, and depending on how often it is
// classified as signal, a "likelihood" estimator is constructed for the
// event being signal or background. The value of this estimator is then
// used to select the events from an event sample, and the cut value on
// this estimator defines the efficiency and purity of the selection.
// See the booking sketch below for how this method is typically
// configured.
//
//_______________________________________________________________________
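// A minimal booking sketch (not part of this file; it assumes the standard
// TMVA Factory workflow of this ROOT version, with placeholder file, tree and
// variable names). The option string uses the keys declared in
// MethodDT::DeclareOptions() below:
//
//    TFile* outputFile = TFile::Open( "TMVA_DT.root", "RECREATE" );
//    TMVA::Factory* factory = new TMVA::Factory( "TMVAClassification", outputFile,
//                                                "!V:AnalysisType=Classification" );
//    factory->AddVariable( "var1", 'F' );        // hypothetical input variable
//    factory->AddSignalTree( sigTree );          // hypothetical input TTrees
//    factory->AddBackgroundTree( bkgTree );
//    factory->BookMethod( TMVA::Types::kDT, "DT",
//                         "SeparationType=GiniIndex:nCuts=20:MaxDepth=3:MinNodeSize=5%" );
//    factory->TrainAllMethods();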

#include <algorithm>
#include "Riostream.h"
#include "TRandom3.h"
#include "TMath.h"
#include "TObjString.h"

#include "TMVA/ClassifierFactory.h"
#include "TMVA/MethodDT.h"
#include "TMVA/Tools.h"
#include "TMVA/Timer.h"
#include "TMVA/Ranking.h"
#include "TMVA/SdivSqrtSplusB.h"
#include "TMVA/BinarySearchTree.h"
#include "TMVA/SeparationBase.h"
#include "TMVA/GiniIndex.h"
#include "TMVA/CrossEntropy.h"
#include "TMVA/MisClassificationError.h"
#include "TMVA/MethodBoost.h"
#include "TMVA/CCPruner.h"

using std::vector;

REGISTER_METHOD(DT)

ClassImp(TMVA::MethodDT)

////////////////////////////////////////////////////////////////////////////////
/// the standard constructor for just an ordinary "decision tree"

TMVA::MethodDT::MethodDT( const TString& jobName,
                          const TString& methodTitle,
                          DataSetInfo& theData,
                          const TString& theOption,
                          TDirectory* theTargetDir ) :
   TMVA::MethodBase( jobName, Types::kDT, methodTitle, theData, theOption, theTargetDir )
   , fTree(0)
   , fSepType(0)
   , fMinNodeEvents(0)
   , fMinNodeSize(0)
   , fNCuts(0)
   , fUseYesNoLeaf(kFALSE)
   , fNodePurityLimit(0)
   , fMaxDepth(0)
   , fErrorFraction(0)
   , fPruneStrength(0)
   , fPruneMethod(DecisionTree::kNoPruning)
   , fAutomatic(kFALSE)
   , fRandomisedTrees(kFALSE)
   , fUseNvars(0)
   , fUsePoissonNvars(0) // don't use this initialisation, only here to make Coverity happy. Is set in Init()
   , fDeltaPruneStrength(0)
{
}

////////////////////////////////////////////////////////////////////////////////
/// constructor from Reader

TMVA::MethodDT::MethodDT( DataSetInfo& dsi,
                          const TString& theWeightFile,
                          TDirectory* theTargetDir ) :
   TMVA::MethodBase( Types::kDT, dsi, theWeightFile, theTargetDir )
   , fTree(0)
   , fSepType(0)
   , fMinNodeEvents(0)
   , fMinNodeSize(0)
   , fNCuts(0)
   , fUseYesNoLeaf(kFALSE)
   , fNodePurityLimit(0)
   , fMaxDepth(0)
   , fErrorFraction(0)
   , fPruneStrength(0)
   , fPruneMethod(DecisionTree::kNoPruning)
   , fAutomatic(kFALSE)
   , fRandomisedTrees(kFALSE)
   , fUseNvars(0)
   , fDeltaPruneStrength(0)
{
}

////////////////////////////////////////////////////////////////////////////////
/// DT currently handles only classification with two classes

Bool_t TMVA::MethodDT::HasAnalysisType( Types::EAnalysisType type, UInt_t numberClasses, UInt_t /*numberTargets*/ )
{
   if( type == Types::kClassification && numberClasses == 2 ) return kTRUE;
   return kFALSE;
}

////////////////////////////////////////////////////////////////////////////////
/// define the options (their key words) that can be set in the option string
///
/// UseRandomisedTrees  choose at each node splitting a random set of variables
/// UseNvars            use UseNvars variables in randomised trees
/// SeparationType      the separation criterion applied in the node splitting
///                     known: GiniIndex
///                            MisClassificationError
///                            CrossEntropy
///                            SDivSqrtSPlusB
/// nEventsMin:         the minimum number of events in a node (leaf criterion, stop splitting)
/// nCuts:              the number of steps in the optimisation of the cut for a node (if < 0, then
///                     the step size is determined by the events)
/// UseYesNoLeaf        decide if the classification is done simply by the node type, or by the S/B
///                     (from the training) in the leaf node
/// NodePurityLimit     the minimum purity to classify a node as a signal node (used in pruning and
///                     boosting to determine the misclassification error rate)
/// PruneMethod         the pruning method:
///                     known: NoPruning      // switch off pruning completely
///                            ExpectedError
///                            CostComplexity
/// PruneStrength       a parameter to adjust the amount of pruning; should be large enough so that
///                     overtraining is avoided

void TMVA::MethodDT::DeclareOptions()
{
   DeclareOptionRef(fRandomisedTrees,"UseRandomisedTrees","Choose at each node splitting a random set of variables and *bagging*");
   DeclareOptionRef(fUseNvars,"UseNvars","Number of variables used if randomised Tree option is chosen");
   DeclareOptionRef(fUsePoissonNvars,"UsePoissonNvars", "Interpret \"UseNvars\" not as fixed number but as mean of a Poisson distribution in each split with RandomisedTree option");
   DeclareOptionRef(fUseYesNoLeaf=kTRUE, "UseYesNoLeaf",
                    "Use Sig or Bkg node type or the ratio S/B as classification in the leaf node");
   DeclareOptionRef(fNodePurityLimit=0.5, "NodePurityLimit", "In boosting/pruning, nodes with purity > NodePurityLimit are signal; background otherwise.");
   DeclareOptionRef(fSepTypeS="GiniIndex", "SeparationType", "Separation criterion for node splitting");
   AddPreDefVal(TString("MisClassificationError"));
   AddPreDefVal(TString("GiniIndex"));
   AddPreDefVal(TString("CrossEntropy"));
   AddPreDefVal(TString("SDivSqrtSPlusB"));
   DeclareOptionRef(fMinNodeEvents=-1, "nEventsMin", "deprecated !!! Minimum number of events required in a leaf node");
   DeclareOptionRef(fMinNodeSizeS, "MinNodeSize", "Minimum percentage of training events required in a leaf node (default: Classification: 10%, Regression: 1%)");
   DeclareOptionRef(fNCuts, "nCuts", "Number of steps during node cut optimisation");
   DeclareOptionRef(fPruneStrength, "PruneStrength", "Pruning strength (negative value == automatic adjustment)");
   DeclareOptionRef(fPruneMethodS="NoPruning", "PruneMethod", "Pruning method: NoPruning (switched off), ExpectedError or CostComplexity");

   AddPreDefVal(TString("NoPruning"));
   AddPreDefVal(TString("ExpectedError"));
   AddPreDefVal(TString("CostComplexity"));

   if (DoRegression()) {
      DeclareOptionRef(fMaxDepth=50,"MaxDepth","Max depth of the decision tree allowed");
   }else{
      DeclareOptionRef(fMaxDepth=3,"MaxDepth","Max depth of the decision tree allowed");
   }
}
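// Example option strings assembled from the keys above (a sketch; options are
// ':'-separated and any subset may be given):
//
//    "SeparationType=GiniIndex:nCuts=20:MaxDepth=3:MinNodeSize=5%:PruneMethod=NoPruning"
//    "UseRandomisedTrees=True:UseNvars=4:PruneMethod=CostComplexity:PruneStrength=-1"   // -1 == automatic pruning strength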

void TMVA::MethodDT::DeclareCompatibilityOptions() {
   // options that are used ONLY for the READER to ensure backward compatibility

   MethodBase::DeclareCompatibilityOptions();

   DeclareOptionRef(fPruneBeforeBoost=kFALSE, "PruneBeforeBoost",
                    "--> removed option .. only kept for reader backward compatibility");
}

////////////////////////////////////////////////////////////////////////////////
/// the option string is decoded, for available options see "DeclareOptions"

void TMVA::MethodDT::ProcessOptions()
{
   fSepTypeS.ToLower();
   if      (fSepTypeS == "misclassificationerror") fSepType = new MisClassificationError();
   else if (fSepTypeS == "giniindex")              fSepType = new GiniIndex();
   else if (fSepTypeS == "crossentropy")           fSepType = new CrossEntropy();
   else if (fSepTypeS == "sdivsqrtsplusb")         fSepType = new SdivSqrtSplusB();
   else {
      Log() << kINFO << GetOptions() << Endl;
      Log() << kFATAL << "<ProcessOptions> unknown Separation Index option called" << Endl;
   }

   //   std::cout << "fSeptypes " << fSepTypeS << "  fseptype " << fSepType << std::endl;

   fPruneMethodS.ToLower();
   if      (fPruneMethodS == "expectederror" )  fPruneMethod = DecisionTree::kExpectedErrorPruning;
   else if (fPruneMethodS == "costcomplexity" ) fPruneMethod = DecisionTree::kCostComplexityPruning;
   else if (fPruneMethodS == "nopruning" )      fPruneMethod = DecisionTree::kNoPruning;
   else {
      Log() << kINFO << GetOptions() << Endl;
      Log() << kFATAL << "<ProcessOptions> unknown PruneMethod option:" << fPruneMethodS << " called" << Endl;
   }

   if (fPruneStrength < 0) fAutomatic = kTRUE;
   else fAutomatic = kFALSE;
   if (fAutomatic && fPruneMethod == DecisionTree::kExpectedErrorPruning){
      Log() << kFATAL
            << "Sorry, automatic pruning strength determination is not implemented yet for ExpectedErrorPruning" << Endl;
   }


   if (this->Data()->HasNegativeEventWeights()){
      Log() << kINFO << " You are using a Monte Carlo that has also negative weights. "
            << "That should in principle be fine as long as on average you end up with "
            << "something positive. For this you have to make sure that the minimal number "
            << "of (un-weighted) events demanded for a tree node (currently you use: MinNodeSize="
            << fMinNodeSizeS
            << ", (or the deprecated equivalent nEventsMin) you can set this via the "
            << "MethodDT option string when booking the "
            << "classifier) is large enough to allow for reasonable averaging!!! "
            << " If this does not help.. maybe you want to try the option: IgnoreNegWeightsInTraining "
            << "which ignores events with negative weight in the training. " << Endl
            << Endl << "Note: You'll get a WARNING message during the training if that should ever happen" << Endl;
   }

   if (fRandomisedTrees){
      Log() << kINFO << " Randomised trees should use *bagging* as *boost* method. Did you set this in the *MethodBoost*? Here I can only enforce *no pruning*" << Endl;
      fPruneMethod = DecisionTree::kNoPruning;
      //      fBoostType   = "Bagging";
   }

   if (fMinNodeEvents > 0){
      fMinNodeSize = Double_t(fMinNodeEvents*100.) / Data()->GetNTrainingEvents();
      Log() << kWARNING << "You have explicitly set *nEventsMin*, the minimum absolute number \n"
            << "of events in a leaf node. This is DEPRECATED, please use the option \n"
            << "*MinNodeSize* giving the relative number as percentage of training \n"
            << "events instead. \n"
            << "nEventsMin=" << fMinNodeEvents << " --> MinNodeSize=" << fMinNodeSize << "%"
            << Endl;
   }else{
      SetMinNodeSize(fMinNodeSizeS);
   }
}
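// Worked example for the deprecated-option conversion above (hypothetical
// numbers): nEventsMin=100 with 10000 training events gives
// MinNodeSize = 100*100./10000 = 1% of the training sample.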

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::SetMinNodeSize(Double_t sizeInPercent){
   if (sizeInPercent > 0 && sizeInPercent < 50){
      fMinNodeSize = sizeInPercent;

   } else {
      Log() << kERROR << "you have demanded a minimal node size of "
            << sizeInPercent << "% of the training events.. \n"
            << " that somehow does not make sense " << Endl;
   }

}

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::SetMinNodeSize(TString sizeInPercent){
   sizeInPercent.ReplaceAll("%","");
   if (sizeInPercent.IsAlnum()) SetMinNodeSize(sizeInPercent.Atof());
   else {
      Log() << kERROR << "I had problems reading the option MinNodeEvents, which\n"
            << "after removing a possible % sign now reads " << sizeInPercent << Endl;
   }
}
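// Note on accepted inputs (an observation on the code above, not a behaviour
// change): SetMinNodeSize("5%") and SetMinNodeSize("5") both yield
// fMinNodeSize=5, while a value such as "2.5%" fails the IsAlnum() check
// (the '.' is not alphanumeric) and is reported as an error instead of
// being parsed.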


////////////////////////////////////////////////////////////////////////////////
/// common initialisation with defaults for the DT-Method

void TMVA::MethodDT::Init( void )
{
   fMinNodeEvents   = -1;
   fMinNodeSize     = 5;
   fMinNodeSizeS    = "5%";
   fNCuts           = 20;
   fPruneMethod     = DecisionTree::kNoPruning;
   fPruneStrength   = 5;      // -1 means automatic determination of the prune strength using a validation sample
   fDeltaPruneStrength = 0.1;
   fRandomisedTrees = kFALSE;
   fUseNvars        = GetNvar();
   fUsePoissonNvars = kTRUE;

   // reference cut value to distinguish signal-like from background-like events
   SetSignalReferenceCut( 0 );
   if (fAnalysisType == Types::kClassification || fAnalysisType == Types::kMulticlass ) {
      fMaxDepth = 3;
   }else {
      fMaxDepth = 50;
   }
}

////////////////////////////////////////////////////////////////////////////////
/// destructor

TMVA::MethodDT::~MethodDT( void )
{
   delete fTree;
}

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::Train( void )
{
   TMVA::DecisionTreeNode::fgIsTraining = true;
   fTree = new DecisionTree( fSepType, fMinNodeSize, fNCuts, 0,
                             fRandomisedTrees, fUseNvars, fUsePoissonNvars, fMaxDepth, 0 );
   fTree->SetNVars(GetNvar());
   if (fRandomisedTrees) Log() << kWARNING << " randomised Trees do not work yet in this framework,"
                               << " as I do not know how to give each tree a new random seed, now they"
                               << " will be all the same and that is not good " << Endl;
   fTree->SetAnalysisType( GetAnalysisType() );

   //fTree->BuildTree(GetEventCollection(Types::kTraining));
   Data()->SetCurrentType(Types::kTraining);
   UInt_t nevents = Data()->GetNTrainingEvents();
   std::vector<const TMVA::Event*> tmp;
   for (Long64_t ievt=0; ievt<nevents; ievt++) {
      const Event *event = GetEvent(ievt);
      tmp.push_back(event);
   }
   fTree->BuildTree(tmp);
   if (fPruneMethod != DecisionTree::kNoPruning) fTree->SetPruneMethod(fPruneMethod);

   TMVA::DecisionTreeNode::fgIsTraining = false;
}

////////////////////////////////////////////////////////////////////////////////
/// prune the decision tree if requested (good for individual trees that are best grown out, and then
/// pruned back, while boosted decision trees are best 'small' trees to start with. Well, at least the
/// standard "optimal pruning algorithms" don't result in 'weak enough' classifiers !!

Double_t TMVA::MethodDT::PruneTree( )
{
   // remember the number of nodes beforehand (for monitoring purposes)


   if (fAutomatic && fPruneMethod == DecisionTree::kCostComplexityPruning) { // automatic cost complexity pruning
      CCPruner* pruneTool = new CCPruner(fTree, this->Data() , fSepType);
      pruneTool->Optimize();
      std::vector<DecisionTreeNode*> nodes = pruneTool->GetOptimalPruneSequence();
      fPruneStrength = pruneTool->GetOptimalPruneStrength();
      for(UInt_t i = 0; i < nodes.size(); i++)
         fTree->PruneNode(nodes[i]);
      delete pruneTool;
   }
   else if (fAutomatic && fPruneMethod != DecisionTree::kCostComplexityPruning){
      /*

      Double_t alpha = 0;
      Double_t delta = fDeltaPruneStrength;

      DecisionTree*  dcopy;
      std::vector<Double_t> q;
      multimap<Double_t,Double_t> quality;
      Int_t nnodes = fTree->GetNNodes();

      // find the maximum prune strength that still leaves some nodes
      Bool_t forceStop = kFALSE;
      Int_t troubleCount = 0, previousNnodes = nnodes;


      nnodes = fTree->GetNNodes();
      while (nnodes > 3 && !forceStop) {
         dcopy = new DecisionTree(*fTree);
         dcopy->SetPruneStrength(alpha += delta);
         dcopy->PruneTree();
         q.push_back(TestTreeQuality(dcopy));
         quality.insert(std::pair<const Double_t,Double_t>(q.back(),alpha));
         nnodes = dcopy->GetNNodes();
         if (previousNnodes == nnodes) troubleCount++;
         else {
            troubleCount = 0; // reset counter
            if (nnodes < previousNnodes / 2 ) fDeltaPruneStrength /= 2.;
         }
         previousNnodes = nnodes;
         if (troubleCount > 20) {
            if (methodIndex == 0 && fPruneStrength <= 0) { // maybe you need a larger step size ??
               fDeltaPruneStrength *= 5;
               Log() << kINFO << "<PruneTree> trouble determining optimal prune strength"
                     << " for Tree " << methodIndex
                     << " --> first try to increase the step size"
                     << " currently PruneStrength= " << alpha
                     << " stepsize " << fDeltaPruneStrength << " " << Endl;
               troubleCount = 0;   // try again
               fPruneStrength = 1; // if it was for the first time..
            } else if (methodIndex == 0 && fPruneStrength <= 2) { // maybe you need a much larger step size ??
               fDeltaPruneStrength *= 5;
               Log() << kINFO << "<PruneTree> trouble determining optimal prune strength"
                     << " for Tree " << methodIndex
                     << " --> try to increase the step size even more.. "
                     << " if that still didn't work, TRY IT BY HAND"
                     << " currently PruneStrength= " << alpha
                     << " stepsize " << fDeltaPruneStrength << " " << Endl;
               troubleCount = 0;   // try again
               fPruneStrength = 3; // if it was for the first time..
            } else {
               forceStop = kTRUE;
               Log() << kINFO << "<PruneTree> trouble determining optimal prune strength"
                     << " for Tree " << methodIndex << " at tested prune strength: " << alpha
                     << " --> abort forced, use same strength as for previous tree:"
                     << fPruneStrength << Endl;
            }
         }
         if (fgDebugLevel==1) Log() << kINFO << "Pruned with (" << alpha
                                    << ") give quality: " << q.back()
                                    << " and #nodes: " << nnodes
                                    << Endl;
         delete dcopy;
      }
      if (!forceStop) {
         multimap<Double_t,Double_t>::reverse_iterator it = quality.rend();
         it++;
         fPruneStrength = it->second;
         // adjust the step size for the next tree.. think that 20 steps are sort of
         // fine enough.. could become a tunable option later..
         fDeltaPruneStrength *= Double_t(q.size())/20.;
      }

      fTree->SetPruneStrength(fPruneStrength);
      fTree->PruneTree();
      */
   }
   else {
      fTree->SetPruneStrength(fPruneStrength);
      fTree->PruneTree();
   }

   return fPruneStrength;
}

////////////////////////////////////////////////////////////////////////////////

Double_t TMVA::MethodDT::TestTreeQuality( DecisionTree *dt )
{
   Data()->SetCurrentType(Types::kValidation);
   // test the tree quality.. in terms of misclassification
   Double_t SumCorrect=0, SumWrong=0;
   for (Long64_t ievt=0; ievt<Data()->GetNEvents(); ievt++)
      {
         const Event * ev = Data()->GetEvent(ievt);
         if ((dt->CheckEvent(ev) > dt->GetNodePurityLimit() ) == DataInfo().IsSignal(ev)) SumCorrect+=ev->GetWeight();
         else SumWrong+=ev->GetWeight();
      }
   Data()->SetCurrentType(Types::kTraining);
   return SumCorrect / (SumCorrect + SumWrong);
}
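// The figure of merit above is the weighted fraction of correctly classified
// events, SumCorrect / (SumCorrect + SumWrong), where an event counts as
// "classified signal" when the tree response exceeds the node purity limit.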

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::AddWeightsXMLTo( void* parent ) const
{
   fTree->AddXMLTo(parent);
   //Log() << kFATAL << "Please implement writing of weights as XML" << Endl;
}

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::ReadWeightsFromXML( void* wghtnode )
{
   if (fTree)
      delete fTree;
   fTree = new DecisionTree();
   fTree->ReadXML(wghtnode, GetTrainingTMVAVersionCode());
}

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::ReadWeightsFromStream( std::istream& istr )
{
   delete fTree;
   fTree = new DecisionTree();
   fTree->Read(istr);
}

////////////////////////////////////////////////////////////////////////////////
/// returns MVA value

Double_t TMVA::MethodDT::GetMvaValue( Double_t* err, Double_t* errUpper )
{
   // cannot determine error
   NoErrorCalc(err, errUpper);

   return fTree->CheckEvent(GetEvent(), fUseYesNoLeaf);
}
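// A minimal application sketch (not part of this file; it assumes a weight
// file written by a previous training, with placeholder names for the
// variable and weight file):
//
//    TMVA::Reader* reader = new TMVA::Reader( "!Color:!Silent" );
//    Float_t var1;
//    reader->AddVariable( "var1", &var1 );        // must match the training variables
//    reader->BookMVA( "DT method", "weights/TMVAClassification_DT.weights.xml" );
//    var1 = ...;                                  // fill from the event being evaluated
//    Double_t mvaValue = reader->EvaluateMVA( "DT method" );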

////////////////////////////////////////////////////////////////////////////////

void TMVA::MethodDT::GetHelpMessage() const
{
}

////////////////////////////////////////////////////////////////////////////////

const TMVA::Ranking* TMVA::MethodDT::CreateRanking()
{
   return 0;
}