GClasses::GBayesianModelAveraging Class Reference

This is an ensemble that uses the bagging approach for training, and Bayesian Model Averaging to combine the models. That is, it trains each model with data drawn randomly with replacement from the original training data. It combines the models with weights proportional to their likelihood as computed using Bayes' law.

#include <GEnsemble.h>

Inheritance diagram for GClasses::GBayesianModelAveraging (each class inherits from the one to its right):
GClasses::GBayesianModelAveraging -> GClasses::GBag -> GClasses::GEnsemble -> GClasses::GSupervisedLearner -> GClasses::GTransducer

Public Member Functions

 GBayesianModelAveraging ()
 General-purpose constructor. See also the comment for GSupervisedLearner::GSupervisedLearner. More...
 
 GBayesianModelAveraging (GDomNode *pNode, GLearnerLoader &ll)
 Deserializing constructor. More...
 
virtual ~GBayesianModelAveraging ()
 
virtual GDomNode * serialize (GDom *pDoc) const
 Marshal this object into a DOM, which can then be converted to a variety of serial formats. More...
 
- Public Member Functions inherited from GClasses::GBag
 GBag ()
 General-purpose constructor. See also the comment for GSupervisedLearner::GSupervisedLearner. More...
 
 GBag (GDomNode *pNode, GLearnerLoader &ll)
 Deserializing constructor. More...
 
virtual ~GBag ()
 
void addLearner (GSupervisedLearner *pLearner)
 Adds a learner to the bag. This takes ownership of pLearner (so it will delete it when it's done with it).
 
virtual void clear ()
 Calls clear on all of the learners, but does not delete them.
 
void flush ()
 Removes and deletes all the learners. More...
 
void setProgressCallback (EnsembleProgressCallback pCB, void *pThis)
 If you want to be notified when another instance begins training, you can set this callback. More...
 
- Public Member Functions inherited from GClasses::GEnsemble
 GEnsemble ()
 General-purpose constructor. See also the comment for GSupervisedLearner::GSupervisedLearner. More...
 
 GEnsemble (GDomNode *pNode, GLearnerLoader &ll)
 Deserializing constructor. More...
 
virtual ~GEnsemble ()
 
void castVote (double weight, const double *pOut)
 Adds the vote from one of the models. (This is called internally. Users typically do not need to call it.) More...
 
std::vector< GWeightedModel * > & models ()
 Returns a reference to the models in the ensemble. More...
 
virtual void predict (const double *pIn, double *pOut)
 See the comment for GSupervisedLearner::predict. More...
 
virtual void predictDistribution (const double *pIn, GPrediction *pOut)
 See the comment for GSupervisedLearner::predictDistribution. More...
 
void setWorkerThreads (size_t count)
 Specifies the number of worker threads to use. If count is 1, no additional threads are spawned, and all of the work is done by the calling thread. If count is 2 or more, that number of worker threads is spawned. (Note that with fast models, the overhead associated with worker threads is often too high to be worthwhile.) The worker threads are spawned when the first prediction is made, and they are kept alive until clear() is called or this object is deleted. If you only want to use worker threads during training, but not when making predictions, you can call this method again to set it back to 1 after training is complete. Because the inheriting class is responsible for implementing the train method, some child classes may not implement multi-threaded training. GBag, GBomb, GBayesianModelAveraging, and GBayesianModelCombination all implement multi-threaded training.
 
- Public Member Functions inherited from GClasses::GSupervisedLearner
 GSupervisedLearner ()
 General-purpose constructor. More...
 
 GSupervisedLearner (GDomNode *pNode, GLearnerLoader &ll)
 Deserialization constructor. More...
 
virtual ~GSupervisedLearner ()
 Destructor. More...
 
void basicTest (double minAccuracy1, double minAccuracy2, double deviation=1e-6, bool printAccuracy=false, double warnRange=0.035)
 This is a helper method used by the unit tests of several model learners. More...
 
virtual bool canGeneralize ()
 Returns true because fully supervised learners have an internal model that allows them to generalize previously unseen rows. More...
 
void confusion (GMatrix &features, GMatrix &labels, std::vector< GMatrix * > &stats)
 Generates a confusion matrix containing the total counts of the number of times each value was expected and predicted. (Rows represent target values, and columns represent predicted values.) stats should be an empty vector. This method will resize stats to the number of dimensions in the label vector. The caller is responsible for deleting all of the matrices that this method puts in the vector. For continuous labels, the value will be NULL.
 
virtual bool isFilter ()
 Returns false. More...
 
void precisionRecall (double *pOutPrecision, size_t nPrecisionSize, GMatrix &features, GMatrix &labels, size_t label, size_t nReps)
 label specifies which output to measure. (It should be 0 if there is only one label dimension.) The measurement is performed nReps times, and the results are averaged together. nPrecisionSize specifies the number of points at which the function is sampled. pOutPrecision should be an array big enough to hold nPrecisionSize elements for every possible label value. (If the attribute is continuous, it should just be big enough to hold nPrecisionSize elements.) If bLocal is true, it computes the local precision instead of the global precision.
 
const GRelation & relFeatures ()
 Returns a reference to the feature relation (meta-data about the input attributes). More...
 
const GRelation & relLabels ()
 Returns a reference to the label relation (meta-data about the output attributes). More...
 
double sumSquaredError (const GMatrix &features, const GMatrix &labels)
 Computes the sum-squared-error for predicting the labels from the features. For categorical labels, Hamming distance is used. More...
 
void train (const GMatrix &features, const GMatrix &labels)
 Call this method to train the model. More...
 
virtual double trainAndTest (const GMatrix &trainFeatures, const GMatrix &trainLabels, const GMatrix &testFeatures, const GMatrix &testLabels)
 Trains and tests this learner. Returns sum-squared-error. More...
 
- Public Member Functions inherited from GClasses::GTransducer
 GTransducer ()
 General-purpose constructor. More...
 
 GTransducer (const GTransducer &that)
 Copy-constructor. Throws an exception to prevent models from being copied by value. More...
 
virtual ~GTransducer ()
 
virtual bool canImplicitlyHandleContinuousFeatures ()
 Returns true iff this algorithm can implicitly handle continuous features. If it cannot, then the GDiscretize transform will be used to convert continuous features to nominal values before passing them to it. More...
 
virtual bool canImplicitlyHandleMissingFeatures ()
 Returns true iff this algorithm supports missing feature values. If it cannot, then an imputation filter will be used to predict missing values before any feature-vectors are passed to the algorithm. More...
 
virtual bool canImplicitlyHandleNominalFeatures ()
 Returns true iff this algorithm can implicitly handle nominal features. If it cannot, then the GNominalToCat transform will be used to convert nominal features to continuous values before passing them to it. More...
 
virtual bool canImplicitlyHandleNominalLabels ()
 Returns true iff this algorithm can implicitly handle nominal labels (a.k.a. classification). If it cannot, then the GNominalToCat transform will be used during training to convert nominal labels to continuous values, and to convert categorical predictions back to nominal labels. More...
 
virtual bool canTrainIncrementally ()
 Returns false because semi-supervised learners cannot be trained incrementally. More...
 
double crossValidate (const GMatrix &features, const GMatrix &labels, size_t nFolds, RepValidateCallback pCB=NULL, size_t nRep=0, void *pThis=NULL)
 Performs n-fold cross validation on the given features and labels. Returns sum-squared error. Uses trainAndTest for each fold. pCB is an optional callback method for reporting intermediate stats. It can be NULL if you don't want intermediate reporting. nRep is just the rep number that will be passed to the callback. pThis is just a pointer that will be passed to the callback for you to use however you want. It doesn't affect this method.
 
GTransducer & operator= (const GTransducer &other)
 Throws an exception to prevent models from being copied by value. More...
 
GRand & rand ()
 Returns a reference to the random number generator associated with this object. For example, you could use it to change the random seed, to make this algorithm behave differently. This might be important, for example, in an ensemble of learners. More...
 
double repValidate (const GMatrix &features, const GMatrix &labels, size_t reps, size_t nFolds, RepValidateCallback pCB=NULL, void *pThis=NULL)
 Performs cross validation reps times and returns the average score. pCB is an optional callback method for reporting intermediate stats. It can be NULL if you don't want intermediate reporting. pThis is just a pointer that will be passed to the callback for you to use however you want. It doesn't affect this method.
 
virtual bool supportedFeatureRange (double *pOutMin, double *pOutMax)
 Returns true if this algorithm supports any feature value, or if it does not implicitly handle continuous features. If a limited range of continuous values is supported, returns false and sets pOutMin and pOutMax to specify the range. More...
 
virtual bool supportedLabelRange (double *pOutMin, double *pOutMax)
 Returns true if this algorithm supports any label value, or if it does not implicitly handle continuous labels. If a limited range of continuous values is supported, returns false and sets pOutMin and pOutMax to specify the range. More...
 
GMatrix * transduce (const GMatrix &features1, const GMatrix &labels1, const GMatrix &features2)
 Predicts a set of labels to correspond with features2, such that these labels will be consistent with the patterns exhibited by features1 and labels1. More...
 
void transductiveConfusionMatrix (const GMatrix &trainFeatures, const GMatrix &trainLabels, const GMatrix &testFeatures, const GMatrix &testLabels, std::vector< GMatrix * > &stats)
 Makes a confusion matrix for a transduction algorithm. More...
 

Static Public Member Functions

static void test ()
 
- Static Public Member Functions inherited from GClasses::GBag
static void test ()
 
- Static Public Member Functions inherited from GClasses::GSupervisedLearner
static void test ()
 Runs some unit tests related to supervised learning. Throws an exception if any problems are found. More...
 

Protected Member Functions

virtual bool canImplicitlyHandleContinuousLabels ()
 See the comment for GTransducer::canImplicitlyHandleContinuousLabels.
 
virtual void determineWeights (GMatrix &features, GMatrix &labels)
 Determines the weights in the manner of Bayesian model averaging, with the assumption of uniform priors. More...
 
- Protected Member Functions inherited from GClasses::GBag
virtual void determineWeights (const GMatrix &features, const GMatrix &labels)
 Assigns uniform weight to all models. (This method is deliberately virtual so that you can override it if you want non-uniform weighting.)
 
virtual void trainInnerInner (const GMatrix &features, const GMatrix &labels)
 See the comment for GEnsemble::trainInnerInner. More...
 
- Protected Member Functions inherited from GClasses::GEnsemble
virtual void clearBase ()
 Calls clear on all of the models, and resets the accumulator buffer. More...
 
void normalizeWeights ()
 Scales the weights of all the models so they sum to 1.0. More...
 
virtual void serializeBase (GDom *pDoc, GDomNode *pNode) const
 Base classes should call this method to serialize the base object as part of their implementation of the serialize method. More...
 
void tally (GPrediction *pOut)
 Counts all the votes from the models in the bag, assuming you are interested in knowing the distribution. More...
 
void tally (double *pOut)
 Counts all the votes from the models in the bag, assuming you only care to know the winner, and do not care about the distribution. More...
 
virtual void trainInner (const GMatrix &features, const GMatrix &labels)
 Sets up the accumulator buffer (ballot box) then calls trainInnerInner. More...
 
- Protected Member Functions inherited from GClasses::GSupervisedLearner
GDomNode * baseDomNode (GDom *pDoc, const char *szClassName) const
 Child classes should use this in their implementation of serialize. More...
 
size_t precisionRecallContinuous (GPrediction *pOutput, double *pFunc, GMatrix &trainFeatures, GMatrix &trainLabels, GMatrix &testFeatures, GMatrix &testLabels, size_t label)
 This is a helper method used by precisionRecall. More...
 
size_t precisionRecallNominal (GPrediction *pOutput, double *pFunc, GMatrix &trainFeatures, GMatrix &trainLabels, GMatrix &testFeatures, GMatrix &testLabels, size_t label, int value)
 This is a helper method used by precisionRecall. More...
 
void setupFilters (const GMatrix &features, const GMatrix &labels)
 This method determines which data filters (normalize, discretize, and/or nominal-to-cat) are needed and trains them. More...
 
virtual GMatrix * transduceInner (const GMatrix &features1, const GMatrix &labels1, const GMatrix &features2)
 See GTransducer::transduce. More...
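
The vote-combining members listed above (castVote, normalizeWeights, and the two tally overloads) can be pictured with a small standalone sketch. It does not use GClasses and is not the library's implementation: the BallotBox class, its member names, and the restriction to a single nominal label are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an accumulator buffer ("ballot box") over the
// possible values of one nominal label. Not part of GClasses.
class BallotBox
{
public:
	explicit BallotBox(std::size_t valueCount) : m_votes(valueCount, 0.0) {}

	// Add one model's vote for a predicted value, scaled by the model's weight.
	void castVote(double weight, std::size_t predictedValue)
	{
		assert(predictedValue < m_votes.size());
		m_votes[predictedValue] += weight;
	}

	// Return the value with the largest accumulated weight (the winner only).
	std::size_t winner() const
	{
		std::size_t best = 0;
		for(std::size_t i = 1; i < m_votes.size(); i++)
		{
			if(m_votes[i] > m_votes[best])
				best = i;
		}
		return best;
	}

	// Return the votes rescaled to sum to 1, i.e. a distribution over values.
	std::vector<double> distribution() const
	{
		double sum = 0.0;
		for(double v : m_votes)
			sum += v;
		std::vector<double> dist(m_votes);
		if(sum > 0.0)
		{
			for(double& v : dist)
				v /= sum;
		}
		return dist;
	}

private:
	std::vector<double> m_votes;
};
```

In this sketch, winner() plays the role of the tally overload that only reports the winning value, and distribution() plays the role of the overload that reports the full distribution.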
 

Additional Inherited Members

- Public Attributes inherited from GClasses::GEnsemble
volatile const double * m_pPredictInput
 
- Protected Attributes inherited from GClasses::GBag
EnsembleProgressCallback m_pCB
 
void * m_pThis
 
double m_trainSize
 
- Protected Attributes inherited from GClasses::GEnsemble
std::vector< GWeightedModel * > m_models
 
size_t m_nAccumulatorDims
 
double * m_pAccumulator
 
GRelation * m_pLabelRel
 
GMasterThread * m_pPredictMaster
 
size_t m_workerThreads
 
- Protected Attributes inherited from GClasses::GSupervisedLearner
GRelation * m_pRelFeatures
 
GRelation * m_pRelLabels
 
- Protected Attributes inherited from GClasses::GTransducer
GRand m_rand
 

Detailed Description

This is an ensemble that uses the bagging approach for training, and Bayesian Model Averaging to combine the models. That is, it trains each model with data drawn randomly with replacement from the original training data. It combines the models with weights proportional to their likelihood as computed using Bayes' law.
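
The "drawn randomly with replacement" step can be illustrated with a minimal standalone sketch. It uses only the C++ standard library rather than GClasses, and the function names bootstrapIndices and countDistinct are hypothetical; how the library itself draws its samples is not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw n row indices uniformly at random, with replacement, from [0, n).
// In a bagged ensemble, each model would be trained on the rows named by
// its own independently drawn index list.
std::vector<std::size_t> bootstrapIndices(std::size_t n, std::mt19937& rng)
{
	assert(n > 0);
	std::uniform_int_distribution<std::size_t> pick(0, n - 1);
	std::vector<std::size_t> indices(n);
	for(std::size_t i = 0; i < n; i++)
		indices[i] = pick(rng);
	return indices;
}

// Count how many distinct rows appear in a bootstrap sample of n rows.
std::size_t countDistinct(const std::vector<std::size_t>& indices, std::size_t n)
{
	std::vector<bool> seen(n, false);
	std::size_t distinct = 0;
	for(std::size_t idx : indices)
	{
		if(!seen[idx])
		{
			seen[idx] = true;
			distinct++;
		}
	}
	return distinct;
}
```

Because rows are drawn with replacement, some rows repeat and others are left out of any given sample, which is what makes the trained models differ from one another.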

Constructor & Destructor Documentation

GClasses::GBayesianModelAveraging::GBayesianModelAveraging ( )
inline

General-purpose constructor. See also the comment for GSupervisedLearner::GSupervisedLearner.

GClasses::GBayesianModelAveraging::GBayesianModelAveraging ( GDomNode * pNode,
GLearnerLoader & ll 
)
inline

Deserializing constructor.

virtual GClasses::GBayesianModelAveraging::~GBayesianModelAveraging ( )
inline virtual

Member Function Documentation

virtual bool GClasses::GBayesianModelAveraging::canImplicitlyHandleContinuousLabels ( )
inline protected virtual

See the comment for GTransducer::canImplicitlyHandleContinuousLabels.

Reimplemented from GClasses::GTransducer.

virtual void GClasses::GBayesianModelAveraging::determineWeights ( GMatrix & features,
GMatrix & labels 
)
protected virtual

Determines the weights in the manner of Bayesian model averaging, with the assumption of uniform priors.
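
As a rough illustration of likelihood-proportional weighting under uniform priors, the following standalone sketch turns per-model log-likelihoods into normalized weights. It is not taken from the GClasses source: the function name bmaWeights is hypothetical, and it assumes each model's log-likelihood on the training data has already been computed by some other means.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// logLikelihoods[i] is assumed to hold log P(data | model i).
// With uniform priors, Bayes' law gives P(model i | data) proportional to
// P(data | model i), so the weights are the likelihoods rescaled to sum to 1.
std::vector<double> bmaWeights(const std::vector<double>& logLikelihoods)
{
	assert(!logLikelihoods.empty());
	// Subtract the maximum before exponentiating so that very negative
	// log-likelihoods do not all underflow to zero.
	double maxLL = *std::max_element(logLikelihoods.begin(), logLikelihoods.end());
	std::vector<double> weights(logLikelihoods.size());
	double sum = 0.0;
	for(std::size_t i = 0; i < weights.size(); i++)
	{
		weights[i] = std::exp(logLikelihoods[i] - maxLL);
		sum += weights[i];
	}
	for(double& w : weights)
		w /= sum; // normalize so the weights sum to 1
	return weights;
}
```

Working in log space is a design choice of this sketch: the likelihood of a whole training set is a product of many small probabilities, so the raw value is often too small to represent directly as a double.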

virtual GDomNode* GClasses::GBayesianModelAveraging::serialize ( GDom * pDoc ) const
virtual

Marshal this object into a DOM, which can then be converted to a variety of serial formats.

Reimplemented from GClasses::GBag.

static void GClasses::GBayesianModelAveraging::test ( )
static