GClasses
GClasses::GSupervisedLearner Class Reference

This is the base class of algorithms that learn with supervision and have an internal hypothesis model that allows them to generalize rows that were not available at training time.

#include <GLearner.h>

Inheritance diagram for GClasses::GSupervisedLearner:
Inherits: GClasses::GTransducer.
Inherited by: GClasses::GBaselineLearner, GClasses::GBucket, GClasses::GDecisionTree, GClasses::GEnsemble, GClasses::GGaussianProcess, GClasses::GIdentityFunction, GClasses::GIncrementalLearner, GClasses::GLinearDistribution, GClasses::GLinearRegressor, GClasses::GMeanMarginsTree, GClasses::GPolynomial, GClasses::GRandomForest, GClasses::GSparseInstance, and GClasses::GWag.


Public Member Functions

 GSupervisedLearner ()
 General-purpose constructor.
 GSupervisedLearner (GDomNode *pNode, GLearnerLoader &ll)
 Deserialization constructor.
virtual ~GSupervisedLearner ()
 Destructor.
void basicTest (double minAccuracy1, double minAccuracy2, double deviation=1e-6, bool printAccuracy=false)
 This is a helper method used by the unit tests of several model learners.
virtual bool canGeneralize ()
 Returns true because fully supervised learners have an internal model that allows them to generalize previously unseen rows.
virtual void clear ()=0
 Discards all training for the purpose of freeing memory. If you call this method, you must train before making any predictions. No settings or options are discarded, so you should be able to train again without specifying any other parameters and still get a comparable model.
void confusion (GMatrix &features, GMatrix &labels, std::vector< GMatrix * > &stats)
 Generates a confusion matrix containing the total counts of the number of times each value was expected and predicted. (Rows represent target values, and columns represent predicted values.) stats should be an empty vector. This method will resize stats to the number of dimensions in the label vector. The caller is responsible for deleting all of the matrices that this method puts in the vector. For continuous labels, the corresponding element will be NULL.
virtual bool isFilter ()
 Returns false.
void precisionRecall (double *pOutPrecision, size_t nPrecisionSize, GMatrix &features, GMatrix &labels, size_t label, size_t nReps)
 label specifies which output to measure. (It should be 0 if there is only one label dimension.) The measurement will be performed nReps times and the results averaged together. nPrecisionSize specifies the number of points at which the function is sampled. pOutPrecision should be an array big enough to hold nPrecisionSize elements for every possible label value. (If the attribute is continuous, it should just be big enough to hold nPrecisionSize elements.) If bLocal is true, it computes the local precision instead of the global precision.
virtual void predict (const double *pIn, double *pOut)=0
 Evaluate pIn to compute a prediction for pOut. The model must be trained (by calling train) before the first time that this method is called. pIn and pOut should point to arrays of doubles whose sizes match the number of columns in the feature and label matrices, respectively, that were passed to the train method.
virtual void predictDistribution (const double *pIn, GPrediction *pOut)=0
 Evaluate pIn and compute a prediction for pOut. pOut is expected to point to an array of GPrediction objects which have already been allocated. There should be labelDims() elements in this array. The distributions will be more accurate if the model is calibrated before the first time that this method is called.
const GRelation & relFeatures ()
 Returns a reference to the feature relation (meta-data about the input attributes).
const GRelation & relLabels ()
 Returns a reference to the label relation (meta-data about the output attributes).
virtual GDomNode * serialize (GDom *pDoc) const =0
 Marshal this object into a DOM that can be converted to a variety of formats. (Implementations of this method should use baseDomNode.)
double sumSquaredError (const GMatrix &features, const GMatrix &labels)
 Computes the sum-squared-error for predicting the labels from the features. For categorical labels, Hamming distance is used.
void train (const GMatrix &features, const GMatrix &labels)
 Call this method to train the model.
virtual double trainAndTest (const GMatrix &trainFeatures, const GMatrix &trainLabels, const GMatrix &testFeatures, const GMatrix &testLabels)
 Trains and tests this learner. Returns sum-squared-error.

Static Public Member Functions

static void test ()
 Runs some unit tests related to supervised learning. Throws an exception if any problems are found.

Protected Member Functions

GDomNode * baseDomNode (GDom *pDoc, const char *szClassName) const
 Child classes should use this in their implementation of serialize.
size_t precisionRecallContinuous (GPrediction *pOutput, double *pFunc, GMatrix &trainFeatures, GMatrix &trainLabels, GMatrix &testFeatures, GMatrix &testLabels, size_t label)
 This is a helper method used by precisionRecall.
size_t precisionRecallNominal (GPrediction *pOutput, double *pFunc, GMatrix &trainFeatures, GMatrix &trainLabels, GMatrix &testFeatures, GMatrix &testLabels, size_t label, int value)
 This is a helper method used by precisionRecall.
void setupFilters (const GMatrix &features, const GMatrix &labels)
 This method determines which data filters (normalize, discretize, and/or nominal-to-cat) are needed and trains them.
virtual void trainInner (const GMatrix &features, const GMatrix &labels)=0
 This is the implementation of the model's training algorithm. (This method is called by train).
virtual GMatrix * transduceInner (const GMatrix &features1, const GMatrix &labels1, const GMatrix &features2)
 See GTransducer::transduce.

Protected Attributes

GRelation * m_pRelFeatures
GRelation * m_pRelLabels

Detailed Description

This is the base class of algorithms that learn with supervision and have an internal hypothesis model that allows them to generalize rows that were not available at training time.


Constructor & Destructor Documentation

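GClasses::GSupervisedLearner::GSupervisedLearner ( )

General-purpose constructor.

GClasses::GSupervisedLearner::GSupervisedLearner ( GDomNode *  pNode,
GLearnerLoader &  ll 
)

Deserialization constructor.

virtual GClasses::GSupervisedLearner::~GSupervisedLearner ( ) [virtual]

Destructor.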


Member Function Documentation

GDomNode* GClasses::GSupervisedLearner::baseDomNode ( GDom *  pDoc,
const char *  szClassName 
) const [protected]

Child classes should use this in their implementation of serialize.
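The division of labor between serialize and baseDomNode can be sketched without the GClasses DOM types. In this minimal example, NodeSketch (a std::map) stands in for GDomNode, and BaseSketch, ChildSketch, baseNode, and the "class" and "k" keys are all hypothetical names chosen for illustration. What the real baseDomNode stores in the node is not described on this page, so the "class" entry is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for GDomNode: a flat map of field names to values.
using NodeSketch = std::map<std::string, std::string>;

class BaseSketch {
public:
    virtual ~BaseSketch() {}

    // Each concrete learner must provide its own serialization.
    virtual NodeSketch serialize() const = 0;

protected:
    // Plays the role of baseDomNode: creates the node and records the
    // information common to every subclass (assumed here to be the class name).
    NodeSketch baseNode(const char* szClassName) const {
        assert(szClassName != nullptr);
        NodeSketch node;
        node["class"] = szClassName;
        return node;
    }
};

class ChildSketch : public BaseSketch {
public:
    explicit ChildSketch(int k) : m_k(k) {}

    // Starts from the base node, then appends the subclass-specific field.
    NodeSketch serialize() const override {
        NodeSketch node = baseNode("ChildSketch");
        node["k"] = std::to_string(m_k);
        return node;
    }

private:
    int m_k;
};
```

Starting from the base node keeps the shared fields in one place, so a subclass only has to add what is specific to it.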

void GClasses::GSupervisedLearner::basicTest ( double  minAccuracy1,
double  minAccuracy2,
double  deviation = 1e-6,
bool  printAccuracy = false 
)

This is a helper method used by the unit tests of several model learners.

virtual bool GClasses::GSupervisedLearner::canGeneralize ( ) [inline, virtual]

Returns true because fully supervised learners have an internal model that allows them to generalize previously unseen rows.

Reimplemented from GClasses::GTransducer.

virtual void GClasses::GSupervisedLearner::clear ( ) [pure virtual]

Discards all training for the purpose of freeing memory. If you call this method, you must train before making any predictions. No settings or options are discarded, so you should be able to train again without specifying any other parameters and still get a comparable model.
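The contract described above (the model is discarded, the settings survive, and training is required again before predicting) can be illustrated with a small stand-in class. MeanSketch, its offset setting, and the exception thrown on an untrained predict are assumptions made for this sketch; the page does not say how a real learner reacts to predicting after clear.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical toy learner: predicts the mean training label plus a
// user-configured offset. Not part of GClasses.
class MeanSketch {
public:
    explicit MeanSketch(double offset) : m_offset(offset) {}

    // Learns the mean of the labels.
    void train(const std::vector<double>& labels) {
        assert(!labels.empty());
        double sum = 0.0;
        for (double v : labels) sum += v;
        m_mean = sum / static_cast<double>(labels.size());
        m_trained = true;
    }

    // Discards the learned model but leaves the offset setting untouched.
    void clear() {
        m_mean = 0.0;
        m_trained = false;
    }

    // Assumption for this sketch: predicting without a model is an error.
    double predict() const {
        if (!m_trained)
            throw std::logic_error("train must be called before predict");
        return m_mean + m_offset;
    }

private:
    double m_offset;        // setting: survives clear()
    double m_mean = 0.0;    // learned model: discarded by clear()
    bool m_trained = false;
};
```

Retraining on the same data after clear yields the same prediction here because the offset was never reset, which mirrors the "comparable model" guarantee in the description.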

Implemented in GClasses::GReservoirNet, GClasses::GNeuralNet, GClasses::GIdentityFunction, GClasses::GBaselineLearner, GClasses::GFilter, GClasses::GBucket, GClasses::GWag, GClasses::GResamplingAdaBoost, GClasses::GSparseInstance, GClasses::GInstanceTable, GClasses::GRandomForest, GClasses::GMeanMarginsTree, GClasses::GBag, GClasses::GLinearDistribution, GClasses::GKNN, GClasses::GDecisionTree, GClasses::GGaussianProcess, GClasses::GNaiveInstance, GClasses::GNaiveBayes, GClasses::GPolynomial, and GClasses::GLinearRegressor.

void GClasses::GSupervisedLearner::confusion ( GMatrix &  features,
GMatrix &  labels,
std::vector< GMatrix * > &  stats 
)

Generates a confusion matrix containing the total counts of the number of times each value was expected and predicted. (Rows represent target values, and columns represent predicted values.) stats should be an empty vector. This method will resize stats to the number of dimensions in the label vector. The caller is responsible for deleting all of the matrices that this method puts in the vector. For continuous labels, the corresponding element will be NULL.
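The row and column convention can be made concrete with a self-contained sketch for a single nominal label dimension. It does not use GMatrix; buildConfusion and its parameters are hypothetical names, and labels are assumed to be category indices in the range [0, numValues).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of GClasses): counts how often each target
// value was paired with each predicted value.
// Rows are indexed by target value, columns by predicted value.
std::vector<std::vector<std::size_t>> buildConfusion(
    const std::vector<std::size_t>& targets,
    const std::vector<std::size_t>& predictions,
    std::size_t numValues)
{
    assert(targets.size() == predictions.size());
    std::vector<std::vector<std::size_t>> counts(
        numValues, std::vector<std::size_t>(numValues, 0));
    for (std::size_t i = 0; i < targets.size(); ++i) {
        assert(targets[i] < numValues && predictions[i] < numValues);
        ++counts[targets[i]][predictions[i]];
    }
    return counts;
}
```

Entries on the diagonal count correct predictions, and every off-diagonal entry counts one kind of misclassification. The sketch returns its result by value, whereas the real method hands back heap-allocated matrices that the caller must delete (skipping the NULL entries for continuous labels).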

virtual bool GClasses::GSupervisedLearner::isFilter ( ) [inline, virtual]

Returns false.

Reimplemented in GClasses::GFilter, and GClasses::GIncrementalLearner.

void GClasses::GSupervisedLearner::precisionRecall ( double *  pOutPrecision,
size_t  nPrecisionSize,
GMatrix &  features,
GMatrix &  labels,
size_t  label,
size_t  nReps 
)

label specifies which output to measure. (It should be 0 if there is only one label dimension.) The measurement will be performed nReps times and the results averaged together. nPrecisionSize specifies the number of points at which the function is sampled. pOutPrecision should be an array big enough to hold nPrecisionSize elements for every possible label value. (If the attribute is continuous, it should just be big enough to hold nPrecisionSize elements.) If bLocal is true, it computes the local precision instead of the global precision.

size_t GClasses::GSupervisedLearner::precisionRecallContinuous ( GPrediction *  pOutput,
double *  pFunc,
GMatrix &  trainFeatures,
GMatrix &  trainLabels,
GMatrix &  testFeatures,
GMatrix &  testLabels,
size_t  label 
) [protected]

This is a helper method used by precisionRecall.

size_t GClasses::GSupervisedLearner::precisionRecallNominal ( GPrediction *  pOutput,
double *  pFunc,
GMatrix &  trainFeatures,
GMatrix &  trainLabels,
GMatrix &  testFeatures,
GMatrix &  testLabels,
size_t  label,
int  value 
) [protected]

This is a helper method used by precisionRecall.

virtual void GClasses::GSupervisedLearner::predict ( const double *  pIn,
double *  pOut 
) [pure virtual]

Evaluate pIn to compute a prediction for pOut. The model must be trained (by calling train) before the first time that this method is called. pIn and pOut should point to arrays of doubles whose sizes match the number of columns in the feature and label matrices, respectively, that were passed to the train method.
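The buffer-size requirement is easiest to see from the caller's side. The sketch below uses a hypothetical stand-in, ColumnMeanSketch, whose predict has the same pointer-based shape; featureDims and the use of nested std::vector in place of GMatrix are assumptions of this example, not GClasses API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy learner (not part of GClasses): always predicts the mean
// of each label column seen during training.
class ColumnMeanSketch {
public:
    // Records the feature column count and the mean of each label column.
    void train(const std::vector<std::vector<double>>& features,
               const std::vector<std::vector<double>>& labels) {
        assert(!features.empty() && features.size() == labels.size());
        m_featureDims = features[0].size();
        m_means.assign(labels[0].size(), 0.0);
        const double n = static_cast<double>(labels.size());
        for (const std::vector<double>& row : labels)
            for (std::size_t j = 0; j < m_means.size(); ++j)
                m_means[j] += row[j] / n;
    }

    std::size_t featureDims() const { return m_featureDims; }
    std::size_t labelDims() const { return m_means.size(); }

    // pIn must hold featureDims() doubles; pOut must hold labelDims() doubles.
    void predict(const double* pIn, double* pOut) const {
        (void)pIn; // this toy model ignores its input
        for (std::size_t j = 0; j < m_means.size(); ++j)
            pOut[j] = m_means[j];
    }

private:
    std::size_t m_featureDims = 0;
    std::vector<double> m_means;
};
```

A caller would size the two buffers from the training data, for example `std::vector<double> in(model.featureDims()), out(model.labelDims());`, and pass `in.data()` and `out.data()` to predict.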

Implemented in GClasses::GReservoirNet, GClasses::GNeuralNet, GClasses::GIdentityFunction, GClasses::GBaselineLearner, GClasses::GCalibrator, GClasses::GAutoFilter, GClasses::GLabelFilter, GClasses::GFeatureFilter, GClasses::GBucket, GClasses::GWag, GClasses::GSparseInstance, GClasses::GInstanceTable, GClasses::GRandomForest, GClasses::GMeanMarginsTree, GClasses::GLinearDistribution, GClasses::GDecisionTree, GClasses::GGaussianProcess, GClasses::GKNN, GClasses::GEnsemble, GClasses::GNaiveInstance, GClasses::GNaiveBayes, GClasses::GLinearRegressor, and GClasses::GPolynomial.

virtual void GClasses::GSupervisedLearner::predictDistribution ( const double *  pIn,
GPrediction *  pOut 
) [pure virtual]

Evaluate pIn and compute a prediction for pOut. pOut is expected to point to an array of GPrediction objects which have already been allocated. There should be labelDims() elements in this array. The distributions will be more accurate if the model is calibrated before the first time that this method is called.

Implemented in GClasses::GReservoirNet, GClasses::GNeuralNet, GClasses::GIdentityFunction, GClasses::GBaselineLearner, GClasses::GCalibrator, GClasses::GAutoFilter, GClasses::GLabelFilter, GClasses::GFeatureFilter, GClasses::GBucket, GClasses::GWag, GClasses::GSparseInstance, GClasses::GInstanceTable, GClasses::GRandomForest, GClasses::GMeanMarginsTree, GClasses::GLinearDistribution, GClasses::GDecisionTree, GClasses::GGaussianProcess, GClasses::GKNN, GClasses::GEnsemble, GClasses::GNaiveInstance, GClasses::GNaiveBayes, GClasses::GLinearRegressor, and GClasses::GPolynomial.

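const GRelation& GClasses::GSupervisedLearner::relFeatures ( )

Returns a reference to the feature relation (meta-data about the input attributes).

const GRelation& GClasses::GSupervisedLearner::relLabels ( )

Returns a reference to the label relation (meta-data about the output attributes).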

void GClasses::GSupervisedLearner::setupFilters ( const GMatrix &  features,
const GMatrix &  labels 
) [protected]

This method determines which data filters (normalize, discretize, and/or nominal-to-cat) are needed and trains them.

double GClasses::GSupervisedLearner::sumSquaredError ( const GMatrix &  features,
const GMatrix &  labels 
)

Computes the sum-squared-error for predicting the labels from the features. For categorical labels, Hamming distance is used.
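How continuous and categorical label dimensions could be combined into one error value is sketched below. The function name sumSquaredErrorSketch, the isCategorical flags, and the nested std::vector representation are assumptions of this example; the only behavior taken from the description is that continuous values contribute their squared difference and categorical values contribute a Hamming distance (0 for a match, 1 for a mismatch).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not the GClasses implementation). isCategorical[c]
// stands in for the label meta-data and marks column c as categorical.
double sumSquaredErrorSketch(const std::vector<std::vector<double>>& targets,
                             const std::vector<std::vector<double>>& predictions,
                             const std::vector<bool>& isCategorical)
{
    assert(targets.size() == predictions.size());
    double sse = 0.0;
    for (std::size_t r = 0; r < targets.size(); ++r) {
        assert(targets[r].size() == isCategorical.size());
        assert(predictions[r].size() == isCategorical.size());
        for (std::size_t c = 0; c < isCategorical.size(); ++c) {
            if (isCategorical[c]) {
                // Hamming distance: a mismatched category costs exactly 1.
                if (targets[r][c] != predictions[r][c])
                    sse += 1.0;
            } else {
                // Continuous value: squared difference.
                const double d = targets[r][c] - predictions[r][c];
                sse += d * d;
            }
        }
    }
    return sse;
}
```

Because a categorical mismatch always costs 1 regardless of which wrong category was predicted, the result for purely categorical labels is simply the number of misclassified values.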

void GClasses::GSupervisedLearner::train ( const GMatrix &  features,
const GMatrix &  labels 
)

Call this method to train the model.
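The split between the public, non-virtual train and the protected, pure virtual trainInner follows a common pattern: the base class owns the entry point and subclasses supply only the algorithm. A minimal sketch of that pattern, with hypothetical names (MatrixSketch, LearnerSketch, FirstLabelMean), is shown here. The row-count check inside train is an assumption of the sketch; this page does not say what the real train does before delegating.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for GMatrix: one inner vector per row.
using MatrixSketch = std::vector<std::vector<double>>;

class LearnerSketch {
public:
    virtual ~LearnerSketch() {}

    // Public entry point: performs shared checks, then delegates.
    void train(const MatrixSketch& features, const MatrixSketch& labels) {
        if (features.size() != labels.size())
            throw std::invalid_argument(
                "features and labels need the same number of rows");
        trainInner(features, labels);
    }

protected:
    // Subclasses implement only the training algorithm itself.
    virtual void trainInner(const MatrixSketch& features,
                            const MatrixSketch& labels) = 0;
};

// Toy subclass: learns the mean of the first label column.
class FirstLabelMean : public LearnerSketch {
public:
    double mean() const { return m_mean; }

protected:
    void trainInner(const MatrixSketch& features,
                    const MatrixSketch& labels) override {
        (void)features; // this toy model ignores the features
        double sum = 0.0;
        for (const std::vector<double>& row : labels)
            sum += row.at(0);
        m_mean = labels.empty()
            ? 0.0
            : sum / static_cast<double>(labels.size());
    }

private:
    double m_mean = 0.0;
};
```

Callers only ever see train, so any shared preparation added to the base class later applies to every subclass without changes to their code.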

virtual double GClasses::GSupervisedLearner::trainAndTest ( const GMatrix &  trainFeatures,
const GMatrix &  trainLabels,
const GMatrix &  testFeatures,
const GMatrix &  testLabels 
) [virtual]

Trains and tests this learner. Returns sum-squared-error.

Reimplemented from GClasses::GTransducer.

virtual GMatrix* GClasses::GSupervisedLearner::transduceInner ( const GMatrix &  features1,
const GMatrix &  labels1,
const GMatrix &  features2 
) [protected, virtual]

See GTransducer::transduce.

Member Data Documentation