GClasses::GSupervisedLearner Class Reference

This is the base class of algorithms that learn with supervision and have an internal hypothesis model that allows them to generalize to rows that were not available at training time. More...
#include <GLearner.h>
Public Member Functions  
GSupervisedLearner ()  
General-purpose constructor.  
GSupervisedLearner (GDomNode *pNode, GLearnerLoader &ll)  
Deserialization constructor.  
virtual  ~GSupervisedLearner () 
Destructor.  
void  basicTest (double minAccuracy1, double minAccuracy2, double deviation=1e-6, bool printAccuracy=false) 
This is a helper method used by the unit tests of several model learners.  
void  calibrate (GMatrix &features, GMatrix &labels) 
Calibrate the model to make predicted distributions reflect the training data. This method should be called after train is called, but before the first time predictDistribution is called. Typically, the same matrices passed as parameters to the train method are also passed as parameters to this method. By default, the mean of continuous labels is predicted as accurately as possible, but the variance only reflects a heuristic measure of confidence. If calibrate is called, however, then logistic regression will be used to map from the heuristic variance estimates to the actual variance as measured in the training data, such that the predicted variance becomes more reliable. Likewise with categorical labels, the mode is predicted as accurately as possible, but the distribution of probability among the categories may not be a very good prediction of the actual distribution of probability unless this method has been called to calibrate them. If you never plan to call predictDistribution, there is no reason to ever call this method.  
virtual bool  canGeneralize () 
Returns true because fully supervised learners have an internal model that allows them to generalize previously unseen rows.  
virtual void  clear ()=0 
Discards all training for the purpose of freeing memory. If you call this method, you must train before making any predictions. No settings or options are discarded, so you should be able to train again without specifying any other parameters and still get a comparable model.  
virtual void  clearFeatureFilter () 
Clears the filter for features.  
virtual void  clearLabelFilter () 
Clears the filter for labels.  
void  confusion (GMatrix &features, GMatrix &labels, std::vector< GMatrix * > &stats) 
Generates a confusion matrix containing the total counts of the number of times each value was expected and predicted. (Rows represent target values, and columns represent predicted values.) stats should be an empty vector. This method will resize stats to the number of dimensions in the label vector. The caller is responsible for deleting all of the matrices that this method puts in the vector. For continuous label dimensions, the corresponding entry will be NULL.  
GIncrementalTransform *  featureFilter () 
Returns the current feature filter (or NULL if none has been set).  
GIncrementalTransform *  labelFilter () 
Returns the current label filter (or NULL if none has been set).  
void  precisionRecall (double *pOutPrecision, size_t nPrecisionSize, GMatrix &features, GMatrix &labels, size_t label, size_t nReps) 
label specifies which output to measure. (It should be 0 if there is only one label dimension.) The measurement will be performed nReps times and the results averaged together. nPrecisionSize specifies the number of points at which the function is sampled. pOutPrecision should be an array big enough to hold nPrecisionSize elements for every possible label value. (If the attribute is continuous, it should just be big enough to hold nPrecisionSize elements.)  
void  predict (const double *pIn, double *pOut) 
Evaluate pIn to compute a prediction for pOut. The model must be trained (by calling train) before the first time that this method is called. pIn and pOut should point to arrays of doubles of the same size as the number of columns in the training matrices that were passed to the train method.  
void  predictDistribution (const double *pIn, GPrediction *pOut) 
Evaluate pIn and compute a prediction for pOut. pOut is expected to point to an array of GPrediction objects which have already been allocated. There should be labelDims() elements in this array. The distributions will be more accurate if the model is calibrated before the first time that this method is called.  
const GRelation &  relFeatures () 
Returns a reference to the feature relation (metadata about the input attributes). (Note that this relation describes outer data, and may contain types that are not supported by the inner algorithm.)  
const GRelation &  relLabels () 
Returns a reference to the label relation (metadata about the output attributes). (Note that this relation describes outer data, and may contain types that are not supported by the inner algorithm.)  
virtual GDomNode *  serialize (GDom *pDoc) const =0 
Marshal this object into a DOM that can be converted to a variety of formats. (Implementations of this method should use baseDomNode.)  
double  sumSquaredError (const GMatrix &features, const GMatrix &labels) 
Computes the sum-squared error for predicting the labels from the features. For categorical labels, Hamming distance is used.  
void  train (const GMatrix &features, const GMatrix &labels) 
Call this method to train the model. It automatically determines which filters are needed to convert the training features and labels into a form that the model's training algorithm can handle, and then calls trainInner to do the actual training.  
virtual double  trainAndTest (const GMatrix &trainFeatures, const GMatrix &trainLabels, const GMatrix &testFeatures, const GMatrix &testLabels) 
Trains and tests this learner. Returns the sum-squared error.  
void  wrapFeatures (GIncrementalTransform *pFilter) 
Wrap whatever feature filter is currently set with the specified filter. Takes ownership of the filter.  
void  wrapLabels (GIncrementalTransform *pFilter) 
Wrap whatever label filter is currently set with the specified filter. Takes ownership of the filter.  
Static Public Member Functions  
static void  test () 
Runs some unit tests related to supervised learning. Throws an exception if any problems are found.  
Protected Member Functions  
GDomNode *  baseDomNode (GDom *pDoc, const char *szClassName) const 
Child classes should use this in their implementation of serialize.  
size_t  precisionRecallContinuous (GPrediction *pOutput, double *pFunc, GMatrix &trainFeatures, GMatrix &trainLabels, GMatrix &testFeatures, GMatrix &testLabels, size_t label) 
This is a helper method used by precisionRecall.  
size_t  precisionRecallNominal (GPrediction *pOutput, double *pFunc, GMatrix &trainFeatures, GMatrix &trainLabels, GMatrix &testFeatures, GMatrix &testLabels, size_t label, int value) 
This is a helper method used by precisionRecall.  
virtual void  predictDistributionInner (const double *pIn, GPrediction *pOut)=0 
This is the implementation of the model's prediction algorithm. (This method is called by predictDistribution).  
virtual void  predictInner (const double *pIn, double *pOut)=0 
This is the implementation of the model's prediction algorithm. (This method is called by predict).  
void  setupFilters (const GMatrix &features, const GMatrix &labels) 
This method determines which data filters (normalize, discretize, and/or nominal-to-cat) are needed and trains them.  
double  sumSquaredErrorInternal (const GMatrix &features, const GMatrix &labels) 
Used to measure SSE with data that has already been converted to the internal format.  
virtual void  trainInner (const GMatrix &features, const GMatrix &labels)=0 
This is the implementation of the model's training algorithm. (This method is called by train).  
virtual GMatrix *  transduceInner (const GMatrix &features1, const GMatrix &labels1, const GMatrix &features2) 
See GTransducer::transduce.  
Protected Attributes  
GNeuralNet **  m_pCalibrations 
GIncrementalTransform *  m_pFilterFeatures 
GIncrementalTransform *  m_pFilterLabels 
GRelation *  m_pRelFeatures 
GRelation *  m_pRelLabels 
This is the base class of algorithms that learn with supervision and have an internal hypothesis model that allows them to generalize to rows that were not available at training time.
GClasses::GSupervisedLearner::GSupervisedLearner  (  ) 
General-purpose constructor.
GClasses::GSupervisedLearner::GSupervisedLearner  (  GDomNode *  pNode, 
GLearnerLoader &  ll  
) 
Deserialization constructor.
virtual GClasses::GSupervisedLearner::~GSupervisedLearner  (  )  [virtual] 
Destructor.
GDomNode* GClasses::GSupervisedLearner::baseDomNode  (  GDom *  pDoc, 
const char *  szClassName  
)  const [protected] 
Child classes should use this in their implementation of serialize.
void GClasses::GSupervisedLearner::basicTest  (  double  minAccuracy1, 
double  minAccuracy2,  
double  deviation = 1e-6,  
bool  printAccuracy = false  
) 
This is a helper method used by the unit tests of several model learners.
void GClasses::GSupervisedLearner::calibrate  (  GMatrix &  features, 
GMatrix &  labels  
) 
Calibrate the model to make predicted distributions reflect the training data. This method should be called after train is called, but before the first time predictDistribution is called. Typically, the same matrices passed as parameters to the train method are also passed as parameters to this method. By default, the mean of continuous labels is predicted as accurately as possible, but the variance only reflects a heuristic measure of confidence. If calibrate is called, however, then logistic regression will be used to map from the heuristic variance estimates to the actual variance as measured in the training data, such that the predicted variance becomes more reliable. Likewise with categorical labels, the mode is predicted as accurately as possible, but the distribution of probability among the categories may not be a very good prediction of the actual distribution of probability unless this method has been called to calibrate them. If you never plan to call predictDistribution, there is no reason to ever call this method.
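A typical calibration workflow looks like the following sketch. GDecisionTree stands in for any concrete learner; the data matrices are assumed to be loaded already, and header paths may differ in your installation.

```cpp
#include <GClasses/GMatrix.h>
#include <GClasses/GDecisionTree.h>
using namespace GClasses;

// Sketch: train, calibrate with the same data, then query distributions.
void calibratedWorkflow(GMatrix& features, GMatrix& labels)
{
	GDecisionTree model;
	model.train(features, labels);     // train first
	model.calibrate(features, labels); // then calibrate, before predictDistribution
	GPrediction* pPred = new GPrediction[labels.cols()]; // one slot per label dimension
	model.predictDistribution(features.row(0), pPred);   // distributions are now calibrated
	delete[] pPred;
}
```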
virtual bool GClasses::GSupervisedLearner::canGeneralize  (  )  [inline, virtual] 
Returns true because fully supervised learners have an internal model that allows them to generalize previously unseen rows.
Reimplemented from GClasses::GTransducer.
virtual void GClasses::GSupervisedLearner::clear  (  )  [pure virtual] 
Discards all training for the purpose of freeing memory. If you call this method, you must train before making any predictions. No settings or options are discarded, so you should be able to train again without specifying any other parameters and still get a comparable model.
Implemented in GClasses::GIdentityFunction, GClasses::GBaselineLearner, GClasses::GBucket, GClasses::GWag, GClasses::GNeuralNet, GClasses::GResamplingAdaBoost, GClasses::GSparseInstance, GClasses::GInstanceTable, GClasses::GRandomForest, GClasses::GHingedLinear, GClasses::GMeanMarginsTree, GClasses::GBag, GClasses::GLinearDistribution, GClasses::GKNN, GClasses::GDecisionTree, GClasses::GGaussianProcess, GClasses::GNaiveInstance, GClasses::GNaiveBayes, GClasses::GPolynomial, and GClasses::GLinearRegressor.
virtual void GClasses::GSupervisedLearner::clearFeatureFilter  (  )  [virtual] 
Clears the filter for features.
Reimplemented in GClasses::GReservoirNet.
virtual void GClasses::GSupervisedLearner::clearLabelFilter  (  )  [virtual] 
Clears the filter for labels.
void GClasses::GSupervisedLearner::confusion  (  GMatrix &  features, 
GMatrix &  labels,  
std::vector< GMatrix * > &  stats  
) 
Generates a confusion matrix containing the total counts of the number of times each value was expected and predicted. (Rows represent target values, and columns represent predicted values.) stats should be an empty vector. This method will resize stats to the number of dimensions in the label vector. The caller is responsible for deleting all of the matrices that this method puts in the vector. For continuous label dimensions, the corresponding entry will be NULL.
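A minimal usage sketch, assuming model is an already-trained GSupervisedLearner and features/labels hold test data:

```cpp
#include <vector>

std::vector<GClasses::GMatrix*> stats; // must start empty
model.confusion(features, labels, stats);
for(size_t i = 0; i < stats.size(); i++)
{
	if(stats[i]) // NULL for continuous label dimensions
	{
		// stats[i] rows are target values, columns are predicted values
		delete stats[i]; // the caller owns each matrix
	}
}
```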
GIncrementalTransform* GClasses::GSupervisedLearner::featureFilter  (  ) 
Returns the current feature filter (or NULL if none has been set).
GIncrementalTransform* GClasses::GSupervisedLearner::labelFilter  (  ) 
Returns the current label filter (or NULL if none has been set).
void GClasses::GSupervisedLearner::precisionRecall  (  double *  pOutPrecision, 
size_t  nPrecisionSize,  
GMatrix &  features,  
GMatrix &  labels,  
size_t  label,  
size_t  nReps  
) 
label specifies which output to measure. (It should be 0 if there is only one label dimension.) The measurement will be performed nReps times and the results averaged together. nPrecisionSize specifies the number of points at which the function is sampled. pOutPrecision should be an array big enough to hold nPrecisionSize elements for every possible label value. (If the attribute is continuous, it should just be big enough to hold nPrecisionSize elements.)
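Sizing the output buffer is the main pitfall. A hedged sketch, assuming GRelation::valueCount reports the number of nominal values for a dimension (0 for continuous) and model is already trained:

```cpp
size_t label = 0;            // the output dimension to measure
size_t nPrecisionSize = 100; // sample the curve at 100 points
size_t vals = model.relLabels().valueCount(label);
size_t blocks = (vals == 0 ? 1 : vals); // continuous needs just one block of points
double* pPrecision = new double[nPrecisionSize * blocks];
model.precisionRecall(pPrecision, nPrecisionSize, features, labels, label, 5 /*nReps*/);
delete[] pPrecision;
```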
size_t GClasses::GSupervisedLearner::precisionRecallContinuous  (  GPrediction *  pOutput, 
double *  pFunc,  
GMatrix &  trainFeatures,  
GMatrix &  trainLabels,  
GMatrix &  testFeatures,  
GMatrix &  testLabels,  
size_t  label  
)  [protected] 
This is a helper method used by precisionRecall.
size_t GClasses::GSupervisedLearner::precisionRecallNominal  (  GPrediction *  pOutput, 
double *  pFunc,  
GMatrix &  trainFeatures,  
GMatrix &  trainLabels,  
GMatrix &  testFeatures,  
GMatrix &  testLabels,  
size_t  label,  
int  value  
)  [protected] 
This is a helper method used by precisionRecall.
void GClasses::GSupervisedLearner::predict  (  const double *  pIn, 
double *  pOut  
) 
Evaluate pIn to compute a prediction for pOut. The model must be trained (by calling train) before the first time that this method is called. pIn and pOut should point to arrays of doubles of the same size as the number of columns in the training matrices that were passed to the train method.
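A basic train-then-predict sketch. GDecisionTree stands in for any concrete learner, and the data matrices are assumed to be loaded already:

```cpp
GClasses::GDecisionTree model;
model.train(features, labels); // must precede the first predict call
double* pOut = new double[labels.cols()]; // one value per label column
model.predict(features.row(0), pOut);     // in-sample query, just for illustration
delete[] pOut;
```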
void GClasses::GSupervisedLearner::predictDistribution  (  const double *  pIn, 
GPrediction *  pOut  
) 
Evaluate pIn and compute a prediction for pOut. pOut is expected to point to an array of GPrediction objects which have already been allocated. There should be labelDims() elements in this array. The distributions will be more accurate if the model is calibrated before the first time that this method is called.
virtual void GClasses::GSupervisedLearner::predictDistributionInner  (  const double *  pIn, 
GPrediction *  pOut  
)  [protected, pure virtual] 
This is the implementation of the model's prediction algorithm. (This method is called by predictDistribution).
Implemented in GClasses::GIdentityFunction, GClasses::GBaselineLearner, GClasses::GNeuralNet, GClasses::GBucket, GClasses::GWag, GClasses::GSparseInstance, GClasses::GInstanceTable, GClasses::GRandomForest, GClasses::GHingedLinear, GClasses::GMeanMarginsTree, GClasses::GKNN, GClasses::GLinearDistribution, GClasses::GDecisionTree, GClasses::GEnsemble, GClasses::GGaussianProcess, GClasses::GNaiveInstance, GClasses::GNaiveBayes, GClasses::GLinearRegressor, and GClasses::GPolynomial.
virtual void GClasses::GSupervisedLearner::predictInner  (  const double *  pIn, 
double *  pOut  
)  [protected, pure virtual] 
This is the implementation of the model's prediction algorithm. (This method is called by predict).
Implemented in GClasses::GIdentityFunction, GClasses::GBaselineLearner, GClasses::GNeuralNet, GClasses::GBucket, GClasses::GWag, GClasses::GSparseInstance, GClasses::GInstanceTable, GClasses::GRandomForest, GClasses::GHingedLinear, GClasses::GMeanMarginsTree, GClasses::GKNN, GClasses::GLinearDistribution, GClasses::GDecisionTree, GClasses::GEnsemble, GClasses::GGaussianProcess, GClasses::GNaiveInstance, GClasses::GNaiveBayes, GClasses::GLinearRegressor, and GClasses::GPolynomial.
const GRelation& GClasses::GSupervisedLearner::relFeatures  (  )  [inline] 
Returns a reference to the feature relation (metadata about the input attributes). (Note that this relation describes outer data, and may contain types that are not supported by the inner algorithm.)
const GRelation& GClasses::GSupervisedLearner::relLabels  (  )  [inline] 
Returns a reference to the label relation (metadata about the output attributes). (Note that this relation describes outer data, and may contain types that are not supported by the inner algorithm.)
virtual GDomNode* GClasses::GSupervisedLearner::serialize  (  GDom *  pDoc  )  const [pure virtual] 
Marshal this object into a DOM that can be converted to a variety of formats. (Implementations of this method should use baseDomNode.)
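A serialization sketch for a trained model. The setRoot and saveJson calls are assumptions about the GDom API; check GDom.h for the exact names:

```cpp
GClasses::GDom doc;
GClasses::GDomNode* pRoot = model.serialize(&doc); // marshal the trained model
doc.setRoot(pRoot);
doc.saveJson("model.json"); // GDom can be converted to several formats; JSON shown here
```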
Implemented in GClasses::GReservoirNet, GClasses::GIdentityFunction, GClasses::GBaselineLearner, GClasses::GBucket, GClasses::GWag, GClasses::GResamplingAdaBoost, GClasses::GSparseInstance, GClasses::GBayesianModelCombination, GClasses::GNeuralNet, GClasses::GBayesianModelAveraging, GClasses::GInstanceTable, GClasses::GRandomForest, GClasses::GBomb, GClasses::GHingedLinear, GClasses::GMeanMarginsTree, GClasses::GBag, GClasses::GLinearDistribution, GClasses::GKNN, GClasses::GGaussianProcess, GClasses::GDecisionTree, GClasses::GNaiveInstance, GClasses::GPolynomial, GClasses::GLinearRegressor, and GClasses::GNaiveBayes.
void GClasses::GSupervisedLearner::setupFilters  (  const GMatrix &  features, 
const GMatrix &  labels  
)  [protected] 
This method determines which data filters (normalize, discretize, and/or nominal-to-cat) are needed and trains them.
double GClasses::GSupervisedLearner::sumSquaredError  (  const GMatrix &  features, 
const GMatrix &  labels  
) 
Computes the sum-squared error for predicting the labels from the features. For categorical labels, Hamming distance is used.
double GClasses::GSupervisedLearner::sumSquaredErrorInternal  (  const GMatrix &  features, 
const GMatrix &  labels  
)  [protected] 
Used to measure SSE with data that has already been converted to the internal format.
static void GClasses::GSupervisedLearner::test  (  )  [static] 
Runs some unit tests related to supervised learning. Throws an exception if any problems are found.
Reimplemented in GClasses::GReservoirNet, GClasses::GBaselineLearner, GClasses::GBucket, GClasses::GResamplingAdaBoost, GClasses::GSparseInstance, GClasses::GBayesianModelCombination, GClasses::GNeuralNet, GClasses::GBayesianModelAveraging, GClasses::GRandomForest, GClasses::GBomb, GClasses::GHingedLinear, GClasses::GMeanMarginsTree, GClasses::GBag, GClasses::GLinearDistribution, GClasses::GKNN, GClasses::GGaussianProcess, GClasses::GDecisionTree, GClasses::GNaiveInstance, GClasses::GPolynomial, GClasses::GLinearRegressor, and GClasses::GNaiveBayes.
void GClasses::GSupervisedLearner::train  (  const GMatrix &  features, 
const GMatrix &  labels  
) 
Call this method to train the model. It automatically determines which filters are needed to convert the training features and labels into a form that the model's training algorithm can handle, and then calls trainInner to do the actual training.
virtual double GClasses::GSupervisedLearner::trainAndTest  (  const GMatrix &  trainFeatures, 
const GMatrix &  trainLabels,  
const GMatrix &  testFeatures,  
const GMatrix &  testLabels  
)  [virtual] 
Trains and tests this learner. Returns the sum-squared error.
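Because the error for nominal labels is a Hamming distance (a count of misclassified values), a hold-out accuracy can be derived from the return value. A sketch assuming a single nominal label dimension and pre-split data:

```cpp
double sse = model.trainAndTest(trainFeatures, trainLabels, testFeatures, testLabels);
// With one nominal label column, sse counts the misclassified rows:
double accuracy = 1.0 - sse / testLabels.rows();
```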
Reimplemented from GClasses::GTransducer.
virtual void GClasses::GSupervisedLearner::trainInner  (  const GMatrix &  features, 
const GMatrix &  labels  
)  [protected, pure virtual] 
This is the implementation of the model's training algorithm. (This method is called by train).
Implemented in GClasses::GIdentityFunction, GClasses::GBaselineLearner, GClasses::GNeuralNet, GClasses::GBucket, GClasses::GWag, GClasses::GSparseInstance, GClasses::GInstanceTable, GClasses::GRandomForest, GClasses::GHingedLinear, GClasses::GMeanMarginsTree, GClasses::GKNN, GClasses::GLinearDistribution, GClasses::GDecisionTree, GClasses::GGaussianProcess, GClasses::GEnsemble, GClasses::GNaiveInstance, GClasses::GNaiveBayes, GClasses::GLinearRegressor, and GClasses::GPolynomial.
virtual GMatrix* GClasses::GSupervisedLearner::transduceInner  (  const GMatrix &  features1, 
const GMatrix &  labels1,  
const GMatrix &  features2  
)  [protected, virtual] 
See GTransducer::transduce.
Implements GClasses::GTransducer.
void GClasses::GSupervisedLearner::wrapFeatures  (  GIncrementalTransform *  pFilter  ) 
Wrap whatever feature filter is currently set with the specified filter. Takes ownership of the filter.
void GClasses::GSupervisedLearner::wrapLabels  (  GIncrementalTransform *  pFilter  ) 
Wrap whatever label filter is currently set with the specified filter. Takes ownership of the filter.
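A sketch of injecting an extra preprocessing step in front of whatever filter is currently set. GNormalize is assumed to be one of the library's GIncrementalTransform implementations:

```cpp
// The learner takes ownership, so allocate the filter with new and do not delete it.
model.wrapFeatures(new GClasses::GNormalize());
```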
GNeuralNet** GClasses::GSupervisedLearner::m_pCalibrations [protected] 
GIncrementalTransform* GClasses::GSupervisedLearner::m_pFilterFeatures [protected] 
GIncrementalTransform* GClasses::GSupervisedLearner::m_pFilterLabels [protected] 
GRelation* GClasses::GSupervisedLearner::m_pRelFeatures [protected] 
GRelation* GClasses::GSupervisedLearner::m_pRelLabels [protected] 