GClasses::GSupervisedLearner Class Reference

This is the base class of algorithms that learn with supervision and have an internal hypothesis model that allows them to generalize to rows that were not available at training time.
#include <GLearner.h>
Public Member Functions
GSupervisedLearner ()
General-purpose constructor.
GSupervisedLearner (GDomNode *pNode, GLearnerLoader &ll)
Deserialization constructor.
virtual ~GSupervisedLearner ()
Destructor.
void basicTest (double minAccuracy1, double minAccuracy2, double deviation=1e-6, bool printAccuracy=false, double warnRange=0.035)
This is a helper method used by the unit tests of several model learners.
virtual bool canGeneralize ()
Returns true, because fully supervised learners have an internal model that allows them to generalize to previously unseen rows.
virtual void clear ()=0
Discards all training for the purpose of freeing memory. If you call this method, you must train before making any predictions. No settings or options are discarded, so you should be able to train again without specifying any other parameters and still get a comparable model.
void confusion (GMatrix &features, GMatrix &labels, std::vector< GMatrix * > &stats)
Generates a confusion matrix containing the total counts of the number of times each value was expected and predicted. (Rows represent target values, and columns represent predicted values.) stats should be an empty vector. This method will resize stats to the number of dimensions in the label vector. The caller is responsible for deleting all of the matrices that it puts in this vector. For continuous labels, the value will be NULL.
virtual bool isFilter ()
Returns false.
void precisionRecall (double *pOutPrecision, size_t nPrecisionSize, GMatrix &features, GMatrix &labels, size_t label, size_t nReps)
label specifies which output to measure. (It should be 0 if there is only one label dimension.) The measurement will be performed nReps times and the results averaged together. nPrecisionSize specifies the number of points at which the function is sampled. pOutPrecision should be an array big enough to hold nPrecisionSize elements for every possible label value. (If the attribute is continuous, it should just be big enough to hold nPrecisionSize elements.)
virtual void predict (const double *pIn, double *pOut)=0
Evaluate pIn to compute a prediction for pOut. The model must be trained (by calling train) before the first time that this method is called. pIn and pOut should point to arrays of doubles of the same size as the number of columns in the training matrices that were passed to the train method.
virtual void predictDistribution (const double *pIn, GPrediction *pOut)=0
Evaluate pIn and compute a prediction for pOut. pOut is expected to point to an array of GPrediction objects which have already been allocated. There should be labelDims() elements in this array. The distributions will be more accurate if the model is calibrated before the first time that this method is called.
const GRelation & relFeatures ()
Returns a reference to the feature relation (metadata about the input attributes).
const GRelation & relLabels ()
Returns a reference to the label relation (metadata about the output attributes).
virtual GDomNode * serialize (GDom *pDoc) const =0
Marshal this object into a DOM that can be converted to a variety of formats. (Implementations of this method should use baseDomNode.)
double sumSquaredError (const GMatrix &features, const GMatrix &labels)
Computes the sum-squared error for predicting the labels from the features. For categorical labels, Hamming distance is used.
void train (const GMatrix &features, const GMatrix &labels)
Call this method to train the model.
virtual double trainAndTest (const GMatrix &trainFeatures, const GMatrix &trainLabels, const GMatrix &testFeatures, const GMatrix &testLabels)
Trains and tests this learner. Returns the sum-squared error.
Public Member Functions inherited from GClasses::GTransducer
GTransducer ()
General-purpose constructor.
GTransducer (const GTransducer &that)
Copy constructor. Throws an exception to prevent models from being copied by value.
virtual ~GTransducer ()
virtual bool canImplicitlyHandleContinuousFeatures ()
Returns true iff this algorithm can implicitly handle continuous features. If it cannot, then the GDiscretize transform will be used to convert continuous features to nominal values before passing them to it.
virtual bool canImplicitlyHandleContinuousLabels ()
Returns true iff this algorithm can implicitly handle continuous labels (a.k.a. regression). If it cannot, then the GDiscretize transform will be used during training to convert continuous labels to nominal values, and to convert nominal predictions back to continuous labels.
virtual bool canImplicitlyHandleMissingFeatures ()
Returns true iff this algorithm supports missing feature values. If it cannot, then an imputation filter will be used to predict missing values before any feature vectors are passed to the algorithm.
virtual bool canImplicitlyHandleNominalFeatures ()
Returns true iff this algorithm can implicitly handle nominal features. If it cannot, then the GNominalToCat transform will be used to convert nominal features to continuous values before passing them to it.
virtual bool canImplicitlyHandleNominalLabels ()
Returns true iff this algorithm can implicitly handle nominal labels (a.k.a. classification). If it cannot, then the GNominalToCat transform will be used during training to convert nominal labels to continuous values, and to convert continuous predictions back to nominal labels.
virtual bool canTrainIncrementally ()
Returns false, because semi-supervised learners cannot be trained incrementally.
double crossValidate (const GMatrix &features, const GMatrix &labels, size_t nFolds, RepValidateCallback pCB=NULL, size_t nRep=0, void *pThis=NULL)
Performs n-fold cross-validation on the given features and labels. Returns the sum-squared error. Uses trainAndTest for each fold. pCB is an optional callback method for reporting intermediate stats. It can be NULL if you don't want intermediate reporting. nRep is just the rep number that will be passed to the callback. pThis is just a pointer that will be passed to the callback for you to use however you want. It does not affect this method.
GTransducer & operator= (const GTransducer &other)
Throws an exception to prevent models from being copied by value.
GRand & rand ()
Returns a reference to the random number generator associated with this object. For example, you could use it to change the random seed to make this algorithm behave differently. This might be important, for example, in an ensemble of learners.
double repValidate (const GMatrix &features, const GMatrix &labels, size_t reps, size_t nFolds, RepValidateCallback pCB=NULL, void *pThis=NULL)
Performs cross-validation "reps" times and returns the average score. pCB is an optional callback method for reporting intermediate stats. It can be NULL if you don't want intermediate reporting. pThis is just a pointer that will be passed to the callback for you to use however you want. It does not affect this method.
virtual bool supportedFeatureRange (double *pOutMin, double *pOutMax)
Returns true if this algorithm supports any feature value, or if it does not implicitly handle continuous features. If a limited range of continuous values is supported, returns false and sets pOutMin and pOutMax to specify the range.
virtual bool supportedLabelRange (double *pOutMin, double *pOutMax)
Returns true if this algorithm supports any label value, or if it does not implicitly handle continuous labels. If a limited range of continuous values is supported, returns false and sets pOutMin and pOutMax to specify the range.
GMatrix * transduce (const GMatrix &features1, const GMatrix &labels1, const GMatrix &features2)
Predicts a set of labels to correspond with features2, such that these labels will be consistent with the patterns exhibited by features1 and labels1.
void transductiveConfusionMatrix (const GMatrix &trainFeatures, const GMatrix &trainLabels, const GMatrix &testFeatures, const GMatrix &testLabels, std::vector< GMatrix * > &stats)
Makes a confusion matrix for a transduction algorithm.
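The fold bookkeeping behind crossValidate can be sketched in a self-contained way. This is not the GClasses implementation: the real trainAndTest call is stood in for by a plain callback, and the round-robin fold assignment is an assumption for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for trainAndTest: receives the row indices of the training
// and test sets for one fold and returns that fold's sum-squared error.
typedef double (*FoldCallback)(const std::vector<size_t>& trainRows,
                               const std::vector<size_t>& testRows);

// Hold each fold out once as the test set, train on the rest, and
// accumulate the per-fold error, as crossValidate's description states.
double crossValidateSketch(size_t rowCount, size_t nFolds, FoldCallback fold)
{
    double sse = 0.0;
    for(size_t f = 0; f < nFolds; f++)
    {
        std::vector<size_t> trainRows;
        std::vector<size_t> testRows;
        for(size_t i = 0; i < rowCount; i++)
        {
            if(i % nFolds == f)
                testRows.push_back(i);  // this row is held out in fold f
            else
                trainRows.push_back(i); // this row is trained on
        }
        sse += fold(trainRows, testRows);
    }
    return sse;
}
```

Each row lands in exactly one test fold, so the accumulated value is an error total over all rows, which is consistent with "Returns sum-squared error."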
Static Public Member Functions
static void test ()
Runs some unit tests related to supervised learning. Throws an exception if any problems are found.
Protected Member Functions
GDomNode * baseDomNode (GDom *pDoc, const char *szClassName) const
Child classes should use this in their implementation of serialize.
size_t precisionRecallContinuous (GPrediction *pOutput, double *pFunc, GMatrix &trainFeatures, GMatrix &trainLabels, GMatrix &testFeatures, GMatrix &testLabels, size_t label)
This is a helper method used by precisionRecall.
size_t precisionRecallNominal (GPrediction *pOutput, double *pFunc, GMatrix &trainFeatures, GMatrix &trainLabels, GMatrix &testFeatures, GMatrix &testLabels, size_t label, int value)
This is a helper method used by precisionRecall.
void setupFilters (const GMatrix &features, const GMatrix &labels)
This method determines which data filters (normalize, discretize, and/or nominal-to-cat) are needed and trains them.
virtual void trainInner (const GMatrix &features, const GMatrix &labels)=0
This is the implementation of the model's training algorithm. (This method is called by train.)
virtual GMatrix * transduceInner (const GMatrix &features1, const GMatrix &labels1, const GMatrix &features2)
See GTransducer::transduce.
Protected Attributes  
GRelation *  m_pRelFeatures 
GRelation *  m_pRelLabels 
Protected Attributes inherited from GClasses::GTransducer  
GRand  m_rand 
Detailed Description

This is the base class of algorithms that learn with supervision and have an internal hypothesis model that allows them to generalize to rows that were not available at training time.
GClasses::GSupervisedLearner::GSupervisedLearner  (  ) 
General-purpose constructor.
GClasses::GSupervisedLearner::GSupervisedLearner (GDomNode *pNode, GLearnerLoader &ll)
Deserialization constructor.

virtual GClasses::GSupervisedLearner::~GSupervisedLearner () [virtual]
Destructor.

GDomNode* GClasses::GSupervisedLearner::baseDomNode (GDom *pDoc, const char *szClassName) const [protected]
Child classes should use this in their implementation of serialize.
void GClasses::GSupervisedLearner::basicTest (double minAccuracy1, double minAccuracy2, double deviation = 1e-6, bool printAccuracy = false, double warnRange = 0.035)
This is a helper method used by the unit tests of several model learners.

virtual bool GClasses::GSupervisedLearner::canGeneralize () [inline, virtual]
Returns true, because fully supervised learners have an internal model that allows them to generalize to previously unseen rows.
Reimplemented from GClasses::GTransducer.

virtual void GClasses::GSupervisedLearner::clear () [pure virtual]
Discards all training for the purpose of freeing memory. If you call this method, you must train before making any predictions. No settings or options are discarded, so you should be able to train again without specifying any other parameters and still get a comparable model.
Implemented in GClasses::GReservoirNet, GClasses::GNeuralNet, GClasses::GIdentityFunction, GClasses::GBaselineLearner, GClasses::GFilter, GClasses::GBucket, GClasses::GWag, GClasses::GResamplingAdaBoost, GClasses::GSparseInstance, GClasses::GInstanceTable, GClasses::GRandomForest, GClasses::GMeanMarginsTree, GClasses::GBag, GClasses::GLinearDistribution, GClasses::GKNN, GClasses::GDecisionTree, GClasses::GGaussianProcess, GClasses::GNaiveInstance, GClasses::GNaiveBayes, GClasses::GPolynomial, and GClasses::GLinearRegressor.
void GClasses::GSupervisedLearner::confusion (GMatrix &features, GMatrix &labels, std::vector< GMatrix * > &stats)
Generates a confusion matrix containing the total counts of the number of times each value was expected and predicted. (Rows represent target values, and columns represent predicted values.) stats should be an empty vector. This method will resize stats to the number of dimensions in the label vector. The caller is responsible for deleting all of the matrices that it puts in this vector. For continuous labels, the value will be NULL.
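The row/column convention described above can be illustrated with a self-contained sketch for a single nominal label dimension. This is not the GClasses code; plain vectors of value indices stand in for GMatrix rows.

```cpp
#include <cstddef>
#include <vector>

// Count how often each (expected, predicted) pair occurs.
// Rows index the target value and columns the predicted value,
// matching the convention stated in confusion's description.
std::vector<std::vector<size_t> > makeConfusion(
    const std::vector<size_t>& expected,
    const std::vector<size_t>& predicted,
    size_t valueCount)
{
    std::vector<std::vector<size_t> > m(
        valueCount, std::vector<size_t>(valueCount, 0));
    for(size_t i = 0; i < expected.size(); i++)
        m[expected[i]][predicted[i]]++; // row = target, column = prediction
    return m;
}
```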

virtual bool GClasses::GSupervisedLearner::isFilter () [inline, virtual]
Returns false.
Reimplemented in GClasses::GFilter, and GClasses::GIncrementalLearner.
void GClasses::GSupervisedLearner::precisionRecall (double *pOutPrecision, size_t nPrecisionSize, GMatrix &features, GMatrix &labels, size_t label, size_t nReps)
label specifies which output to measure. (It should be 0 if there is only one label dimension.) The measurement will be performed nReps times and the results averaged together. nPrecisionSize specifies the number of points at which the function is sampled. pOutPrecision should be an array big enough to hold nPrecisionSize elements for every possible label value. (If the attribute is continuous, it should just be big enough to hold nPrecisionSize elements.)

size_t GClasses::GSupervisedLearner::precisionRecallContinuous (GPrediction *pOutput, double *pFunc, GMatrix &trainFeatures, GMatrix &trainLabels, GMatrix &testFeatures, GMatrix &testLabels, size_t label) [protected]
This is a helper method used by precisionRecall.

size_t GClasses::GSupervisedLearner::precisionRecallNominal (GPrediction *pOutput, double *pFunc, GMatrix &trainFeatures, GMatrix &trainLabels, GMatrix &testFeatures, GMatrix &testLabels, size_t label, int value) [protected]
This is a helper method used by precisionRecall.

virtual void GClasses::GSupervisedLearner::predict (const double *pIn, double *pOut) [pure virtual]
Evaluate pIn to compute a prediction for pOut. The model must be trained (by calling train) before the first time that this method is called. pIn and pOut should point to arrays of doubles of the same size as the number of columns in the training matrices that were passed to the train method.
Implemented in GClasses::GReservoirNet, GClasses::GNeuralNet, GClasses::GIdentityFunction, GClasses::GBaselineLearner, GClasses::GCalibrator, GClasses::GAutoFilter, GClasses::GLabelFilter, GClasses::GFeatureFilter, GClasses::GBucket, GClasses::GWag, GClasses::GSparseInstance, GClasses::GInstanceTable, GClasses::GRandomForest, GClasses::GMeanMarginsTree, GClasses::GLinearDistribution, GClasses::GDecisionTree, GClasses::GGaussianProcess, GClasses::GKNN, GClasses::GEnsemble, GClasses::GNaiveInstance, GClasses::GNaiveBayes, GClasses::GLinearRegressor, and GClasses::GPolynomial.
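The train-before-predict contract can be demonstrated with a minimal stand-in learner. This does not use the GClasses headers; MeanBaseline is a hypothetical class that only mimics the pIn/pOut calling convention described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A toy learner whose internal model is the mean of the training labels.
// It mirrors the contract of predict(const double* pIn, double* pOut):
// the model must be trained before the first call.
class MeanBaseline
{
    double m_mean;   // the internal model
    bool m_trained;
public:
    MeanBaseline() : m_mean(0.0), m_trained(false) {}

    void train(const std::vector<double>& labels)
    {
        double sum = 0.0;
        for(size_t i = 0; i < labels.size(); i++)
            sum += labels[i];
        m_mean = sum / labels.size();
        m_trained = true;
    }

    void predict(const double* pIn, double* pOut) const
    {
        (void)pIn;         // a baseline ignores the feature vector
        assert(m_trained); // calling predict before train is an error
        pOut[0] = m_mean;
    }
};
```

In the real API, pIn and pOut must each be as long as the corresponding feature and label matrices are wide; here there is a single label dimension, so pOut holds one element.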

virtual void GClasses::GSupervisedLearner::predictDistribution (const double *pIn, GPrediction *pOut) [pure virtual]
Evaluate pIn and compute a prediction for pOut. pOut is expected to point to an array of GPrediction objects which have already been allocated. There should be labelDims() elements in this array. The distributions will be more accurate if the model is calibrated before the first time that this method is called.
Implemented in GClasses::GReservoirNet, GClasses::GNeuralNet, GClasses::GIdentityFunction, GClasses::GBaselineLearner, GClasses::GCalibrator, GClasses::GAutoFilter, GClasses::GLabelFilter, GClasses::GFeatureFilter, GClasses::GBucket, GClasses::GWag, GClasses::GSparseInstance, GClasses::GInstanceTable, GClasses::GRandomForest, GClasses::GMeanMarginsTree, GClasses::GLinearDistribution, GClasses::GDecisionTree, GClasses::GGaussianProcess, GClasses::GKNN, GClasses::GEnsemble, GClasses::GNaiveInstance, GClasses::GNaiveBayes, GClasses::GLinearRegressor, and GClasses::GPolynomial.
const GRelation& GClasses::GSupervisedLearner::relFeatures  (  ) 
Returns a reference to the feature relation (metadata about the input attributes).
const GRelation& GClasses::GSupervisedLearner::relLabels  (  ) 
Returns a reference to the label relation (metadata about the output attributes).
virtual GDomNode* GClasses::GSupervisedLearner::serialize (GDom *pDoc) const [pure virtual]
Marshal this object into a DOM that can be converted to a variety of formats. (Implementations of this method should use baseDomNode.)
Implemented in GClasses::GReservoirNet, GClasses::GNeuralNet, GClasses::GIdentityFunction, GClasses::GBaselineLearner, GClasses::GCalibrator, GClasses::GAutoFilter, GClasses::GLabelFilter, GClasses::GFeatureFilter, GClasses::GBucket, GClasses::GWag, GClasses::GResamplingAdaBoost, GClasses::GSparseInstance, GClasses::GBayesianModelCombination, GClasses::GBayesianModelAveraging, GClasses::GInstanceTable, GClasses::GRandomForest, GClasses::GBomb, GClasses::GMeanMarginsTree, GClasses::GBag, GClasses::GLinearDistribution, GClasses::GKNN, GClasses::GGaussianProcess, GClasses::GDecisionTree, GClasses::GNaiveInstance, GClasses::GPolynomial, GClasses::GLinearRegressor, and GClasses::GNaiveBayes.

void GClasses::GSupervisedLearner::setupFilters (const GMatrix &features, const GMatrix &labels) [protected]
This method determines which data filters (normalize, discretize, and/or nominal-to-cat) are needed and trains them.
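As one illustration of the nominal-to-cat idea, a nominal attribute with n possible values can be expanded into n continuous columns (a one-hot sketch; GNominalToCat's actual encoding may differ in details, for example for two-valued attributes).

```cpp
#include <cstddef>
#include <vector>

// Expand a nominal value (an index in 0..valueCount-1) into valueCount
// continuous columns, with a 1.0 in the position of the observed value.
std::vector<double> nominalToCat(size_t value, size_t valueCount)
{
    std::vector<double> out(valueCount, 0.0);
    out[value] = 1.0;
    return out;
}
```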
double GClasses::GSupervisedLearner::sumSquaredError (const GMatrix &features, const GMatrix &labels)
Computes the sum-squared error for predicting the labels from the features. For categorical labels, Hamming distance is used.
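The error measure described above, squared difference for continuous label dimensions and Hamming distance (0 if equal, 1 if not) for categorical ones, can be sketched as follows. Plain vectors stand in for GMatrix, and the isCategorical flags are an assumption standing in for the label relation's metadata.

```cpp
#include <cstddef>
#include <vector>

// Accumulate the error over every row and every label dimension:
// squared difference for continuous dimensions, Hamming for categorical.
double sumSquaredErrorSketch(
    const std::vector<std::vector<double> >& expected,
    const std::vector<std::vector<double> >& predicted,
    const std::vector<bool>& isCategorical)
{
    double sse = 0.0;
    for(size_t i = 0; i < expected.size(); i++)
    {
        for(size_t j = 0; j < expected[i].size(); j++)
        {
            if(isCategorical[j])
                sse += (expected[i][j] == predicted[i][j]) ? 0.0 : 1.0;
            else
            {
                double d = expected[i][j] - predicted[i][j];
                sse += d * d;
            }
        }
    }
    return sse;
}
```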

static void GClasses::GSupervisedLearner::test () [static]
Runs some unit tests related to supervised learning. Throws an exception if any problems are found.
void GClasses::GSupervisedLearner::train (const GMatrix &features, const GMatrix &labels)
Call this method to train the model.

virtual double GClasses::GSupervisedLearner::trainAndTest (const GMatrix &trainFeatures, const GMatrix &trainLabels, const GMatrix &testFeatures, const GMatrix &testLabels) [virtual]
Trains and tests this learner. Returns the sum-squared error.
Reimplemented from GClasses::GTransducer.

virtual void GClasses::GSupervisedLearner::trainInner (const GMatrix &features, const GMatrix &labels) [protected, pure virtual]
This is the implementation of the model's training algorithm. (This method is called by train.)
Implemented in GClasses::GReservoirNet, GClasses::GNeuralNet, GClasses::GIdentityFunction, GClasses::GBaselineLearner, GClasses::GCalibrator, GClasses::GAutoFilter, GClasses::GLabelFilter, GClasses::GFeatureFilter, GClasses::GBucket, GClasses::GWag, GClasses::GSparseInstance, GClasses::GInstanceTable, GClasses::GRandomForest, GClasses::GMeanMarginsTree, GClasses::GKNN, GClasses::GLinearDistribution, GClasses::GDecisionTree, GClasses::GEnsemble, GClasses::GGaussianProcess, GClasses::GNaiveInstance, GClasses::GNaiveBayes, GClasses::GLinearRegressor, and GClasses::GPolynomial.
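The train/trainInner split is a template-method pattern: the public train performs shared setup (in GClasses, setupFilters and relation bookkeeping) and then delegates the algorithm-specific work to the subclass. A self-contained sketch with hypothetical class names, not the GClasses code:

```cpp
#include <cassert>

// The base class owns the training protocol; subclasses fill in the
// algorithm by overriding the protected pure-virtual hook.
class LearnerBase
{
protected:
    bool m_filtersReady;
    virtual void trainInner() = 0; // the algorithm-specific part
public:
    LearnerBase() : m_filtersReady(false) {}
    virtual ~LearnerBase() {}

    void train()
    {
        m_filtersReady = true; // stands in for setupFilters(...)
        trainInner();          // delegate to the subclass
    }
};

// A subclass that just records that its hook was invoked after setup.
class CountingLearner : public LearnerBase
{
public:
    int m_calls;
    CountingLearner() : m_calls(0) {}
protected:
    virtual void trainInner()
    {
        assert(m_filtersReady); // setup always happens first
        m_calls++;
    }
};
```

This is why callers use train rather than trainInner: skipping the public entry point would bypass the filter setup.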

virtual GMatrix* GClasses::GSupervisedLearner::transduceInner (const GMatrix &features1, const GMatrix &labels1, const GMatrix &features2) [protected, virtual]
See GTransducer::transduce.
Implements GClasses::GTransducer.

GRelation* GClasses::GSupervisedLearner::m_pRelFeatures [protected]

GRelation* GClasses::GSupervisedLearner::m_pRelLabels [protected]