Back to the table of contents

Previous      Next

Representing your data

The GMatrix class is a container for a two-dimensional array of values. These values can be continuous (real-valued) or nominal (categorical). This class provides many useful operations for matrices, tables, datasets, ordered sets of vectors, etc.

If you are going to work with the GMatrix class, you need to include GMatrix.h.

	#include <GClasses/GMatrix.h>

Then, you can load a GMatrix from ARFF format as follows:

	GMatrix M;
	M.loadArff("mydata.arff");

You can also load from a text file of comma-separated values (a CSV file):

	M.loadCsv("mydata.csv", ',');

or tab-separated values:

	M.loadCsv("mydata.csv", '\t');

or space-separated values:

	M.loadCsv("mydata.csv", ' ');

You can also construct an uninitialized GMatrix of continuous values by specifying the number of rows and columns:

	GMatrix M(5, 7); // 5 rows, 7 columns

The "row" method returns a row vector of the matrix as a pointer to an array of doubles:

	double* pRow = M.row(3);

You can also use array notation. The following line does exactly the same thing:

	double* pRow = M[3];

This will set element 3,5 to 3.14159, and element 0,2 to 3.34159:

	M[3][5] = 3.14159;
	M[0][2] = M[3][5] + 0.2;

You can also manually construct a matrix where some columns (or attributes) have nominal (categorical) values. This is done using a vector that specifies how many values are supported in each column. (Use 0 for continuous values.) This example will create a matrix with two rows and three columns. The first column is continuous. The second column is binary, and the third column supports three class values:

	vector<size_t> vals;
	vals.push_back(0);
	vals.push_back(2);
	vals.push_back(3);
	GMatrix mydata(vals);
	double* pRow0 = mydata.newRow();
	double* pRow1 = mydata.newRow();

Nominal elements contain an integer value that specifies the index of its class or category. For example, a nominal attribute with 4 possible class values supports the values 0, 1, 2, and 3. (These values are stored internally as doubles, but most operations will cast them to integers, thus dropping the decimal, when working with nominal values.)

Now, to tie it all together, here is a simple example program that will load some data, mess with it a little bit, and then print the results to stdout.

	#include <GClasses/GMatrix.h>
	#include <GClasses/GHolders.h>
	#include <GClasses/GRand.h>
	#include <iostream>

	using namespace GClasses;
	using std::cout;

	int main(int argc, char *argv[])
	{
		GMatrix M; // make a matrix
		M.loadArff("mydata.arff"); // load some data
		GRand rand(0); // construct a random number generator with seed 0
		M.shuffle(&rand); // shuffle the row order
		M.sort(2); // sort rows by column 2 in ascending order
		M.reverseRows(); // reverse the row order
		M.swapColumns(2, 4); // swap columns 2 and 4
		M[0][0] = M.median(3); // set element 0,0 to the median of column 3
		M[1][0] = M.entropy(5); // compute the entropy in column 5
		M.deleteRow(7); // delete a row
		M.print(cout); // print M to stdout
		return 0;
	}

Previous      Next

Back to the table of contents



Hosting for this project generously provided by:
SourceForge.net Logo