waffles_dimred

A command-line tool for dimensionality reduction, manifold learning,
attribute selection, and related NLDR operations. Here's the usage
information:
Full Usage Information
[Square brackets] are used to indicate required arguments.
<Angled brackets> are used to indicate optional arguments.
waffles_dimred [command]
Reduce dimensionality, select salient attributes, and perform operations
related to manifold learning, NLDR, etc.
attributeselector [dataset] <data_opts> <options>
Make a ranked list of attributes from most to least salient. The ranked
list is printed to stdout. Attributes are zero-indexed.
[dataset]
The filename of a dataset.
<data_opts>
-labels [attr_list]
Specify which attributes to use as labels. (If not specified, the
default is to use the last attribute for the label.) [attr_list] is
a comma-separated list of zero-indexed columns. A hyphen may be used
to specify a range of columns. A '*' preceding a value means to
index from the right instead of the left. For example, "0,2-5"
refers to columns 0, 2, 3, 4, and 5. "*0" refers to the last
column. "0-*1" refers to all but the last column.
-ignore [attr_list]
Specify attributes to ignore. [attr_list] is a comma-separated list
of zero-indexed columns. A hyphen may be used to specify a range of
columns. A '*' preceding a value means to index from the right
instead of the left. For example, "0,2-5" refers to columns 0, 2,
3, 4, and 5. "*0" refers to the last column. "0-*1" refers to all
but the last column.
<options>
-out [n] [filename]
Save a dataset containing only the [n]-most salient features to
[filename].
-seed [value]
Specify a seed for the random number generator.
-labeldims [n]
Specify the number of dimensions in the label (output) vector. The
default is 1. (Don't confuse this with the number of class labels.
It only takes one dimension to specify a class label, even if there
are k possible labels.)
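For example, a hypothetical invocation (mydata.arff and top5.arff are
placeholder filenames) that ranks the attributes of a dataset and saves
the 5 most salient ones:
waffles_dimred attributeselector mydata.arff -out 5 top5.arff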
blendembeddings [data-orig] [neighbor-finder] [data-a] [data-b] <options>
Compute a blended "average" embedding from two reduced-dimensionality
embeddings of some data.
[data-orig]
The filename of the original high-dimensional data.
[data-a]
The first reduced-dimensional embedding of [data-orig].
[data-b]
The second reduced-dimensional embedding of [data-orig].
<options>
-seed [value]
Specify a seed for the random number generator.
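For example, a sketch of an invocation, assuming orig.arff is the
original data and a.arff and b.arff are two embeddings of it:
waffles_dimred blendembeddings orig.arff kdtree 12 a.arff b.arff # "kdtree 12" is an assumed [neighbor-finder] spec; neighbor finders are documented elsewhere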
breadthfirstunfolding [dataset] [neighbor-finder] [target_dims] <options>
A manifold learning algorithm that unfolds the data one local
neighborhood at a time in breadth-first order.
<options>
-seed [value]
Specify a seed for the random number generator.
-reps [n]
The number of times to compute the embedding and blend the results
together. If not specified, the default is 1.
[dataset]
The filename of the high-dimensional data to reduce.
[target_dims]
The number of dimensions to reduce the data into.
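For example, a hypothetical invocation that reduces mydata.arff to 2
dimensions and blends 5 repetitions:
waffles_dimred breadthfirstunfolding mydata.arff kdtree 12 2 -reps 5 # "kdtree 12" is an assumed [neighbor-finder] spec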
isomap [dataset] [neighbor-finder] [target_dims] <options>
Use the Isomap algorithm to reduce dimensionality.
<options>
-seed [value]
Specify a seed for the random number generator.
-tolerant
If there are points that are disconnected from the rest of the
graph, just drop them from the data. (This may cause the results to
contain fewer rows than the input.)
[dataset]
The filename of the high-dimensional data to reduce.
[target_dims]
The number of dimensions to reduce the data into.
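For example, a hypothetical invocation that reduces mydata.arff to 2
dimensions, dropping any points disconnected from the graph:
waffles_dimred isomap mydata.arff kdtree 12 2 -tolerant # "kdtree 12" is an assumed [neighbor-finder] spec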
scalingunfolder [dataset] [neighbor-finder] [target_dims] <options>
Use the ScalingUnfolder algorithm to reduce dimensionality. (This
algorithm was inspired by Maximum Variance Unfolding (MVU). It
iteratively scales up the data, then restores distances in local
neighborhoods. Unlike MVU, however, it does not use semidefinite
programming.)
<options>
-seed [value]
Specify a seed for the random number generator.
[dataset]
The filename of the high-dimensional data to reduce.
[target_dims]
The number of dimensions to reduce the data into.
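For example, a hypothetical invocation that reduces mydata.arff to 2
dimensions with a fixed seed:
waffles_dimred scalingunfolder mydata.arff kdtree 12 2 -seed 0 # "kdtree 12" is an assumed [neighbor-finder] spec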
som [dataset] [dimensions] <options>
Give the output of a Kohonen self-organizing map with the given
dimensions trained on the input dataset. Ex: "som foo 10 11" would train
a 10x11 map on the input data and then give its 2D output for each of the
input points as a row in the output file.
[dataset]
The filename of a .arff file to be transformed.
[dimensions]
A list of integers, one for each dimension of the map being created,
giving the number of nodes in that dimension.
<options>
-tofile [filename]
Write the trained map to the given filename
-fromfile [filename]
Read a map from the file rather than training it
-seed [integer]
Seed the random number generator with the given integer to obtain
reproducible results.
-neighborhood [gaussian|uniform]
Use the specified neighborhood type to determine the influence of a
node on its neighbors.
-printMeshEvery [numIter] [baseFilename] [xDim] [yDim] <showTrain>
Print a 2D-Mesh visualization every numIter training iterations to
an svg file generated from baseFilename. The x dimension and y
dimension will be chosen from the zero-indexed dimensions of the
input using xDim and yDim. If the option "showTrain" is present
then the training data is displayed along with the mesh. Ex.
"-printMeshEvery 2 foo 0 1 showTrain" will write foo_01.svg
foo_02.svg etc. every other iteration using the first two
dimensions of the input and also display the training data in the
svg image. Note that including this option twice will create two
different printing actions, allowing multiple dimension pairs to be
visualized at once.
-batchTrain [startWidth] [endWidth] [numEpochs] [numConverge]
Trains the network using the batch training algorithm. Neighborhood
decreases exponentially from startWidth to endWidth over numEpochs
epochs. Each epoch lasts at most numConverge passes through the
dataset, waiting for the network to converge. Do not dismiss
numConverge=1; it has performed well on some datasets. This is the
default training algorithm.
-stdTrain [startWidth] [endWidth] [startRate] [endRate] [numIter]
Trains the network using the standard incremental training
algorithm with the network width decreasing exponentially from
startWidth to endWidth and the learning rate also decreasing
exponentially from startRate to endRate. Training lasts for exactly
numIter data-point presentations.
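For example, a hypothetical invocation that trains a 10x11 map on
mydata.arff with a fixed seed and saves the trained map (map.json is a
placeholder filename):
waffles_dimred som mydata.arff 10 11 -seed 0 -tofile map.json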
svd [matrix] <options>
Compute the singular value decomposition of a matrix.
[matrix]
The filename of the matrix.
<options>
-ufilename [filename]
Set the filename to which U will be saved. U is the matrix in which
the columns are the eigenvectors of [matrix] times its transpose.
The default is u.arff.
-sigmafilename [filename]
Set the filename to which Sigma will be saved. Sigma is the matrix
that contains the singular values on its diagonal. All values in
Sigma except the diagonal will be zero. If this option is not
specified, the default is to only print the diagonal values (not
the whole matrix) to stdout. If this option is specified, nothing
is printed to stdout.
-vfilename [filename]
Set the filename to which V will be saved. V is the matrix in which
the rows are the eigenvectors of the transpose of [matrix] times
[matrix]. The default is v.arff.
-maxiters [n]
Specify the number of times to iterate before giving up. The
default is 100, which should be sufficient for most problems.
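For example, a hypothetical invocation (mymatrix.arff is a placeholder
filename) that saves all three factors of the decomposition:
waffles_dimred svd mymatrix.arff -ufilename u.arff -sigmafilename sigma.arff -vfilename v.arff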
lle [dataset] [neighbor-finder] [target_dims] <options>
Use the LLE algorithm to reduce dimensionality.
<options>
-seed [value]
Specify a seed for the random number generator.
[dataset]
The filename of the high-dimensional data to reduce.
[target_dims]
The number of dimensions to reduce the data into.
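For example, a hypothetical invocation that reduces mydata.arff to 2
dimensions:
waffles_dimred lle mydata.arff kdtree 12 2 # "kdtree 12" is an assumed [neighbor-finder] spec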
manifoldsculpting [dataset] [neighbor-finder] [target_dims] <options>
Use the Manifold Sculpting algorithm to reduce dimensionality. (This
algorithm is described in Gashler, Michael S., Ventura, Dan, and
Martinez, Tony. Iterative non-linear dimensionality reduction with
manifold sculpting. In Advances in Neural Information Processing Systems
20, pages 513-520, MIT Press, Cambridge, MA, 2008.)
[dataset]
The filename of the high-dimensional data to reduce.
[target_dims]
The number of dimensions to reduce the data into.
<options>
-seed [value]
Specify a seed for the random number generator.
-continue [dataset]
Continue refining the specified reduced-dimensional results. (This
feature enables Manifold Sculpting to improve upon its own results,
or to refine the results from another dimensionality reduction
algorithm.)
-scalerate [value]
Specify the scaling rate. If not specified, the default is 0.999. A
value close to 1 will give better results, but will cause the
algorithm to take longer.
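For example, a hypothetical invocation that reduces mydata.arff to 2
dimensions with a slower, higher-quality scaling rate:
waffles_dimred manifoldsculpting mydata.arff kdtree 12 2 -scalerate 0.9999 # "kdtree 12" is an assumed [neighbor-finder] spec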
multidimensionalscaling [distance-matrix] [target-dims] <options>
Perform MDS on the specified [distance-matrix].
[distance-matrix]
The filename of an arff file that contains the pair-wise distances (or
dissimilarities) between every pair of points. It must be a square
matrix of real values. Only the upper-triangle of this matrix is
actually used. The lower-triangle and diagonal are ignored.
[target-dims]
The number of dimensions to reduce the data into.
<options>
-squareddistances
The distances in the distance matrix are squared distances, instead
of just distances.
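For example, a hypothetical invocation that embeds a matrix of squared
pair-wise distances (distances.arff is a placeholder filename) into 2
dimensions:
waffles_dimred multidimensionalscaling distances.arff 2 -squareddistances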
neuropca [dataset] [target_dims] <options>
Projects the data into the specified number of dimensions with a
non-linear generalization of principal component analysis. (Prints
results to stdout. The input file is not modified.)
<options>
-seed [value]
Specify a seed for the random number generator.
-clampbias
Do not let the bias drift from the centroid. (Leaving the bias
unclamped typically gives better results with non-linear activation
functions. Clamping it to the centroid is necessary if you want
results equivalent to PCA.)
-linear
Use a linear activation function instead of the default logistic
activation function. (The logistic activation function typically
gives better results with most problems, but the linear activation
function may be used to obtain results equivalent to PCA.)
[dataset]
The filename of the high-dimensional data to reduce.
[target_dims]
The number of dimensions to reduce the data into.
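For example, a hypothetical invocation that reduces mydata.arff to 2
dimensions and captures the stdout results in a file (per the notes
above, adding -linear and -clampbias would make the results equivalent
to PCA):
waffles_dimred neuropca mydata.arff 2 -seed 0 > reduced.arff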
pca [dataset] [target_dims] <options>
Projects the data into the specified number of dimensions with principal
component analysis. (Prints results to stdout. The input file is not
modified.)
<options>
-seed [value]
Specify a seed for the random number generator.
-roundtrip [filename]
Do a lossy round-trip of the data and save the results to the
specified file.
-eigenvalues [filename]
Save the eigenvalues to the specified file.
-components [filename]
Save the centroid and principal component vectors (in order of
decreasing corresponding eigenvalue) to the specified file.
-aboutorigin
Compute the principal components about the origin. (The default is
to compute them relative to the centroid.)
-modelin [filename]
Load the PCA model from a json file.
-modelout [filename]
Save the trained PCA model to a json file.
[dataset]
The filename of the high-dimensional data to reduce.
[target_dims]
The number of dimensions to reduce the data into.
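For example, a hypothetical invocation that reduces mydata.arff to 2
dimensions, saves the eigenvalues, and captures the stdout results in a
file:
waffles_dimred pca mydata.arff 2 -eigenvalues eigenvalues.arff > reduced.arff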
usage
Print usage information.