A collection of command-line tools for researchers in machine learning, data mining, and related fields. All of the functionality is also provided in a clean C++ class library. Demo apps are included to show how to use the class library.
by: Mike Gashler

Builds on Linux, Windows, OSX, etc.
Distributed under the LGPL license.


Thanks to those who have helped make this project possible:

First and foremost:
My supportive family

Hosting of this project by:
SourceForge.net

Licensed under the LGPL:
GNU

And the people who have taken some time to help code, test, port, debug, document, etc.:

  • Mike Gashler
  • Desire Gashler
  • Kevin Kemp
  • Helaman Ferguson
  • Roger Pack
  • Marcelo Hashimoto
(... and your name could be here too!)


Change Log / Release Notes

2009-9-27

Added Locally-Linear Embedding (LLE) to the transform tool and improved the Breadth First Unfolding manifold learning algorithm. Added the Kabsch algorithm for aligning data. Added singular value decomposition to the transform tool. Improved API docs. Further simplified the learning interface. Repaired some serialization regressions. Added several unit tests.


2009-9-16

Ported to 64-bit Linux. Ported to VC++ 2008. Added classes for hidden Markov models, equation parsing, intelligent neighbor-finding, drawing random values from various distributions, function plotting, improved algorithms for computing principal components, pruning manifold shortcuts, significance testing, singular value decomposition, kernel machines, the Moore-Penrose pseudo-inverse, Dijkstra's algorithm, Floyd-Warshall, and Brandes' betweenness centrality. Improved the runtime performance of Manifold Sculpting. Added a tool for generating various datasets. Did a complete interface overhaul. (Yes, this will break your code when you upgrade. That's the price of moving forward.) Improved standards compliance and type safety. Added another transduction algorithm. Added a new demo for a machine learning journal site. Improved plotting tools. Added support for measuring transductive accuracy. Dumped some demos that I grew tired of maintaining. Fixed a regression in the naive Bayes algorithm. Added several unit tests. Dumped some dead code.


2009-2-1

Added a script-friendly command-line interface for all of the data mining tools. Converted to standard containers and did a whole lot of clean-up, maintenance, and polishing on the code.


2008-5-5

Models can now be persisted to and from a text-based format. Added an incremental kd-tree. Added a calibrator. Restructured some interfaces. Added new modelers. Added incremental support to some modelers. Added significance testing. Added a chess demo and an evolutionary jumper demo. Improved API docs, threw out dead code, and of course fixed a lot of bugs.


2007-11-26

Split the demos into separate apps. Added some Q-learning classes and a couple of demos for them. Added several new supervised learning algorithms. Made a few GUI improvements. Redesigned the supervised learning interface to support output distributions instead of just classes or values, and to support semi-supervised learning. Added a semi-supervised learning algorithm. Added some code for Bayesian inference by MCMC using Metropolis and Gibbs sampling. Integrated a better pseudo-random number generator. Added code for fitting a mixture of Gaussians by expectation maximization. Added code for self-organizing maps. Added another hill-climbing algorithm. Added support for neural nets to the graphical data mining tool. Fixed 64-bit compatibility issues. And fixed a lot of bugs.


2007-04-21

Added a new unified data mining tool that replaces the rank tool, the charting tool, and the predictive accuracy tool. Added confidence estimates to all the learning algorithms. Added a tool to make precision/recall charts. Added a tool for augmenting data sets. Significantly improved the GUI. Added various tools for data mining. Redesigned the GSupervisedLearner class. Made the charting tools smarter and more capable. Added support for running Waffles experiments on a cluster without the GUI. Improved error checking, and of course, fixed a bunch of bugs.


2007-01-11

Added some new image processing tools: a max-flow graph cut class, a region adjacency graph class, a video class, methods to compute gradient magnitude images, and a morphing class. Added a K-means clustering class, a couple of new learning algorithms, some code for computing eigenvectors, and a new tool for ranking learning algorithms. Improved the documentation and fixed many bugs.


2006-09-06

Added a GBag class (for bagging ensembles), Random Forest, Arbitrary Arboretum, and PC Forest. Added some code for computing eigenvectors, an algorithm that computes the principal components of high-dimensional data without needing to compute the covariance matrix, and code for generating random vectors (by drawing random numbers from a Gaussian distribution). Fixed some bugs in A-star search, the KNN algorithm, and the Decision Tree class.


2006-08-03

Added A-star search and a relational table class, made the KNN instance learner work better incrementally, fixed several stability bugs in the socket and HTTP server classes, and added a face-sorting manifold learning demo.


2006-05-25

Added a new clustering algorithm and a discrete path search algorithm, did a good deal of general code clean-up, integrated some changes contributed by Roger Pack into various classes, and fixed about ten bugs. Thanks also to Kevin Kemp for getting it to build on Mac without having to include a special framework in the package. I also added some unpolished tools for making charts and added a link to the API docs from the main menu.


2006-03-23

This is a bug fix release. If you were getting a build error in GKeyboard.cpp on Windows, that's fixed now.


2006-03-11

Ported to Mac OSX Tiger. Thanks to Helaman Ferguson for much of this work. It now works on Mac, Linux, and Windows.


2006-02-25

Added a new efficient neighbor-finding class. The KNN algorithm is much faster now. Fixed many bugs. Added a ray-tracing demo. This isn't really related to machine learning, but I plan to try combining it with learning algorithms to do model reconstruction in the future. The manifold learning demo now demonstrates both unsupervised and semi-supervised manifold learning.


2005-12-31

Added a particle swarm algorithm and some other search algorithms. Fixed some issues with the genetic algorithm. The Neural Net interpolation demo now compares several search algorithms. (Backprop is the clear winner, but it's not a totally fair comparison because backprop runs in incremental mode while the others run in batch mode.)


2005-12-17

This is mostly a bug-fixing release.


2005-12-03

Added automated tests for several of the classes.


2005-11-27

The project is now released to the public. It builds in VS6 on Windows and with g++ on Linux.