Sunday, September 20, 2009

Redefining Classification

HyperPipes is a linear-time dumb-as-a-box-of-hammers data miner that usually scores very badly, except on very sparse data sets.

Aaron Riesbeck and Adam Brady added a wriggle to HyperPipes:
  • If N classes score the same, then return all N.
  • In this framework "success" means that the real class is within the N.
Based on that definition of "success", their HyperPipes2 algorithm (on self-tests) scores accuracies of 100%. And in data sets with dozens of classes, it usually returns only 1 to 4 classes.
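
To make the tie rule concrete, here is a minimal sketch in Python. It is not Riesbeck and Brady's code. It assumes the usual description of HyperPipes: each class remembers the min/max seen for every numeric attribute (or the set of values seen for every symbolic one), and a test row is scored by how many of its attributes fall inside those bounds. The names train, hits, classify and success are made up for illustration, and every row is assumed to have the same attributes in the same order.

```python
def train(rows):
    """Build one 'pipe' per class from (features, label) pairs.

    A pipe holds, per attribute, either a (min, max) tuple for numbers
    or a set of seen values for symbols. (Hypothetical helper.)
    """
    pipes = {}
    for features, label in rows:
        pipe = pipes.setdefault(label, [None] * len(features))
        for i, value in enumerate(features):
            if isinstance(value, (int, float)):
                lo, hi = pipe[i] if pipe[i] is not None else (value, value)
                pipe[i] = (min(lo, value), max(hi, value))
            else:
                if pipe[i] is None:
                    pipe[i] = set()
                pipe[i].add(value)
    return pipes


def hits(pipe, features):
    """Count how many attributes of this row fall inside the pipe."""
    count = 0
    for bound, value in zip(pipe, features):
        if isinstance(bound, tuple):
            count += bound[0] <= value <= bound[1]
        else:
            count += value in bound
    return count


def classify(pipes, features):
    """HyperPipes2-style answer: ALL classes tied for the top score."""
    scores = {label: hits(pipe, features) for label, pipe in pipes.items()}
    best = max(scores.values())
    return {label for label, score in scores.items() if score == best}


def success(predicted, actual):
    """'Success' in this framework: the real class is among those returned."""
    return actual in predicted


# Toy data, invented for this example only.
rows = [([1, 5], "a"), ([3, 7], "a"), ([2, 6], "b"), ([10, 20], "c")]
pipes = train(rows)
print(classify(pipes, [2, 6]))    # "a" and "b" tie, so both come back
print(classify(pipes, [10, 20]))  # only "c" contains this row
```

In this sketch, a training row always lands inside its own class's bounds on every attribute, so its real class always ties for the top score. That is consistent with the 100% self-test figure, and it is also why the interesting number is how many classes come back, not the accuracy.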

So if the task is knowing what it IS, HyperPipes2 may not be useful. But if the task is knowing what it AIN'T, then HyperPipes2 is a useful tool for quickly throwing away most of the competing hypotheses.
