Monday, February 10, 2014

Learner Noise Oddities

While regenerating some plots for the data-quality paper, I stumbled on some oddities in how PEEKER handles certain forms of noise in certain types of data. Below are the results of a small experiment comparing PEEKER's performance to Naive Bayes on the same data sets, using the same methods as before, but with all 10 data sets on one plot for each performance category.

PEEKER:
Noise Methods from here

Note: The NB results are nearly identical, so they are omitted to save space.


PEEKER:
Random Swap







NB:
Random Swap








Based on these results, there seems to be something strange about the way PEEKER handles the "Random Swap" form of noise for certain data sets. This is particularly odd because it handles the other forms of noise with ease, as the first set of results shows, and most of those are far more complex than simply swapping the class value, or so one would think...
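For reference, here is a minimal sketch of what "Random Swap" noise could look like in code. It assumes binary class labels encoded as 0 (non-defective) and 1 (defective) and flips the label of a randomly chosen fraction of instances. The function name `random_swap` and its parameters are made up for illustration; this is not the code used to produce the plots.

```python
import random


def random_swap(labels, noise_rate, seed=None):
    """Return a copy of `labels` with a `noise_rate` fraction of entries flipped.

    Assumes binary labels encoded as 0/1 (an assumption for this sketch).
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be between 0 and 1")
    rng = random.Random(seed)
    # Number of instances whose class value gets swapped.
    n_flip = round(noise_rate * len(labels))
    # Pick distinct instance indices to corrupt.
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


if __name__ == "__main__":
    clean = [0] * 70 + [1] * 30
    noisy = random_swap(clean, 0.30, seed=1)
    changed = sum(a != b for a, b in zip(clean, noisy))
    print("labels changed:", changed)
```

Because the indices are sampled without replacement, the number of flipped labels is exactly the requested fraction (rounded), which makes the noise level on the x-axis easy to interpret.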

The culprit could be as simple as the set becoming too heavily weighted in one direction or the other (defective/non-defective) after Feature/Instance Selection, which would cause the decline in performance for some of the data sets. As of right now, however, I cannot determine the cause of the performance decline for certain.
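One way to start checking the class-weighting hypothesis would be to record the class proportions at each noise level, both before and after selection. The sketch below is only an illustration: it assumes 0/1 labels (1 = defective), and the helper names `defective_fraction` and `expected_fraction_after_swap` are hypothetical. The second helper gives the expected defective fraction when a fraction r of labels is flipped uniformly at random, which is p(1 - r) + (1 - p)r for an original defective fraction p. It says nothing about what Feature/Instance Selection does afterwards.

```python
def defective_fraction(labels):
    """Fraction of instances labelled defective (1). Assumes 0/1 labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for y in labels if y == 1) / len(labels)


def expected_fraction_after_swap(p, r):
    """Expected defective fraction after flipping a fraction r of labels
    uniformly at random, given an original defective fraction p."""
    # Defective instances that stay defective, plus non-defective ones flipped.
    return p * (1 - r) + (1 - p) * r


if __name__ == "__main__":
    labels = [1] * 20 + [0] * 80
    p = defective_fraction(labels)
    for r in (0.1, 0.2, 0.3, 0.4):
        print(r, round(expected_fraction_after_swap(p, r), 3))
```

Comparing the measured fraction after selection against this expected value for each data set might show whether the sets that decline are the ones whose balance drifts the most.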

This wasn't evident before we started comparing the results in this fashion. It is worth noting, however, that the decline seems to kick in around the 30%+ noise mark, which is still at least 10% higher than what the study linked above suggests, even under the most generous estimates of their data. That said, their reported figures are somewhat unclear and unspecific.

The results for RF and KNN mirror the results for NB, so they are not included in this post. They can be found here: KNN, RF
