Monday, March 1, 2010

Locality in Defect Prediction

Here, we show how a group of learners performs on 7 of the NASA datasets for software defect prediction both for Within Company and Cross Company.

We also are looking at varying the K for LWL. The greatest change between PD/PF with K values of {5,10,25,50} is 5%, usually around 3%. With changes in PD/PF this small, the actual value of K does not seem to matter.

Two algorithms, Clump and Naive Bayes, are shown both with and without relevancy filtering via the Burak Filter. Although applying the Burak Filter can reduce the variance of the results (for within and cross company when the data is logged), it does not significantly affect the median PD/PF's.

Observing the interaction between PD and PF for all algorithms explored, I cannot see a case where locality is beneficial. The PD vs PF ratio of both the locality based methods and the global methods are almost identical, showing that any gain in PD/loss in PF is at the cost of additional PF/less PD.

1 comment: