Wednesday, January 9, 2013

Vasil - Progress Report

1. At the previous meeting, I showed that we can approximate the NEMS score results by training on 10% of the population.
However, clustering (cluster+LR and cluster+Centroid) did not give better results than applying linear regression to the entire instance space.

I tried a new method of grid clustering. Instead of using Recursive FastMap, the projected instance space was split into a fixed number of equally sized quadrants, and the quadrants were then clustered using the variance filter.
I tried different numbers of quadrants, but clustering still could not outperform linear regression on the entire space.

2. Next, I shifted my attention to applying the clustering methods to defect prediction data. (I am currently using the ant data set from the PROMISE repository.)

- The instance space is projected using FastMap.
- Then the instance space is recursively split into quadrants at the medians of the x and y dimensions.
- The clusters are created by combining quadrants using entropy (i.e., a quadrant is not added to a cluster if doing so would increase the entropy).
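
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the actual implementation: it assumes the instances have already been projected by FastMap into (x, y, label) tuples, that the recursion stops at a fixed depth, and that a quadrant joins the first cluster whose class entropy does not rise when the quadrant is added. The names `split_quadrants` and `merge_by_entropy` are hypothetical, and spatial adjacency of quadrants is ignored for brevity.

```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_quadrants(points, depth):
    """Recursively split (x, y, label) points at the medians of x and y.

    Returns the non-empty leaf quadrants, each as a list of points.
    The fixed depth limit is an assumption made for this sketch.
    """
    if depth == 0 or len(points) < 4:
        return [points]
    mx = median(p[0] for p in points)
    my = median(p[1] for p in points)
    quads = [[], [], [], []]
    for p in points:
        idx = (1 if p[0] > mx else 0) + (2 if p[1] > my else 0)
        quads[idx].append(p)
    leaves = []
    for q in quads:
        if q:
            leaves.extend(split_quadrants(q, depth - 1))
    return leaves


def merge_by_entropy(quadrants, tol=1e-12):
    """Greedily combine quadrants into clusters.

    A quadrant is added to the first cluster whose label entropy would
    not increase; otherwise it starts a new cluster (assumed rule).
    """
    clusters = []
    for quad in quadrants:
        quad_labels = [p[2] for p in quad]
        for cluster in clusters:
            cluster_labels = [p[2] for p in cluster]
            merged = entropy(cluster_labels + quad_labels)
            if merged <= entropy(cluster_labels) + tol:
                cluster.extend(quad)
                break
        else:
            clusters.append(list(quad))
    return clusters


# Toy data: left half of a 10x10 grid is class 0, right half is class 1.
points = [(x, y, 0 if x < 5 else 1) for x in range(10) for y in range(10)]
clusters = merge_by_entropy(split_quadrants(points, depth=1))
```

On this toy grid the four quadrants are each pure, so the merge step groups them by class. On real defect data the stopping rule and the entropy comparison would both need tuning.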

- Apply Naive Bayes to the entire space and to each cluster. (The implementation will be finished today, and I can then report the results.)
- Use information gain to identify the attributes that contribute to the clustering.
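
As a rough illustration of the information gain step, the sketch below scores a single discrete attribute against a target label. It assumes the target is the cluster ID assigned to each instance and that numeric attributes have already been discretized. Both are assumptions for this example, and `information_gain` is a hypothetical helper name.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one discrete attribute with respect to labels.

    values: attribute value per instance (assumed already discretized)
    labels: target per instance (assumed here to be the cluster ID)
    """
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    # Expected entropy of the labels after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy example: the first attribute separates the two clusters, the second does not.
cluster_ids = [0, 0, 1, 1]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]
gain_informative = information_gain(informative, cluster_ids)
gain_uninformative = information_gain(uninformative, cluster_ids)
```

Ranking the attributes by this score would then indicate which ones best explain the cluster assignments.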

