Tuesday, August 31, 2010

Problem with Stopping Rule in Compass Defect Prediction

I've found that the stopping rule for walking down the Compass tree fires too early. Compass currently stops at a node when that node's variance is less than the weighted variance of its children:

(if (< (node-variance c-node)
       (weighted-variance c-node))

(defun weighted-variance (c-node)
  "Variance of C-NODE's children, weighted by the number of instances in each."
  ;; Leaf: no children, so fall back to the node's own variance.
  (if (and (null (node-right c-node)) (null (node-left c-node)))
      (node-variance c-node)
      ;; Exactly one child: use that child's variance.
      (if (or (null (node-right c-node)) (null (node-left c-node)))
          (if (null (node-right c-node))
              (node-variance (node-left c-node))
              (node-variance (node-right c-node)))
          ;; Two children: size-weighted average of their variances.
          (/ (+ (* (node-variance (node-right c-node))
                   (length (node-contents (node-right c-node))))
                (* (node-variance (node-left c-node))
                   (length (node-contents (node-left c-node)))))
             (+ (length (node-contents (node-right c-node)))
                (length (node-contents (node-left c-node))))))))

http://github.com/abutcher/compass/raw/master/trunk/src/lisp/variance.lisp
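
To make the arithmetic concrete, here is a minimal sketch of the stopping test on a toy tree. The node struct, the stop-here-p helper, and the variance numbers are all assumptions made up for illustration; they are not Compass's actual definitions or real data. The weighted-variance here is a compact rewrite that should behave the same as the function quoted above.

```lisp
;; Hypothetical stand-in for Compass's node type; the real one may differ.
(defstruct node variance contents left right)

;; Compact equivalent of the weighted-variance function quoted above.
(defun weighted-variance (c-node)
  (let ((l (node-left c-node))
        (r (node-right c-node)))
    (cond ((and (null l) (null r)) (node-variance c-node)) ; leaf
          ((null r) (node-variance l))                     ; left child only
          ((null l) (node-variance r))                     ; right child only
          (t (let ((nl (length (node-contents l)))
                   (nr (length (node-contents r))))
               ;; Size-weighted average of the two child variances.
               (/ (+ (* (node-variance l) nl)
                     (* (node-variance r) nr))
                  (+ nl nr)))))))

;; Hypothetical helper: the stopping test as described in this post.
(defun stop-here-p (c-node)
  (< (node-variance c-node) (weighted-variance c-node)))

;; Toy root with made-up variances: 10 instances split 4 / 6.
(defparameter *root*
  (make-node :variance 2
             :contents '(a b c d e f g h i j)
             :left  (make-node :variance 1 :contents '(a b c d))
             :right (make-node :variance 3 :contents '(e f g h i j))))

(weighted-variance *root*) ; (1*4 + 3*6) / 10 = 11/5
(stop-here-p *root*)       ; 2 < 11/5, so T: the walk stops at the root
```

In this toy case the larger child is also the noisier one, which pulls the weighted variance above the root's own variance, so the walk never descends even though the left child is a much tighter cluster.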

In the defect prediction data sets, this condition is usually met near the top of the tree (typically at level 1 or 2), so the walk stops almost immediately. The result is that the cluster used for majority voting is often most of the data set rather than a small group of similar instances.

Example:

http://github.com/abutcher/compass/raw/master/doc/dot/defect/pruned/jm1.dot.png







