In the defect prediction data sets, this condition is met high in the tree (usually at level 1 or 2). The result is that the cluster used for majority voting is often most of the data set rather than a group of similar instances.

(if (< (node-variance c-node) (weighted-variance c-node))

(defun weighted-variance (c-node)
  ;; Leaf: no children, so fall back to the node's own variance.
  (if (and (null (node-right c-node)) (null (node-left c-node)))
      (node-variance c-node)
      ;; One child: use the variance of whichever child exists.
      (if (or (null (node-right c-node)) (null (node-left c-node)))
          (if (null (node-right c-node))
              (node-variance (node-left c-node))
              (node-variance (node-right c-node)))
          ;; Two children: average their variances, weighted by size.
          (/ (+ (* (node-variance (node-right c-node))
                   (length (node-contents (node-right c-node))))
                (* (node-variance (node-left c-node))
                   (length (node-contents (node-left c-node)))))
             (+ (length (node-contents (node-right c-node)))
                (length (node-contents (node-left c-node))))))))

http://github.com/abutcher/compass/raw/master/trunk/src/lisp/variance.lisp
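
To make the comparison concrete, here is a small sketch that can be pasted into a REPL. The `node` structure below is an assumption: it only provides the four accessors that `weighted-variance` uses, and the real definition in the compass source may differ. `weighted-variance-sketch` is a condensed restatement of the same three cases written for this illustration, not the project's code, and the variances and contents are made-up values.

```lisp
;; Assumed node layout (hypothetical): only the accessors used by the post.
(defstruct node variance contents left right)

;; Condensed restatement of the same three cases, for experimentation only.
(defun weighted-variance-sketch (c-node)
  (let ((l (node-left c-node))
        (r (node-right c-node)))
    (cond ((and (null l) (null r)) (node-variance c-node)) ; leaf
          ((null r) (node-variance l))                     ; only a left child
          ((null l) (node-variance r))                     ; only a right child
          (t (let ((nl (length (node-contents l)))
                   (nr (length (node-contents r))))
               ;; Size-weighted mean of the two child variances.
               (/ (+ (* (node-variance l) nl)
                     (* (node-variance r) nr))
                  (+ nl nr)))))))

;; Made-up tree: a parent with a 3-instance child and a 1-instance child.
(defparameter *left*  (make-node :variance 2 :contents '(a b c)))
(defparameter *right* (make-node :variance 6 :contents '(d)))
(defparameter *parent*
  (make-node :variance 5 :contents '(a b c d)
             :left *left* :right *right*))

;; (2*3 + 6*1) / (3 + 1)
(weighted-variance-sketch *parent*)

;; The test from the post: parent variance vs. weighted child variance.
(< (node-variance *parent*) (weighted-variance-sketch *parent*))
```

With these values the weighted variance of the children is 3, which is lower than the parent's variance of 5, so the test evaluates to NIL for this node.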

Example:

http://github.com/abutcher/compass/raw/master/doc/dot/defect/pruned/jm1.dot.png
