## Method

1. Cluster the data
2. Find deltas of interest between the clusters
• Score each cluster
• Let each row have objective scores, each normalized 0..1 (min..max)
• Let the score of a row be the sum of its normalized objective scores
• Let the score of a cluster be the mean of the scores of its rows
• Technically, this is almost the cdom predicate used in IBEA
• For each cluster C1
• Find its nearest neighbor C2 with a better score
• Assert one (leave, goto) tuple for (C1, C2) (see the sketch after this list)
3. Build and prune a decision tree on the clusters
• Label each instance with the cluster it belongs to
• Build a decision tree on the labelled data set.
• Find the clusters that are only weakly recognized by the decision tree learner
• e.g., use a three-way cross-val and prune anything with F < 0.5
• Remove the weakly recognized clusters
4. For each (C1, C2) tuple where neither cluster is weakly recognized:
• Query the tree to find the delta
Observation: the trees are so small that this can be done manually.
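
As a concrete reference for step 2, here is a minimal sketch in Python. The data layout (a `clusters` dict holding each cluster's rows and centroid) and the function names are illustrative assumptions, not part of any existing tool; the objectives (effort, defects, months) are treated as "smaller is better".

```python
# Minimal sketch of step 2 (assumed data layout, not from any particular tool):
#   clusters = {cluster_id: {"rows": [ {objective: value, ...}, ... ],
#                            "centroid": (x, y)}}
from math import dist

def normalizer(rows, objectives):
    """Return a function that min-max normalizes an objective value to 0..1."""
    lo = {o: min(r[o] for r in rows) for o in objectives}
    hi = {o: max(r[o] for r in rows) for o in objectives}
    return lambda row, o: (row[o] - lo[o]) / ((hi[o] - lo[o]) or 1)

def deltas_of_interest(clusters, objectives=("effort", "defects", "months")):
    rows = [r for c in clusters.values() for r in c["rows"]]
    norm = normalizer(rows, objectives)
    # Score of a row = sum of its normalized objectives;
    # score of a cluster = mean of the scores of its rows (lower is better).
    score = {cid: sum(sum(norm(r, o) for o in objectives)
                      for r in c["rows"]) / len(c["rows"])
             for cid, c in clusters.items()}
    out = []
    for c1 in clusters:
        # Nearest neighbor (by centroid distance) with a better (lower) score.
        better = [c2 for c2 in clusters if score[c2] < score[c1]]
        if better:
            c2 = min(better, key=lambda c: dist(clusters[c1]["centroid"],
                                                clusters[c]["centroid"]))
            out.append((c1, c2))  # one (leave, goto) tuple
    return out
```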

## Example

Nasa93 clustered into 2D (one color per cluster)

Cluster details
• All the following values are normalized 0..1, min..max
• Defects and months are usually correlated, but not always
• Effort is not what separates the projects; it is more about defects and calendar time to develop
• Clearly, cluster 2 is a bad place to be, while clusters 10 and 13 look nicest

A decision tree, learned from the data labelled with its cluster (and ignoring the objectives), generated:

acap = h
|   apex = h
|   |   pmat = h
|   |   |   plex = h: _2 (6.0/1.0)
|   |   |   plex = n: _4 (3.0)
|   |   pmat = l
|   |   |   cplx = vh: _3 (2.0)
|   |   |   cplx = h
|   |   |   |   time = vh: _3 (3.0)
|   |   |   |   time = n: _6 (4.0/1.0)
|   |   |   cplx = n: _5 (2.0)
|   |   pmat = n: _6 (4.0/1.0)
|   apex = n
|   |   data = h: _6 (2.0/1.0)
|   |   data = n: _4 (3.0/1.0)
|   |   data = l: _13 (1.0)
|   apex = vh
|   |   pcap = h: _10 (3.0)
|   |   pcap = vh: _7 (2.0/1.0)
acap = n
|   sced = n
|   |   stor = xh: _7 (1.0)
|   |   stor = n
|   |   |   cplx = h
|   |   |   |   pcap = h: _10 (3.0/1.0)
|   |   |   |   pcap = n: _13 (3.0)
|   |   |   cplx = n: _7 (3.0/1.0)
|   |   stor = vh: _11 (3.0)
|   |   stor = h: _11 (2.0)
|   sced = l
|   |   $kloc <= 16.3: _9 (5.0)
|   |   $kloc > 16.3: _8 (6.0)
acap = vh: _12 (7.0/1.0)
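
The tree above appears to be J48-style (Weka) output. As a rough sketch of the same step with scikit-learn, assuming the labelled data sits in a hypothetical `nasa93_clustered.csv` with a `cluster` column plus the three objective columns, something like the following would work (the resulting tree will not match the one above exactly):

```python
# Sketch: learn a decision tree over rows labelled with their cluster id,
# ignoring the objective columns. File name and column names are hypothetical.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.read_csv("nasa93_clustered.csv")
X = pd.get_dummies(data.drop(columns=["cluster", "effort", "defects", "months"]))
y = data["cluster"]

tree = DecisionTreeClassifier(min_samples_leaf=2, random_state=1).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```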

A 3-way cross-val yielded the following confusion matrix.
• The diagonal entries are the correctly classified rows.
• The off-diagonal entries are errors.
• Note the poor performance for recognizing clusters 4, 5, 6, 7, 10, and 13.

a b c d e f g h i j k l   <-- classified as
5 0 0 0 0 0 0 0 0 0 0 0 | a = _2
0 4 1 1 0 0 0 0 0 0 0 0 | b = _3
1 0 2 0 1 0 0 0 1 0 0 0 | c = _4
0 1 1 0 3 0 0 0 0 0 0 0 | d = _5
0 1 2 3 0 0 0 0 1 0 0 1 | e = _6
0 0 0 0 0 0 0 0 1 2 1 1 | f = _7
0 0 0 0 0 0 6 0 0 0 0 0 | g = _8
0 0 0 0 0 0 1 4 0 0 0 0 | h = _9
0 0 1 0 0 0 0 0 3 0 0 2 | i = _10
0 0 0 0 0 1 0 0 0 4 0 0 | j = _11
0 0 0 0 0 1 0 0 0 0 6 0 | k = _12
0 0 1 0 0 0 0 0 1 1 0 2 | l = _13

The above confusion matrix is mapped into the "f" measures of the following table.
• The "goto" column marks the deltas of interest.
• Low "f" values (F < 0.5) mark the weakly recognized ("gray") clusters: 4, 5, 6, 7, 10, and 13.
• Any "goto" that leaves from or arrives at a gray cluster is also treated as gray.

| cluster | n | effort | defects | months | f   | goto |
|--------:|--:|-------:|--------:|-------:|----:|-----:|
| 2       | 5 | 43%    | 25%     | 72%    | 91% | 3    |
| 3       | 6 | 5%     | 32%     | 42%    | 67% | 6    |
| 4       | 5 | 6%     | 17%     | 37%    | 31% | 8    |
| 5       | 5 | 7%     | 29%     | 43%    | 0%  | 6    |
| 6       | 8 | 6%     | 24%     | 40%    | 0%  | 10   |
| 7       | 5 | 7%     | 16%     | 36%    | 0%  | 13   |
| 8       | 6 | 2%     | 6%      | 17%    | 92% | 9    |
| 9       | 5 | 0%     | 1%      | 3%     | 89% |      |
| 10      | 6 | 2%     | 9%      | 22%    | 46% | 12   |
| 11      | 5 | 7%     | 18%     | 31%    | 67% | 13   |
| 12      | 7 | 2%     | 7%      | 18%    | 86% |      |
| 13      | 5 | 7%     | 15%     | 26%    | 36% |      |

Total n = 68.
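
For reference, the "f" column above can be recomputed from the confusion matrix. A small sketch (rows of the matrix are the actual clusters, columns are the predictions; the function name is illustrative):

```python
# Sketch: per-cluster precision, recall, and F from a confusion matrix whose
# rows are the actual clusters and whose columns are the predicted clusters.
def per_cluster_f(matrix, labels, weak=0.5):
    out = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                                  # diagonal entry
        precision = tp / (sum(row[i] for row in matrix) or 1)
        recall = tp / (sum(matrix[i]) or 1)
        f = 2 * precision * recall / ((precision + recall) or 1)
        out[label] = (round(f, 2), "gray" if f < weak else "ok")
    return out
```

Applied to the matrix above, this reproduces the "f" column and flags exactly the weakly recognized clusters called out earlier: 4, 5, 6, 7, 10, and 13.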

If we prune the above tree of any branch that leads only to gray classes, we get, as promised above, a very small tree.

acap = h
|   apex = h
|   |   pmat = h
|   |   |   plex = h: _2 (6.0/1.0)
|   |   pmat = l
|   |   |   cplx = vh: _3 (2.0)
|   |   |   cplx = h
|   |   |   |   time = vh: _3 (3.0)
acap = n
|   sced = n
|   |   stor = vh: _11 (3.0)
|   |   stor = h: _11 (2.0)
|   sced = l
|   |   $kloc <= 16.3: _9 (5.0)
|   |   $kloc > 16.3: _8 (6.0)
acap = vh: _12 (7.0/1.0)
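
Step 4 can indeed be done manually on this tree. For the surviving (8, 9) tuple, both branches share acap = n and sced = l, and differ only on the size test; a tiny sketch, with the branch conditions transcribed by hand from the tree above:

```python
# Sketch of step 4: read the delta between a (leave, goto) pair off the pruned
# tree. The branch conditions below are copied by hand from the tree above.
branch = {
    "_8": {"acap": "n", "sced": "l", "$kloc": "> 16.3"},
    "_9": {"acap": "n", "sced": "l", "$kloc": "<= 16.3"},
}

def delta(leave, goto):
    """Conditions that must change to move from cluster `leave` to `goto`."""
    return {k: v for k, v in branch[goto].items() if branch[leave].get(k) != v}

print(delta("_8", "_9"))  # {'$kloc': '<= 16.3'}
```

So the delta for (8, 9) is just $kloc <= 16.3.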

## Summary

• The definite statements about what clearly changes outcomes in SE data turn out to be very succinct.
• But they might not cover everything.
Question: what would you baseline this against? That is, how would you certify this as a good (or crappy) idea?