## Tuesday, December 3, 2013

### Explanation in MOEA = Planning

```
.-''-.
__.....__        _..._ .----.     .----.               .' .-.  )
.-''         '.    .'     '.\    \   /    /.-.          .-/ .'  / /
/     .-''"'-.  `. .   .-.   .'   '. /'   /  \ \        / (_/   / /
/     /________\   \|  '   '  ||    |'    /    \ \      / /     / /
|                  ||  |   |  ||    ||    |     \ \    / /     / /
\    .-------------'|  |   |  |'.   `'   .'      \ \  / /     . '
\    '-.____...---.|  |   |  | \        /        \ `  /     / /    _.-')
`.             .' |  |   |  |  \      /          \  /    .' '  _.'.-''
`''-...... -'   |  |   |  |   '----'           / /    /  /.-'_.'
|  |   |  |                |`-' /    /    _.'
'--'   '--'                 '..'    ( _.-'
```

Here's  problem :

The above shows 93 projects (with 23 decisions and 3 objectives) mapped into a 2-d space.
• The x-axis is the cosine dimension (where the two end points are found via the FastMap heuristic)
• The y-axis is the Pythagorus dimension: sqrt(a^2 -x^2)
• The colors denote clusters (so all the red ones at the bottom left are in one cluster).
• Clustering done via 4-way recursive splits on mean x,y points
The principle of envy says
• Find your nearest cluster with a better class score
• We use the mean IBEA score of the objectives = sum_i e^(obj[i] - min[i]) where all objectives have been normalized (x-min)/(max - min).
• Find the the delta from you to them
• That is is your plan on how to make your life better.
And here's the problem:
• how to explain that plan
• when "nearest" is expressed in the above weirdo  cosine and Pythagorus dimensions.

### Here's one solution:

• Given a pair  of centroids (asIs, Envied)
• Sample:
• N times repeat
• Generate  an example Eg at random from the examples in those two clusters
• Let value(Eg) be the score of each example (and  its expressed as the distance to the Envied cluster)
• So smallest scores are better
• For each range in the example
• Prune:
• Sort ranges by their value
• Let gap = distance(centroid(asIs), centroid(Envied))
• Score ranges by their ability to move rows in asIs towards Envied, as follows.
• For i = 1 to 10 (some magic number)
• Let elite be the best i ranges e.g.
• ((colNum, range),
[(18, 'l'),
(15, 'vh'),
(19, 'n'),
(8, 'xh'), ....]
• For each row in asIs (i.e. the things you want to change)
• Let before = distance(row,centroid(Envied))
• Let mutant = copy(row)
• Inject all the ranges into elite into mutant
• Let after = distance(mutant, centroid(Envied))
• Let movement(elite) = (before - after)/gap
• Policy = the ranges with  most movement

### Details:

• Estimates of defect, effort, months from Vasil's algorithm (interpolation between 2 nearest clusters)
• To generate an example, I used the DE trick of interpolating between values. This has the benefit of preserving known distributions
```def smear(rows,cols,cr=0.5,f=0.5):
i    = one(rows)
j    = one(rows)
k    = one(rows)
must = any(0, len(i))
new  = []
for n,(a,b,c,col) in enumerate(zip(i,j,k,cols)):
if a == "?" or b == "?" or c == "?":
x = "?"
else:
x = a
if n == must or rand() < cr:
if isa(col,Num):
x = a + f*(b-c)
else:
if rand() < f:
x = b if rand() < 0.5 else c
new += [x]
return new```

### Results

When applied in a 5*5 cross-val, the 25,50,75,100th percentiles of defect, effort, months were as follows. Note the large decreases at the 75th percentile:

effort
before: [0.44, 0.98, 4.52, 199.5]
after: [0.39, 0.76, 2.01, 120.2]

defects
before: [0.34, 0.84, 3.57, 124.76]
after: [0.37, 0.72, 2.09, 127.61]

months
before: [0.15, 0.33, 0.58, 3.7]
after: [0.16, 0.35, 0.51, 3.76]