```
.-''-.
__.....__ _..._ .----. .----. .' .-. )
.-'' '. .' '.\ \ / /.-. .-/ .' / /
/ .-''"'-. `. . .-. .' '. /' / \ \ / (_/ / /
/ /________\ \| ' ' || |' / \ \ / / / /
| || | | || || | \ \ / / / /
\ .-------------'| | | |'. `' .' \ \ / / . '
\ '-.____...---.| | | | \ / \ ` / / / _.-')
`. .' | | | | \ / \ / .' ' _.'.-''
`''-...... -' | | | | '----' / / / /.-'_.'
| | | | |`-' / / _.'
'--' '--' '..' ( _.-'
```

Here's problem :

The above shows 93 projects (with 23 decisions and 3 objectives) mapped into a 2-d space.

- The x-axis is the cosine dimension (where the two end points are found via the FastMap heuristic)
- The y-axis is the Pythagorus dimension:
**sqrt(a^2 -x^2)** - The colors denote clusters (so all the red ones at the bottom left are in one cluster).
- Clustering done via 4-way recursive splits on mean x,y points

The principle of envy says

- Find your nearest cluster with a better class score
- We use the mean IBEA score of the objectives =
**sum_i e^(obj[i] - min[i])**where all objectives have been normalized (**x-min)/(max - min).** - Find the the delta from you to them
- That is is your plan on how to make your life better.

And here's the problem:

- how to explain that plan
- when "nearest" is expressed in the above weirdo cosine and Pythagorus dimensions.

### Here's one solution:

- Given a pair of centroids (
**asIs, Envied**) - Sample:
- N times repeat
- Generate an example
**Eg**at random from the examples in those two clusters - Let
**value(Eg)**be the score of each example (and its expressed as the distance to the**Envied**cluster) - So smallest scores are better
- For each range in the example
- Add v
**alue(Eg)**to**value(range)** - Prune:
- Sort ranges by their value
- Let
**gap = distance(centroid(asIs), centroid(Envied))** - Score
**ranges**by their ability to move rows in asIs towards**Envied**, as follows. - For i = 1 to 10 (some magic number)
- Let
**elite**be the best i ranges e.g. - ((colNum, range),

[(18, 'l'),

(15, 'vh'),

(19, 'n'),

(8, 'xh'), ....] - For each row in
**asIs**(i.e. the things you want to change) - Let
**before = distance(row,centroid(Envied))** - Let
**mutant = copy(row)** - Inject all the ranges into elite into mutant
- Let
**after = distance(mutant, centroid(Envied))** - Let
**movement(elite) = (before - after)/gap** **Policy**= the ranges with most movement

### Details:

- Estimates of defect, effort, months from Vasil's algorithm (interpolation between 2 nearest clusters)
- To generate an example, I used the DE trick of interpolating between values. This has the benefit of preserving known distributions

def smear(rows,cols,cr=0.5,f=0.5): i = one(rows) j = one(rows) k = one(rows) must = any(0, len(i)) new = [] for n,(a,b,c,col) in enumerate(zip(i,j,k,cols)): if a == "?" or b == "?" or c == "?": x = "?" else: x = a if n == must or rand() < cr: if isa(col,Num): x = a + f*(b-c) else: if rand() < f: x = b if rand() < 0.5 else c new += [x] return new

### Results

When applied in a 5*5 cross-val, the 25,50,75,100th percentiles of defect, effort, months were as follows. Note the large decreases at the 75th percentile:effort

before: [0.44, 0.98, 4.52, 199.5]

after: [0.39, 0.76, 2.01, 120.2]

defects

before: [0.34, 0.84, 3.57, 124.76]

after: [0.37, 0.72, 2.09, 127.61]

months

before: [0.15, 0.33, 0.58, 3.7]

after: [0.16, 0.35, 0.51, 3.76]

## No comments:

## Post a Comment