.-''-.
__.....__ _..._ .----. .----. .' .-. )
.-'' '. .' '.\ \ / /.-. .-/ .' / /
/ .-''"'-. `. . .-. .' '. /' / \ \ / (_/ / /
/ /________\ \| ' ' || |' / \ \ / / / /
| || | | || || | \ \ / / / /
\ .-------------'| | | |'. `' .' \ \ / / . '
\ '-.____...---.| | | | \ / \ ` / / / _.-')
`. .' | | | | \ / \ / .' ' _.'.-''
`''-...... -' | | | | '----' / / / /.-'_.'
| | | | |`-' / / _.'
'--' '--' '..' ( _.-'
Here's problem :
The above shows 93 projects (with 23 decisions and 3 objectives) mapped into a 2-d space.
- The x-axis is the cosine dimension (where the two end points are found via the FastMap heuristic)
- The y-axis is the Pythagorus dimension: sqrt(a^2 -x^2)
- The colors denote clusters (so all the red ones at the bottom left are in one cluster).
- Clustering done via 4-way recursive splits on mean x,y points
The principle of envy says
- Find your nearest cluster with a better class score
- We use the mean IBEA score of the objectives = sum_i e^(obj[i] - min[i]) where all objectives have been normalized (x-min)/(max - min).
- Find the the delta from you to them
- That is is your plan on how to make your life better.
And here's the problem:
- how to explain that plan
- when "nearest" is expressed in the above weirdo cosine and Pythagorus dimensions.
Here's one solution:
- Given a pair of centroids (asIs, Envied)
- Sample:
- N times repeat
- Generate an example Eg at random from the examples in those two clusters
- Let value(Eg) be the score of each example (and its expressed as the distance to the Envied cluster)
- So smallest scores are better
- For each range in the example
- Add value(Eg) to value(range)
- Prune:
- Sort ranges by their value
- Let gap = distance(centroid(asIs), centroid(Envied))
- Score ranges by their ability to move rows in asIs towards Envied, as follows.
- For i = 1 to 10 (some magic number)
- Let elite be the best i ranges e.g.
- ((colNum, range),
[(18, 'l'),
(15, 'vh'),
(19, 'n'),
(8, 'xh'), ....] - For each row in asIs (i.e. the things you want to change)
- Let before = distance(row,centroid(Envied))
- Let mutant = copy(row)
- Inject all the ranges into elite into mutant
- Let after = distance(mutant, centroid(Envied))
- Let movement(elite) = (before - after)/gap
- Policy = the ranges with most movement
Details:
- Estimates of defect, effort, months from Vasil's algorithm (interpolation between 2 nearest clusters)
- To generate an example, I used the DE trick of interpolating between values. This has the benefit of preserving known distributions
def smear(rows,cols,cr=0.5,f=0.5): i = one(rows) j = one(rows) k = one(rows) must = any(0, len(i)) new = [] for n,(a,b,c,col) in enumerate(zip(i,j,k,cols)): if a == "?" or b == "?" or c == "?": x = "?" else: x = a if n == must or rand() < cr: if isa(col,Num): x = a + f*(b-c) else: if rand() < f: x = b if rand() < 0.5 else c new += [x] return new
Results
When applied in a 5*5 cross-val, the 25,50,75,100th percentiles of defect, effort, months were as follows. Note the large decreases at the 75th percentile:effort
before: [0.44, 0.98, 4.52, 199.5]
after: [0.39, 0.76, 2.01, 120.2]
defects
before: [0.34, 0.84, 3.57, 124.76]
after: [0.37, 0.72, 2.09, 127.61]
months
before: [0.15, 0.33, 0.58, 3.7]
after: [0.16, 0.35, 0.51, 3.76]
No comments:
Post a Comment