## Wednesday, February 27, 2013

### Not Global vs Local

I'm playing with the idea of a global and local rules family, where local rules are the children and global rules the grandparents. The parents are the middle rules. In other words, the local rules are subsets of global and middle rules.

The goal - provide the project manager with a family of options. If a local rule is best but would prefer a more global solution, find its global and middle relatives and see if they work just as well.

My initial experiment is a little flawed by my simple rule score metric, P(defects), but results are useful.

## Wednesday, February 20, 2013

### TSE paper outline

1- The ICSE'13 result: All algorithms perform almost the same at 2 or 3 objectives. IBEA beats all by a large margin when optimizing 4 or 5 objectives.
2- The CMSBSE'13 result: Lower rates are better. This is a brief section, IBEA vs NSGAII vs SPEA2, E-Shop only, high rates vs low rates. It can be extended to other algorithms and feature models, but then it's dwarfed by the tree mutation results.
3- The comparison of algorithms is run for 20 feature models, with effect size calculations... Confirms superiority of IBEA. Lower rates are used here to give all algorithms their best chance to perform.
4- Tree Mutation: By removing random crossover and mutation and deploying tree mutation [at a low rate 0.01], IBEA performs better (converges faster) than 1 & 2 above.
4-a) The improvement is 3-orders-of-magnitude when comparing E-Shop with ICSE'13 setup vs Tree Mutation. In fact, we now have good results at 2 sec and very good at 10 sec, thus interactive responses are possible. [Comparison: With ICSE'13 IBEA, We achieve HV=0.28 & 52% correct solutions after 3 hours. With Tree Mutation IBEA, We achieve HV=0.25 & 99% correct solutions after 8 seconds. The 0.28 figure is inflated by incorrect solutions that achieve supreme objective values... 3hours/8sec = 1350].
4-b) The improvement can also be demonstrated on all 20 feature models, only the numbers are less than 3-orders-of-magnitude... [This needs to be repeated because of the bug we've just discovered].
5- Tree Mutation does not help other algorithms much... This can be demonstrated on all 20 models.  [This needs to be repeated because of the bug we've just discovered]. Should this be included? If we don't include it here, is it worthy of a whole new paper?
5-a) Comparison of 6 algorithms with and without Tree Mutation. found at
http://www.unbox.org/things/var/abdel/13/tse/Comparison of Methods Feb-Mar 2013.pdf

### Biclustering Comparisons

BicBin comparisons VanUitert08

Comparison using Artificial Db
p {.001:.05}  # prob of 1s
M = 10, 100, 1000 # rows
N = 50, 500, 5000 # cols
m x n = {2:10:50} # size of biclusters

create 5 dbs for each combination
run each algorithm 50 X per alg
average the fp rates for each combination
results shown in heat maps

Real Dbs
TRANSFAC v 10.3 Wingender00
transaction factor binding site data

STRING vonMering07
protein to protein interactions

GO categories Prelic06

measuring distances between the motifs found by these dbs
don't understand all of this group yet

## Wednesday, February 13, 2013

### Over the horizon: ML + SBSE = What?

here's the talk i gave at UCL last week

so come see what i say about you behind your back (sorry, only could get in references to ekrem, joe krall, fayola into this talk- but clearly vasil's work fits this framework as well)

## Many problems with the standard story, especially when comparing (say) 20 different configurations of a learner.

• Say each test has confidence of 0.95
• Net test has confidence of 0.95^20 = 0.36
• The complexity of the test makes me lose confidence in my ability to understand the test
• For the parametric tests, too many parametric assumptions.
• The blurred telescope effect (Friedman-Nemenyi)

## A12

The Vargha and Delaney A12 statistics is a non-parametric effect size measure.

Given a performance measure M seen in m measures of X and n measures of Y, the A12 statistics measures the probability that running algorithm X yields higher M values than running another algorithm Y.

A12 = #(X > Y)/mn + 0.5*#(X=Y)/mn

## rA12

A12 studies two treatments. rA12 handles mulitple treatments using a Scott Knott procedure; i.e. divide a list of treatments into two lists  L1 and  L2 _ by finding the division that maximizes the expected value of the sum of square of the mean differences before and after divisions; i.e.

L = union(L1,L2)
|L1|  *  abs(L1.mu - L.mu)^2 + |L2|  *  abs(L2.mu - L.mu)^2

rA12 calles itself recursively on the lsts terminating when no further  useful  division can be found (and  useful  is checked by the A12 procedure.

For more on the Scott Knott procedure (where ANOVA is used to define  useful), see

•  N. Mittas and L. Angelis, "Ranking and clustering software cost estimation models through a multiple comparisons algorithm" IEEE Transactions on Software Engineering (pre-print), 2012.

## Source code

### active learning for multi-objective optimization

• same orders of magnitude as joe
• technique seems a little more complex
• no execution on standard models (i think)

Marcela ZuluagaAndreas KrauseGuillaume Sergent and Markus PĆ¼schel (to appear in Proc. International Conference on Machine Learning (ICML), 2013)
Active Learning for Multi-Objective Optimization
In many fields one encounters the challenge of identifying, out of a pool of possible designs, those that simultaneously optimize multiple objectives. This means that usually there is not one optimal design but an entire set of Pareto-optimal ones with optimal trade-offs in the objectives. In many applications, evaluating one design is expensive; thus, an exhaustive search for the Pareto-optimal set is unfeasible. To address this challenge, we propose the Pareto Active Learning (PAL) algorithm which intelligently samples the design space to predict the Pareto-optimal set. Key features of PAL include (1) modeling the objectives as samples from a Gaussian process distribution to capture structure and accomodate noisy evaluation; (so parametric assumptions)  (2) a method to carefully choose the next design to evaluate to maximize progress; and (3) the ability to control prediction accuracy and sampling cost. We provide theoretical bounds on PAL’s sampling cost required to achieve a desired accuracy. Further, we show an experimental evaluation on three real-world data sets (only 3???). The results show PAL’s effectiveness; in particular it improves significantly over a state-of-the-art multi-objective optimization method, saving in many cases about 33% evaluations to achieve the same accuracy.

### Wednesday 2-6-2013

Linux early results http://www.unbox.org/things/var/abdel/13/linux/
CMSBSE'13 http://www.unbox.org/things/var/abdel/13/cmsbse13/
RAISE'13 http://www.unbox.org/things/var/abdel/13/raise13/