Tuesday, December 11, 2012

Kernel Mean Matching (KMM) for cross-project defect prediction

Context: cross-project defect prediction

Algorithm: Kernel Mean Matching (KMM)



$$\min_{\beta}\ \frac{1}{2}\,\beta^{T}K\beta \;-\; \kappa^{T}\beta
\qquad \text{s.t.} \qquad \beta_{i}\in[0,B],\qquad
\Big|\sum_{i=1}^{n_{tr}}\beta_{i}-n_{tr}\Big|\le n_{tr}\,\epsilon$$

with $K_{ij}=k(x_{i},x_{j})$ over the training instances and $\kappa_{i}=\frac{n_{tr}}{n_{te}}\sum_{j}k(x_{i},x_{j}^{te})$,

where B limits the scope of discrepancy between Pr and Pr', and eps ensures that the measure β(x)·Pr(x) stays close to a probability distribution.
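A minimal sketch of the weight computation, assuming an RBF kernel and the cvxopt QP solver (the kernel choice, gamma, and all function names here are our own illustrative assumptions):

```python
# Sketch of KMM weight estimation: solve the QP above with cvxopt.
import numpy as np
from cvxopt import matrix, solvers

def rbf(A, C, gamma=1.0):
    # k(a, c) = exp(-gamma * ||a - c||^2)
    sq = ((A[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def kmm_weights(X_train, X_test, B=5.0, eps=0.01, gamma=1.0):
    n_tr, n_te = len(X_train), len(X_test)
    K = rbf(X_train, X_train, gamma)                         # quadratic term
    kappa = (float(n_tr) / n_te) * rbf(X_train, X_test, gamma).sum(axis=1)
    # minimize 0.5*b'Kb - kappa'b  s.t.  0 <= b_i <= B, |sum(b) - n_tr| <= n_tr*eps
    G = np.vstack([np.eye(n_tr), -np.eye(n_tr),
                   np.ones((1, n_tr)), -np.ones((1, n_tr))])
    h = np.hstack([B * np.ones(n_tr), np.zeros(n_tr),
                   [n_tr * (1.0 + eps)], [-n_tr * (1.0 - eps)]])
    sol = solvers.qp(matrix(K), matrix(-kappa), matrix(G), matrix(h))
    return np.array(sol['x']).ravel()                        # the weights beta
```

The returned beta can then be used as per-instance weights when training the cross-project learner (e.g., a weighted Naive Bayes).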

Data: five open-source projects; process metrics (change data)

Performance indicators: AUC, accuracy, F-measure (F and F_2), recall, precision, PF (probability of false alarm)
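For reference, a small sketch of how these indicators fall out of the confusion-matrix counts (PD = recall; the F_2 column in the table below is the F-measure with beta = 2):

```python
# Indicators from confusion-matrix counts (sketch; inputs cast to float).
def indicators(tp, fp, tn, fn, beta=1.0):
    tp, fp, tn, fn = map(float, (tp, fp, tn, fn))
    recall = tp / (tp + fn)               # PD: probability of detection
    precision = tp / (tp + fp)
    pf = fp / (fp + tn)                   # PF: probability of false alarm
    f = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return recall, precision, pf, f       # beta=2 yields the F_2 variant
```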


Experiments:

Parameter tuning grid: B = 1, 3, 5, 10, 20, 50; eps = 0.01, 0.1, 0.5, 1, 5, 10.
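A tuning loop over this grid might look like the following (kmm_weights() is the sketch above; evaluate_auc() is an assumed helper that trains the weighted learner and scores the test set):

```python
# Grid search over (B, eps); evaluate_auc() is hypothetical.
import itertools

best_cfg, best_auc = None, -1.0
for B, eps in itertools.product([1, 3, 5, 10, 20, 50],
                                [0.01, 0.1, 0.5, 1, 5, 10]):
    beta = kmm_weights(X_train, X_test, B=B, eps=eps)
    auc = evaluate_auc(X_train, y_train, X_test, y_test, sample_weight=beta)
    if auc > best_auc:
        best_cfg, best_auc = (B, eps), auc
```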

Naive Bayes (B=5, eps=0.01)
AUC Acc F F_2 Recall Precision PF TestSet TrainSet
0.6749 0.8517 0.2238 0.1752 0.1531 0.4156 0.0349 PDE_change JDT_change 
0.6731 0.6451 0.2282 0.1586 0.1318 0.8500 0.0154 equinox_change JDT_change 
0.7555 0.9204 0.3529 0.2708 0.2344 0.7143 0.0096 lucene_change JDT_change 
0.6424 0.8598 0.1092 0.0778 0.0653 0.3333 0.0198 mylyn_change JDT_change 
0.7185 0.6229 0.4551 0.6002 0.7621 0.3244 0.4134 JDT_change PDE_change 
0.7953 0.6944 0.4343 0.3381 0.2946 0.8261 0.0410 equinox_change PDE_change 
0.7772 0.8857 0.4060 0.4154 0.4219 0.3913 0.0670 lucene_change PDE_change 
0.5753 0.8448 0.2684 0.2345 0.2163 0.3533 0.0600 mylyn_change PDE_change 
0.7488 0.6810 0.4838 0.6037 0.7233 0.3634 0.3300 JDT_change equinox_change 
0.6985 0.7508 0.3935 0.4871 0.5789 0.2980 0.2213 PDE_change equinox_change 
0.7118 0.7424 0.3101 0.4444 0.6250 0.2062 0.2456 lucene_change equinox_change 
0.5615 0.7873 0.2826 0.3030 0.3184 0.2541 0.1416 mylyn_change equinox_change 
0.7411 0.6911 0.4901 0.6056 0.7184 0.3719 0.3161 JDT_change lucene_change 
0.6889 0.7916 0.3735 0.4133 0.4450 0.3218 0.1522 PDE_change lucene_change 
0.7825 0.7222 0.5588 0.4822 0.4419 0.7600 0.0923 equinox_change lucene_change 
0.6489 0.8281 0.3043 0.2929 0.2857 0.3256 0.0897 mylyn_change lucene_change 
0.6040 0.3400 0.3286 0.5038 0.7816 0.2080 0.7750 JDT_change mylyn_change 
0.5824 0.5311 0.2626 0.3958 0.5981 0.1682 0.4798 PDE_change mylyn_change 
0.5599 0.6636 0.5068 0.4605 0.4341 0.6087 0.1846 equinox_change mylyn_change 
0.6228 0.7829 0.2857 0.3731 0.4688 0.2055 0.1850 lucene_change mylyn_change 


KMM vs. unweighted (no instance weights)
KMM with B=5, eps=0.01 vs. the unweighted baseline
AUC(KMM) AUC(unweighted) improvement mean(imp) median(imp)
0.6749 0.6313 0.0436 0.0218 0.0228
0.6731 0.7702 -0.0970
0.7555 0.7149 0.0406
0.6424 0.4153 0.2271
0.7185 0.7207 -0.0022
0.7953 0.7843 0.0110
0.7772 0.7413 0.0359
0.5753 0.5716 0.0038
0.7488 0.7105 0.0384
0.6985 0.6843 0.0142
0.7118 0.7257 -0.0139
0.5615 0.5131 0.0484
0.7411 0.7098 0.0313
0.6889 0.6954 -0.0065
0.7825 0.7739 0.0086
0.6489 0.6140 0.0349
0.6040 0.6305 -0.0265
0.5824 0.6424 -0.0601
0.5599 0.5102 0.0497
0.6228 0.5678 0.0550



Observations:
1) After parameter tuning, KMM improves prediction performance, but only modestly (mean AUC improvement of about 0.02).
2) Long running time: 900 training + 300 test instances takes 30 min; 10,000 training + 300 test instances takes 130 min. (KMM solves a quadratic program with one variable per training instance, so it scales poorly with training-set size.)

Questions:
1) Is KMM, however beautiful its mathematics, simply a poor fit for cross-project defect prediction?


New results (2012-12-18):
Parameter tuning grid: B = 1, 3, 5, 10, 20, 50; eps = 0.01, 0.1, 0.5, 1, 5, 10.



Fig. 1. Improvement of PD (probability of detection).

Fig. 2. Improvement of PF (probability of false alarm).


POM2: RRSL vs. SPEA2 and NSGA-II

Charts: 

http://unbox.org/stuff/var/joe/joe/fall2012/pom2rrslTesting/charts/

RRSL scores significantly worse than SPEA2 and NSGA-II, but its runtimes and number of evaluations are significantly lower. The POM2 Idle and Incompletion objectives are not very well controlled, and, confoundingly, RRSL does better on those two.

Datatable:

http://unbox.org/stuff/var/joe/joe/fall2012/pom2rrslTesting/charts/rrsl_spea2_nsgaii_datatable.txt

Tuesday, December 4, 2012

POM2 Baselines

Using Inspyred for Python (http://pypi.python.org/pypi/inspyred).
May switch to DEAP (http://code.google.com/p/deap/), since Inspyred has no SPEA2.
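If we do switch, a minimal DEAP setup could look like this (hypothetical two-objective minimization; POM2's real objectives and decision encoding are not shown):

```python
# Hypothetical minimal DEAP toolbox; names and sizes are illustrative.
import random
from deap import base, creator, tools

creator.create("FitnessMin", base.Fitness, weights=(-1.0, -1.0))
creator.create("Individual", list, fitness=creator.FitnessMin)

toolbox = base.Toolbox()
toolbox.register("attr", random.random)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attr, n=10)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutGaussian, mu=0.0, sigma=0.1, indpb=0.1)
toolbox.register("select", tools.selSPEA2)  # SPEA2 ships with DEAP, not Inspyred
```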


NSGA2:
table of data: http://justpaste.it/1la9
charts image: http://i.imgur.com/Ceuw8.png

PAES:
table of data: http://justpaste.it/1laa
charts image: http://i.imgur.com/SAl9D.png

NumEvals for NSGA2 and PAES above are > 10,000.
NumEvals for RRSL below are < 20.

RRSL:
tables of data: http://unbox.org/stuff/var/joe/joe/fall2012/pom2rrslTesting/data/

Workshop on Data Analysis Patterns in Software Engineering

http://research.microsoft.com/en-us/events/dapse2013/


DAPSE’13: International Workshop on Data Analysis Patterns in Software Engineering
The workshop has been accepted at the International Conference on Software Engineering.
Data science is a skilled art with a steep learning curve. To shorten that learning curve, this workshop will collect best practices in the form of data analysis patterns, that is, analyses of data that lead to meaningful conclusions and can be reused for comparable data.
In the workshop we will compile a catalog of such patterns to help both experienced and emerging data scientists communicate better about data analysis. The workshop is intended for anyone interested in how to analyze data correctly and efficiently, in a community-accepted way.

EXAMPLE of a PATTERN

For illustrative purposes, here’s an example pattern in short and simplified form. For the workshop, we expect the discussion to be more comprehensive. We do welcome both simple and complex analysis patterns.
Pattern name: Contrast

Problem: 
Determine if there is a difference in one or more properties between two populations.

Solution: 
1. Apply a hypothesis test (Student's t-test for parametric data, Mann-Whitney U test for non-parametric data) to check whether the property differs statistically between the populations.

2. Determine the magnitude of the difference, either through visualization (e.g., boxplot) or when appropriate through mean or median.
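A minimal sketch of both steps in Python (scipy/numpy assumed; the function and variable names are illustrative, not from the workshop material):

```python
# "Contrast" pattern sketch: significance test plus effect magnitude.
import numpy as np
from scipy import stats

def contrast(pop_a, pop_b, alpha=0.05):
    # Step 1: hypothesis test (non-parametric case: Mann-Whitney U).
    u, p = stats.mannwhitneyu(pop_a, pop_b, alternative='two-sided')
    # Step 2: magnitude of the difference (medians here; a boxplot also works).
    return {'significant': p < alpha,
            'p_value': p,
            'median_diff': np.median(pop_a) - np.median(pop_b)}
```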

Discussion: 
Either step without the other can be misleading. For large populations, tiny differences might be statistically significant; in contrast, for small populations, large differences might not be statistically significant.

Choosing the wrong hypothesis test is a common mistake.

Examples: 
For example, at ICSE 2009, Bird et al. used a Mann-Whitney test to compare the defect proneness (= the property) between distributed and co-located binaries (= the two populations). See Figure 5 in their paper for a sample visualization of the differences between the two populations.

The magic tuning problem

Improved Knowledge Acquisition for High-Performance Heuristic Search
J.P. Bekmann and Achim Hoffmann

http://www.ijcai.org/papers/1668.pdf

We present a new incremental knowledge acquisition approach that incrementally improves the performance of a probabilistic search algorithm. The approach addresses the known difficulty of tuning probabilistic search algorithms, such as genetic algorithms or simulated annealing, for a given search problem by the introduction of domain knowledge. We show that our approach is effective for developing heuristic algorithms for difficult combinatorial problems by solving benchmarks from the industrially relevant domain of VLSI detailed routing. In this paper we present advanced techniques for improving our knowledge acquisition approach. We also present a novel method that uses domain knowledge for the prioritisation of mutation operators, increasing the GA's efficiency noticeably.




Some ICSE 2013 papers...


Papers from ICSE 2013:
Process
Not Going to Take This Anymore: Multi-objective Overtime Planning for Software Engineering Projects
Filomena Ferrucci, Mark Harman, Jian Ren, and Federica Sarro
(University of Salerno, Italy; University College London, UK)
NO RESPONSE

Search-Based SE
How to Effectively Use Topic Models for Software Engineering Tasks? An Approach Based on Genetic Algorithms
Annibale Panichella, Bogdan Dit, Rocco Oliveto, Massimiliano Di Penta, Denys Poshyvanyk, and Andrea De Lucia
(University of Salerno, Italy; College of William and Mary, USA; University of Molise, Italy; University of Sannio, Italy)
INFORMATION RETRIEVAL.
GA, single objective.

Search-Based Genetic Optimization for Deployment and Reconfiguration of Software in the Cloud
Sören Frey, Florian Fittkau, and Wilhelm Hasselbring
(Kiel University, Germany)
Problem: cloud deployment options (CDOs)
The CDOSim simulator can evaluate CDOs, e.g., with regard to response times and costs.

Three objectives:
1. Cost refers to the total amount of monetary units owed to a cloud provider for utilizing its services.
2. Response times refer to the average response times of the methods included in a workload profile.
3. SLA violations indicate the number of method calls whose response times exceed a given threshold.
  
Custom algorithms based on NSGA-II.
They report improvements of 60% over other tools.


Product Lines
Beyond Boolean Product-Line Model Checking: Dealing with Feature Attributes and Multi-features
Maxime Cordy, Pierre-Yves Schobbens, Patrick Heymans, and Axel Legay
(University of Namur, Belgium; INRIA, France)
Still a verification method.
Multi-features: features that can appear several times in the same product, e.g. processing units.

On the Value of User Preferences in Search-Based Software Engineering: A Case Study in Software Product Lines
Abdel Salam Sayyad, Tim Menzies, and Hany Ammar
(West Virginia University, USA)

Reviewer #1:
I suggest that the title be more direct in its description of the paper's content (i.e., make clear that the paper's focus is on evaluating the use of MOEAs in the SPL domain).

Reviewer #2:
The paper has a very nice survey of previous work on SBSE and multi-objective SBSE that establishes the context in which the contributions of the paper are made. This is effectively a kind of "mini focused literature review" that focuses on previous work on multi-objective SBSE. It is a small contribution in itself.
The claims are very much overstated and need to be toned down. 
You can't say "who" when referring to an algorithm; they are not alive, no matter how "AI" this SBSE stuff might be :-)

Reviewer #3:
This is a very good paper. It reads like a page-turner.
Also quite original and appealing is the idea of reducing the correctness criteria to just another dimension to optimize.

Monday, December 3, 2012

Tukutuku paper

The latest version of the paper, re-worded and re-positioned according to the transfer-learning viewpoint, is here.