The source data is a combination of the following OO data sets: poi-3.0, ant-1.7, camel-1.6, ivy-2.0, and jEdit-4.1. The target data is jm1 (Halstead metrics).
- x% of the target data is labelled; the rest is unlabelled.
- Only 50% of the target data is used as test instances, all drawn from the unlabelled portion.
- BORE is applied separately to the source data and to the labelled x% of the target.
- Each instance now has a score that is the product of the ranks from the power ranges (the scores are normalized).
- Each target instance gets a BORE score computed with the ranks from the labelled x%.
- These scores are then matched to the nearest instance scores in the source data, and the majority defect label among those matches is assigned to the target instance.
- For the within experiment, the labelled x% of the target data is used as the training set and the 50% test instances are the test set.
- The above is also benchmarked with a 10 x 10 cross-validation experiment on jm1 with Naive Bayes.
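The steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for the experiment. Several details are assumptions that the notes do not specify: equal-width binning into ranges, the range power b^2/(b+r) (b and r being a range's relative frequency among defective and non-defective rows), using the power value itself as the rank, min-max normalization, the small `eps` for unseen ranges, and voting over the `k` nearest scores. All function names are hypothetical.

```python
import random
from collections import Counter

def split_target(n, labelled_frac, seed=0):
    """Pick x% of target rows as labelled, and 50% of all rows as test
    instances drawn only from the unlabelled remainder."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_lab = max(1, round(n * labelled_frac))
    labelled, unlabelled = idx[:n_lab], idx[n_lab:]
    return labelled, unlabelled[:min(len(unlabelled), n // 2)]

def make_binner(rows, n_bins=5):
    """Assumed discretization: equal-width bins per column, fitted on rows."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    def bin_of(row):
        out = []
        for v, l, h in zip(row, lo, hi):
            b = 0 if h == l else int((v - l) / (h - l) * n_bins)
            out.append(min(max(b, 0), n_bins - 1))  # clamp out-of-range values
        return tuple(out)
    return bin_of

def range_powers(rows, labels, bin_of):
    """Assumed power of each (column, bin) range: b^2 / (b + r)."""
    best = [bin_of(r) for r, y in zip(rows, labels) if y == 1]
    rest = [bin_of(r) for r, y in zip(rows, labels) if y == 0]
    powers = {}
    for c in range(len(rows[0])):
        fb = Counter(x[c] for x in best)
        fr = Counter(x[c] for x in rest)
        for k in set(fb) | set(fr):
            b = fb[k] / max(len(best), 1)
            r = fr[k] / max(len(rest), 1)
            powers[(c, k)] = b * b / (b + r)
    return powers

def bore_scores(rows, powers, bin_of, eps=1e-6):
    """Score = product of the powers of the row's ranges, min-max normalized."""
    raw = []
    for row in rows:
        s = 1.0
        for c, k in enumerate(bin_of(row)):
            s *= powers.get((c, k), 0.0) + eps
        raw.append(s)
    lo, hi = min(raw), max(raw)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in raw]

def predict_by_nearest_scores(test_scores, source_scores, source_labels, k=5):
    """Assign each test score the majority label of the k nearest source scores."""
    preds = []
    for s in test_scores:
        nearest = sorted(range(len(source_scores)),
                         key=lambda i: abs(source_scores[i] - s))[:k]
        votes = Counter(source_labels[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds
```

Because only normalized scores are compared, the source and target rows may have different columns (for example OO metrics on one side and Halstead metrics on the other). The binner and powers for the target would be fitted on the labelled x% only, then applied to the test instances.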
Initial results: click here
So far, four things are offered:
- Synonyms: if the technology, data collection methods, or metrics change, can we still use previous projects?
- A cross-prediction method for synonyms, based on relational transfer between different data sets.
- The percentage of labelled data used: the second opinion paper goes as low as 6%, and the mixed paper experiments with 10%.
- The method closely resembles that of the second opinion paper; BORE is linear.