## Tuesday, March 16, 2010

### Effect of Various Bandwidth & Kernel Combinations in Neighbor Weighting

What is the effect of weighting your neighbors in kNN? According to experiments on 3 datasets (Coc81, Nasa93, Desharnais), the effects are as follows:

- Weighting your neighbors according to a kernel function produces significantly different results
- However, across these 3 datasets, none of the kernels we tried (Uniform, Triangular, Epanechnikov, Gaussian), nor the inverse rank weighted mean (proposed by Mendes et al.), improved accuracy at any of the bandwidths tested.
- Detailed experimental results are at http://unbox.org/wisp/var/ekrem/kernel/
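As a sketch of what the experiment compares, here is a minimal kernel-weighted kNN estimator plus the inverse rank weighted mean. The kernel names come from the list above; the formulas are the standard textbook ones, and all function names are my own, not the actual experimental code (which lives at the URL above).

```python
import math

# Standard kernel functions over u = distance / bandwidth.
KERNELS = {
    "uniform":      lambda u: 0.5 if abs(u) <= 1 else 0.0,
    "triangular":   lambda u: 1 - abs(u) if abs(u) <= 1 else 0.0,
    "epanechnikov": lambda u: 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0,
    "gaussian":     lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi),
}

def weighted_knn_estimate(distances, efforts, kernel="gaussian", bandwidth=1.0):
    """Kernel-weighted mean of the k nearest neighbors' known efforts."""
    kern = KERNELS[kernel]
    weights = [kern(d / bandwidth) for d in distances]
    total = sum(weights)
    if total == 0:  # no neighbor falls inside the bandwidth
        return sum(efforts) / len(efforts)  # fall back to the plain mean
    return sum(w * e for w, e in zip(weights, efforts)) / total

def irwm_estimate(efforts):
    """Inverse rank weighted mean (after Mendes et al.): with neighbors
    sorted closest-first, the closest gets weight k, the next k-1, ..., 1."""
    k = len(efforts)
    return sum((k - i) * e for i, e in enumerate(efforts)) / (k * (k + 1) / 2)
```

Note that with a uniform kernel and every neighbor inside the bandwidth, the estimate reduces to the plain kNN mean, which is one way to see how kernel weighting can change results without improving on the unweighted baseline.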

## Monday, March 15, 2010

### Presentation - Finding Robust Solutions to Requirements Models

Presentation to Tsinghua University - Thursday, March 18, 2010.

Based on: Gregory Gay, Tim Menzies, Omid Jalali, Gregory Mundy, Beau Gilkerson, Martin Feather, and James Kiper. Finding Robust Solutions in Requirements Models. Automated Software Engineering, 17(1):87-116, 2010.

Labels: GregG, KEYS, presentation, SBSE, treatment learning

### Active Machine Learning

Unlabeled data is easy to come by, but labeling that data can be a tedious task. Imagine that you've been tasked with gathering sound bites. All you have to do is walk around with a microphone and you've got your data. But once you have the data, you have to label and catalogue each individual sound bite. Obviously, combing through and labeling all the data wouldn't be fun, regardless of the domain.

Active machine learning is a supervised learning technique whose goal is to produce adequately labeled (classified) data with as little human interference as possible. The active learning process takes in a small chunk of data which has already been assigned a classification by a human (oracle) with extensive domain knowledge. The learner then uses that data to build a classifier and applies it to a larger set of unlabeled data. The entire learning process aims to keep human annotation to a minimum, referring back to the oracle only for the examples where the payoff of a human label justifies the cost of the query.

Active learning typically allows the learning process to be monitored and offers the human expert the ability to halt it. If the classification error grows beyond a heuristic threshold, the oracle can pull the plug on the learner and attempt to rectify the problem.

For a less high-level view of active machine learning, see the following literature survey on the topic.
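To make the loop above concrete, here is a minimal sketch of pool-based active learning with uncertainty sampling. The toy nearest-centroid learner, the 1-D data, and every function name here are illustrative assumptions of mine, not anything from the survey.

```python
def train_centroids(labeled):
    """Fit a toy nearest-centroid classifier from (value, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def uncertainty(x, centroids):
    """Margin-based uncertainty: the smaller the gap between the two
    nearest class centroids, the less certain the classifier is."""
    d = sorted(abs(x - c) for c in centroids.values())
    return -(d[1] - d[0])  # higher score = more uncertain

def active_learn(labeled, pool, oracle, budget):
    """Pool-based active learning with uncertainty sampling: at each step,
    query the oracle only on the most uncertain unlabeled example."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(budget):
        model = train_centroids(labeled)
        x = max(pool, key=lambda p: uncertainty(p, model))
        pool.remove(x)
        labeled.append((x, oracle(x)))  # the costly human-annotation step
    return train_centroids(labeled)
```

With seed labels at 0 ("lo") and 10 ("hi") and an unlabeled pool of {1, 4, 6, 9}, a budget of two queries spends both human labels on 4 and 6, the points nearest the decision boundary, which is exactly the annotation-saving behavior described above.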


## Tuesday, March 9, 2010

### Brittleness vs. Number of Classes

## Monday, March 8, 2010

### Is kernel effect statistical?

See the file kernelWinTieLossValues.xls at http://unbox.org/wisp/var/ekrem/kernel/

In summary:

- Different kernels did not yield significantly different performance when compared to each other
- Kernel weighting did not significantly improve performance at any k value, i.e. there is no significant difference between k=x alone and k=x plus kernel weighting
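A win-tie-loss comparison of the kind tallied in that spreadsheet can be sketched as follows. As an assumption on my part, the statistical-sameness check here is a Mann-Whitney U test with the normal approximation; the actual test behind kernelWinTieLossValues.xls may differ.

```python
import math

def _midranks(values):
    """Ranks (1-based) with ties sharing the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_same(xs, ys, z_crit=1.96):
    """Two-sided Mann-Whitney U test via the normal approximation.
    Returns True when the samples look statistically the same (~95% level)."""
    n, m = len(xs), len(ys)
    ranks = _midranks(list(xs) + list(ys))
    u = sum(ranks[:n]) - n * (n + 1) / 2
    mu = n * m / 2
    sigma = math.sqrt(n * m * (n + m + 1) / 12)
    z = (u - mu) / sigma if sigma else 0.0
    return abs(z) < z_crit

def win_tie_loss(xs, ys):
    """Tie if statistically the same; otherwise win/loss by median
    (upper median for even lengths; higher is assumed better here)."""
    if mann_whitney_same(xs, ys):
        return "tie"
    mx, my = sorted(xs)[len(xs) // 2], sorted(ys)[len(ys) // 2]
    return "win" if mx > my else "loss"
```

Running this once per treatment pair per dataset and summing the wins, ties, and losses gives the kind of table the spreadsheet reports: mostly ties is exactly the "no significant difference" result summarized above.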

### Fastmap: Mapping objects into a k-dimensional space

The Fastmap algorithm maps dataset instances to k-dimensional coordinates, which can be used primarily for visualization and clustering. Given a set of n instances from a dataset, Fastmap operates as follows. Assume we're using Euclidean distance as the dissimilarity function and that we're operating in a 2-D space.

1. Arbitrarily choose an instance from the set, then find the instance farthest from it. These two instances, call them Oa and Ob, serve as pivots for mapping the rest of our instances. Each incoming instance Oi will be mapped using the line that exists between Oa and Ob.

2. As each new instance enters the space, apply geometric rules to extrapolate the coordinates of the new point.

Using the Cosine Law we can find Xi = (d(Oa,Oi)^2 + d(Oa,Ob)^2 - d(Ob,Oi)^2) / (2 d(Oa,Ob)), and then use the Pythagorean theorem to determine the coordinates of Oi relative to the line formed between Oa and Ob.

3. From there we can easily turn our coordinates into a visualization or use the coordinates to aid in clustering.

To move from a 2-D space to a k-D space, see http://portal.acm.org/citation.cfm?id=223812
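The first-coordinate step above can be sketched like this. Pivot selection follows the simplified description in step 1 (the full algorithm in the cited paper refines the pivots iteratively), and all function names are my own.

```python
import math

def euclidean(p, q):
    """The dissimilarity function assumed in the walkthrough above."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def fastmap_first_axis(instances, dist=euclidean):
    """Project every instance onto the line through the two pivots,
    giving each one its first Fastmap coordinate."""
    oa = instances[0]                               # arbitrarily chosen instance
    ob = max(instances, key=lambda p: dist(oa, p))  # farthest instance from it
    dab = dist(oa, ob)
    xs = []
    for oi in instances:
        dai, dbi = dist(oa, oi), dist(ob, oi)
        # Cosine Law: x_i = (d(Oa,Oi)^2 + d(Oa,Ob)^2 - d(Ob,Oi)^2) / (2 d(Oa,Ob))
        xs.append((dai ** 2 + dab ** 2 - dbi ** 2) / (2 * dab))
    return xs
```

The Pythagorean step then supplies the next coordinate: replace each pairwise distance with d'(i,j)^2 = d(i,j)^2 - (x_i - x_j)^2, i.e. the residual distance orthogonal to the Oa-Ob line, and recurse on the projected space.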

## Tuesday, March 2, 2010

## Monday, March 1, 2010

### Locality in Defect Prediction

Here, we show how a group of learners performs on 7 of the NASA software defect prediction datasets, both within-company and cross-company.

We are also looking at varying the K for LWL (locally weighted learning). The greatest change in PD (probability of detection) and PF (probability of false alarm) across K values of {5,10,25,50} is 5%, and usually around 3%. With changes in PD/PF this small, the actual value of K does not seem to matter.

Two algorithms, Clump and Naive Bayes, are shown both with and without relevancy filtering via the Burak filter. Although applying the Burak filter can reduce the variance of the results (for both within-company and cross-company data when the data is logged), it does not significantly affect the median PD/PFs.

Observing the interaction between PD and PF for all the algorithms explored, I cannot see a case where locality is beneficial. The PD-to-PF ratios of the locality-based methods and the global methods are almost identical, showing that any gain in PD or reduction in PF comes at the cost of additional PF or lost PD.
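For reference, the relevancy filtering step can be sketched as follows. The "keep each test instance's k nearest training instances and train on their union" rule follows the usual description of the Burak filter; the value of k, the distance function, and all names here are illustrative assumptions.

```python
import math

def euclidean(p, q):
    """An assumed distance over the instances' numeric features."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def burak_filter(train, test, k=10, dist=euclidean):
    """Relevancy filtering: for every test instance, keep its k nearest
    training instances; the union becomes the filtered training set."""
    keep = set()
    for t in test:
        nearest = sorted(range(len(train)), key=lambda i: dist(train[i], t))[:k]
        keep.update(nearest)
    return [train[i] for i in sorted(keep)]
```

Training Clump or Naive Bayes on the filtered set rather than the full cross-company data is the "with relevancy filtering" condition compared above.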

