ai @ wvu: The Peters Filter

Thursday, November 8, 2012

The Peters Filter

The cross-company problem: how do you find which training data is relevant to you?

Why do cross-company learning?

Because when you don't have enough local data, you do very badly.
In the following, we train and test on very small data sets: (lo, median, hi) = (6, 20, 65) instances.

So, let's reach out across data sets and compare.

Two cross-company selection filters

Burak: select the N training instances nearest the test data (shown in gray)
Peters: cluster the training data, then find the clusters that contain the test data

Note that the Peters filter uses the structure of the train data to guide the initial selection of the data.
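
To make the difference between the two filters concrete, here is a minimal sketch in plain Python. It is a simplified reading of the one-line descriptions above, not the authors' implementation. The function names, the Euclidean distance, the tiny deterministic k-means, and the toy data are all assumptions made for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two numeric rows (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(centroids, row):
    """Index of the centroid closest to row."""
    return min(range(len(centroids)), key=lambda c: dist(centroids[c], row))


def burak_filter(train, test, n):
    """Test-driven: keep the n train rows nearest each test row."""
    keep = set()
    for t in test:
        order = sorted(range(len(train)), key=lambda i: dist(train[i], t))
        keep.update(order[:n])
    return [train[i] for i in sorted(keep)]


def kmeans(rows, k, iters=20):
    """Tiny k-means; starts from evenly spaced rows so runs are repeatable."""
    step = max(1, len(rows) // k)
    centroids = [list(rows[i]) for i in range(0, len(rows), step)][:k]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for r in rows:
            groups[nearest(centroids, r)].append(r)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def peters_filter(train, test, k):
    """Train-driven: cluster train, keep clusters that a test row falls into."""
    centroids = kmeans(train, k)
    wanted = {nearest(centroids, t) for t in test}
    return [r for r in train if nearest(centroids, r) in wanted]


# Hypothetical toy data: two well-separated groups of train rows, one test row.
train = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
test = [(0.1, 0.2)]

print(burak_filter(train, test, n=2))   # [(0, 0), (0, 1)]
print(peters_filter(train, test, k=2))  # [(0, 0), (0, 1), (1, 0)]
```

On this toy data the Burak-style filter returns exactly n rows per test instance, while the Peters-style filter returns the whole cluster the test instance lands in, so the shape of the train data decides how much is selected.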

Why?

Intuition behind Peters' filter:
There is more experience in the repo than with you, so use it to guide you.

In the following

Train on selected members of the 46 data sets in the repo: (lo, med, hi) = (109, 293, 885) instances
Score with g = 2*(1 - pf)*pd / (1 - pf + pd), the harmonic mean of pd and (1 - pf)
The last column is the delta between the Peters and Burak filters
Delta is usually positive and large
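
As a small worked example of the g formula, the sketch below computes it from confusion-matrix counts. It assumes the usual defect-prediction definitions, which the post does not spell out: pd = tp / (tp + fn) (probability of detection, i.e. recall) and pf = fp / (fp + tn) (probability of false alarm). The function names and the counts are hypothetical.

```python
def g_measure(pd, pf):
    """g = 2*(1 - pf)*pd / (1 - pf + pd): harmonic mean of pd and (1 - pf)."""
    denom = (1 - pf) + pd
    if denom == 0:  # pd = 0 and pf = 1: nothing found, everything flagged
        return 0.0
    return 2 * (1 - pf) * pd / denom


def pd_pf(tp, fn, fp, tn):
    """Assumed definitions: pd is recall, pf is the false alarm rate."""
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return pd, pf


# Hypothetical counts: 8 of 10 defects found, 20 false alarms on 100 clean modules.
pd, pf = pd_pf(tp=8, fn=2, fp=20, tn=80)
print(round(g_measure(pd, pf), 3))  # 0.8
```

Because g is a harmonic mean, it is high only when pd is high and pf is low at the same time; a learner that flags everything (pd = 1, pf = 1) scores g = 0.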

The
