The cross-company problem: how to find which training data is relevant to you
Why do cross-company learning?
- Because when you don't have enough local data, you do very badly
- In the following, we are training and testing on very small data sets: (lo, median, hi) = (6, 20, 65) instances
So, let's reach out across data sets and compare.
Two cross-company selection filters
- Burak: select the N training instances nearest the test data (shown in gray)
- Peters: cluster the training data, then find the clusters that the test data falls into
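A minimal sketch of the Burak-style selection described above. Assumptions not stated in these notes: distance is Euclidean over numeric features, the "N nearest" are collected per test row and then merged, and the names `burak_filter` and `n` are illustrative only.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two numeric rows of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def burak_filter(train, test, n=10):
    """Keep every training row that is among the n nearest
    neighbours of at least one test row (assumed reading of
    'N things nearest the test data')."""
    keep = set()
    for t in test:
        # rank training rows by distance to this test row
        ranked = sorted(range(len(train)),
                        key=lambda i: euclidean(train[i], t))
        keep.update(ranked[:n])
    # return in original order, without duplicates
    return [train[i] for i in sorted(keep)]


if __name__ == "__main__":
    train = [(0, 0), (1, 1), (5, 5), (9, 9), (10, 10)]
    test = [(0.1, 0.1), (9.9, 9.9)]
    print(burak_filter(train, test, n=1))
```

Note that this selection looks only at distances from the test rows; it never looks at how the training data itself is organised.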
Note that the Peters filter uses the structure of the training data to guide the initial selection of the data.
Intuition behind Peters' filter:
- There is more experience in the repo than you have locally, so use it to guide you
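A rough sketch of the cluster-then-select idea, following only the two-step description in these notes; the published filter may include further steps not shown here. Assumptions: a tiny deterministic k-means stands in for whatever clusterer was actually used, distance is Euclidean, and `kmeans`, `cluster_filter` and `k` are hypothetical names.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(centroids, row):
    """Index of the centroid closest to row."""
    return min(range(len(centroids)), key=lambda c: dist(centroids[c], row))


def kmeans(rows, k, iters=20):
    """Plain Lloyd's algorithm with a deterministic start
    (k rows spread evenly through the list)."""
    k = min(k, len(rows))
    step = max(1, len(rows) // k)
    centroids = [tuple(rows[i * step]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in rows:
            groups[nearest(centroids, r)].append(r)
        # an empty group keeps its old centroid
        new = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[c]
               for c, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids


def cluster_filter(train, test, k=3):
    """Cluster the training data, then keep only the training rows
    whose cluster also attracts at least one test row."""
    centroids = kmeans(train, k)
    wanted = {nearest(centroids, t) for t in test}
    return [r for r in train if nearest(centroids, r) in wanted]


if __name__ == "__main__":
    train = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
             (20, 0), (20, 1), (21, 0)]
    test = [(0.5, 0.5), (20.5, 0.5)]
    print(cluster_filter(train, test, k=3))
```

Here the clusters are built from the training data first, so the repo's own structure decides which groups exist; the test data only picks among them.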
In the following:
- Train on selected members of the 46 data sets in the repo: (lo, med, hi) = (109, 293, 885) instances
- g = 2*(1 - pf)*pd / (1 - pf + pd)
- The last column is the delta between the Peters and Burak filters
- Delta is usually positive and large
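The g formula above is the harmonic mean of pd and (1 - pf). A small sketch of how it could be computed, assuming the usual definitions pd = TP/(TP+FN) (probability of detection) and pf = FP/(FP+TN) (probability of false alarm), which these notes do not spell out. The sign of the delta (Peters minus Burak) is also an assumption, and the confusion counts below are made up for illustration; they are not results from the study.

```python
def pd_pf(tp, fn, fp, tn):
    """Assumed definitions: pd = recall, pf = false alarm rate."""
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return pd, pf


def g_measure(pd, pf):
    """g = 2*(1 - pf)*pd / (1 - pf + pd), as given in the notes."""
    denom = (1 - pf) + pd
    if denom == 0:
        return 0.0  # pd = 0 and pf = 1: nothing detected, everything flagged
    return 2 * (1 - pf) * pd / denom


def delta(g_peters, g_burak):
    """Assumed direction: positive means the Peters filter scored higher."""
    return g_peters - g_burak


if __name__ == "__main__":
    # made-up confusion counts (tp, fn, fp, tn), illustration only
    g_a = g_measure(*pd_pf(8, 2, 2, 8))
    g_b = g_measure(*pd_pf(5, 5, 2, 8))
    print(g_a, g_b, delta(g_a, g_b))
```

Because it is a harmonic mean, g is high only when pd is high and pf is low at the same time.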