Why Bother?
- It's good science
- It's fun
- I've become a little tired of the same dull old data-mining-in-SE paper. Sure, we can get results from running data miners on SE data sets. Ok. Next?
- Recent papers report that there is little to be gained from such algorithm mining, since the "improvements" found by this approach are marginal at best:
- for example, for effort estimation and defect prediction,
- simpler data miners do just as well as, or better than, more elaborate ones [Hall12], [Dejaeger12]
- And there is tremendous variance in the outputs when the same learners are applied to the same data by different people
And if it works, what would we expect?
- Simpler implementations.
- If we cut down to the essence, then the essence should be smaller than the confusion we cleared away.
- Faster implementations.
- Again, if we cut the crap, the rest should run faster.
- Scalable inference.
- Era of big data. Need scalability.
- More services:
- the parts should be mashable into new services not currently offered.
What new services?
- Local learning
- Decision mining:
- a.k.a. weighted contrast set learning (sketched after this list)
- planning (what to do): learning the decisions that move a project from "now" to "better"
- monitoring (what to avoid): learning the decisions that move a project from "now" to "worse"
- Discussion mining
- Learn a community's biases from the decisions they take, avoid, retake, or undo
- Temporal reasoning:
- anytime inference; recognition of anomalies; pinpointing which part of the model needs updating; updating just that section
- Privacy:
- Can do
- Quality assurance:
- addressing the issues raised by Shepperd et al. (TSE 2013)
- Transfer learning:
- keeping what can be kept, changing what needs to change, as we move...
- across organizations (cross-company learning; one standard tactic is sketched after this list)
- across time (dataset shift)
- Multi-objective evolutionary algorithms (MOEA):
- the distinction between traditional learning and optimization is bogus.
- Inference at the business level:
- ability to model business goals and tune the learners appropriately.
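To make the decision-mining idea concrete, here is a minimal sketch of weighted contrast set learning, assuming discrete-valued tabular data. All the names here (contrasts, reviews, tests) are hypothetical illustrations, not the actual implementation:

```python
# A minimal sketch of weighted contrast set learning for decision mining.
# Rank (attribute, value) pairs by how much more often they appear in the
# "better" rows than in the "worse" rows.
from collections import Counter

def contrasts(better, worse):
    """better, worse: lists of dicts mapping attribute -> discrete value."""
    b, w = Counter(), Counter()
    for row in better:
        b.update(row.items())       # count each (attribute, value) pair
    for row in worse:
        w.update(row.items())
    pairs = set(b) | set(w)
    # weight = frequency in "better" minus frequency in "worse"
    return sorted(pairs,
                  key=lambda p: b[p] / len(better) - w[p] / len(worse),
                  reverse=True)

better = [{"reviews": "yes", "tests": "unit"}, {"reviews": "yes", "tests": "none"}]
worse  = [{"reviews": "no",  "tests": "none"}, {"reviews": "no",  "tests": "unit"}]

ranked = contrasts(better, worse)
print("plan (what to do):  ", ranked[0])    # ('reviews', 'yes')
print("monitor (to avoid): ", ranked[-1])   # ('reviews', 'no')
```

The same ranking read from the other end is the monitor: the pairs most strongly associated with "worse" are the decisions to avoid.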
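And for cross-company transfer, one standard tactic is nearest-neighbor relevancy filtering in the style of Turhan et al. 2009 (an illustration of the general idea, not necessarily the implementation here): keep only the cross-company rows nearest to the local data, then train as usual. A sketch, assuming numeric features:

```python
# A sketch of nearest-neighbor relevancy filtering for cross-company
# learning; the function name and parameters are mine.
import numpy as np

def nn_filter(cross, local, k=10):
    """Keep the cross-company rows that are k-nearest neighbors of
    at least one local row ("keeping what can be kept")."""
    keep = set()
    for row in local:
        d = np.linalg.norm(cross - row, axis=1)   # distance to every cross row
        keep.update(np.argsort(d)[:k].tolist())   # indices of the k nearest
    return cross[sorted(keep)]

# Usage: train any learner on nn_filter(cross_data, local_data)
# instead of on the raw cross-company data.
```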
What business goals?
From Buse & Zimmermann's study of information needs among 100+ managers at Microsoft.
Under the hood (the current implementation):
Fast spectral learning (map to eigenvectors, cluster, reason from there):
- O(N) inference with Fastmap (sketched below)
- [figure: a defect data set (POI-3.) projected into 2D; green = less defective]
- cluster in that 2D space (fast)
- find neighboring clusters C1, C2 where score(C1) > score(C2)
- learn the contrast between C1 and C2
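A minimal sketch of the O(N) Fastmap step, assuming numeric features (the function name and data are mine, not the actual implementation): pick a random row, walk to the row farthest from it ("east"), then to the row farthest from east ("west"); the cosine rule then places every row on the east-west axis in one linear pass. Repeating with a second pair of pivots gives the 2D space.

```python
# One O(N) Fastmap projection pass.
import numpy as np

def fastmap_axis(data, rng):
    """Return one Fastmap coordinate per row of a numeric 2D array."""
    dist = lambda a, b: np.linalg.norm(a - b, axis=-1)
    pivot = data[rng.integers(len(data))]        # any random row
    east = data[np.argmax(dist(data, pivot))]    # farthest row from it
    west = data[np.argmax(dist(data, east))]     # farthest row from east
    c = dist(east, west)
    a, b = dist(data, east), dist(data, west)
    return (a ** 2 + c ** 2 - b ** 2) / (2 * c)  # cosine rule: O(N) total

rng = np.random.default_rng(1)
data = rng.random((100, 5))                      # 100 rows, 5 numeric features
x = fastmap_axis(data, rng)                      # first of the two axes
print(x.shape)                                   # (100,)
```

Clustering then happens in this cheap 2D space rather than in the original dimensions, which is what keeps the whole pipeline fast.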
Papers
- Learning to change projects, PROMISE'12 (preliminary)
- Peeking: submitted to ASE'13