Why Bother?
- It's good science
- It's fun
- I've become a little tired of the same dull old data-mining-in-SE paper. Sure, we can get results from running data miners on SE data sets. Ok. Next?
- Recent papers report that there is little to be gained from such algorithm mining, since the "improvements" found by this approach are marginal at best:
- for example, for effort estimation and defect prediction,
- simpler data miners do just as well as, or better than, more elaborate ones [Hall12], [Dejaeger12]
- And there is tremendous variance in the outputs when the same learners are applied to the same data by different people
And if it works, what would we expect?
- Simpler implementations.
- If we cut down to the essence, then the essence should be smaller than the confusion we cleared away.
- Faster implementations.
- Again, if we cut the crap, the rest should run faster.
- Scalable inference.
- Era of big data. Need scalability.
- More services:
- the parts should be mashable into new services not currently offered.
What new services?
- Local learning
- Decision mining:
- a.k.a. weighted contrast set learning (sketched after this list)
- planning (what to do): learning the decisions that move a project from "now" to "better"
- monitoring (what to avoid): learning the decisions that move a project from "now" to "worse"
- Discussion mining
- Learn a community's biases from the decisions they take, avoid, retake, or undo
- Temporal reasoning:
- anytime inference; recognition of anomalies; pinpointing which part of the model needs updating; updating just that section
- Privacy:
- Can do
- Quality assurance:
- addressing the issues raised by Shepperd et al. (TSE 2013)
- Transfer learning:
- keeping what can be kept, changing what needs to change, as we move...
- across organizations (cross-company learning; one standard tactic is sketched after this list)
- across time (dataset shift)
- Multi-objective evolutionary algorithms (MOEA):
- the distinction between traditional learning and optimization is bogus.
- Inference at the business level:
- ability to model business goals and tune the learners appropriately.
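To make the decision-mining idea concrete, here is a minimal sketch of weighted contrast set learning, assuming discrete-valued tabular data. All the names here (contrasts, reviews, tests) are hypothetical illustrations, not the actual implementation:

```python
# A minimal sketch of weighted contrast set learning for decision mining.
# Rank (attribute, value) pairs by how much more often they appear in the
# "better" rows than in the "worse" rows.
from collections import Counter

def contrasts(better, worse):
    """better, worse: lists of dicts mapping attribute -> discrete value."""
    b, w = Counter(), Counter()
    for row in better:
        b.update(row.items())       # count each (attribute, value) pair
    for row in worse:
        w.update(row.items())
    pairs = set(b) | set(w)
    # weight = frequency in "better" minus frequency in "worse"
    return sorted(pairs,
                  key=lambda p: b[p] / len(better) - w[p] / len(worse),
                  reverse=True)

better = [{"reviews": "yes", "tests": "unit"}, {"reviews": "yes", "tests": "none"}]
worse  = [{"reviews": "no",  "tests": "none"}, {"reviews": "no",  "tests": "unit"}]

ranked = contrasts(better, worse)
print("plan (what to do):  ", ranked[0])    # ('reviews', 'yes')
print("monitor (to avoid): ", ranked[-1])   # ('reviews', 'no')
```

The same ranking read from the other end is the monitor: the pairs most strongly associated with "worse" are the decisions to avoid.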
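And for cross-company transfer, one standard tactic is nearest-neighbor relevancy filtering in the style of Turhan et al. 2009 (an illustration of the general idea, not necessarily the implementation here): keep only the cross-company rows nearest to the local data, then train as usual. A sketch, assuming numeric features:

```python
# A sketch of nearest-neighbor relevancy filtering for cross-company
# learning; the function name and parameters are mine.
import numpy as np

def nn_filter(cross, local, k=10):
    """Keep the cross-company rows that are k-nearest neighbors of
    at least one local row ("keeping what can be kept")."""
    keep = set()
    for row in local:
        d = np.linalg.norm(cross - row, axis=1)   # distance to every cross row
        keep.update(np.argsort(d)[:k].tolist())   # indices of the k nearest
    return cross[sorted(keep)]

# Usage: train any learner on nn_filter(cross_data, local_data)
# instead of on the raw cross-company data.
```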
What business goals?
From Buse & Zimmermann's study of information needs among 100+ managers at Microsoft.
Under the hood (the current implementation):
Fast spectral learning (map to eigenvectors, cluster, reason from there):
- O(N) inference with Fastmap (sketched below)
- [figure: a defect data set (POI-3.) projected into 2D; green = less defective]
- cluster in that 2D space (fast)
- find neighboring clusters C1, C2 where score(C1) > score(C2)
- learn the contrast between C1 and C2
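A minimal sketch of the O(N) Fastmap step, assuming numeric features (the function name and data are mine, not the actual implementation): pick a random row, walk to the row farthest from it ("east"), then to the row farthest from east ("west"); the cosine rule then places every row on the east-west axis in one linear pass. Repeating with a second pair of pivots gives the 2D space.

```python
# One O(N) Fastmap projection pass.
import numpy as np

def fastmap_axis(data, rng):
    """Return one Fastmap coordinate per row of a numeric 2D array."""
    dist = lambda a, b: np.linalg.norm(a - b, axis=-1)
    pivot = data[rng.integers(len(data))]        # any random row
    east = data[np.argmax(dist(data, pivot))]    # farthest row from it
    west = data[np.argmax(dist(data, east))]     # farthest row from east
    c = dist(east, west)
    a, b = dist(data, east), dist(data, west)
    return (a ** 2 + c ** 2 - b ** 2) / (2 * c)  # cosine rule: O(N) total

rng = np.random.default_rng(1)
data = rng.random((100, 5))                      # 100 rows, 5 numeric features
x = fastmap_axis(data, rng)                      # first of the two axes
print(x.shape)                                   # (100,)
```

Clustering then happens in this cheap 2D space rather than in the original dimensions, which is what keeps the whole pipeline fast.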
Papers
- Learning to change projects, PROMISE'12 (preliminary)
- Peeking: submitted to ASE'13