Sunday, June 23, 2013

Timm: What I do

For years I've been thinking that our learners are really a menu of options that we can mix and mash to achieve other goals.

Why Bother?

  1. It's good science
  2. It's fun
  3. I've become a little tired of the same dull old "data mining in SE" paper. Sure, we can get results from data miners on SE data sets. Ok. Next?
  4. Recent papers report that there’s little to be gained from such algorithm mining because the “improvements” found from this approach are marginal, at best.

And if it works, what would we expect?

  • Simpler implementations.
    •  If we cut down to the essence, then the essence should be less than the confusion we cleared away.
  • Faster implementations. 
    • Again, if we cut the crap, the rest should run faster.
  • Scalable inference. 
    • Era of big data. Need scalability. 
  • More services: 
    • The parts should be mashable into new services that are not currently offered.

What new services?

  • Local learning
  • Decision mining:
    • aka weighted contrast set learning
    • planning (what to do)
      • learning decisions from now to better
    • monitoring (what to avoid)
      • learning decisions from now to worse
  • Discussion mining
    • Learn a community's bias from the decisions they take, avoid, retake, undo
  • Temporal reasoning. 
    • Anytime inference. Recognition of anomalies. Pinpointing which part of it all needs updating. Updating just that section.
  • Privacy:
    • Can do
  • Quality assurance
  • Transfer learning:
    • keeping what can be kept, changing what needs to change as we move
      • across organizations (cross-company learning)
      • across time (dataset shift)
  • MOEA (multi-objective evolutionary algorithms):
    • the distinction between traditional learning and optimization is bogus.
  • Inference at the business level: 
    • ability to model business goals and tune the learners appropriately.
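
To make the decision mining item above a little more concrete, here is a minimal sketch of contrast learning between "now" and some other group of examples. It is an illustration only, not the current implementation: the function name `contrast`, the `min_gap` threshold, the toy attributes, and the use of a simple frequency difference as the weight are all assumptions made for this example.

```python
from collections import Counter

def contrast(now, other, min_gap=0.3):
    """Return (attribute, value) pairs that are much more frequent in
    `other` than in `now`, weighted by the difference in frequency.

    `now` and `other` are non-empty lists of dicts with discrete values.
    Hypothetical helper: the weighting scheme is an assumption.
    """
    def freqs(rows):
        counts = Counter((k, v) for row in rows for k, v in row.items())
        return {kv: n / len(rows) for kv, n in counts.items()}

    f_now, f_other = freqs(now), freqs(other)
    deltas = {kv: f - f_now.get(kv, 0.0) for kv, f in f_other.items()}
    keep = [(kv, d) for kv, d in deltas.items() if d >= min_gap]
    return sorted(keep, key=lambda pair: -pair[1])  # biggest contrast first

# Toy data (invented for illustration only).
now = [{"reviews": "no", "tests": "few"}, {"reviews": "no", "tests": "many"}]
better = [{"reviews": "yes", "tests": "many"}, {"reviews": "yes", "tests": "many"}]
worse = [{"reviews": "no", "tests": "few"}, {"reviews": "no", "tests": "few"}]

plan = contrast(now, better)    # planning: what to do
watch = contrast(now, worse)    # monitoring: what to avoid
```

Planning and monitoring use the same function; only the direction of the comparison changes.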

What business goals?

From Buse & Zimmermann: Information needs, 100+ managers, Microsoft

Under the hood (the current implementation):

Fast spectral learning (map to eigenvectors, cluster, reason from there)
  • O(N) inference with Fastmap
  • Defect data set (POI-3.). Green=less defective
  • Cluster in 2D space (fast)
  • Find neighboring clusters C1,C2 where scores(C1) > scores(C2)
  • Learn contrast C1 to C2 
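
The Fastmap projection listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the current implementation: the names `dist`, `to_2d`, and `split`, the Euclidean distance, and the median split used as a stand-in for clustering are all choices made for this example.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two numeric rows (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def to_2d(rows, seed=1):
    """Map each row to (x, y) using two distant 'pole' rows.

    Needs a linear number of distance calls: two passes to find the
    poles, then one pass to project every row.
    """
    rng = random.Random(seed)
    anyone = rng.choice(rows)
    east = max(rows, key=lambda r: dist(anyone, r))  # far from a random row
    west = max(rows, key=lambda r: dist(east, r))    # far from east
    c = dist(east, west)
    points = []
    for row in rows:
        if c == 0:  # all rows identical: nothing to separate
            points.append((0.0, 0.0))
            continue
        a, b = dist(row, east), dist(row, west)
        x = (a * a + c * c - b * b) / (2 * c)    # cosine rule
        y = math.sqrt(max(0.0, a * a - x * x))   # height above the east-west line
        points.append((x, y))
    return points

def split(rows, points):
    """Halve the rows at the median x value (hypothetical clustering step).

    Note: this sort costs O(N log N); only the projection is linear.
    """
    order = sorted(range(len(rows)), key=lambda i: points[i][0])
    mid = len(order) // 2
    return [rows[i] for i in order[:mid]], [rows[i] for i in order[mid:]]
```

Applying `split` recursively to each half would give smaller and smaller clusters, whose scores could then be compared with their neighbors.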

