Executive summary:
- For our current rig, we have a new optimizer that does interesting things for 2-d goals.
- If we added more dimensions to the goals, then we'd may some really interesting results.
- Abdel:
- We need those distributions for size, effort, defects, have we done this before. That would blow us out to 6-d.
- We need to baseline this against something else. GAs? DEs?
Rapid recursive spectral learning (RRSL):
- A spectral learner learns important parameters by reflecting over training data’s eigenvectors.
- If applied recursively, this approach generates a tree of clusters where each split is defined in terms of the principle component of the data at that level of the tree .
- Traditional recursive spectral learners are slow since the matrix calculations of PCA take polynomial time. Our rapid RRSL algorithm splits data on parameters found by a very fast Nyström approximation [1].
- Any any level in the recursion, RRSL finds two distant points then divides the data on the median position between them. RRSL checks if one these two points dominates the other. If so, then it prunes the dominated half of the data.
- For each leaf cluster generated after domination pruning, RRSL sorts attribute ranges according to how much more frequently they occur in the leaf cluster (compared to the rest of the data). RRSL then generates contrast rules from the first i-th items in that sort. The rules are then applied to data not used in training.
For example:
Recursive spectral learning
Here's RRSL working of a 2-d objective space from feature maps (the goal is most features with fewest violations). In the following:- the thickest black line is the first found dimension
- each thinner line represents a binary split of some thicker line.
Domination pruning
After applying domination pruning, we only get the following eight classes:Range sorting
When RRSL took one of the above clusters and sorted attribute ranges (bu how more often it appears in that cluster vs anything else), we get the following (this list is sorted most important at the top to least important at the bottom).(Aside: Curiously most of the advise is about what not to do.)(_ID_192 0)
(REGISTRATION_ENFORCEMENT 1)
(REGISTRATION 1)
(_ID_14 1)
(_ID_13 1)
(CUSTOMER_PREFERENCES 0)
(WISH_LIST_CONTENT 0)
(SHIPPING_OPTIONS 0)
(_ID_92 0)
(_ID_193 0)
(_ID_237 0)
(_ID_89 0)
(_ID_226 0)
(_ID_73 0)
(PERMISSIONS 0)
(_ID_191 0)
(_ID_233 0)
(_ID_194 0)
(_ID_187 0)
(_ID_186 0)
(_ID_190 0)
(_ID_189 0)
(PREVIOUSLY_VISITED_PAGES 0)
(_ID_196 0)
(DISCOUNTS 0)
(_ID_239 0)
(_ID_76 0)
(_ID_90 0)
(_ID_75 0)
(_ID_68 0)
(_ID_77 0)
(TARGETING_CRITERIA_PREVIOUS_PURCHASES 0)
(WISH_LIST 0)
(_ID_53 0)
(_ID_235 0)
(_ID_236 0)
(_ID_88 0)
(_ID_91 0)
(_ID_66 0)
(EMAIL_WISH_LIST 0)
(_ID_69 0)
Contrast set rule testing
If we build rules using the first "x" items in the above list, then generate 10,000 more projects (rejecting any that contradict the rules), we get the following effects.Note that
- After the first 16 items shown above, further changes become mostly noise.
- In terms of max features with least violations, 6 changes look most interesting.
References
[1] John C. Platt, FastMap, MetricMap, and Landmark MDS are all Nyström Algorithms, Proceedings 10th In. Workshop on Artificial Intelligence and Statistics, 2005..