Thursday, September 25, 2014

New Results Format

10/30/14 - Rank Sums, NSGAII-style Selection

Random Forest

Table of Rank Sums Across All Data-sets
60  : Default Cur -> Cur
35  : Tuned Prev -> Cur
28  : Tuned Cur -> Cur
  0  : Default Prev -> Cur
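
As a rough illustration of how a tally like the one above could be produced, here is a minimal sketch: on each data-set the four treatments (default or tuned parameters, coming from the previous or the current version, all evaluated on the current version) are awarded points, and the points are summed across data-sets. Both the scoring rule (3 points for the best treatment on a data-set down to 0 for the worst) and the F-measure numbers are assumptions for illustration, not the rig's actual scheme or data.

from collections import defaultdict

# Hypothetical tally: per data-set, award 3 points to the best treatment
# down to 0 for the worst (by F-measure), then sum across data-sets.
# The scoring rule and the numbers below are made up for illustration.
f_by_dataset = {   # data-set -> {treatment: F-measure on the current version}
    "ant": {"Default Cur": 0.61, "Tuned Prev": 0.58, "Tuned Cur": 0.55, "Default Prev": 0.40},
    "ivy": {"Default Cur": 0.52, "Tuned Prev": 0.54, "Tuned Cur": 0.50, "Default Prev": 0.35},
}

rank_sums = defaultdict(int)
for dataset, scores in f_by_dataset.items():
    ordered = sorted(scores, key=scores.get)        # worst ... best
    for points, treatment in enumerate(ordered):    # 0 ... 3 points
        rank_sums[treatment] += points

for treatment, total in sorted(rank_sums.items(), key=lambda kv: -kv[1]):
    print("%3d : %s -> Cur" % (total, treatment))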

Overall Rankings

[rankings chart]

Params (387 permutations)
('bootstrap', ['values', True])
('min_samples_leaf', ['values', 1])
('n_estimators', ['values', 8, 16, 32])
('min_samples_split', ['values', 2])
('criterion', ['values', 'gini'])
('max_features', ['values', 2, 4, 8, 16])
('max_depth', ['values', 2, 4, 6, 8, 10, 12, 14, 16, 18])
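
As a sanity check on a grid like this, scikit-learn's ParameterGrid can expand the listed values into concrete Random Forest configurations. This is only a sketch of the expansion; the rig's own enumeration, and any filtering of invalid combinations, may differ.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid

# The Random Forest grid listed above, written as a scikit-learn grid.
rf_grid = {
    "bootstrap":         [True],
    "min_samples_leaf":  [1],
    "n_estimators":      [8, 16, 32],
    "min_samples_split": [2],
    "criterion":         ["gini"],
    "max_features":      [2, 4, 8, 16],
    "max_depth":         [2, 4, 6, 8, 10, 12, 14, 16, 18],
}

combos = list(ParameterGrid(rf_grid))    # cross-product of the listed values
print(len(combos), "candidate settings")

# Note: max_features values larger than a data-set's feature count would
# have to be filtered out before fitting.
learners = [RandomForestClassifier(**params) for params in combos]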

Bernoulli Bayes

Table of Rank Sums Across All Data-sets
29  : Tuned Cur -> Cur
28  : Tuned Prev -> Cur
21  : Default Cur -> Cur
  9  : Default Prev -> Cur

Overall Rankings

[rankings chart]

Params (50 permutations)
('binarize', ['values', 0.0, 0.2, 0.4, 0.6, 0.8])
('alpha', ['values', 0.0, 0.2, 0.4, 0.6, 0.8])
('fit_prior', ['values', True, False])

Logistic Regression

Table of Rank Sums Across All Data-sets
31  : Default Cur -> Cur
27  : Tuned Cur -> Cur
24  : Tuned Prev -> Cur
  5  : Default Prev -> Cur

Overall Rankings

[rankings chart]

Params (36 permutations)
('penalty', ['values', 'l1', 'l2'])
('C', ['values', 0.5, 1, 2])
('class_weight', ['values', None])
('intercept_scaling', ['values', 0.5, 1, 2])
('fit_intercept', ['values', True, False])


10/16/14 - Summary results, 3 frontiers, and F-measure rankings.

The rig used for these results is different (simplified) in the following ways:
-RF only
-One parameter (maxDepth) swept from 0 to 16
-All other parameters fixed
-Three ND frontiers are returned from tuning instead of one.
-Due to the multiple frontiers, the pD/pF AUC is no longer usable; F-measure is used as the ranking field instead (a sketch follows this list).
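
For what it's worth, here is a generic sketch of the "ND frontier" idea on (pD, pF) points, where pD (probability of detection) is maximized and pF (probability of false alarm) is minimized, and successive frontiers are peeled off. The F-measure used as the ranking field is the usual harmonic mean of precision and recall, 2*precision*recall/(precision+recall). The dominance rule and the example points below are assumptions, not the rig's exact selection code.

# Sketch: peel off the first few non-dominated frontiers of (pD, pF) points,
# maximizing pD and minimizing pF.  A generic Pareto sort; the candidate
# points are made up.

def dominates(a, b):
    """a and b are (pD, pF).  a dominates b if it is no worse on both
    objectives and strictly better on at least one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def frontiers(points, n=3):
    """Return up to n successive non-dominated frontiers."""
    remaining, fronts = list(points), []
    while remaining and len(fronts) < n:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining if q is not p)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts

# (pD, pF) for a handful of candidate maxDepth settings (illustrative numbers).
candidates = [(0.80, 0.30), (0.70, 0.20), (0.60, 0.10),
              (0.75, 0.35), (0.65, 0.25), (0.55, 0.15), (0.70, 0.40)]
for i, front in enumerate(frontiers(candidates), start=1):
    print("frontier", i, ":", sorted(front))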



New Results Format

Link to the results (I think landscape is easiest to see):
https://www.dropbox.com/sh/iowdac5ki9hyyw4/AAAzRAdOf0pI571udSY0qf-ya?dl=0

In this format, the top item compares the data-set stats of the previous and current versions. These are the usual suspects, plus "overlapping instances" (where the software module name is the same in both versions) and "identical instances" (where the software module's metrics are unchanged from one version to the next).

The rest of the chart shows the results of parameter tuning on both the previous and current versions. There is a table for each learner that lists all of its explored parameter values and the frequency with which each value was selected by the grid search on the previous and current versions. For example, a parameter value of "False" appearing in 90% of the selected combinations on the previous version and 43% of those on the current version is shown as "False: (90/43)".
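
A minimal sketch of how those frequencies could be tallied from the selected combinations; the selections below are invented stand-ins, and the real rig may count differently.

from collections import Counter

# Sketch: report how often each value of a parameter appears among the
# selected combinations on each version, in the "value: (prev%/cur%)" style.
# The selected combinations below are made-up examples.
selected = {
    "prev": [{"fit_prior": False, "alpha": 0.2}, {"fit_prior": False, "alpha": 0.4}],
    "cur":  [{"fit_prior": False, "alpha": 0.0}, {"fit_prior": True,  "alpha": 0.2}],
}

def value_freq(combos, param):
    counts = Counter(c[param] for c in combos)
    return {v: int(round(100.0 * n / len(combos))) for v, n in counts.items()}

prev = value_freq(selected["prev"], "fit_prior")
cur  = value_freq(selected["cur"],  "fit_prior")
for value in sorted(set(prev) | set(cur), key=str):
    print("%s: (%d/%d)" % (value, prev.get(value, 0), cur.get(value, 0)))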

It also shows the pD/pF performance of each learner's non-dominated tunings applied in-set and out-of-set. In this case we have four combinations:

  • tune on prev -> apply in-version
  • tune on prev -> apply out-of-version (current)
  • tune on current -> apply in-version
  • tune on current -> apply out-of-version (prev)
In this case, the major effect that we see is that the green sticks with the blue and the red sticks with the purple. This scenario arises when one data-set is more difficult to perform well on than the other. Beyond that, the in-version and out-of-version performance seem pretty comparable. There are occasional exceptions, but no real trend toward in-version or out-of-version doing better.
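
To make the four combinations concrete, here is a sketch of one arm, "tune on prev -> apply out-of-version (current)", for the maxDepth sweep. pD is the recall on defective modules and pF is the false-alarm rate on non-defective ones. The CSV file names, the "defects" column, and the 0/1 labels are placeholders, not the rig's actual data layout.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

def pd_pf(y_true, y_pred):
    # Assumes 0 = non-defective, 1 = defective.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    pD = tp / float(tp + fn) if (tp + fn) else 0.0   # probability of detection (recall)
    pF = fp / float(fp + tn) if (fp + tn) else 0.0   # probability of false alarm
    return pD, pF

# Placeholder file and column names; substitute the real prev/current data-sets.
prev = pd.read_csv("ant-prev.csv")
cur  = pd.read_csv("ant-cur.csv")
X_prev, y_prev = prev.drop(columns="defects"), prev["defects"]
X_cur,  y_cur  = cur.drop(columns="defects"),  cur["defects"]

# Sweep maxDepth (the post sweeps 0..16; scikit-learn requires max_depth >= 1,
# so depth 0 is skipped here), fit on the previous version, score on the current.
for depth in range(1, 17):
    model = RandomForestClassifier(max_depth=depth, n_estimators=16).fit(X_prev, y_prev)
    pD, pF = pd_pf(y_cur, model.predict(X_cur))
    print("max_depth=%2d  prev->cur  pD=%.2f  pF=%.2f" % (depth, pD, pF))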
