ai @ wvu: April 2014

Monday, April 28, 2014

Results of contrast set learning techniques

What was done using different techniques:

1. Cluster jplflight (--> C1) with xy_proj.py (--> C2).

2. Build decision trees using xy_dt.py.

3. Use diff.py to get the decisions (contrast sets) needed to turn a worse cluster into a better cluster.

4. Using the contrast sets, generate 500 samples with gen.py (using xomo) --> C3.

5. Compare the initial clusters to the newly generated data.

6. Present the results as in Fig. 9 of http://menzies.us/pdf/12gense.pdf
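
As a rough illustration of step 3, a contrast set can be read as the attribute settings that differ between a worse branch and a better branch of the decision tree. The sketch below is only an assumption about the idea: the function name, the dictionary layout, and the attribute names and values are hypothetical and do not describe the real interface of diff.py.

```python
# Hypothetical sketch: the names and the data layout are assumptions,
# not the actual interface of diff.py.
def contrast_set(worse, better):
    """Return the settings of `better` that differ from (or are missing in) `worse`."""
    return {attr: value
            for attr, value in better.items()
            if worse.get(attr) != value}


# Illustrative (made-up) attribute settings for two tree branches.
worse_branch = {"pmat": 2, "acap": 3, "time": 4}
better_branch = {"pmat": 4, "acap": 3, "time": 3}

print(contrast_set(worse_branch, better_branch))
```

Only the attributes whose settings differ are reported, so unchanged settings (here `acap`) are left out of the recommended decisions.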

Techniques:
T0 = asIs
T2 = C1 + C3
T3 = C2 + C3

Flight data

     Techniques         -effort         -months        -defects          -risks    #
           T0 m              43              74              14               9    #
           T2 m               0               3               0               1    #
           T3 m               0               4               0               0    #
           T0 q              32              17              21              26    #
           T2 q               0               0               1               2    #
           T3 q               0               0               1               2    #
           T0 w             100             100             100             100    #
           T2 w               2               7              25              15    #
           T3 w               2               7              21              13    #
            100         30166.1            88.6         27118.6             1.8    #
              0          9598.8            16.4          4340.4             0.2    #

Ground data

     Techniques         -effort         -months        -defects          -risks    #
           T0 m              43              74              15              10    #
           T2 m               0               3               0               1    #
           T3 m               0               3               0               0    #
           T0 q              32              17              21              27    #
           T2 q               0               0               1               3    #
           T3 q               0               0               1               3    #
           T0 w             100             100             100             100    #
           T2 w               1               7              18              17    #
           T3 w               1               6              16              15    #
            100         30166.1            88.6         27118.6             1.8    #
              0          9598.8            16.4          4340.4             0.2    #

Osp data

     Techniques         -effort         -months        -defects          -risks    #
           T0 m              43              74              12               0    #
           T2 m               0               4               0              12    #
           T3 m               0               4               0              12    #
           T0 q              32              18              19              19    #
           T2 q               0               0               1               9    #
           T3 q               0               0               0               8    #
           T0 w             100             100             100             100    #
           T2 w               1               6              24              33    #
           T3 w               1               6              20              33    #
            100         30166.1            88.6         27118.6             1.8    #
              0          9598.8            16.4          6021.0             0.2    #

Osp2 data

     Techniques         -effort         -months        -defects          -risks    #
           T0 m              43              74              14               3    #
           T2 m               0               4               0               0    #
           T3 m               0               4               0               0    #
           T0 q              32              18              21              22    #
           T2 q               0               0               0               2    #
           T3 q               0               0               0               2    #
           T0 w             100             100             100             100    #
           T2 w               1               6              14              14    #
           T3 w               1               6              11              14    #
            100         30166.1            88.6         27118.6             1.8    #
              0          9598.8            16.4          6021.0             0.2    #

Tuesday, April 22, 2014

Learning from Version Deltas: A side-quest wrapped up... mostly

Update 5/6: Business case with test on i+1

Update 4/24: No tuning, all tunings, and top tunings

To complicate things a little more, let's add another variable!

Curiously, I was unable to replicate the previous results without parameter tuning

Using parameterless Gaussian Bayes only, there is little difference between HI

I repeated the experiment with parameter tuning, but calculated the stats from ALL results rather than only the top-ranked results

These use 30 random train/test splits, but only 2 of the 3 learners, to save parameter-tuning time

3-learner results can come later, but from what I've seen, 2 vs. 3 learners doesn't matter

Results with top-ranked parameters > results with all parameters > results with no parameters
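
For readers unfamiliar with the column names in the tables that follow, here is a minimal sketch of how an A12 effect size and a Mann-Whitney U statistic are commonly computed. It assumes that the A12 column refers to the Vargha-Delaney statistic and that U is the smaller of the two U values. The scripts behind these tables may use different conventions, and the sample scores are made up.

```python
# Sketch under assumptions: standard textbook definitions, not the
# code that produced the tables in this post.
def wins(xs, ys):
    """Count pairs (x, y) with x > y; ties count as half a win."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in xs for y in ys)


def a12(xs, ys):
    """Vargha-Delaney A12: chance that a value from xs beats one from ys."""
    return wins(xs, ys) / (len(xs) * len(ys))


def mann_whitney_u(xs, ys):
    """Smaller of the two Mann-Whitney U statistics."""
    u_x = wins(xs, ys)
    return min(u_x, len(xs) * len(ys) - u_x)


# Made-up scores for two treatments.
a = [0.77, 0.80, 0.75]
b = [0.68, 0.70, 0.76]
print(a12(a, b), mann_whitney_u(a, b))
```

Under these definitions, an A12 of 0.5 means neither treatment tends to win, and values near 0 or 1 mean one treatment wins almost every pairwise comparison.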

Results with No Parameters:

Label , A12, U, p, meanA, meanB

HI 4 > HI 0 , 0.000, 0, 0.500000, 0.771908, 0.680063

HI 4 > HI 1 , 0.000, 0, 0.500000, 0.771908, 0.758529

HI 4 > HI 2 , 0.000, 0, 0.500000, 0.771908, 0.758749

HI 4 > HI 3 , 1.000, 0, 0.500000, 0.771908, 0.790568

HI 3 > HI 0 , 0.600, 11, 0.417266, 0.574222, 0.547370

HI 3 > HI 1 , 0.560, 9, 0.265435, 0.574222, 0.595885

HI 3 > HI 2 , 0.320, 9, 0.265435, 0.574222, 0.582948

HI 2 > HI 0 , 0.281, 85, 0.282867, 0.601895, 0.561865

HI 2 > HI 1 , 0.862, 90, 0.365195, 0.601895, 0.595265

HI 1 > HI 0 , 0.023, 216, 0.145822, 0.596752, 0.544316

Results with All Parameters:

Label , A12, U, p, meanA, meanB

HI 4 > HI 0 , 0.000, 0, 0.500000, 0.730214, 0.677388

HI 4 > HI 1 , 1.000, 0, 0.500000, 0.730214, 0.745145

HI 4 > HI 2 , 1.000, 0, 0.500000, 0.730214, 0.754882

HI 4 > HI 3 , 1.000, 0, 0.500000, 0.730214, 0.750597

HI 3 > HI 0 , 0.200, 8, 0.201698, 0.589507, 0.527802

HI 3 > HI 1 , 0.360, 11, 0.417266, 0.589507, 0.619794

HI 3 > HI 2 , 0.400, 12, 0.500000, 0.589507, 0.609074

HI 2 > HI 0 , 0.281, 74, 0.140122, 0.623904, 0.552588

HI 2 > HI 1 , 0.699, 97, 0.490836, 0.623904, 0.618135

HI 1 > HI 0 , 0.234, 188, 0.047494, 0.618359, 0.550335

Results with Top-Ranked Parameters:

Label , A12, U, p, meanA, meanB

HI 4 > HI 0 , 0.000, 0, 0.500000, 0.909430, 0.766430

HI 4 > HI 1 , 0.000, 0, 0.500000, 0.909430, 0.869284

HI 4 > HI 2 , 0.000, 0, 0.500000, 0.909430, 0.857104

HI 4 > HI 3 , 1.000, 0, 0.500000, 0.909430, 0.911470

HI 3 > HI 0 , 0.160, 6, 0.105038, 0.905671, 0.787974

HI 3 > HI 1 , 0.160, 5, 0.071836, 0.905671, 0.808840

HI 3 > HI 2 , 0.160, 7, 0.148135, 0.905671, 0.824976

HI 2 > HI 0 , 0.066, 64, 0.061872, 0.831493, 0.760456

HI 2 > HI 1 , 0.071, 85, 0.282867, 0.831493, 0.801677

HI 1 > HI 0 , 0.119, 192, 0.056850, 0.788835, 0.734083

Original Post

OK, to start off with, HI = History Index = number of past deltas included

ant 1.7 with HI=3 would include deltas from ant 1.6, ant 1.5, and ant 1.4
ant 1.7 with HI=0 would include no deltas (just the original set)
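
The HI definition can be sketched as a small selection function. This is only an illustration: the function name and the release list are hypothetical (including the assumed earliest release), and the sketch only picks which earlier releases contribute deltas, not how the deltas themselves are computed.

```python
# Hypothetical helper: picks which earlier releases contribute deltas
# for a given History Index (HI). Not taken from the original scripts.
def deltas_for(versions, target, hi):
    """Return the `hi` releases immediately preceding `target`, newest first."""
    i = versions.index(target)
    if hi > i:
        raise ValueError("not enough earlier releases for this HI")
    return versions[i - hi:i][::-1]


# Assumed, illustrative release list in chronological order.
ant = ["1.3", "1.4", "1.5", "1.6", "1.7"]

print(deltas_for(ant, "1.7", 3))  # ['1.6', '1.5', '1.4']
print(deltas_for(ant, "1.7", 0))  # []
```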

The results below come from comparing only the top-ranked param tuning results on each delta

Tuesday, April 1, 2014

JPL results in new format

     Techniques         -effort         -months        -defects          -risks    #
           T0 m            35.0            73.0            11.0             9.0    #
           T1 m             8.0            55.0             3.0             0.0    #
           T2 m             2.0            28.0             0.0             2.0    #
           T3 m             2.0            28.0             1.0             2.0    #
           T0 q            19.0             9.0            17.0            28.0    #
           T1 q             1.0             5.0             7.0            30.0    #
           T2 q             0.0             0.0             2.0            16.0    #
           T3 q             0.0             0.0             3.0            16.0    #
           T0 w           100.0           100.0            76.0           100.0    #
           T1 w            49.0            83.0           100.0            47.0    #
           T2 w            50.0            47.0            35.0            34.0    #
           T3 w            48.0            47.0            42.0            34.0    #