Monday, April 28, 2014

Results of contrast set learning techniques



What was done:

1. Cluster jplflight (--> C1) with xy_proj.py (--> C2).
2. Build decision trees with xy_dt.py.
3. Use diff.py to extract the decisions (contrast sets) that would move the worse cluster toward the better cluster.
4. Using the contrast sets, generate 500 samples with gen.py (xomo was used) (--> C3).
5. Compare the initial clusters to the newly generated data.
6. Present the results as in Fig. 9 of http://menzies.us/pdf/12gense.pdf.
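
The idea behind step 3 can be sketched as follows. This is only an illustration of what a contrast set is, not the logic of diff.py: the function name `contrast_set`, the dictionary representation of a tree branch, and the attribute settings are all hypothetical.

```python
# Hypothetical illustration: the branch contents below are made up and are
# not taken from the output of xy_dt.py or diff.py.

def contrast_set(worse, better):
    """Return the settings in `better` that are missing from, or differ in, `worse`."""
    return {attr: val for attr, val in better.items() if worse.get(attr) != val}

# Each branch is written as {attribute: setting} along the path to a cluster.
worse_branch = {"pmat": "low", "acap": "nominal", "time": "high"}
better_branch = {"pmat": "high", "acap": "nominal"}

print(contrast_set(worse_branch, better_branch))  # {'pmat': 'high'}
```

Settings shared by both branches drop out, so only the decisions that distinguish the better cluster remain.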

Techniques:
T0 = asIs
T2 = C1 + C3
T3 = C2 + C3


Flight data

     Techniques         -effort         -months        -defects          -risks    #
           T0 m              43              74              14               9    #
           T2 m               0               3               0               1    #
           T3 m               0               4               0               0    #
           T0 q              32              17              21              26    #
           T2 q               0               0               1               2    #
           T3 q               0               0               1               2    #
           T0 w             100             100             100             100    #
           T2 w               2               7              25              15    #
           T3 w               2               7              21              13    #
            100         30166.1            88.6         27118.6             1.8    #
              0          9598.8            16.4          4340.4             0.2    #
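
One possible reading of the last two rows (labelled 100 and 0) is that they give the raw values mapped to the normalized scores 100 and 0. The post does not say this, so treat it as an assumption. Under that reading, a raw score would be rescaled as in this minimal sketch, where `normalize` is a hypothetical helper and the two effort values are copied from the table above:

```python
def normalize(raw, lo, hi):
    """Rescale `raw` to 0..100, where `lo` maps to 0 and `hi` maps to 100.

    Assumes the '0' and '100' rows of the table hold `lo` and `hi`.
    """
    return 100.0 * (raw - lo) / (hi - lo)

# -effort column of the table: '0' row and '100' row.
effort_lo, effort_hi = 9598.8, 30166.1

print(round(normalize(effort_hi, effort_lo, effort_hi)))  # 100
print(round(normalize(effort_lo, effort_lo, effort_hi)))  # 0
```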

Ground data

     Techniques         -effort         -months        -defects          -risks    #
           T0 m              43              74              15              10    #
           T2 m               0               3               0               1    #
           T3 m               0               3               0               0    #
           T0 q              32              17              21              27    #
           T2 q               0               0               1               3    #
           T3 q               0               0               1               3    #
           T0 w             100             100             100             100    #
           T2 w               1               7              18              17    #
           T3 w               1               6              16              15    #
            100         30166.1            88.6         27118.6             1.8    #
              0          9598.8            16.4          4340.4             0.2    #


Osp data

     Techniques         -effort         -months        -defects          -risks    #
           T0 m              43              74              12               0    #
           T2 m               0               4               0              12    #
           T3 m               0               4               0              12    #
           T0 q              32              18              19              19    #
           T2 q               0               0               1               9    #
           T3 q               0               0               0               8    #
           T0 w             100             100             100             100    #
           T2 w               1               6              24              33    #
           T3 w               1               6              20              33    #
            100         30166.1            88.6         27118.6             1.8    #
              0          9598.8            16.4          6021.0             0.2    #


Osp2 data

     Techniques         -effort         -months        -defects          -risks    #
           T0 m              43              74              14               3    #
           T2 m               0               4               0               0    #
           T3 m               0               4               0               0    #
           T0 q              32              18              21              22    #
           T2 q               0               0               0               2    #
           T3 q               0               0               0               2    #
           T0 w             100             100             100             100    #
           T2 w               1               6              14              14    #
           T3 w               1               6              11              14    #
            100         30166.1            88.6         27118.6             1.8    #
              0          9598.8            16.4          6021.0             0.2    #

Tuesday, April 22, 2014

Learning from Version Deltas: A side-quest wrapped up... mostly

Update 5/6: Business case with test on i+1







Update 4/24: No tuning, all tunings, and top tunings

To complicate things a little more, let's add another variable!

  • Curiously, I was unable to replicate the previous results without parameter tuning
    • Using parameterless Gaussian Bayes only, there is little difference between HI values
  • I repeated the experiment with parameter tuning, but calculated stats over ALL results rather than only the top-ranked results
    • These use 30 random train/test splits, but...
    • These use only 2 of the 3 learners to save parameter-tuning time
      • 3-learner results can come later, but from what I've seen, 2 vs. 3 doesn't matter
  • Results with top-ranked parameters > results with all parameters > results with no parameters

Results with No Parameters:


             Label             ,   A12,   U,        p,    meanA,    meanB
         HI 4  >  HI 0         , 0.000,   0, 0.500000, 0.771908, 0.680063
         HI 4  >  HI 1         , 0.000,   0, 0.500000, 0.771908, 0.758529
         HI 4  >  HI 2         , 0.000,   0, 0.500000, 0.771908, 0.758749
         HI 4  >  HI 3         , 1.000,   0, 0.500000, 0.771908, 0.790568
         HI 3  >  HI 0         , 0.600,  11, 0.417266, 0.574222, 0.547370
         HI 3  >  HI 1         , 0.560,   9, 0.265435, 0.574222, 0.595885
         HI 3  >  HI 2         , 0.320,   9, 0.265435, 0.574222, 0.582948
         HI 2  >  HI 0         , 0.281,  85, 0.282867, 0.601895, 0.561865
         HI 2  >  HI 1         , 0.862,  90, 0.365195, 0.601895, 0.595265
         HI 1  >  HI 0         , 0.023, 216, 0.145822, 0.596752, 0.544316


Results with All Parameters:


             Label             ,   A12,   U,        p,    meanA,    meanB
         HI 4  >  HI 0         , 0.000,   0, 0.500000, 0.730214, 0.677388
         HI 4  >  HI 1         , 1.000,   0, 0.500000, 0.730214, 0.745145
         HI 4  >  HI 2         , 1.000,   0, 0.500000, 0.730214, 0.754882
         HI 4  >  HI 3         , 1.000,   0, 0.500000, 0.730214, 0.750597
         HI 3  >  HI 0         , 0.200,   8, 0.201698, 0.589507, 0.527802
         HI 3  >  HI 1         , 0.360,  11, 0.417266, 0.589507, 0.619794
         HI 3  >  HI 2         , 0.400,  12, 0.500000, 0.589507, 0.609074
         HI 2  >  HI 0         , 0.281,  74, 0.140122, 0.623904, 0.552588
         HI 2  >  HI 1         , 0.699,  97, 0.490836, 0.623904, 0.618135
         HI 1  >  HI 0         , 0.234, 188, 0.047494, 0.618359, 0.550335

Results with Top-Ranked Parameters:


             Label             ,   A12,   U,        p,    meanA,    meanB
         HI 4  >  HI 0         , 0.000,   0, 0.500000, 0.909430, 0.766430
         HI 4  >  HI 1         , 0.000,   0, 0.500000, 0.909430, 0.869284
         HI 4  >  HI 2         , 0.000,   0, 0.500000, 0.909430, 0.857104
         HI 4  >  HI 3         , 1.000,   0, 0.500000, 0.909430, 0.911470
         HI 3  >  HI 0         , 0.160,   6, 0.105038, 0.905671, 0.787974
         HI 3  >  HI 1         , 0.160,   5, 0.071836, 0.905671, 0.808840
         HI 3  >  HI 2         , 0.160,   7, 0.148135, 0.905671, 0.824976
         HI 2  >  HI 0         , 0.066,  64, 0.061872, 0.831493, 0.760456
         HI 2  >  HI 1         , 0.071,  85, 0.282867, 0.831493, 0.801677
         HI 1  >  HI 0         , 0.119, 192, 0.056850, 0.788835, 0.734083
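
For readers unfamiliar with the A12 column in the tables above: the Vargha-Delaney A12 effect size is commonly defined as the probability that a value drawn from group A is larger than one drawn from group B, with ties counted as half. The sketch below implements that textbook definition. The function name `a12` and the sample scores are hypothetical, and the post does not say whether its own script computes or orients the statistic in exactly this way.

```python
def a12(xs, ys):
    """Vargha-Delaney A12: P(x > y) + 0.5 * P(x == y) over all pairs (x, y)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    equal = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * equal) / (len(xs) * len(ys))

# Made-up scores for two treatments, purely for illustration.
print(a12([1, 2], [1, 2]))                   # 0.5 (identical groups)
print(round(a12([3, 4, 5], [1, 2, 3]), 3))   # 0.944
```

A value of 0.5 means neither group tends to score higher; values near 1 or 0 mean one group almost always wins.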


Original Post

OK, to start off with, HI = History Index = number of past deltas included
  • ant 1.7 with HI=3 would include deltas from ant 1.6, ant 1.5, and ant 1.4
  • ant 1.7 with HI=0 would include no deltas (just the original set)
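
The two examples above can be written as a small helper. This is a minimal sketch: `deltas_included` is a hypothetical name, and the version list is only an illustrative ordering of ant releases.

```python
def deltas_included(versions, target, hi):
    """Return the `hi` releases immediately preceding `target`, newest first."""
    idx = versions.index(target)
    if hi > idx:
        raise ValueError("not enough earlier versions for this history index")
    return versions[idx - hi:idx][::-1]

# Illustrative release ordering, oldest to newest.
ant = ["1.3", "1.4", "1.5", "1.6", "1.7"]

print(deltas_included(ant, "1.7", 3))  # ['1.6', '1.5', '1.4']
print(deltas_included(ant, "1.7", 0))  # []
```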

The results below come from comparing only the top-ranked param tuning results on each delta












Tuesday, April 1, 2014

JPL results in new format



     Techniques         -effort         -months        -defects          -risks    #
           T0 m            35.0            73.0            11.0             9.0    #
           T1 m             8.0            55.0             3.0             0.0    #
           T2 m             2.0            28.0             0.0             2.0    #
           T3 m             2.0            28.0             1.0             2.0    #
           T0 q            19.0             9.0            17.0            28.0    #
           T1 q             1.0             5.0             7.0            30.0    #
           T2 q             0.0             0.0             2.0            16.0    #
           T3 q             0.0             0.0             3.0            16.0    #
           T0 w           100.0           100.0            76.0           100.0    #
           T1 w            49.0            83.0           100.0            47.0    #
           T2 w            50.0            47.0            35.0            34.0    #
           T3 w            48.0            47.0            42.0            34.0    #