Sunday, July 13, 2014

Results for all models


Flight

 Techniques         -effort         -months        -defects          -risks    #
         1 T0 m              23              67               5               0    #
   2 Bef disc m               3              14               4               2    #
   3 Aft disc m               9              30               6               2    #
    4 T9:j/j_ m               0              33               0              22    #
-------------------------------------------------------------------------------------
         1 T0 q              21              16               8               1    #
   2 Bef disc q               3               0               5               3    #
   3 Aft disc q               9               6               8               6    #
    4 T9:j/j_ q               3              21               1              38    #
-------------------------------------------------------------------------------------
         1 T0 w              79             100              36              17    #
   2 Bef disc w              29              28              46               8    #
   3 Aft disc w             100              66             100              14    #
    4 T9:j/j_ w              24              98              19             100    #
-------------------------------------------------------------------------------------
            100         3500.22           42.38        50660.52             8.6    #
              0          117.21            2.51          381.83             0.0    # 
 
Ground

Techniques         -effort         -months        -defects          -risks    #
         1 T0 m               5              43               3              12    #
   2 Bef disc m               0              12               3               6    #
   3 Aft disc m               9              49               3               0    #
4 T9:j/jground m               0              32               0              10    #
-------------------------------------------------------------------------------------
         1 T0 q               4               8               5              17    #
   2 Bef disc q               0               0               5               7    #
   3 Aft disc q              10              12               5               9    #
4 T9:j/jground q               5              35               1              16    #
-------------------------------------------------------------------------------------
         1 T0 w              31              69              34              48    #
   2 Bef disc w              16              24              37              22    #
   3 Aft disc w             100             100             100              49    #
4 T9:j/jground w              40              78              34             100    #
-------------------------------------------------------------------------------------
            100         6628.14           55.18         76485.1             4.8    #
              0          254.49            2.97          806.33            0.05    #
 
Osp
 
 Techniques         -effort         -months        -defects          -risks    #
         1 T0 m               9              36               5              36    #
   2 Bef disc m               0               8               1              12    #
   3 Aft disc m               8              44               3               0    #
  4 T9:j/josp m               1              29               0              11    #
-------------------------------------------------------------------------------------
         1 T0 q               2               2               3              32    #
   2 Bef disc q               0               0               1               5    #
   3 Aft disc q               9              12               4               3    #
  4 T9:j/josp q               1              18               1              29    #
-------------------------------------------------------------------------------------
         1 T0 w              17              43              14              65    #
   2 Bef disc w               7              17              15              32    #
   3 Aft disc w             100             100             100              29    #
  4 T9:j/josp w              31              64               5             100    #
-------------------------------------------------------------------------------------
            100         9665.91           63.82       103492.82             7.5    #
              0          142.78            2.31          722.41            0.27    #
 
Osp2
 
Techniques         -effort         -months        -defects          -risks    #
         1 T0 m               9              65               1              10    #
   2 Bef disc m               9              61               1               0    #
   3 Aft disc m               7              42               3               3    #
 4 T9:j/josp2 m               0              39               0              13    #
-------------------------------------------------------------------------------------
         1 T0 q               1               0               0              12    #
   2 Bef disc q               2               4               0              20    #
   3 Aft disc q               8               8               5               5    #
 4 T9:j/josp2 q               3              17               2              31    #
-------------------------------------------------------------------------------------
         1 T0 w              17              79               5              20    #
   2 Bef disc w              21              85               8              20    #
   3 Aft disc w             100             100             100              26    #
 4 T9:j/josp2 w              18              93              12             100    #
-------------------------------------------------------------------------------------
            100         5937.27            35.8        59880.52             5.1    #
              0          130.08            2.84           637.4             0.0    #
  
  
All
 
Techniques         -effort         -months        -defects          -risks    #
         1 T0 m              13              47               4              20    #
   2 Bef disc m              11              32              10               7    #
   3 Aft disc m               6              24               3               0    #
  4 T9:j/jall m               0              34               0              12    #
-------------------------------------------------------------------------------------
         1 T0 q              14              12               7              37    #
   2 Bef disc q              11               3              13              10    #
   3 Aft disc q               7               0               4               5    #
  4 T9:j/jall q               5              18               2              36    #
-------------------------------------------------------------------------------------
         1 T0 w              80             100              52             100    #
   2 Bef disc w              88              64             100              25    #
   3 Aft disc w             100              55              69              20    #
  4 T9:j/jall w              22              79              10              84    #
-------------------------------------------------------------------------------------
            100         3597.84           49.22        55387.16            5.67    #
              0          262.02            5.17         1028.96            0.11    #
  
Flight tree
 58  flex <= 27.5 samples = 941
    59 |- docu <= 6.5 samples = 483
    60 |-|- ['__7']  # samples = 31 # branch_id = 0
    61 |-|- team <= 27.0 samples = 452
    62 |-|-|- ['__2']  # samples = 440 # branch_id = 1
    63 |-|-|- ['__6']  # samples = 12 # branch_id = 2
    64 |- ruse <= 35.5 samples = 458
    65 |-|- docu <= 38.5 samples = 50
    66 |-|-|- team <= 21.5 samples = 30
    67 |-|-|-|- ['__4']  # samples = 14 # branch_id = 3
    68 |-|-|-|- ['__1']  # samples = 16 # branch_id = 4
    69 |-|-|- ['__2']  # samples = 20 # branch_id = 5
    70 |-|- ['__1']  # samples = 408 # branch_id = 6
 
Ground tree
 pr <= 12.5 samples = 918
    63 |- resl <= 17.5 samples = 191
    64 |-|- ['__7']  # samples = 28 # branch_id = 0
    65 |-|- ['__2']  # samples = 163 # branch_id = 1
    66 |- rely <= 19.5 samples = 727
    67 |-|- pcon <= 24.5 samples = 44
    68 |-|-|- ['__5']  # samples = 28 # branch_id = 2
    69 |-|-|- ['__4']  # samples = 16 # branch_id = 3
    70 |-|- resl <= 30.0 samples = 683
    71 |-|-|- ['__1']  # samples = 651 # branch_id = 4
    72 |-|-|- ['__1']  # samples = 32 # branch_id = 5
 
Osp Tree
 etat <= 5.5 samples = 939
    49 |- ['__2']  # samples = 47 # branch_id = 0
    50 |- time <= 1.5 samples = 892
    51 |-|- ['__4']  # samples = 12 # branch_id = 1
    52 |-|- ['__1']  # samples = 880 # branch_id = 2
 
Osp2 Tree
 69  etat <= 22.5 samples = 932
    70 |- ltex <= 33.5 samples = 369
    71 |-|- ruse <= 11.0 samples = 52
    72 |-|-|- sced <= 20.5 samples = 28
    73 |-|-|-|- ['__4']  # samples = 11 # branch_id = 0
    74 |-|-|-|- ['__7']  # samples = 17 # branch_id = 1
    75 |-|-|- ['__5']  # samples = 24 # branch_id = 2
    76 |-|- ltex <= 40.5 samples = 317
    77 |-|-|- pr <= 12.5 samples = 47
    78 |-|-|-|- ['__2']  # samples = 35 # branch_id = 3
    79 |-|-|- ['__2']  # samples = 270 # branch_id = 4
    80 |- aa <= 6.5 samples = 563
    81 |-|- ['__1']  # samples = 146 # branch_id = 5
    82 |-|- prec <= 22.0 samples = 417
    83 |-|-|- ['__3']  # samples = 43 # branch_id = 6
    84 |-|-|- ['__3']  # samples = 374 # branch_id = 7
 
All Tree
75  pr <= 25.5 samples = 824
    76 |- site <= 20.5 samples = 283
    77 |-|- site <= 9.5 samples = 101
    78 |-|-|- site <= 7.5 samples = 69
    79 |-|-|-|- ['__3']  # samples = 14 # branch_id = 0
    80 |-|-|-|- ['__9']  # samples = 55 # branch_id = 1
    81 |-|-|- ruse <= 16.5 samples = 32
    82 |-|-|-|- ['__5']  # samples = 15 # branch_id = 2
    83 |-|-|-|- ['__7']  # samples = 17 # branch_id = 3
    84 |-|- flex <= 26.5 samples = 182
    85 |-|-|- ['__2']  # samples = 150 # branch_id = 4
    86 |-|-|- team <= 48.0 samples = 32
    87 |-|-|-|- ['__8']  # samples = 17 # branch_id = 5
    88 |-|-|-|- ['__2']  # samples = 15 # branch_id = 6
    89 |- prec <= 35.0 samples = 541
    90 |-|- flex <= 22.0 samples = 289
    91 |-|-|- docu <= 24.5 samples = 47
    92 |-|-|-|- ['__6']  # samples = 14 # branch_id = 7
    93 |-|-|-|- ['__5']  # samples = 33 # branch_id = 8
    94 |-|-|- team <= 37.5 samples = 242
    95 |-|-|-|- aexp <= 23.5 samples = 26
    96 |-|-|-|-|- ['__8']  # samples = 11 # branch_id = 9
    97 |-|-|-|-|- ['__1']  # samples = 15 # branch_id = 10
    98 |-|-|-|- ['__1']  # samples = 216 # branch_id = 11
    99 |-|- aexp <= 10.5 samples = 252
   100 |-|-|- ['__1']  # samples = 14 # branch_id = 12
   101 |-|-|- ['__3']  # samples = 238 # branch_id = 13
 

TEAK on dtree learning results


TEAK works:

Techniques         -effort         -months        -defects          -risks    #
         1 T0 m              29              66              10               0    #
   2 Bef disc m               9              17               9               2    #
   3 Aft disc m               7              17              11               2    #
    4 T9:j/j_ m               0              32               0              22    #
-------------------------------------------------------------------------------------
         1 T0 q              28              16              16               2    #
   2 Bef disc q               9               0              12               3    #
   3 Aft disc q               7               0              14               3    #
    4 T9:j/j_ q               3              20               3              38    #
-------------------------------------------------------------------------------------
         1 T0 w             100             100              71              17    #
   2 Bef disc w              56              33              89               8    #
   3 Aft disc w              54              34             100               8    #
    4 T9:j/j_ w              30              97              38             100    #
-------------------------------------------------------------------------------------
            100         2856.84            42.5        25882.79             8.6    #
              0          117.21            3.05          381.83             0.0    #
 
But trees are still big:
 
Bef disc:
 
111  $docu <= 3.5 samples = 500
   112 |- $rely <= 3.5 samples = 386
   113 |-|- $pcap <= 3.5 samples = 195
   114 |-|-|- $flex <= 1.5 samples = 102
   115 |-|-|-|- $aexp <= 2.5 samples = 31
   116 |-|-|-|-|- ['__3', '__5']  # samples = 13 # branch_id = 0
   117 |-|-|-|-|- ['__2']  # samples = 18 # branch_id = 1
   118 |-|-|-|- $pcon <= 1.5 samples = 71
   119 |-|-|-|-|- ['__11']  # samples = 15 # branch_id = 2
   120 |-|-|-|-|- $team <= 3.5 samples = 56
   121 |-|-|-|-|-|- $kloc <= 161.0 samples = 42
   122 |-|-|-|-|-|-|- ['__7']  # samples = 16 # branch_id = 3
   123 |-|-|-|-|-|-|- $aexp <= 1.5 samples = 26
   124 |-|-|-|-|-|-|-|- ['__3']  # samples = 12 # branch_id = 4
   125 |-|-|-|-|-|-|-|- ['__3']  # samples = 14 # branch_id = 5
   126 |-|-|-|-|-|- ['__2']  # samples = 14 # branch_id = 6
   127 |-|-|- $ruse <= 3.5 samples = 93
   128 |-|-|-|- $team <= 3.5 samples = 40
   129 |-|-|-|-|- $cplx <= 3.5 samples = 22
   130 |-|-|-|-|-|- ['__7']  # samples = 11 # branch_id = 7
   131 |-|-|-|-|-|- ['__9']  # samples = 11 # branch_id = 8
   132 |-|-|-|-|- ['__3', '__8', '__18']  # samples = 18 # branch_id = 9
   133 |-|-|-|- $resl <= 3.5 samples = 53
   134 |-|-|-|-|- $resl <= 1.5 samples = 39
   135 |-|-|-|-|-|- ['__12', '__13']  # samples = 16 # branch_id = 10
   136 |-|-|-|-|-|- $prec <= 2.5 samples = 23
   137 |-|-|-|-|-|-|- ['__11']  # samples = 11 # branch_id = 11
   138 |-|-|-|-|-|-|- ['__14']  # samples = 12 # branch_id = 12
   139 |-|-|-|-|- ['__5', '__10']  # samples = 14 # branch_id = 13
   140 |-|- $team <= 3.5 samples = 191
   141 |-|-|- $flex <= 1.5 samples = 141
   142 |-|-|-|- $kloc <= 262.0 samples = 30
   143 |-|-|-|-|- ['__12', '__14']  # samples = 16 # branch_id = 14
   144 |-|-|-|-|- ['__1', '__5', '__10', '__12', '__13']  # samples = 14 # branch_id = 15
   145 |-|-|-|- $site <= 4.5 samples = 111
   146 |-|-|-|-|- $flex <= 2.5 samples = 89
   147 |-|-|-|-|-|- $ruse <= 3.5 samples = 27
   148 |-|-|-|-|-|-|- ['__4', '__16']  # samples = 12 # branch_id = 16
   149 |-|-|-|-|-|-|- ['__12']  # samples = 15 # branch_id = 17
   150 |-|-|-|-|-|- $docu <= 2.5 samples = 62
   151 |-|-|-|-|-|-|- $pr <= 3.5 samples = 47
   152 |-|-|-|-|-|-|-|- $flex <= 3.5 samples = 25
   153 |-|-|-|-|-|-|-|-|- ['__13']  # samples = 11 # branch_id = 18
   154 |-|-|-|-|-|-|-|-|- ['__13']  # samples = 14 # branch_id = 19
   155 |-|-|-|-|-|-|-|- $pr <= 4.5 samples = 22
   156 |-|-|-|-|-|-|-|-|- ['__15']  # samples = 11 # branch_id = 20
   157 |-|-|-|-|-|-|-|-|- ['__16']  # samples = 11 # branch_id = 21
   158 |-|-|-|-|-|-|- ['__12']  # samples = 15 # branch_id = 22
   159 |-|-|-|-|- $pcap <= 3.5 samples = 22
   160 |-|-|-|-|-|- ['__12']  # samples = 11 # branch_id = 23
   161 |-|-|-|-|-|- ['__14']  # samples = 11 # branch_id = 24
   162 |-|-|- $site <= 2.5 samples = 50
   163 |-|-|-|- ['__1']  # samples = 19 # branch_id = 25
   164 |-|-|-|- $ruse <= 2.5 samples = 31
   165 |-|-|-|-|- ['__17']  # samples = 14 # branch_id = 26
   166 |-|-|-|-|- ['__12']  # samples = 17 # branch_id = 27
   167 |- $rely <= 3.5 samples = 114
   168 |-|- $pcon <= 2.5 samples = 58
   169 |-|-|- $acap <= 3.5 samples = 33
   170 |-|-|-|- ['__1', '__2', '__3']  # samples = 17 # branch_id = 28
   171 |-|-|-|- ['__10']  # samples = 16 # branch_id = 29
   172 |-|-|- $pvol <= 2.5 samples = 25
   173 |-|-|-|- ['__1', '__2', '__8']  # samples = 13 # branch_id = 30
   174 |-|-|-|- ['__2']  # samples = 12 # branch_id = 31
   175 |-|- $pcap <= 3.5 samples = 56
   176 |-|-|- $plex <= 1.5 samples = 29
   177 |-|-|-|- ['__19']  # samples = 12 # branch_id = 32
   178 |-|-|-|- ['__18']  # samples = 17 # branch_id = 33
   179 |-|-|- $pcon <= 1.5 samples = 27
   180 |-|-|-|- ['__17']  # samples = 11 # branch_id = 34
   181 |-|-|-|- ['__17']  # samples = 16 # branch_id = 35

After disc:
 
44 rows after pruned 304
    45  $pcap <= 3.5 samples = 304
    46 |- $ltex <= 2.5 samples = 166
    47 |-|- $plex <= 1.5 samples = 104
    48 |-|-|- $site <= 2.5 samples = 36
    49 |-|-|-|- ['__2', '__3']  # samples = 18 # branch_id = 0
    50 |-|-|-|- ['__5']  # samples = 18 # branch_id = 1
    51 |-|-|- $flex <= 2.5 samples = 68
    52 |-|-|-|- $etat <= 2.5 samples = 31
    53 |-|-|-|-|- ['__6']  # samples = 17 # branch_id = 2
    54 |-|-|-|-|- ['__4']  # samples = 14 # branch_id = 3
    55 |-|-|-|- $prec <= 2.5 samples = 37
    56 |-|-|-|-|- ['__6']  # samples = 20 # branch_id = 4
    57 |-|-|-|-|- ['__2']  # samples = 17 # branch_id = 5
    58 |-|- $etat <= 3.5 samples = 62
    59 |-|-|- $aexp <= 2.5 samples = 35
    60 |-|-|-|- ['__5']  # samples = 21 # branch_id = 6
    61 |-|-|-|- ['__3']  # samples = 14 # branch_id = 7
    62 |-|-|- $kloc <= 232.5 samples = 27
    63 |-|-|-|- ['__3']  # samples = 14 # branch_id = 8
    64 |-|-|-|- ['__2']  # samples = 13 # branch_id = 9
    65 |- $acap <= 3.5 samples = 138
    66 |-|- $etat <= 1.5 samples = 64
    67 |-|-|- ['__3', '__9']  # samples = 11 # branch_id = 10
    68 |-|-|- $ruse <= 4.5 samples = 53
    69 |-|-|-|- $pvol <= 3.5 samples = 37
    70 |-|-|-|-|- $team <= 3.5 samples = 22
    71 |-|-|-|-|-|- ['__7']  # samples = 11 # branch_id = 11
    72 |-|-|-|-|-|- ['__7']  # samples = 11 # branch_id = 12
    73 |-|-|-|-|- ['__6']  # samples = 15 # branch_id = 13
    74 |-|-|-|- ['__5']  # samples = 16 # branch_id = 14
    75 |-|- $ltex <= 2.5 samples = 74
    76 |-|-|- $aexp <= 3.5 samples = 50
    77 |-|-|-|- $cplx <= 3.5 samples = 38
    78 |-|-|-|-|- ['__8']  # samples = 13 # branch_id = 15
    79 |-|-|-|-|- $rely <= 3.5 samples = 25
    80 |-|-|-|-|-|- ['__7']  # samples = 11 # branch_id = 16
    81 |-|-|-|-|-|- ['__8']  # samples = 14 # branch_id = 17
    82 |-|-|-|- ['__6']  # samples = 12 # branch_id = 18
    83 |-|-|- $rely <= 3.5 samples = 24
    84 |-|-|-|- ['__6']  # samples = 11 # branch_id = 19
    85 |-|-|-|- ['__8']  # samples = 13 # branch_id = 20
 
The tree is almost halved, with an improvement in performance, but it is still too big for business users.
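To put a number on tree size, the dumps can be counted mechanically. The sketch below is not part of the original tooling; `tree_stats` is a hypothetical helper that assumes the dump format shown in this post, where every node line contains `samples =`, every leaf line contains `branch_id`, and each `|-` marks one level of depth.

```python
def tree_stats(dump: str) -> dict:
    """Count nodes, leaves, and maximum depth in a printed tree dump."""
    nodes = leaves = max_depth = 0
    for line in dump.splitlines():
        if "samples =" not in line:
            continue  # skip blank lines and notes such as "rows after pruned"
        nodes += 1
        # Each "|-" marker is one level of nesting below the root.
        max_depth = max(max_depth, line.count("|-"))
        if "branch_id" in line:
            leaves += 1  # only leaf lines carry a branch_id
    return {"nodes": nodes, "leaves": leaves, "max_depth": max_depth}


# The Osp tree from this post, with the leading line numbers dropped.
OSP_TREE = """\
etat <= 5.5 samples = 939
|- ['__2']  # samples = 47 # branch_id = 0
|- time <= 1.5 samples = 892
|-|- ['__4']  # samples = 12 # branch_id = 1
|-|- ['__1']  # samples = 880 # branch_id = 2
"""

print(tree_stats(OSP_TREE))
```

Counted the same way, the "Bef disc" dump above has 71 nodes and 36 leaves (branch_id 0 to 35), and the "After disc" dump has 41 nodes and 21 leaves (branch_id 0 to 20).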

TEAK with everything:
 Infogain, Dtree prune, distance prune, discretization 

Techniques         -effort         -months        -defects          -risks    #
         1 T0 m              20              66               6               0    #
   2 Bef disc m               3              15               7               1    #
   3 Aft disc m              12              39               6               0    #
    4 T9:j/j_ m               0              32               0              22    #
-------------------------------------------------------------------------------------
         1 T0 q              19              16               9               0    #
   2 Bef disc q               3               0               8               3    #
   3 Aft disc q              13               9               9               4    #
    4 T9:j/j_ q               2              21               1              38    #
-------------------------------------------------------------------------------------
         1 T0 w              73             100              37              15    #
   2 Bef disc w              28              30              57               9    #
   3 Aft disc w             100              78             100              18    #
    4 T9:j/j_ w              21              96              21             100    #
-------------------------------------------------------------------------------------
            100         3954.71           42.89        46036.21             8.6    #
              0          117.21            2.71          381.83            0.01    #
 
Tree size:
106  kloc <= 3.5 samples = 407
   107 |- resl <= 7.5 samples = 234
   108 |-|- cplx <= 12.0 samples = 77
   109 |-|-|- ['__7']  # samples = 40 # branch_id = 0
   110 |-|-|- ['__8']  # samples = 37 # branch_id = 1
   111 |-|- flex <= 3.5 samples = 157
   112 |-|-|- ['__7']  # samples = 86 # branch_id = 2
   113 |-|-|- team <= 9.0 samples = 71
   114 |-|-|-|- ['__7']  # samples = 26 # branch_id = 3
   115 |-|-|-|- ['__6']  # samples = 45 # branch_id = 4
   116 |- prec <= 1.5 samples = 173
   117 |-|- site <= 3.5 samples = 45
   118 |-|-|- ['__3']  # samples = 15 # branch_id = 5
   119 |-|-|- ['__1']  # samples = 30 # branch_id = 6
   120 |-|- etat <= 4.5 samples = 128
   121 |-|-|- kloc <= 24.0 samples = 94
   122 |-|-|-|- ['__3']  # samples = 19 # branch_id = 7
   123 |-|-|-|- kloc <= 90.5 samples = 75
   124 |-|-|-|-|- ['__4']  # samples = 21 # branch_id = 8
   125 |-|-|-|-|- ruse <= 2.5 samples = 54
   126 |-|-|-|-|-|- ['__3']  # samples = 15 # branch_id = 9
   127 |-|-|-|-|-|- kloc <= 299.0 samples = 39
   128 |-|-|-|-|-|-|- ['__5']  # samples = 13 # branch_id = 10
   129 |-|-|-|-|-|-|- pvol <= 2.5 samples = 26
   130 |-|-|-|-|-|-|-|- ['__4']  # samples = 15 # branch_id = 11
   131 |-|-|-|-|-|-|-|- ['__2']  # samples = 11 # branch_id = 12
   132 |-|-|- ['__2']  # samples = 34 # branch_id = 13

Best result so far:
 
Cluster size: 44,66
Dtree pruned, Distance pruned, 50% infogained, discretized.
 
 Techniques         -effort         -months        -defects          -risks    #
         1 T0 m              23              67               5               0    #
   2 Bef disc m               3              14               4               2    #
   3 Aft disc m               9              30               6               2    #
    4 T9:j/j_ m               0              33               0              22    #
-------------------------------------------------------------------------------------
         1 T0 q              21              16               8               1    #
   2 Bef disc q               3               0               5               3    #
   3 Aft disc q               9               6               8               6    #
    4 T9:j/j_ q               3              21               1              38    #
-------------------------------------------------------------------------------------
         1 T0 w              79             100              36              17    #
   2 Bef disc w              29              28              46               8    #
   3 Aft disc w             100              66             100              14    #
    4 T9:j/j_ w              24              98              19             100    #
-------------------------------------------------------------------------------------
            100         3500.22           42.38        50660.52             8.6    #
              0          117.21            2.51          381.83             0.0    #
 
Tree size:
58  flex <= 27.5 samples = 941
    59 |- docu <= 6.5 samples = 483
    60 |-|- ['__7']  # samples = 31 # branch_id = 0
    61 |-|- team <= 27.0 samples = 452
    62 |-|-|- ['__2']  # samples = 440 # branch_id = 1
    63 |-|-|- ['__6']  # samples = 12 # branch_id = 2
    64 |- ruse <= 35.5 samples = 458
    65 |-|- docu <= 38.5 samples = 50
    66 |-|-|- team <= 21.5 samples = 30
    67 |-|-|-|- ['__4']  # samples = 14 # branch_id = 3
    68 |-|-|-|- ['__1']  # samples = 16 # branch_id = 4
    69 |-|-|- ['__2']  # samples = 20 # branch_id = 5
    70 |-|- ['__1']  # samples = 408 # branch_id = 6
    71 500
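The "discretized" and "50% infogained" settings of the run above can be sketched as follows. This is an illustration only, not the code that produced these results. It assumes equal-width binning, information gain measured against the cluster label, and that "50%" means keeping the top half of the columns ranked by gain. The function names and the toy table are hypothetical.

```python
import math
from collections import Counter, defaultdict


def discretize(values, bins=5):
    """Map numbers to equal-width bin indices 0..bins-1 (assumed binning scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """Entropy of the labels minus the weighted entropy after splitting on column."""
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def keep_top_half(table, labels, bins=5):
    """table maps column name -> numbers; return the best half of the column names."""
    gains = {name: info_gain(discretize(col, bins), labels)
             for name, col in table.items()}
    ranked = sorted(gains, key=gains.get, reverse=True)
    return ranked[: max(1, len(ranked) // 2)]


# Toy data: kloc separates the two clusters, team does not.
table = {"kloc": [10, 20, 300, 400], "team": [1, 2, 1, 2]}
labels = ["__1", "__1", "__2", "__2"]
print(keep_top_half(table, labels))
```

In the toy data, `kloc` separates the two cluster labels perfectly while `team` carries no information about them, so only `kloc` survives the cut.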
   

Results: 10 repeats, LOO, Scott-Knott

### repeats = 10 
coc81
 0      coc2000 *                              33 35
 0      coconut *                              37 41
 1   coco2000s3 **                             41 43
 2    coconuts3 **                             44 31
 3   coco2000s5 **                             52 49
 3    coconuts5 **                             52 53
 4         cart ***                            78 135
 4       loc(3) ****                           80 237
 5        k=3nn ****                           80 352
 5        k=5nn ****                           85 368
 5        k=1nn ****                           92 164
 6       (c=1)n ****                           95 594
 6 (c=1)n-noloc ****                           97 657
 6       (c=2)n ******                         125 588
 6 (c=2)n-noloc *******                        140 692
 7        guess ****************************** 616 1529
726.86 45 45
xyz14

 0      coc2000 **                             42 34
 1        k=3nn **                             44 77
 1      coconut **                             49 30
 2       loc(3) **                             49 96
 2        k=5nn **                             51 86
 2       (c=2)n **                             52 48
 2 (c=1)n-noloc **                             53 51
 2 (c=2)n-noloc **                             56 46
 2       (c=1)n **                             58 42
 2         cart **                             58 48
 2        k=1nn **                             59 25
 3        guess ***                            66 74
107.2 25 70
newCIIdata

 0      coc2000 *                              39 97
 0        k=1nn **                             49 73
 0      coconut **                             54 50
 0       loc(3) **                             57 43
 1         cart ***                            72 69
 1        k=3nn ****                           90 97
 1        k=5nn ****                           90 161
 2 (c=2)n-noloc *************                  279 717
 3       (c=2)n **************                 293 989
 4       (c=1)n ******************             374 1632
 4 (c=1)n-noloc *******************            391 1044
 4        guess ******************************* 631 1248
127.99 23 93
nasa93
 0    coconuts3 *                              34 45
 0   coco2000s3 *                              35 45
 0      coconut *                              36 37
 0      coc2000 *                              38 38
 0    coconuts5 *                              39 40
 1         cart **                             41 65
 1   coco2000s5 **                             51 30
 2        k=1nn **                             56 75
 3        k=5nn ***                            63 62
 3        k=3nn ***                            64 70
 3       loc(3) ***                            75 101
 4       (c=1)n ****                           91 549
 4 (c=1)n-noloc ****                           100 0
 4 (c=2)n-noloc ****                           100 0
 5       (c=2)n *******                        144 574
 5        guess *******                        149 785
1250.34 48 141

real 55m35.975s
user 53m35.948s
sys 0m3.708s

Wednesday, July 9, 2014

New results dtree learning



Results

The number of generated samples equals the number of original samples we started with.
Sample workflow:

1. Start with 500 samples.
2. Cluster them into C1, C2, ..., Cn.
3. Build Dtrees and form branches B1, B2, ..., Bn. Find the best and worst clusters among B1..Bn.
4. Derive CS1, CS2, ..., CSn between clusters, so that worse clusters can become better clusters.
5. Regenerate samples using CS1, CS2, ..., CSn, for a total of 500, maintaining the ratio.
6. Generate result tables.
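Step 5 leaves open how the 500 regenerated samples are split. One possible reading, sketched below, is that each cluster receives the same share of the new samples that it held originally. That reading is an assumption, and `allocate` is a hypothetical helper, not code from the experiments. It uses largest-remainder rounding so the counts always sum to exactly the requested total.

```python
def allocate(cluster_sizes, total=500):
    """Split `total` new samples across clusters in proportion to their sizes."""
    n = sum(cluster_sizes)
    if n == 0:
        raise ValueError("no samples to take ratios from")
    exact = [size * total / n for size in cluster_sizes]
    counts = [int(x) for x in exact]  # round every share down first
    leftover = total - sum(counts)
    # Hand the remaining samples to the clusters with the largest fractional part.
    order = sorted(range(len(exact)),
                   key=lambda i: exact[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical cluster sizes; the shares are kept and the total stays at 500.
new_counts = allocate([250, 150, 100], total=500)
print(new_counts, sum(new_counts))
```

Rounding each share independently could leave the total at 499 or 501, which would break the "equal to original samples" property noted above.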

Techniques:


1. Distance pruning: Merge clusters that are less than 0.3 (normalized distance) from each other into one big cluster.

2. Dtree pruning: Prune the leaves of the decision trees: remove leaves with multiple majority classes, and remove subtrees with the same majority class.

3. Discretize: Discretize continuous data into discrete values before generating trees. This reduces tree size a lot, but affects performance.

4. Infogain: Prune columns by infogain.
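Technique 1 can be illustrated with a small sketch. It is not the implementation used here, and it rests on several assumptions: each cluster is summarized by a centroid whose columns are already scaled to [0, 1], "normalized distance" means Euclidean distance divided by the square root of the number of columns, and merging is transitive (if A is close to B and B is close to C, all three end up together). The function names are hypothetical.

```python
import math


def normalized_distance(a, b):
    """Euclidean distance scaled to [0, 1], assuming inputs already lie in [0, 1]."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def merge_close_clusters(centroids, threshold=0.3):
    """Group cluster indices whose centroids are linked by distances below threshold."""
    parent = list(range(len(centroids)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if normalized_distance(centroids[i], centroids[j]) < threshold:
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i in range(len(centroids)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical centroids: the first two are close, the third is far away.
centroids = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9)]
print(merge_close_clusters(centroids))
```

Each group of indices in the result becomes one merged cluster. In the example, clusters 0 and 1 are merged and cluster 2 stays on its own.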

Distance pruning works:

Techniques: Distance pruning and  DTree pruning

Techniques         -effort         -months        -defects          -risks    #
         1 T0 m              31              67               6               0    #
  2 Bef prune m              11              20               8               2    #
  3 Aft prune m               8              20              11               2    #
    4 T9:j/j_ m               0              31               0              22    #
-------------------------------------------------------------------------------------
         1 T0 q              26              13               9               2    #
  2 Bef prune q              10               0              10               3    #
  3 Aft prune q               8               0              14               4    #
    4 T9:j/j_ q               3              19               2              38    #
-------------------------------------------------------------------------------------
         1 T0 w             100             100              42              16    #
  2 Bef prune w              67              38              63               9    #
  3 Aft prune w              58              39             100               9    #
    4 T9:j/j_ w              31              98              25             100    #
-------------------------------------------------------------------------------------
            100         2761.71            42.2        39498.09             8.6    #
              0          117.21            3.48          381.83             0.0    #
 
 
But the tree is big:

MAIN_TREE:

173     $rely <= 3.5 samples = 500
   174    |- $ltex <= 2.5 samples = 253
   175    |-|- $cplx <= 3.5 samples = 167
   176    |-|-|- $site <= 2.5 samples = 54
   177    |-|-|-|- $kloc <= 249.0 samples = 24
   178    |-|-|-|-|- ['__2']  # samples = 13 # branch_id = 0
   179    |-|-|-|-|- ['__4', '__9']  # samples = 11 # branch_id = 1
   180    |-|-|-|- $pr <= 2.5 samples = 30
   181    |-|-|-|-|- ['__7']  # samples = 11 # branch_id = 2
   182    |-|-|-|-|- ['__5']  # samples = 19 # branch_id = 3
   183    |-|-|- $ruse <= 3.5 samples = 113
   184    |-|-|-|- $pcon <= 2.5 samples = 52
   185    |-|-|-|-|- $pcap <= 3.5 samples = 25
   186    |-|-|-|-|-|- ['__6']  # samples = 13 # branch_id = 4
   187    |-|-|-|-|-|- ['__13']  # samples = 12 # branch_id = 5
   188    |-|-|-|-|- $flex <= 2.5 samples = 27
   189    |-|-|-|-|-|- ['__7']  # samples = 13 # branch_id = 6
   190    |-|-|-|-|-|- ['__8', '__13']  # samples = 14 # branch_id = 7
   191    |-|-|-|- $pcap <= 3.5 samples = 61
   192    |-|-|-|-|- $kloc <= 152.0 samples = 29
   193    |-|-|-|-|-|- ['__15']  # samples = 11 # branch_id = 8
   194    |-|-|-|-|-|- ['__13']  # samples = 18 # branch_id = 9
   195    |-|-|-|-|- $aexp <= 2.5 samples = 32
   196    |-|-|-|-|-|- ['__13', '__15']  # samples = 19 # branch_id = 10
   197    |-|-|-|-|-|- ['__12']  # samples = 13 # branch_id = 11
   198    |-|- $pvol <= 2.5 samples = 86
   199    |-|-|- $resl <= 2.5 samples = 28
   200    |-|-|-|- ['__4']  # samples = 15 # branch_id = 12
   201    |-|-|-|- ['__2', '__7']  # samples = 13 # branch_id = 13
   202    |-|-|- $acap <= 3.5 samples = 58
   203    |-|-|-|- $resl <= 2.5 samples = 23
   204    |-|-|-|-|- ['__2']  # samples = 12 # branch_id = 14
   205    |-|-|-|-|- ['__5']  # samples = 11 # branch_id = 15
   206    |-|-|-|- $team <= 2.5 samples = 35
   207    |-|-|-|-|- ['__1', '__11']  # samples = 15 # branch_id = 16
   208    |-|-|-|-|- ['__7']  # samples = 20 # branch_id = 17
   209    |- $pr <= 4.5 samples = 247
   210    |-|- $plex <= 2.5 samples = 193
   211    |-|-|- $flex <= 2.5 samples = 129
   212    |-|-|-|- $pvol <= 2.5 samples = 63
   213    |-|-|-|-|- $kloc <= 164.5 samples = 22
   214    |-|-|-|-|-|- ['__13']  # samples = 11 # branch_id = 18
   215    |-|-|-|-|-|- ['__7']  # samples = 11 # branch_id = 19
   216    |-|-|-|-|- $pr <= 3.5 samples = 41
   217    |-|-|-|-|-|- $pcap <= 3.5 samples = 29
   218    |-|-|-|-|-|-|- ['__11']  # samples = 13 # branch_id = 20
   219    |-|-|-|-|-|-|- ['__14']  # samples = 16 # branch_id = 21
   220    |-|-|-|-|-|- ['__10']  # samples = 12 # branch_id = 22
   221    |-|-|-|- $site <= 1.5 samples = 66
   222    |-|-|-|-|- ['__15']  # samples = 17 # branch_id = 23
   223    |-|-|-|-|- $ruse <= 3.5 samples = 49
   224    |-|-|-|-|-|- ['__15']  # samples = 20 # branch_id = 24
   225    |-|-|-|-|-|- $pvol <= 3.5 samples = 29
   226    |-|-|-|-|-|-|- ['__12']  # samples = 16 # branch_id = 25
   227    |-|-|-|-|-|-|- ['__6', '__16']  # samples = 13 # branch_id = 26
   228    |-|-|- $kloc <= 295.5 samples = 64
   229    |-|-|-|- $team <= 1.5 samples = 47
   230    |-|-|-|-|- ['__13', '__15']  # samples = 17 # branch_id = 27
   231    |-|-|-|-|- $etat <= 2.5 samples = 30
   232    |-|-|-|-|-|- ['__13']  # samples = 13 # branch_id = 28
   233    |-|-|-|-|-|- ['__7']  # samples = 17 # branch_id = 29
   234    |-|-|-|- ['__7']  # samples = 17 # branch_id = 30
   235    |-|- $cplx <= 3.5 samples = 54
   236    |-|-|- ['__9']  # samples = 21 # branch_id = 31
   237    |-|-|- $team <= 2.5 samples = 33
   238    |-|-|-|- ['__15']  # samples = 15 # branch_id = 32
   239    |-|-|-|- ['__12', '__13']  # samples = 18 # branch_id = 33



Dist-pruned tree:
$pr <= 1.5 samples = 500
|- $prec <= 1.5 samples = 98
|-|- $pcon <= 3.5 samples = 22
|-|-|- ['__2']  # samples = 11 # branch_id = 0
|-|- $flex <= 1.5 samples = 76
|-|-|- $kloc <= 256.0 samples = 24
|-|-|-|- ['__9']  # samples = 11 # branch_id = 1
|-|-|-|- ['__5']  # samples = 13 # branch_id = 2
|-|-|- $team <= 1.5 samples = 52
|-|-|-|- ['__11']  # samples = 13 # branch_id = 3
|-|-|-|- $pcon <= 2.5 samples = 39
|-|-|-|-|- ['__13']  # samples = 20 # branch_id = 4
|-|-|-|-|- ['__9']  # samples = 19 # branch_id = 5
|- $prec <= 1.5 samples = 402
|-|- $etat <= 1.5 samples = 103
|-|-|- ['__13']  # samples = 13 # branch_id = 6
|-|-|- $resl <= 3.5 samples = 90
|-|-|-|- $ltex <= 2.5 samples = 66
|-|-|-|-|- $site <= 3.5 samples = 38
|-|-|-|-|-|- $rely <= 3.5 samples = 25
|-|-|-|-|-|-|- ['__6']  # samples = 12 # branch_id = 7
|-|-|-|-|-|- ['__7']  # samples = 13 # branch_id = 8
|-|-|-|-|- $acap <= 3.5 samples = 28
|-|-|-|-|-|- ['__13']  # samples = 12 # branch_id = 9
|-|-|-|-|-|- ['__4']  # samples = 16 # branch_id = 10
|-|-|-|- $site <= 2.5 samples = 24
|-|-|-|-|- ['__9']  # samples = 12 # branch_id = 11
|-|- $acap <= 3.5 samples = 299
|-|-|- $rely <= 3.5 samples = 157
|-|-|-|- $ruse <= 3.5 samples = 82
|-|-|-|-|- $prec <= 2.5 samples = 47
|-|-|-|-|-|- $pcap <= 3.5 samples = 31
|-|-|-|-|-|-|- ['__7']  # samples = 16 # branch_id = 12
|-|-|-|-|-|-|- ['__1']  # samples = 15 # branch_id = 13
|-|-|-|-|- $docu <= 2.5 samples = 35
|-|-|-|-|-|- ['__13']  # samples = 21 # branch_id = 14
|-|-|-|-|-|- ['__1']  # samples = 14 # branch_id = 15
|-|-|-|- $team <= 2.5 samples = 75
|-|-|-|-|- $docu <= 2.5 samples = 44
|-|-|-|-|-|- ['__1']  # samples = 27 # branch_id = 16
|-|-|-|-|-|- ['__1']  # samples = 17 # branch_id = 17
|-|-|-|-|- ['__1']  # samples = 31 # branch_id = 18
|-|-|- $flex <= 2.5 samples = 142
|-|-|-|- $pcon <= 1.5 samples = 70
|-|-|-|-|- $prec <= 2.5 samples = 53
|-|-|-|-|-|- $kloc <= 204.0 samples = 35
|-|-|-|-|-|-|- ['__7']  # samples = 15 # branch_id = 19
|-|-|-|-|-|-|- ['__1']  # samples = 20 # branch_id = 20
|-|-|-|- $site <= 2.5 samples = 72
|-|-|-|-|- $pcon <= 2.5 samples = 33
|-|-|-|-|-|- ['__6']  # samples = 15 # branch_id = 21
|-|-|-|-|-|- ['__7']  # samples = 18 # branch_id = 22
|-|-|-|-|- $pvol <= 3.5 samples = 39
|-|-|-|-|-|- $pcap <= 3.5 samples = 25
|-|-|-|-|-|-|- ['__1']  # samples = 14 # branch_id = 23
|-|-|-|-|-|-|- ['__11']  # samples = 11 # branch_id = 24
|-|-|-|-|-|- ['__3']  # samples = 14 # branch_id = 25
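
To compare trees like the ones above (number of leaves, samples kept after pruning, depth), the dump can be read back into plain records. This is a minimal sketch, not the code that produced the dump: the names `parse_tree`, `NODE`, `LEAF` and `SAMPLE` are hypothetical, and the regular expressions are inferred only from the printed lines. A leading editor line number, if present, is skipped.

```python
import re

# Assumption: split lines look like "|-|- $attr <= 2.5 samples = 30" and
# leaf lines like "|-|- ['__7']  # samples = 11 # branch_id = 2",
# optionally preceded by a pasted line number.
NODE = re.compile(
    r"^\s*(?:\d+\s+)?((?:\|-)*)\s*\$?(\w+) <= ([\d.]+) samples = (\d+)\s*$")
LEAF = re.compile(
    r"^\s*(?:\d+\s+)?((?:\|-)*)\s*\[(.*?)\]\s*# samples = (\d+) # branch_id = (\d+)\s*$")


def parse_tree(text):
    """Return (splits, leaves): one dict per split line and per leaf line."""
    splits, leaves = [], []
    for line in text.splitlines():
        m = NODE.match(line)
        if m:
            splits.append({
                "depth": len(m.group(1)) // 2,   # each "|-" is one level
                "attr": m.group(2),
                "cut": float(m.group(3)),
                "samples": int(m.group(4)),
            })
            continue
        m = LEAF.match(line)
        if m:
            leaves.append({
                "depth": len(m.group(1)) // 2,
                "clusters": re.findall(r"'([^']*)'", m.group(2)),
                "samples": int(m.group(3)),
                "branch_id": int(m.group(4)),
            })
    return splits, leaves


# First few lines of the pruned tree above, used as a small input.
SAMPLE = """\
$pr <= 1.5 samples = 500
|- $prec <= 1.5 samples = 98
|-|- $pcon <= 3.5 samples = 22
|-|-|- ['__2']  # samples = 11 # branch_id = 0
|-|- $flex <= 1.5 samples = 76
|-|-|- $kloc <= 256.0 samples = 24
|-|-|-|- ['__9']  # samples = 11 # branch_id = 1
|-|-|-|- ['__5']  # samples = 13 # branch_id = 2
"""

splits, leaves = parse_tree(SAMPLE)
kept = sum(leaf["samples"] for leaf in leaves)  # samples still covered by leaves
```

Summing `samples` over the leaves and comparing with the root count gives a quick check of how many samples a pruning step dropped.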


 

  Infogain-discretized, dist-pruned Dtree doesn't work.

Tree:

 flex <= 27.5 samples = 500
|- docu <= 6.5 samples = 261
|-|- ['__1']  # samples = 31 # branch_id = 0
|-|- resl <= 33.5 samples = 230
|-|-|- ['__2']  # samples = 200 # branch_id = 1
|-|-|- ['__2']  # samples = 30 # branch_id = 2
|- pr <= 18.0 samples = 239
|-|- ['__6']  # samples = 16 # branch_id = 3
|-|- ruse <= 32.0 samples = 223
|-|-|- ['__4']  # samples = 203 # branch_id = 4
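
For reference, one common way to pick a cut such as `flex <= 27.5` is to try the midpoint between each pair of adjacent sorted values and keep the one with the highest information gain. The sketch below illustrates that general technique only; it is an assumption that the discretizer used here works this way, and `entropy` and `best_cut` are hypothetical names.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut, gain) for the binary split `value <= cut` with the
    highest information gain; cut is None if no split improves on 0."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        lo_v, hi_v = pairs[i - 1][0], pairs[i][0]
        if lo_v == hi_v:
            continue  # no cut can separate equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best[1]:
            best = ((lo_v + hi_v) / 2, gain)  # midpoint as the threshold
    return best
```

Called as `best_cut(column_values, cluster_labels)`, it returns a single threshold; applying it recursively to each side would give a multi-interval discretization.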
 
 
Performance:
 
     Techniques         -effort         -months        -defects          -risks    #
         1 T0 m              31              68              16               0    #
   2 Bef disc m              18              43              17               2    #
   3 Aft disc m              29              61              39               8    #
    4 T9:j/j_ m               0              33               0              22    #
-------------------------------------------------------------------------------------
         1 T0 q              29              16              24               0    #
   2 Bef disc q              15               8              26               3    #
   3 Aft disc q               3               0              10               9    #
    4 T9:j/j_ q               3              22               4              38    #
-------------------------------------------------------------------------------------
         1 T0 w             100             100              97              16    #
   2 Bef disc w              63              65             100              15    #
   3 Aft disc w              74              77              62              13    #
    4 T9:j/j_ w              32              99              58             100    #
-------------------------------------------------------------------------------------
            100         2702.14           41.94        17094.69             8.6    #
              0          117.21            2.44          381.83             0.0    # 
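
The last two rows of the table look like the raw values that correspond to 100 and 0 in each column. Under that assumption (it is not stated in these notes), a cell can be mapped back to raw units by linear interpolation. `denormalize` is a hypothetical helper, and the `m`, `q`, `w` row suffixes are left uninterpreted.

```python
def denormalize(pct, lo, hi):
    """Map a 0..100 table cell back to raw units, assuming the cell is a
    linear percentage of the range [lo, hi] given by the '0' and '100' rows."""
    return lo + (pct / 100.0) * (hi - lo)


# Example: the "-effort" cell of row "1 T0 m" (31), with the "0" row value
# 117.21 and the "100" row value 2702.14 from the table above.
effort_t0_m = denormalize(31, 117.21, 2702.14)
```

The same call works for the other columns by swapping in that column's "0" and "100" row values.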


Sunday, July 6, 2014

Results with discretization and smart clustering by pruning.

  Pruning columns and discretizing values --> disc


     Techniques         -effort         -months        -defects          -risks    #
         1 T0 m              27              65              10               0    #
   2 Bef disc m               7              15               9               1    #
   3 Aft disc m              15              32               5               2    #
    4 T9:j/j_ m               0              31               0              22    #
-------------------------------------------------------------------------------------
         1 T0 q              26              15              17               0    #
   2 Bef disc q               7               0              12               2    #
   3 Aft disc q              15               5               8               3    #
    4 T9:j/j_ q               3              20               3              38    #
-------------------------------------------------------------------------------------
         1 T0 w              96             100              75              16    #
   2 Bef disc w              47              31              91               5    #
   3 Aft disc w             100              58             100              12    #
    4 T9:j/j_ w              28              96              41             100    #
-------------------------------------------------------------------------------------
            100          3096.1            43.2        24082.25             8.6    #
              0          117.21            2.99          381.83             0.0    #