Sunday, July 13, 2014

TEAK on dtree learning results


TEAK works:

Techniques         -effort         -months        -defects          -risks    #
         1 T0 m              29              66              10               0    #
   2 Bef disc m               9              17               9               2    #
   3 Aft disc m               7              17              11               2    #
    4 T9:j/j_ m               0              32               0              22    #
-------------------------------------------------------------------------------------
         1 T0 q              28              16              16               2    #
   2 Bef disc q               9               0              12               3    #
   3 Aft disc q               7               0              14               3    #
    4 T9:j/j_ q               3              20               3              38    #
-------------------------------------------------------------------------------------
         1 T0 w             100             100              71              17    #
   2 Bef disc w              56              33              89               8    #
   3 Aft disc w              54              34             100               8    #
    4 T9:j/j_ w              30              97              38             100    #
-------------------------------------------------------------------------------------
            100         2856.84            42.5        25882.79             8.6    #
              0          117.21            3.05          381.83             0.0    #
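The last two rows of each table appear to give, per column, the raw value that corresponds to a score of 100 and a score of 0. Below is a minimal sketch that maps a table score back to raw units. It assumes the scores are a linear min-max rescaling between those two rows, which the post does not state. The helper `denormalize` and the `SCALE` dictionary are hypothetical.

```python
def denormalize(score, raw_at_0, raw_at_100):
    """Map a 0-100 score back to raw units, ASSUMING linear min-max scaling."""
    return raw_at_0 + (score / 100.0) * (raw_at_100 - raw_at_0)

# (raw value at score 0, raw value at score 100), read off the last two
# rows of the first table; treating them as min/max is an assumption.
SCALE = {
    "effort":  (117.21, 2856.84),
    "months":  (3.05, 42.5),
    "defects": (381.83, 25882.79),
    "risks":   (0.0, 8.6),
}

# Scores from the "3 Aft disc m" row of the first table.
aft_disc_m = {"effort": 7, "months": 17, "defects": 11, "risks": 2}
for name, score in aft_disc_m.items():
    lo, hi = SCALE[name]
    print(f"{name:8s} score {score:3d} -> about {denormalize(score, lo, hi):.2f}")
```

If the scaling turns out not to be linear, only `denormalize` needs to change.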
 
But trees are still big:
 
Before discretization (Bef disc):
 
$docu <= 3.5 samples = 500
|- $rely <= 3.5 samples = 386
|-|- $pcap <= 3.5 samples = 195
|-|-|- $flex <= 1.5 samples = 102
|-|-|-|- $aexp <= 2.5 samples = 31
|-|-|-|-|- ['__3', '__5']  # samples = 13 # branch_id = 0
|-|-|-|-|- ['__2']  # samples = 18 # branch_id = 1
|-|-|-|- $pcon <= 1.5 samples = 71
|-|-|-|-|- ['__11']  # samples = 15 # branch_id = 2
|-|-|-|-|- $team <= 3.5 samples = 56
|-|-|-|-|-|- $kloc <= 161.0 samples = 42
|-|-|-|-|-|-|- ['__7']  # samples = 16 # branch_id = 3
|-|-|-|-|-|-|- $aexp <= 1.5 samples = 26
|-|-|-|-|-|-|-|- ['__3']  # samples = 12 # branch_id = 4
|-|-|-|-|-|-|-|- ['__3']  # samples = 14 # branch_id = 5
|-|-|-|-|-|- ['__2']  # samples = 14 # branch_id = 6
|-|-|- $ruse <= 3.5 samples = 93
|-|-|-|- $team <= 3.5 samples = 40
|-|-|-|-|- $cplx <= 3.5 samples = 22
|-|-|-|-|-|- ['__7']  # samples = 11 # branch_id = 7
|-|-|-|-|-|- ['__9']  # samples = 11 # branch_id = 8
|-|-|-|-|- ['__3', '__8', '__18']  # samples = 18 # branch_id = 9
|-|-|-|- $resl <= 3.5 samples = 53
|-|-|-|-|- $resl <= 1.5 samples = 39
|-|-|-|-|-|- ['__12', '__13']  # samples = 16 # branch_id = 10
|-|-|-|-|-|- $prec <= 2.5 samples = 23
|-|-|-|-|-|-|- ['__11']  # samples = 11 # branch_id = 11
|-|-|-|-|-|-|- ['__14']  # samples = 12 # branch_id = 12
|-|-|-|-|- ['__5', '__10']  # samples = 14 # branch_id = 13
|-|- $team <= 3.5 samples = 191
|-|-|- $flex <= 1.5 samples = 141
|-|-|-|- $kloc <= 262.0 samples = 30
|-|-|-|-|- ['__12', '__14']  # samples = 16 # branch_id = 14
|-|-|-|-|- ['__1', '__5', '__10', '__12', '__13']  # samples = 14 # branch_id = 15
|-|-|-|- $site <= 4.5 samples = 111
|-|-|-|-|- $flex <= 2.5 samples = 89
|-|-|-|-|-|- $ruse <= 3.5 samples = 27
|-|-|-|-|-|-|- ['__4', '__16']  # samples = 12 # branch_id = 16
|-|-|-|-|-|-|- ['__12']  # samples = 15 # branch_id = 17
|-|-|-|-|-|- $docu <= 2.5 samples = 62
|-|-|-|-|-|-|- $pr <= 3.5 samples = 47
|-|-|-|-|-|-|-|- $flex <= 3.5 samples = 25
|-|-|-|-|-|-|-|-|- ['__13']  # samples = 11 # branch_id = 18
|-|-|-|-|-|-|-|-|- ['__13']  # samples = 14 # branch_id = 19
|-|-|-|-|-|-|-|- $pr <= 4.5 samples = 22
|-|-|-|-|-|-|-|-|- ['__15']  # samples = 11 # branch_id = 20
|-|-|-|-|-|-|-|-|- ['__16']  # samples = 11 # branch_id = 21
|-|-|-|-|-|-|- ['__12']  # samples = 15 # branch_id = 22
|-|-|-|-|- $pcap <= 3.5 samples = 22
|-|-|-|-|-|- ['__12']  # samples = 11 # branch_id = 23
|-|-|-|-|-|- ['__14']  # samples = 11 # branch_id = 24
|-|-|- $site <= 2.5 samples = 50
|-|-|-|- ['__1']  # samples = 19 # branch_id = 25
|-|-|-|- $ruse <= 2.5 samples = 31
|-|-|-|-|- ['__17']  # samples = 14 # branch_id = 26
|-|-|-|-|- ['__12']  # samples = 17 # branch_id = 27
|- $rely <= 3.5 samples = 114
|-|- $pcon <= 2.5 samples = 58
|-|-|- $acap <= 3.5 samples = 33
|-|-|-|- ['__1', '__2', '__3']  # samples = 17 # branch_id = 28
|-|-|-|- ['__10']  # samples = 16 # branch_id = 29
|-|-|- $pvol <= 2.5 samples = 25
|-|-|-|- ['__1', '__2', '__8']  # samples = 13 # branch_id = 30
|-|-|-|- ['__2']  # samples = 12 # branch_id = 31
|-|- $pcap <= 3.5 samples = 56
|-|-|- $plex <= 1.5 samples = 29
|-|-|-|- ['__19']  # samples = 12 # branch_id = 32
|-|-|-|- ['__18']  # samples = 17 # branch_id = 33
|-|-|- $pcon <= 1.5 samples = 27
|-|-|-|- ['__17']  # samples = 11 # branch_id = 34
|-|-|-|- ['__17']  # samples = 16 # branch_id = 35
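Several splits in the tree above send both children to the same cluster set. Examples are `$aexp <= 1.5` (['__3'] twice), `$flex <= 3.5` (['__13'] twice) and the last `$pcon <= 1.5` (['__17'] twice). Those splits do not change any prediction. The sketch below shows one simple post-pruning step that merges such sibling leaves bottom-up. It is an illustration only, not the dtree pruning used in these runs. The `Leaf`/`Split` classes and `collapse` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Leaf:
    clusters: frozenset   # cluster ids predicted at this leaf, e.g. {'__3'}
    samples: int

@dataclass(frozen=True)
class Split:
    test: str             # e.g. "$aexp <= 1.5"
    left: "Node"
    right: "Node"

Node = Union[Leaf, Split]

def collapse(node: Node) -> Node:
    """Bottom-up: replace a split whose two children are leaves predicting
    the same cluster set by a single leaf holding both sample counts."""
    if isinstance(node, Leaf):
        return node
    left, right = collapse(node.left), collapse(node.right)
    if isinstance(left, Leaf) and isinstance(right, Leaf) and left.clusters == right.clusters:
        return Leaf(left.clusters, left.samples + right.samples)
    return Split(node.test, left, right)

def count_leaves(node: Node) -> int:
    return 1 if isinstance(node, Leaf) else count_leaves(node.left) + count_leaves(node.right)

# The "$kloc <= 161.0" subtree from the tree above (samples = 42).
subtree = Split("$kloc <= 161.0",
                Leaf(frozenset({"__7"}), 16),
                Split("$aexp <= 1.5",
                      Leaf(frozenset({"__3"}), 12),
                      Leaf(frozenset({"__3"}), 14)))
print(count_leaves(subtree), count_leaves(collapse(subtree)))  # 3 2
```

Because the merge runs bottom-up, a collapse deep in the tree can expose another identical pair one level higher.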

After discretization (Aft disc):
 
Rows after pruning: 304
$pcap <= 3.5 samples = 304
|- $ltex <= 2.5 samples = 166
|-|- $plex <= 1.5 samples = 104
|-|-|- $site <= 2.5 samples = 36
|-|-|-|- ['__2', '__3']  # samples = 18 # branch_id = 0
|-|-|-|- ['__5']  # samples = 18 # branch_id = 1
|-|-|- $flex <= 2.5 samples = 68
|-|-|-|- $etat <= 2.5 samples = 31
|-|-|-|-|- ['__6']  # samples = 17 # branch_id = 2
|-|-|-|-|- ['__4']  # samples = 14 # branch_id = 3
|-|-|-|- $prec <= 2.5 samples = 37
|-|-|-|-|- ['__6']  # samples = 20 # branch_id = 4
|-|-|-|-|- ['__2']  # samples = 17 # branch_id = 5
|-|- $etat <= 3.5 samples = 62
|-|-|- $aexp <= 2.5 samples = 35
|-|-|-|- ['__5']  # samples = 21 # branch_id = 6
|-|-|-|- ['__3']  # samples = 14 # branch_id = 7
|-|-|- $kloc <= 232.5 samples = 27
|-|-|-|- ['__3']  # samples = 14 # branch_id = 8
|-|-|-|- ['__2']  # samples = 13 # branch_id = 9
|- $acap <= 3.5 samples = 138
|-|- $etat <= 1.5 samples = 64
|-|-|- ['__3', '__9']  # samples = 11 # branch_id = 10
|-|-|- $ruse <= 4.5 samples = 53
|-|-|-|- $pvol <= 3.5 samples = 37
|-|-|-|-|- $team <= 3.5 samples = 22
|-|-|-|-|-|- ['__7']  # samples = 11 # branch_id = 11
|-|-|-|-|-|- ['__7']  # samples = 11 # branch_id = 12
|-|-|-|-|- ['__6']  # samples = 15 # branch_id = 13
|-|-|-|- ['__5']  # samples = 16 # branch_id = 14
|-|- $ltex <= 2.5 samples = 74
|-|-|- $aexp <= 3.5 samples = 50
|-|-|-|- $cplx <= 3.5 samples = 38
|-|-|-|-|- ['__8']  # samples = 13 # branch_id = 15
|-|-|-|-|- $rely <= 3.5 samples = 25
|-|-|-|-|-|- ['__7']  # samples = 11 # branch_id = 16
|-|-|-|-|-|- ['__8']  # samples = 14 # branch_id = 17
|-|-|-|- ['__6']  # samples = 12 # branch_id = 18
|-|-|- $rely <= 3.5 samples = 24
|-|-|-|- ['__6']  # samples = 11 # branch_id = 19
|-|-|-|- ['__8']  # samples = 13 # branch_id = 20
 
The tree is almost halved (36 leaves before discretization, 21 after) with an improvement in performance, but the trees are still too big for business users.
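"Tree size" here can be checked mechanically from the printed dumps. The sketch below counts leaves and maximum depth in the `|-` format used in this post. It assumes one node per line, that depth equals the number of `|-` markers, and that every leaf line carries `# branch_id = N`. The helper `tree_stats` is hypothetical, and the toy fragment uses made-up counts.

```python
import re

def tree_stats(dump):
    """Return (leaf_count, max_depth) for a tree printed in the '|-' format.

    Assumptions (hypothetical helper, not part of TEAK): one node per line,
    depth = number of '|-' markers, every leaf line has '# branch_id = N'.
    A leading console line number such as '112 ' is ignored if present.
    """
    leaves, max_depth = 0, 0
    for line in dump.splitlines():
        line = re.sub(r"^\s*\d+\s+", "", line)  # drop a leading line number, if any
        if not line.strip():
            continue
        max_depth = max(max_depth, line.count("|-"))
        if "branch_id" in line:
            leaves += 1
    return leaves, max_depth

# Toy fragment in the same format (made-up counts, not from the post).
sample = """$pcap <= 3.5 samples = 54
|- $ltex <= 2.5 samples = 36
|-|- ['__2', '__3']  # samples = 18 # branch_id = 0
|-|- ['__5']  # samples = 18 # branch_id = 1
|- ['__6']  # samples = 18 # branch_id = 2"""

print(tree_stats(sample))  # (3, 2)
```

On the two dumps above, the leaf count equals the number of `branch_id` entries: 36 before discretization (0 to 35) and 21 after (0 to 20).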

TEAK with everything: infogain, dtree pruning, distance pruning, and discretization.
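Discretization replaces each numeric attribute value with a small number of ranges before the tree is learned. The post does not say which discretizer was used. The sketch below uses equal-width binning purely as an example of the idea. The helper `equal_width_bins` and its `n_bins` parameter are hypothetical.

```python
def equal_width_bins(values, n_bins=3):
    """Map each numeric value to a bin index 0..n_bins-1 using equal-width intervals.

    Hypothetical helper: equal-width binning is just one common choice,
    not necessarily the discretizer used in these experiments.
    """
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: everything in one bin
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value in the top bin instead of bin n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

print(equal_width_bins([1, 2, 3, 4, 5, 6], n_bins=3))  # [0, 0, 1, 1, 2, 2]
```

Equal-frequency binning would be a drop-in alternative with the same input and output shape.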

Techniques         -effort         -months        -defects          -risks    #
         1 T0 m              20              66               6               0    #
   2 Bef disc m               3              15               7               1    #
   3 Aft disc m              12              39               6               0    #
    4 T9:j/j_ m               0              32               0              22    #
-------------------------------------------------------------------------------------
         1 T0 q              19              16               9               0    #
   2 Bef disc q               3               0               8               3    #
   3 Aft disc q              13               9               9               4    #
    4 T9:j/j_ q               2              21               1              38    #
-------------------------------------------------------------------------------------
         1 T0 w              73             100              37              15    #
   2 Bef disc w              28              30              57               9    #
   3 Aft disc w             100              78             100              18    #
    4 T9:j/j_ w              21              96              21             100    #
-------------------------------------------------------------------------------------
            100         3954.71           42.89        46036.21             8.6    #
              0          117.21            2.71          381.83            0.01    #
 
Tree size:
kloc <= 3.5 samples = 407
|- resl <= 7.5 samples = 234
|-|- cplx <= 12.0 samples = 77
|-|-|- ['__7']  # samples = 40 # branch_id = 0
|-|-|- ['__8']  # samples = 37 # branch_id = 1
|-|- flex <= 3.5 samples = 157
|-|-|- ['__7']  # samples = 86 # branch_id = 2
|-|-|- team <= 9.0 samples = 71
|-|-|-|- ['__7']  # samples = 26 # branch_id = 3
|-|-|-|- ['__6']  # samples = 45 # branch_id = 4
|- prec <= 1.5 samples = 173
|-|- site <= 3.5 samples = 45
|-|-|- ['__3']  # samples = 15 # branch_id = 5
|-|-|- ['__1']  # samples = 30 # branch_id = 6
|-|- etat <= 4.5 samples = 128
|-|-|- kloc <= 24.0 samples = 94
|-|-|-|- ['__3']  # samples = 19 # branch_id = 7
|-|-|-|- kloc <= 90.5 samples = 75
|-|-|-|-|- ['__4']  # samples = 21 # branch_id = 8
|-|-|-|-|- ruse <= 2.5 samples = 54
|-|-|-|-|-|- ['__3']  # samples = 15 # branch_id = 9
|-|-|-|-|-|- kloc <= 299.0 samples = 39
|-|-|-|-|-|-|- ['__5']  # samples = 13 # branch_id = 10
|-|-|-|-|-|-|- pvol <= 2.5 samples = 26
|-|-|-|-|-|-|-|- ['__4']  # samples = 15 # branch_id = 11
|-|-|-|-|-|-|-|- ['__2']  # samples = 11 # branch_id = 12
|-|-|- ['__2']  # samples = 34 # branch_id = 13

Best result so far:
 
Cluster size: 44,66
Dtree pruned, Distance pruned, 50% infogained, discretized.
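"50% infogained" is read here as keeping the top half of the features, ranked by information gain. That reading is an assumption. The class used for the ranking is also an assumption: the sketch scores discretized features against cluster ids, since the tree leaves predict cluster ids. The helpers `entropy`, `info_gain` and `top_half_by_infogain` are hypothetical, and the toy data is made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(column, labels):
    """H(labels) - H(labels | column) for one discrete feature column."""
    n = len(labels)
    groups = {}
    for x, y in zip(column, labels):
        groups.setdefault(x, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - remainder

def top_half_by_infogain(rows, labels, names):
    """Rank features by info gain and keep the best 50% (rounded up)."""
    scores = {f: info_gain([r[i] for r in rows], labels) for i, f in enumerate(names)}
    ranked = sorted(names, key=lambda f: scores[f], reverse=True)  # stable on ties
    return ranked[: math.ceil(len(names) / 2)]

# Toy data: three discretized features, cluster ids as class labels (made up).
names = ["docu", "team", "flex"]
rows = [(1, 0, 1), (1, 1, 0), (2, 0, 1), (2, 1, 0)]
labels = ["__1", "__1", "__2", "__2"]
print(top_half_by_infogain(rows, labels, names))  # ['docu', 'team']
```

Here `docu` separates the two clusters perfectly (gain 1.0 bit), while `team` and `flex` carry no information (gain 0.0). The tie between them is broken by original column order.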
 
 Techniques         -effort         -months        -defects          -risks    #
         1 T0 m              23              67               5               0    #
   2 Bef disc m               3              14               4               2    #
   3 Aft disc m               9              30               6               2    #
    4 T9:j/j_ m               0              33               0              22    #
-------------------------------------------------------------------------------------
         1 T0 q              21              16               8               1    #
   2 Bef disc q               3               0               5               3    #
   3 Aft disc q               9               6               8               6    #
    4 T9:j/j_ q               3              21               1              38    #
-------------------------------------------------------------------------------------
         1 T0 w              79             100              36              17    #
   2 Bef disc w              29              28              46               8    #
   3 Aft disc w             100              66             100              14    #
    4 T9:j/j_ w              24              98              19             100    #
-------------------------------------------------------------------------------------
            100         3500.22           42.38        50660.52             8.6    #
              0          117.21            2.51          381.83             0.0    #
 
Tree size:
flex <= 27.5 samples = 941
|- docu <= 6.5 samples = 483
|-|- ['__7']  # samples = 31 # branch_id = 0
|-|- team <= 27.0 samples = 452
|-|-|- ['__2']  # samples = 440 # branch_id = 1
|-|-|- ['__6']  # samples = 12 # branch_id = 2
|- ruse <= 35.5 samples = 458
|-|- docu <= 38.5 samples = 50
|-|-|- team <= 21.5 samples = 30
|-|-|-|- ['__4']  # samples = 14 # branch_id = 3
|-|-|-|- ['__1']  # samples = 16 # branch_id = 4
|-|-|- ['__2']  # samples = 20 # branch_id = 5
|-|- ['__1']  # samples = 408 # branch_id = 6
500
   
