Wednesday, July 9, 2014

New results: decision tree learning



Results

The number of regenerated samples equals the number of original samples we started with (500).
Sample workflow:

1. Start with 500 samples.
2. Cluster them into C1, C2, ..., Cn.
3. Build Dtrees, whose branches form B1, B2, ..., Bn. Within B1..Bn, find the best and worst clusters.
4. Compute CS1, CS2, ..., CSn between worse and better clusters, so that worse clusters become better clusters.
5. Regenerate samples using CS1, CS2, ..., CSn, 500 in total, maintaining the ratio.
6. Generate result tables.
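Step 5 does not say exactly which ratio is maintained. A minimal sketch, assuming it means each cluster's share of the original 500 samples, allocates the regenerated samples with the largest-remainder method so the counts always add up to exactly 500. The function name `allocate_proportionally` and the example cluster sizes are hypothetical.

```python
import math


def allocate_proportionally(cluster_sizes, total=500):
    """Split `total` regenerated samples across clusters in proportion to the
    original cluster sizes (largest-remainder method, so counts sum to total)."""
    n = sum(cluster_sizes)
    quotas = [total * size / n for size in cluster_sizes]
    counts = [math.floor(q) for q in quotas]
    leftover = total - sum(counts)
    # Hand the remaining samples to the clusters with the largest fractional parts.
    order = sorted(range(len(quotas)), key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical cluster sizes, for illustration only.
print(allocate_proportionally([13, 11, 11, 19]))
```

With four clusters of sizes 13, 11, 11, and 19, this returns [120, 102, 102, 176]. Plain rounding of each quota could leave the total at 499 or 501; the largest-remainder step avoids that.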

Techniques:


1. Distance pruning: Merge clusters that are less than 0.3 apart (normalized distance) into one bigger cluster.

2. Dtree pruning: Prune the leaves of the decision trees by removing leaves with multiple majority classes and removing subtrees whose leaves all have the same majority class.

3. Discretize: Discretize continuous data into discrete values, then generate trees. This reduces the trees a lot, but it affects performance.

4. Infogain: Prune columns with low information gain.
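Technique 1 (distance pruning) can be sketched as below, assuming each cluster is summarized by a centroid whose attributes are already min-max normalized to [0, 1], and that "normalized distance" means Euclidean distance divided by the square root of the number of attributes (so it also lies in [0, 1]). Merging is transitive via union-find. The distance definition and the function names are assumptions, not the code behind these results.

```python
import math


def normalized_distance(a, b):
    """Euclidean distance between two centroids with attributes in [0, 1],
    divided by sqrt(number of attributes) so the result is also in [0, 1]."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) / math.sqrt(len(a))


def merge_close_clusters(centroids, threshold=0.3):
    """Group cluster indices whose centroids are closer than `threshold`.
    Union-find makes merging transitive: chains of close clusters collapse together."""
    parent = list(range(len(centroids)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if normalized_distance(centroids[i], centroids[j]) < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(centroids)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


print(merge_close_clusters([[0.1, 0.1], [0.2, 0.15], [0.9, 0.9]]))
```

Here the first two centroids are about 0.08 apart and merge, while the third stays on its own, giving [[0, 1], [2]].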

Distance pruning works:

Techniques: Distance pruning and DTree pruning

Techniques         -effort         -months        -defects          -risks    #
         1 T0 m              31              67               6               0    #
  2 Bef prune m              11              20               8               2    #
  3 Aft prune m               8              20              11               2    #
    4 T9:j/j_ m               0              31               0              22    #
-------------------------------------------------------------------------------------
         1 T0 q              26              13               9               2    #
  2 Bef prune q              10               0              10               3    #
  3 Aft prune q               8               0              14               4    #
    4 T9:j/j_ q               3              19               2              38    #
-------------------------------------------------------------------------------------
         1 T0 w             100             100              42              16    #
  2 Bef prune w              67              38              63               9    #
  3 Aft prune w              58              39             100               9    #
    4 T9:j/j_ w              31              98              25             100    #
-------------------------------------------------------------------------------------
            100         2761.71            42.2        39498.09             8.6    #
              0          117.21            3.48          381.83             0.0    #
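The scores in this table appear to be min-max normalized to 0-100, with the last two rows giving the raw value that corresponds to 100 and to 0 in each column (for example, effort 2761.71 at 100 and 117.21 at 0). Assuming that linear reading, which the post does not state, a score can be mapped back to raw units with a small hypothetical helper:

```python
# Raw values at score 100 and score 0, copied from the last two rows of the
# table above. The linear min-max interpretation is an assumption.
RAW_AT_100 = {"effort": 2761.71, "months": 42.2, "defects": 39498.09, "risks": 8.6}
RAW_AT_0 = {"effort": 117.21, "months": 3.48, "defects": 381.83, "risks": 0.0}


def to_raw(score, column):
    """Map a 0-100 score back to raw units, assuming linear min-max scaling."""
    lo, hi = RAW_AT_0[column], RAW_AT_100[column]
    return lo + (score / 100.0) * (hi - lo)


print(to_raw(31, "effort"))  # row "1 T0 m", effort column
print(to_raw(8, "effort"))   # row "3 Aft prune m", effort column
```

Under this assumption, the T0 m effort score of 31 corresponds to roughly 937 raw effort units and the after-prune score of 8 to roughly 329.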
 
 
But the tree is big (34 leaves):

MAIN_TREE:

$rely <= 3.5 samples = 500
|- $ltex <= 2.5 samples = 253
|-|- $cplx <= 3.5 samples = 167
|-|-|- $site <= 2.5 samples = 54
|-|-|-|- $kloc <= 249.0 samples = 24
|-|-|-|-|- ['__2']  # samples = 13 # branch_id = 0
|-|-|-|-|- ['__4', '__9']  # samples = 11 # branch_id = 1
|-|-|-|- $pr <= 2.5 samples = 30
|-|-|-|-|- ['__7']  # samples = 11 # branch_id = 2
|-|-|-|-|- ['__5']  # samples = 19 # branch_id = 3
|-|-|- $ruse <= 3.5 samples = 113
|-|-|-|- $pcon <= 2.5 samples = 52
|-|-|-|-|- $pcap <= 3.5 samples = 25
|-|-|-|-|-|- ['__6']  # samples = 13 # branch_id = 4
|-|-|-|-|-|- ['__13']  # samples = 12 # branch_id = 5
|-|-|-|-|- $flex <= 2.5 samples = 27
|-|-|-|-|-|- ['__7']  # samples = 13 # branch_id = 6
|-|-|-|-|-|- ['__8', '__13']  # samples = 14 # branch_id = 7
|-|-|-|- $pcap <= 3.5 samples = 61
|-|-|-|-|- $kloc <= 152.0 samples = 29
|-|-|-|-|-|- ['__15']  # samples = 11 # branch_id = 8
|-|-|-|-|-|- ['__13']  # samples = 18 # branch_id = 9
|-|-|-|-|- $aexp <= 2.5 samples = 32
|-|-|-|-|-|- ['__13', '__15']  # samples = 19 # branch_id = 10
|-|-|-|-|-|- ['__12']  # samples = 13 # branch_id = 11
|-|- $pvol <= 2.5 samples = 86
|-|-|- $resl <= 2.5 samples = 28
|-|-|-|- ['__4']  # samples = 15 # branch_id = 12
|-|-|-|- ['__2', '__7']  # samples = 13 # branch_id = 13
|-|-|- $acap <= 3.5 samples = 58
|-|-|-|- $resl <= 2.5 samples = 23
|-|-|-|-|- ['__2']  # samples = 12 # branch_id = 14
|-|-|-|-|- ['__5']  # samples = 11 # branch_id = 15
|-|-|-|- $team <= 2.5 samples = 35
|-|-|-|-|- ['__1', '__11']  # samples = 15 # branch_id = 16
|-|-|-|-|- ['__7']  # samples = 20 # branch_id = 17
|- $pr <= 4.5 samples = 247
|-|- $plex <= 2.5 samples = 193
|-|-|- $flex <= 2.5 samples = 129
|-|-|-|- $pvol <= 2.5 samples = 63
|-|-|-|-|- $kloc <= 164.5 samples = 22
|-|-|-|-|-|- ['__13']  # samples = 11 # branch_id = 18
|-|-|-|-|-|- ['__7']  # samples = 11 # branch_id = 19
|-|-|-|-|- $pr <= 3.5 samples = 41
|-|-|-|-|-|- $pcap <= 3.5 samples = 29
|-|-|-|-|-|-|- ['__11']  # samples = 13 # branch_id = 20
|-|-|-|-|-|-|- ['__14']  # samples = 16 # branch_id = 21
|-|-|-|-|-|- ['__10']  # samples = 12 # branch_id = 22
|-|-|-|- $site <= 1.5 samples = 66
|-|-|-|-|- ['__15']  # samples = 17 # branch_id = 23
|-|-|-|-|- $ruse <= 3.5 samples = 49
|-|-|-|-|-|- ['__15']  # samples = 20 # branch_id = 24
|-|-|-|-|-|- $pvol <= 3.5 samples = 29
|-|-|-|-|-|-|- ['__12']  # samples = 16 # branch_id = 25
|-|-|-|-|-|-|- ['__6', '__16']  # samples = 13 # branch_id = 26
|-|-|- $kloc <= 295.5 samples = 64
|-|-|-|- $team <= 1.5 samples = 47
|-|-|-|-|- ['__13', '__15']  # samples = 17 # branch_id = 27
|-|-|-|-|- $etat <= 2.5 samples = 30
|-|-|-|-|-|- ['__13']  # samples = 13 # branch_id = 28
|-|-|-|-|-|- ['__7']  # samples = 17 # branch_id = 29
|-|-|-|- ['__7']  # samples = 17 # branch_id = 30
|-|- $cplx <= 3.5 samples = 54
|-|-|- ['__9']  # samples = 21 # branch_id = 31
|-|-|- $team <= 2.5 samples = 33
|-|-|-|- ['__15']  # samples = 15 # branch_id = 32
|-|-|-|- ['__12', '__13']  # samples = 18 # branch_id = 33



Distance-pruned tree:
$pr <= 1.5 samples = 500
|- $prec <= 1.5 samples = 98
|-|- $pcon <= 3.5 samples = 22
|-|-|- ['__2']  # samples = 11 # branch_id = 0
|-|- $flex <= 1.5 samples = 76
|-|-|- $kloc <= 256.0 samples = 24
|-|-|-|- ['__9']  # samples = 11 # branch_id = 1
|-|-|-|- ['__5']  # samples = 13 # branch_id = 2
|-|-|- $team <= 1.5 samples = 52
|-|-|-|- ['__11']  # samples = 13 # branch_id = 3
|-|-|-|- $pcon <= 2.5 samples = 39
|-|-|-|-|- ['__13']  # samples = 20 # branch_id = 4
|-|-|-|-|- ['__9']  # samples = 19 # branch_id = 5
|- $prec <= 1.5 samples = 402
|-|- $etat <= 1.5 samples = 103
|-|-|- ['__13']  # samples = 13 # branch_id = 6
|-|-|- $resl <= 3.5 samples = 90
|-|-|-|- $ltex <= 2.5 samples = 66
|-|-|-|-|- $site <= 3.5 samples = 38
|-|-|-|-|-|- $rely <= 3.5 samples = 25
|-|-|-|-|-|-|- ['__6']  # samples = 12 # branch_id = 7
|-|-|-|-|-|- ['__7']  # samples = 13 # branch_id = 8
|-|-|-|-|- $acap <= 3.5 samples = 28
|-|-|-|-|-|- ['__13']  # samples = 12 # branch_id = 9
|-|-|-|-|-|- ['__4']  # samples = 16 # branch_id = 10
|-|-|-|- $site <= 2.5 samples = 24
|-|-|-|-|- ['__9']  # samples = 12 # branch_id = 11
|-|- $acap <= 3.5 samples = 299
|-|-|- $rely <= 3.5 samples = 157
|-|-|-|- $ruse <= 3.5 samples = 82
|-|-|-|-|- $prec <= 2.5 samples = 47
|-|-|-|-|-|- $pcap <= 3.5 samples = 31
|-|-|-|-|-|-|- ['__7']  # samples = 16 # branch_id = 12
|-|-|-|-|-|-|- ['__1']  # samples = 15 # branch_id = 13
|-|-|-|-|- $docu <= 2.5 samples = 35
|-|-|-|-|-|- ['__13']  # samples = 21 # branch_id = 14
|-|-|-|-|-|- ['__1']  # samples = 14 # branch_id = 15
|-|-|-|- $team <= 2.5 samples = 75
|-|-|-|-|- $docu <= 2.5 samples = 44
|-|-|-|-|-|- ['__1']  # samples = 27 # branch_id = 16
|-|-|-|-|-|- ['__1']  # samples = 17 # branch_id = 17
|-|-|-|-|- ['__1']  # samples = 31 # branch_id = 18
|-|-|- $flex <= 2.5 samples = 142
|-|-|-|- $pcon <= 1.5 samples = 70
|-|-|-|-|- $prec <= 2.5 samples = 53
|-|-|-|-|-|- $kloc <= 204.0 samples = 35
|-|-|-|-|-|-|- ['__7']  # samples = 15 # branch_id = 19
|-|-|-|-|-|-|- ['__1']  # samples = 20 # branch_id = 20
|-|-|-|- $site <= 2.5 samples = 72
|-|-|-|-|- $pcon <= 2.5 samples = 33
|-|-|-|-|-|- ['__6']  # samples = 15 # branch_id = 21
|-|-|-|-|-|- ['__7']  # samples = 18 # branch_id = 22
|-|-|-|-|- $pvol <= 3.5 samples = 39
|-|-|-|-|-|- $pcap <= 3.5 samples = 25
|-|-|-|-|-|-|- ['__1']  # samples = 14 # branch_id = 23
|-|-|-|-|-|-|- ['__11']  # samples = 11 # branch_id = 24
|-|-|-|-|-|- ['__3']  # samples = 14 # branch_id = 25
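This tree shows where technique 2 (Dtree pruning) would help. Under `$team <= 2.5` (branches 16 to 18), every leaf is `['__1']`, so the whole subtree says the same thing as a single leaf. A minimal sketch of both pruning rules follows. It assumes a simple hypothetical `Node` class (not the data structure used for these results), and it reads "remove" as dropping multi-class leaves and collapsing single-class subtrees into one leaf.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    test: Optional[str] = None                        # split, e.g. "$team <= 2.5"; None at a leaf
    classes: List[str] = field(default_factory=list)  # majority class(es) reported at a leaf
    samples: int = 0
    children: List["Node"] = field(default_factory=list)


def prune(node):
    """Return a pruned copy of `node`, or None if nothing in it survives.
    Rule 1: drop leaves that report more than one majority class.
    Rule 2: collapse a subtree whose surviving children are all leaves with
    the same class into one leaf (applied bottom-up, so it cascades)."""
    if not node.children:
        return node if len(node.classes) == 1 else None
    kids = [k for k in (prune(c) for c in node.children) if k is not None]
    if not kids:
        return None
    # Sample counts are re-summed from the surviving children only.
    samples = sum(k.samples for k in kids)
    if all(not k.children for k in kids) and len({k.classes[0] for k in kids}) == 1:
        return Node(classes=list(kids[0].classes), samples=samples)
    return Node(node.test, list(node.classes), samples, kids)


# The $team <= 2.5 subtree from the distance-pruned tree above.
team = Node("$team <= 2.5", samples=75, children=[
    Node("$docu <= 2.5", samples=44, children=[
        Node(classes=["__1"], samples=27),
        Node(classes=["__1"], samples=17),
    ]),
    Node(classes=["__1"], samples=31),
])
print(prune(team))
```

The inner `$docu` split collapses first, and then `$team` collapses into a single `['__1']` leaf covering all 75 samples.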


 

Infogain with a discretized Dtree and distance pruning doesn't work.

Tree:

 flex <= 27.5 samples = 500
|- docu <= 6.5 samples = 261
|-|- ['__1']  # samples = 31 # branch_id = 0
|-|- resl <= 33.5 samples = 230
|-|-|- ['__2']  # samples = 200 # branch_id = 1
|-|-|- ['__2']  # samples = 30 # branch_id = 2
|- pr <= 18.0 samples = 239
|-|- ['__6']  # samples = 16 # branch_id = 3
|-|- ruse <= 32.0 samples = 223
|-|-|- ['__4']  # samples = 203 # branch_id = 4
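The post does not say which discretizer produced this tree. Its thresholds (e.g. `flex <= 27.5`) are on a different scale from the undiscretized trees, and the tree has only 5 leaves compared with 34 in MAIN_TREE. As one common option, equal-width binning could be sketched as follows. The bin count and the function name are assumptions, not necessarily what was used here.

```python
def equal_width_bins(values, bins=5):
    """Map continuous values to bin indices 0..bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant column: everything falls in one bin
    width = (hi - lo) / bins
    # min(..., bins - 1) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


print(equal_width_bins([0, 1, 2, 3, 4, 5, 10], bins=5))
```

Equal-width bins are simple but sensitive to outliers (the single value 10 stretches every bin here); equal-frequency binning is the usual alternative when that matters.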
 
 
Performance:
 
     Techniques         -effort         -months        -defects          -risks    #
         1 T0 m              31              68              16               0    #
   2 Bef disc m              18              43              17               2    #
   3 Aft disc m              29              61              39               8    #
    4 T9:j/j_ m               0              33               0              22    #
-------------------------------------------------------------------------------------
         1 T0 q              29              16              24               0    #
   2 Bef disc q              15               8              26               3    #
   3 Aft disc q               3               0              10               9    #
    4 T9:j/j_ q               3              22               4              38    #
-------------------------------------------------------------------------------------
         1 T0 w             100             100              97              16    #
   2 Bef disc w              63              65             100              15    #
   3 Aft disc w              74              77              62              13    #
    4 T9:j/j_ w              32              99              58             100    #
-------------------------------------------------------------------------------------
            100         2702.14           41.94        17094.69             8.6    #
              0          117.21            2.44          381.83             0.0    # 
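Technique 4 ranks columns by information gain, which needs discrete values, so it pairs naturally with discretization as in the run above. A minimal sketch follows. It assumes a discrete class label per row and a hypothetical `keep` fraction, since the post does not give its cut-off.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """Entropy of the labels minus the weighted entropy after splitting on `column`."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def prune_columns(rows, labels, names, keep=0.5):
    """Keep the top `keep` fraction of columns, ranked by information gain."""
    gains = {name: info_gain([r[i] for r in rows], labels) for i, name in enumerate(names)}
    ranked = sorted(names, key=gains.get, reverse=True)
    k = max(1, int(len(names) * keep))
    return ranked[:k], gains


rows = [[1, 0], [1, 1], [2, 0], [2, 1]]
labels = ["a", "a", "b", "b"]
print(prune_columns(rows, labels, ["good", "noise"]))
```

In this toy data, `good` fully determines the label (gain 1 bit) while `noise` carries no information (gain 0), so only `good` is kept.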

