Results
The number of generated samples equals the number of original samples we started with.
Sample workflow:
1. Start with 500 samples.
2. Cluster into C1, C2, ..., Cn.
3. Build Dtrees and form branches B1, B2, ..., Bn. Find the best and worst clusters in B1..Bn.
4. Find CS1, CS2, ..., CSn between clusters, so that worse clusters become better clusters.
5. Regenerate samples using CS1, CS2, ..., CSn, for a total of 500, maintaining the ratio.
6. Generate result tables.
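Step 5 can be sketched as follows. This is only a minimal sketch, assuming "maintaining the ratio" means each cluster gets new samples in proportion to its original size; the function name `allocate` and the largest-remainder rounding are illustrative choices, not taken from the actual code.

```python
def allocate(cluster_sizes, total=500):
    """Split `total` new samples across clusters in proportion to their sizes."""
    n = sum(cluster_sizes)
    exact = [size * total / n for size in cluster_sizes]
    counts = [int(x) for x in exact]  # round down first
    # Hand the leftover samples to the clusters with the largest remainders,
    # so the counts always add up to exactly `total`.
    leftover = total - sum(counts)
    order = sorted(range(len(exact)),
                   key=lambda i: exact[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    print(allocate([13, 11, 11, 19]))
```

Rounding each cluster independently can overshoot or undershoot 500, which is why the leftover is distributed explicitly.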
Techniques:
1. Distance pruning: Merge clusters that are less than 0.3 (normalized distance) from each other into one big cluster.
2. Dtree pruning: Prune leaves of decision trees: remove leaves with multiple majority classes, and remove subtrees with the same majority class.
3. Discretize: Discretize continuous data into discrete values and generate trees. Reduces trees a lot! Affects performance.
4. Infogain: Prune columns using infogain.
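A rough sketch of technique 1 (distance pruning). The assumptions here are mine: clusters are represented by centroids, distance is Euclidean and normalized by the largest pairwise distance, and merging is transitive (if A is close to B and B is close to C, all three merge). The name `merge_close_clusters` is hypothetical.

```python
from itertools import combinations
from math import dist


def merge_close_clusters(centroids, threshold=0.3):
    """Group cluster ids whose normalized centroid distance is below `threshold`.

    `centroids` maps a cluster id to a tuple of coordinates.
    """
    ids = list(centroids)
    pairs = {(a, b): dist(centroids[a], centroids[b])
             for a, b in combinations(ids, 2)}
    # Normalize by the largest pairwise distance (guard against zero).
    longest = max(pairs.values(), default=0.0) or 1.0

    parent = {i: i for i in ids}  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (a, b), d in pairs.items():
        if d / longest < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    demo = {"__1": (0.0, 0.0), "__2": (0.1, 0.0), "__3": (1.0, 0.0)}
    print(merge_close_clusters(demo))
```

In the demo, `__1` and `__2` are 0.1 apart after normalization, so they end up in one group while `__3` stays alone.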
Distance pruning works:
Techniques: Distance pruning and DTree pruning

   Techniques     -effort  -months  -defects  -risks
1  T0         m        31       67         6       0
2  Bef prune  m        11       20         8       2
3  Aft prune  m         8       20        11       2
4  T9:j/j_    m         0       31         0      22
----------------------------------------------------
1  T0         q        26       13         9       2
2  Bef prune  q        10        0        10       3
3  Aft prune  q         8        0        14       4
4  T9:j/j_    q         3       19         2      38
----------------------------------------------------
1  T0         w       100      100        42      16
2  Bef prune  w        67       38        63       9
3  Aft prune  w        58       39       100       9
4  T9:j/j_    w        31       98        25     100
----------------------------------------------------
100               2761.71     42.2  39498.09     8.6
0                  117.21     3.48    381.83     0.0
But the tree is big:
MAIN_TREE:
$rely <= 3.5 samples = 500
|- $ltex <= 2.5 samples = 253
|-|- $cplx <= 3.5 samples = 167
|-|-|- $site <= 2.5 samples = 54
|-|-|-|- $kloc <= 249.0 samples = 24
|-|-|-|-|- ['__2'] # samples = 13 # branch_id = 0
|-|-|-|-|- ['__4', '__9'] # samples = 11 # branch_id = 1
|-|-|-|- $pr <= 2.5 samples = 30
|-|-|-|-|- ['__7'] # samples = 11 # branch_id = 2
|-|-|-|-|- ['__5'] # samples = 19 # branch_id = 3
|-|-|- $ruse <= 3.5 samples = 113
|-|-|-|- $pcon <= 2.5 samples = 52
|-|-|-|-|- $pcap <= 3.5 samples = 25
|-|-|-|-|-|- ['__6'] # samples = 13 # branch_id = 4
|-|-|-|-|-|- ['__13'] # samples = 12 # branch_id = 5
|-|-|-|-|- $flex <= 2.5 samples = 27
|-|-|-|-|-|- ['__7'] # samples = 13 # branch_id = 6
|-|-|-|-|-|- ['__8', '__13'] # samples = 14 # branch_id = 7
|-|-|-|- $pcap <= 3.5 samples = 61
|-|-|-|-|- $kloc <= 152.0 samples = 29
|-|-|-|-|-|- ['__15'] # samples = 11 # branch_id = 8
|-|-|-|-|-|- ['__13'] # samples = 18 # branch_id = 9
|-|-|-|-|- $aexp <= 2.5 samples = 32
|-|-|-|-|-|- ['__13', '__15'] # samples = 19 # branch_id = 10
|-|-|-|-|-|- ['__12'] # samples = 13 # branch_id = 11
|-|- $pvol <= 2.5 samples = 86
|-|-|- $resl <= 2.5 samples = 28
|-|-|-|- ['__4'] # samples = 15 # branch_id = 12
|-|-|-|- ['__2', '__7'] # samples = 13 # branch_id = 13
|-|-|- $acap <= 3.5 samples = 58
|-|-|-|- $resl <= 2.5 samples = 23
|-|-|-|-|- ['__2'] # samples = 12 # branch_id = 14
|-|-|-|-|- ['__5'] # samples = 11 # branch_id = 15
|-|-|-|- $team <= 2.5 samples = 35
|-|-|-|-|- ['__1', '__11'] # samples = 15 # branch_id = 16
|-|-|-|-|- ['__7'] # samples = 20 # branch_id = 17
|- $pr <= 4.5 samples = 247
|-|- $plex <= 2.5 samples = 193
|-|-|- $flex <= 2.5 samples = 129
|-|-|-|- $pvol <= 2.5 samples = 63
|-|-|-|-|- $kloc <= 164.5 samples = 22
|-|-|-|-|-|- ['__13'] # samples = 11 # branch_id = 18
|-|-|-|-|-|- ['__7'] # samples = 11 # branch_id = 19
|-|-|-|-|- $pr <= 3.5 samples = 41
|-|-|-|-|-|- $pcap <= 3.5 samples = 29
|-|-|-|-|-|-|- ['__11'] # samples = 13 # branch_id = 20
|-|-|-|-|-|-|- ['__14'] # samples = 16 # branch_id = 21
|-|-|-|-|-|- ['__10'] # samples = 12 # branch_id = 22
|-|-|-|- $site <= 1.5 samples = 66
|-|-|-|-|- ['__15'] # samples = 17 # branch_id = 23
|-|-|-|-|- $ruse <= 3.5 samples = 49
|-|-|-|-|-|- ['__15'] # samples = 20 # branch_id = 24
|-|-|-|-|-|- $pvol <= 3.5 samples = 29
|-|-|-|-|-|-|- ['__12'] # samples = 16 # branch_id = 25
|-|-|-|-|-|-|- ['__6', '__16'] # samples = 13 # branch_id = 26
|-|-|- $kloc <= 295.5 samples = 64
|-|-|-|- $team <= 1.5 samples = 47
|-|-|-|-|- ['__13', '__15'] # samples = 17 # branch_id = 27
|-|-|-|-|- $etat <= 2.5 samples = 30
|-|-|-|-|-|- ['__13'] # samples = 13 # branch_id = 28
|-|-|-|-|-|- ['__7'] # samples = 17 # branch_id = 29
|-|-|-|- ['__7'] # samples = 17 # branch_id = 30
|-|- $cplx <= 3.5 samples = 54
|-|-|- ['__9'] # samples = 21 # branch_id = 31
|-|-|- $team <= 2.5 samples = 33
|-|-|-|- ['__15'] # samples = 15 # branch_id = 32
|-|-|-|- ['__12', '__13'] # samples = 18 # branch_id = 33
Distance-pruned tree, for comparison:
$pr <= 1.5 samples = 500
|- $prec <= 1.5 samples = 98
|-|- $pcon <= 3.5 samples = 22
|-|-|- ['__2'] # samples = 11 # branch_id = 0
|-|- $flex <= 1.5 samples = 76
|-|-|- $kloc <= 256.0 samples = 24
|-|-|-|- ['__9'] # samples = 11 # branch_id = 1
|-|-|-|- ['__5'] # samples = 13 # branch_id = 2
|-|-|- $team <= 1.5 samples = 52
|-|-|-|- ['__11'] # samples = 13 # branch_id = 3
|-|-|-|- $pcon <= 2.5 samples = 39
|-|-|-|-|- ['__13'] # samples = 20 # branch_id = 4
|-|-|-|-|- ['__9'] # samples = 19 # branch_id = 5
|- $prec <= 1.5 samples = 402
|-|- $etat <= 1.5 samples = 103
|-|-|- ['__13'] # samples = 13 # branch_id = 6
|-|-|- $resl <= 3.5 samples = 90
|-|-|-|- $ltex <= 2.5 samples = 66
|-|-|-|-|- $site <= 3.5 samples = 38
|-|-|-|-|-|- $rely <= 3.5 samples = 25
|-|-|-|-|-|-|- ['__6'] # samples = 12 # branch_id = 7
|-|-|-|-|-|- ['__7'] # samples = 13 # branch_id = 8
|-|-|-|-|- $acap <= 3.5 samples = 28
|-|-|-|-|-|- ['__13'] # samples = 12 # branch_id = 9
|-|-|-|-|-|- ['__4'] # samples = 16 # branch_id = 10
|-|-|-|- $site <= 2.5 samples = 24
|-|-|-|-|- ['__9'] # samples = 12 # branch_id = 11
|-|- $acap <= 3.5 samples = 299
|-|-|- $rely <= 3.5 samples = 157
|-|-|-|- $ruse <= 3.5 samples = 82
|-|-|-|-|- $prec <= 2.5 samples = 47
|-|-|-|-|-|- $pcap <= 3.5 samples = 31
|-|-|-|-|-|-|- ['__7'] # samples = 16 # branch_id = 12
|-|-|-|-|-|-|- ['__1'] # samples = 15 # branch_id = 13
|-|-|-|-|- $docu <= 2.5 samples = 35
|-|-|-|-|-|- ['__13'] # samples = 21 # branch_id = 14
|-|-|-|-|-|- ['__1'] # samples = 14 # branch_id = 15
|-|-|-|- $team <= 2.5 samples = 75
|-|-|-|-|- $docu <= 2.5 samples = 44
|-|-|-|-|-|- ['__1'] # samples = 27 # branch_id = 16
|-|-|-|-|-|- ['__1'] # samples = 17 # branch_id = 17
|-|-|-|-|- ['__1'] # samples = 31 # branch_id = 18
|-|-|- $flex <= 2.5 samples = 142
|-|-|-|- $pcon <= 1.5 samples = 70
|-|-|-|-|- $prec <= 2.5 samples = 53
|-|-|-|-|-|- $kloc <= 204.0 samples = 35
|-|-|-|-|-|-|- ['__7'] # samples = 15 # branch_id = 19
|-|-|-|-|-|-|- ['__1'] # samples = 20 # branch_id = 20
|-|-|-|- $site <= 2.5 samples = 72
|-|-|-|-|- $pcon <= 2.5 samples = 33
|-|-|-|-|-|- ['__6'] # samples = 15 # branch_id = 21
|-|-|-|-|-|- ['__7'] # samples = 18 # branch_id = 22
|-|-|-|-|- $pvol <= 3.5 samples = 39
|-|-|-|-|-|- $pcap <= 3.5 samples = 25
|-|-|-|-|-|-|- ['__1'] # samples = 14 # branch_id = 23
|-|-|-|-|-|-|- ['__11'] # samples = 11 # branch_id = 24
|-|-|-|-|-|- ['__3'] # samples = 14 # branch_id = 25
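The `$team <= 2.5 samples = 75` subtree above ends only in `['__1']` leaves (branch_id 16, 17, 18), which is the case the Dtree pruning rule "remove subtrees with same majority class" targets. Below is a minimal sketch of that rule, assuming a hypothetical nested-dict tree format (`classes`, `samples`, `test`, `children`) that is not the format of the real tool. It does not cover the other rule, removing leaves with multiple majority classes.

```python
def collapse(node):
    """Return a pruned copy where same-class sibling leaves are merged.

    Leaf:     {"classes": [...], "samples": n}
    Internal: {"test": str, "samples": n, "children": [node, ...]}
    """
    if "children" not in node:
        return dict(node)
    # Prune bottom-up so collapsed children can trigger a collapse here too.
    children = [collapse(child) for child in node["children"]]
    if all("children" not in c for c in children):
        labels = {tuple(c["classes"]) for c in children}
        if len(labels) == 1:
            # Every branch predicts the same class, so the test adds nothing.
            return {"classes": list(labels.pop()),
                    "samples": sum(c["samples"] for c in children)}
    return {**node, "children": children}


# The `$team <= 2.5` subtree from the distance-pruned tree, in the assumed format.
team = {
    "test": "$team <= 2.5", "samples": 75,
    "children": [
        {"test": "$docu <= 2.5", "samples": 44,
         "children": [{"classes": ["__1"], "samples": 27},
                      {"classes": ["__1"], "samples": 17}]},
        {"classes": ["__1"], "samples": 31},
    ],
}

if __name__ == "__main__":
    print(collapse(team))
```

Under these assumptions the three leaves collapse into a single `['__1']` leaf with 75 samples, removing two tests from the tree.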
Combining infogain, discretization, Dtree pruning, and distance pruning doesn't work.
Tree:
flex <= 27.5 samples = 500
|- docu <= 6.5 samples = 261
|-|- ['__1'] # samples = 31 # branch_id = 0
|-|- resl <= 33.5 samples = 230
|-|-|- ['__2'] # samples = 200 # branch_id = 1
|-|-|- ['__2'] # samples = 30 # branch_id = 2
|- pr <= 18.0 samples = 239
|-|- ['__6'] # samples = 16 # branch_id = 3
|-|- ruse <= 32.0 samples = 223
|-|-|- ['__4'] # samples = 203 # branch_id = 4
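For reference, infogain column pruning on discretized data could look roughly like the sketch below. It uses the standard information gain definition (class entropy minus the entropy remaining after splitting on a column). The `min_gain` cutoff and the function names are assumptions for illustration, since the notes do not say how columns are selected.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """Entropy of `labels` minus the entropy left after splitting on `column`."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def keep_columns(table, labels, min_gain=0.1):
    """Names of the columns whose infogain reaches `min_gain` (assumed cutoff)."""
    return [name for name, col in table.items()
            if info_gain(col, labels) >= min_gain]


if __name__ == "__main__":
    clusters = ["__1", "__1", "__2", "__2"]
    table = {"flex": [1, 1, 2, 2],   # separates the clusters perfectly
             "docu": [1, 2, 1, 2]}   # tells us nothing about the cluster
    print(keep_columns(table, clusters))
```

In the toy table, `flex` predicts the cluster exactly and is kept, while `docu` carries no information and is pruned.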
Performance:
   Techniques     -effort  -months  -defects  -risks
1  T0         m        31       68        16       0
2  Bef disc   m        18       43        17       2
3  Aft disc   m        29       61        39       8
4  T9:j/j_    m         0       33         0      22
----------------------------------------------------
1  T0         q        29       16        24       0
2  Bef disc   q        15        8        26       3
3  Aft disc   q         3        0        10       9
4  T9:j/j_    q         3       22         4      38
----------------------------------------------------
1  T0         w       100      100        97      16
2  Bef disc   w        63       65       100      15
3  Aft disc   w        74       77        62      13
4  T9:j/j_    w        32       99        58     100
----------------------------------------------------
100               2702.14    41.94  17094.69     8.6
0                  117.21     2.44    381.83     0.0