coco.awk -> coco.py outputs:
/things/var/darren/coco $ python coco.py coc2-coc81-m.dat
@relation COCOMOII-COC81-MODIFIED
@attribute prec 4
@attribute flex 4
@attribute resl 4
@attribute team 5
@attribute pmat 1 2 3
@attribute rely 1 2 3 4 5
@attribute data 2 3 4 5
@attribute cplx 1 2 3 4 5 6
@attribute ruse 3
@attribute docu 3
@attribute time 3 4 5 6
@attribute stor 3 4 5 6
@attribute pvol 2 3 4 5
@attribute acap 1 2 3 4 5
@attribute pcap 1 2 3 4 5
@attribute pcon 3
@attribute aexp 1 2 3 4 5 <--- 'apex' ?
@attribute plex 1 2 3 4
@attribute ltex 1 2 3 4
@attribute tool 1 2
@attribute site 4
@attribute sced 1 2 3
@attribute kloc 1.98 2.14 3 3 3.6 3.9 4 4.4 5.3 6.1 6.2 6.3 6.7 6.9 8.2 9.1 9.4 10 13 15 16 17 22 23 24 25 27 28 28.6 29 30 30.6 32 35 37 38 42 45.5 48 60 62 73 77 90 91 113 118 132 252 293 299 320 390 464 1150
@attribute effort 5.9 6 7.3 8 9 12 14 15 18 20 33 36 38 40 41 43 45 47 50 55 57 60 61 70 79 82 83 87 88 98 106 122 126 130 156 176 201 218 230 237 240 243 321 387 423 453 523 539 605 702 724 958 1063 1075 1272 1600 2040 2455 6400 6600 11400
@data
4 4 4 5 1 2 5 1 3 3 3 4 4 2 2 3 2 2 3 1 4 3 113 2040 14183 39.5 2.7
4 4 4 5 1 2 5 2 3 3 3 4 3 3 3 3 4 4 4 1 4 3 293 1600 23374 46.7 1.1
4 4 4 5 3 3 5 2 3 3 3 3 2 4 4 3 5 4 4 2 4 3 132 243 3136 26.9 0.0
4 4 4 5 1 1 5 1 3 3 3 3 2 2 1 3 4 3 4 1 4 3 60 240 5272 26.9 3.8
4 4 4 5 1 2 2 3 3 3 3 3 2 3 4 3 3 4 4 1 4 3 16 33 970 14.3 2.1
4 4 4 5 1 1 3 2 3 3 3 5 3 1 1 3 3 4 4 1 4 3 4 43 553 11.6 5.4
4 4 4 5 3 1 3 3 3 3 3 3 2 3 3 3 3 4 4 2 4 3 6.9 8 350 10.3 1.3
4 4 4 5 1 4 2 5 3 3 6 6 5 5 3 3 4 1 1 1 4 2 22 1075 3260 23.6 2.9
4 4 4 5 3 4 2 5 3 3 5 5 4 4 4 3 3 2 2 1 4 3 30 423 1989 24.1 0.0
4 4 4 5 2 5 2 5 3 3 4 6 3 4 4 3 5 4 3 1 4 3 29 321 1274 21.7 0.5
4 4 4 5 2 5 2 5 3 3 4 6 3 4 4 3 5 4 3 1 4 3 32 218 1406 22.5 0.5
4 4 4 5 3 4 2 5 3 3 4 4 3 4 4 3 5 3 4 1 4 2 37 201 1517 17.9 0.5
4 4 4 5 3 4 2 5 3 3 4 4 4 5 5 3 3 2 3 1 4 3 25 79 1138 18.4 0.0
4 4 4 5 1 4 2 6 3 3 5 6 4 4 5 3 3 2 2 1 4 1 3 60 387 9.4 5.4
4 4 4 5 3 5 2 5 3 3 5 4 4 4 4 3 2 1 1 1 4 1 3.9 61 300 9.8 3.2
4 4 4 5 2 5 3 5 3 3 5 6 3 4 4 3 3 3 3 1 4 3 6.1 40 390 14.9 0.5
4 4 4 5 2 5 3 5 3 3 5 6 3 4 4 3 5 3 3 1 4 3 3.6 9 196 11.6 0.5
4 4 4 5 1 4 5 4 3 3 5 5 3 4 3 3 3 3 3 1 4 2 320 11400 34588 52.4 2.1
4 4 4 5 3 4 4 3 3 3 4 5 2 5 3 3 4 3 3 2 4 3 1150 6600 38222 64.4 0.0
4 4 4 5 1 5 4 5 3 3 4 5 4 5 3 3 5 2 2 1 4 2 299 6400 26471 50.0 2.1
4 4 4 5 3 3 5 4 3 3 3 3 2 4 3 3 3 3 3 2 4 3 252 2455 11664 40.8 0.0
4 4 4 5 3 4 3 3 3 3 3 4 3 4 4 3 5 4 3 1 4 1 118 724 4402 20.3 1.1
4 4 4 5 2 4 3 3 3 3 3 4 3 4 4 3 5 4 3 1 4 1 77 539 3714 18.2 2.1
4 4 4 5 3 2 3 2 3 3 3 4 3 3 3 3 1 2 4 3 4 3 90 453 5188 28.8 0.5
4 4 4 5 3 4 5 5 3 3 3 4 3 4 4 3 3 2 3 2 4 2 38 523 2269 20.2 0.0
4 4 4 5 3 3 3 2 3 3 3 4 4 4 4 3 3 2 3 1 4 2 48 387 2419 18.5 0.5
4 4 4 5 3 4 2 4 3 3 3 5 3 3 3 3 3 3 3 1 4 2 9.4 88 517 12.1 0.5
4 4 4 5 1 4 4 5 3 3 4 5 4 4 4 3 3 2 2 1 4 3 13 98 1473 19.6 1.6
4 4 4 5 3 2 3 3 3 3 3 3 3 3 4 3 1 3 3 2 4 1 2.14 7.3 163 5.7 3.8
4 4 4 5 3 2 3 3 3 3 3 3 3 3 4 3 1 3 3 2 4 1 1.98 5.9 151 5.5 3.8
4 4 4 5 2 5 4 3 3 3 3 6 4 4 4 3 5 2 2 1 4 3 62 1063 3139 30.7 0.5
4 4 4 5 1 2 4 2 3 3 3 3 3 5 3 3 5 3 3 1 4 3 390 702 26021 42.9 2.7
4 4 4 5 3 5 4 5 3 3 3 6 4 4 4 3 5 4 3 2 4 3 42 605 1534 25.4 0.0
4 4 4 5 3 4 4 3 3 3 3 3 3 3 3 3 3 3 3 1 4 1 23 230 1271 14.2 2.7
4 4 4 5 1 1 2 5 3 3 3 5 4 3 3 3 4 2 3 1 4 3 13 82 2085 16.5 2.1
4 4 4 5 2 2 3 3 3 3 3 3 2 2 2 3 3 4 4 2 4 3 15 55 1004 15.8 0.0
4 4 4 5 2 2 2 1 3 3 3 4 3 4 4 3 5 3 3 1 4 3 60 47 2453 19.0 2.7
4 4 4 5 3 3 3 4 3 3 3 3 2 5 3 3 4 4 4 2 4 3 15 12 467 13.0 0.5
4 4 4 5 3 3 3 4 3 3 3 3 2 5 5 3 5 3 4 1 4 3 6.2 8 167 9.0 1.1
4 4 4 5 1 3 2 5 3 3 3 3 3 4 2 3 5 3 3 1 4 3 3 8 251 8.8 1.6
4 4 4 5 3 2 2 3 3 3 3 3 2 3 5 3 5 4 4 2 4 3 5.3 6 146 8.1 1.1
4 4 4 5 2 2 3 3 3 3 3 4 2 4 3 3 3 4 4 1 4 3 45.5 45 2645 21.0 1.1
4 4 4 5 2 3 3 3 3 3 3 5 2 4 3 3 3 4 4 1 4 3 28.6 83 1416 18.9 0.5
4 4 4 5 1 2 3 3 3 3 3 5 2 3 3 3 3 4 4 1 4 3 30.6 87 2444 20.5 1.1
4 4 4 5 2 2 3 3 3 3 3 4 2 3 3 3 3 4 4 1 4 3 35 106 2198 20.1 0.5
4 4 4 5 2 2 3 3 3 3 3 4 2 3 4 3 3 4 4 1 4 3 73 126 4188 25.1 1.1
4 4 4 5 1 1 2 5 3 3 3 3 2 5 5 3 5 2 2 1 4 3 23 36 1839 14.6 6.4
4 4 4 5 1 2 2 2 3 3 3 3 2 2 2 3 4 4 4 1 4 3 464 1272 29640 51.4 2.7
4 4 4 5 3 3 3 2 3 3 3 3 3 5 5 3 3 2 3 2 4 3 91 156 2874 22.6 2.1
4 4 4 5 2 4 3 3 3 3 5 5 3 4 4 3 3 2 3 1 4 3 24 176 1541 20.3 0.5
4 4 4 5 1 2 3 3 3 3 3 3 3 2 1 3 3 3 4 1 4 3 10 122 1225 16.2 3.8
4 4 4 5 1 2 2 2 3 3 3 4 4 3 3 3 3 2 2 1 4 3 8.2 41 855 13.1 1.1
4 4 4 5 2 2 2 4 3 3 4 5 5 5 5 3 3 2 2 1 4 2 5.3 14 533 9.3 3.8
4 4 4 5 3 3 2 3 3 3 3 4 4 3 3 3 5 3 4 1 4 3 4.4 20 184 9.9 0.0
4 4 4 5 1 2 2 1 3 3 3 3 2 4 2 3 5 4 4 1 4 3 6.3 18 264 9.0 2.7
4 4 4 5 1 4 2 5 3 3 5 5 3 4 3 3 4 2 2 1 4 2 27 958 2971 20.3 2.1
4 4 4 5 1 3 2 4 3 3 4 5 5 3 3 3 3 2 2 1 4 1 17 237 2622 16.0 5.4
4 4 4 5 3 5 2 5 3 3 6 5 3 5 5 3 5 4 4 1 4 3 25 130 692 19.6 0.0
4 4 4 5 3 3 2 4 3 3 3 4 3 3 3 3 3 3 3 1 4 3 23 70 1294 18.2 0.0
4 4 4 5 1 4 2 5 3 3 4 4 3 4 4 3 2 2 2 1 4 2 6.7 57 707 11.6 3.2
4 4 4 5 3 3 2 4 3 3 3 3 2 4 4 3 3 4 3 1 4 3 28 50 997 16.4 0.0
4 4 4 5 3 2 2 5 3 3 4 5 4 3 5 3 5 1 1 1 4 3 9.1 38 782 14.4 1.1
4 4 4 5 3 3 2 4 3 3 3 3 3 5 4 3 5 3 3 1 4 3 10 15 356 10.9 0.5
@attribute defects 146 151 163 167 184 196 251 264 300 350 356 387 390 467 517 533 553 692 707 782 855 970 997 1004 1138 1225 1271 1274 1294 1406 1416 1473 1517 1534 1541 1839 1989 2085 2198 2269 2419 2444 2453 2622 2645 2874 2971 3136 3139 3260 3714 4188 4402 5188 5272 11664 14183 23374 26021 26471 29640 34588 38222
@attribute months 5.5 5.7 8.1 8.8 9.0 9.0 9.3 9.4 9.8 9.9 10.3 10.9 11.6 11.6 11.6 12.1 13.0 13.1 14.2 14.3 14.4 14.6 14.9 15.8 16.0 16.2 16.4 16.5 17.9 18.2 18.2 18.4 18.5 18.9 19.0 19.6 19.6 20.1 20.2 20.3 20.3 20.3 20.5 21.0 21.7 22.5 22.6 23.6 24.1 25.1 25.4 26.9 26.9 28.8 30.7 39.5 40.8 42.9 46.7 50.0 51.4 52.4 64.4
@attribute risks 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.3 1.6 1.6 2.1 2.1 2.1 2.1 2.1 2.1 2.1 2.7 2.7 2.7 2.7 2.7 2.7 2.9 3.2 3.2 3.8 3.8 3.8 3.8 3.8 5.4 5.4 5.4 6.4
-------
~/things/var/darren/coco $ python coco.py coc2-nasa93-m.dat
@relation COCOMOII-NASA93-MODIFIED
@attribute prec 4
@attribute flex 4
@attribute resl 4
@attribute team 5
@attribute pmat 2 3 4
@attribute rely 2 3 4 5
@attribute data 2 3 4 5
@attribute cplx 2 3 4 5 6
@attribute ruse 3
@attribute docu 3
@attribute time 3 4 5 6
@attribute stor 3 4 5 6
@attribute pvol 2 3 4
@attribute acap 3 4 5
@attribute pcap 3 4 5
@attribute pcon 3
@attribute aexp 2 3 4 5 <--- 'apex' ?
@attribute plex 1 2 3 4
@attribute ltex 1 2 3 4
@attribute tool 3 4
@attribute site 3
@attribute sced 2 3
@attribute kloc 0.9 2.2 3 3.5 5.5 6 6.2 6.5 7.25 7.5 7.7 8 8.2 9.7 10 10.4 11.3 11.4 12.8 13 14 15 15.4 16 16.3 19.3 19.7 20 21 24 24.6 25.9 29.5 31.5 32 32.5 32.6 34 35.5 38 40 41 47.5 48.5 50 53 60 65 66.6 70 78 79 85 90 98 100 101 111 137 144 150 151 162 165 177.9 190 219 227 233 240 271 282.1 284.7 302 339 350 352 423 980
@attribute effort 8.4 10.8 12 18 24 25.2 31.2 36 38 42 48 50 60 62 70 72 82 90 97 98.8 107 114 117.6 120 150 155 162 170 192 210 215 239 240 252 278 300 324 352.8 360 370 400 409 420 430 432 444 458 480 571.4 576 599 600 636 648 703 720 750 756 882 973 1181 1200 1248 1350 1368 1645.9 1772.5 1924.5 2120 2400 2460 4178.2 4560 8211
@data
4 4 4 5 4 4 2 4 3 3 3 3 2 3 3 3 3 3 4 3 3 2 25.9 117.6 808 15.3 0.0
4 4 4 5 4 4 2 4 3 3 3 3 2 3 3 3 3 3 4 3 3 2 24.6 117.6 767 15.0 0.0
4 4 4 5 4 4 2 4 3 3 3 3 2 3 3 3 3 3 4 3 3 2 7.7 31.2 240 10.1 0.0
4 4 4 5 4 4 2 4 3 3 3 3 2 3 3 3 3 3 4 3 3 2 8.2 36 256 10.4 0.0
4 4 4 5 4 4 2 4 3 3 3 3 2 3 3 3 3 3 4 3 3 2 9.7 25.2 302 11.0 0.0
4 4 4 5 4 4 2 4 3 3 3 3 2 3 3 3 3 3 4 3 3 2 2.2 8.4 69 6.6 0.0
4 4 4 5 4 4 2 4 3 3 3 3 2 3 3 3 3 3 4 3 3 2 3.5 10.8 109 7.8 0.0
4 4 4 5 4 4 2 4 3 3 3 3 2 3 3 3 3 3 4 3 3 2 66.6 352.8 2077 21.0 0.0
4 4 4 5 4 4 2 4 3 3 6 6 2 4 4 3 4 3 4 4 3 3 7.5 72 209 13.1 0.0
4 4 4 5 3 3 2 4 3 3 3 3 2 4 5 3 5 3 4 3 3 3 20 72 479 13.5 0.5
4 4 4 5 3 3 2 4 3 3 3 3 2 4 4 3 5 3 4 3 3 3 6 24 159 9.3 0.0
4 4 4 5 3 3 2 4 3 3 3 3 2 4 5 3 5 3 4 3 3 3 100 360 2397 23.6 0.5
4 4 4 5 3 3 2 4 3 3 3 3 2 4 3 3 5 3 2 3 3 3 11.3 36 388 12.0 0.0
4 4 4 5 3 3 2 4 3 3 3 3 4 4 4 3 4 2 1 3 3 3 100 215 5034 29.0 0.0
4 4 4 5 3 3 2 4 3 3 3 3 2 4 4 3 5 3 4 3 3 3 20 48 531 14.1 0.0
4 4 4 5 3 3 2 4 3 3 3 3 2 4 3 3 3 3 1 3 3 3 100 360 4342 28.0 0.3
4 4 4 5 3 3 2 4 3 3 3 6 2 4 5 3 5 3 4 3 3 3 150 324 4121 30.5 0.5
4 4 4 5 3 3 2 4 3 3 3 3 2 4 4 3 4 3 4 3 3 3 31.5 60 912 17.0 0.0
4 4 4 5 3 3 2 4 3 3 3 3 2 4 4 3 5 3 4 3 3 3 15 48 398 12.8 0.0
4 4 4 5 3 3 2 4 3 3 3 6 2 4 3 3 4 3 4 3 3 3 32.5 60 1181 20.0 0.0
4 4 4 5 4 4 2 4 3 3 3 3 2 3 3 3 3 3 4 3 3 2 19.7 60 614 13.9 0.0
4 4 4 5 4 4 2 4 3 3 3 3 2 3 3 3 3 3 4 3 3 2 66.6 300 2077 21.0 0.0
4 4 4 5 4 4 2 4 3 3 3 3 2 3 3 3 3 3 4 3 3 2 29.5 120 920 16.0 0.0
4 4 4 5 3 4 3 3 3 3 4 3 3 3 4 3 4 3 3 3 3 3 15 90 532 14.6 0.0
4 4 4 5 3 4 3 4 3 3 3 3 3 3 4 3 4 3 3 3 3 3 38 210 1436 20.4 0.0
4 4 4 5 3 3 3 3 3 3 3 3 3 3 4 3 4 3 3 3 3 3 10 48 395 11.9 0.0
4 4 4 5 4 3 5 4 3 3 5 5 2 5 3 3 4 2 4 3 3 2 15.4 70 708 14.0 0.5
4 4 4 5 4 3 5 4 3 3 5 5 2 5 3 3 4 2 4 3 3 2 48.5 239 2231 20.6 0.5
4 4 4 5 4 3 5 4 3 3 5 5 2 5 3 3 4 2 4 3 3 2 16.3 82 750 14.2 0.5
4 4 4 5 4 3 5 4 3 3 5 5 2 5 3 3 4 2 4 3 3 2 12.8 62 589 13.1 0.5
4 4 4 5 4 3 5 4 3 3 5 5 2 5 3 3 4 2 4 3 3 2 32.6 170 1500 18.0 0.5
4 4 4 5 4 3 5 4 3 3 5 5 2 5 3 3 4 2 4 3 3 2 35.5 192 1633 18.5 0.5
4 4 4 5 4 4 2 4 3 3 3 3 2 3 3 3 3 3 4 3 3 2 5.5 18 172 9.1 0.0
4 4 4 5 4 4 2 4 3 3 3 3 2 3 3 3 3 3 4 3 3 2 10.4 50 324 11.2 0.0
4 4 4 5 4 4 2 4 3 3 3 3 2 3 3 3 3 3 4 3 3 2 14 60 437 12.4 0.0
4 4 4 5 3 4 3 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 6.5 42 290 12.0 0.0
4 4 4 5 3 3 3 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 13 60 683 14.8 0.0
4 4 4 5 4 3 3 4 3 3 3 3 3 3 4 3 3 3 4 4 3 3 90 444 3343 26.7 0.0
4 4 4 5 3 3 3 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 8 42 420 12.5 0.0
4 4 4 5 3 3 3 4 3 3 4 3 3 3 3 3 3 3 3 3 3 3 16 114 887 16.4 0.0
4 4 4 5 4 3 4 4 3 3 5 4 2 4 4 3 3 2 4 3 3 2 177.9 1248 7998 31.5 0.0
4 4 4 5 4 4 2 4 3 3 3 3 2 3 4 3 3 3 3 3 3 3 302 2400 8543 38.4 0.0
4 4 4 5 4 3 4 2 3 3 3 3 4 4 3 3 4 3 3 4 3 3 282.1 1368 9090 35.9 0.0
4 4 4 5 4 4 4 2 3 3 3 3 3 4 3 3 4 3 3 3 3 3 284.7 973 7887 36.7 0.0
4 4 4 5 3 4 4 3 3 3 3 3 2 3 4 3 4 3 4 3 3 3 79 400 2151 25.9 0.0
4 4 4 5 2 2 3 3 3 3 3 3 2 4 5 3 4 3 4 3 3 3 423 2400 17050 40.3 2.1
4 4 4 5 4 3 3 3 3 3 3 3 2 4 5 3 5 2 4 3 3 3 190 420 4309 28.4 0.5
4 4 4 5 4 3 3 4 3 3 3 4 3 4 3 3 4 3 4 3 3 3 47.5 252 1858 21.4 0.0
4 4 4 5 2 5 3 6 3 3 4 4 2 3 3 3 4 3 3 4 3 3 21 107 978 20.5 0.0
4 4 4 5 2 3 4 4 3 3 5 3 3 4 4 3 4 3 4 3 3 3 78 571.4 4456 29.4 0.0
4 4 4 5 2 3 4 4 3 3 5 3 3 4 4 3 4 3 4 3 3 3 11.4 98.8 651 14.9 0.0
4 4 4 5 2 3 4 4 3 3 5 3 3 4 4 3 4 3 4 3 3 3 19.3 155 1103 17.9 0.0
4 4 4 5 2 4 3 5 3 3 4 4 2 4 3 3 3 4 4 3 3 3 101 750 4840 32.4 0.0
4 4 4 5 2 4 3 4 3 3 4 4 2 3 3 3 4 3 3 3 3 3 219 2120 10883 41.2 0.0
4 4 4 5 2 4 3 4 3 3 4 4 2 3 3 3 4 3 3 3 3 3 50 370 2485 24.4 0.0
4 4 4 5 4 5 4 4 3 3 5 5 3 5 5 3 5 3 4 4 3 2 227 1181 5335 31.7 0.0
4 4 4 5 4 3 4 5 3 3 3 3 2 4 5 3 3 2 3 3 3 2 70 278 2950 20.2 0.5
4 4 4 5 4 4 2 4 3 3 3 3 2 3 3 3 3 3 4 3 3 2 0.9 8.4 28 4.9 0.0
4 4 4 5 2 5 2 6 3 3 6 5 2 4 4 3 5 1 4 3 3 3 980 4560 43279 90.3 0.0
4 4 4 5 3 3 2 4 3 3 3 3 2 5 5 3 3 4 4 3 3 3 350 720 8547 35.7 1.1
4 4 4 5 4 4 3 6 3 3 4 4 2 4 3 3 3 4 4 4 3 3 70 458 2404 27.5 0.0
4 4 4 5 4 4 3 6 3 3 4 4 2 4 3 3 3 4 4 4 3 3 271 2460 9308 43.4 0.0
4 4 4 5 3 3 3 3 3 3 3 3 2 4 4 3 4 3 4 3 3 3 90 162 2537 24.0 0.0
4 4 4 5 3 3 3 3 3 3 3 3 2 4 4 3 4 3 4 3 3 3 40 150 1127 18.1 0.0
4 4 4 5 3 4 3 4 3 3 4 3 2 4 4 3 4 3 4 3 3 3 137 636 3894 31.0 0.0
4 4 4 5 3 4 3 4 3 3 4 3 4 4 4 3 4 3 4 3 3 3 150 882 5413 34.8 0.0
4 4 4 5 3 5 3 4 3 3 4 3 2 4 4 3 4 3 4 3 3 3 339 444 7840 44.2 0.0
4 4 4 5 3 2 4 2 3 3 3 3 4 4 4 3 4 3 4 3 3 3 240 192 9544 35.7 1.1
4 4 4 5 2 4 3 4 3 3 3 5 2 4 4 3 4 4 4 3 3 2 144 576 5670 27.7 0.0
4 4 4 5 2 3 2 3 3 3 3 5 2 4 4 3 4 4 4 3 3 2 151 432 5676 25.2 0.0
4 4 4 5 2 3 2 4 3 3 3 5 2 4 4 3 4 4 4 3 3 2 34 72 1438 15.6 0.0
4 4 4 5 2 3 3 4 3 3 3 5 2 4 4 3 4 4 4 3 3 2 98 300 4540 23.5 0.0
4 4 4 5 2 3 3 4 3 3 3 5 2 4 4 3 4 4 4 3 3 2 85 300 3937 22.3 0.0
4 4 4 5 2 3 2 3 3 3 3 5 2 4 4 3 4 4 4 3 3 2 20 240 752 12.3 0.0
4 4 4 5 2 3 2 3 3 3 3 5 2 4 4 3 4 4 4 3 3 2 111 600 4172 22.6 0.0
4 4 4 5 2 4 5 4 3 3 3 5 2 4 4 3 4 4 4 3 3 2 162 756 6987 31.2 0.0
4 4 4 5 2 4 4 5 3 3 3 5 2 4 4 3 4 4 4 3 3 2 352 1200 16280 41.3 0.0
4 4 4 5 2 4 3 5 3 3 3 5 2 4 4 3 4 4 4 3 3 2 165 97 7278 30.3 0.0
4 4 4 5 4 4 3 5 3 3 4 4 2 4 3 3 3 4 4 3 3 3 60 409 2004 24.9 0.0
4 4 4 5 4 4 3 5 3 3 4 4 2 4 3 3 3 4 4 3 3 3 100 703 3340 29.6 0.0
4 4 4 5 3 4 5 5 3 3 6 6 4 3 3 3 3 2 2 3 3 3 32 1350 2984 33.6 0.0
4 4 4 5 4 4 4 4 3 3 5 6 4 4 4 3 4 4 4 3 3 3 53 480 2061 27.7 0.0
4 4 4 5 4 4 2 5 3 3 5 6 2 5 5 3 5 1 1 4 3 3 41 599 1354 21.5 0.0
4 4 4 5 4 4 2 5 3 3 5 6 2 5 5 3 5 1 1 4 3 3 24 430 793 18.0 0.0
4 4 4 5 4 5 4 5 3 3 6 6 3 4 4 3 4 4 4 3 3 3 165 4178.2 5799 45.5 0.0
4 4 4 5 4 5 4 5 3 3 6 6 3 4 4 3 4 4 4 3 3 3 65 1772.5 2285 33.2 0.0
4 4 4 5 4 5 4 5 3 3 6 6 3 4 4 3 4 4 4 3 3 3 70 1645.9 2460 34.1 0.0
4 4 4 5 4 5 4 6 3 3 6 6 3 4 4 3 4 4 4 3 3 3 50 1924.5 1946 32.9 0.0
4 4 4 5 2 5 2 5 3 3 5 6 2 4 3 3 2 1 2 4 3 3 7.25 648 442 16.1 0.0
4 4 4 5 4 5 4 5 3 3 6 6 3 4 4 3 4 4 4 3 3 3 233 8211 8189 51.1 0.0
4 4 4 5 3 4 3 5 3 3 5 5 4 3 3 3 3 2 2 3 3 3 16.3 480 1253 21.5 0.0
4 4 4 5 3 4 3 5 3 3 5 5 4 3 3 3 3 2 2 3 3 3 6.2 12 477 15.4 0.0
4 4 4 5 3 4 3 5 3 3 5 5 4 3 3 3 3 2 2 3 3 3 3 38 231 12.0 0.0
@attribute defects 28 69 109 159 172 209 231 240 256 290 302 324 388 395 398 420 437 442 477 479 531 532 589 614 651 683 708 750 752 767 793 808 887 912 920 978 1103 1127 1181 1253 1354 1436 1438 1500 1633 1858 1946 2004 2061 2077 2077 2151 2231 2285 2397 2404 2460 2485 2537 2950 2984 3340 3343 3894 3937 4121 4172 4309 4342 4456 4540 4840 5034 5335 5413 5670 5676 5799 6987 7278 7840 7887 7998 8189 8543 8547 9090 9308 9544 10883 16280 17050 43279
@attribute months 4.9 6.6 7.8 9.1 9.3 10.1 10.4 11.0 11.2 11.9 12.0 12.0 12.0 12.3 12.4 12.5 12.8 13.1 13.1 13.5 13.9 14.0 14.1 14.2 14.6 14.8 14.9 15.0 15.3 15.4 15.6 16.0 16.1 16.4 17.0 17.9 18.0 18.0 18.1 18.5 20.0 20.2 20.4 20.5 20.6 21.0 21.0 21.4 21.5 21.5 22.3 22.6 23.5 23.6 24.0 24.4 24.9 25.2 25.9 26.7 27.5 27.7 27.7 28.0 28.4 29.0 29.4 29.6 30.3 30.5 31.0 31.2 31.5 31.7 32.4 32.9 33.2 33.6 34.1 34.8 35.7 35.7 35.9 36.7 38.4 40.3 41.2 41.3 43.4 44.2 45.5 51.1 90.3
@attribute risks 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.1 1.1 2.1
awk defects, months summaries:
coc2-coc81-m.dat
@attribute defects 146 151 163 167 184 196 251 264 300 350 356 387 390 467 517 533 553 692 707 782 855 970 997 1004 1138 1225 1271 1274 1294 1406 1416 1473 1517 1534 1541 1839 1989 2085 2198 2269 2419 2444 2453 2622 2645 2874 2971 3136 3139 3260 3714 4188 4402 5188 5272 11664 14183 23374 26021 26471 29640 34588
@attribute months 5.5 5.7 8.1 8.8 9.0 9.0 9.3 9.4 9.8 9.9 10.3 10.9 11.6 11.6 11.6 12.1 13.0 13.1 14.2 14.3 14.4 14.6 14.9 15.8 16.0 16.2 16.4 16.5 17.9 18.2 18.2 18.4 18.5 18.9 19.0 19.6 19.6 20.1 20.2 20.3 20.3 20.3 20.5 21.0 21.7 22.5 22.6 23.6 24.1 25.1 25.4 26.9 26.9 28.8 30.7 39.5 40.8 42.9 46.7 50.0 51.4 52.4
coc2-nasa93-m.dat
@attribute defects 28 69 109 159 172 209 231 240 256 290 302 324 388 395 398 420 437 442 477 479 531 532 589 614 651 683 708 750 752 767 793 808 887 912 920 978 1103 1127 1181 1253 1354 1436 1438 1500 1633 1858 1946 2004 2061 2077 2077 2151 2231 2285 2397 2404 2460 2485 2537 2950 2984 3340 3343 3894 3937 4121 4172 4309 4342 4456 4540 4840 5034 5335 5413 5670 5676 5799 6987 7278 7840 7887 7998 8189 8543 8547 9090 9308 9544 10883 16280 17050
@attribute months 4.9 6.6 7.8 9.1 9.3 10.1 10.4 11.0 11.2 11.9 12.0 12.0 12.0 12.3 12.4 12.5 12.8 13.1 13.1 13.5 13.9 14.0 14.1 14.2 14.6 14.8 14.9 15.0 15.3 15.4 15.6 16.0 16.1 16.4 17.0 17.9 18.0 18.0 18.1 18.5 20.0 20.2 20.4 20.5 20.6 21.0 21.0 21.4 21.5 21.5 22.3 22.6 23.5 23.6 24.0 24.4 24.9 25.2 25.9 26.7 27.5 27.7 27.7 28.0 28.4 29.0 29.4 29.6 30.3 30.5 31.0 31.2 31.5 31.7 32.4 32.9 33.2 33.6 34.1 34.8 35.7 35.7 35.9 36.7 38.4 40.3 41.2 41.3 43.4 44.2 45.5 51.1
original coc2-coc81.dat (python):
@attribute defects 128 138 173 197 216 230 276 294 309 350 387 390 418 504 517 533 553 650 813 855 918 970 997 1004 1138 1225 1271 1294 1416 1473 1496 1541 1651 1783 1803 1989 2161 2198 2250 2269 2419 2444 2622 2645 2874 2883 3203 3511 3682 3694 4188 4362 4407 5172 5688 11664 13027 25229 30484 30955 32002 34588 41248
@attribute months 5.2 5.3 8.7 9.3 9.4 9.5 9.5 9.6 9.6 10.3 10.6 11.3 11.6 11.6 12.1 12.3 13.1 13.5 14.2 14.3 14.9 15.3 15.6 15.8 16.0 16.2 16.4 17.2 18.2 18.4 18.5 18.9 19.1 19.5 19.6 20.1 20.2 20.3 20.3 20.5 20.9 21.0 21.1 21.7 22.6 23.2 24.0 24.1 24.5 25.1 27.1 27.1 28.0 28.7 32.8 38.4 40.8 45.8 48.6 52.4 53.4 53.4 67.0
@attribute risks 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.3 1.6 1.6 1.6 1.6 2.1 2.1 2.1 2.1 2.1 2.1 2.1 2.1 2.1 2.7 2.7 2.7 2.7 2.7 2.7 2.9 3.8 3.8 3.8 4.8 4.8 5.4 6.4
original coc2-nasa93.dat (python):
@attribute defects 28 69 109 172 188 226 231 240 256 290 302 324 406 420 427 437 456 470 477 566 575 614 626 636 683 704 765 767 808 810 813 887 920 933 986 1058 1191 1219 1253 1276 1553 1555 1594 1619 1763 2004 2007 2077 2077 2102 2227 2327 2404 2409 2468 2658 2685 2743 2832 2950 2984 3340 3343 4210 4256 4342 4511 4815 4840 4868 4907 5092 5434 5848 6129 6136 6266 6293 7553 7867 7998 8477 8518 8543 8547 8848 9308 9820 10313 11761 17597 18447 50961
@attribute months 4.9 6.6 7.8 9.1 9.9 10.1 10.4 11.0 11.2 12.0 12.0 12.4 12.4 12.5 12.8 12.8 13.6 13.6 13.6 13.9 14.4 14.5 14.8 14.8 15.0 15.1 15.2 15.3 15.4 15.5 15.6 16.0 16.2 16.4 17.6 18.6 18.7 18.9 19.2 19.3 20.2 20.8 21.0 21.0 21.3 21.3 21.4 21.5 22.3 23.0 23.2 23.5 24.4 24.9 25.0 25.2 25.4 26.2 26.7 26.9 27.5 28.0 28.8 28.8 29.6 30.1 30.3 30.5 31.5 31.5 32.2 32.4 32.4 32.5 33.6 33.8 34.2 34.5 35.4 35.7 36.2 37.1 37.3 38.1 38.4 41.9 42.8 42.9 43.4 45.9 47.3 53.1 96.4
@attribute risks 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.1 1.1 2.1
Tuesday, December 17, 2013
Monday, December 16, 2013
Tuesday, December 3, 2013
Statistics Report
model, method    ,   evals,   cost, score, comp, idle,  ibd,  ibs
POM3A, Baseline  ,    0.00, 606.77,  0.48, 0.93, 0.16, 1.00, 0.00
POM3A, SPEA2     , 2800.00,  68.55,  0.84, 0.85, 0.00, 0.91, 0.03
POM3A, NSGAII    , 3320.00,  82.93,  0.89, 0.90, 0.00, 0.91, 0.03
POM3A, top1bin   ,   52.35, 197.12,  0.86, 0.99, 0.02, 0.84, 0.03
POM3A, top3bin   ,   63.55, 156.99,  0.80, 0.99, 0.00, 0.84, 0.02
POM3A, top60%bin ,   59.70, 368.05,  0.71, 0.99, 0.01, 0.88, 0.03
POM3A, allbins   ,   54.05, 179.80,  0.85, 0.99, 0.01, 0.85, 0.02
SPEA2 Decision Report
SPEA2 on POM3A recommends the following range for Culture:[0.1~0.178]
SPEA2 on POM3A recommends the following range for Criticality:[0.82~0.858]
SPEA2 on POM3A recommends the following range for Criticality Modifier:[2.0~2.795]
SPEA2 on POM3A recommends the following range for Initial Known:[0.67~0.7]
SPEA2 on POM3A recommends the following range for Inter-Dependency:[50.5~60.4]
SPEA2 on POM3A recommends the following range for Dynamism:[1.0~5.722]
SPEA2 on POM3A recommends the following range for Size:[0.0~0.399]
SPEA2 on POM3A recommends the following range for Plan:[4.5~5.0]
SPEA2 on POM3A recommends the following range for Team Size:[1.0~5.299]
A validation of these ranges was tested, with 20 repeats:
Cost: 80.59
Score: 0.73
Completion: 0.70
Idle: 0.09
NSGA-II Decision Report
NSGAII on POM3A recommends the following range for Culture:[0.1~0.18]
NSGAII on POM3A recommends the following range for Criticality:[0.82~0.858]
NSGAII on POM3A recommends the following range for Criticality Modifier:[2.0~2.777]
NSGAII on POM3A recommends the following range for Initial Known:[0.67~0.7]
NSGAII on POM3A recommends the following range for Inter-Dependency:[50.51~60.402]
NSGAII on POM3A recommends the following range for Dynamism:[1.0~5.862]
NSGAII on POM3A recommends the following range for Size:[0.0~0.395]
NSGAII on POM3A recommends the following range for Plan:[0.0~0.5]
NSGAII on POM3A recommends the following range for Team Size:[1.0~5.293]
A validation of these ranges was tested, with 20 repeats:
Cost: 95.56
Score: 0.75
Completion: 0.62
Idle: 0.08
all bins:
charts - http://i.imgur.com/DLcN70R.png
GALE on POM3A recommends the following range for Culture:[0.165~0.252]
GALE on POM3A recommends the following range for Criticality:[0.864~0.922]
GALE on POM3A recommends the following range for Criticality Modifier:[2.55~3.192]
GALE on POM3A recommends the following range for Initial Known:[0.464~0.522]
GALE on POM3A recommends the following range for Inter-Dependency:[15.01~19.44]
GALE on POM3A recommends the following range for Dynamism:[6.66~8.354]
GALE on POM3A recommends the following range for Size:[0.466~0.645]
GALE on POM3A recommends the following range for Plan:[2.036~2.274]
GALE on POM3A recommends the following range for Team Size:[6.075~9.17]
A validation of these ranges was tested, with 20 repeats:
Cost: 128.47
Score: 0.82
Completion: 0.84
Idle: 0.20
top 1 bin:
charts - http://i.imgur.com/yf8secC.png
GALE on POM3A recommends the following range for Culture:[0.316~0.404]
GALE on POM3A recommends the following range for Criticality:[0.821~0.884]
GALE on POM3A recommends the following range for Criticality Modifier:[2.958~3.327]
GALE on POM3A recommends the following range for Initial Known:[0.434~0.497]
GALE on POM3A recommends the following range for Inter-Dependency:[39.044~46.42]
GALE on POM3A recommends the following range for Dynamism:[8.134~9.335]
GALE on POM3A recommends the following range for Size:[0.033~0.406]
GALE on POM3A recommends the following range for Plan:[0.42~0.824]
GALE on POM3A recommends the following range for Team Size:[7.181~9.132]
A validation of these ranges was tested, with 20 repeats:
Cost: 122.17
Score: 0.74
Completion: 0.89
Idle: 0.13
top 3 bins:
charts - http://i.imgur.com/7SkvOTi.png
GALE on POM3A recommends the following range for Culture:[0.32~0.41]
GALE on POM3A recommends the following range for Criticality:[0.86~0.93]
GALE on POM3A recommends the following range for Criticality Modifier:[2.972~3.333]
GALE on POM3A recommends the following range for Initial Known:[0.445~0.51]
GALE on POM3A recommends the following range for Inter-Dependency:[12.0~17.887]
GALE on POM3A recommends the following range for Dynamism:[7.908~10.196]
GALE on POM3A recommends the following range for Size:[0.236~0.509]
GALE on POM3A recommends the following range for Plan:[0.658~0.907]
GALE on POM3A recommends the following range for Team Size:[6.05~8.548]
A validation of these ranges was tested, with 20 repeats:
Cost: 140.87
Score: 0.72
Completion: 0.84
Idle: 0.19
top 60% bins:
charts - http://i.imgur.com/0X2Op5i.png
GALE on POM3A recommends the following range for Culture:[0.261~0.322]
GALE on POM3A recommends the following range for Criticality:[0.84~0.874]
GALE on POM3A recommends the following range for Criticality Modifier:[8.838~9.53]
GALE on POM3A recommends the following range for Initial Known:[0.56~0.586]
GALE on POM3A recommends the following range for Inter-Dependency:[59.328~68.416]
GALE on POM3A recommends the following range for Dynamism:[3.68~7.19]
GALE on POM3A recommends the following range for Size:[3.442~3.82]
GALE on POM3A recommends the following range for Plan:[0.44~0.848]
GALE on POM3A recommends the following range for Team Size:[29.19~31.784]
A validation of these ranges was tested, with 20 repeats:
Cost: 1242.29
Score: 0.82
Completion: 0.93
Idle: 0.10
Explanation in MOEA = Planning
Here's the problem:
The above shows 93 projects (with 23 decisions and 3 objectives) mapped into a 2-d space.
- The x-axis is the cosine dimension (where the two end points are found via the FastMap heuristic)
- The y-axis is the Pythagoras dimension: sqrt(a^2 - x^2), where a is the row's distance to one of the end points
- The colors denote clusters (so all the red ones at the bottom left are in one cluster).
- Clustering done via 4-way recursive splits on mean x,y points
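A rough sketch of that projection, assuming plain Euclidean distance over numeric rows. The names `dist`, `farthest` and `project` are made up for illustration and are not from the original code.

```python
import math
import random

def dist(r1, r2):
    """Euclidean distance between two numeric rows (an assumption of this sketch)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))

def farthest(row, rows):
    """Return the row in `rows` that is farthest from `row`."""
    return max(rows, key=lambda r: dist(row, r))

def project(rows, seed=1):
    """Map each row to (x, y) using two distant end points (FastMap heuristic)."""
    rng = random.Random(seed)
    anyone = rng.choice(rows)
    east = farthest(anyone, rows)   # first end point
    west = farthest(east, rows)     # second end point
    c = dist(east, west)
    if c == 0:                      # all rows identical
        return [(0.0, 0.0) for _ in rows]
    out = []
    for row in rows:
        a = dist(row, east)
        b = dist(row, west)
        x = (a * a + c * c - b * b) / (2 * c)    # cosine rule
        y = math.sqrt(max(0.0, a * a - x * x))   # Pythagoras; clamp rounding error
        out.append((x, y))
    return out
```

Picking the end points takes two passes over the data, so the whole projection is linear in the number of rows.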
The principle of envy says
- Find your nearest cluster with a better class score
- We use the mean IBEA score of the objectives = sum_i e^(obj[i] - min[i]) where all objectives have been normalized (x-min)/(max - min).
- Find the delta from you to them
- That is your plan on how to make your life better.
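The envy steps can be sketched as below. Assumptions, not confirmed by the notes: lower scores are better (the objectives are defects, effort, months), "nearest" is Euclidean distance between decision centroids, and all function names are hypothetical.

```python
import math

def normalize(objs):
    """Rescale each objective column to 0..1 via (x - min) / (max - min)."""
    cols = list(zip(*objs))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(x - l) / (h - l) if h > l else 0.0
             for x, l, h in zip(row, lo, hi)] for row in objs]

def row_score(norm_row):
    """sum_i e^(obj[i] - min[i]); after min-max scaling every min[i] is 0."""
    return sum(math.exp(v) for v in norm_row)

def centroid(rows):
    return [sum(c) / len(c) for c in zip(*rows)]

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def envy(clusters):
    """clusters: {name: (decision_rows, normalized_objective_rows)}.
    Returns {name: (envied_name, delta)} for clusters with a better neighbor."""
    mids = {k: centroid(d) for k, (d, _) in clusters.items()}
    score = {k: sum(row_score(r) for r in o) / len(o)
             for k, (_, o) in clusters.items()}
    plans = {}
    for me in clusters:
        # Assumption: a lower mean score is a better cluster.
        better = [k for k in clusters if score[k] < score[me]]
        if better:
            them = min(better, key=lambda k: distance(mids[me], mids[k]))
            plans[me] = (them, [t - m for m, t in zip(mids[me], mids[them])])
    return plans
```

The best cluster gets no plan, since nothing nearby scores better than it does.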
And here's the problem:
- how to explain that plan
- when "nearest" is expressed in the above weirdo cosine and Pythagoras dimensions.
Here's one solution:
- Given a pair of centroids (asIs, Envied)
- Sample:
- N times repeat
- Generate an example Eg at random from the examples in those two clusters
- Let value(Eg) be the score of each example (it's expressed as the distance to the Envied cluster)
- So smallest scores are better
- For each range in the example
- Add value(Eg) to value(range)
- Prune:
- Sort ranges by their value
- Let gap = distance(centroid(asIs), centroid(Envied))
- Score ranges by their ability to move rows in asIs towards Envied, as follows.
- For i = 1 to 10 (some magic number)
- Let elite be the best i ranges, e.g. as (colNum, range) pairs:
[(18, 'l'), (15, 'vh'), (19, 'n'), (8, 'xh'), ....]
- For each row in asIs (i.e. the things you want to change)
- Let before = distance(row,centroid(Envied))
- Let mutant = copy(row)
- Inject all the ranges in elite into mutant
- Let after = distance(mutant, centroid(Envied))
- Let movement(elite) = (before - after)/gap
- Policy = the ranges with most movement
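A sketch of the prune step, under loud assumptions: the distance function is a stand-in (numeric columns assumed already scaled 0..1, symbols counted as a mismatch), per-row movement is averaged over the asIs rows, and the names `movement` and `best_policy` are hypothetical.

```python
import math

def distance(r1, r2):
    """Stand-in mixed distance: numbers by difference, symbols by mismatch."""
    total = 0.0
    for a, b in zip(r1, r2):
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            total += (a - b) ** 2
        else:
            total += 0.0 if a == b else 1.0
    return math.sqrt(total)

def movement(elite, as_is_rows, as_is_mid, envied_mid):
    """Mean of (before - after) / gap after injecting `elite` into each row."""
    gap = distance(as_is_mid, envied_mid)
    if gap == 0:
        return 0.0
    moved = []
    for row in as_is_rows:
        before = distance(row, envied_mid)
        mutant = list(row)              # copy(row)
        for col, value in elite:        # inject every (colNum, range) pair
            mutant[col] = value
        after = distance(mutant, envied_mid)
        moved.append((before - after) / gap)
    return sum(moved) / len(moved)

def best_policy(ranked, as_is_rows, as_is_mid, envied_mid, limit=10):
    """Try the best 1..limit ranges; keep the prefix with the most movement."""
    tries = [ranked[:i] for i in range(1, min(limit, len(ranked)) + 1)]
    return max(tries, key=lambda e: movement(e, as_is_rows, as_is_mid, envied_mid))
```

Here `ranked` is the list of ranges already sorted by value, best first, so each candidate policy is just a prefix of that list.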
Details:
- Estimates of defect, effort, months from Vasil's algorithm (interpolation between 2 nearest clusters)
- To generate an example, I used the DE trick of interpolating between values. This has the benefit of preserving known distributions
def smear(rows,cols,cr=0.5,f=0.5):
  i = one(rows)
  j = one(rows)
  k = one(rows)
  must = any(0, len(i))
  new = []
  for n,(a,b,c,col) in enumerate(zip(i,j,k,cols)):
    if a == "?" or b == "?" or c == "?":
      x = "?"
    else:
      x = a
      if n == must or rand() < cr:
        if isa(col,Num):
          x = a + f*(b-c)
        else:
          if rand() < f:
            x = b if rand() < 0.5 else c
    new += [x]
  return new
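For anyone wanting to run `smear` without the rest of the code base, here is a self-contained variant. It is a sketch: the helpers `one`, `any`, `rand` and `isa` are replaced by calls to Python's `random` module, and the `numeric` flag list and the `rng` parameter are stand-ins I introduced, not part of the original.

```python
import random

def smear(rows, numeric, cr=0.5, f=0.5, rng=random):
    """Build one new row by DE-style mixing of three randomly chosen rows.
    numeric[n] is True when column n is numeric (stand-in for isa(col, Num))."""
    i, j, k = (rng.choice(rows) for _ in range(3))
    must = rng.randrange(len(i))           # one column is always a candidate
    new = []
    for n, (a, b, c, is_num) in enumerate(zip(i, j, k, numeric)):
        if "?" in (a, b, c):               # any missing value stays missing
            x = "?"
        else:
            x = a
            if n == must or rng.random() < cr:
                if is_num:
                    x = a + f * (b - c)    # numeric: differential step
                elif rng.random() < f:
                    # symbolic: borrow a value already seen in the data
                    x = b if rng.random() < 0.5 else c
        new.append(x)
    return new
```

Symbolic columns only ever take values that already occur in the sampled rows, which is the sense in which known distributions are preserved.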
Results
When applied in a 5*5 cross-val, the 25,50,75,100th percentiles of defect, effort, months were as follows. Note the large decreases at the 75th percentile:
effort
before: [0.44, 0.98, 4.52, 199.5]
after: [0.39, 0.76, 2.01, 120.2]
defects
before: [0.34, 0.84, 3.57, 124.76]
after: [0.37, 0.72, 2.09, 127.61]
months
before: [0.15, 0.33, 0.58, 3.7]
after: [0.16, 0.35, 0.51, 3.76]
Tuesday, November 19, 2013
Version Tracking Visualization
Results 1/21/14
Results of A/B/C/D prediction: dismal
Results 2:
Back to the CSV: class names are listed
Type A: 4% B: 11% C: 12% D: 71% NoMatch: 0%
Type A: 3% B: 17% C: 8% D: 63% NoMatch: 5%
Type A: 5% B: 5% C: 18% D: 69% NoMatch: 0%
Type A: 17% B: 8% C: 15% D: 58% NoMatch: 0%
['camel-1.0.csv', 'camel-1.2.csv', 'camel-1.4.csv', 'camel-1.6.csv']
Type A: 3% B: 0% C: 22% D: 51% NoMatch: 22%
Type A: 15% B: 18% C: 3% D: 55% NoMatch: 6%
Type A: 9% B: 7% C: 10% D: 71% NoMatch: 1%
['ivy-1.1.csv', 'ivy-1.4.csv', 'ivy-2.0.csv']
Type A: 7% B: 47% C: 2% D: 40% NoMatch: 1%
Type A: 0% B: 0% C: 0% D: 0% NoMatch: 100%
['jedit-3.2.csv', 'jedit-4.0.csv', 'jedit-4.1.csv', 'jedit-4.2.csv', 'jedit-4.3.csv']
Type A: 17% B: 15% C: 5% D: 58% NoMatch: 2%
Type A: 16% B: 7% C: 9% D: 62% NoMatch: 4%
Type A: 9% B: 15% C: 3% D: 64% NoMatch: 6%
Type A: 0% B: 11% C: 0% D: 47% NoMatch: 38%
['log4j-1.0.csv', 'log4j-1.1.csv', 'log4j-1.2.csv']
Type A: 16% B: 6% C: 8% D: 41% NoMatch: 27%
Type A: 30% B: 1% C: 56% D: 5% NoMatch: 5%
['lucene-2.0.csv', 'lucene-2.2.csv', 'lucene-2.4.csv']
Type A: 33% B: 12% C: 24% D: 28% NoMatch: 1%
Type A: 42% B: 15% C: 21% D: 15% NoMatch: 4%
['synapse-1.0.csv', 'synapse-1.1.csv', 'synapse-1.2.csv']
Type A: 5% B: 4% C: 22% D: 63% NoMatch: 3%
Type A: 13% B: 12% C: 19% D: 53% NoMatch: 1%
['velocity-1.4.csv', 'velocity-1.5.csv', 'velocity-1.6.csv']
Type A: 40% B: 34% C: 2% D: 2% NoMatch: 20%
Type A: 26% B: 37% C: 3% D: 29% NoMatch: 2%
['xalan-2.4.csv', 'xalan-2.5.csv', 'xalan-2.6.csv', 'xalan-2.7.csv']
Type A: 9% B: 4% C: 36% D: 44% NoMatch: 4%
Type A: 27% B: 20% C: 15% D: 31% NoMatch: 4%
Type A: 44% B: 0% C: 51% D: 1% NoMatch: 2%
['xerces-1.2.csv', 'xerces-1.3.csv', 'xerces-1.4.csv']
Type A: 3% B: 11% C: 10% D: 72% NoMatch: 1%
Type A: 7% B: 0% C: 38% D: 25% NoMatch: 27%
Idea: New dataset consisting of:
- All attributes of N
- All attributes of N+1
- The delta between N and N+1
- Class of defect change
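One way such a dataset might be assembled is sketched below. Everything named here is an assumption: the defect column is taken to be `bug`, the `n_` / `n1_` / `delta_` prefixes are invented, and the "more / fewer / same" labels are placeholders (these notes do not define the A/B/C/D types).

```python
def pair_versions(old, new, defect_col="bug"):
    """old, new: {class_name: {metric: value}} for versions N and N+1.
    Returns one row per class that appears in both versions."""
    rows = []
    for name in sorted(set(old) & set(new)):
        a, b = old[name], new[name]
        metrics = [m for m in a if m in b and m != defect_col]
        row = {"name": name}
        for m in metrics:
            row["n_" + m] = a[m]              # attribute in version N
            row["n1_" + m] = b[m]             # attribute in version N+1
            row["delta_" + m] = b[m] - a[m]   # the delta between them
        change = b[defect_col] - a[defect_col]
        # Hypothetical class labels, for illustration only.
        row["defect_change"] = ("more" if change > 0
                                else "fewer" if change < 0 else "same")
        rows.append(row)
    return rows
```

Classes that exist in only one of the two versions are dropped here; whether they should instead be counted as NoMatch is left open.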
Result1
- Preliminary feature selection with info gain selecting top 50%
- Normalized and discretized with Fayyad-Irani
- PCA via FastMap
- Grid clustering
- Centroids plotted along with version n+1 nearest neighbor lines. (Not terribly useful)
- Do I smell transforms of best fit around the corner?
Results0
k-means 5 to cluster each data-set within itself
Eigenvalues used to select the features with the most influence
Actual selected columns are plotted, not synthesized dimensions
-- significant correlations could be reported as synonyms
rules for connecting the dots?
Monday, November 11, 2013
Tree query languages for MOEA
Method
- Cluster the data
- Find deltas of interest between the clusters
- Score each cluster
- Let each row have objective scores, normalized 0..1, min..max
- Let the score of a row be the sum of the normalized scores
- Let the score of a cluster be the mean of the score of its rows
- Technically, this is almost the cdom predicate used in IBEA
- For each cluster C1
- Find its nearest neighbor C2 with a better score
- Assert one (leave,goto) tuple for (C1,C2)
- Build and prune a decision tree on the clusters
- Label each instance with the cluster it belongs to
- Build a decision tree on the labelled data set.
- Find the clusters that are only weakly recognized by the decision tree learner
- e.g. use a three-way cross val and prune anything with F < 0.5
- Remove the weakly recognized clusters
- For each (C1,C2) tuple where both are not weakly recognized,
- Query the tree to find the delta
Observation: the trees are so small that this can be done manually.
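The bookkeeping around the tree (scoring, the (leave,goto) tuples, and dropping weakly recognized clusters) can be sketched as follows. Assumptions: lower scores are better, the cluster-to-cluster distance function is supplied by the caller, the F values come from whatever cross-val was run, and all three function names are hypothetical.

```python
def cluster_scores(rows_by_cluster):
    """rows_by_cluster: {cluster: [rows of objectives normalized 0..1]}.
    Row score = sum of normalized objectives; cluster score = mean row score."""
    return {c: sum(sum(r) for r in rows) / len(rows)
            for c, rows in rows_by_cluster.items()}

def leave_goto(scores, dist):
    """For each cluster C1, find its nearest neighbor C2 with a better score.
    dist(c1, c2) returns the distance between two clusters."""
    tuples = []
    for c1 in scores:
        # Assumption: a lower score is a better score.
        better = [c2 for c2 in scores if scores[c2] < scores[c1]]
        if better:
            tuples.append((c1, min(better, key=lambda c2: dist(c1, c2))))
    return tuples

def prune(tuples, f, threshold=0.5):
    """Keep tuples whose two clusters are both recognized with F >= threshold."""
    return [(a, b) for a, b in tuples
            if f[a] >= threshold and f[b] >= threshold]
```

Only the tuples that survive `prune` need to be looked up in the decision tree to read off a delta.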
Example
Nasa93 clustered into 2D (one color per cluster)
Cluster details
- All the following values are normalized 0..1, min..max
- Defects and months are connected, but not always
- Effort is not what separates the projects; it's more about defects and calendar time to develop
- Clearly, cluster 2 is a bad place and 10 and 13 look nicest.
acap = h
| apex = h
| | pmat = h
| | | plex = h: _2 (6.0/1.0)
| | | plex = n: _4 (3.0)
| | pmat = l
| | | cplx = vh: _3 (2.0)
| | | cplx = h
| | | | time = vh: _3 (3.0)
| | | | time = n: _6 (4.0/1.0)
| | | cplx = n: _5 (2.0)
| | pmat = n: _6 (4.0/1.0)
| apex = n
| | data = h: _6 (2.0/1.0)
| | data = n: _4 (3.0/1.0)
| | data = l: _13 (1.0)
| apex = vh
| | pcap = h: _10 (3.0)
| | pcap = vh: _7 (2.0/1.0)
acap = n
| sced = n
| | stor = xh: _7 (1.0)
| | stor = n
| | | cplx = h
| | | | pcap = h: _10 (3.0/1.0)
| | | | pcap = n: _13 (3.0)
| | | cplx = n: _7 (3.0/1.0)
| | stor = vh: _11 (3.0)
| | stor = h: _11 (2.0)
| sced = l
| | $kloc <= 16.3: _9 (5.0)
| | $kloc > 16.3: _8 (6.0)
acap = vh: _12 (7.0/1.0)
A 3-way cross-val yielded the following confusion matrix.
- The diagonal entries (underlined and bold in the original post) are the correctly classified rows.
- The off-diagonal entries (red in the original post) are errors.
- Note the poor performance for recognizing clusters 4,5,6,7,10,13
a b c d e f g h i j k l <-- classified as
5 0 0 0 0 0 0 0 0 0 0 0 | a = _2
0 4 1 1 0 0 0 0 0 0 0 0 | b = _3
1 0 2 0 1 0 0 0 1 0 0 0 | c = _4
0 1 1 0 3 0 0 0 0 0 0 0 | d = _5
0 1 2 3 0 0 0 0 1 0 0 1 | e = _6
0 0 0 0 0 0 0 0 1 2 1 1 | f = _7
0 0 0 0 0 0 6 0 0 0 0 0 | g = _8
0 0 0 0 0 0 1 4 0 0 0 0 | h = _9
0 0 1 0 0 0 0 0 3 0 0 2 | i = _10
0 0 0 0 0 1 0 0 0 4 0 0 | j = _11
0 0 0 0 0 1 0 0 0 0 6 0 | k = _12
0 0 1 0 0 0 0 0 1 1 0 2 | l = _13
The above confusion matrix is mapped into the "f" measures of the following table.
- The "goto" column marks the deltas of interest.
- Low "f" values (below 50%) are marked in gray.
- Any "goto" that comes or goes into gray is marked with gray.
cluster | n | effort | defects | months | f | goto |
2 | 5 | 43% | 25% | 72% | 91% | 3 |
3 | 6 | 5% | 32% | 42% | 67% | 6 |
4 | 5 | 6% | 17% | 37% | 31% | 8 |
5 | 5 | 7% | 29% | 43% | 0% | 6 |
6 | 8 | 6% | 24% | 40% | 0% | 10 |
7 | 5 | 7% | 16% | 36% | 0% | 13 |
8 | 6 | 2% | 6% | 17% | 92% | 9 |
9 | 5 | 0% | 1% | 3% | 89% | |
10 | 6 | 2% | 9% | 22% | 46% | 12 |
11 | 5 | 7% | 18% | 31% | 67% | 13 |
12 | 7 | 2% | 7% | 18% | 86% | |
13 | 5 | 7% | 15% | 26% | 36% | |
total: | 68 |
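The mapping from the confusion matrix to the "f" column can be reproduced with a few lines of Python. This sketch assumes "f" is the usual F1 measure (harmonic mean of precision and recall), which is not stated in the post but does give the same percentages as the table. The matrix is copied from the cross-val output above; the function name is hypothetical.

```python
LABELS = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

# Rows are actual clusters, columns are predicted clusters.
MATRIX = [
    [5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # _2
    [0, 4, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # _3
    [1, 0, 2, 0, 1, 0, 0, 0, 1, 0, 0, 0],  # _4
    [0, 1, 1, 0, 3, 0, 0, 0, 0, 0, 0, 0],  # _5
    [0, 1, 2, 3, 0, 0, 0, 0, 1, 0, 0, 1],  # _6
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 1, 1],  # _7
    [0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0],  # _8
    [0, 0, 0, 0, 0, 0, 1, 4, 0, 0, 0, 0],  # _9
    [0, 0, 1, 0, 0, 0, 0, 0, 3, 0, 0, 2],  # _10
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 4, 0, 0],  # _11
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 6, 0],  # _12
    [0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 2],  # _13
]

def f_measures(matrix, labels):
    """Per-class F1 from a square confusion matrix (rows = actual)."""
    out = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        actual = sum(matrix[i])                   # row total
        predicted = sum(row[i] for row in matrix)  # column total
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        out[label] = 2 * precision * recall / denom if denom else 0.0
    return out

F = f_measures(MATRIX, LABELS)
# Clusters the learner only weakly recognizes (the gray rows).
WEAK = sorted(c for c, f in F.items() if f < 0.5)
```

With the 0.5 threshold from the method, `WEAK` comes out as clusters 4, 5, 6, 7, 10 and 13, the same six flagged under the confusion matrix.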
If we prune the above tree of any branch that leads only to gray classes, we get, as promised above, a very small tree.
acap = h
| apex = h
| | pmat = h
| | | plex = h: _2 (6.0/1.0)
| | pmat = l
| | | cplx = vh: _3 (2.0)
| | | cplx = h
| | | | time = vh: _3 (3.0)
acap = n
| sced = n
| | stor = vh: _11 (3.0)
| | stor = h: _11 (2.0)
| sced = l
| | $kloc <= 16.3: _9 (5.0)
| | $kloc > 16.3: _8 (6.0)
acap = vh: _12 (7.0/1.0)
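The last two steps of the method (keep only tuples where both clusters are well recognized, then query the tree for the delta) can be sketched on this pruned tree. The "f" and "goto" values are copied from the table and the branches from the pruned tree. How a delta is defined is an assumption here: the attribute settings on the target branch that differ from the source branch, taking the smallest such set when a cluster has several branches. All names are hypothetical.

```python
# f (as fractions) and goto targets, copied from the table above.
F = {2: .91, 3: .67, 4: .31, 5: 0, 6: 0, 7: 0,
     8: .92, 9: .89, 10: .46, 11: .67, 12: .86, 13: .36}
GOTO = {2: 3, 3: 6, 4: 8, 5: 6, 6: 10, 7: 13, 8: 9, 10: 12, 11: 13}

# Pruned tree: cluster -> branches, each branch a dict of attribute tests.
PATHS = {
    2: [{"acap": "h", "apex": "h", "pmat": "h", "plex": "h"}],
    3: [{"acap": "h", "apex": "h", "pmat": "l", "cplx": "vh"},
        {"acap": "h", "apex": "h", "pmat": "l", "cplx": "h", "time": "vh"}],
    8: [{"acap": "n", "sced": "l", "$kloc": "> 16.3"}],
    9: [{"acap": "n", "sced": "l", "$kloc": "<= 16.3"}],
    11: [{"acap": "n", "sced": "n", "stor": "vh"},
         {"acap": "n", "sced": "n", "stor": "h"}],
    12: [{"acap": "vh"}],
}

def strong_tuples(f, goto, threshold=0.5):
    """(leave, goto) tuples where neither end is weakly recognized."""
    return [(c1, c2) for c1, c2 in goto.items()
            if f[c1] >= threshold and f[c2] >= threshold]

def delta(path1, path2):
    """Settings on path2 that are absent from, or different on, path1."""
    return {k: v for k, v in path2.items() if path1.get(k) != v}

def smallest_delta(c1, c2, paths):
    """Smallest change that moves some branch of c1 onto some branch of c2."""
    candidates = [delta(p1, p2) for p1 in paths[c1] for p2 in paths[c2]]
    return min(candidates, key=len)

for c1, c2 in strong_tuples(F, GOTO):
    print(c1, "->", c2, smallest_delta(c1, c2, PATHS))
```

Only two tuples survive the filter, (2,3) and (8,9). Under this definition the 8 to 9 delta is just the kloc test, which is the kind of succinct statement the summary below refers to.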
Summary
- The definite statements that clearly make changes in SE data are very succinct.
- But they might not cover everything.
Question: what would you baseline this against? I.e. how would you certify this as a good/crappy idea?
Monday, October 28, 2013
Monday, October 21, 2013
DataSet ID Comparison
All comments based on KNN performance
http://unbox.org/things/var/fayola/docs/newcliff-v6.pdf
fig. 5
Data Set :: ID Comment
labor-neg :: 0.68 Noticeable Downtrend
glass :: 0.75 Downtrend with some improvement at 20%
iris :: 1.11 Flat
hepatitis :: 2.45 Noticeable Downtrend on ACC, Prec; Not Convinced on PD, PF
ecoli :: 3.27 Flat (acts like Software sets for pd, prec, pf)
bcancer :: 0.00 No Results (unsure of source)
heartc :: 0.00 No Results (restricted)
lymph :: 0.00 No Results (restricted)
vote :: 0.00 No Results
Note: I am interested to see what the UCI Data Sets look like with Error Bars to see if they look similar in any way to the Software Sets
http://unbox.org/things/var/brian/2013/projects/data-quality/mucker2/output/10-7-2013%20plots/
----------------
xerces1.4:: 0.01 Flat/Not Convinced
poi-3.0 :: 1.45 Flat/Not Convinced
ivy-1.1 :: 1.74 Flat/Not Convinced
synap1.2:: 1.89 Flat/Not Convinced
xalan2.6 :: 1.90 Flat/Not Convinced
jedit-4 :: 2.03 Flat/Not Convinced
ant-1.7 :: 2.08 Flat/Not Convinced
log4j-1.1:: 2.67 Flat/Not Convinced
lucene2.4:: 2.94 Flat/Not Convinced
veloci1.6:: 3.00 Flat/Not Convinced
TODO:
*Create Random DataSets as per "Reflections on the NASA MDP data sets" - D.Gray
*Get My (Vasil's) rig running with the UCI data sets to see if we get similar results?
*Find sources for [bcancer, lymph, vote]
*Get to the bottom of why the noise is not a problem, if ID does not explain it.
*????
Tuesday, October 8, 2013
UPC Find
Update 5/01/14
Nutritional Facts information: we've got it for about 30% of the items.
Unique UPCs | Total Items
Items matching Walmart.com : 5436 34.3% | 5904 35.0%
Items with Walmart nutrition : 4514 28.5% | 4940 29.0%
'Sodium' : 4540 28.7% | 4972 30.0%
'Total Carbohydrate' : 4540 28.7% | 4973 30.0%
'Protein' : 4514 28.5% | 4940 29.8%
'Total Fat' : 4512 28.5% | 4949 29.9%
'Sugars' : 3924 24.8% | 4251 25.7%
'Saturated Fat' : 3778 23.8% | 4228 25.5%
'Cholesterol' : 3744 23.6% | 4175 25.2%
'Trans Fat' : 3446 21.7% | 3849 23.2%
'Dietary Fiber' : 3296 20.8% | 3503 21.1%
'Potassium' : 1462 9.2% | 1506 9.1%
'Calories' : 614 3.9% | 729 4.4%
'Dietary' : 360 2.3% | 567 3.4%
'Monounsaturated Fat' : 274 1.7% | 333 2.0%
'' : 250 1.6% | 248 1.5%
'Saturated' : 64 0.4% | 69 0.4%
'Trans' : 64 0.4% | 54 0.3%
'Sugars Less Than' : 12 0.1% | 9 0.1%
'Total' : 8 0.1% | 4 0.0%
'Dietary Fiber Less Than' : 6 0.0% | 4 0.0%
'Monounsaturated Fat 1.5' : 2 0.0% | 1 0.0%
'Dietary Fiber 2' : 2 0.0% | 7 0.0%
Items matching upcdatabase.com : 4344 27.4% | 5125 30.0%
Items matching local UPC list : 2594 16.4% | 3046 18.0%
Items matching ONLY Walmart.com : 3352 21.2% | 3606 21.0%
Items matching ONLY upcdatabase : 3352 21.2% | 3606 21.0%
Items matching ONLY local list : 3352 21.2% | 3606 21.0%
Items matching all three sources: 1474 9.3% | 1717 10.0%
Items matching any source : 7886 49.8% | 8891 53.0%
Update 3/27/14
Criteria for match success: must match at least ONE of the following:
-Walmart.com result
-UPC database result + NDB match
-Food database result + NDB match
Graphs (some of these are repeats from lost blog posts):
Tabular Results:
top 1000 items
Update--Web Sources for Barcode Lookup:
Summary:
- Of the 50 Items scanned from my fiance's kitchen, a match was found for 29 items (58%).
- If beverages are removed from the list, the match rate for this set of scans jumps to 28/40 (70%).
- An interesting observation: in this set of scans, ALL store-brand products failed to match.
- Partial matching of UPCs (looking for database items that differ by one or two characters) yielded poor results.
- For this set, 81% of the items which failed to match had a brand code which matched to something in the database. Perhaps brand could be used in some circumstances to make an educated guess about the properties of the item if a match cannot be made.
- positive example: Florida's Natural (low brand variance)
- negative example: Sam's Choice (high brand variance)
UPCA == TMMMMMPPPPPX
where T is type (0 for US UPC)
M is manufacturer code
P is product code
X is check digit
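The layout above translates directly into a small parser. The check-digit rule used here is the standard UPC-A one (three times the sum of the odd-position digits plus the sum of the even-position digits, rounded up to the next multiple of ten); it is not spelled out in this post, and the function names are hypothetical.

```python
def upca_check_digit(first11):
    """Standard UPC-A check digit for the first 11 digits."""
    digits = [int(ch) for ch in first11]
    odd = sum(digits[0::2])   # positions 1, 3, ..., 11
    even = sum(digits[1::2])  # positions 2, 4, ..., 10
    return (10 - (3 * odd + even) % 10) % 10

def parse_upca(code):
    """Split a 12-digit UPC-A string into the T/M/P/X fields."""
    if len(code) != 12 or not code.isdigit():
        raise ValueError("UPC-A must be exactly 12 digits: %r" % code)
    return {
        "type": code[0],
        "manufacturer": code[1:6],
        "product": code[6:11],
        "check": code[11],
        "valid": upca_check_digit(code[:11]) == int(code[11]),
    }

# One of the scanned codes from the list below.
print(parse_upca("078742095233"))
```

Checking the final digit before any lookup is a cheap way to reject mis-scans, so that a "No Match" means the database lacks the item rather than that the scanner misread it.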
1) UPCA => Jif Peanut Butter
2) UPCA => Nutella Hazelnut Spread with Skim Milk & Cocoa
3) UPCA => No Match 078742095233
4) UPCA => No Match 011110833303
5) UPCA => No Match 041498127824
6) UPCA => No Match 044000031138
7) UPCA => Bush's Best Baked Beans
8) UPCA => Kraft Dinners Easy Mac
9) UPCA => No Match 038000844966
10) UPCA => Campbell's Pasta
11) UPCA => Betty Crocker Instant Potatoes
12) UPCA => Campbell's R&W Condensed Soup
13) UPCA => No Match 085000016176
14) UPCA => No Match 085000019894
15) UPCA => No Match 081172780006
16) UPCA => Smart Balance Cooking Spray
17) UPCA => No Match 031200002945
18) UPCA => No Match 078742351896
19) UPCA => Kraft Philadelphia Cream Cheese Spread
20) UPCA => No Match 016300151304
21) UPCA => No Match 070847000037
22) UPCA => Coca-Cola Cola
23) UPCA => No Match 087692591009
24) UPCB => UPCA 8857378 => 088573000078
24) UPCA => No Match 088573000078
25) UPCA => Sweet Baby Ray's Barbecue Sauce
26) UPCB => UPCA 1364008 => 013000006408
26) UPCA => Heinz Ketchup
27) UPCA => French's Classic Yellow Mustard
28) UPCA => Hellmann's Mayonnaise
29) UPCA => DiGiorno Pizza & Breadsticks
30) UPCA => Birds Eye Steamfresh Corn
31) UPCA => No Match 011110673565
32) UPCA => Lance Toastchee
33) UPCA => No Match 078742434377
34) UPCA => McCormick Grill Mates Seasoning
35) UPCA => Sun Chips Flavored Multigrain Snack
36) UPCA => Doritos Tortilla Chips
37) UPCA => No Match 050000497256
38) UPCA => Smucker's Preserves
39) UPCA => No Match 011110786715
40) UPCA => Tostitos Dip
41) UPCA => Prego Italian Sauce
42) UPCA => Quaker Oatmeal Instant Oatmeal
43) UPCA => Swiss Miss Hot Cocoa Mix
44) UPCA => Swanson RTS Broth
45) UPCA => No Match 078742030104
46) UPCA => No Match 011110492630
47) UPCA => Betty Crocker Loaded Mashed
48) UPCA => No Match 072736014880
49) UPCA => Duncan Hines Cake Mix
50) UPCA => Knorr Side Dishes Fiesta Sides
match: 29/50 58.0%
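Items 24 and 26 were scanned as short codes (marked UPCB above) and expanded to UPCA before lookup. Both expansions are consistent with the standard UPC-E zero-suppression rules, so the sketch below assumes that is what happened; the post itself does not name the rule. It also assumes the 7-digit input is six data digits followed by the check digit, with number system 0 implied. The function name is hypothetical.

```python
def expand_short_upc(code7):
    """Expand a 7-digit short code (6 data digits + check digit) to UPC-A.

    Assumes UPC-E zero-suppression with number system 0.
    """
    if len(code7) != 7 or not code7.isdigit():
        raise ValueError("expected 7 digits: %r" % code7)
    d, check = code7[:6], code7[6]
    last = d[5]
    if last in "012":
        # manufacturer = d1 d2 last 0 0, product = 0 0 d3 d4 d5
        body = d[0:2] + last + "0000" + d[2:5]
    elif last == "3":
        # manufacturer = d1 d2 d3 0 0, product = 0 0 0 d4 d5
        body = d[0:3] + "00000" + d[3:5]
    elif last == "4":
        # manufacturer = d1 d2 d3 d4 0, product = 0 0 0 0 d5
        body = d[0:4] + "00000" + d[4]
    else:
        # last in 5..9: manufacturer = d1..d5, product = 0 0 0 0 last
        body = d[0:5] + "0000" + last
    return "0" + body + check

print(expand_short_upc("8857378"))  # item 24
print(expand_short_upc("1364008"))  # item 26
```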
Items not found:
Note: one-off, two-off are the number of database items which differ by one or two characters
UPCA | FoundBrand | one-off | two-off | type (* indicates store-brand)
078742095233 | T | 0 | 2 | trail mix*
011110833303 | T | 0 | 0 | trail mix*
041498127824 | T | 0 | 0 | baking soda*
044000031138 | T | 0 | 9 | ritz snack packs
038000844966 | T | 0 | 0 | pringles
085000016176 | F | 0 | 2 | gin
085000019894 | F | 0 | 0 | wine
081172780006 | T | 0 | 0 | gummy candy
031200002945 | T | 0 | 18 | craisins
078742351896 | T | 0 | 2 | skim milk*
016300151304 | T | 0 | 4 | orange juice
070847000037 | T | 0 | 2 | energy drink
087692591009 | F | 0 | 0 | beer
088573000078 | F | 0 | 0 | beer
011110673565 | T | 0 | 0 | jello*
078742434377 | T | 0 | 1 | paprika*
050000497256 | T | 0 | 0 | coffee mate
011110786715 | T | 0 | 0 | applesauce*
078742030104 | T | 0 | 1 | chicken breast*
011110492630 | T | 0 | 2 | cola*
072736014880 | T | 0 | 0 | vinaigrette
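The one-off, two-off and FoundBrand columns can be computed along these lines. This is a sketch under assumptions: "differ by one or two characters" is read as Hamming distance between equal-length codes, and the brand code is taken to be the first six digits (type plus manufacturer). The database entries in the example are made up for illustration and do not come from the real lookup sources.

```python
def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))

def near_miss_counts(code, database):
    """(one-off, two-off): database items at Hamming distance 1 and 2."""
    one = sum(1 for item in database if hamming(code, item) == 1)
    two = sum(1 for item in database if hamming(code, item) == 2)
    return one, two

def found_brand(code, database):
    """True if any database item shares the first six digits (assumed brand code)."""
    prefix = code[:6]
    return any(item.startswith(prefix) for item in database)

# Made-up database entries, for illustration only.
EXAMPLE_DB = ["078742095234", "078742095244", "011110833303"]

print(near_miss_counts("078742095233", EXAMPLE_DB))
print(found_brand("078742095233", EXAMPLE_DB))
print(found_brand("085000016176", EXAMPLE_DB))
```

Since the last digit is a check digit, a valid code never differs from another valid code in that position alone, which may be part of why the one-off column is zero throughout.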
List of Items Scanned:
1) Jif reduced fat PB
2) Nutella
3) tropical Trail Mix
4) traditional trail mix
5) baker's corner baking soda
6) ritz fresh snacks
7) bush's baked beans
8) kraft Easy mac
9) pringles original
10) SpaghettiOs meatballs
11) betty crocker sour cream and chives
12) campbell's chicken noodle
13) New Amsterdam Gin
14) Barefoot Red moscato
15) PVZ gummies
16) smart balance cooking spray
17) craisins
18) great value skim milk
19) philadelphia cream cheese
20) florida's natural
21) monster absolutely zero
22) coke 2 litre
23) sam adams latitude 48
24) shiner bock
25) sweet baby ray's
26) heinz
27) french's
28) hellmann's
29) pizza and breadsticks
30) bird's eye corn
31) kroger strawberry jello
32) lance toast chee
33) great value paprika
34) grill mates mesquite spice
35) sun chips
36) doritos
37) coffee mate french vanilla
38) smuckers strawberry preserves
39) kroger applesauce
40) tostitos creamy spanish
41) prego meat
42) quaker fruit n cream
43) swiss miss
44) swanson chicken broth
45) great value chicken breast
46) big k vanilla cola
47) betty crocker loaded mash
48) vinaigrette
49) duncan hines spice cake
50) knorr taco rice