ai @ wvu: DataSet ID Comparison

Monday, October 21, 2013

DataSet ID Comparison

All comments are based on KNN performance.

http://unbox.org/things/var/fayola/docs/newcliff-v6.pdf
fig. 5

Data Set  :: ID   Comment
labor-neg :: 0.68 Noticeable downtrend
glass     :: 0.75 Downtrend, with some improvement at 20%
iris      :: 1.11 Flat
hepatitis :: 2.45 Noticeable downtrend on ACC and Prec; not convinced on PD and PF
ecoli     :: 3.27 Flat (acts like the software sets for PD, Prec, PF)
bcancer   :: 0.00 No results (unsure of source)
heartc    :: 0.00 No results (restricted)
lymph     :: 0.00 No results (restricted)
vote      :: 0.00 No results
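
The comments above refer to ACC, Prec, PD and PF. A minimal sketch of how these are conventionally computed from a binary confusion matrix, assuming PD means probability of detection (recall) and PF means probability of false alarm. The function name `knn_metrics` is hypothetical and is not taken from the rig that produced these results.

```python
def knn_metrics(tp, fp, tn, fn):
    """Return ACC, Prec, PD and PF from binary confusion-matrix counts.

    Assumed definitions (not stated in the post):
      PD = tp / (tp + fn)   probability of detection (recall)
      PF = fp / (fp + tn)   probability of false alarm
    """
    def ratio(num, den):
        # Guard against empty classes instead of dividing by zero.
        return num / den if den else 0.0

    return {
        "acc": ratio(tp + tn, tp + fp + tn + fn),
        "prec": ratio(tp, tp + fp),
        "pd": ratio(tp, tp + fn),
        "pf": ratio(fp, fp + tn),
    }
```

A "downtrend" in the comments would then be a drop in ACC, Prec or PD (or a rise in PF) as more noise is injected.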

Note: I am interested to see what the UCI data sets look like with error bars, and whether they resemble the software sets in any way.
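
One simple way to get such error bars, sketched under two assumptions that the notes do not state: each metric is recorded over several repeated runs at each noise level, and the bar wanted is the mean plus or minus one sample standard deviation. The helper name `error_bar` is hypothetical.

```python
from statistics import mean, stdev


def error_bar(scores):
    """Return (mean, sample standard deviation) of one metric over repeated runs."""
    if len(scores) < 2:
        # A single run has no spread to report.
        return (mean(scores), 0.0)
    return (mean(scores), stdev(scores))
```

Calling it once per noise level for each data set gives the centre and half-height of each bar, so the UCI and software plots can be drawn on the same footing.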

http://unbox.org/things/var/brian/2013/projects/data-quality/mucker2/output/10-7-2013%20plots/
----------------
xerces1.4 :: 0.01 Flat/Not Convinced
poi-3.0   :: 1.45 Flat/Not Convinced
ivy-1.1   :: 1.74 Flat/Not Convinced
synap1.2  :: 1.89 Flat/Not Convinced
xalan2.6  :: 1.90 Flat/Not Convinced
jedit-4   :: 2.03 Flat/Not Convinced
ant-1.7   :: 2.08 Flat/Not Convinced
log4j-1.1 :: 2.67 Flat/Not Convinced
lucene2.4 :: 2.94 Flat/Not Convinced
veloci1.6 :: 3.00 Flat/Not Convinced

TODO:
* Create random data sets as per "Reflections on the NASA MDP data sets" (D. Gray)
* Get my (Vasil's) rig running with the UCI data sets to see whether we get similar results
* Find sources for [bcancer, lymph, vote]
* Get to the bottom of why the noise is not a problem, if ID does not explain it
* ????
