Monday, October 21, 2013

DataSet ID Comparison

All comments based on KNN performance

http://unbox.org/things/var/fayola/docs/newcliff-v6.pdf
fig. 5

Data Set  :: ID   :: Comment
labor-neg :: 0.68 :: Noticeable downtrend
glass     :: 0.75 :: Downtrend, with some improvement at 20%
iris      :: 1.11 :: Flat
hepatitis :: 2.45 :: Noticeable downtrend on ACC, Prec; not convinced on PD, PF
ecoli     :: 3.27 :: Flat (acts like the software sets for pd, prec, pf)
bcancer   :: 0.00 :: No results (unsure of source)
heartc    :: 0.00 :: No results (restricted)
lymph     :: 0.00 :: No results (restricted)
vote      :: 0.00 :: No results

Note: I would like to see the UCI data sets plotted with error bars, to check whether they resemble the software sets in any way.
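One possible way to get error bars for that comparison is to repeat the KNN run over several random splits and report the mean and standard deviation of each measure. The sketch below is only an illustration under assumptions: the toy two-class data, the function names, the 70/30 split, and k=3 are all hypothetical and are not taken from the rigs or plots linked in this post.

```python
import random
import statistics
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training rows nearest to query."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test rows whose label KNN predicts correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def repeated_accuracy(rows, repeats=10, test_fraction=0.3, k=3, seed=1):
    """Accuracy over several random train/test splits (assumed protocol)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * test_fraction)
        scores.append(accuracy(shuffled[cut:], shuffled[:cut], k))
    return scores


def error_bar(scores):
    """Return (mean, sample standard deviation) of repeated scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def toy_rows(n=60, seed=0):
    """Hypothetical stand-in data: two Gaussian blobs, one per class."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        point = (rng.gauss(label * 3, 1), rng.gauss(label * 3, 1))
        rows.append((point, label))
    return rows


if __name__ == "__main__":
    mean, sd = error_bar(repeated_accuracy(toy_rows()))
    print(f"accuracy: {mean:.2f} +/- {sd:.2f}")
```

Running the same routine at each noise level and plotting mean plus or minus one standard deviation would show whether a "Flat" or "Downtrend" label survives the run-to-run spread.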

http://unbox.org/things/var/brian/2013/projects/data-quality/mucker2/output/10-7-2013%20plots/
----------------
xerces1.4 :: 0.01 :: Flat/Not Convinced
poi-3.0   :: 1.45 :: Flat/Not Convinced
ivy-1.1   :: 1.74 :: Flat/Not Convinced
synap1.2  :: 1.89 :: Flat/Not Convinced
xalan2.6  :: 1.90 :: Flat/Not Convinced
jedit-4   :: 2.03 :: Flat/Not Convinced
ant-1.7   :: 2.08 :: Flat/Not Convinced
log4j-1.1 :: 2.67 :: Flat/Not Convinced
lucene2.4 :: 2.94 :: Flat/Not Convinced
veloci1.6 :: 3.00 :: Flat/Not Convinced


TODO:
*Create random data sets as per "Reflections on the NASA MDP data sets" (D. Gray)
*Get my (Vasil's) rig running with the UCI data sets to see whether we get similar results
*Find sources for [bcancer, lymph, vote]
*Get to the bottom of why the noise is not a problem, if ID does not explain it
*????
