MSCS Computer Science - June 2013 -> ???

Research: Data Reduction with Vasil on Public Health Data

Data-Quality Projects:

I am currently working on a project to test some of the current
"Data Cleaning Techniques" outlined below. The first goal is to see how many of
our current datasets contain such "problem data"; the second is to see whether injecting
"problem data" into existing datasets actually has any meaningful effect.

Some examples of "problem data" as defined here:
Features
1) Identical Features
2) Constant Features
3) Features with Missing Values
4) Features with Implausible Values

Instances
1) Identical Cases
2) Inconsistent Cases
3) Cases with Missing Values
4) Cases with Conflicting Values
5) Cases with Implausible Values

The current approach outlined in the paper above is to lump together all of
the data falling into these categories and then throw it out,
thus cleaning the data.
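
As a rough illustration of the "flag everything, then throw it out" approach, here is a minimal sketch in plain Python. It is not the code from the paper. Several things in it are my own assumptions: a dataset is a list of dicts, missing values are encoded as None, "inconsistent" means two cases with the same feature values but different labels, and "implausible" means falling outside a range supplied by hand (the `plausible` argument is hypothetical). Cases with conflicting values are left out because the check depends on domain rules.

```python
from collections import defaultdict

MISSING = None  # assumption: missing values are encoded as None


def flag_features(rows, features):
    """Return {feature: set of problem tags} for flagged features only."""
    flags = defaultdict(set)
    seen_columns = {}  # column contents -> first feature with that content
    for f in features:
        col = tuple(r[f] for r in rows)
        if col in seen_columns:
            flags[f].add("identical")  # duplicates an earlier feature
        else:
            seen_columns[col] = f
        if len(set(col)) <= 1:
            flags[f].add("constant")
        if any(v is MISSING for v in col):
            flags[f].add("missing")
    return dict(flags)


def flag_cases(rows, features, label, plausible=None):
    """Return {row index: set of problem tags} for flagged cases only."""
    plausible = plausible or {}  # hypothetical: {feature: (low, high)}
    labels_by_key = defaultdict(set)
    for r in rows:
        labels_by_key[tuple(r[f] for f in features)].add(r[label])

    flags = defaultdict(set)
    seen_cases = set()
    for i, r in enumerate(rows):
        key = tuple(r[f] for f in features)
        full = key + (r[label],)
        if full in seen_cases:
            flags[i].add("identical")  # only later copies are flagged
        seen_cases.add(full)
        if len(labels_by_key[key]) > 1:
            flags[i].add("inconsistent")  # same features, different labels
        if any(v is MISSING for v in key):
            flags[i].add("missing")
        for f, (low, high) in plausible.items():
            v = r[f]
            if v is not MISSING and not low <= v <= high:
                flags[i].add("implausible")
    return dict(flags)


def clean(rows, features, label, plausible=None):
    """Drop every flagged feature and every flagged case in one pass."""
    bad_features = set(flag_features(rows, features))
    bad_cases = set(flag_cases(rows, features, label, plausible))
    kept = [f for f in features if f not in bad_features]
    return [
        {k: r[k] for k in kept + [label]}
        for i, r in enumerate(rows)
        if i not in bad_cases
    ]


if __name__ == "__main__":
    # Made-up toy data, only to exercise the functions.
    rows = [
        {"age": 34, "age_copy": 34, "site": "A", "bmi": 22.5, "label": 0},
        {"age": 51, "age_copy": 51, "site": "A", "bmi": None, "label": 1},
        {"age": 34, "age_copy": 34, "site": "A", "bmi": 22.5, "label": 0},
        {"age": 430, "age_copy": 430, "site": "A", "bmi": 27.1, "label": 1},
    ]
    features = ["age", "age_copy", "site", "bmi"]
    print(flag_features(rows, features))
    print(flag_cases(rows, features, "label", plausible={"age": (0, 120)}))
    print(clean(rows, features, "label", plausible={"age": (0, 120)}))
```

Keeping the flagging separate from the dropping is deliberate: counting the flags answers the first question (how many of our datasets contain "problem data"), and the same flags can be checked after injecting problem data into a clean dataset.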