Monday, June 24, 2013

Brian: what I do

MSCS Computer Science - June 2013 -> ???

Research: Data Reduction with Vasil on Public Health Data
          Data-Quality

Projects: I am currently working on a project to test some of the current 
"Data Cleaning Techniques" outlined below. The first step is to see how many of 
our current datasets contain such "problem data"; the second is to see whether 
injecting "problem data" into existing datasets has any meaningful effect.

Some examples of "problem data", as defined here:
Features 
1) Identical Features
2) Constant Features
3) Features with Missing Values
4) Features with Implausible Values

Instances
1) Identical Cases
2) Inconsistent Cases
3) Cases with Missing Values
4) Cases with Conflicting Values
5) Cases with Implausible Values
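
To make a few of these categories concrete, here is a minimal sketch of how some of them could be detected. It assumes a dataset stored as a list of rows (cases), each row a list of feature values, with missing values encoded as None. The function names and the toy data are hypothetical, not taken from the paper or from our datasets. Implausible, inconsistent, and conflicting values are left out because they need domain rules that are not stated here.

```python
MISSING = None  # assumption: missing values are encoded as None


def column(rows, j):
    """Return feature j as a list, one value per case."""
    return [row[j] for row in rows]


def constant_features(rows):
    """Indices of features that take a single value across all cases."""
    n_cols = len(rows[0])
    return [j for j in range(n_cols) if len(set(column(rows, j))) == 1]


def identical_features(rows):
    """Pairs (i, j) with i < j whose features match value for value."""
    n_cols = len(rows[0])
    cols = [tuple(column(rows, j)) for j in range(n_cols)]
    return [(i, j)
            for i in range(n_cols)
            for j in range(i + 1, n_cols)
            if cols[i] == cols[j]]


def features_with_missing(rows):
    """Indices of features with at least one missing value."""
    n_cols = len(rows[0])
    return [j for j in range(n_cols)
            if any(v is MISSING for v in column(rows, j))]


def identical_cases(rows):
    """Indices of cases that repeat an earlier case exactly."""
    seen, dupes = set(), []
    for i, row in enumerate(rows):
        key = tuple(row)
        if key in seen:
            dupes.append(i)
        else:
            seen.add(key)
    return dupes


def cases_with_missing(rows):
    """Indices of cases with at least one missing value."""
    return [i for i, row in enumerate(rows)
            if any(v is MISSING for v in row)]


# Toy data, made up for illustration only.
rows = [
    [1, 1, 5, 0.2],
    [2, 2, 5, None],
    [1, 1, 5, 0.2],
    [3, 3, 5, 0.9],
]
print(constant_features(rows))      # [2]
print(identical_features(rows))     # [(0, 1)]
print(features_with_missing(rows))  # [3]
print(identical_cases(rows))        # [2]
print(cases_with_missing(rows))     # [1]
```

Counting how many datasets trigger each check would then be a loop over the datasets, calling these functions on each one.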

The current approach outlined in the paper above is to lump together all of 
the data falling into these categories and then throw it out, 
thus cleaning the data. 
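
The "flag everything, then throw it out" idea could be sketched roughly as follows. This is only an illustration under my own assumptions: rows are lists of feature values, missing values are None, only the identical, constant, and missing-value checks are applied, and cases are dropped before features. The function name clean and the ordering are hypothetical choices, not the paper's procedure.

```python
def clean(rows, missing=None):
    """Drop flagged cases, then flagged features.

    Returns (cleaned_rows, dropped_row_indices, dropped_column_indices).
    """
    # Cases: flag exact repeats of an earlier row and rows with a missing value.
    seen_rows, bad_rows = set(), set()
    for i, row in enumerate(rows):
        key = tuple(row)
        if key in seen_rows or any(v is missing for v in row):
            bad_rows.add(i)
        seen_rows.add(key)
    kept = [row for i, row in enumerate(rows) if i not in bad_rows]

    # Features: flag constant columns and repeats of an earlier column,
    # judged on the rows that survived the first pass (an assumption).
    seen_cols, bad_cols = set(), set()
    n_cols = len(rows[0]) if rows else 0
    for j in range(n_cols):
        col = tuple(row[j] for row in kept)
        if len(set(col)) <= 1 or col in seen_cols:
            bad_cols.add(j)
        seen_cols.add(col)

    cleaned = [[v for j, v in enumerate(row) if j not in bad_cols]
               for row in kept]
    return cleaned, sorted(bad_rows), sorted(bad_cols)


# Toy data, made up for illustration only.
rows = [
    [1, 1, 5, 0.2],
    [2, 2, 5, None],
    [1, 1, 5, 0.2],
    [3, 3, 5, 0.9],
]
cleaned, dropped_rows, dropped_cols = clean(rows)
print(cleaned)       # [[1, 0.2], [3, 0.9]]
print(dropped_rows)  # [1, 2]
print(dropped_cols)  # [1, 2]
```

Returning the dropped indices alongside the cleaned rows makes it easy to report how much of a dataset was discarded, which is the number the project is after.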
 
