Master's student in Computer Science since 2011, expected to graduate in August 2013.
Research scope: Data Reduction. Condense large training data into succinct summaries, enabling users to review and analyze the raw data.
Algorithm: PEEKING2 - A tool for data carving.
Implementation:
- Feature Selection via Information Gain: prune irrelevant features, keeping the 25% of features with the highest Info Gain.
- FASTMAP: project the data onto the direction of greatest variability, applying a PCA-like linear-time projection.
- Grid clustering: recursively split clusters at the median of each projected dimension.
- Centroid estimation: replace each data cluster with its centroid.
(A sketch of these steps follows below.)
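For illustration, here is a minimal Python sketch of the four reduction steps above, assuming a numeric feature matrix X and a discrete class vector y (a continuous effort target would first need discretizing for the Info Gain step). The function names, the bin count, and the minimum cluster size are assumptions made for this sketch, not details of the actual PEEKING2 implementation.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a discrete label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(feature, labels):
    """Info gain of one numeric feature after equal-frequency binning (4 bins assumed)."""
    edges = np.quantile(feature, [0.25, 0.5, 0.75])
    binned = np.searchsorted(edges, feature)
    gain = entropy(labels)
    for b in np.unique(binned):
        mask = binned == b
        gain -= mask.mean() * entropy(labels[mask])
    return gain

def select_top_features(X, y, keep=0.25):
    """Prune irrelevant features: keep the 25% with the highest information gain."""
    gains = np.array([info_gain(X[:, j], y) for j in range(X.shape[1])])
    k = max(1, int(np.ceil(keep * X.shape[1])))
    return np.argsort(gains)[::-1][:k]

def fastmap_axis(X):
    """One FASTMAP dimension: project rows onto the line through two distant pivot rows."""
    d = lambda a, b: np.linalg.norm(a - b)
    a = X[np.argmax([d(X[0], x) for x in X])]   # far from an arbitrary row
    b = X[np.argmax([d(a, x) for x in X])]      # far from that pivot
    dab = d(a, b) or 1.0
    # cosine-rule projection onto the a-b axis; linear in the number of rows
    return np.array([(d(a, x) ** 2 + dab ** 2 - d(b, x) ** 2) / (2 * dab) for x in X])

def grid_cluster(idx, Xs, min_size=8):
    """Recursively split a cluster (row indices) at the median of its projected dimension."""
    if len(idx) <= min_size:
        return [idx]
    proj = fastmap_axis(Xs[idx])
    med = np.median(proj)
    left, right = idx[proj <= med], idx[proj > med]
    if len(left) == 0 or len(right) == 0:
        return [idx]
    return grid_cluster(left, Xs, min_size) + grid_cluster(right, Xs, min_size)

def condense(X, y):
    """Feature selection, projection, clustering, then centroid replacement."""
    cols = select_top_features(X, y)
    Xs = X[:, cols]
    clusters = grid_cluster(np.arange(len(Xs)), Xs)
    cX = np.array([Xs[c].mean(axis=0) for c in clusters])   # centroid features
    cy = np.array([np.median(y[c]) for c in clusters])      # representative target per cluster
    return cols, cX, cy
```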
Inference:
- Instance-based learning (k=2 Nearest Neighbor): extrapolate between centroids to make predictions.
- Contrast set rule learning: generate rules estimating the deltas between centroids.
(A sketch of the nearest-centroid step follows below.)
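A corresponding sketch of the k=2 nearest-centroid prediction, reusing the centroids produced by the reduction step above; the inverse-distance weighting is an assumption for illustration, not necessarily PEEKING2's exact rule.

```python
import numpy as np

def predict(row, cX, cy):
    """Interpolate between the two nearest centroids (k=2),
    weighting each by the inverse of its distance to the query row."""
    dists = np.linalg.norm(cX - row, axis=1)
    nearest = np.argsort(dists)[:2]
    w = 1.0 / (dists[nearest] + 1e-9)        # avoid division by zero on exact hits
    return float(np.dot(w, cy[nearest]) / w.sum())
```

For the defect (classification) data sets the continuous estimate can simply be rounded to the nearest class label; for effort data it is used as-is.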
Experiments:
- PEEKING2 was applied to 10 defect data sets and 10 effort data sets from PROMISE.
- A large data reduction was observed: 93% of the original data was pruned.
- Little information was lost: in most cases, k=2 NN applied to the condensed data performed as well as or better than state-of-the-art algorithms applied to the full data.
Applied data mining techniques to reduce the cost of data collection for a public health study conducted by WVU.
- Applying correlation-based feature selection, we drastically reduced the number of features without significantly impacting the performance of Linear Regression (a simplified sketch follows after this list).
- We also observed some degree of stability across different geographical regions.
- Small samples of stores (20%) can be used to make predictions for the rest.
- Samples are selected to minimize the travel distance between stores.
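Below is a simplified sketch of correlation-based feature selection (a greedy forward search over the standard CFS merit score) followed by an ordinary least-squares fit. The greedy search, the max_features cap, and the function names are assumptions for illustration, not the exact setup used in the study.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS-style merit: reward correlation with the target, penalize feature redundancy.
    Assumes non-constant columns (corrcoef is undefined for constant features)."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                    for i in subset for j in subset if i < j])
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(X, y, max_features=10):
    """Greedy forward search: repeatedly add the feature that most improves the merit."""
    selected, best = [], -np.inf
    while len(selected) < min(max_features, X.shape[1]):
        score, j = max((cfs_merit(X, y, selected + [j]), j)
                       for j in range(X.shape[1]) if j not in selected)
        if score <= best:
            break
        selected.append(j)
        best = score
    return selected

def fit_linear_regression(X, y, cols):
    """Ordinary least squares on the selected columns (intercept included)."""
    A = np.column_stack([np.ones(len(X)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef
```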