Labeling every data point is time consuming and unlabeled data is abundant. This motivates the field of active learning, in which learner is able to ask for labels of specific points, but each question is charged.

Another ideas: Use two learners and ask points where they disagree, use SVM and only ask point closest to hyperplane at each round. The question to me is how I can adapt it to effort estimation (a regression problem). We formed a reading group for this problem, the bib file etc. are here: http://unbox.org/wisp/var/ekrem/activeLearning/Literature/The points to be queried are usually chosen from a pool of unlabeled data points. What we need is to ask as few as possible queries and pick up points that would help the learner the most (highest information content).

Possible ideas to find the points to be queried:

1) Build a Voronoi structure and ask the points which are a) center of largest circumcirle or b) subset of Voronoi vertices whose nearest neighbors belong to different classes. It is difficult for high dimensions.