It would be nice to know how much (if at all) we can trust that prediction. To this end, Ekrem Kocagüneli at Softlab (in Turkey) applied Greedy Agglomerative Clustering to generate a binary tree whose leaves are real training instances and whose internal nodes are a mid-way point between their two children. He found a relationship between prediction error and variance of the performance statistic (*) in the sub-tree. That is, we can return not just the estimate, but an trust measure of how much we should believe that estimate.
(* Ekrem build predictions via the median of the estimate in the leaves of the subtree).
In the following experiments, 20 times, we took out one training instance, GAC the remaining, then estimated the set-aside. Note that after some critical value of variance (on the x-axis), the error spikes (on the y-axis). So, if your test instance falls into sub-trees with that variance, do not trust the estimate
In the following diagram:
- x-axis: weighted variance of sub-tree
- y-axis: log of error = magnitude of relative error = abs(actual - predicted)/ actual
For more details, see here.