Tuesday, November 6, 2012

Intrinsic dimensionality

Here's an article playing my song:

http://www.stat.berkeley.edu/~bickel/mldim.pdf (cited by 300+ people)

Maximum Likelihood Estimation of Intrinsic Dimension
Elizaveta Levina
Peter J. Bickel

In Advances in NIPS, volume 17. MIT Press, 2005
There is a consensus in the high-dimensional data analysis community that the only
reason any methods work in very high dimensions is that, in fact, the data are not
truly high-dimensional. Rather, they are embedded in a high-dimensional space,
but can be efficiently summarized in a space of a much lower dimension, such as a
nonlinear manifold. Then one can reduce dimension without losing much information
for many types of real-life high-dimensional data, such as images, and avoid
many of the "curses of dimensionality".

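The most standard method is the correlation dimension: plot log(C(r)) vs log(r), where C(r) is the fraction of point pairs closer than r, and read the dimension off the slope.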
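To make the log(C(r)) vs log(r) idea concrete, here is a minimal sketch in Python with NumPy. It is my own illustration, not code from the paper: the function names, the test data (a circle sitting in 3-D) and the range of radii are all assumptions, and the result depends a lot on which radii you fit over.

```python
import numpy as np

def correlation_sum(points, radii):
    """C(r): fraction of distinct point pairs whose distance is below r."""
    n = len(points)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep each unordered pair once (upper triangle, no diagonal).
    pair_dists = dists[np.triu_indices(n, k=1)]
    return np.array([(pair_dists < r).mean() for r in radii])

def correlation_dimension(points, radii):
    """Slope of a straight-line fit to log C(r) against log r."""
    c = correlation_sum(points, radii)
    keep = c > 0  # log(0) is undefined, so drop radii with no pairs
    slope, _intercept = np.polyfit(np.log(radii[keep]), np.log(c[keep]), 1)
    return slope

# Hypothetical test data: a 1-D curve (circle) embedded in 3-D space.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 1000)
circle = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])

# Assumed fitting range; small enough that the circle looks locally straight.
radii = np.logspace(-1.5, -0.5, 10)
dim_circle = correlation_dimension(circle, radii)
print(dim_circle)
```

The ambient space is 3-D, but the fitted slope should land near 1, which is the point of the whole exercise. The all-pairs distance matrix is fine for a thousand points and would need something smarter for large data sets.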


They go on to offer another method, a maximum likelihood estimator, which I don't understand, but the resulting numbers are very close to those from the correlation dimension (1).
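As far as I can tell, the estimator in the paper works from nearest-neighbor distances: for each point x, with T_j(x) the distance to its j-th nearest neighbor, the local estimate is the inverse of the average of log(T_k(x)/T_j(x)) over j = 1..k-1, and the local estimates are then averaged over all points. The sketch below is my reading of that formula, not the authors' code. The function name, the choice k = 20 and the circle test data are assumptions, and I skip the averaging over a range of k that the paper also describes.

```python
import numpy as np

def mle_intrinsic_dimension(points, k=20):
    """Average of per-point inverse mean log-ratios of neighbor distances."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Sort each row; column 0 is the zero distance from a point to itself.
    # Assumes no duplicate points, otherwise T_j can be 0.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]        # T_1 .. T_k per point
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])  # log(T_k / T_j), j < k
    per_point = 1.0 / log_ratios.mean(axis=1)       # local estimates
    return per_point.mean()

# Hypothetical test data: a 1-D curve (circle) embedded in 3-D space.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 1000)
circle = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])

mle_dim = mle_intrinsic_dimension(circle, k=20)
print(mle_dim)
```

On the circle this should come out close to 1, in line with the correlation dimension. The main knob is k: too small and the estimate is noisy, too large and the neighborhoods stop looking flat.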

For a sample of these log(C(r)) vs log(r) graphs, see Fig. 1 of http://oldweb.ct.infn.it/~rapis/corso-fsc2/grassberger-procaccia-prl1983.pdf




