Tuesday, April 5, 2011

Histograms and EMD

Taken from Wikipedia, the source of all knowledge and truth:

"The earth mover's distance (EMD) is a measure of the distance between two probability distributions over a region D. Informally, if the distributions are interpreted as two different ways of piling up a certain amount of dirt over the region D, the EMD is the minimum cost of turning one pile into the other; where the cost is assumed to be amount of dirt moved times the distance by which it is moved."

Largest earth mover in the world -- German-made. 311 feet tall, 705 feet long, 45,000 tons, can move 76,455 cubic meters each day.

Weka-style attribute histograms (10 bins).

I can compute the EMD of two numeric columns in JM1 (5440 samples) in 0.0035 seconds.
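For anyone who wants to try this, here is a minimal sketch of one way to get an EMD between two numeric columns from 10-bin histograms. It is not necessarily the code behind the timing above. It assumes both columns are binned over the same equal-width range, that the ground distance is measured in bin widths, and that synthetic Gaussian data can stand in for the JM1 columns. The names `histogram`, `emd_1d`, `col_a` and `col_b` are made up for illustration. In one dimension the "dirt moving" cost reduces to a running sum: whatever surplus is left in a bin has to be carried one step to the next bin.

```python
import random
import time


def histogram(values, lo, hi, bins=10):
    """Equal-width histogram over [lo, hi], normalised to sum to 1."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1) if width > 0 else 0
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def emd_1d(p, q):
    """EMD between two histograms on the same bins, in units of bin widths."""
    carried = 0.0  # dirt that still has to be pushed on to the next bin
    cost = 0.0
    for pi, qi in zip(p, q):
        carried += pi - qi
        cost += abs(carried)  # moving `carried` units across one bin boundary
    return cost


# Synthetic stand-ins for two numeric columns (assumption: not real JM1 data).
random.seed(0)
col_a = [random.gauss(0.0, 1.0) for _ in range(5440)]
col_b = [random.gauss(0.5, 1.0) for _ in range(5440)]

# Shared bin range so the two histograms are comparable (assumption).
lo = min(min(col_a), min(col_b))
hi = max(max(col_a), max(col_b))

start = time.perf_counter()
d = emd_1d(histogram(col_a, lo, hi), histogram(col_b, lo, hi))
elapsed = time.perf_counter() - start
print("EMD (bin widths):", d, "seconds:", elapsed)
```

As a sanity check, moving a full pile from the first of three bins to the last, `emd_1d([1, 0, 0], [0, 0, 1])`, costs 2 bin widths. Timings will differ by machine and will not necessarily match the figure above.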

