emd vs emd_samples #29

Closed
Kawue opened this issue Mar 9, 2018 · 5 comments


@Kawue

Kawue commented Mar 9, 2018

I am slightly confused by your descriptions of emd and emd_samples: for emd you list a histogram as the parameter, but for emd_samples you list 1D samples.

If I want to compare two images with EMD using emd_samples, do I have to pass the images or the histograms of the images to the method?

@wmayner
Owner

wmayner commented Mar 9, 2018

emd expects histograms and a distance matrix as input; emd_samples will compute the histograms for you and use the Euclidean distance matrix by default. So you probably want to pass the images to emd_samples, unless you already have the histograms computed for some other reason, in which case you can use emd directly.
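
As a rough illustration of the two entry points, the sketch below prepares the inputs that emd expects (histograms on shared bins plus a distance matrix between bin centers) and compares the result with emd_samples called on the raw pixels. The helper names `histograms_and_distances` and `compare_images` are hypothetical and not part of pyemd, and the binning here is only an approximation of what emd_samples does internally, so the two values should be close but are not guaranteed to match.

```python
import numpy as np


def histograms_and_distances(samples_a, samples_b, bins="auto"):
    """Bin two sample arrays on shared edges and build a distance matrix.

    Hypothetical helper (not part of pyemd). It mimics the preprocessing
    described above: histograms plus distances between bin centers.
    """
    a = np.asarray(samples_a, dtype=np.float64).ravel()
    b = np.asarray(samples_b, dtype=np.float64).ravel()
    # Use one set of bin edges for both arrays so the bins line up.
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    hist_a, _ = np.histogram(a, bins=edges)
    hist_b, _ = np.histogram(b, bins=edges)
    # Normalize so both histograms carry the same total mass.
    hist_a = hist_a / hist_a.sum()
    hist_b = hist_b / hist_b.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    # In 1D the Euclidean distance between centers is the absolute difference.
    distance_matrix = np.abs(centers[:, None] - centers[None, :])
    return hist_a, hist_b, distance_matrix


def compare_images(image_a, image_b):
    """Return (EMD from raw pixels, EMD from precomputed histograms)."""
    from pyemd import emd, emd_samples  # requires pyemd to be installed

    # Option 1: pass the flattened images and let emd_samples do the binning.
    from_samples = emd_samples(image_a.ravel(), image_b.ravel())
    # Option 2: compute the histograms yourself and call emd directly.
    hist_a, hist_b, dist = histograms_and_distances(image_a, image_b)
    from_histograms = emd(hist_a, hist_b, dist)
    return from_samples, from_histograms
```

Note that emd needs float64 arrays, which is why the helper converts its inputs and normalizes with floating-point division.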

@wmayner wmayner closed this as completed Mar 9, 2018
@Kawue
Author

Kawue commented Mar 12, 2018

Can you explain to me why the results of emd_samples are so different from those of scipy.stats.wasserstein_distance?
They require the histograms as parameters, but if I provide my flattened image instead, the results are quite similar to emd_samples.
Shouldn't the results be quite similar if I use fd for binning in both methods?

@wmayner wmayner reopened this Mar 14, 2018
@wmayner
Owner

wmayner commented Mar 14, 2018

I'm not sure, but it looks like the SciPy function is using the L1 distance as the ground metric, whereas the default ground distance in emd_samples is the Euclidean distance. Another difference may stem from the fact that emd_samples uses the centers of the bins to generate the distance matrix. There also might be something going on with normalization.
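
One further possibility, offered as an assumption and not confirmed in this thread: scipy.stats.wasserstein_distance treats its first two arguments as observed values, with any histogram counts going into the optional u_weights and v_weights arguments. Passing the counts as the values would then measure something other than the distance between pixel distributions. The sketch below uses synthetic stand-in images (made-up data, not the images from this issue) to contrast the three call patterns.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Synthetic stand-ins for two flattened grayscale images (assumed data).
rng = np.random.default_rng(0)
a = rng.integers(0, 128, size=(32, 32)).astype(float).ravel()
b = rng.integers(64, 256, size=(32, 32)).astype(float).ravel()

# 1) Raw pixel values as samples.
d_samples = wasserstein_distance(a, b)

# 2) Histograms on shared "fd" bins: bin centers are the values and the
#    counts are the weights.
edges = np.histogram_bin_edges(np.concatenate([a, b]), bins="fd")
hist_a, _ = np.histogram(a, bins=edges)
hist_b, _ = np.histogram(b, bins=edges)
centers = (edges[:-1] + edges[1:]) / 2
d_binned = wasserstein_distance(centers, centers,
                                u_weights=hist_a, v_weights=hist_b)

# 3) Counts passed as if they were samples: this compares the distributions
#    of bin counts, not of pixel intensities.
d_counts = wasserstein_distance(hist_a, hist_b)

print(d_samples, d_binned, d_counts)
```

Binning moves each pixel value by at most half a bin width, so the first two results can differ by at most one bin width. The third has no such relationship to the others.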

@OzzyTao

OzzyTao commented Mar 23, 2018

Hi, I want to penalise mass that remains where it was, but changing the diagonal elements of the distance matrix to large values does not seem to work. I'm not sure whether my understanding of the package is correct, or whether emd always assumes that things remain static as much as possible.

@wmayner
Owner

wmayner commented Apr 13, 2018

Increasing the diagonal entries will indeed penalize mass that goes directly from bin i to bin i, but remember that there are many other routes that mass could take; the flow diagram might allow the mass to take a less direct, but also less costly, path. Perhaps that explains why you're seeing unexpectedly low EMD values (which is what I assume you mean when you say it “does not work”).
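
The rerouting effect can be seen in a tiny transport problem. The sketch below is an independent formulation using scipy.optimize.linprog, not pyemd's implementation, and it assumes both histograms have the same total mass. The function name `transport_cost` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def transport_cost(p, q, dist):
    """Minimum-cost flow from histogram p to histogram q (equal total mass).

    Hypothetical helper: a plain transportation LP, not pyemd's algorithm.
    """
    n = len(p)
    # Unknowns are flow[i, j], flattened row by row.
    a_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        a_eq[i, i * n:(i + 1) * n] = 1.0  # source bin i ships out p[i]
        a_eq[n + i, i::n] = 1.0           # sink bin i receives q[i]
    b_eq = np.concatenate([p, q])
    cost = np.asarray(dist, dtype=float).ravel()
    result = linprog(cost, A_eq=a_eq, b_eq=b_eq, bounds=(0, None),
                     method="highs")
    return result.fun, result.x.reshape(n, n)


p = [0.5, 0.5]  # two identical histograms
plain_cost, plain_flow = transport_cost(p, p, [[0, 1], [1, 0]])
penalized_cost, penalized_flow = transport_cost(p, p, [[10, 1], [1, 10]])
print(plain_cost, penalized_cost)
print(penalized_flow)
```

With a zero diagonal, all mass stays in place and the cost is 0. With the diagonal raised to 10, the cheapest plan swaps the contents of the two bins at cost 1 per unit, so the total is 1, far below the diagonal penalty.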

@wmayner wmayner closed this as completed Apr 13, 2018