Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

min_dist not updating with Python backend #8

Closed
lgaborini opened this issue May 7, 2019 · 3 comments
Closed

min_dist not updating with Python backend #8

lgaborini opened this issue May 7, 2019 · 3 comments

Comments

@lgaborini
Copy link

Hello, I found out this pretty strange bug.

In short, when I use the umap-learn backend, any setting of min_dist is not respected.
The Python backend complains if it's higher than spread, the arguments change, min_dist is in
config$umap_learn_args, but the UMAP coordinates do not change.

Other settings (e.g. n_neighbors) are good.

When the same code is called directly from a Python session, min_dist is correctly applied.

Repro and more comments in this repository:
https://github.com/lgaborini/umap-bug/blob/master/umap_compare_iris.md

Source (R markdown + reticulate)

Thank you!

@tkonopka
Copy link
Owner

tkonopka commented May 7, 2019

Thanks for the bug report and the detailed examples. Can confirm that I can reproduce the effect. WIll investigate.

@tkonopka
Copy link
Owner

tkonopka commented May 8, 2019

Thanks again for spotting this bug. I just committed an update; it fixes the issue as far as I can tell. Would you be able to try it out too? Many thanks.

Here's more background. min_dist is used together with spread to tune how close together items should appear in the final embedding. An alternative way to achieve the same tuning is via parameters a and b. The latter pair actually take precedence. By default, a and b are None in umap-learn, so changing min_dist (and/or spread) is an effective way of altering the embedding. But if a and b are set, values assigned to min_dist or spread become irrelevant.

The bug in the R package arose because NAs for a and b in the configuration object were mistakenly translated into python number-like values. Those number-like values for a and b made min_dist inactive; calculations with different min_dist produced equivalent coordinates. The fix avoids sending NA values to umap-learn, thus avoids setting a and b, and thus enables min_dist to take effect.

@lgaborini
Copy link
Author

Looking great also in my code! Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants