is there any spark version implementations? #16

Closed
HCMY opened this issue Apr 20, 2022 · 2 comments

Comments

@HCMY

HCMY commented Apr 20, 2022

Hey guys, is there a Spark implementation of UMAP?

@tkonopka
Owner

Hi @HCMY. Just to get on the same page: this package provides two things, an implementation of the UMAP algorithm in R/Rcpp and a wrapper that launches the Python-based 'umap-learn' (the original UMAP).

The R/Rcpp implementation in this repo requires the dataset to be loaded as a matrix in memory. If you can coerce your data, from whatever source, into a matrix, then all is OK. But if you are asking about processing data as a stream, or data that is larger than memory, that is not supported.
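
As a rough illustration of what "coerce your data into a matrix" can mean, here is a minimal sketch in plain Python (in R the analogous step is typically `as.matrix()`). The function name `records_to_matrix` and its `records` and `columns` parameters are hypothetical, made up for this example, and it assumes the full result fits in memory:

```python
from typing import Iterable, List, Mapping, Sequence


def records_to_matrix(records: Iterable[Mapping[str, object]],
                      columns: Sequence[str]) -> List[List[float]]:
    """Collect records into a dense, rectangular, in-memory numeric matrix.

    Hypothetical helper for illustration only. The records could come from
    any source (a CSV reader, a database cursor, rows collected from a
    distributed table), as long as everything fits in memory.
    """
    matrix: List[List[float]] = []
    for i, record in enumerate(records):
        try:
            # Keep only the requested numeric columns, in a fixed order.
            row = [float(record[c]) for c in columns]
        except KeyError as err:
            raise ValueError(f"record {i} is missing column {err}") from err
        except (TypeError, ValueError) as err:
            raise ValueError(f"record {i} has a non-numeric value") from err
        matrix.append(row)
    if not matrix:
        raise ValueError("no records: an embedding needs at least one row")
    return matrix


rows = [
    {"x": 1, "y": "2.5", "label": "a"},
    {"x": 3, "y": 4.0, "label": "b"},
]
m = records_to_matrix(rows, columns=["x", "y"])
print(m)  # [[1.0, 2.5], [3.0, 4.0]]
```

Once the data is in this shape it can be handed to an in-memory UMAP implementation. Anything that cannot be materialised this way (a stream, or a table larger than memory) falls outside what is supported here.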

My impression is that some users of the umap-learn package have mentioned Spark, but I have not used it myself. You can ask there for help (?). Also, keep in mind that their advanced capabilities might not be compatible with the R-Python interfacing here, so they might not work through this package. If you have success with this, please share! Cheers.

@HCMY
Author

HCMY commented May 22, 2022

Thanks for your reply, I'm working on it.
