Use vaex instead of pandas for dataframes #2

Open
ltirrell opened this issue Aug 17, 2020 · 0 comments

With 20 million rows, pandas is quite slow at both reading in the data and manipulating it. After a quick assessment of modin and vaex, vaex seems like an easy-to-use and fast solution; modin was a bit slow for my use case. dask is another option, but based on benchmarks posted online, it doesn't seem likely to give much of a speedup over plain pandas (though its lazy evaluation would probably lead to less swapping).
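
To illustrate why lazy evaluation tends to reduce swapping, here is a minimal standard-library sketch. It is not the vaex or dask API; the helper names `iter_values` and `lazy_mean` and the sample data are made up for illustration. The point is that a streaming computation keeps only a running sum and count in memory instead of materializing every row first.

```python
import csv
import io


def iter_values(lines, column):
    """Lazily yield one float per row; no row is parsed until iteration starts."""
    for row in csv.DictReader(lines):
        yield float(row[column])


def lazy_mean(values):
    """Consume an iterator once, keeping only a running sum and count."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count if count else float("nan")


# Hypothetical in-memory stand-in for a large CSV file.
sample = io.StringIO("x,y\n1,10\n2,20\n3,30\n")
mean_y = lazy_mean(iter_values(sample, "y"))
print(mean_y)
```

With a real file, passing the open file handle in place of `sample` would stream rows the same way, so peak memory stays roughly constant regardless of row count.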
