Superhash: binby, groupby, unique, value_counts and xarray support #197
cc @rabernat I've chosen to split 'groupby' into two:
I find this potential issue:
that returns
For big datasets this also sometimes kills the kernel, although I always reproduce it.
Fundamental issues with the aggregation code led to a complete refactor, which also gives a 20-100% speedup in certain cases when binning in 1 or 2 dimensions on non-float data (and even on float data in some cases).
…er the existing ones according to a corresponding dict or a mapping function.
…port discrete bins
fix: compiler warnings cleanup + small tweaks
remove: unique now replaced by superhash
fix: vaex-hdf5: support timedelta64
…s.txt ci: skip numba for py35
I've squashed some commits (mainly CI debugging), but since it's a big change, I'm gonna merge this as is to have a fine-grained history.
This is the start of using hashmaps in vaex for
Value count is 235x faster than the old implementation, 4.7x faster compared to pandas (benchmarked on 1e8 random integers between 0 and 99).
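For readers who want to see the idea behind a hash-map based value count, here is a minimal pure-Python sketch. It is not vaex's implementation and will not reproduce the timings above. The function name `value_counts` and the scaled-down input size (1e5 instead of 1e8 values) are assumptions made for illustration only.

```python
import random
from collections import Counter


def value_counts(values):
    """Count occurrences of each distinct value using a hash map (dict)."""
    counts = {}
    for v in values:
        # dict lookups and inserts are O(1) on average, so a single pass suffices
        counts[v] = counts.get(v, 0) + 1
    return counts


# Scaled-down version of the benchmark input: random integers between 0 and 99.
# (Assumption: 1e5 values instead of 1e8, to keep the sketch quick to run.)
rng = random.Random(42)
data = [rng.randint(0, 99) for _ in range(100_000)]

counts = value_counts(data)
unique_values = sorted(counts)  # the distinct values fall out of the same map

# Cross-check against the standard library's hash-based counter.
assert counts == dict(Counter(data))
print(len(unique_values), sum(counts.values()))
```

The same map gives both `unique` (its keys) and `value_counts` (its values) in one pass, without sorting the input first.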