Superhash: binby, groupby, unique, value_counts and xarray support #197

Merged
merged 43 commits into from
Apr 24, 2019
43 commits
9e54373
improved: value_counts using hashmap
maartenbreddels Apr 2, 2019
043ce92
vendor hopscotch map
maartenbreddels Apr 2, 2019
46817cf
initial commit
maartenbreddels Apr 2, 2019
e3cea84
vendor flat_hash_map
maartenbreddels Apr 2, 2019
69d80b2
hash: make it easier to switch hashmap implementation
maartenbreddels Apr 2, 2019
a517d44
hash: proper default template args
maartenbreddels Apr 2, 2019
dc786ad
initial commit
maartenbreddels Apr 2, 2019
fc71bd2
hash: using tessil instead of skarupke hashmap (gcc >= 5.0), also bet…
maartenbreddels Apr 2, 2019
7e5dde0
hash: fix: reference count missed (+tests)
maartenbreddels Apr 2, 2019
7b89891
make test order independent
maartenbreddels Apr 3, 2019
c91e9a7
add support for bool
maartenbreddels Apr 4, 2019
fa2f9ac
new: add ordered set to improve unique performance
maartenbreddels Apr 12, 2019
efc3c62
small bugs
maartenbreddels Apr 12, 2019
6938ac7
initial commit
maartenbreddels Apr 12, 2019
2f3ab29
new: proper groupby start
maartenbreddels Apr 12, 2019
4c55b9c
Release the GIL
maartenbreddels Apr 12, 2019
bdfd43d
small groupby fixes
maartenbreddels Apr 12, 2019
e6ed02a
new: binby and better groupby
maartenbreddels Apr 15, 2019
94f673c
fix: test: order independent
maartenbreddels Apr 15, 2019
4a955bd
fix: proper datetime handling for minmax
maartenbreddels Apr 15, 2019
afa2fee
Adding a map method to the vaex expressions, which maps new values ov…
Feb 16, 2019
1bbb727
Improving the map method for expressions: Applying the comments from …
JovanVeljanoski Mar 28, 2019
2708725
Small fix for python2.7.
JovanVeljanoski Mar 28, 2019
97d8e3a
fix: py27 support
maartenbreddels Apr 15, 2019
b762efc
missed a set
maartenbreddels Apr 15, 2019
ca2791a
optimize map
maartenbreddels Apr 15, 2019
471b623
make .keys() give an ordered list so no sorting is needed
maartenbreddels Apr 16, 2019
40c5819
performance: use double instead of ints, saves a casting until we sup…
maartenbreddels Apr 16, 2019
ffccb5d
performance: use delayed execution so all operations are done in 1 pa…
maartenbreddels Apr 16, 2019
ccc24d8
new+improvement: binning/aggregation is redone, much more efficient, …
maartenbreddels Apr 19, 2019
b257ef6
stringview->string, added first, mean/min/max use new system, cleanups
maartenbreddels Apr 19, 2019
7ce9fd5
task fix
maartenbreddels Apr 19, 2019
44de132
fix: MSVC doesn't like it
maartenbreddels Apr 19, 2019
b1a6745
add AggSum_int32, which seems default on py27-win
maartenbreddels Apr 19, 2019
09f7672
fix: bug when dimension is 0
maartenbreddels Apr 20, 2019
62b40c4
travis debug
maartenbreddels Apr 20, 2019
8a89989
fix: stride was incorrect (assumed 8 bits per element)
maartenbreddels Apr 20, 2019
eeec86b
datetime64 support and more Aggregator types
maartenbreddels Apr 20, 2019
03f0331
fix: valgrind found array oob issue with strings
maartenbreddels Apr 20, 2019
bb00963
fix: mean for datetime
maartenbreddels Apr 20, 2019
1a68c46
tests: move to agg_tests.py
maartenbreddels Apr 23, 2019
0987f00
fix: Adding datetime, timedelta and bool support and testing
JovanVeljanoski Apr 22, 2019
526a75b
ci: requirements for numpy due to np.isnat and pyarrow in requirement…
maartenbreddels Apr 23, 2019
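
Several of the commits above ("value_counts using hashmap", "add ordered set to improve unique performance") build on hash-map lookups. As a rough conceptual sketch only, here is the same idea in plain Python, assuming a built-in `dict` stands in for the vendored C++ hash map. The function names are illustrative and are not the vaex API or the code in this PR; treating "ordered set" as "first-seen order is kept" is also an assumption.

```python
def value_counts(values):
    """Count occurrences with one hash-map lookup per element."""
    counts = {}
    for value in values:
        # dict plays the role of the hash map: key -> running count
        counts[value] = counts.get(value, 0) + 1
    return counts


def unique(values):
    """Return distinct values in first-seen order.

    dict preserves insertion order (Python 3.7+), so it can act as
    a simple ordered set here.
    """
    return list(dict.fromkeys(values))
```

Both functions make a single pass over the data, which is the property the hash-map approach is after.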
6 changes: 6 additions & 0 deletions .gitmodules
@@ -4,3 +4,9 @@
[submodule "packages/vaex-core/vendor/string-view-lite"]
path = packages/vaex-core/vendor/string-view-lite
url = https://github.com/martinmoene/string-view-lite
[submodule "packages/vaex-core/vendor/hopscotch-map"]
path = packages/vaex-core/vendor/hopscotch-map
url = https://github.com/Tessil/hopscotch-map
[submodule "packages/vaex-core/vendor/flat_hash_map"]
path = packages/vaex-core/vendor/flat_hash_map
url = https://github.com/skarupke/flat_hash_map
5 changes: 1 addition & 4 deletions .travis.yml
@@ -42,10 +42,7 @@ before_install:
- source deactivate
- source activate test-environment
- which pip
- pip install pybind11
# - conda install -c conda-forge pandas kapteyn # these extra installs should disappear
# - conda create --name dev --clone test
# - pip install -r requirements.txt
- pip install -r requirements.txt
install:
- source activate test-environment
- (cd packages/vaex-core; pip install -v .)
4 changes: 2 additions & 2 deletions appveyor.yml
@@ -20,9 +20,9 @@ install:
- conda config --set always_yes yes --set changeps1 no
- conda update -q conda
- conda info -a
- "conda create -q -n test-environment -c conda-forge python=%PYTHON_VERSION% numpy scipy pyqt matplotlib pyopengl h5py numexpr astropy tornado cython pandas runipy cython pytest numba pyarrow>=0.12 graphviz python-graphviz pcre"
- "conda create -q -n test-environment -c conda-forge python=%PYTHON_VERSION% numpy scipy pyqt matplotlib pyopengl h5py numexpr astropy tornado cython pandas runipy cython pytest numba pyarrow graphviz python-graphviz pcre"
- activate test-environment
- pip install pybind11
- pip install "numpy>=1.13" "pyarrow>=0.12"
- pip install -r requirements.txt
- pushd packages\vaex-core && pip install . && popd
- pushd packages\vaex-hdf5 && pip install . && popd
38 changes: 32 additions & 6 deletions packages/vaex-core/setup.py
@@ -63,15 +63,17 @@ def __str__(self):
if platform.system().lower() == 'windows':
extra_compile_args = ["/EHsc"]
else:
# TODO: maybe enable these flags for non-wheel/conda builds? ["-mtune=native", "-march=native"]
extra_compile_args = ["-std=c++11", "-mfpmath=sse", "-O3", "-funroll-loops"]
extra_compile_args.append("-g")
if sys.platform == 'darwin':
extra_compile_args.append("-mmacosx-version-min=10.9")

# on windows (Conda-forge builds), the dirname is an absolute path
extension_vaexfast = Extension("vaex.vaexfast", [os.path.relpath(os.path.join(dirname, "src/vaexfast.cpp"))],
include_dirs=[get_numpy_include()],
extra_compile_args=extra_compile_args)
extension_strings = Extension("vaex.strings", [os.path.relpath(os.path.join(dirname, "src/strings.cpp"))],
extension_strings = Extension("vaex.superstrings", [os.path.relpath(os.path.join(dirname, "src/strings.cpp"))],
include_dirs=[
get_numpy_include(),
get_pybind_include(),
@@ -88,10 +90,34 @@ def __str__(self):
extra_compile_args=extra_compile_args,
libraries=['pcre', 'pcrecpp']
)
extension_superutils = Extension("vaex.superutils", [os.path.relpath(os.path.join(dirname, "src/superutils.cpp"))],
include_dirs=[get_numpy_include(), get_pybind_include(),
get_pybind_include(user=True)],
extra_compile_args=extra_compile_args)
extension_superutils = Extension("vaex.superutils", [
os.path.relpath(os.path.join(dirname, "src/hash_object.cpp")),
os.path.relpath(os.path.join(dirname, "src/hash_primitives.cpp")),
os.path.relpath(os.path.join(dirname, "src/superutils.cpp")),
os.path.relpath(os.path.join(dirname, "src/hash_string.cpp")),
],
include_dirs=[
get_numpy_include(), get_pybind_include(),
get_pybind_include(user=True),
'vendor/flat_hash_map',
'vendor/sparse-map/include',
'vendor/hopscotch-map/include',
'vendor/string-view-lite/include'
],
extra_compile_args=extra_compile_args)

extension_superagg = Extension("vaex.superagg", [
os.path.relpath(os.path.join(dirname, "src/superagg.cpp")),
],
include_dirs=[
get_numpy_include(), get_pybind_include(),
get_pybind_include(user=True),
'vendor/flat_hash_map',
'vendor/sparse-map/include',
'vendor/hopscotch-map/include',
'vendor/string-view-lite/include'
],
extra_compile_args=extra_compile_args)

setup(name=name + '-core',
version=version,
@@ -104,7 +130,7 @@ def __str__(self):
license=license,
package_data={'vaex': ['test/files/*.fits', 'test/files/*.vot', 'test/files/*.hdf5']},
packages=['vaex', 'vaex.core', 'vaex.file', 'vaex.test', 'vaex.ext', 'vaex.misc'],
ext_modules=[extension_vaexfast] if on_rtd else [extension_vaexfast, extension_strings, extension_superutils],
ext_modules=[extension_vaexfast] if on_rtd else [extension_vaexfast, extension_strings, extension_superutils, extension_superagg],
zip_safe=False,
entry_points={
'console_scripts': ['vaex = vaex.__main__:main'],
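
The setup.py changes above repeat two patterns: every source file is wrapped in `os.path.relpath(os.path.join(dirname, ...))` (the diff's comment notes that `dirname` is an absolute path on Windows conda-forge builds), and the same four vendor include directories are listed for both `vaex.superutils` and `vaex.superagg`. A minimal sketch of how those patterns could be factored out follows. The helper names `relative_sources`, `include_dirs` and `VENDOR_INCLUDE_DIRS` are hypothetical and not part of this PR.

```python
import os

# Vendor include paths as listed in the diff for superutils and superagg.
VENDOR_INCLUDE_DIRS = [
    "vendor/flat_hash_map",
    "vendor/sparse-map/include",
    "vendor/hopscotch-map/include",
    "vendor/string-view-lite/include",
]


def relative_sources(dirname, filenames):
    """Build src/<name> paths under dirname, relative to the current directory.

    Mirrors os.path.relpath(os.path.join(dirname, "src/<name>")) from the diff,
    so an absolute dirname still yields relative source paths.
    """
    return [
        os.path.relpath(os.path.join(dirname, "src", name))
        for name in filenames
    ]


def include_dirs(*base_dirs):
    """Return the given base include dirs followed by the shared vendor dirs."""
    return list(base_dirs) + VENDOR_INCLUDE_DIRS
```

With helpers like these, each `Extension(...)` call would only need its module name and its list of source file names; whether that is worth the indirection in a setup script is a judgment call.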
21 changes: 21 additions & 0 deletions packages/vaex-core/src/hash.hpp
@@ -0,0 +1,21 @@
// #include "flat_hash_map.hpp"
// #include "unordered_map.hpp"
#include "tsl/hopscotch_set.h"
#include "tsl/hopscotch_map.h"

namespace vaex {

template<class Key, class Value, class Hash=std::hash<Key>, class Compare=std::equal_to<Key>>
// using hashmap = ska::flat_hash_map<Key, Value, Hash, Compare>;
using hashmap = tsl::hopscotch_map<Key, Value, Hash, Compare>;
// template<class Key, class Hash, class Compare>
// using hashset = tsl::hopscotch_set<Key, Hash, Compare>;

// we cannot modify .second, instead use .value()
// see https://github.com/Tessil/hopscotch-map
template<class I, class V>
inline void set_second(I& it, V &&value) {
it.value() = value;
}

}