How to unbundle blosc, zstd, lz4 #264

Closed
bnavigator opened this issue Dec 28, 2020 · 7 comments

@bnavigator
Contributor

bnavigator commented Dec 28, 2020

From a distribution point of view, it is undesirable to bundle outdated copies of standard system libraries into packages. Here is a quick guide on how to unbundle blosc, zstd and lz4 from numcodecs and use the system libraries instead:

  1. Delete the c-blosc subdirectory.
  2. Patch setup.py so that each extension links against the system library when its bundled source list is empty:
Index: numcodecs-0.7.2/setup.py
===================================================================
--- numcodecs-0.7.2.orig/setup.py
+++ numcodecs-0.7.2/setup.py
@@ -112,6 +112,7 @@ def blosc_extension():
         Extension('numcodecs.blosc',
                   sources=sources + blosc_sources,
                   include_dirs=include_dirs,
+                  libraries=[] if blosc_sources else ['blosc'],
                   define_macros=define_macros,
                   extra_compile_args=extra_compile_args,
                   ),
@@ -152,6 +153,7 @@ def zstd_extension():
         Extension('numcodecs.zstd',
                   sources=sources + zstd_sources,
                   include_dirs=include_dirs,
+                  libraries=[] if zstd_sources else ['zstd'],
                   define_macros=define_macros,
                   extra_compile_args=extra_compile_args,
                   ),
@@ -185,6 +187,7 @@ def lz4_extension():
         Extension('numcodecs.lz4',
                   sources=sources + lz4_sources,
                   include_dirs=include_dirs,
+                  libraries=[] if lz4_sources else ['lz4'],
                   define_macros=define_macros,
                   extra_compile_args=extra_compile_args,
                   ),
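
Before building against system libraries, it can help to confirm that they are visible on the build host. The following is a minimal sketch and not part of numcodecs: it relies on the standard library's ctypes.util.find_library, and the helper name find_system_codecs is hypothetical. It only detects shared libraries; the headers needed at compile time are not checked.

```python
from ctypes.util import find_library

# Library names as they would be passed to the linker by the patch above
# (-lblosc, -lzstd, -llz4).
SYSTEM_CODEC_LIBS = ("blosc", "zstd", "lz4")


def find_system_codecs(names=SYSTEM_CODEC_LIBS):
    """Map each library name to what find_library reports, or None if not found."""
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, location in find_system_codecs().items():
        status = location if location is not None else "not found"
        print(f"{name}: {status}")
```

A None entry suggests the corresponding development package is missing, in which case linking the patched extension would likely fail.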

This modification to setup.py would need some more work, but you may want to consider adding an option to build against the system libraries, selected through an environment variable such as USE_SYSTEM_LIBS=1.
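
One way the environment-variable switch could look is sketched below. This is an illustration under assumptions, not the project's actual setup.py: USE_SYSTEM_LIBS is only the name suggested above, and the helpers use_system_libs and link_settings are hypothetical.

```python
import os

# Values treated as "on" for the hypothetical USE_SYSTEM_LIBS variable.
_TRUTHY = {"1", "true", "yes", "on"}


def use_system_libs(environ=None):
    """Return True when USE_SYSTEM_LIBS is set to a truthy value."""
    environ = os.environ if environ is None else environ
    return environ.get("USE_SYSTEM_LIBS", "0").strip().lower() in _TRUTHY


def link_settings(bundled_sources, system_library, environ=None):
    """Choose between compiling bundled sources and linking a system library.

    Returns a dict with the extra 'sources' to compile and the 'libraries'
    to link, suitable for merging into an Extension(...) call.
    """
    if use_system_libs(environ):
        # Skip the bundled C sources and let the linker pull in the system copy.
        return {"sources": [], "libraries": [system_library]}
    # Default: keep the bundled sources and link nothing extra.
    return {"sources": list(bundled_sources), "libraries": []}
```

In setup.py, the result would be merged into each extension, for example sources=sources + settings["sources"] and libraries=settings["libraries"]. Unlike the patch above, this makes the choice explicit instead of inferring it from an empty source list.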

This also resolves the problems reported in #215.

@jakirkham
Member

Yeah, I definitely understand the value of unbundling. However, this was impractical until recently because there were no Blosc wheels. Now that this has changed ( #262 ), I agree it makes sense to use external dependencies.

To solve this, I would actually go about it a bit differently. In particular, I would change the codecs to use third-party libraries in the code and drop the Blosc submodule.

@jakirkham
Member

Just to add: PR ( #274 ) started down this path.

@hmaarrfk

Hmmmm, I just got bitten by this while trying to compile with snappy support. I guess there is some more work to do on the unbundling front.

@bnavigator
Contributor Author

bnavigator commented Jan 12, 2023

If the system blosc has snappy as an available codec, the complib test has to skip compressors it has no expected value for:

--- numcodecs-0.11.0.orig/numcodecs/tests/test_blosc.py
+++ numcodecs-0.11.0/numcodecs/tests/test_blosc.py
@@ -155,10 +155,11 @@ def test_compress_complib(use_threads):
     }
     blosc.use_threads = use_threads
     for cname in blosc.list_compressors():
-        enc = blosc.compress(arr, cname.encode(), 1, Blosc.NOSHUFFLE)
-        complib = blosc.cbuffer_complib(enc)
-        expected_complib = expected_complibs[cname]
-        assert complib == expected_complib
+        if cname in expected_complibs:
+            enc = blosc.compress(arr, cname.encode(), 1, Blosc.NOSHUFFLE)
+            complib = blosc.cbuffer_complib(enc)
+            expected_complib = expected_complibs[cname]
+            assert complib == expected_complib
     with pytest.raises(ValueError):
         # capitalized cname
         blosc.compress(arr, b'LZ4', 1)

@martindurant
Member

Note that the package cramjam offers a very nice, simple and fast interface to several byte compression algorithms (zstd, lz4, snappy, ...) in a single statically compiled extension library. It is the only compression package used by fastparquet. cramjam is not blosc, but we may be able to simplify a lot of things with it.

@jakirkham
Member

I think we have had this discussion before ( #314 (comment) ). This is really not pressing since we started building Conda & wheel binaries.

I think there are better ways we can spend our time in Zarr development, though I don't want to discourage others if they really want to dig in. PRs welcome.

@bnavigator
Contributor Author

This one is superseded by #569. The old diffs here no longer apply to numcodecs 0.13.
