From e506fa7852cfdca394720ff87faa4fca0da3938f Mon Sep 17 00:00:00 2001 From: Joseph Hamman Date: Wed, 8 Jan 2025 15:50:34 -0800 Subject: [PATCH 1/4] add zarr-python3 release blog --- ...2025-01-09-zarr-python-v3-release.markdown | 115 ++++++++++++++++++ assets/images/zarr3-performance.png | Bin 0 -> 13723 bytes 2 files changed, 115 insertions(+) create mode 100644 _posts/2025-01-09-zarr-python-v3-release.markdown create mode 100644 assets/images/zarr3-performance.png diff --git a/_posts/2025-01-09-zarr-python-v3-release.markdown b/_posts/2025-01-09-zarr-python-v3-release.markdown new file mode 100644 index 0000000..76b8134 --- /dev/null +++ b/_posts/2025-01-09-zarr-python-v3-release.markdown @@ -0,0 +1,115 @@ +--- +layout: post +title: "Zarr-Python 3.0 is here!" +description: Zarr-Python 3.0 is here! This release brings support for Zarr's v3 specification and major performance improvements. +date: 2024-05-09 +categories: blog +permalink: /zarr-python-v3-release/ +--- + +After more than a year of development, we’re thrilled to announce the release of Zarr-Python 3.0! This major release brings full support for the Zarr v3 specification, including the new chunk-sharding extension, major performance enhancements, and a thoroughly modernized codebase. Whether you use Zarr to managing large multi-dimensional datasets in the cloud or for high-performance machine learning applications, Zarr-Python 3 has something for you. Let’s dive into some of the details of this release! + +Zarr-Python is available today on [PyPI](https://pypi.org/project/zarr/) and [Conda-Forge](https://anaconda.org/conda-forge/zarr). It is compatible with Python 3.11 and above. + +```bash +pip install --upgrade zarr +# or +conda install --channel conda-forge zarr +``` + +### Support for Zarr's v3 specification + +The most notable addition in Zarr-Python 3.0 is complete support for Zarr's [v3 specification](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html). 
The v3 specification brought greater multi-language interoperability and new extension points for customizing Zarr (codecs, chunk grids, data types, and stores). + +Beyond supporting the core v3 specification, Zarr-Python 3.0 also includes support for the [chunk-sharding](https://zarr.dev/zeps/accepted/ZEP0002.html) extension. This feature allows for multiple chunks to be stored in a single file (or object), allowing users to utilize much smaller chunks without increasing the total number of objects in a dataset. Without chunk sharding, users optimizing for read-heavy applications had a difficult choice: either use a small chunk size, but create a huge number of stored objects, or use a large chunk size, but suffer poor IO for random reads into the data. With chunk-sharding, the number of stored objects is decoupled from the chunk size. Users can safely create very large Zarr arrays with very small chunks without generating a glut of stored objects. For more on how sharding works, see the [sharding documentation page](https://zarr.readthedocs.io/en/latest/user-guide/arrays.html#sharding). + +```python +import numpy as np +import zarr + +arr = zarr.create_array( + "data/example-1.zarr", + dtype="int32", + zarr_format=3, + shape=(1000, 1000), + shards=(100, 100), + chunks=(10, 10), +) +arr[:] = np.random.randint(0, 100, size=(1000, 1000)) +``` + +Note that Zarr-Python 3.0 maintains read and write support for data stored according to Zarr’s v2 specification. Some features (e.g. sharding) are not available for v2 data. Users can set `zarr_format=2` in the top level API to continue using Zarr v2’s specification. + +### Major performance improvements + +Zarr-Python 3.0 delivers significant performance improvements across the board. A large part of the refactor focused on making the library fully asynchronous, using Python’s [asyncio](https://docs.python.org/3/library/asyncio.html) library. 
The new asynchronous core enables efficient I/O operations and better utilization of system resources. This means that multiple I/O operations can be performed concurrently, leading to faster data access and reduced latency, especially when data is stored on high-latency storage backends (like cloud object storage). + +For compute bound operations (like compression), Zarr now dispatches to a managed thread pool. Combined with asynchronous IO, this threaded parallelization allows for Zarr to take full advantage of the compute resources available when reading and writing data. + +

+ zarr3perf +

Performance analysis of Zarr-Python 3 relative to Zarr-Python 2.18.4. Test wrote and read a 1GB array (shape=(512, 512, 512), chunks=(512, 512, 8), dtype='float64') to and from AWS S3 from a _m6i.4xlarge_ VM in the same region.
+

+ +While we've made significant strides in performance optimization in the 3.0 release, we've done little performance tuning and expect to share more optimizations in future releases. We are actively working on identifying and addressing performance bottlenecks to further enhance the library's speed and efficiency. + +### Built with extensions in mind + +Zarr-Python 3.0 is [designed to be highly extensible](https://zarr.readthedocs.io/en/latest/user-guide/extending.html). Key features include: + +- **New `Store` ABC:** A new abstract base class for defining custom storage backends, making it easier to integrate Zarr with various storage systems. This allows for seamless integration with cloud storage solutions, distributed file systems, and other data storage technologies. + + Zarr-Python 3.0 ships with support for the following stores: + + - `LocalStore` - for reading/writing to a [local file system](https://zarr-specs.readthedocs.io/en/latest/v3/stores/filesystem/v1.0.html) + - `FsspecStore` - for reading/writing to remote/cloud storage (based on [fsspec](https://filesystem-spec.readthedocs.io/)) + - `ZipStore` - for reading/writing to a ZipFile (experimental) + + Additional stores are also in development (like [Earthmover’s](https://earthmover.io/) [Icechunk](https://icechunk.io/icechunk-python/quickstart/) store). + +- **`Codec` and `CodecPipeline` Entrypoints:** Zarr-Python 3.0 provides [Python entry points](https://packaging.python.org/en/latest/specifications/entry-points/) for defining custom codecs and codec pipelines, enabling flexible data compression and encoding strategies. This empowers users to tailor data compression and encoding to specific use cases and optimize storage and performance. + + [Numcodecs](https://numcodecs.readthedocs.io/en/stable/zarr3.html) has been adapted to use Zarr’s `Codec` entrypoint system. And [`Zarrs-python`](https://zarrs-python.readthedocs.io/en/latest/) has already developed an experimental Rust-based `CodecPipeline`. 
+ + +### Modernized Codebase + +The Zarr-Python 3.0 codebase has been significantly modernized: + +- **100% Type Hint Coverage:** Comprehensive type hints improve code readability, maintainability, and IDE support. This makes the code easier to understand, debug, and refactor, leading to higher code quality and reduced development time. +- **Cleanly Defined Public/Private API:** A clear distinction between public and private APIs enhances code organization and stability. This ensures that the public API remains stable and consistent, while allowing for flexibility and future development in the private API. +- **Improved Development Environment, CI/CD, and Testing:** A streamlined development workflow, robust CI/CD pipelines, and comprehensive testing ensure high-quality releases. This rigorous development process helps to identify and fix bugs early, leading to more reliable and robust software. + +### Migration from Zarr-Python 2 to 3 + +We have done everything possible to make the migration from Zarr-Python 2 to 3 as easy as possible. The [3.0 migration guide](https://zarr.readthedocs.io/en/latest/user-guide/v3_migration.html) provides details the parts of the Zarr-Python API that have changed and provides suggested actions for migration. Additionally, libraries such as [Xarray](https://xarray.dev/), [Dask](https://www.dask.org/), have already added support for Zarr-Python 3. + +### Conclusion + +Zarr-Python 3.0 marks the beginning of a new chapter for the Zarr project. We encourage you to try out this new version and provide feedback. We're also excited to see the development of new extensions built on top of this solid foundation, such as Icechunk and Zarrs-Python. + +The development of Zarr-Python 3 was a huge effort, spanning over 12 months and including contributions from over 30 contributors. 
Special thanks to [Davis Bennett](https://github.com/d-v-b) and [Norman Rzepka](https://github.com/normanrz) who helped me kick off the 3.0 refactor in Potsdam, Germany in December 2024. + +**Further reading** + +- [Zarr-Python 3 documentation](https://zarr.readthedocs.io/) +- [Zarr-Python 3 design doc](https://zarr.readthedocs.io/en/latest/developers/roadmap.html) +- [Zarr V3 Specification](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html) + +~Joe Hamman + + diff --git a/assets/images/zarr3-performance.png b/assets/images/zarr3-performance.png new file mode 100644 index 0000000000000000000000000000000000000000..53ebf4bf2f92cded92cfe7a946f7239f3202a7e0 GIT binary patch literal 13723 zcmd^mXIPWlwr&a~fIB_a3M{)1f_S7Ub1K^kthgA5k(QD3rJA{ zM35jLph!oOAOcEDXdy_uGp=*a-RthXp0m$Aw?5}bo;>h<-<|*``@ok*;AyMCOFq{EgRl5sg&184^!4-!^mKFj z<@#0s05>1+;|gdc1tqy(Tmu7r12hyBz5eYE1s{JG#e4o%{$P|{zGp87Kp=cqSRd$f z-DhqPhzh6Ssb9@Pa^}avbL`9`cXkFdZH}d8=13loJB#3#sbpIuxweiJsZ)wtA6t1X z_6n42zhEReyyb2>8nPiHv~?NVJ5yW1MV{T#d-M39W9hy*wx}C%ttxOmrA#G}y+>*@ zPf&Cj2h=Yd$!O2XjPxIvT@U%{ca0Xw7M06v?NLq-iio!<772 zPRuk(L(^lsn^GhV&n1jog0FU^&2uS|(gzY8z@3L%=(6BGXH?W-!<2iwTgt&xDOdi3 z57F#*@7`^zsifrVMGiWfB3Y1WeJNto^@r_W!9R|I0(>U8`$r4nq`* z?g)hveT7^NR&RcMv#8+5#)f80Exe1EndSxd=d~l#ytI0eNhzM{r~w;l_t$%fODAwY ze~N6DCWd1%FlzHq0g1PSnZ=ForYhXk(DXvNXPzp`sY#efXA5-Xq`EZady%IIy!rf3O;soEK#7__lEFtjf!pcQ?hsKB{Ms;{dv7{lVW}>-RoY0X?*bZb< zj>Js9DJl`?0BK?;!yvXJ-`@6!ngND;SbL)Mqa@AT|$gQ zsqM#IgZuC*49bb=oT}$x;ayoOY7RmD=6utaZiBV)tMt0_#h!FnO-42!b3E}v2md?k ztsR64r`I0FBK89jE?^WuJNBaoc{+r_b->{+FlRQzP9T_X7HZl)$aX8@x_+q zWn>)@tNRY8RjT%f9HkuuMtF~knr%tpR7+P7Qn(mbbCi;)cHf$3msWC-R#JxHTvG&Hm9rN^Phl&FNY`;YpBv|CFXKz3U3s^7Tmz z6P^;=k7Ird!&0kqHr8`*fgO5(?V{Tl41g!Tsx4wmLkH3PrAGB5xs98P^P{u~`bWEo zQqAS!=hvpD?P4Sivrb46KV(fLsb2FKbeIuB;%DH@$PY>~xBmC0D56qwgMl^bHQjn#t zR<-iU>gQ3~0fn*ZsEL|k!holTljZCZuW{~>{l+Y(g62?o)I);5TX?`S!O7C#)4IOC zKHrobUeZv*mq##4A2ClXjwvUhJodakGflq4%1+LAq(Ws;iA5FhNeCpC9=0uS`rLzzm2+97;GeXPZ8oq1!Q~+uuwF|C< 
zMPd1491@Xa69!xHX=grbby-r#rQ{%Rwa{4ElxfEy!l%W0 z4F?BDa=FpXOS9js6uhMh<`Hmo7)T)hBS>Djox7KDro3M@Nt3 z!ysZ6mzVo^kyTuO2uW|MLLU%4oN!H0kk}z>5EsZpE#h^MIaLaM`OZDvYgB3{68Y8X zwP1Dk)YCx7Yr)`Ssx`5uZk>~%gU2p)W+*-+M0y!NJ!^XC=vTX#g7Uk~G<_oN$nT0L zaen%Uan(vP6&@t++p*hYwAiVl&ePdo_EjzAxb16?Es_^&(Ueq61uq}(T+6u`SB+>_ z*KcnKvza$HXiZK0bv5oArwv4d?g5yV@66g1&K2W|CJ3(Q(b&fmOsk0-LBXCO&Yc4f zTC!E~s8ciVdvhN=*vA!5CX;Q#!oz7@^|_hq(>VbG=04?@tEQ%k%U;inM-<(abX#C0 z=^YLcAw6|(Cj-Ql6Ld4`PEV-}%T~9~%&c#Ty%1n}BgVBVcln4zRG|+$tCU#R5&_=0 zjZTI}!#26B;1T;nkl*lVoviwZ6%1!dhF{`aEXzs!|?jRpUG zJS`t8_>1W7tn6%SY|By^rof`qy4rVjroVqQ5zYG4Btv_{6y~m^fA&^3X})E z_Nb_*TQ7q?!ABElue?0SLGRz5ef5ebImB~gJe(F0=yJu=CHZiF?hC_5~`f7tYl<$0@^Cz;y(=p4BfQrT3%9c zIWuym&2jlhTTPNl)Q|RM>#Swj=Vzw&XS|3E#P!znf&g+*S~{M<{b5Dr0&^_Z2S($_ zJw0$d;+F-uap7;;mt+KzI?r5xnH#>L-SV`4z~{4S_F(Tyb#S19w~VrO`1A+3n)j=X zfpOORc}>JyeA5VZst+72SM>0S$)yxb8r1aoxG4xEad(Rg356+v_cL5A{@CIpA9Yap zK`dS+*($g#!Jx)->;%DW`Q5#H?E|)IleHa}LKkws1YX}`YWLV^IyK$tYkcjfd{LJ~%(V#98)~nlF z$5r>VayE^4fOr~xMwf#HohJaCQ>Dri$DOrDYT-19LU+Xx)=9XVu&W78s*0&vD$@tq z1T6OF(J;iL|3QZ8YmKW#Hh(1gKN3a3oNyKQe9XZ!?#i#?K^?b?NP$JeR?Myo#vD@; z)2rLX^i5c1-0tp@nwL z5&27;FN&^Aoj{Yau}Lx0U#+rPgI*wy2Ma1SEv<6D8(7~e?_emsIGV`vslG=`yiK*x z=>2-%VC3DK;MFWAhsKEopJh^kbz0iPs}q|zxA%ucUfsA?gxMl?1=xXbo+4yfHm*iu zmLi)YiQV^Vuo7kvCCL|0#X59#xVdBZco>+ zfoUC3hlcF;m7^O`xli8EK8$Skuk07&g^yq*Dp#Y+%dMAL98Fe5nAQAv6xAklPy!4~%2?wk zgT2W=f%z+H#_gO$+~h(Y^q+ZfBoz-S@2g>1fr<%215yvFi7;XL#a z;%rf37=lL657veoBJxCs#4tIVcPhr-qG(yMDN|;^2CU<&8zWCtaW)Ch{$R*pQ)jV5 z=kyTc`>5f>of~L-mG48NF+GqT9GWbm7s5-2Qu(mFj-vEM)P!1b_X*KDwvYpK4XX1a zPFFsOK5Bnf-t?veVpl)b(fH+UBcnbkqq0uHCI96_lD$V8~g^g>%_X6;~{#1bn*k< zl2SGbKX!*r;BJsWBNuSVGl%KM40cDs%j7Vx$yr7CGsuo`h%{Y*`po&^TNCNHi@$Kc zrj_OFa8W-I-H1JUh~*;>mM&0fSZ)~=^6m(oIC`_T9j0bddv2Va=s-dl=pnjNlec)S z()EqN)UR%!lM|4QFoJyCMI1UEqK9aMH1QD{mIittO;G#27?IMs`eSh);1P0h^rxta z5c5x#ttg6hBsk#QG0`{CQDyFHe2m||oNo`i+jA4W9i0}byG!CTX9S$-@#uAf7?mAO z3YQ(+Jn?~#0A;T0l&Q!rEv(B!G}nAR(JwH(tkScH(*ZJqy;RI-xG5(XDIe#%4*;!V 
zh(uiaO$XwBVrq<1VHfR!-vi;8Px^@Ic%(5x;^l&YmtrBoUc+stHpZa+cdzQ~Kj5V1 z@P}LEt-67dcbc2z_qi^}MiJEg7He=&`~@+aJakzZeQ}wG$vcPXv8V}sVqRKp)Gbe@ z-fDS|S@P-y_v0H-gqKiz6XcWFgZ7rpvBClh%4NF@X z-&A)CuT2wm!s!z=V}j$5$q;)(RdEmL%Tiy>RN?Xxar#AY->Y);W0fE52mN_Kr1yzK z=I)L_FdfpyD9|(Q*M#iTe+Il0KQx9up>+A5>Zzm1Oj zV4^DJVNwR`mYBWS_{e?7BZfU~B1oQ=>)?bF=az`L=}BM7bJp#$J%AB;UwaI=X9Or8 zcqJn6a+?Y;9Thj&s0~&;9$6+S938IvBF^(f#RLWy-B`=eN1U#lbZ+AKe5Ph@E}%Uu zVy!QGH=Un)j><0)7w;@ZAYEVn{^a5O zMlW4=^~hV1qgeAahj7i{NijxKh{NCmqfp)Pd1PfZ)j0QyFV@qS@pUztRlC7cf_rGX zsilI!dxj6KGhR-y;IlF&G?9`40kx*{9u{J^__}o`$pC5@1Ok6=4mfa82Zm-%MSxqG zma>k97Iir-XikAsENe3XH9;Q88hz5iwYL`9iJc~)cEXRugJ}Alni0&@-Www3t4uGF zPVR9AoL}nBniOlVQi9cO;rrTiW$mi^#qxXHO+{9}=R)B_T6 z)x`aC+ng53XnUpGjP64lvGp%iC!sH3sOMM0>!s7qDOX?jTe+L`-M&vFJEIqB)ffn! znDiS7?%iw7pFhm!EoY}}E7U%jNN;UT&bxX( z$pzg2iSP%6Z$G0yEqCBJ2u`i2;JIl;5OilYQBAmr23UHGnV zS*J&9@hbZrUlK6+FN9;Jo7tgkqn7^teUpR1vvp>~DAAZG)LtG8ymvFlLNO}?*- zJE5fs#@l-?$B@-qu17&XbForRuyc!;@!{D$IzA3kJp)>0wkBU0!fch>R2ays^*v0h4`FS4k0+s-of-UN->RsN-t6nBa=xtB&p!Xy#gu-`7jUsGm=%7J5aKHNDUzk82|dAZi>cmNR7NQ!i3M z8P%oY{=R-OpRT&;?kS;3ywRhHtlOX4MjBAIQ)Xcp>e6JkqG zM5dEa#Zo=moK{U4%ic}`(|(g0QveI;InJBe4NAnZpLD=2P6s$-g4yU{#a5;dflMl}hKcKI4%*8Y7#RS${tPKz1@|q9#QfY7AoJ zqYD4i!EJB+yHFDyTOmE)iQiKhkqG98sGM;A%O*53Ie=&5{uA>~u#ghQ3#64TXe4vMA=qxGxyQ0 z9mVL!sq<*ko9}&5kAA}F>I!G)WtzfBMRi|lTkG_{a4SjQ*{O#?0mY5ILRF{grwwK- ze9vbki}ToiXFerOcIFvG-agFcqE6X*5H*B)fx3qp4cFz7&^kn_^Af5SN3yWAsaN2_ zkoC*~<`;?&Wr-SjC?Isa?5eEB594r46N9z92Nwl)+H`>4jNBmhVt3*rYE`I5o@0&8vtj@^lKZ80%xQ$hav+ZY9!+7>Zee5i%{r zJ(uPjUgDSCNk54|^ZH#Vc=AZg;?0W2whJTCa+hiIUg2)xqDVbNyx|7^>KgiORr62! 
z-@OO$*Y(ddP#8j@95q+t`-fSqFr%^;PTcR@K)@S$b4!#!wvlz4#Pj56-f#d@&Y2;` z>z(yQypLql85gKI9gH_I*`5L#ovkRV2#Qie`xFAT~^)$F@2IH#AF!( zwIJFk;HpiE|DQ@+7T-G*1h08DFr=X zj8&KRSmE=6JRnUkDL5D08Z&r#VWiPhnZp;j8-K)XZXcHo~f zdP*?*xGDLgM!Ea)199e*QhaE=fAKR8i?VME>a3s#p1lsbDR?yiQ(1J0g#nhbnZyCY z?ERYq1$JI04%!c`;pNsmUc{0m#DIVN z+SD7HA8P|7%Ssh=^e$V(Zg?A1yok`KTw4UB?qHvV2HrXanug>qp`)xA!W+@xKByuQ zCrZphHeW%LIQ?Z<)oajWKz@0b5o)HaZ<_Hy4;&_0`_v2L?+{jT{3 z^Ai)3@v*V7Z9F_YqAPedM(#z*YmLznX|-;OnrK~PV`FvbAdn&7(!=+D*rml2qU#GX z5FUfWa1N2V93NS<8a@3Ree>=i<(aNObD@mL-={tkzU-~RC;v8L@mh;!?fxcQFLKuY zbNkQ`@42V)Jd0~a$4x;~ijhN0JOwq;m8KYdK+lSMQ;qIU9pxPQ8cCE^IqHSK)%vCK zu4w5rIjYC^d)caRZP!$_a^$?|gQRROqaN~!72<81g~lLt79{%z{O7d}Fl+cy$J6<) zuH{>HqDHKF&l!O!4ZUHcm?zghkiLmTl=!^`8o5^G)!FWGqilnk?xkvFqkj(bsZh1^ zlK)Uh^R^oe4c%7y#d7(S;la~(H-HhB29>lyKA@%W_|+X4;?Uw(e^$A*wfU2EYXHg_ z_h(2BSINQIkM?{eP@Vml&zTrIVBsQu9{wl{t{B_@2j}#o{~LzY6qMigpXv=_m3@D6 z`v2M3@&ENhK-9S?D-oy5Nk*(d&&Slq^uTsRFd{uws=&p}qoa4?iJ3%Qd(g~6dsi#u z#!xZsq1DHT62oJ3*&F#b;q;p#K<6y<{#%6=Fq=T;qJBk96cn-Y{sUsBMBK6yPIF1- zLQ0?2-z!M`qF1_#ElPiAQ}1ed6x}T)>W2#E{peE~4>qMg@I9pT&=d7G;Z^C`l)f8b z>47!!->LT#X$=8Eue2KFX<^sCNOz|HjGrNN#z(?TVOr>e*1Y5e{hnq>iL^3HNzgsG zk<^5y>4CC1uQ(v8lUY;hr4~8yWxi#N>~}G-*ubE(w12dzj!772myfp#?Z3or*J%;B z0&-I>7)TSn>CVL?3`mF`D47jFGIiD10_&f`>u;AY&HMx5mGY;!Vw_v@KgN~SuXa5b zLQS6g@UO)gh=c2yc&Q0uFydheh`zEr)Hu)3=Y;_ zR~;RkbA1OiWh;&d&N^(SNr(+}_xrl%4M+dB)_qOmSALeRXh#eQ&8f+4F`jeJ%?Lh` z)%^D{$Kz#q^KsgNFH<~+a~JbZ$1fbhhnbUC-}i}7__zXmdHStDG(hU@VeV046*}sl z@IF9|$#k;|za%G+aeK~HdyNsEa6Y({5^@nQ?s4ir7JFE;S7BYmVK|-Ubj^j%`kVtE zFE_C6<$29Mh$}kuf2)sNuLtt#45&MS6e%`b88l5V=sINMvTmJKFhAOs#ie9TMw^b9 zkAM`bJu)_wqOXyi^(WO=h@N|~jUd%m%Rvx-P6Az22}tW0A6O5YJqj>luYl(RNgk*z zL|v#9d>R6J!1#-j$AmDRaY5W5vM9qr&zFNm=+@VLZN_Y6nf0dN2{SwxhlH{OJucv{ z8Jkay6i`e^s&M>-q5st)$v>%z{<*P}6@Jc3#65`#hD$-(ptY?id}r-X%{?iBtBLq2 z#u74^y6Qe!CC`fP%GT-xIa1w064Tnzi`@FJjWj@hpZ>!HmiqBO<#it+ulF=h*8?b& zdX%c-hfklXaQ?H+W!*JP5@VvvEJLEp3|_AJOMV&t**8otF;*Ly`Gs zM*r2d?vDpkSkBbNy)UWt>LE?*o`$X5@m{-GHSg>$54Rs9ni_&?`?GQz0={8|Yd=TM 
z`jS^&gN-`zo)0O#wsHc3Il9WqD~`Zh@2$oCm;#Gc8QD5n?ZOzGDOU>`NruafBjy}@Cs78~F*3E!m^|Sv)(26UZK6NrP6>E(WF)N zWmAy0!>RZGm-X@fm&(~}(7=JtB1^`}C^E>P>&?KhI zC$9j|LnQoxOUo)-X6^Z_i>}s$)sLc+i;EoVaJOIN<&BIUhg?0|_O437kTVh49H57o1iTq&eLR;rZaESy z4kcf#-jbun#(?59KlT)>;9OOR`yht-EPx@%(yS%Bl975)*Ch19yLth`s!n=wf`A_L z5OtcEiope2G9tx8_S4_9%85t^Xgbsfx&ZJD0m3})2=}!`jGIzVph**Yh-dH&GleSY zIEWGoV|&?+=If|J_jiQSWvO$h3F2H-TMDD?ZIcUCggQY)6NU8=P3+yAxtusy2Ap;S z?Omb~tU~9ao_%!cIzAL==`HylIh;AW0%5uw+gGnQ$r^67T;X#MRD+CS1>ayAG^4E= z%AF2MnlRNba@g|`Ty9-g`?-GR@SjJz($@vd1ARf0!UfqkQ?aj zu2XP{xNB4mYAx}6jJPGEL68YFgfEHHvVaadxPi4cP2Syf?oAK2sU%`c&rzQ4qow0H z9#1r;HoZ&Gf!LyBXB+m_357f_l5*Y6z#6Sa@B55mB-vhRLh=xI+l9GHkq zE}W~EAk{U@$qF|^V)L^75ecr(bIgEHM$%T(YMx&cYB4Y4Bw|V0d*Q4hYXWy2zL51`W3yM z*!{@DP{oO9-jak)jzbQllzyd4R*rMtSw%GQhH#mP(CVA6Ti8#VjO<*hydp3Q*dei# z^(j+UxW$ji<{%;(Ol)YuvnvS=aj5Xkx&!q-&Fyrw}*+mtVmPDcFJt1rQU`DZG&x??LPOr zbp6-cCvRmaZ-C`U+Hh5*dg{&XCvwK@sqD9YUc#jBq?GR8)~`Id@$8avFnXB(mz}JV zwY8(v(y9&7vm@g1{-H Date: Wed, 8 Jan 2025 15:51:03 -0800 Subject: [PATCH 2/4] update date --- _posts/2025-01-09-zarr-python-v3-release.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-01-09-zarr-python-v3-release.markdown b/_posts/2025-01-09-zarr-python-v3-release.markdown index 76b8134..3c60477 100644 --- a/_posts/2025-01-09-zarr-python-v3-release.markdown +++ b/_posts/2025-01-09-zarr-python-v3-release.markdown @@ -2,7 +2,7 @@ layout: post title: "Zarr-Python 3.0 is here!" description: Zarr-Python 3.0 is here! This release brings support for Zarr's v3 specification and major performance improvements. 
-date: 2024-05-09 +date: 2025-01-09 categories: blog permalink: /zarr-python-v3-release/ --- From f3bdb0514618635ad2773beeaaa56187e80381ad Mon Sep 17 00:00:00 2001 From: Joseph Hamman Date: Wed, 8 Jan 2025 20:26:49 -0800 Subject: [PATCH 3/4] blog updates --- .../2025-01-09-zarr-python-3-release.markdown | 116 ++++++++++++++++++ ...2025-01-09-zarr-python-v3-release.markdown | 115 ----------------- 2 files changed, 116 insertions(+), 115 deletions(-) create mode 100644 _posts/2025-01-09-zarr-python-3-release.markdown delete mode 100644 _posts/2025-01-09-zarr-python-v3-release.markdown diff --git a/_posts/2025-01-09-zarr-python-3-release.markdown b/_posts/2025-01-09-zarr-python-3-release.markdown new file mode 100644 index 0000000..3b6ebcb --- /dev/null +++ b/_posts/2025-01-09-zarr-python-3-release.markdown @@ -0,0 +1,116 @@ +--- +layout: post +title: 'Zarr-Python 3 is here!' +description: Zarr-Python 3 is here! This release brings support for Zarr's v3 specification, new extensions, and major +date: 2025-01-09 +categories: blog +permalink: /zarr-python-3-release/ +--- + +After more than a year of development, we’re thrilled to announce the release of [Zarr-Python 3](https://zarr.readthedocs.io/en/v3.0.0/)! This major release brings full support for the Zarr v3 specification, including the new chunk-sharding extension, major performance enhancements, and a thoroughly modernized codebase. Whether you use Zarr to manage large multi-dimensional datasets in the cloud or for high-performance machine learning applications, we've built Zarr-Python 3 to help you. Let’s dive into some of the details of this release! + +Zarr-Python 3 is available today on [PyPI](https://pypi.org/project/zarr/) and [Conda-Forge](https://anaconda.org/conda-forge/zarr). It is compatible with Python 3.11 and above. 
+ +```bash +pip install --upgrade zarr +# or +conda install --channel conda-forge zarr +``` + +### Support for Zarr's v3 specification + +The most notable addition in Zarr-Python 3 is complete support for Zarr's [v3 specification](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html). The v3 specification brought greater multi-language interoperability and new extension points for customizing Zarr (codecs, chunk grids, data types, and stores). + +Beyond supporting the core v3 specification, Zarr-Python 3 also includes support for the [chunk-sharding](https://zarr.dev/zeps/accepted/ZEP0002.html) extension. This feature allows multiple chunks to be stored in a single file (or object), so users can work with much smaller chunks without increasing the total number of objects in a dataset. Without chunk-sharding, users optimizing for read-heavy applications had a difficult choice: either use a small chunk size, but create a huge number of stored objects, or use a large chunk size, but suffer poor IO for random reads into the data. With chunk-sharding, the number of stored objects is decoupled from the chunk size. Users can safely create very large Zarr arrays with very small chunks without generating a glut of stored objects. For more on how sharding works, see Zarr-Python's [sharding documentation page](https://zarr.readthedocs.io/en/latest/user-guide/arrays.html#sharding). + +The code block below shows off Zarr-Python's new API for creating sharded arrays. Here, the 10,000 chunks of shape (10, 10) are packed into 100 shards of shape (100, 100): + +```python +import numpy as np +import zarr + +arr = zarr.create_array( + "data/example-1.zarr", + dtype="int32", + zarr_format=3, + shape=(1000, 1000), + shards=(100, 100), + chunks=(10, 10), +) +arr[:] = np.random.randint(0, 100, size=(1000, 1000)) +``` + +Note that Zarr-Python 3 maintains read and write support for data stored according to Zarr’s v2 specification. Some features (e.g. sharding) are not available for v2 data. 
Users can set `zarr_format=2` in the top-level API to continue using Zarr v2’s specification. + +### Major performance improvements + +Zarr-Python 3 delivers significant performance improvements across the board. A large part of the refactor focused on making the core of the library fully asynchronous, using Python’s [asyncio](https://docs.python.org/3/library/asyncio.html) library. The new asynchronous core enables efficient I/O operations and better utilization of system resources. This means that multiple I/O operations can be performed concurrently, leading to faster data access and reduced latency, especially when data is stored on high-latency storage backends (like cloud object storage). + +For compute-bound operations (like compression/decompression), Zarr now dispatches to a managed thread pool. Combined with asynchronous IO, this threaded parallelization allows Zarr to take full advantage of the compute resources available when reading and writing data. + +
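The pattern described above (concurrent I/O on an event loop, with compute-bound codec work handed to threads) can be sketched with the standard library alone. This is an illustration of the general technique under stated assumptions, not Zarr-Python's actual implementation: `fetch_chunk`, `read_chunk`, and `read_all` are hypothetical names, `asyncio.sleep` stands in for a cloud storage round trip, and `zlib` stands in for a real codec.

```python
import asyncio
import zlib


async def fetch_chunk(key: str) -> bytes:
    """Stand-in for a high-latency store read (hypothetical, not a Zarr API)."""
    await asyncio.sleep(0.05)  # simulated network round trip
    return zlib.compress(key.encode() * 1000)


async def read_chunk(key: str) -> bytes:
    compressed = await fetch_chunk(key)
    # Hand the compute-bound decompression to a worker thread so the event
    # loop stays free to service the other pending reads.
    return await asyncio.to_thread(zlib.decompress, compressed)


async def read_all(keys: list[str]) -> list[bytes]:
    # Start every read at once so the simulated round trips overlap
    # instead of running one after another.
    return await asyncio.gather(*(read_chunk(key) for key in keys))


keys = [f"c/{i}" for i in range(20)]
chunks = asyncio.run(read_all(keys))
```

A sequential loop over the same 20 keys would wait for each simulated round trip in turn; issuing the reads together lets those waits overlap, which is why this style of design tends to matter most on high-latency backends.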

+ zarr3perf +

Performance analysis of Zarr-Python 3 relative to Zarr-Python 2.18.4. Test wrote and read a 1GB array (shape=(512, 512, 512), chunks=(512, 512, 8), dtype='float64') to and from AWS S3 from a _m6i.4xlarge_ VM in the same region.
+

+ +While early benchmark results look very promising relative to prior versions of Zarr-Python, we have yet to do dedicated performance tuning. Users should expect further performance improvements as Zarr-Python 3 matures. In fact, we are already working on identifying and addressing a number of known performance bottlenecks to further enhance the library's speed and efficiency. + +### Built with extensions in mind + +Zarr-Python 3 is [designed to be highly extensible](https://zarr.readthedocs.io/en/latest/user-guide/extending.html). Key features include: + +- **New `Store` ABC:** A new abstract base class for defining custom storage backends, making it easier to integrate Zarr with various storage systems. This allows for seamless integration with cloud storage solutions, distributed file systems, and other data storage technologies. + + Zarr-Python 3 includes the following stores: + + - `LocalStore` - for reading/writing to a [local file system](https://zarr-specs.readthedocs.io/en/latest/v3/stores/filesystem/v1.0.html) + - `FsspecStore` - for reading/writing to remote/cloud storage (based on [fsspec](https://filesystem-spec.readthedocs.io/)) + - `ZipStore` - for reading/writing to a ZipFile (experimental) + + Additional stores are also in development (like [Earthmover’s](https://earthmover.io/) [Icechunk](https://icechunk.io/icechunk-python/quickstart/) store). + +- **`Codec` and `CodecPipeline` Entrypoints:** Zarr-Python 3 provides [Python entry points](https://packaging.python.org/en/latest/specifications/entry-points/) for defining custom codecs and codec pipelines, enabling flexible data compression and encoding strategies. This empowers developers to tailor data compression and encoding to specific use cases and optimize storage and performance. 
+ + [Numcodecs](https://numcodecs.readthedocs.io/en/stable/zarr3.html) has been adapted to use Zarr’s `Codec` entrypoint system, and [`Zarrs-python`](https://zarrs-python.readthedocs.io/en/latest/) has already developed an experimental Rust-based `CodecPipeline`. + +### Modernized Codebase + +The Zarr-Python 3 codebase has been significantly modernized: + +- **100% Type Hint Coverage:** Comprehensive type hints improve code readability, maintainability, and IDE support. This makes the code easier to understand, debug, and refactor, leading to higher code quality and reduced development time. +- **Cleanly Defined Public/Private API:** A clear distinction between public and private APIs enhances code organization and stability. This ensures that the public API remains stable and consistent, while allowing for flexibility and future development in the private API. +- **Improved Development Environment, CI/CD, and Testing:** A streamlined development workflow, robust CI/CD pipelines, and comprehensive testing ensure high-quality releases. This rigorous development process helps to identify and fix bugs early, leading to more reliable and robust software. + +### Migration from Zarr-Python 2 to 3 + +We have done everything possible to make the migration from Zarr-Python 2 to 3 as easy as possible. The [3.0 migration guide](https://zarr.readthedocs.io/en/latest/user-guide/v3_migration.html) details the parts of the Zarr-Python API that have changed and suggests actions for migration. Additionally, libraries such as [Xarray](https://xarray.dev/) and [Dask](https://www.dask.org/) have already added support for Zarr-Python 3. + +### Conclusion + +Zarr-Python 3.0.0 marks the beginning of a new chapter for the Zarr-Python project. We encourage you to try out this new version and provide feedback. 
We're also excited to see the development of new extensions built on top of this solid foundation, such as [Icechunk](https://icechunk.io), [Zarrs-Python](https://zarrs-python.readthedocs.io), and [VirtualiZarr](https://virtualizarr.readthedocs.io). + +The development of Zarr-Python 3 was a huge effort, spanning over 12 months and including contributions from over 30 contributors. Special thanks to [Davis Bennett](https://github.com/d-v-b) and [Norman Rzepka](https://github.com/normanrz) who helped me kick off the initial refactor in Potsdam, Germany in December 2024. + +**Further reading** + +- [Zarr-Python 3 documentation](https://zarr.readthedocs.io/) +- [Zarr-Python 3 design doc](https://zarr.readthedocs.io/en/latest/developers/roadmap.html) +- [Zarr v3 Specification](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html) + +~Joe Hamman + + diff --git a/_posts/2025-01-09-zarr-python-v3-release.markdown b/_posts/2025-01-09-zarr-python-v3-release.markdown deleted file mode 100644 index 3c60477..0000000 --- a/_posts/2025-01-09-zarr-python-v3-release.markdown +++ /dev/null @@ -1,115 +0,0 @@ ---- -layout: post -title: "Zarr-Python 3.0 is here!" -description: Zarr-Python 3.0 is here! This release brings support for Zarr's v3 specification and major performance improvements. -date: 2025-01-09 -categories: blog -permalink: /zarr-python-v3-release/ ---- - -After more than a year of development, we’re thrilled to announce the release of Zarr-Python 3.0! This major release brings full support for the Zarr v3 specification, including the new chunk-sharding extension, major performance enhancements, and a thoroughly modernized codebase. Whether you use Zarr to managing large multi-dimensional datasets in the cloud or for high-performance machine learning applications, Zarr-Python 3 has something for you. Let’s dive into some of the details of this release! 
- -Zarr-Python is available today on [PyPI](https://pypi.org/project/zarr/) and [Conda-Forge](https://anaconda.org/conda-forge/zarr). It is compatible with Python 3.11 and above. - -```bash -pip install --upgrade zarr -# or -conda install --channel conda-forge zarr -``` - -### Support for Zarr's v3 specification - -The most notable addition in Zarr-Python 3.0 is complete support for Zarr's [v3 specification](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html). The v3 specification brought greater multi-language interoperability and new extension points for customizing Zarr (codecs, chunk grids, data types, and stores). - -Beyond supporting the core v3 specification, Zarr-Python 3.0 also includes support for the [chunk-sharding](https://zarr.dev/zeps/accepted/ZEP0002.html) extension. This feature allows for multiple chunks to be stored in a single file (or object), allowing users to utilize much smaller chunks without increasing the total number of objects in a dataset. Without chunk sharding, users optimizing for read-heavy applications had a difficult choice: either use a small chunk size, but create a huge number of stored objects, or use a large chunk size, but suffer poor IO for random reads into the data. With chunk-sharding, the number of stored objects is decoupled from the chunk size. Users can safely create very large Zarr arrays with very small chunks without generating a glut of stored objects. For more on how sharding works, see the [sharding documentation page](https://zarr.readthedocs.io/en/latest/user-guide/arrays.html#sharding). - -```python -import numpy as np -import zarr - -arr = zarr.create_array( - "data/example-1.zarr", - dtype="int32", - zarr_format=3, - shape=(1000, 1000), - shards=(100, 100), - chunks=(10, 10), -) -arr[:] = np.random.randint(0, 100, size=(1000, 1000)) -``` - -Note that Zarr-Python 3.0 maintains read and write support for data stored according to Zarr’s v2 specification. Some features (e.g. 
sharding) are not available for v2 data. Users can set `zarr_format=2` in the top level API to continue using Zarr v2’s specification. - -### Major performance improvements - -Zarr-Python 3.0 delivers significant performance improvements across the board. A large part of the refactor focused on making the library fully asynchronous, using Python’s [asyncio](https://docs.python.org/3/library/asyncio.html) library. The new asynchronous core enables efficient I/O operations and better utilization of system resources. This means that multiple I/O operations can be performed concurrently, leading to faster data access and reduced latency, especially when data is stored on high-latency storage backends (like cloud object storage). - -For compute bound operations (like compression), Zarr now dispatches to a managed thread pool. Combined with asynchronous IO, this threaded parallelization allows for Zarr to take full advantage of the compute resources available when reading and writing data. - -

- zarr3perf -

Performance analysis of Zarr-Python 3 relative to Zarr-Python 2.18.4. Test wrote and read a 1GB array (shape=(512, 512, 512), chunks=(512, 512, 8), dtype='float64') to and from AWS S3 from a _m6i.4xlarge_ VM in the same region.
-

-
-While we've made significant strides in performance optimization in the 3.0 release, we've done little performance tuning and expect to share more optimizations in future releases. We are actively working on identifying and addressing performance bottlenecks to further enhance the library's speed and efficiency.
-
-### Built with extensions in mind
-
-Zarr-Python 3.0 is [designed to be highly extensible](https://zarr.readthedocs.io/en/latest/user-guide/extending.html). Key features include:
-
-- **New `Store` ABC:** A new abstract base class for defining custom storage backends, making it easier to integrate Zarr with various storage systems. This allows for seamless integration with cloud storage solutions, distributed file systems, and other data storage technologies.
-
- Zarr-Python 3.0 ships with support for the following stores:
- - `LocalStore` - for reading/writing to a [local file system](https://zarr-specs.readthedocs.io/en/latest/v3/stores/filesystem/v1.0.html)
- - `FsspecStore` - for reading/writing to remote/cloud storage (based on [fsspec](https://filesystem-spec.readthedocs.io/))
- - `ZipStore` - for reading/writing to a ZipFile (experimental)
-
- Additional stores are also in development (like [Earthmover’s](https://earthmover.io/) [Icechunk](https://icechunk.io/icechunk-python/quickstart/) store).
-
-- **`Codec` and `CodecPipeline` Entrypoints:** Zarr-Python 3.0 provides [Python entry points](https://packaging.python.org/en/latest/specifications/entry-points/) for defining custom codecs and codec pipelines, enabling flexible data compression and encoding strategies. This empowers users to tailor data compression and encoding to specific use cases and optimize storage and performance.
-
- [Numcodecs](https://numcodecs.readthedocs.io/en/stable/zarr3.html) has been adapted to use Zarr’s `Codec` entrypoint system. And [`Zarrs-python`](https://zarrs-python.readthedocs.io/en/latest/) has already developed an experimental Rust-based `CodecPipeline`.
-
-
-### Modernized Codebase
-
-The Zarr-Python 3.0 codebase has been significantly modernized:
-
-- **100% Type Hint Coverage:** Comprehensive type hints improve code readability, maintainability, and IDE support. This makes the code easier to understand, debug, and refactor, leading to higher code quality and reduced development time.
-- **Cleanly Defined Public/Private API:** A clear distinction between public and private APIs enhances code organization and stability. This ensures that the public API remains stable and consistent, while allowing for flexibility and future development in the private API.
-- **Improved Development Environment, CI/CD, and Testing:** A streamlined development workflow, robust CI/CD pipelines, and comprehensive testing ensure high-quality releases. This rigorous development process helps to identify and fix bugs early, leading to more reliable and robust software.
-
-### Migration from Zarr-Python 2 to 3
-
-We have done everything possible to make the migration from Zarr-Python 2 to 3 as easy as possible. The [3.0 migration guide](https://zarr.readthedocs.io/en/latest/user-guide/v3_migration.html) provides details the parts of the Zarr-Python API that have changed and provides suggested actions for migration. Additionally, libraries such as [Xarray](https://xarray.dev/), [Dask](https://www.dask.org/), have already added support for Zarr-Python 3.
-
-### Conclusion
-
-Zarr-Python 3.0 marks the beginning of a new chapter for the Zarr project. We encourage you to try out this new version and provide feedback. We're also excited to see the development of new extensions built on top of this solid foundation, such as Icechunk and Zarrs-Python.
-
-The development of Zarr-Python 3 was a huge effort, spanning over 12 months and including contributions from over 30 contributors. Special thanks to [Davis Bennett](https://github.com/d-v-b) and [Norman Rzepka](https://github.com/normanrz) who helped me kick off the 3.0 refactor in Potsdam, Germany in December 2024.
-
-**Further reading**
-
-- [Zarr-Python 3 documentation](https://zarr.readthedocs.io/)
-- [Zarr-Python 3 design doc](https://zarr.readthedocs.io/en/latest/developers/roadmap.html)
-- [Zarr V3 Specification](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html)
-
-~Joe Hamman
-
-

From 61884cf1fcd1e5f928306bcc963eb1efcb9d3e38 Mon Sep 17 00:00:00 2001
From: Joseph Hamman
Date: Thu, 9 Jan 2025 05:57:41 -0800
Subject: [PATCH 4/4] fixup

---
 _posts/2025-01-09-zarr-python-3-release.markdown | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_posts/2025-01-09-zarr-python-3-release.markdown b/_posts/2025-01-09-zarr-python-3-release.markdown
index 3b6ebcb..157484f 100644
--- a/_posts/2025-01-09-zarr-python-3-release.markdown
+++ b/_posts/2025-01-09-zarr-python-3-release.markdown
@@ -89,13 +89,13 @@ We have done everything possible to make the migration from Zarr-Python 2 to 3 a
 Zarr-Python 3.0.0 marks the beginning of a new chapter for the Zarr-Python project. We encourage you to try out this new version and provide feedback. We're also excited to see the development of new extensions built on top of this solid foundation, such as [Icechunk](https://icechunk.io), [Zarrs-Python](https://zarrs-python.readthedocs.io), and [VirtualiZarr](https://virtualizarr.readthedocs.io).

-The development of Zarr-Python 3 was a huge effort, spanning over 12 months and including contributions from over 30 contributors. Special thanks to [Davis Bennett](https://github.com/d-v-b) and [Norman Rzepka](https://github.com/normanrz) who helped me kick off the initial refactor in Potsdam, Germany in December 2024.
+The development of Zarr-Python 3 was a huge effort, spanning over 12 months and including contributions from over 30 contributors. Special thanks to [Davis Bennett](https://github.com/d-v-b) and [Norman Rzepka](https://github.com/normanrz) who helped me kick off the initial refactor in Potsdam, Germany in December 2023.

 **Further reading**

 - [Zarr-Python 3 documentation](https://zarr.readthedocs.io/)
 - [Zarr-Python 3 design doc](https://zarr.readthedocs.io/en/latest/developers/roadmap.html)
-- [Zarr V3 Specification](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html)
+- [Zarr v3 specification](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html)

 ~Joe Hamman