As the number of file-based Datasets grows, code duplication is starting to appear. The biggest area of duplication is compression support. There are two types of compression:
ZLIB/GZIP, where the file holds a single compressed entry
ZIP, where the file holds multiple entries (e.g., an npz file is essentially a ZIP archive).
Compression can get complicated in its own right, for example with recursive (nested) compression. The goal of tensorflow-io, though, is to support formats that are commonly used in the machine learning community, so handling one level of compression is enough.
We should rework Dataset to provide a CompressedFileDataset-like abstraction.
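To make the idea concrete, here is a rough Python sketch of what a shared "one level of compression" helper could look like. It is not tensorflow-io code: `iter_entries` is a hypothetical name, the sketch uses only the Python standard library, and it assumes the whole file fits in memory. Raw ZLIB streams are not detected here and fall through as plain data.

```python
import gzip
import io
import zipfile
from typing import Iterator, Tuple

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every GZIP stream


def iter_entries(data: bytes, name: str = "data") -> Iterator[Tuple[str, bytes]]:
    """Yield (entry_name, decompressed_bytes), unwrapping one level only.

    Hypothetical helper for illustration; nested archives are not unpacked.
    """
    if data.startswith(GZIP_MAGIC):
        # GZIP: a single compressed entry, so reuse the caller's name.
        yield name, gzip.decompress(data)
    elif zipfile.is_zipfile(io.BytesIO(data)):
        # ZIP: multiple entries (an npz file lands here, one entry per array).
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if not info.is_dir():
                    yield info.filename, archive.read(info)
    else:
        # Not compressed (or an undetected format): pass through unchanged.
        yield name, data
```

Each file-based Dataset could then consume `(name, bytes)` pairs without caring whether the source was GZIP, ZIP, or uncompressed, which is where the duplication currently lives.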
Looks like https://github.com/libarchive/libarchive could be a decent choice for handling compression. I had some initial success with the CIFAR dataset and will create a PR soon with an initial check-in.