Adding "G5repack" (#21)
tdegeus committed Apr 5, 2020
1 parent 8feb764 commit 19b7135
Showing 7 changed files with 157 additions and 59 deletions.
1 change: 1 addition & 0 deletions .appveyor.yml
@@ -42,4 +42,5 @@ build_script:
- python test/cli/G5compare.py
- python test/cli/G5list.py
- python test/cli/G5repair.py
- python test/cli/G5repack.py

1 change: 1 addition & 0 deletions .travis.yml
@@ -15,4 +15,5 @@ script:
- python test/cli/G5list.py
- python test/cli/G5print.py
- python test/cli/G5repair.py
- python test/cli/G5repack.py

2 changes: 1 addition & 1 deletion GooseHDF5/__init__.py
@@ -3,7 +3,7 @@
 warnings.filterwarnings("ignore")


-__version__ = '0.5.1'
+__version__ = '0.6.0'


 def abspath(path):
47 changes: 47 additions & 0 deletions GooseHDF5/cli/G5repack.py
@@ -0,0 +1,47 @@
'''G5repack
    Read and write a HDF5 file, to write it more efficiently by removing features like
    extendible datasets.

Usage:
    G5repack [options] <source>...

Arguments:
    <source>    HDF5-file.

Options:
    -h, --help      Show help.
        --version   Show version.

(c - MIT) T.W.J. de Geus | tom@geus.me | www.geus.me | github.com/tdegeus/GooseHDF5
'''

from .. import getpaths
from .. import __version__
import docopt
import h5py
import os
import tempfile
import warnings
warnings.filterwarnings("ignore")

def check_isfile(fname):
    if not os.path.isfile(fname):
        raise IOError('"{0:s}" does not exist'.format(fname))

def main():

    args = docopt.docopt(__doc__, version=__version__)
    tempname = next(tempfile._get_candidate_names())

    for filename in args['<source>']:

        print(filename)

        check_isfile(filename)

        with h5py.File(filename, 'r') as source:
            with h5py.File(tempname, 'w') as tmp:
                for path in getpaths(source):
                    tmp[path] = source[path][...]

        os.replace(tempname, filename)
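A note on the temporary file in `main()` above: the name comes from `tempfile._get_candidate_names()`, which is a private helper of the standard library, and the file is created in the current working directory. The sketch below shows one possible alternative that relies only on public calls. `replace_via_tempfile` and its `write` callback are hypothetical names used for illustration and are not part of GooseHDF5. It assumes that placing the temporary file next to the target keeps `os.replace` on a single filesystem.

```python
import os
import tempfile


def replace_via_tempfile(filename, write):
    """Call write(tmp_path), then move tmp_path over filename.

    Hypothetical helper for illustration, not part of GooseHDF5.
    """
    # Create the temporary file in the directory of the target, so that
    # os.replace does not have to cross a filesystem boundary.
    directory = os.path.dirname(os.path.abspath(filename))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    os.close(fd)  # only the path is needed; the writer opens it itself

    try:
        write(tmp_path)
        os.replace(tmp_path, filename)
    except BaseException:
        # On failure the original file is left untouched and the
        # temporary file is cleaned up.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In the context of this commit, `write` would open `tmp_path` with `h5py.File(tmp_path, 'w')` and copy the datasets. One trade-off to keep in mind: `mkstemp` creates the file with permissions restricted to the current user, so the replaced file may end up with different permissions than the original.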
140 changes: 82 additions & 58 deletions docs/tools.rst
@@ -12,20 +12,20 @@ G5check

 .. code-block:: none

-G5check
-    Try reading datasets. In case of reading failure the path is printed (otherwise nothing is
-    printed).
+    G5check
+        Try reading datasets. In case of reading failure the path is printed (otherwise nothing is
+        printed).

-Usage:
-    G5check <source> [options]
+    Usage:
+        G5check <source> [options]

-Arguments:
-    <source>    HDF5-file.
+    Arguments:
+        <source>    HDF5-file.

-Options:
-    -b, --basic     Only try getting a list of datasets, skip trying to read them.
-    -h, --help      Show help.
-        --version   Show version.
+    Options:
+        -b, --basic     Only try getting a list of datasets, skip trying to read them.
+        -h, --help      Show help.
+            --version   Show version.

 G5list
 ------
@@ -34,20 +34,23 @@ G5list

 .. code-block:: none

-G5list
-    List datasets (or groups of datasets) in a HDF5-file.
+    G5list
+        List datasets (or groups of datasets) in a HDF5-file.

-Usage:
-    G5list [options] [--fold ARG]... <source>
+    Usage:
+        G5list [options] [--fold ARG]... <source>

-Arguments:
-    <source>    HDF5-file.
+    Arguments:
+        <source>    HDF5-file.

-Options:
-    -f, --fold=ARG        Fold paths.
-    -d, --max-depth=ARG   Maximum depth to display.
-    -h, --help            Show help.
-        --version         Show version.
+    Options:
+        -f, --fold=ARG        Fold paths.
+        -d, --max-depth=ARG   Maximum depth to display.
+        -r, --root=ARG        Start at a certain point in the path-tree. [default: /]
+        -i, --info            Print information: shape, dtype.
+        -l, --long            As above but with all attributes.
+        -h, --help            Show help.
+            --version         Show version.

 G5print
 -------
@@ -56,21 +59,21 @@ G5print

 .. code-block:: none

-G5print
-    Print datasets in a HDF5-file.
+    G5print
+        Print datasets in a HDF5-file.

-Usage:
-    G5print [options] <source> <dataset>...
+    Usage:
+        G5print [options] <source> <dataset>...

-Arguments:
-    <source>    HDF5-file.
-    <dataset>   Path to the dataset.
+    Arguments:
+        <source>    HDF5-file.
+        <dataset>   Path to the dataset.

-Options:
-    -r, --regex     Evaluate dataset name as a regular expression.
-        --info      Print information: shape, dtype.
-    -h, --help      Show help.
-        --version   Show version.
+    Options:
+        -r, --regex     Evaluate dataset name as a regular expression.
+        -i, --info      Print information: shape, dtype.
+        -h, --help      Show help.
+            --version   Show version.

 G5repair
 --------
@@ -79,20 +82,20 @@ G5repair

 .. code-block:: none

-G5repair
-    Extract readable data from a HDF5-file and copy it to a new HDF5-file.
+    G5repair
+        Extract readable data from a HDF5-file and copy it to a new HDF5-file.

-Usage:
-    G5repair [options] <source> <destination>
+    Usage:
+        G5repair [options] <source> <destination>

-Arguments:
-    <source>        Source HDF5-file, possibly containing corrupted data.
-    <destination>   Destination HDF5-file.
+    Arguments:
+        <source>        Source HDF5-file, possibly containing corrupted data.
+        <destination>   Destination HDF5-file.

-Options:
-    -f, --force     Force continuation, overwrite existing files.
-    -h, --help      Show help.
-        --version   Show version.
+    Options:
+        -f, --force     Force continuation, overwrite existing files.
+        -h, --help      Show help.
+            --version   Show version.

 G5compare
 ---------
@@ -101,19 +104,40 @@ G5compare

 .. code-block:: none

-G5compare
-    Compare two HDF5 files. If the function does not output anything all datasets are present in both
-    files, and all the content of the datasets is equal.
+    G5compare
+        Compare two HDF5 files. If the function does not output anything all datasets are present in both
+        files, and all the content of the datasets is equal.

-Usage:
-    G5compare [options] [--renamed ARG]... <source> <other>
+    Usage:
+        G5compare [options] [--renamed ARG]... <source> <other>

-Arguments:
-    <source>    HDF5-file.
-    <other>     HDF5-file.
+    Arguments:
+        <source>    HDF5-file.
+        <other>     HDF5-file.

-Options:
-    -r, --renamed=ARG   Renamed paths, separated by a separator (see below).
-    -s, --ifs=ARG       Separator used to separate renamed fields. [default: :]
-    -h, --help          Show help.
-        --version       Show version.
+    Options:
+        -r, --renamed=ARG   Renamed paths, separated by a separator (see below).
+        -s, --ifs=ARG       Separator used to separate renamed fields. [default: :]
+        -h, --help          Show help.
+            --version       Show version.
+
+G5repack
+---------
+
+[:download:`G5repack <../GooseHDF5/cli/G5repack.py>`]
+
+.. code-block:: none
+
+    G5repack
+        Read and write a HDF5 file, to write it more efficiently by removing features like
+        extendible datasets.
+
+    Usage:
+        G5repack [options] <source>...
+
+    Arguments:
+        <source>    HDF5-file.
+
+    Options:
+        -h, --help      Show help.
+            --version   Show version.
1 change: 1 addition & 0 deletions setup.py
@@ -25,4 +25,5 @@
'G5compare = GooseHDF5.cli.G5compare:main',
'G5list = GooseHDF5.cli.G5list:main',
'G5print = GooseHDF5.cli.G5print:main',
'G5repack = GooseHDF5.cli.G5repack:main',
'G5repair = GooseHDF5.cli.G5repair:main']})
24 changes: 24 additions & 0 deletions test/cli/G5repack.py
@@ -0,0 +1,24 @@
import subprocess
import os
import h5py
import numpy as np

def run(cmd):
    out = list(filter(None, subprocess.check_output(cmd, shell=True).decode('utf-8').split('\n')))
    return out

a = np.random.random(5)

with h5py.File('a.hdf5', 'w') as source:
    source['/a'] = a

output = run('G5repack a.hdf5')

with h5py.File('a.hdf5', 'r') as source:
    b = source['/a'][...]

os.remove('a.hdf5')

if not np.all(a == b):
    raise IOError('Test failed')
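
The test above only checks that the data survives a round trip through a plain dataset. The sketch below illustrates the "removing features like extendible datasets" part of the tool description. It does not call the `G5repack` command. Instead, `repack` is a hypothetical stand-in for the copy loop, using h5py's `visititems` in place of `getpaths`, and the file names are chosen for illustration only. It assumes h5py and NumPy are installed.

```python
import os
import tempfile

import h5py
import numpy as np


def repack(src_name, dst_name):
    """Copy every dataset of src_name into a new file dst_name.

    Hypothetical stand-in for the copy loop of G5repack; attributes and
    empty groups are not copied.
    """
    with h5py.File(src_name, 'r') as source, h5py.File(dst_name, 'w') as dest:

        def copy(name, obj):
            # Reading with [...] yields a plain NumPy array; assigning it
            # creates a fixed-size dataset with the default layout.
            if isinstance(obj, h5py.Dataset):
                dest[name] = obj[...]

        source.visititems(copy)


with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, 'extendible.hdf5')
    dst = os.path.join(tmpdir, 'repacked.hdf5')
    a = np.random.random(5)

    with h5py.File(src, 'w') as f:
        # maxshape=(None,) makes the dataset extendible, which requires
        # chunked storage.
        f.create_dataset('/a', data=a, maxshape=(None,))

    repack(src, dst)

    # Record (is_chunked, maxshape) before and after repacking.
    with h5py.File(src, 'r') as f:
        before = (f['/a'].chunks is not None, f['/a'].maxshape)

    with h5py.File(dst, 'r') as f:
        after = (f['/a'].chunks is not None, f['/a'].maxshape)
        b = f['/a'][...]

if not np.array_equal(a, b):
    raise IOError('Test failed')
```

Here `before` describes a chunked dataset with an unlimited maximum shape, while `after` describes a dataset that is no longer chunked and whose maximum shape equals its current shape.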
