Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zgy_writer #117

Closed
ksurf1 opened this issue May 22, 2023 · 8 comments
Closed

zgy_writer #117

ksurf1 opened this issue May 22, 2023 · 8 comments

Comments

@ksurf1
Copy link

ksurf1 commented May 22, 2023

Hi, great job on the segysak package !

It might not be the best venue to write this since it is not a real issue with the existing code.

I would like to harvest the ZGY compression options and write 16bit ZGY files.
Having worked in the past with ZGY files on the development side, there are variable like bias and factor to set in the metadata in order to be able to write data as int16 or int8. (float32 = int16*factor+bias). Also there is a "brick" size which is by default [64,64,64] to compress the data: if all the data values are identical in one brick, only one value is dumped to disk not 64^3. This compresses the file size a lot when portion of the data is 0.0 for instance.

Right now your code writes data as float32 uncompressed if I am not mistaken.

Can you give me some pointers to modify your code to achieve the best compression possible ?

Much obliged.

Would this work in your code ?

def zgy_writer(seisnc_dataset, filename, datatype = "float", datarange=(-2**15, 2**15-1), dimension=None):
    """Write a seisnc dataset to ZGY file format.

    Args:
        seisnc_dataset (xr.Dataset): This should be a seisnc dataset.
        filename (pathlib.Path/str): The filename to write to.
        datatype  (str): "float", "int16" or "int8"
        datarange (float,float): (-1500.0, 1500.0)
        dimension (str, optional): The dimension to write out, if
            None uses available dimenions twt first.
    """
    assert seisnc_dataset.seis.is_3d()

    seisnc_dataset.seis.calc_corner_points()
    if seisnc_dataset.corner_points_xy:
        corners = tuple(seisnc_dataset.corner_points_xy[i] for i in [0, 3, 1, 2])
    else:
        corners=[(0, 0), (0, 0), (0, 0), (0, 0)]

    dimension = _check_dimension(seisnc_dataset, dimension)
    if dimension == "twt":
        zunitdim = UnitDimension(2001)
    elif dimension == "depth":
        zunitdim = UnitDimension(2002)
    else:
        zunitdim = UnitDimension(2000)
    try:
        coord_scalar_mult = seisnc_dataset.attrs["coord_scalar_mult"]
    except KeyError:
        coord_scalar_mult = 1.0

    # dimensions
    ni, nj, nk = (
        seisnc_dataset.dims[CoordKeyField.iline],
        seisnc_dataset.dims[CoordKeyField.xline],
        seisnc_dataset.dims[dimension],
    )

    # vertical
    z0 = int(seisnc_dataset[dimension].values[0])
    dz = int(seisnc_dataset.sample_rate)

    # annotation
    il0 = seisnc_dataset[CoordKeyField.iline].values[0]
    xl0 = seisnc_dataset[CoordKeyField.xline].values[0]
    dil = int(np.median(np.diff(seisnc_dataset[CoordKeyField.iline].values)))
    dxl = int(np.median(np.diff(seisnc_dataset[CoordKeyField.xline].values)))

    dtype_float = SampleDataType[datatype]
    
    if datatype == "float":
        datarange = (-float('inf'),float('inf'))
    

    with ZgyWriter(
        str(filename),
        size=(ni, nj, nk),
        datatype=dtype_float,
        datarange=datarange,
        zunitdim=zunitdim,
        zstart=z0,
        zinc=dz,
        annotstart=(il0, xl0),
        annotinc=(dil, dxl),
        corners=corners,
    ) as writer:

        order = (CoordKeyField.iline, CoordKeyField.xline, dimension)

        writer.write(
            (0, 0, 0),
            seisnc_dataset[VariableKeyField.data]
            .transpose(*order)
            .values.astype(np.float32),
        )


V

@ksurf1
Copy link
Author

ksurf1 commented May 23, 2023

this code works. I would like to be able to set a compressor and lodcompressor

Does anyone have a syntax for using the compressor arguments to Writer ?

@trhallam
Copy link
Owner

Thanks for your feedback.

Ideally this code needs to be migrated to use pyzgy on the backend and as a natural xarray backend but that is low priority with focus being SEGY.

You can see the openzgy code in the pyzgy library which is a fork of the code from the OSDU implementation. It also uses the Python API reference implementation which does not natively support compression. Additional instructions must be followed on their website to install the ZFP compressor.

Otherwise:

  • blocks are [64,64,64] by default.
  • references to output type are given here

To change the output datatype, in the segysak code you could add additional checks for the datatype to select the correct format to output to ZgyWriter.

@ksurf1
Copy link
Author

ksurf1 commented May 25, 2023 via email

@da-wad
Copy link

da-wad commented May 28, 2023

@ksurf1 depending on your aims here, you may find converting your SEG-Y files to either the fully opensource seismic-zfp or Bluware's nearly-open compressed VDS format helps you further with disk space. Either of these could be read through segysak (writing would take effort) since they have segyio-like reading syntax available (pyvds for VDS). However, if you have more than half the volume as a constant value you'd win more from ZGY/VDS than seismic-zfp as the latter favours regularity in the data over identifying constant-value bricks in order to make offsets computable rather than requiring a disk location lookup for bricks.

@ksurf1
Copy link
Author

ksurf1 commented May 28, 2023 via email

@ksurf1
Copy link
Author

ksurf1 commented May 30, 2023 via email

@ksurf1
Copy link
Author

ksurf1 commented May 30, 2023 via email

@trhallam
Copy link
Owner

I've moved your second issue to #119 please put separate questions in separate issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants