Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ValueError when writing 2D seismic data to file after reprojecting coordinates #109

Open
AlexanderJuestel opened this issue Apr 8, 2022 · 9 comments

Comments

@AlexanderJuestel
Copy link

Hello,

I am receiving a ValueError when replacing the original CDP coordinates with reprojected coordinates and then trying to save the data set again.

  • Saving the unchanged segy works without any issues
  • I also tried it with using only 9 characters as in the original data

I have attached the notebook and the data, the data can be freely downloaded from https://www.nlog.nl/index.php/en/node/588

The error occurs when trying to save the data:

from segysak.segy import segy_writer

segy_writer(seisnc = data, 
            segyfile = "../../data/seismic_data/09-02_crs_PostSTM_fisc_EPSG3034.sgy.val", 
            trace_header_map={'cdp':21, 
                              'cdp_x':181, 
                              'cdp_y':185})

Here is the full error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-46-432ff6e75445> in <module>
      1 from segysak.segy import segy_writer
      2 
----> 3 segy_writer(seisnc = data, 
      4             segyfile = "../../data/seismic_data/09-02_crs_PostSTM_fisc_EPSG3034.sgy.val",
      5             trace_header_map={'cdp':21, 

~\Anaconda3\envs\test_gempy\lib\site-packages\segysak\segy\_segy_writer.py in segy_writer(seisnc, segyfile, trace_header_map, il_chunks, dimension, silent, use_text)
    778 
    779     if isinstance(seisnc, xr.Dataset):
--> 780         _segy_writer_input_handler(
    781             seisnc, segyfile, trace_header_map, dimension, silent, None, use_text
    782         )

~\Anaconda3\envs\test_gempy\lib\site-packages\segysak\segy\_segy_writer.py in _segy_writer_input_handler(ds, segyfile, trace_header_map, dimension, silent, il_chunks, text)
    715         thm.unfreeze()
    716         thm.update(trace_header_map)
--> 717         _ncdf2segy_2d(ds, segyfile, **common_kwargs, **thm)
    718     elif ds.seis.is_2dgath():
    719         thm = output_byte_locs("standard_2d_gath")

~\Anaconda3\envs\test_gempy\lib\site-packages\segysak\segy\_segy_writer.py in _ncdf2segy_2d(ds, segyfile, cdp, cdp_x, cdp_y, dimension, text, silent, **thm_args)
    526     spec.tracecount = ni
    527 
--> 528     trace_headers = _build_trace_headers(ds, dimension, order, thm_args, cdp_x, cdp_y)
    529 
    530     # header constants

~\Anaconda3\envs\test_gempy\lib\site-packages\segysak\segy\_segy_writer.py in _build_trace_headers(ds, vert_dimension, other_dim_order, thm_args, cdp_x, cdp_y)
    209         try:
    210             trace_headers[thm_args[var]] = (
--> 211                 ds[var]
    212                 .broadcast_like(trace_layout)
    213                 .transpose(*other_dim_order, transpose_coords=True)

~\Anaconda3\envs\test_gempy\lib\site-packages\xarray\core\dataarray.py in transpose(self, transpose_coords, missing_dims, *dims)
   2308         """
   2309         if dims:
-> 2310             dims = tuple(utils.infix_dims(dims, self.dims, missing_dims))
   2311         variable = self.variable.transpose(*dims)
   2312         if transpose_coords:

~\Anaconda3\envs\test_gempy\lib\site-packages\xarray\core\utils.py in infix_dims(dims_supplied, dims_all, missing_dims)
    790         existing_dims = drop_missing_dims(dims_supplied, dims_all, missing_dims)
    791         if set(existing_dims) ^ set(dims_all):
--> 792             raise ValueError(
    793                 f"{dims_supplied} must be a permuted list of {dims_all}, unless `...` is included"
    794             )

ValueError: ('cdp',) must be a permuted list of ('cdp', 'cdp_x'), unless `...` is included

09-02_crs_PostSTM_fisc.sgy.zip

[09-02_crs_PostSTM_fisc.sgy.zip]

@trhallam
Copy link
Owner

trhallam commented Apr 8, 2022

So I've downloaded your data and tried to run an import export workflow.

The following appears to work for me

seisnc_vol = segy_loader("09-02_crs_PostSTM_fisc.sgy.val", cdp=21, cdpx=181, cdpy=185)
segy_writer(seisnc_vol, "test_out2d_2.sgy.val", {'cdp':21, 'cdp_x':181, 'cdp_y':185})

Can you try removing relative paths, ensure the call to segy_writer is the same as mine here.

This is what the loaded data looks like for me:

<xarray.Dataset>
Dimensions:  (cdp: 2702, twt: 2551)
Coordinates:
  * cdp      (cdp) uint16 1 2 3 4 5 6 7 8 ... 2696 2697 2698 2699 2700 2701 2702
  * twt      (twt) float64 0.0 2.0 4.0 6.0 ... 5.096e+03 5.098e+03 5.1e+03
    cdp_x    (cdp) float32 194.1 194.1 194.1 194.1 ... 220.7 220.7 220.7 220.7
    cdp_y    (cdp) float32 382.5 382.5 382.5 382.5 ... 382.2 382.2 382.2 382.2
Data variables:
    data     (cdp, twt) float32 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0

@trhallam
Copy link
Owner

trhallam commented Apr 8, 2022

Ok, Sorry, I didn't see the notebook. This cell is your issue

data['cdp_x'] = np.around(x_3034, 1)
data['cdp_y'] = np.around(y_3034, 1)

When you assign numpy arrays back to and xarray Dataset it doesn't know how to interpret the dimension of what you are assigning. The assign operation overwrites any variable that was there before, so previous variables are ignored. What xarray does when it doesn't have the dimension specified is it creates a new dimension with the variable name. So after this operation cdp_x exists on the dimension cdp_x instead of cdp.

You can fix this two ways, specify the dimensions in the assignment

data['cdp_x'] = (("cdp",), np.around(x_3034, 1))
data['cdp_y'] = (("cdp",), np.around(y_3034, 1))

Or access the data array directly by indexing

data["cdp_x"][:] = np.around(x_3034, 1)

@AlexanderJuestel
Copy link
Author

Hey,

I tried both ways but when reloading the data just one cell below it is all NaN values even though the coordinates were changed, see screenshot. I also tried changing the trace_header_map but I am still getting NaNs.

I am starting to doubt my Python skills :D

image

@AlexanderJuestel
Copy link
Author

@trhallam I finally managed to convert the coordinates :) I will implement the functionality in GemGIS as it is quite useful for us working with cross-country seismic. I will also reference a notebook here later.

@AlexanderJuestel
Copy link
Author

@trhallam I noticed something weird:

  1. The coordinates from the coordinate transformation vary from the coordinates that the segy data set has:
    Transformed coordinates:
    image
    Appended coordinates:
    image

  2. As you can see, the Y coordinates are just NaNs. Any idea why that is? Maybe an error related to the length of coordinates?

@trhallam
Copy link
Owner

Possibly, you might have to set a coordinate scaler in the attributes of the Dataset before you write it out. If you load a 2d line that someone else has created in the same area, you'll probably see a scalar like -100.

@AlexanderJuestel
Copy link
Author

The coordinates actually fit when reading the data. How would you set the coordinate scalar?

@trhallam
Copy link
Owner

trhallam commented Sep 13, 2023

The problem is not so much the Python side but the SEGY side, there will never be an issue reading in the data, and the coordinates are transformed using the scalar from the trace headers. The trace header standard in SEGY limits the size of a number to 4 bytes. So the biggest number you can store is a 16 bit float I think, while floats in Python are typically 32 or 64 bit. This limits the maximum number of exponent bits you can store. Thus moving them into the fraction component reduces accuracy but typically we don't need more than 2 or 3 decimals for coordinates. Sub-mm accuracy isn't practical for most seismic.

To write the data out you should set the coord_scalar attribute on the dataset. Something like

seismic.coord_scalar = -100

before you write out the data. The scalar follows the SEGY conventions where -100 would mean coord/100.

You can see if this attribute is set to something on the data which you read in previously.

print(readin_seismic.coord_scalar)

@AlexanderJuestel
Copy link
Author

Yes, that worked! It was not

seismic.coord_scalar = -100

but

seismic.attrs['coord_scalar'] = -100

that did the trick. Thanks for the prompt help! I hope everything works now :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants