# SEG-Y to Vector DataFrames and Back

Because `segysak` is built on `xarray`, vectorising 3D SEG-Y data and returning it to SEG-Y is straightforward: `xarray` and `pandas` convert readily between a `Dataset` and a `DataFrame`.

## Loading Data

We start by loading data normally using the `segy_loader` utility. For this example we will use the Volve example sub-cube.

In [1]:
import pathlib
from IPython.display import display
from segysak.segy import segy_loader, well_known_byte_locs, segy_writer

volve_3d_path = pathlib.Path("data/volve10r12-full-twt-sub3d.sgy")
print("3D", volve_3d_path.exists())

volve_3d = segy_loader(volve_3d_path, **well_known_byte_locs("petrel_3d"))

3D True


Loading as 3D
Fast direction is TRACE_SEQUENCE_FILE



## Vectorisation

Once the data is loaded, the `Dataset` can be converted directly to a `pandas.DataFrame`. The DataFrame has a multi-index (`iline`, `xline`, `twt`) and one column for each variable in the loaded dataset: the seismic amplitude as `data`, plus the `cdp_x` and `cdp_y` locations. If you only need part of the volume, use `xarray` selection methods before converting to a DataFrame.

In [2]:
volve_3d_df = volve_3d.to_dataframe()
display(volve_3d_df)

| iline | xline | twt | data | cdp_x | cdp_y |
|---|---|---|---|---|---|
| 10090 | 2150 | 4.0 | 0.020575 | 436400.500 | 6477447.0 |
| 10090 | 2150 | 8.0 | 0.022041 | 436400.500 | 6477447.0 |
| 10090 | 2150 | 12.0 | 0.019659 | 436400.500 | 6477447.0 |
| 10090 | 2150 | 16.0 | 0.025421 | 436400.500 | 6477447.0 |
| 10090 | 2150 | 20.0 | 0.025436 | 436400.500 | 6477447.0 |
| ... | ... | ... | ... | ... | ... |
| 10150 | 2351 | 3384.0 | 0.000000 | 434144.125 | 6478782.5 |
| 10150 | 2351 | 3388.0 | 0.000000 | 434144.125 | 6478782.5 |
| 10150 | 2351 | 3392.0 | 0.000000 | 434144.125 | 6478782.5 |
| 10150 | 2351 | 3396.0 | 0.000000 | 434144.125 | 6478782.5 |
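
The selection step mentioned above can be sketched on a small synthetic cube. This is a minimal illustration and not the Volve data: the `cube` dataset, its coordinate values and the chosen ranges are all made up for the example, and only standard `xarray` calls (`sel`, `to_dataframe`) are used.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a loaded 3D volume (hypothetical values, not Volve).
rng = np.random.default_rng(0)
cube = xr.Dataset(
    {"data": (("iline", "xline", "twt"), rng.normal(size=(5, 4, 10)))},
    coords={
        "iline": np.arange(100, 105),
        "xline": np.arange(200, 204),
        "twt": np.arange(0.0, 40.0, 4.0),
    },
)

# Label-based selection; slice end points are inclusive in xarray.
sub = cube.sel(iline=slice(101, 103), twt=slice(8.0, 20.0))

# Only the selected samples are vectorised: 3 ilines x 4 xlines x 4 samples.
sub_df = sub.to_dataframe()
print(sub_df.shape)  # (48, 1)
```

Selecting first keeps the DataFrame small, which matters because every sample in the volume becomes one row.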


We can remove the multi-index by resetting the index of the DataFrame, which turns `iline`, `xline` and `twt` into ordinary columns. Vectorised workflows such as machine learning can then be applied directly to the flat DataFrame.

In [3]:
volve_3d_df_reindex = volve_3d_df.reset_index()
display(volve_3d_df_reindex)

|  | iline | xline | twt | data | cdp_x | cdp_y |
|---|---|---|---|---|---|---|
| 0 | 10090 | 2150 | 4.0 | 0.020575 | 436400.500 | 6477447.0 |
| 1 | 10090 | 2150 | 8.0 | 0.022041 | 436400.500 | 6477447.0 |
| 2 | 10090 | 2150 | 12.0 | 0.019659 | 436400.500 | 6477447.0 |
| 3 | 10090 | 2150 | 16.0 | 0.025421 | 436400.500 | 6477447.0 |
| 4 | 10090 | 2150 | 20.0 | 0.025436 | 436400.500 | 6477447.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 10473695 | 10150 | 2351 | 3384.0 | 0.000000 | 434144.125 | 6478782.5 |
| 10473696 | 10150 | 2351 | 3388.0 | 0.000000 | 434144.125 | 6478782.5 |
| 10473697 | 10150 | 2351 | 3392.0 | 0.000000 | 434144.125 | 6478782.5 |
| 10473698 | 10150 | 2351 | 3396.0 | 0.000000 | 434144.125 | 6478782.5 |
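
As a rough illustration of what a vectorised workflow on the flat DataFrame might look like, the sketch below standardises the amplitude column and computes a per-trace RMS amplitude. The frame is synthetic, and the column names `data_z` and `rms` are hypothetical names chosen for this example; they are not produced by `segysak`.

```python
import numpy as np
import pandas as pd

# Small synthetic frame with the same column layout as the flat DataFrame.
il, xl, tw = np.meshgrid(
    [10, 11], [20, 21, 22], [0.0, 4.0, 8.0, 12.0], indexing="ij"
)
flat = pd.DataFrame({"iline": il.ravel(), "xline": xl.ravel(), "twt": tw.ravel()})
flat["data"] = np.sin(flat["twt"] / 4.0) + 0.1 * flat["iline"]

# Column-wise standardisation (z-score) over every sample at once.
flat["data_z"] = (flat["data"] - flat["data"].mean()) / flat["data"].std()

# Per-trace RMS amplitude: group the samples of each (iline, xline) pair.
rms = (
    flat.groupby(["iline", "xline"])["data"]
    .apply(lambda s: np.sqrt(np.mean(s**2)))
    .rename("rms")
    .reset_index()
)
print(rms)
```

New columns such as `data_z` stay aligned with `iline`, `xline` and `twt`, so they can be carried back into a `Dataset` in the same way as `data`.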


## Return to Xarray

The DataFrame can be converted back to a `Dataset` for output to SEG-Y. To do this, the multi-index must first be restored with `set_index`; `pandas` then provides the `to_xarray` method.

In [4]:
volve_3d_df_multi = volve_3d_df_reindex.set_index(["iline", "xline", "twt"])
display(volve_3d_df_multi)
volve_3d_ds = volve_3d_df_multi.to_xarray()
display(volve_3d_ds)

| iline | xline | twt | data | cdp_x | cdp_y |
|---|---|---|---|---|---|
| 10090 | 2150 | 4.0 | 0.020575 | 436400.500 | 6477447.0 |
| 10090 | 2150 | 8.0 | 0.022041 | 436400.500 | 6477447.0 |
| 10090 | 2150 | 12.0 | 0.019659 | 436400.500 | 6477447.0 |
| 10090 | 2150 | 16.0 | 0.025421 | 436400.500 | 6477447.0 |
| 10090 | 2150 | 20.0 | 0.025436 | 436400.500 | 6477447.0 |
| ... | ... | ... | ... | ... | ... |
| 10150 | 2351 | 3384.0 | 0.000000 | 434144.125 | 6478782.5 |
| 10150 | 2351 | 3388.0 | 0.000000 | 434144.125 | 6478782.5 |
| 10150 | 2351 | 3392.0 | 0.000000 | 434144.125 | 6478782.5 |
| 10150 | 2351 | 3396.0 | 0.000000 | 434144.125 | 6478782.5 |
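
One way to check that the round trip preserves the samples is to compare the restored array with the original. The sketch below does this on a tiny synthetic dataset (the names `original`, `flat` and `restored` are illustrative only), assuming the volume is a complete grid with no missing traces.

```python
import numpy as np
import xarray as xr

dims = ("iline", "xline", "twt")
original = xr.Dataset(
    {"data": (dims, np.arange(24, dtype="float32").reshape(2, 3, 4))},
    coords={"iline": [10, 11], "xline": [20, 21, 22], "twt": [0.0, 4.0, 8.0, 12.0]},
)

# Dataset -> flat DataFrame -> multi-index DataFrame -> Dataset.
flat = original.to_dataframe().reset_index()
restored = flat.set_index(list(dims)).to_xarray()

# Make the dimension order explicit before comparing.
restored = restored.transpose(*dims)

print(restored["data"].dims)
print(np.array_equal(restored["data"].values, original["data"].values))
```

Rows dropped from the DataFrame before the conversion would come back as `NaN` in the restored `Dataset`, so a check like this is most useful after filtering or merging steps.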


The resulting dataset needs a few changes before it can be exported to SEG-Y again.
First, the attributes need to be set. The simplest way is to copy them from the original SEG-Y input; otherwise they can be set manually. `segysak` specifically needs the `sample_rate` and `coord_scalar` attributes.

In [5]:
volve_3d_ds.attrs = volve_3d.attrs
display(volve_3d_ds.attrs)

{'ns': None,
 'sample_rate': 4.0,
 'text': Text HeaderC 1 SEGY OUTPUT FROM Petrel 2017.2 Saturday, June 06 2020 10:15:00
 C 2 Name: ST10010ZDC12-PZ-PSDM-KIRCH-FULL-T.MIG_FIN.POST_STACK.3D.JS-017534
 ÝCroC 3
 C 4 First inline: 10090  Last inline: 10150
 C 5 First xline:  2150   Last xline:  2351
 C 6 CRS: ED50-UTM31 ("MENTOR:ED50-UTM31:European 1950 Based UTM, Zone 31 North,
 C 7 X min: 433955.09 max: 436589.56 delta: 2634.47
 C 8 Y min: 6477439.46 max: 6478790.23 delta: 1350.77
 C 9 Time min: -3402.00 max: -2.00 delta: 3400.00
 C10 Lat min: 58.25'52.8804"N max: 58.26'37.9493"N delta: 0.00'45.0689"
 C11 Long min: 1.52'7.1906"E max: 1.54'50.9616"E delta: 0.02'43.7710"
 C12 Trace min: -3400.00 max: -4.00 delta: 3396.00
 C13 Seismic (template) min: -58.55 max: 54.55 delta: 113.10
 C14 Amplitude (data) min: -58.55 max: 54.55 delta: 113.10
 C15 Trace sample format: IEEE floating point
 C16 Coordinate scale factor: 100.00000
 C17
 C18 Binary header locations:
 C19 Sample interval             
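
If the original attributes are not available, setting them manually might look like the sketch below. It runs on a synthetic dataset, derives `sample_rate` from the spacing of the `twt` coordinate, and uses a placeholder for `coord_scalar`: the value `1.0` is an assumption for illustration only and should be replaced with the scalar that applies to your data.

```python
import numpy as np
import xarray as xr

# Synthetic dataset standing in for the result of to_xarray().
ds = xr.Dataset(
    {"data": (("iline", "xline", "twt"), np.zeros((2, 2, 5)))},
    coords={"iline": [1, 2], "xline": [1, 2], "twt": np.arange(4.0, 24.0, 4.0)},
)

# Derive the sample interval from the vertical coordinate,
# after checking that the sampling is regular.
steps = np.diff(ds["twt"].values)
if not np.allclose(steps, steps[0]):
    raise ValueError("twt is not regularly sampled")
ds.attrs["sample_rate"] = float(steps[0])

# Placeholder value (assumption): use the scalar from your own source file.
ds.attrs["coord_scalar"] = 1.0

print(ds.attrs)
```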

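Because `to_xarray` places every DataFrame column on all three index dimensions, `cdp_x` and `cdp_y` come back as 3D variables. They must be reduced to 2D by collapsing the vertical axis `twt` and then set as coordinates.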

In [6]:
volve_3d_ds["cdp_x"] = volve_3d_ds["cdp_x"].mean(dim=["twt"])
volve_3d_ds["cdp_y"] = volve_3d_ds["cdp_y"].mean(dim=["twt"])
volve_3d_ds = volve_3d_ds.set_coords(["cdp_x", "cdp_y"])
volve_3d_ds
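
The effect of this step can be seen on a small synthetic dataset. In the sketch below (all values are made up), the positions are constant down each trace, so the mean over `twt` recovers the original 2D grid. Taking the first sample with `isel(twt=0, drop=True)` would be an alternative that avoids averaging altogether.

```python
import numpy as np
import xarray as xr

dims = ("iline", "xline", "twt")
shape = (2, 3, 4)

# Hypothetical 2D trace positions, repeated along twt as to_xarray() would return them.
x2d = np.array([[100.0, 110.0, 120.0], [101.0, 111.0, 121.0]])
y2d = np.array([[500.0, 500.0, 500.0], [510.0, 510.0, 510.0]])
ds = xr.Dataset(
    {
        "data": (dims, np.ones(shape)),
        "cdp_x": (dims, np.broadcast_to(x2d[:, :, None], shape).copy()),
        "cdp_y": (dims, np.broadcast_to(y2d[:, :, None], shape).copy()),
    },
    coords={"iline": [10, 11], "xline": [20, 21, 22], "twt": [0.0, 4.0, 8.0, 12.0]},
)

# Collapse the vertical axis, then promote the positions to coordinates.
for name in ("cdp_x", "cdp_y"):
    ds[name] = ds[name].mean(dim="twt")
ds = ds.set_coords(["cdp_x", "cdp_y"])

print(ds["cdp_x"].dims)
print(list(ds.data_vars))
```

After `set_coords`, `data` is the only remaining data variable and the positions travel with it as 2D coordinates on (`iline`, `xline`).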

Afterwards, use the `segy_writer` utility as normal to write the dataset back to SEG-Y.

In [7]:
segy_writer(volve_3d_ds, "test.segy")

