<img src="swung_logo_vector.png" alt="swung" style="width: 40%;"/>

# T21 Segysak Tutorial - Tony Hallam, April 2021



Welcome!

**Admin:**
  - Get the tutorial and data on Github (https://github.com/trhallam/segysak-t21-tutorial)
  - Run the tutorial on binder.
  - Limited interaction during the video but talk to the `segysak` experts in GatherTown and on the Tutorial Slack Channel

# Introduction

## What is `segysak`?

## Tour

## `segysak` vs X

## SEG-Y Files

## `segysak`

Make SEG-Y data easily accessible and creatable from Python

Simply put, `segysak` has grown as a set of tools to make SEG-Y data easily accessible and creatable from Python.
It leverages a number of existing libraries but brings them together to try and improve the user experience, and
to remove as much boilerplate code as possible when dealing with SEG-Y.

The project started about a year ago at Transform 2020. Most of the work was done during that hackathon but it has
continued to develop since then with gradual improvements, bug-fixes and user support.

Although I'm the project owner and one of the primary users of `segysak` (I use it in a lot of my PhD projects),
it is open for the subsurface community not only to utilise but also to contribute to, so that it can grow to meet people's needs.

I'd strongly encourage anyone with ideas and/or enthusiasm for changes or additions to get in touch so we can improve `segysak` for everyone.

## Tour

 - Github (source code, issues, contributions) - https://github.com/trhallam/segysak
 - Documentation (help, examples, API) - https://segysak.readthedocs.io/en/latest/
 - Slack (help, discussion, ideas, contributions) - https://swung.slack.com/messages/segysak/

Everything you need to know about `segysak` is available online. There is the Github repository where we manage the source code for the library and distribute the packages for installation via pip. 

There is also an issue tracker where you can raise bugs/problems or submit ideas or suggestions. It's also a good place to look for things that need doing if you want to help out.

We then have the documentation on readthedocs. Here you will find more detailed help, examples (which are available as Notebooks) and the API (of function and member descriptions). This is a really useful place to come if you are stuck, or
need more detail because we cover a lot of the basics in the documentation. Indeed this workshop is heavily influenced
by the first few example notebooks you can find here.

Finally, we have the Slack forum hosted on swung.slack. This space is always open for people to ask questions or get help, even drop by just for a bit of discussion.

## `segysak` versus X

A lot of the time I get asked about segysak versus X in the Python world, where does it fit in?
The reality is, segysak doesn't so much compete with any part of the scientific stack but tries to form bridges over
the common space we often have to traverse. For example:

### `segyio`

 - `segysak` relies on `segyio` but abstracts a lot of the low level detail

segysak couldn't exist without segyio - segyio does all the hard work of interacting with the actual SEG-Y and segysak tries to make segyio a bit more accessible by providing a direct link between it and easy to use libraries like xarray.

### `xarray`

 - `segysak` extends `xarray` to make it easier to deal with SEG-Y files

Things like loading and writing of files are more automated. It tries to take care of tracking things like headers and attributes for you.

Also includes extensions for common seismic related tasks.

## SEG-Y Files

File format defined by the SEG Organisation for storing seismic trace data.

Heavily geared toward limited size magnetic reel tapes.

**Basic Format (SEGY-Rev2):**

<img src="segy_layout.png" alt="SEG-Y file layout" style="width: 100%;"/>
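
To make the layout concrete, here is a minimal sketch that builds a tiny synthetic SEG-Y-like byte stream in memory and reads two binary header fields using only the standard library. It assumes the standard layout with no extended text headers (3200-byte text header, 400-byte binary header, 240-byte trace headers) and 4-byte samples. `build_fake_segy` and `trace_offset` are hypothetical helpers for illustration and are not part of `segysak` or `segyio`.

```python
import struct

TEXT_HEADER_BYTES = 3200    # EBCDIC or ASCII textual header
BINARY_HEADER_BYTES = 400   # binary file header
TRACE_HEADER_BYTES = 240    # one per trace, followed by the samples


def build_fake_segy(n_traces, n_samples, sample_interval_us):
    """Hypothetical helper: an all-zero file with two binary header fields set."""
    text = b" " * TEXT_HEADER_BYTES
    binary = bytearray(BINARY_HEADER_BYTES)
    # File bytes 3217-3218: sample interval in microseconds (big-endian int16).
    struct.pack_into(">h", binary, 3217 - 3201, sample_interval_us)
    # File bytes 3221-3222: number of samples per trace.
    struct.pack_into(">h", binary, 3221 - 3201, n_samples)
    trace = bytes(TRACE_HEADER_BYTES) + bytes(4 * n_samples)
    return text + bytes(binary) + trace * n_traces


def trace_offset(index, n_samples, bytes_per_sample=4):
    """Byte offset of the start of trace `index` (0-based)."""
    trace_size = TRACE_HEADER_BYTES + bytes_per_sample * n_samples
    return TEXT_HEADER_BYTES + BINARY_HEADER_BYTES + index * trace_size


raw = build_fake_segy(n_traces=3, n_samples=10, sample_interval_us=4000)
(sample_interval,) = struct.unpack_from(">h", raw, 3216)
(n_samples,) = struct.unpack_from(">h", raw, 3220)
print(sample_interval, n_samples, trace_offset(2, n_samples), len(raw))
```

Because every trace carries its own header, finding a particular inline or crossline generally means visiting the trace headers, which is why header scanning features so heavily in the rest of this tutorial.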


# Installation

`pip install segysak`

Demo Data

Opening the Tutorial Notebook

In [None]:
from segysak.segy import segy_loader


## Basic Imports and Test Data

In [None]:
import pathlib
import platform
from IPython.display import display
import pandas as pd
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt

In [None]:
# specify the example file and check we have the example data

segy_file = pathlib.Path("data/volve10r12-full-twt-sub3d.sgy")
print("SEG-Y exists:", segy_file.exists())
 

# Basic Usage

### Inspecting SEG-Y files

### Loading SEG-Y files

### `xarray.Dataset` basics

### NetCDF Files

### Editing and Saving

## Inspecting SEG-Y files


There are a number of utility functions in `segysak` designed to help you explore and understand the data in SEG-Y files without needing to load the entire SEG-Y file in.

For example we can look at the text header of the file by using the function `get_segy_texthead`.

In [None]:
from segysak.segy import segy_header_scan, segy_header_scrape, get_segy_texthead

In [None]:
# examine the text header
get_segy_texthead(segy_file)

## Inspecting SEG-Y files - trace header scan


The `segy_header_scan` function allows us to get a `pandas.DataFrame` containing information about the contents of the headers for the first few traces of a SEG-Y file. This saves us having to scan the whole file and often contains enough information for us to then load the file properly.

The DataFrame index names are the same as what `segyio` uses but `segysak` tabulates them and gives you a few bits of information. Importantly, to load SEG-Y data properly you will need to note the byte locations of the header information you want.

You can increase the number of traces you want to scan by setting the `max_traces_scan` keyword in the function - by default it is 1000 traces.

Try using the context manager to display more rows.
```python
with pd.option_context("display.max_rows", 100):
    display(df)
```

In [None]:
# scan the headers to check
scan = segy_header_scan(segy_file, max_traces_scan=2000)
scan

In [None]:
with pd.option_context("display.max_rows", 100):
    display(scan)
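
A quick way to narrow a scan down is to keep only the headers whose values change between traces. The sketch below uses a small hand-made DataFrame in place of a real scan. The column names `byte_loc`, `mean` and `std` and all of the values are assumptions for illustration, so check them against the columns of your own `scan`.

```python
import pandas as pd

# Hand-made stand-in for the output of a header scan (values are invented).
fake_scan = pd.DataFrame(
    {
        "byte_loc": [1, 71, 181, 185, 189, 193],
        "mean": [500.0, -100.0, 435000.0, 6478000.0, 10100.0, 2250.0],
        "std": [288.0, 0.0, 350.0, 420.0, 17.0, 58.0],
    },
    index=[
        "TRACE_SEQUENCE_LINE",
        "SourceGroupScalar",
        "CDP_X",
        "CDP_Y",
        "INLINE_3D",
        "CROSSLINE_3D",
    ],
)

# Headers with zero standard deviation are constant in the scanned traces.
varying = fake_scan[fake_scan["std"] > 0]
print(varying["byte_loc"])
```

The headers that survive the filter are usually the candidates for the geometry byte locations needed when loading.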

## Inspecting SEG-Y files - trace header scrape


We can also extract all of the trace header information using `segy_header_scrape`. This function creates a complete copy of the trace headers as a `pandas.DataFrame`. On larger files it can take a little bit of time to scan all the traces. You can see even on this small volume there are 12,322 traces.

The column names again are the same as in `segyio`.

We can do a quick check of the headers here by creating some simple plots. If you're used to loading SEG-Y in commercial software, they usually offer you something showing trace number vs value.

In [None]:
trace_headers = segy_header_scrape(segy_file)
trace_headers

In [None]:
fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(20, 10), sharex=True)

for ax, prop in zip(axs.ravel(), ["CDP_X", "CDP_Y", "INLINE_3D", "CROSSLINE_3D"]):
    ax.plot(trace_headers[prop])
    ax.set_title(prop, fontdict={"fontsize":18})

for ax in axs[1, :]:
    ax.set_xlabel("Trace", fontdict={"fontsize":18})

### Loading SEG-Y files - complete



Loading of SEG-Y data is also pretty straightforward, and in `segysak` there are a few different ways to go about it.

The most straightforward way is to use the multi-purpose function `segy_loader`. This function loads 2D, 3D and gather data. There are a lot of options to customise the loader for most situations.

In [None]:
from segysak.segy import segy_loader
help(segy_loader)

In [None]:
# loading with default byte locations
seisnc_vol = segy_loader(segy_file)

`segy_loader` returns an `xarray.Dataset` which has dimensions appropriate for the type of seismic loaded. In this case we have a 3D volume so our dimensions are `iline`, `xline`, and because we used the default option of `TWT` the vertical dimension is `twt`, but if you have a depth volume you can specify this in the `segy_loader` keyword arguments using `vert_domain="DEPTH"`.

The actual seismic data/volume is contained within the `data` variable, and we see this has the full dimensions of the cube. The `cdp_x` and `cdp_y` values from the trace headers, by contrast, don't have the vertical dimension.

There are also a number of attributes which are created by the loader and provide information about the loaded file.

In [None]:
print(seisnc_vol)

In [None]:
# let's quickly check out our data - we'll talk about xarray basics in the next section
_ = seisnc_vol.sel(iline=10100).data.T.plot(yincrease=False, figsize=(20, 10), vmax=10)

Here is an example where we explicitly set the key header byte locations of `iline`, `xline`, `cdpx` & `cdpy`. If other values from the header are needed, the byte locations can be set using the `extra_byte_fields` keyword argument.

In [None]:
# specifying byte locations for key cube geometry
_ = segy_loader(
    segy_file,
    iline=189, xline=193, cdpx=181, cdpy=185,
    vert_domain="DEPTH",
    #extra_byte_fields=[117]
    extra_byte_fields={"my_name":117}
)
print(_)
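
Byte locations such as 189 are 1-indexed positions within the 240-byte trace header. As a rough illustration of what a byte location means (this is not how `segysak` or `segyio` read headers internally), the sketch below packs two values into a synthetic header and reads them back with `struct`. `read_int32` is a hypothetical helper and the header values are invented.

```python
import struct

header = bytearray(240)  # synthetic, all-zero trace header
struct.pack_into(">i", header, 189 - 1, 10100)  # inline number at byte 189
struct.pack_into(">i", header, 193 - 1, 2250)   # crossline number at byte 193


def read_int32(trace_header, byte_loc):
    """Read a big-endian 4-byte integer at a 1-indexed byte location."""
    (value,) = struct.unpack_from(">i", trace_header, byte_loc - 1)
    return value


print(read_int32(header, 189), read_int32(header, 193))
```

If a file stores its geometry somewhere non-standard, reading the wrong location simply returns whatever bytes happen to be there, which is why checking a header scan first is worthwhile.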

### Loading SEG-Y files - with filtering



It is also possible to filter the data being loaded using the cropping keywords. These can be used to restrict the amount of data being loaded in. In this example we load just a single in-line `10100`.

Unfortunately to filter the data we still have to scan all the headers but the loaded volume is now smaller.

In [None]:
seisnc_vol_iline_10100 = segy_loader(segy_file, ix_crop=(10100, 10100, 2000, 3000))

In [None]:
print(seisnc_vol_iline_10100)

Sometimes SEG-Y are really big and scanning the headers repeatedly can be awkward. 
Recall that we previously scraped all of the trace headers into a DataFrame.
`segysak` allows us to perform filtering on the DataFrame, and use the filtered version for loading. 
This provides an enormous amount of flexibility on what data is loaded and means the headers only need to be scanned once on multiple loading events.

The trace header dataframe should be passed to the `head_df` keyword argument.

In [None]:
seisnc_vol_iline_block = segy_loader(
    segy_file,
    head_df=trace_headers[trace_headers["INLINE_3D"] <= 10100].copy()
)

In [None]:
print(seisnc_vol_iline_block)
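
Because the filter is just a boolean mask on a DataFrame, conditions can be combined freely. Here is a minimal sketch on a synthetic header table (the survey ranges are invented for illustration). The resulting DataFrame is the kind of object you would pass to `head_df`.

```python
import numpy as np
import pandas as pd

# Synthetic headers for a 5 x 4 survey, one row per trace.
ilines, xlines = np.meshgrid(
    np.arange(10090, 10095), np.arange(2200, 2204), indexing="ij"
)
fake_headers = pd.DataFrame(
    {"INLINE_3D": ilines.ravel(), "CROSSLINE_3D": xlines.ravel()}
)

# Keep a block of inlines and only the crosslines up to 2202.
mask = (
    (fake_headers["INLINE_3D"] >= 10091)
    & (fake_headers["INLINE_3D"] <= 10093)
    & (fake_headers["CROSSLINE_3D"] <= 2202)
)
subset = fake_headers[mask].copy()
print(len(fake_headers), "->", len(subset), "traces")
```

Taking a `.copy()` of the filtered table, as in the cell above, avoids modifying the full header table if the loader adds or changes columns.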

Other useful loading functions in `segysak`.
  - `segy.segy_converter` - Streams data from SEG-Y to NetCDF4 on disk
  - `segy.segy_freeloader` - Support for higher dimensional data (development branch).
  - `openzgy.zgy_loader` - Experimental support for ZGY based upon open ZGY.

## `xarray.Dataset` basics

Based upon the NetCDF file format for multi-variable, n-dimensional data.

<center> <img src="seisnc-diagram.png" alt='seisnc' width="50%" /> </center>

All of the intricacies of `xarray` are a bit beyond this tutorial but we'll try to quickly cover here some of the most useful ones with `segysak` seismic Datasets.
The different parts of the dataset can be accessed through properties of the class.
  - dimensions : `dims`
  - coordinates : `coords`
  - variables : `variables`
  - attributes : `attrs`

In [None]:
# dataset anatomy - dimensions, coordinates, variables, attributes - DataArray vs Dataset
seisnc_vol

Sub-selections with `xarray` are not made by using regular indexing like `numpy` for example because `xarray` does not guarantee the order of dimensions. The key methods for selection are `sel`, which selects by coordinate label, and `isel`, which selects by integer position along a named dimension.

In [None]:
# data selection - sel, isel

The values of any dimension, coordinate or variable can be returned as a `numpy` array using the `values` property.

In [None]:
# data as numpy array
seisnc_vol.iline.values

There are also a few useful methods to know about including `plot`, `interp`, `mean`, `max`, `min` and so on.

Two really important ones are `transpose` which lets you do labeled transposing of variables and `broadcast_like` which allows you to transform one variable or dataset to match another. 

In [None]:
# xarray methods (plot, interp, mean, min, max, transpose, broadcast_like, etc...)
# seisnc_vol.cdp_x.transpose("xline", "iline")
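
A short sketch of `transpose` and `broadcast_like` on synthetic arrays (the names `cube` and `horizon` and their sizes are invented for illustration):

```python
import numpy as np
import xarray as xr

cube = xr.DataArray(
    np.zeros((3, 4, 5)),
    dims=("iline", "xline", "twt"),
    coords={"iline": [1, 2, 3], "xline": [10, 11, 12, 13], "twt": np.arange(5) * 4.0},
)
horizon = xr.DataArray(
    np.full((3, 4), 8.0),
    dims=("iline", "xline"),
    coords={"iline": cube.iline, "xline": cube.xline},
)

# Reorder dimensions by name rather than by axis number.
swapped = cube.transpose("twt", "xline", "iline")

# Expand the 2D map so that it has every dimension of the cube.
horizon_3d = horizon.broadcast_like(cube)

print(swapped.dims, horizon_3d.dims, horizon_3d.shape)
```

Broadcasting a map against a cube like this is handy when you need a sample-by-sample comparison, for example masking everything above a horizon.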

The last important thing to learn with `xarray` is how to assign data into variables. Because `Datasets` are multi-dimensional we also have to give `xarray` information about the dimensions of the data.

```python
print(seisnc_vol.dims)
seisnc_vol["zeros"] = (("iline", "xline", "twt"), np.zeros((61, 202, 850)))
seisnc_vol
```

In [None]:
# xarray variable assignment

## `xarray` FAQ

 - Why don't we make the global coordinates the dimensions?
 - How do I save/persist my changes?

Notes: 
Global coordinates are not orthogonal because the seismic grid rarely lines up with Grid North.
Persisting changes either means saving back to SEG-Y or using the NetCDF File Format. 

# NetCDF File Format

Common in climate science, binary, fast and lazy loading

(basically `xarray.Dataset` on disk)


## NetCDF

Why use “another” file format for seismic?

Generally it just makes working with seismic in Python easier. It will save you time if you are reading volumes repeatedly or can't store everything you need in memory.

NetCDF was the logical choice because it is at the core of `xarray`, but `xarray` supports other data models such as zarr, which we are investigating.
There is also beta support within `segysak` for the OpenZGY format with instructions on Github about how to set that up.

 - Faster than SEG-Y for most use cases.

 - Widely supported within the Python scientific stack (xarray, dask).

 - Commonly supported in other languages.

Saving the data to netcdf requires the use of the seisio accessor due to limitations on the types of attributes that can be
saved using the xarray method.



In [None]:
# output the data to netcdf
seisnc_vol.seisio.to_netcdf("data/test.seisnc")

In [None]:
if platform.system() == "Windows":
    !dir data\.
else: ## linux
    !ls data/.

We can check to see if the seisnc NetCDF file was created ok by reimporting it. To get it back into the same form that `segysak` uses we can use the `open_seisnc` function. `open_seisnc` is a thin wrapper around the `xarray.open_dataset` method that includes some special handling for segysak attributes and ensures that the dataset is opened with the `.seis` extension for xarray which we will talk about soon.

In [None]:
from segysak import open_seisnc
open_seisnc("data/test.seisnc")

# Saving to SEG-Y

Generally if you have loaded a SEG-Y file and edited it, you can then save that file back to a new SEG-Y in one line.
Currently SEGY-SAK doesn't support editing SEG-Y in place but it is something that might come in the future if the demand is
there (or you could help develop this for us). 

There are a few attributes that `segysak` needs to write a new SEG-Y file.
  - `coord_scalar` (int)
  - `sample_rate` (float)
  - `source_file` (str)
  
It also needs dimensions from one of the dimension sets. 

In [None]:
from segysak.segy import segy_writer
help(segy_writer)

In [None]:
test = seisnc_vol.copy()
test.attrs = {"coord_scalar":-100, "sample_rate":4.0, "source_file":""}

If byte locations need to be changed for specific software the `trace_header_map` keyword is available. Any variable keys in the dataset can be assigned to a trace header byte location.

In this example we set the `iline` to go to byte location 21.

In [None]:
# export in memory dataset to segy
segy_writer(test, "data/test.segy", trace_header_map={"iline":21})

If we create variables that cover the trace header dimensions (iline/xline) then these can also be included in the output to SEG-Y by specifying the variable key and the byte location where the variable should be placed.

In [None]:
# write other variables
seisnc_vol["xy"] = seisnc_vol["cdp_x"] * seisnc_vol["cdp_y"] / 1E10
seisnc_vol
# segy_writer(seisnc_vol, "data/test.segy", trace_header_map={"xy":13})

Let's read the headers of that SEG-Y and see what we got. The output has been converted to int so our floating point values are gone. This is a limitation of the SEG-Y format. Any floating point numbers must be scaled to int and back again on loading. This is done automatically for coordinates but all other values must be handled manually.

In [None]:
segy_header_scrape("data/test.segy").T
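
The manual scaling mentioned above can be sketched with plain Python. This follows the usual SEG-Y scalar convention (positive scalars multiply, negative scalars divide), and the three helper functions are hypothetical and not part of `segysak`.

```python
def scalar_to_factor(coord_scalar):
    """SEG-Y convention: positive scalars multiply, negative scalars divide."""
    if coord_scalar == 0:
        return 1.0
    return float(coord_scalar) if coord_scalar > 0 else 1.0 / abs(coord_scalar)


def to_header_int(value, coord_scalar):
    """Scale a real-world value to the integer stored in the header."""
    return int(round(value / scalar_to_factor(coord_scalar)))


def from_header_int(stored, coord_scalar):
    """Recover the real-world value from the stored integer."""
    return stored * scalar_to_factor(coord_scalar)


stored = to_header_int(435123.45, -100)
print(stored, from_header_int(stored, -100))
```

With a scalar of `-100` two decimal places survive the round trip. Anything finer is lost when the value is rounded to an integer, so choose the scalar with the required precision and the integer range of the header field in mind.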

# 10 Minute Break

In [None]:
import time
from tqdm.auto import tqdm

with tqdm(desc="Break Timer", total=10*60, bar_format="{l_bar}{bar} {elapsed_s:.0f}/{total} seconds") as pbar:
    start = time.time()
    now = time.time()
    prev_now = now
    while (now - start) < 10*60:
        pbar.update(now - prev_now)
        time.sleep(1)
        prev_now = now
        now = time.time()
    pbar.update(time.time() - prev_now)

# Horizon extraction

 - Load a horizon and add it to a cube
 - Plotting maps
 - Plotting horizons on vertical slices
 - Extracting seismic amplitudes along a horizon

## Load some seismic horizon data

Let's start by specifying the path to some seismic horizon data and checking it is available.

In [None]:
top_hugin_path = pathlib.Path("data/hor_twt_hugin_fm_top.dat")
print("File", top_hugin_path, "exists?", top_hugin_path.exists())

If we quickly look at the first few lines of the file we can see it is a space delimited file with three columns.
UTM X, UTM Y and TWT. 

In [None]:
# check the file layout
with open(top_hugin_path) as f:
    lines = [next(f) for i in range(5)]
print(*lines)

It is then quite straightforward to load the file in using `pandas.read_csv`.

In [None]:
# is a csv file
top_hugin_df = pd.read_csv(top_hugin_path, names=["cdp_x","cdp_y","twt_hugin"], sep=' ')
top_hugin_df.head()

When the horizon is in this format it might not map directly to the seismic. To simplify the process of interpolating the horizon to the seismic trace locations, `segysak` has a `surface_from_points` method in the `.seis` accessor. By default, this method will try to interpolate using `cdp_x` and `cdp_y` from the dataset but these options can be changed.

A new dataset is returned with the same dimensions as the seismic volume but now with the horizon data.

In [None]:
top_hugin_ds = seisnc_vol.seis.surface_from_points(top_hugin_df, 'twt_hugin', right=('cdp_x', 'cdp_y'))
print(top_hugin_ds)
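
To show the general idea of mapping scattered picks onto trace locations, here is a sketch using `scipy.interpolate.griddata` on synthetic data. This is an illustration of the technique only; `segysak` may use a different interpolation scheme internally, and the planar horizon and grid below are invented.

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(42)

# Scattered horizon picks on a planar surface: twt = 2000 + 0.1x + 0.05y.
px = rng.uniform(0, 1000, size=500)
py = rng.uniform(0, 1000, size=500)
ptwt = 2000 + 0.1 * px + 0.05 * py

# Trace locations of a (here axis-aligned) survey grid, kept inside the picks.
grid_x, grid_y = np.meshgrid(
    np.linspace(200, 800, 7), np.linspace(200, 800, 5), indexing="ij"
)

# Linear interpolation of the picks onto the trace locations.
grid_twt = griddata((px, py), ptwt, (grid_x, grid_y), method="linear")
print(grid_twt.shape)
```

With linear interpolation, trace locations outside the area covered by the picks come back as NaN, so expect gaps at the edges when a horizon does not cover the whole survey.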

## Plotting Maps

For plotting we can use the built in plot command that comes with `xarray` datasets. This is a wrapper around a call to `matplotlib` and allows us to get quick plots of the data.

In [None]:
top_hugin_ds.twt_hugin.plot(cmap='hsv')

This plotting is done on the local iline/xline grid which defines the cube though, and often we want to see things in a X and  Y type context.

There are a couple of ways to go about this.
  - One way is to use explicit plotting based upon the x and y coordinates in the dataset.
  - Another is to use a transform argument for `matplotlib`'s plotting commands.

In [None]:
axs = plt.subplot()
mesh = axs.pcolormesh(
    top_hugin_ds.cdp_x.values,
    top_hugin_ds.cdp_y.values,
    top_hugin_ds.twt_hugin.values,
    shading="auto"
)
axs.set_aspect(1)
_ = plt.colorbar(mesh, orientation="horizontal")

Using the transform can be useful when we want to plot objects using iline/xline notation but in x-y coordinate space, like an inline location for example (10100). We can also use the inverted form of the transform to convert x and y coordinates to iline and xline. `segysak` also has the `.seis.xysel` method to extract seismic based upon x and y trace locations. 

In [None]:
tform = seisnc_vol.seis.get_affine_transform()

axs = plt.subplot()
mesh = axs.pcolormesh(
    top_hugin_ds.iline,
    top_hugin_ds.xline,
    top_hugin_ds.twt_hugin.T,
    shading="auto",
    transform=tform + axs.transData
)
axs.set_aspect(1)
_ = axs.plot([10100, 10100], [2200, 2300], transform=tform + axs.transData, color="w")
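
For intuition about what such a transform contains, the sketch below fits an affine mapping from (iline, xline) to (x, y) using three corner traces and then inverts it. This is a numpy-only illustration under invented survey geometry, not the method `segysak` uses in `get_affine_transform`. `ilxl_to_xy` and `xy_to_ilxl` are hypothetical helpers.

```python
import numpy as np

# Three corner traces: (iline, xline) and their (x, y) positions. Values are invented.
ilxl = np.array([[10090, 2200], [10150, 2200], [10090, 2400]], dtype=float)
theta = np.deg2rad(30.0)  # survey rotated 30 degrees from grid north
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
xy = ((ilxl - ilxl[0]) * 12.5) @ rot.T + np.array([435000.0, 6478000.0])

# Solve [iline, xline, 1] @ coeffs = [x, y] for the 3 x 2 coefficient matrix.
design = np.column_stack([ilxl, np.ones(len(ilxl))])
coeffs, *_ = np.linalg.lstsq(design, xy, rcond=None)


def ilxl_to_xy(points):
    points = np.atleast_2d(np.asarray(points, dtype=float))
    return np.column_stack([points, np.ones(len(points))]) @ coeffs


def xy_to_ilxl(points):
    # Remove the offset, then invert the 2 x 2 linear part.
    points = np.atleast_2d(np.asarray(points, dtype=float))
    return (points - coeffs[2]) @ np.linalg.inv(coeffs[:2])


print(xy_to_ilxl(ilxl_to_xy([10100, 2250])))
```

The inverse direction is what lets you turn a well location or polygon given in x and y into the iline/xline positions needed for selection.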

## Plotting Horizons on vertical section views

The horizon data can also be assigned back to the original seismic dataset. This can be useful for doing simultaneous sub-selection of the two variables at once.

In [None]:
# assign horizon back to seismic
seisnc_vol["hugin"] = top_hugin_ds.twt_hugin
print(seisnc_vol)

For example, here we sub-select a single inline once, and then use the reference in two subsequent plotting calls, once for the seismic, and then again for the horizon. 

In [None]:
# plotting
iline_subsel = seisnc_vol.sel(iline=10100, twt=range(2402, 2900, 4), method='nearest')
fig, axs = plt.subplots(figsize=(20, 5))
iline_subsel.data.T.plot(ax=axs, yincrease=False)
_ = axs.plot(iline_subsel.xline, iline_subsel.hugin, 'k')

## Seismic amplitude maps

Extracting the intersection of a horizon with a seismic volume is really simple in `xarray`. It is literally one line. In this case, `xarray` understands the `iline` and `xline` relationship between the input DataArray `seisnc_vol.hugin` and the seismic volume. When we pass it via the `interp` method, `xarray` performs a linear interpolation to find the intersection point, returning a new amplitude DataArray.

In [None]:
amp = seisnc_vol.data.interp({"twt": seisnc_vol.hugin}, method='linear')
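
To see the arithmetic behind that one-liner, here is a numpy-only sketch that does the same kind of linear interpolation trace by trace on a tiny synthetic cube. The cube and horizon values are invented, and this is an illustration rather than what `xarray` does internally.

```python
import numpy as np

twt = np.arange(0.0, 40.0, 4.0)  # vertical axis in ms: 0, 4, ..., 36
# Synthetic 2 x 3 x 10 cube where each sample simply stores 10 * twt.
cube = np.broadcast_to(10.0 * twt, (2, 3, twt.size))
horizon = np.array([[6.0, 8.0, 10.0], [12.0, 13.0, 30.0]])  # horizon twt per trace

amp = np.empty(horizon.shape)
for i, j in np.ndindex(*horizon.shape):
    # Linear interpolation of trace (i, j) at its horizon time.
    amp[i, j] = np.interp(horizon[i, j], twt, cube[i, j])

print(amp)
```

Since the synthetic amplitudes are a linear function of time, the extracted map is exactly ten times the horizon, which makes the result easy to check by eye.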

In [None]:
axs = plt.subplot()
mesh = axs.pcolormesh(amp.iline, amp.xline, amp.T, transform=tform + axs.transData, shading="auto", cmap="bwr_r", vmin=-6, vmax=6)
ctr = axs.contour(top_hugin_ds.cdp_x, top_hugin_ds.cdp_y, top_hugin_ds.twt_hugin, colors='w')
axs.set_aspect(1)
plt.colorbar(mesh)


# Mapping functions over blocks

 - Learn how to use Xarray to map functions on blocks of data, such as trace maths.

  - Horizon Flattening
<img src="hflat.png" alt="hflat" style="width: 60%;"/>

To do this we will introduce the `groupby` method for a dataset, but to make it useful we need to create a trace identifier. `groupby` actually uses `pandas` in the backend and behaves the same way: it creates a group of datasets based upon the key or keys.

In [None]:
for grp, subds in seisnc_vol.groupby(seisnc_vol.iline):
    print(grp)
    print(subds)
    break

In [None]:
# create a trace identifier
seisnc_vol["trace"] = (("iline", "xline"), np.arange(61*202, dtype=int).reshape(61, 202))

In [None]:
for grp, subds in seisnc_vol.groupby("trace"):
    print(grp)
    print(subds)
    break

The next step is writing a function that takes advantage of the data as it is made available from `groupby`.
To flatten the group we need to shift the time axis for each trace so that the horizon occurs at a constant time. The simplest way to do this is just to make that constant zero, so we subtract the horizon value from the time axis.

Xarray also requires that our output cube be regularly sampled, so arbitrary shifts for each trace need to be resampled to a regular grid. That can be done using the `interp` function. `interp` applies a chosen interpolation method (in this case linear interpolation) to resample the data against a new twt axis which we will call `twt_out`. Then the resampled trace is returned.

In [None]:
def hflat(ds, hor_var, twt_out):
    trace_out = ds.copy()
    trace_out["twt"] = ds.twt - np.squeeze(ds[hor_var].values)
    return trace_out.data.interp(twt=twt_out)
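
The shift-and-resample step can be sketched for a single trace with numpy alone. The trace, horizon pick and output axis below are invented for illustration, and `np.interp` stands in for the `xarray` interpolation used in `hflat`.

```python
import numpy as np

twt = np.arange(0.0, 100.0, 4.0)  # original time axis: 0, 4, ..., 96 ms
horizon_twt = 44.0                # horizon pick for this trace
trace = np.exp(-((twt - horizon_twt) / 8.0) ** 2)  # event centred on the horizon

# Shift the time axis so the horizon sits at zero ...
shifted_twt = twt - horizon_twt
# ... then resample onto a regular output axis shared by every trace.
twt_out = np.arange(-40.0, 41.0, 1.0)
flat_trace = np.interp(twt_out, shifted_twt, trace, left=np.nan, right=np.nan)

print(twt_out[np.nanargmax(flat_trace)])
```

Samples of the output axis that fall outside the shifted trace are set to NaN here, so traces with different horizon times can still share one output axis.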

We are also going to need to specify what the output grid should be. Subtracting the horizon TWT from the TWT grid gives a new TWT range that goes from `-max(horizon)` to `max(twt) - min(horizon)`. So we create a flattened TWT range covering that interval with a sample interval of 1ms.

In [None]:
flat_twt = np.arange(-seisnc_vol.hugin.max(), seisnc_vol.twt.max()-seisnc_vol.hugin.min(), 1, dtype=int)

To recombine all the data back into a single `dataset` we can tag the `map` function onto the end of `groupby`. Doing this automatically applies what pandas and xarray call `split-apply-combine` logic, and is really handy.

Here I'm just going to apply the process to a single inline.

In [None]:
# applying groupby().map()
tg_gby = seisnc_vol.sel(iline=10100).groupby("trace").map(hflat, args=("hugin", flat_twt))
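
The split-apply-combine pattern is easier to see on a tiny example. The sketch below uses an invented 3-trace section and a hypothetical `remove_trace_mean` function in place of `hflat`.

```python
import numpy as np
import xarray as xr

section = xr.DataArray(
    np.arange(12, dtype=float).reshape(3, 4),
    dims=("xline", "twt"),
    coords={"xline": [2200, 2201, 2202], "twt": [0.0, 4.0, 8.0, 12.0]},
    name="data",
)


def remove_trace_mean(trace):
    """Applied to each group (here one trace) independently."""
    return trace - trace.mean()


# Split by xline, apply the function to each trace, combine the results.
demeaned = section.groupby("xline").map(remove_trace_mean)
print(demeaned.values)
```

Each trace is processed without knowledge of its neighbours, which is the same property that makes trace-by-trace flattening a good fit for `groupby`.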

Let's plot up our normal and flattened volumes.

In [None]:
# plotting results
fig, axs = plt.subplots(ncols=2, figsize=(20, 5))
seisnc_vol \
    .sel(iline=10100, twt=range(2002, 2900, 4), method='nearest') \
    .data.T.plot(ax=axs[0], yincrease=False)
axs[0].plot(seisnc_vol.sel(iline=10100).xline, seisnc_vol.sel(iline=10100).hugin, 'k')
tg_gby \
    .sel(twt=range(-300, 300, 4), method='nearest') \
    .T.plot(ax=axs[1], yincrease=False)
axs[1].hlines(0, 0, 10000, "k")

Groupby is generally fine for small volumes, but when you start to scale up your data size it can run into issues. Xarray links nicely to the `dask` distributed processing library and it can even lazily load data from disk that won't fit into memory. There is a `dask` tutorial on the SEGY-SAK RTD website but today I'm just going to demonstrate how we can use the experimental `map_blocks` function to achieve the same outcome as `groupby` but in a `dask` friendly way.

Xarray has a chunking feature which allows you to break your Dataset down into smaller operational blocks. In this case we want the blocks to match the size of our operations which is just 1 trace. 

Because we won't necessarily be doing the whole operation at once with `map_blocks` we also need to tell xarray what we think the output of our function will look like. This is done using a template. In our case the output is the same in every way, except the time dimension is resampled. Here we can just resample that dimension using `interp` to get the right output.

Note that the template is not actually calculated, just a placeholder is created. This is because when we chunk xarray Datasets or DataArrays, every operation is delayed until the last possible moment. This is good for memory management and really simplifies the whole process. In this instance if you wanted the template computed you could either call `template.compute()` or access the numpy `values` array of the data variable.

Then, similar to `groupby` we pass `map_blocks` the flattening function, the extra arguments, and now the template.

In [None]:
seisnc_vol_chkd = seisnc_vol.chunk({"iline":1, "xline":1})
template = seisnc_vol_chkd.interp(twt=flat_twt)
tg_mb = seisnc_vol_chkd.map_blocks(hflat, args=("hugin", flat_twt), template=template.data)

`tg_mb` is also a delayed object here. There is a task registered for each trace, but they will only be calculated when we ask for the data, such as when we perform plotting. To get the full volume, we have to call `compute` again.

In [None]:
fig, axs = plt.subplots(ncols=2, figsize=(20, 5))
seisnc_vol.sel(iline=10100, twt=range(2002, 2900, 4), method='nearest').data.T.plot(ax=axs[0], yincrease=False)
tg_mb.sel(iline=10100, twt=range(-300, 300, 4), method='nearest').T.plot(ax=axs[1], yincrease=False)

## `map_blocks` final thoughts

  - Not super mature yet but very useful.
  - Slightly different way of thinking to Python's normal instant run/result.
  - Other useful delayed functions are: `to_netcdf`, `rolling`, `interp`, but most xarray operations can be delayed.

# Vectorization of Seismic

 - I want to do machine learning and I need to tabularize my seismic and headers.
 - Now I need to send my results back to SEG-Y.


Converting an `xarray.Dataset` to a `pandas.DataFrame` is really simple due to the close ties between the two packages.

In [None]:
# creating a table from seismic
seisnc_vol_df = seisnc_vol.isel(iline=10).to_dataframe()
print(seisnc_vol_df)

The DataFrame will have what `pandas` calls a multi-index, so to remove it we just need to reset the index. Note this will have a big impact upon your memory footprint.

In [None]:
seisnc_reindex = seisnc_vol_df.reset_index()
print(seisnc_reindex)

In [None]:
print(seisnc_vol_df.info())
print(seisnc_reindex.info())

When operations have been completed in the tabular format we can return to `xarray`. First the multi-index must be restored to get the coordinates right.

In [None]:
seisnc_df_multi = seisnc_reindex.set_index(["iline", "xline", "twt"])
print(seisnc_df_multi)

And then we just use the `to_xarray` method.

In [None]:
seisnc_xr = seisnc_df_multi.to_xarray()
print(seisnc_xr)

The process isn't perfect though: we can see that `cdp_x` and `cdp_y` have come back as 3D cubes. The seisnc attributes are also missing, so we need to restore them before we can export the data to SEG-Y using `segy_writer`.

In [None]:
seisnc_xr.attrs = seisnc_vol.attrs
display(seisnc_xr.attrs)

In [None]:
seisnc_xr["cdp_x"] = seisnc_xr["cdp_x"].mean(dim=["twt"])
seisnc_xr["cdp_y"] = seisnc_xr["cdp_y"].mean(dim=["twt"])
seisnc_xr = seisnc_xr.set_coords(["cdp_x", "cdp_y"])
print(seisnc_xr)

# Questions - Slack time because we will run over.

 - Fall backs to chat about memory management, dask, other file formats such as ZGY and Zarr
 - Demo of CLI for quick looks at headers or EBCDIC
 - Contribution Opportunities / Community led development
