# Prepare your data: Multiple acquisitions

Sometimes you might need to work with several DataArray objects at the same time. Xdas introduces the `DataCollection` object, which is a nesting of DataArray objects. In the context of DAS, it is typically used to combine multiple acquisitions with potentially different sampling configurations, or to facilitate operations across different instruments.

It is important to know that data collections come in two flavours:
- `DataMapping` that behaves as `dict`
- `DataSequence` that behaves as `list`

Here we introduce two scenarios:
- Multi-instrument acquisitions.
- Extraction of a catalog of events.

In [1]:
import numpy as np
import xdas as xd

## 1. Multi-instrument acquisitions

If you had the opportunity to use several instruments to interrogate several cables at the same time, you might end up with several acquisitions. In this example, we investigate 10 min of data recorded simultaneously by three cables.

### Preparing the data

Let's have a look at the `data/gps_multicable` folder. The hdf5 files are stored in a folder architecture organized by:
- node: the landing point of the fiber, where the instrument is located
- fiber orientation: N for a fiber going to the north and S to the south
- acquisition: all the hdf5 files of the folder

Xdas implements the `open_mfdatatree` function to deal with such a file hierarchy.

The folder architecture is described by passing a string in which `{}` marks the folder levels and `[]` marks the files.
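To make the template syntax concrete, here is a minimal sketch of how such a string could be turned into a nested structure. It is an illustration only and not how Xdas implements the function: `template_to_regex`, `build_tree`, and the file names are hypothetical, and the leaves are plain lists of paths rather than data arrays. It assumes the template contains at least one `{}` field.

```python
import re


def template_to_regex(template):
    """Turn '{name}' and '[name]' fields into named regex groups."""
    pattern = ""
    # Split into literal text and {field} / [field] tokens.
    for token in re.split(r"(\{\w+\}|\[\w+\])", template):
        if re.fullmatch(r"\{\w+\}|\[\w+\]", token):
            # A field matches one path component (no slash).
            pattern += f"(?P<{token[1:-1]}>[^/]+)"
        else:
            pattern += re.escape(token)
    return re.compile(pattern)


def build_tree(template, paths):
    """Group matching paths into nested dicts keyed by the {} fields."""
    regex = template_to_regex(template)
    folder_fields = re.findall(r"\{(\w+)\}", template)
    tree = {}
    for path in sorted(paths):
        match = regex.fullmatch(path)
        if match is None:
            continue  # ignore files that do not fit the template
        keys = [match.group(name) for name in folder_fields]
        level = tree
        for key in keys[:-1]:
            level = level.setdefault(key, {})
        # Files matched by the [] field are collected in a list.
        level.setdefault(keys[-1], []).append(path)
    return tree


# Hypothetical file names, used only for illustration.
paths = [
    "data/gps_multicable/CCN/N/a.hdf5",
    "data/gps_multicable/SER/N/b.hdf5",
    "data/gps_multicable/SER/S/c.hdf5",
    "data/gps_multicable/README.txt",
]
tree = build_tree("data/gps_multicable/{node}/{cable}/[acquisition].hdf5", paths)
print(tree)
```

Each `{}` field becomes one dict level and the `[]` field becomes the list at the bottom, which mirrors the mapping-then-sequence nesting of the collection.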

In [2]:
dc = xd.open_mfdatatree(
    "data/gps_multicable/{node}/{cable}/[acquisition].hdf5", engine="asn"
)
dc

Node:
  CCN: 
    Cable:
      N: 
        Acquisition:
          0: <xdas.DataArray (time: 37500, distance: 9998)>
  SER: 
    Cable:
      N: 
        Acquisition:
          0: <xdas.DataArray (time: 37500, distance: 10000)>
      S: 
        Acquisition:
          0: <xdas.DataArray (time: 37500, distance: 10000)>

The CCN node has one fiber going north, while the SER node has two fibers, one going north and one going south.

In this example, we have one acquisition per fiber because the acquisition parameters did not change over that time period. Otherwise, several data arrays would be present in the acquisition list.

## Working with DataCollection objects

Collections produced with the `open_mfdatatree` function are a nesting of DataMapping objects (which behave as a dict) down to the last level, which is a DataSequence object (which behaves as a list).

Everything you can do with a dict or a list can be done with data mappings and data sequences.

### Getting an element

In [3]:
dc["CCN"]["N"][0]

<xdas.DataArray (time: 37500, distance: 9998)>
VirtualStack: 715.1MB (int16)
Coordinates:
  * time (time): 2023-11-03T12:20:06.000 to 2023-11-03T12:30:05.984
  * distance (distance): 0.000 to 153149.070

### Setting new elements

In [4]:
# Here we set copies of existing data for simplicity
dc_set = dc.copy(deep=True)
dc_set["CCN"]["S"] = dc["CCN"]["N"].copy()  # add same data to another fiber for example
dc_set["CCN"]["S"].append(dc["CCN"]["N"][0].copy())  # behaves like a list
dc_set.update({"CAL": dc["CCN"].copy()})  # behaves like a dict
dc_set

Node:
  CCN: 
    Cable:
      N: 
        Acquisition:
          0: <xdas.DataArray (time: 37500, distance: 9998)>
      S: 
        Acquisition:
          0: <xdas.DataArray (time: 37500, distance: 9998)>
          1: <xdas.DataArray (time: 37500, distance: 9998)>
  SER: 
    Cable:
      N: 
        Acquisition:
          0: <xdas.DataArray (time: 37500, distance: 10000)>
      S: 
        Acquisition:
          0: <xdas.DataArray (time: 37500, distance: 10000)>
  CAL: 
    Cable:
      N: 
        Acquisition:
          0: <xdas.DataArray (time: 37500, distance: 9998)>

### Iterating

In [5]:
for node in dc:
    for cable in dc[node]:
        for da in dc[node][cable]:
            ...
            # do something with da
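The three nested loops above assume a fixed depth (node, cable, acquisition). When the depth is not known in advance, a recursive walk can be used instead. The sketch below uses plain `dict` and `list` objects as stand-ins for DataMapping and DataSequence; `iter_leaves` is a hypothetical helper, and with a real collection the type checks would likely need adapting.

```python
def iter_leaves(collection, path=()):
    """Yield (path, leaf) pairs from nested dict/list containers."""
    if isinstance(collection, dict):
        for key, value in collection.items():
            yield from iter_leaves(value, path + (key,))
    elif isinstance(collection, list):
        for index, value in enumerate(collection):
            yield from iter_leaves(value, path + (index,))
    else:
        # Anything that is neither a dict nor a list is treated as a leaf.
        yield path, collection


# Strings stand in for the data arrays of the example collection.
tree = {
    "CCN": {"N": ["ccn_n_0"]},
    "SER": {"N": ["ser_n_0"], "S": ["ser_s_0"]},
}
leaves = list(iter_leaves(tree))
for path, leaf in leaves:
    print(path, leaf)
```

Keeping the path next to each leaf makes it easy to know which node, cable, and acquisition a given array came from.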

Some methods of the `DataArray` class have been ported to the `DataCollection` class. In that case, the operation is applied to each element of the collection:

### Label-based indexing

In [6]:
dc_sel = dc.sel(time=slice("2023-11-03T12:26:40", "2023-11-03T12:27:50"))
dc_sel

Node:
  CCN: 
    Cable:
      N: 
        Acquisition:
          0: <xdas.DataArray (time: 4376, distance: 9998)>
  SER: 
    Cable:
      N: 
        Acquisition:
          0: <xdas.DataArray (time: 4375, distance: 10000)>
      S: 
        Acquisition:
          0: <xdas.DataArray (time: 4376, distance: 10000)>
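The selections above differ by one sample between fibers (4376 versus 4375) even though the time window is the same. A minimal NumPy sketch can reproduce this, assuming label-based slices include both endpoints (which is consistent with the counts shown). The time axis is rebuilt from the CCN/N printout (37500 samples from 12:20:06.000 to 12:30:05.984, so a 16 ms step); the 8 ms shift is a hypothetical offset chosen for illustration, since the actual start times of the other fibers are not shown here.

```python
import numpy as np

# Time axis rebuilt from the CCN/N printout: 37500 samples, 16 ms apart.
start = np.datetime64("2023-11-03T12:20:06.000")
step = np.timedelta64(16, "ms")
time = start + np.arange(37500) * step


def count_inclusive(time, t0, t1):
    """Count the samples with t0 <= time <= t1 (both ends included)."""
    mask = (time >= np.datetime64(t0)) & (time <= np.datetime64(t1))
    return int(mask.sum())


t0, t1 = "2023-11-03T12:26:40", "2023-11-03T12:27:50"

# Both window edges fall exactly on a sample: 70 s * 62.5 Hz + 1 samples.
n_aligned = count_inclusive(time, t0, t1)

# Hypothetical axis shifted by half a sample: no sample sits on the edges.
n_shifted = count_inclusive(time + np.timedelta64(8, "ms"), t0, t1)

print(n_aligned, n_shifted)  # 4376 4375
```

The count therefore depends on how each instrument's clock is aligned with the window edges, not on a difference in sampling rate.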

### Saving a linked collection to disk

Once you are happy with your collection, you can also write it to disk virtually (as links to the original files) for later work.

In [7]:
dc.to_netcdf("outputs/multicable.nc")  # virtual=True by default

## 2. Collection of events

In this example, we start from a single acquisition. We then construct either a list of earthquakes (without IDs) or a dict of earthquakes (with IDs).

In [8]:
# back to one acquisition
da = dc["SER"]["N"][0]
da.to_netcdf("outputs/singlecable.nc")  # save for later use

### As a DataSequence object

In [9]:
# Passing a list gives a DataSequence
xd.DataCollection(
    [
        da.sel(time=slice("2023-11-03T12:23:35", "2023-11-03T12:24:00")),
        da.sel(time=slice("2023-11-03T12:26:40", "2023-11-03T12:27:50")),
    ],
    name="event",  # optional name
)

Event:
  0: <xdas.DataArray (time: 1563, distance: 10000)>
  1: <xdas.DataArray (time: 4375, distance: 10000)>

### As a DataMapping object

In [10]:
# Passing a dict gives a DataMapping
xd.DataCollection(
    {
        "id_000000": da.sel(time=slice("2023-11-03T12:23:35", "2023-11-03T12:24:00")),
        "id_000001": da.sel(time=slice("2023-11-03T12:26:40", "2023-11-03T12:27:50")),
    },
    name="event",  # optional name
)

Event:
  id_000000: <xdas.DataArray (time: 1563, distance: 10000)>
  id_000001: <xdas.DataArray (time: 4375, distance: 10000)>
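In practice, event windows usually come from a catalog rather than being typed by hand. The sketch below shows one way the time slices could be prepared from such a table. The `catalog` list and its layout are hypothetical, and only the two time windows are taken from the cells above. The final `xd.DataCollection` call is left as a comment because it needs the `da` array opened earlier.

```python
# Hypothetical catalog: (event id, start time, end time).
catalog = [
    ("id_000000", "2023-11-03T12:23:35", "2023-11-03T12:24:00"),
    ("id_000001", "2023-11-03T12:26:40", "2023-11-03T12:27:50"),
]

# One time slice per event, keyed by event id (dict keeps insertion order).
windows = {event_id: slice(t0, t1) for event_id, t0, t1 in catalog}

# Dropping the ids gives the list form used for a sequence.
windows_list = list(windows.values())

# With the `da` array from the cells above, the collections could be built as:
# xd.DataCollection({k: da.sel(time=w) for k, w in windows.items()}, name="event")
# xd.DataCollection([da.sel(time=w) for w in windows_list], name="event")

for event_id, window in windows.items():
    print(event_id, window.start, window.stop)
```

The dict form is convenient when events must be looked up by ID later, while the list form is enough when only the order matters.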

### Combining everything

For fun we can also combine both scenarios in a single DataCollection:

In [11]:
# Here each event is a DataCollection
xd.DataCollection(
    {
        "id_000000": dc.sel(time=slice("2023-11-03T12:23:35", "2023-11-03T12:24:00")),
        "id_000001": dc.sel(time=slice("2023-11-03T12:26:40", "2023-11-03T12:27:50")),
    },
    name="event",  # optional name
)

Event:
  id_000000: 
    Node:
      CCN: 
        Cable:
          N: 
            Acquisition:
              0: <xdas.DataArray (time: 1563, distance: 9998)>
      SER: 
        Cable:
          N: 
            Acquisition:
              0: <xdas.DataArray (time: 1563, distance: 10000)>
          S: 
            Acquisition:
              0: <xdas.DataArray (time: 1563, distance: 10000)>
  id_000001: 
    Node:
      CCN: 
        Cable:
          N: 
            Acquisition:
              0: <xdas.DataArray (time: 4376, distance: 9998)>
      SER: 
        Cable:
          N: 
            Acquisition:
              0: <xdas.DataArray (time: 4375, distance: 10000)>
          S: 
            Acquisition:
              0: <xdas.DataArray (time: 4376, distance: 10000)>