# 1.2 Waveform data handling in *Pyrocko*

Pyrocko offers several ways to load and display data, ranging from reading a single waveform file to scanning whole directories with the `Squirrel`.

This chapter shows how to load, manipulate, process, and save waveforms in *Pyrocko*, following several of the [examples](https://pyrocko.org/docs/current/library/examples/).

It includes 
* loading,
* plotting,
* cutting,
* filtering,
* plotting spectra, and
* saving

of waveforms.


## Contents:
* [Loading and first inspection](#sec1)
* [Simple processing steps - Cutting, Filtering, Spectral plot](#sec2)
* [Saving](#sec3)
* [Summary](#sum)


## Loading and first inspection <a class="anchor" id="sec1"></a>

*Pyrocko* waveforms are loaded directly with the `load` function. It reads the waveforms from the given path and returns them as a list of traces, here called `traces`.

In [None]:
# Pyrocko's simple loading function
from pyrocko.io import load

# This is a list of traces containing the waveform data
traces = load(filename='data/data_GE.KTHA..HHZ_2020-10-30_11-30-26.mseed')

Alternatively, there is the `Squirrel`, a versatile database manager ([squirrel introduction](https://pyrocko.org/docs/current/library/reference/squirrel/index.html)). Loading is equally simple in this case:

In [None]:
# Pyrocko's Squirrel class
from pyrocko.squirrel import Squirrel

# Initialization of the database manager
sq = Squirrel()

# Load waveform data into the squirrel database
sq.add('data/data_GE.KTHA..HHZ_2020-10-30_11-30-26.mseed')

The `Squirrel` is particularly handy for large datasets, as it is fast and memory-efficient. In the following we therefore show both ways of handling data:
* handling data loaded via `load` and
* handling data loaded via the `Squirrel`.

To start with, let's check the metadata of the waveforms we have loaded.

In [None]:
# Accessing an individual trace (here the first) from the trace list
tr = traces[0].copy()
print(tr)

# OR accessing an individual trace (here the first) from the squirrel
tr = sq.get_waveforms()[0]
print(tr)

The output shows that the extracted trace was recorded at the station `GE.KTHA..HHZ` on 2020-10-30 with a sampling interval of 0.01 s. The traces loaded via `load` and via the `Squirrel` are equivalent (excellent).
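The 0.01 s printed for the trace is the time between two samples (the `deltat` attribute of a Pyrocko trace). As a small sketch, the corresponding sampling rate and Nyquist frequency follow from plain arithmetic; the variable names below are chosen for illustration only.

```python
# Sampling interval in seconds, as printed for the trace
deltat = 0.01

# Sampling rate in Hz is the inverse of the sampling interval
sampling_rate = 1.0 / deltat

# Highest frequency that can be represented at this sampling rate
nyquist = sampling_rate / 2.0

print('sampling rate [Hz]:', sampling_rate)
print('Nyquist frequency [Hz]:', nyquist)
```

The Nyquist frequency is worth keeping in mind for the filtering step later on: corner frequencies have to stay below it.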

For the `Squirrel` instance, further metadata is shown by the `print` command:

In [None]:
# Show squirrel metadata
print(sq)

We now know something about the record (when, which station, sampling rate, etc.), but we have not seen the waveform yet. *Pyrocko* comes with an interactive waveform viewer and real-time processing tool, the `Snuffler` (more on it later). Here we will just have a look at the waveforms ([further information here](https://pyrocko.org/docs/current/apps/snuffler/manual.html)).

One tip: Familiarize yourself with the display of long waveforms and zooming within `Snuffler`. 

In [None]:
# Plot of the data loaded using `load`
from pyrocko.trace import snuffle
snuffle(traces);

# OR plot of the loaded data using the squirrel
sq.snuffle();

## Simple processing steps - Cutting, Filtering, Spectral plot  <a class="anchor" id="sec2"></a>

In this section some easy but frequently used processing and analysis tools for seismic waveforms are demonstrated based on the loaded waveform.

The loaded waveform, as seen above, contains the seismic signal from `11:50:00` to around `12:05:00`, but also a long period of quiescence. Let's remove the quiet parts of the waveform to get a better look at the actual signal. This can be done with the `chop` method (for individual traces) or with `get_waveforms` (for the `Squirrel`), both of which take the time range to cut the waveforms to.
So here we define the time range:

In [None]:
# Import of the Pyrocko time string parser
from pyrocko.util import str_to_time

# Definition of tmin and tmax
tmin = str_to_time('2020-10-30 11:50:00')
tmax = str_to_time('2020-10-30 12:05:00')
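To make clear what `tmin` and `tmax` hold, here is a sketch using only the Python standard library. It assumes that the time strings are interpreted as UTC and converted to seconds since 1970-01-01, which is how Pyrocko represents times; `utc_string_to_epoch` is a hypothetical helper written for this illustration and not part of Pyrocko.

```python
import calendar
import time


def utc_string_to_epoch(s):
    # Hypothetical helper: parse 'YYYY-MM-DD HH:MM:SS' as UTC and
    # return seconds since 1970-01-01 00:00:00 UTC as a float
    return float(calendar.timegm(time.strptime(s, '%Y-%m-%d %H:%M:%S')))


tmin = utc_string_to_epoch('2020-10-30 11:50:00')
tmax = utc_string_to_epoch('2020-10-30 12:05:00')

# Length of the selected time window in seconds
window_length = tmax - tmin
print('window length [s]:', window_length)
```

Because both values are plain numbers, time windows can be shifted or extended with ordinary arithmetic, for example `tmin - 60.` to start one minute earlier.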

Now we are able to slice the traces within both the `traces` list and the `Squirrel` to the chosen time range:

In [None]:
# Generate a new list (traces_cut) with the sliced waveform
traces_cut = []
for tr in traces:
    traces_cut.append(
        tr.chop(
            tmin,
            tmax,
            include_last=True,
            inplace=False))

# OR use the Squirrel: get_waveforms cuts to the requested time range
sq_cut = Squirrel()
for tr in sq.get_waveforms(
        tmin=tmin, tmax=tmax, include_last=True):

    sq_cut.add_volatile_waveforms([tr])
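Conceptually, cutting a trace means translating the requested time range into sample indices. The NumPy sketch below illustrates this on a synthetic array; it is a simplified stand-in for illustration and not Pyrocko's implementation. The names `trace_tmin`, `ibeg` and `iend` are assumptions, and the `+ 1` loosely mirrors the effect of `include_last=True`.

```python
import numpy as np

# Synthetic "trace": 10 s of samples at 0.01 s spacing, starting at t = 0 s
deltat = 0.01
trace_tmin = 0.0
ydata = np.arange(1000, dtype=float)

# Requested time window in seconds
tmin, tmax = 2.0, 5.0

# Convert times to sample indices relative to the start of the trace
ibeg = int(round((tmin - trace_tmin) / deltat))
iend = int(round((tmax - trace_tmin) / deltat)) + 1  # keep the sample at tmax

# Slice the samples and compute the start time of the cut trace
cut = ydata[ibeg:iend]
cut_tmin = trace_tmin + ibeg * deltat

print('samples kept:', cut.size)
print('new start time [s]:', cut_tmin)
```

Slicing without copying the underlying array is cheap, which is one reason why cutting early in a processing chain pays off for long records.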

To check whether we have been successful, we can use `snuffle` again:

In [None]:
# Plot of the cut data from the trace list
snuffle(traces_cut);

# OR plot of the cut data using the squirrel
sq_cut.snuffle();

Here we have our earthquake signal. It is characterized by heterogeneous amplitudes and frequencies. Let's first check out the distribution of amplitudes vs. frequencies, as done within the signal processing module (using the Fourier transform):

<div class="alert alert-success">
    <p style="font-weight: bold; font-size: 150%">Task 2:</p>
    <ol>
        <li>Start Snuffler with the cut waveforms.</li>
        <li>Apply the snuffling .</li>
        <li>Calculate the amplitude spectrum using the real fast Fourier transform.</li>
        <li>Plot the frequency-dependent amplitude spectrum using logarithmic axis scaling for the frequencies.</li>
    </ol>
</div>
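For orientation, the idea behind an amplitude spectrum can be sketched outside of Snuffler with NumPy's real FFT. The example uses a synthetic signal (two sine waves at 1 Hz and 5 Hz, chosen for illustration) rather than the earthquake record, and the scaling by `2 / n` is one common convention, not necessarily the one used by Snuffler.

```python
import numpy as np

# Synthetic signal: 10 s sampled at 0.01 s, two sine waves
deltat = 0.01
n = 1000
t = np.arange(n) * deltat
signal = 2.0 * np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.sin(2 * np.pi * 5.0 * t)

# Frequencies of the real FFT bins, from 0 Hz up to the Nyquist frequency
freqs = np.fft.rfftfreq(n, d=deltat)

# Amplitude spectrum, scaled so that a sine of amplitude A peaks near A
amps = np.abs(np.fft.rfft(signal)) * 2.0 / n

# Frequency with the largest amplitude
dominant = freqs[np.argmax(amps)]
print('dominant frequency [Hz]:', dominant)
```

Plotting `freqs[1:]` against `amps[1:]` with a logarithmic frequency axis (for example with matplotlib's `semilogx`) gives the kind of display asked for in the task; the zero-frequency bin is skipped because it cannot be shown on a logarithmic axis.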

A closer look at how the frequency-amplitude distribution evolves over time is provided by a `spectrogram`. In Pyrocko this is best done within `Snuffler` using the `spectrogram` extension (a so-called [`Snuffling`](https://git.pyrocko.org/pyrocko/contrib-snufflings)).

<div class="alert alert-success">
    <p style="font-weight: bold; font-size: 150%">Task 3 <em>(optional)</em>:</p>
    <ol>
        <li>Install the <em>Snufflings</em> from <a href="https://git.pyrocko.org/pyrocko/contrib-snufflings">git.pyrocko.org/pyrocko/contrib-snufflings</a>.</li>
        <li>Start Snuffler with the cut waveforms.</li>
        <li>Generate the spectrogram.</li>
    </ol>
</div>


Hopefully you have been successful. If so, what do we see?
* When does the earthquake start?
* How do the frequency content and the amplitudes change over time?
* In which frequency range do you observe the largest amplitudes?

To look closer into the waveform within the dominant frequency range, let's apply a filter on our cut waveforms:

In [None]:
# Filter each trace within the cut list (traces_cut), in place
for tr in traces_cut:
    tr.lowpass(4, 0.1)    # 4th-order lowpass, corner frequency 0.1 Hz
    tr.highpass(4, 0.01)  # 4th-order highpass, corner frequency 0.01 Hz

# OR filter the traces from the squirrel and collect them in a new one
sq_filter = Squirrel()
for tr in sq_cut.get_waveforms():
    tr.lowpass(4, 0.1)
    tr.highpass(4, 0.01)

    sq_filter.add_volatile_waveforms([tr])

<div class="alert alert-warning">
    Filtering in <strong>Pyrocko</strong> may shift the phase of the waveform! Phase-true filtering is done using <em>trace.transfer</em>!
</div>
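Why a filter can shift a waveform in time is easiest to see on a single pulse. In the NumPy sketch below, a Hann-weighted moving average stands in for the filter; this is an assumption made for illustration and not the filter Pyrocko applies. Applied causally (using only current and past samples), the smoothed pulse arrives late, while the centered version keeps the pulse in place.

```python
import numpy as np

# A single pulse at sample 50
n = 101
pulse = np.zeros(n)
pulse[50] = 1.0

# Smoothing kernel: Hann window of 11 samples, normalized to unit sum
kernel = np.hanning(11)
kernel /= kernel.sum()

# Causal filtering: each output sample uses only current and past input
causal = np.convolve(pulse, kernel, mode='full')[:n]

# Centered (zero-phase) filtering: kernel is symmetric around each sample
centered = np.convolve(pulse, kernel, mode='same')

# Delay of the causal output relative to the centered one, in samples
delay = int(np.argmax(causal) - np.argmax(centered))
print('peak of causal output at sample:', int(np.argmax(causal)))
print('peak of centered output at sample:', int(np.argmax(centered)))
print('delay [samples]:', delay)
```

At a sampling interval of 0.01 s, a delay of a few samples is already a few hundredths of a second, which matters when picking arrival times on filtered traces.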

And of course we should have a look again at the result.
* What has changed compared to the raw waveform?

In [None]:
# Plot of the filtered data from the trace list
snuffle(traces_cut);

# OR plot of the filtered data using the squirrel
sq_filter.snuffle();

## Saving  <a class="anchor" id="sec3"></a>

We have applied several processing steps (cutting, filtering) to our raw waveform. The resulting waveform should now be stored for later use, so that the processing chain does not have to be applied again. This is done with the `save` function.
We pass the list of traces to `save`, which writes each trace to a file named after the given template. The template follows the naming of the file we loaded (reminder: `data_GE.KTHA..HHZ_2020-10-30_11-30-26.mseed`), with a `processed_` prefix added:

In [None]:
from pyrocko.io import save

# Save trace list
save(
    traces_cut,
    filename_template='processed_data_%(network)s.%(station)s.%(location)s.%(channel)s_%(tmin)s.mseed')

# OR save the traces held by the squirrel (same file names as above)
save(
    sq_filter.get_waveforms(include_last=True),
    filename_template='processed_data_%(network)s.%(station)s.%(location)s.%(channel)s_%(tmin)s.mseed')
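The placeholders in `filename_template` use Python's `%(name)s` string formatting. The sketch below expands the template by hand with a plain dictionary; the metadata values are assumptions for illustration, and the way `tmin` is rendered as text is inferred from the name of the input file rather than taken from the Pyrocko documentation.

```python
# Template as used in the call to save
template = (
    'processed_data_%(network)s.%(station)s.%(location)s.'
    '%(channel)s_%(tmin)s.mseed')

# Illustrative metadata of the cut trace (location code is empty)
meta = dict(
    network='GE',
    station='KTHA',
    location='',
    channel='HHZ',
    tmin='2020-10-30_11-50-00')  # assumed text form of the start time

# Expand the placeholders with the % operator
filename = template % meta
print(filename)
```

The empty location code is what produces the two consecutive dots in the file name.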


## Summary  <a class="anchor" id="sum"></a>

Here we have covered simple techniques to
* **load** waveforms into Pyrocko traces and into the Pyrocko Squirrel database,
* visualize waveforms and their frequency content within **Snuffler**,
* apply simple processing steps (**cutting** and **filtering**),
* write waveforms to a file (**save**).