# Data retrieval using Pyrocko with Squirrel

The Squirrel, Pyrocko's new database handler, ships with easy-to-use functions to retrieve data from online resources. (If you are interested in downloading data without the Squirrel, see the notebook [pyrocko old](x_Pyrocko_old.ipynb).) The advantage of the Squirrel is that it keeps track of which data is already in your local Squirrel database. Data you have requested before is not downloaded again but simply reloaded.
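Conceptually, this "download once, then reload" behaviour works like a cache keyed by the request. The following is a minimal in-memory sketch of that idea only. The class `CachingFetcher` and the function `fetch_from_server` are hypothetical names for illustration. The Squirrel itself keeps its index in a database on disk rather than in a Python dict.

```python
# Conceptual sketch of "download once, then reload" caching.
# NOT the Squirrel's implementation; fetch_from_server is a hypothetical
# stand-in for a real (slow) download.

class CachingFetcher:
    """Fetch data by key, reusing previously fetched results."""

    def __init__(self, fetch):
        self._fetch = fetch      # function that performs the actual download
        self._store = {}         # local "database": key -> data
        self.n_downloads = 0     # how often the server was actually contacted

    def get(self, key):
        if key not in self._store:
            # Not known locally yet: download it and remember it.
            self._store[key] = self._fetch(key)
            self.n_downloads += 1
        # In both cases, serve the locally stored copy.
        return self._store[key]


def fetch_from_server(key):
    # Hypothetical download; here it only builds a placeholder string.
    return 'data for %s' % (key,)


fetcher = CachingFetcher(fetch_from_server)
first = fetcher.get(('GR.BFO..LHZ', '2016-10-30'))
second = fetcher.get(('GR.BFO..LHZ', '2016-10-30'))  # served from the cache
print(fetcher.n_downloads)  # -> 1
```

Requesting the same key a second time does not trigger a new download. Only a new key (for example another channel) would.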

We will build simple Squirrel requests for events, station metadata and waveforms, based on this [Reference](https://pyrocko.org/docs/current/library/examples/squirrel/cli_tool.html#variant-1-quick-and-dirty-script).


### Table of contents:
- [Download Events](#Event-catalog)
- [Download Station Meta Data](#Stations)
- [Download Waveforms](#Waveform)
- [Summary](#Summary)

In [None]:
import numpy as np
from pyrocko import progress
from pyrocko.util import time_to_str, str_to_time_fillup as stt
from pyrocko.squirrel import Squirrel

First, we initialize a Squirrel instance.

In [None]:
sq = Squirrel()

## Event catalog
The Squirrel needs a catalog provider. At the moment, Global CMT (`gcmt`) and GEOFON (`geofon`) are implemented. Here, we choose GEOFON.

In [None]:
sq.add_catalog('geofon') # or gcmt

We also need to set a time span within which events are searched.

In [None]:
tmin = stt('2016-10-30 00:00:00')  # Start time of the search window
tmax = stt('2016-10-30 23:50:00')  # End time of the search window

sq.update(tmin=tmin, tmax=tmax)  # Update the added sources (here: the event catalog) for this time window
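Pyrocko represents times as seconds since 1970-01-01 00:00:00 UTC. As its name suggests, `str_to_time_fillup` (imported above as `stt`) also accepts date strings that are cut short and fills up the missing parts. Below is a rough stdlib-only sketch of this conversion, assuming UTC, whole seconds and strings cut at a component boundary. `str_to_time_fillup_sketch` is a hypothetical helper, not the Pyrocko function.

```python
import calendar
from datetime import datetime


def str_to_time_fillup_sketch(s):
    """Convert a (possibly partial) UTC date string to epoch seconds.

    Hypothetical helper: missing trailing parts are filled up with the
    earliest value, e.g. '2016-10-30' -> '2016-10-30 00:00:00'.
    Partial strings must end at a component boundary (year, month, ...).
    """
    template = '0000-01-01 00:00:00'
    if len(s) > len(template):
        raise ValueError('unexpected time string: %r' % s)
    full = s + template[len(s):]                 # fill up the missing parts
    dt = datetime.strptime(full, '%Y-%m-%d %H:%M:%S')
    return calendar.timegm(dt.timetuple())       # interpret as UTC


t_start = str_to_time_fillup_sketch('2016-10-30 00:00:00')
t_end = str_to_time_fillup_sketch('2016-10-30 23:50:00')
print(t_end - t_start)  # -> 85800 (seconds)
```

The variable names `t_start` and `t_end` are chosen so that the sketch does not overwrite `tmin` and `tmax` of the notebook.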

With the *get_events()* function, we now retrieve the events that the Squirrel has collected from the catalog for the time window defined above.

In [None]:
evs = sq.get_events()

In [None]:
len(evs)

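We can iterate through the events. In the following, we print time, magnitude and region of all events with a magnitude above 6, and store the origin time of the 2016 Norcia earthquake (Central Italy, magnitude 6.5) in `tevent`: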

In [None]:
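tevent = None  # Origin time of the Norcia earthquake, set below if found

for ev in evs:
    # The magnitude may be None for some events; skip those.
    if ev.magnitude is not None and ev.magnitude > 6:
        print('Time: {}, Magnitude: {}, Region: {}'.format(
            time_to_str(ev.time), ev.magnitude, ev.region))

        if ev.region == 'Central Italy' and ev.magnitude == 6.5:
            tevent = ev.time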
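Comparing floating-point magnitudes exactly is fragile, and catalog entries may lack a magnitude. A slightly more defensive selection helper is sketched below. `EventStub` is a hypothetical stand-in for Pyrocko's event objects that only mimics the `time`, `magnitude` and `region` attributes used above, and the demo events are made-up values, not catalog data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventStub:
    """Hypothetical stand-in for a Pyrocko event (only the attributes used above)."""
    time: float
    magnitude: Optional[float]
    region: str


def find_event(events, region, magnitude, tol=0.05):
    """Return the first event in `region` whose magnitude is within `tol` of `magnitude`."""
    for ev in events:
        if ev.magnitude is None:
            continue  # no magnitude available: cannot compare
        if ev.region == region and abs(ev.magnitude - magnitude) <= tol:
            return ev
    return None  # nothing matched


# Made-up demo events, for illustration only.
demo_events = [
    EventStub(time=1000.0, magnitude=None, region='Somewhere'),
    EventStub(time=2000.0, magnitude=6.5, region='Central Italy'),
]
hit = find_event(demo_events, 'Central Italy', 6.5)
print(hit.time)  # -> 2000.0
```

The tolerance `tol` absorbs small rounding differences, for example between catalogs that report magnitudes with different precision.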


## Stations

As for the events, we first need to select a source for our station metadata. We download it from an FDSN web service, here provided by *BGR* (Bundesanstalt für Geowissenschaften und Rohstoffe). We can also restrict our search to specific seismic networks, stations or channels. This limits the amount of downloaded station metadata and hence reduces download and processing time.

In [None]:
# Add online data source.
sq.add_fdsn(
    'bgr',               # BGR FDSN source
    query_args=dict(
        network='GR',    # Restrict query to 'GR' network
        channel='LH?'))  # Restrict query to 'LH?' channels

sq.update(tmin=tmin, tmax=tmax)  # Update the Squirrel to ensure meta data is up to date

Similar to the *level* option of FDSN station queries, the Squirrel offers different levels of metadata:
* *get_stations()*
* *get_channels()* 
* *get_response()*

All of them can be restricted further with the `codes` argument. It accepts `network.station.location(.channel)` code patterns, with wildcards such as `*`, for the networks, stations and channels you are interested in.
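The code patterns are dot-separated `network.station.location.channel` strings in which the location is often empty (as in `GR.BFO..LHZ`), and `*` and `?` act as wildcards. The sketch below mimics this selection with Python's glob-style `fnmatch`. It is an approximation for illustration, not the Squirrel's matcher: in `fnmatch`, `.` is an ordinary character, so `*` can span several components, which the Squirrel may handle differently. The channel list is hypothetical.

```python
from fnmatch import fnmatchcase


def split_codes(codes):
    """Split 'NET.STA.LOC.CHA' into its four parts (the location may be empty)."""
    network, station, location, channel = codes.split('.')
    return network, station, location, channel


def match_codes(codes_list, pattern):
    """Return all codes matching a glob-style pattern ('*' and '?' wildcards)."""
    return [c for c in codes_list if fnmatchcase(c, pattern)]


# Hypothetical channel codes for illustration.
available = ['GR.BFO..LHZ', 'GR.BFO..LHN', 'GR.BFO..BHZ', 'GR.GRA1..LHZ']

print(match_codes(available, 'GR.BFO..LH?'))  # -> ['GR.BFO..LHZ', 'GR.BFO..LHN']
print(match_codes(available, '*.BFO.*.*'))    # -> ['GR.BFO..LHZ', 'GR.BFO..LHN', 'GR.BFO..BHZ']
print(split_codes('GR.BFO..LHZ'))             # -> ('GR', 'BFO', '', 'LHZ')
```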

In [None]:
stas = sq.get_stations(codes='*.*.*')
# stas = sq.get_stations(codes='*.BFO.*')
print(len(stas))

We are now able to iterate through the downloaded stations and check their content:

In [None]:
for sta in stas:
    print(sta)

Similarly, we can retrieve the channel information. Here, we limit the results to the channels of `BFO`, the Black Forest Observatory:

In [None]:
chas = sq.get_channels(codes='*.BFO.*.*')
print(len(chas))

In [None]:
for cha in chas:
    print(cha)

The most detailed level of metadata includes the instrument response of each requested channel. Response information has to be fetched explicitly with *update_responses()* before it can be queried. Here, we restrict the query further to get only the response of the `LHZ` channel of `BFO`, valid within our time window:

In [None]:
sq.update_responses(codes='GR.BFO..LH?')  # Fetch responses of these channels (required only once)
resps = sq.get_response(tmin=tmin, tmax=tmax, codes='GR.BFO..LHZ')
print(resps)

Note that *get_response()* returns the response of a single channel only, so the query must match exactly one channel. To get the responses of all your channels, use a loop. You can pass a channel object directly; the required information is then taken from it automatically. **HINT:** Don't forget to call *update_responses()* with a matching code restriction beforehand; otherwise, no response information will be found.

In [None]:
for cha in chas:
    resps = sq.get_response(cha)
    print(cha)
    print(resps)

## Waveform

We have previously searched for events on 30 October 2016 (and found the 2016 Norcia earthquake) and collected the available stations and channels (e.g. `GR.BFO`). Now we request waveforms of the Norcia earthquake for one specific station, `GR.BFO`.

As for the events and the responses, we also need to update the Squirrel so that it can handle and download the waveforms. This is done by setting up waveform promises with *update_waveform_promises()* ([Ref](https://pyrocko.org/docs/current/library/reference/squirrel/base.html#pyrocko.squirrel.base.Squirrel.update_waveform_promises)).



In [None]:
# Set the time window of the waveform data request
tmin = tevent + 0.  
tmax = tevent + 1500.

# Ensure meta-data from online sources is up to date for the selected time span:
sq.update(tmin=tmin, tmax=tmax)

# Allow waveform download for station BFO. This does not download anything
# yet, it just enables downloads later, when needed. Omit `tmin`, `tmax`,
# and `codes` to enable download of everything selected in `add_fdsn(...)`.
sq.update_waveform_promises(tmin=tmin, tmax=tmax, codes='*.BFO.*.*')

So far, no waveform data has been downloaded; the Squirrel has only registered which data is available. The data is downloaded on demand when we request it with *chopper_waveforms()*, which delivers it in batches (stored here in `batches`).

In [None]:
batches = sq.chopper_waveforms(
        tmin=tmin,
        tmax=tmax,
        codes='*.BFO.*.LH*',
        want_incomplete=False,  # Skip incomplete traces
        snap_window=True)
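*chopper_waveforms()* delivers the requested span in time windows (batches). Without a `tinc` argument, as here, the whole span should arrive as a single batch. With `tinc`, it is split into consecutive windows, each optionally widened by `tpad`. The stdlib sketch below illustrates only this windowing arithmetic, with times relative to the event origin. The hypothetical `chop_windows` ignores data coverage and window snapping. Like the sketch, *chopper_waveforms()* is a generator, so `batches` can be consumed only once.

```python
def chop_windows(t_begin, t_end, tinc, tpad=0.0):
    """Yield (wmin, wmax) pairs covering [t_begin, t_end] in steps of tinc.

    Each window is widened by tpad on both sides (padding for processing).
    The last window may be shorter than tinc.
    """
    wmin = t_begin
    while wmin < t_end:
        wmax = min(wmin + tinc, t_end)
        yield (wmin - tpad, wmax + tpad)
        wmin += tinc


# 0 ... 1500 s after the event origin, windows of 600 s with 100 s padding.
windows = list(chop_windows(0.0, 1500.0, tinc=600.0, tpad=100.0))
print(windows)  # -> [(-100.0, 700.0), (500.0, 1300.0), (1100.0, 1600.0)]

# A generator is exhausted after one pass:
gen = chop_windows(0.0, 1500.0, tinc=1500.0)
print(len(list(gen)), len(list(gen)))  # -> 1 0
```

If you need to loop over the waveforms again, call *chopper_waveforms()* once more (as done in the next cell) or collect the batches in a list first.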

In [None]:
for batch in batches:
    for tr in batch.traces:        
        print(tr)        
        tr.plot()

In a final step, we remove the instrument response from the raw waveforms to obtain ground displacement within the frequency band from `fmin` to `fmax`. We iterate through the waveforms, get the response of each trace with *get_response()* and then deconvolve it with *transfer()*:

In [None]:
# Frequency band for restitution in [Hz]:
fmin = 0.01
fmax = 0.05

# Time length for padding (half of overlap).
tpad = 1.0 / fmin

batches = sq.chopper_waveforms(
        tmin=tmin,
        tmax=tmax,
        codes='*.BFO.*.LH*',
        tpad=tpad,               # Add padding to absorb processing artifacts.
        want_incomplete=False,   # Skip incomplete windows.
        snap_window=True)

# Iterate through batches and traces within batches
for batch in batches:
    for tr in batch.traces:
        # Effective instrument response with ground displacement as input quantity
        resp = sq.get_response(tr).get_effective(
            input_quantity='displacement')
        
        tr = tr.transfer(
            tpad,                  # Fade in / fade out length.
            (0.5*fmin, fmin,
             fmax, 2.0*fmax),      # Frequency taper.
            resp,                  # Complex frequency response of instrument.
            invert=True,           # Use inverse of instrument response.
            cut_off_fading=False)  # Disable internal trimming.
        
        tr.chop(batch.tmin, batch.tmax)  # Cut off the padding added above
        
        print(tr)        
        tr.plot()
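The four `freqlimits` passed to *transfer()* define a frequency taper. The restitution is fully applied between `fmin` and `fmax` and faded out towards `0.5*fmin` and `2.0*fmax`, which keeps the inverted response from amplifying noise at very low and very high frequencies. Likewise, `tpad = 1.0 / fmin` is 100 s, one period at the lower corner frequency. The sketch below evaluates a cosine-shaped taper with these corners. `cosine_taper` is a hypothetical helper; to our knowledge Pyrocko's restitution taper is also cosine-shaped, but treat this as an illustration of the idea rather than Pyrocko's code.

```python
import math


def cosine_taper(f, a, b, c, d):
    """Taper weight in [0, 1] for frequency f with corners a < b <= c < d.

    0 below a and above d, 1 between b and c, cosine-shaped flanks in between.
    """
    if f <= a or f >= d:
        return 0.0
    if f < b:
        return 0.5 - 0.5 * math.cos(math.pi * (f - a) / (b - a))  # rising flank
    if f <= c:
        return 1.0                                                 # passband
    return 0.5 + 0.5 * math.cos(math.pi * (f - c) / (d - c))      # falling flank


f_lo, f_hi = 0.01, 0.05                          # same band as fmin, fmax above [Hz]
corners = (0.5 * f_lo, f_lo, f_hi, 2.0 * f_hi)   # (0.005, 0.01, 0.05, 0.1) Hz

for f in (0.004, 0.0075, 0.02, 0.075, 0.2):
    print(f, round(cosine_taper(f, *corners), 3))
```

Frequencies halfway up or down a flank receive a weight of 0.5, and frequencies outside the outer corners are suppressed completely.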


## Summary

You have learned how to use Pyrocko's Squirrel to collect information about events, stations and waveforms from online sources, and how to remove the instrument response from the downloaded waveforms.