### `themachinethatgoesping` tutorial series
# Tutorial 3: Working with pings

In this tutorial, we explain what ping features are and how they are used.

`themachinethatgoesping` concepts covered:
- Ping features
- Filtering pings

## Summary

In [None]:
%matplotlib widget
import os

from matplotlib import pyplot as plt
from themachinethatgoesping.echosounders import index_functions
from themachinethatgoesping.echosounders import kongsbergall
from time import time

folders = []
folders.append("../unittest_data")

# find raw data files and open them
files = index_functions.find_files(folders, [".all", ".wcd"])
cacheFilePaths = index_functions.get_cache_file_paths(file_paths=files)
fh = kongsbergall.KongsbergAllFileHandler(files, file_cache_paths = cacheFilePaths)

# -- File caching --
# The steps so far were the same as in the previous demo. Now we create the cache file paths using the get_cache_file_paths function
cacheFilePaths = index_functions.get_cache_file_paths(file_paths=files)
index_functions.remove_name_from_cache(cacheFilePaths, "FilePackageIndex")
# cacheFilePaths is a dictionary that maps each file in files to the path of its cache file
# Passing cacheFilePaths to the FileHandler causes the FileHandler to either
# - create the cache files (if they don't exist)
# - or load them (if they do exist)

# compare loading times in repeated file loading
t1 = time()
# without using the cache
fh = kongsbergall.KongsbergAllFileHandler(files)
t2 = time()
# using the cache
fh = kongsbergall.KongsbergAllFileHandler(files, file_cache_paths = cacheFilePaths)
t3 = time()

print("\n-- Compare loading times --")
print(f"Time without cache: {round(t2-t1,3)} seconds")
print(f"Time with cache:    {round(t3-t2,3)} seconds")

# -- Investigate the created cache --
# Here we print the created cache files to understand how big they are
print("\n-- Cache file statistics --")
index_functions.print_cache_file_statistics(cacheFilePaths)
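
The create-or-load behaviour described in the comments above can be sketched in plain Python. This is a simplified illustration using `pickle` and a hypothetical `load_or_build` helper; it is not how `themachinethatgoesping` actually stores its cache files, and `expensive_index` merely stands in for the slow file indexing step.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build):
    """Return the object cached at cache_path, building and saving it first if missing.

    `build` is any zero-argument callable returning a picklable object
    (a hypothetical stand-in for the expensive file indexing step).
    """
    if os.path.exists(cache_path):
        # Cache hit: skip the expensive step and load the stored result
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Cache miss: do the expensive work once and store the result for next time
    result = build()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


# Usage: the builder only runs on the first call
calls = []


def expensive_index():
    calls.append(1)
    return {"n_packages": 1234}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.pickle")
    first = load_or_build(path, expensive_index)   # builds and writes the cache
    second = load_or_build(path, expensive_index)  # loads from the cache

print(first == second, len(calls))  # True 1
```

The second call never invokes the builder, which is the effect the timing comparison above measures on real files.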

## Step-by-step
### 1. Find raw data files and open them (see previous demo)

In [None]:
# define a list of folder(s) to search for raw data files
# notes:
#   - subdirectories will be searched as well
#   - you can add multiple folders by appending them to the list
#   - paired files (e.g. .all and .wcd) do not have to be in the same folder
folders = []
folders.append("../unittest_data")

# find all Kongsberg files in the list of folders
from themachinethatgoesping.echosounders import index_functions
from themachinethatgoesping.echosounders import kongsbergall

files = index_functions.find_files(folders, [".all", ".wcd"])
cacheFilePaths = index_functions.get_cache_file_paths(file_paths=files)
fh = kongsbergall.KongsbergAllFileHandler(files, file_cache_paths = cacheFilePaths)
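
To make the search notes above concrete, here is a rough standard-library sketch of searching folders recursively and grouping `.all`/`.wcd` files by their base name. `find_and_pair` is a hypothetical helper written for illustration; `index_functions.find_files` may match and pair files differently.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def find_and_pair(folders, extensions=(".all", ".wcd")):
    """Recursively find files with the given extensions and group them by file stem.

    Hypothetical helper: files sharing a stem (e.g. 0001_line.all and
    0001_line.wcd) are grouped even if they live in different folders.
    """
    pairs = defaultdict(list)
    for folder in folders:
        # rglob("*") walks all subdirectories as well
        for path in Path(folder).rglob("*"):
            if path.is_file() and path.suffix.lower() in extensions:
                pairs[path.stem].append(path)
    # Sort each group so the output order is deterministic
    return {stem: sorted(paths) for stem, paths in pairs.items()}


# Usage: the .all and .wcd files sit in different (sub)folders
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a").mkdir()
    (root / "b" / "sub").mkdir(parents=True)
    (root / "a" / "0001_line.all").touch()
    (root / "b" / "sub" / "0001_line.wcd").touch()
    (root / "a" / "notes.txt").touch()  # ignored: wrong extension
    found = find_and_pair([root / "a", root / "b"])
    suffixes = {stem: [p.suffix for p in paths] for stem, paths in found.items()}

print(suffixes)  # {'0001_line': ['.all', '.wcd']}
```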


### 2. Extract all pings from the data (see demo 01)


In [None]:
all_pings = fh.get_pings()

# all_pings is a list of all Ping objects in fh
# By default, get_pings sorts the pings by time
print('Number of pings in fh:', len(all_pings))
print()
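
By default `get_pings` returns the pings sorted by time. If you assemble your own list of pings (for example by combining or filtering lists), you can restore the time order with `sorted` and a key function. The sketch below relies only on `get_timestamp()`, which every ping provides; `sort_by_time` and the `SimpleNamespace` stand-ins are hypothetical and used so the example runs without data files.

```python
from types import SimpleNamespace


def sort_by_time(pings):
    """Return a new list of pings sorted by their get_timestamp() value."""
    return sorted(pings, key=lambda p: p.get_timestamp())


# Hypothetical stand-ins that only provide get_timestamp()
# (t=t binds the current loop value to each lambda)
demo_pings = [SimpleNamespace(get_timestamp=lambda t=t: t) for t in (30.0, 10.0, 20.0)]

print([p.get_timestamp() for p in sort_by_time(demo_pings)])  # [10.0, 20.0, 30.0]
```

Because `sorted` returns a new list, the original list is left untouched.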

In [None]:
# Each Ping object is associated with one physical ping of an echosounder.
# Ping objects only load auxiliary information into memory (e.g. the geolocation).
# Large data, such as watercolumn samples, remain on disk and are only read when needed
# (e.g. when calling ping.watercolumn.get_amplitudes()).
# You can access the pings by index
first_ping = all_pings[0]

# print some information
print(first_ping)
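
The lazy-loading idea can be illustrated with a small, hypothetical class: the object keeps only a file path and a byte offset in memory, and the expensive read happens inside `get_amplitudes()`. This mirrors the behaviour described in the comments above but is not the library's implementation; `LazyWatercolumn` and `fake_reader` are made-up names for this sketch.

```python
class LazyWatercolumn:
    """Hypothetical stand-in: stores where the data lives, not the data itself."""

    def __init__(self, file_path, offset, reader):
        self.file_path = file_path  # small metadata kept in memory
        self.offset = offset
        self._reader = reader       # callable that performs the actual disk read

    def get_amplitudes(self):
        # The (large) samples are read only when requested
        return self._reader(self.file_path, self.offset)


reads = []


def fake_reader(path, offset):
    # Records each "disk read" and returns dummy samples
    reads.append((path, offset))
    return [[0.1, 0.2], [0.3, 0.4]]


wc = LazyWatercolumn("survey.wcd", 4096, fake_reader)
reads_before = len(reads)
print(reads_before)  # 0: constructing the object reads nothing
amps = wc.get_amplitudes()
print(len(reads))    # 1: data was read on demand
```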

In [None]:
# There are a number of functions that every ping provides, independent of the source format.
# For example:
print('geolocation:', first_ping.get_geolocation())                                    # geolocation and attitude of the transducer
print()
print('Channel id:', first_ping.get_channel_id())                                      # Channel id / Transducer id
print(f'Timestamp {first_ping.get_timestamp()} Datetime: {first_ping.get_datetime()}') # Ping timestamp / datetime

# Further access to the ping data is separated into different namespaces:
# - ping.bottom for seafloor-related data
# - ping.watercolumn for watercolumn-related data
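
The cell above prints both a timestamp and a datetime for the ping. Assuming the timestamp is expressed in seconds since the Unix epoch (UTC), which is an assumption not stated in this tutorial, the conversion between the two can be sketched with the standard library; `timestamp_to_utc` is a hypothetical helper.

```python
from datetime import datetime, timezone


def timestamp_to_utc(timestamp):
    """Convert seconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


print(timestamp_to_utc(0.0))            # 1970-01-01 00:00:00+00:00
print(timestamp_to_utc(1_700_000_000))  # 2023-11-14 22:13:20+00:00
```

Passing `tz=timezone.utc` avoids the result silently depending on the local time zone of the machine running the notebook.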



### HOWEVER: what if a specific ping does not contain watercolumn information? Or bottom detection information?


In [None]:
# This can happen because the source format does not support this kind of information, or because collecting it
# was disabled at recording time.
# Calling functions from ping.bottom would then fail if no bottom detection information is available for this ping.
# To solve this problem, Ping uses so-called features that tell you dynamically which data can be accessed for each ping.

# Print the features that could be available for a ping
# Each of these features is associated with two functions:
# has_<feature_name>() [e.g. has_timestamp()] tells you whether this feature is available
# get_<feature_name>() [e.g. get_timestamp()] returns the data associated with this feature
print('possible features Ping')
for f in first_ping.possible_features():
    print(f'-{f}')
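
Because every feature follows the `has_<feature_name>()` / `get_<feature_name>()` naming convention described above, a generic accessor can be written with `getattr`. This is a sketch built only on that convention; `get_if_available` is a hypothetical helper and `FakePing` is a stand-in so the example runs without data files.

```python
def get_if_available(obj, feature):
    """Return obj.get_<feature>() if obj.has_<feature>() is True, else None."""
    if getattr(obj, f"has_{feature}")():
        return getattr(obj, f"get_{feature}")()
    return None


class FakePing:
    """Hypothetical stand-in with one available and one missing feature."""

    def has_timestamp(self):
        return True

    def get_timestamp(self):
        return 1_700_000_000.0

    def has_geolocation(self):
        return False

    def get_geolocation(self):
        # Would fail, just like a real ping without this information
        raise RuntimeError("no geolocation recorded for this ping")


fake = FakePing()
print(get_if_available(fake, "timestamp"))    # 1700000000.0
print(get_if_available(fake, "geolocation"))  # None
```

Checking `has_...()` first means the failing `get_...()` is never called.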

In [None]:
# Further data access is split into feature groups
print('possible feature groups Ping')
for f in first_ping.possible_feature_groups():
    print(f'-{f}')

# .file_data gives you access to the raw data packages and the source file associated with a specific ping
# (this will be covered in another tutorial)
# .bottom handles access to seafloor detection related data
# .watercolumn handles access to watercolumn related data
# Feature groups are associated with two functions as well:
# .has_<feature_group_name>() (e.g. .has_watercolumn())
# .<feature_group_name> (e.g. .watercolumn) Notice the missing '()'

# let's take a ping that has watercolumn data
for ping in all_pings:
    if ping.has_watercolumn():
        break
else:
    # only reached if the loop finished without break
    raise RuntimeError("None of the pings contains watercolumn data")

# we now know that 'ping' contains watercolumn data
# we can thus safely access ping.watercolumn
# the watercolumn group in turn contains features that work the same way as described above
print()
print('possible features Ping.watercolumn')
for f in ping.watercolumn.possible_features():
    print(f'-{f}')
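
A compact alternative to the search loop above is `next()` with a default value, which makes the "no match" case explicit by returning `None`. The sketch relies only on the `has_<feature_group_name>()` convention; `first_with` is a hypothetical helper and the `SimpleNamespace` objects are stand-ins for real pings.

```python
from types import SimpleNamespace


def first_with(pings, group):
    """Return the first ping whose has_<group>() is True, or None if there is none."""
    return next((p for p in pings if getattr(p, f"has_{group}")()), None)


# Hypothetical stand-ins: only the second and third "pings" have watercolumn data
demo_pings = [
    SimpleNamespace(name="p0", has_watercolumn=lambda: False),
    SimpleNamespace(name="p1", has_watercolumn=lambda: True),
    SimpleNamespace(name="p2", has_watercolumn=lambda: True),
]

print(first_with(demo_pings, "watercolumn").name)  # p1
print(first_with(demo_pings[:1], "watercolumn"))   # None
```

With real data this would read `ping = first_with(all_pings, "watercolumn")`, followed by a check for `None`.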

## Filter pings by features

In [None]:
# Pings can be collected into new lists
# we now use this to filter all pings by the specific features we want to use in our processing
# for example, here we create a new list that contains only pings with watercolumn amplitude data

from tqdm.auto import tqdm

pings_with_watercolumn = []

for ping in tqdm(all_pings):
    if ping.has_watercolumn():
        if ping.watercolumn.has_amplitudes():
            pings_with_watercolumn.append(ping)

print('Pings with watercolumn amplitudes:', len(pings_with_watercolumn))
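
The nested `if` statements above generalize to any feature group and any number of required features. The sketch below expresses the same filter as a list comprehension; `has_all` and `make_fake_ping` are hypothetical helpers, and the stand-in objects only mimic `ping.has_watercolumn()` and `ping.watercolumn.has_amplitudes()`.

```python
from types import SimpleNamespace


def has_all(obj, group, features):
    """True if obj.has_<group>() and every obj.<group>.has_<feature>() are True."""
    if not getattr(obj, f"has_{group}")():
        return False  # never touch obj.<group> when the group is missing
    sub = getattr(obj, group)
    return all(getattr(sub, f"has_{f}")() for f in features)


def make_fake_ping(has_wc, has_amp):
    # Hypothetical stand-in for a ping and its watercolumn group
    wc = SimpleNamespace(has_amplitudes=lambda: has_amp)
    return SimpleNamespace(has_watercolumn=lambda: has_wc, watercolumn=wc)


demo_pings = [
    make_fake_ping(True, True),    # kept
    make_fake_ping(True, False),   # no amplitudes
    make_fake_ping(False, False),  # no watercolumn group at all
]
selected = [p for p in demo_pings if has_all(p, "watercolumn", ["amplitudes"])]
print(len(selected))  # 1
```

With the real pings, the equivalent would be `[p for p in all_pings if has_all(p, "watercolumn", ["amplitudes"])]`, assuming the `has_` conventions described in this tutorial.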

In [None]:
# We now know that we can safely call ping.watercolumn.get_amplitudes() for all pings in the newly created pings_with_watercolumn list

pings_with_watercolumn[10].watercolumn.get_amplitudes()