### `themachinethatgoesping` tutorial series
# Tutorial 2: Introduction

In this tutorial, we show how to speed up repeated file opening using caching.

`themachinethatgoesping` concepts covered:
- "File Handler" object
- data loading
- file caching

## Summary

In [1]:
%matplotlib widget
import os

from matplotlib import pyplot as plt
from themachinethatgoesping.echosounders import index_functions
from themachinethatgoesping.echosounders import kongsbergall
from time import time

folders = []
folders.append("../unittest_data")

# list raw data files
files = index_functions.find_files(folders, [".all", ".wcd"])
files.sort()

# -- File caching --
# The steps so far are the same as in the previous demo; now we create the cache file paths using get_cache_file_paths
cacheFilePaths = index_functions.get_cache_file_paths(file_paths=files)
index_functions.print_cache_file_statistics(cacheFilePaths)
# remove the cached "FilePackageIndex" entry
index_functions.remove_name_from_cache(cacheFilePaths, "FilePackageIndex")
# cacheFilePaths is a dictionary whose keys are the paths in files and whose values are the paths to the corresponding cache files
# Passing cacheFilePaths to the FileHandler causes the FileHandler to either
# - create the cache files (if they don't exist)
# - or load them (if they do exist)
fh = kongsbergall.KongsbergAllFileHandler(files, file_cache_paths = cacheFilePaths)

# compare loading times for repeated file loading
t1 = time()
# without using the cache
fh = kongsbergall.KongsbergAllFileHandler(files)
t2 = time()
# using the cache
fh = kongsbergall.KongsbergAllFileHandler(files, file_cache_paths = cacheFilePaths)
t3 = time()

print("\n-- Compare loading times --")
print(f"Time without cache: {round(t2-t1,3)} seconds")
print(f"Time with cache:    {round(t3-t2,3)} seconds")

# -- Investigate the created cache --
# Here we print the created cache files to understand how big they are
print("\n-- Cache file statistics --")
index_functions.print_cache_file_statistics(cacheFilePaths)

Found 16 files
FilePackageIndex: 0.03 'MB' / 0.15 %
NavigationInterpolatorLatLon: 0.1 'MB' / 0.45 %
FilePackageCache<RuntimeParameters>: 0.0 'MB' / 0.01 %
FilePackageCache<WaterColumnInformation>: 0.37 'MB' / 1.72 %
FilePackageCache<SystemInformation>: 0.01 'MB' / 0.04 %
- Combined -: 0.51 'MB' / 2.36 %
- Source files -: 21.56 'MB' / 100.0 %
indexing files ⠄ 100% [00m:00s<00m:00s] [..2459809945665151.wcd (16/16)]                                
indexing files ⠂ 100% [00m:00s<00m:00s] [Found: 1168 datagrams in 16 files (21MB)]                                          
Initializing datagramdata interface ⠈ 0% [00m:00s<00m:00s]           
Initializing ping interface ⢀ 87% [00m:00s<00m:00s] [Done]                                              
indexing files ⠐ 100% [00m:00s<00m:00s] [..9880139284219668.all (1/16)]                               
indexing files ⠠ 100% [00m:00s<00m:00s] [..2459809945665151.wcd (16/16)]                                
indexing files ⢀ 100% [00m:00s<00m:00s] [Fo
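`remove_name_from_cache` is called above with the name `"FilePackageIndex"`, which is one of the entries listed in the cache statistics. As a rough mental model only (an assumption, not the library's actual file format), one can picture a cache file as a set of named entries, where deleting one entry forces just that entry to be rebuilt on the next load. The hypothetical `NamedCache` class below sketches that idea in plain Python:

```python
# Hypothetical sketch: a cache holding named entries. This is NOT the
# themachinethatgoesping cache format, only an illustration of the idea.
from typing import Any, Callable, Dict


class NamedCache:
    def __init__(self) -> None:
        self.entries: Dict[str, Any] = {}
        self.computations = 0  # counts how often an entry had to be (re)built

    def get(self, name: str, compute: Callable[[], Any]) -> Any:
        # return the cached entry, or compute and store it if it is missing
        if name not in self.entries:
            self.entries[name] = compute()
            self.computations += 1
        return self.entries[name]

    def remove_name(self, name: str) -> None:
        # drop one named entry; it is rebuilt on the next get()
        self.entries.pop(name, None)


cache = NamedCache()
cache.get("FilePackageIndex", lambda: [0, 120, 480])  # first load: computed
cache.get("FilePackageIndex", lambda: [0, 120, 480])  # second load: cached
cache.remove_name("FilePackageIndex")                 # invalidate the entry
cache.get("FilePackageIndex", lambda: [0, 120, 480])  # rebuilt
print(cache.computations)  # 2
```

Other entries (for example the navigation or water column caches) would stay untouched in this model, which is why removing a single name can be cheaper than deleting the whole cache file.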

In [2]:
ping = fh.get_pings()[0]
print(ping)

KongsbergAllPing
################
-
Ping infos 
-------------                                                                                                                                
- Channel id:             TRX-210                                                                                                                          
- Time info:              21/08/2012 17:09:42.36                                                                                                          
                          [1345568982.359000]
- Features:               .get_timestamp, .get_datetime, .get_channel_id, .get_sensor_configuration, .get_sensor_data_latlon, .get_geolocation             
- Feature groups:         .bottom, .watercolumn                                                                                                            
- Features(.bottom):      .bottom : .get_two_way_travel_times, .get_xyz, .get_tx_signal_parameters, .get_number_of_tx_sectors, .get_beam_cros

## Step-by-step
### 1. Find raw data files (see previous demo)

In [3]:
# define a list of folder(s) to search for raw data files
# notes: 
#   - subdirectories will be searched as well
#   - you can add multiple folders by appending them to the list
#   - paired files (e.g. .all and .wcd) do not have to be in the same folder
folders = []
folders.append("../unittest_data")

# find all Kongsberg files in the list of folders
from themachinethatgoesping.echosounders import index_functions
files = index_functions.find_files(folders, [".all", ".wcd"])

# show files found
print(f"The output is a {type(files)} object with {len(files)} elements:")
files.sort()
for i, file in enumerate(files):
    print(f"({i}/{len(files)}) {file}")

Found 16 files
The output is a <class 'list'> object with 16 elements:
(0/16) ../unittest_data/kongsberg/a/c/519880139284219668.all
(1/16) ../unittest_data/kongsberg/a/c/519880139284219668.wcd
(2/16) ../unittest_data/kongsberg/a/f/ALL/6516408039690331208.all
(3/16) ../unittest_data/kongsberg/a/f/WCD/6516408039690331208.wcd
(4/16) ../unittest_data/kongsberg/a/y/-1333931979274893952.all
(5/16) ../unittest_data/kongsberg/a/y/-1333931979274893952.wcd
(6/16) ../unittest_data/kongsberg/g/-7041029013895133878.all
(7/16) ../unittest_data/kongsberg/g/-7041029013895133878.wcd
(8/16) ../unittest_data/kongsberg/he/-3092155654265012286.all
(9/16) ../unittest_data/kongsberg/he/-3092155654265012286.wcd
(10/16) ../unittest_data/kongsberg/simon/7287506659992808476.all
(11/16) ../unittest_data/kongsberg/simon/7287506659992808476.wcd
(12/16) ../unittest_data/kongsberg/turbeams/6641182978793103390.all
(13/16) ../unittest_data/kongsberg/turbeams/6641182978793103390.wcd
(14/16) ../unittest_data/kongsberg/tu
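The search in step 1 is recursive and matches file extensions. A plain-Python approximation of that behaviour using `pathlib` is sketched below; `find_files_sketch` is a hypothetical helper, not the implementation of `index_functions.find_files`.

```python
# Hypothetical sketch of a recursive, extension-based file search; it only
# approximates index_functions.find_files and is not the library's code.
import tempfile
from pathlib import Path
from typing import Iterable, List


def find_files_sketch(folders: Iterable[str], extensions: Iterable[str]) -> List[str]:
    # normalise extensions so that "wcd" and ".wcd" are treated the same
    exts = {e.lower() if e.startswith(".") else "." + e.lower() for e in extensions}
    found = set()
    for folder in folders:
        # rglob("*") visits the folder and all of its subdirectories
        for path in Path(folder).rglob("*"):
            if path.is_file() and path.suffix.lower() in exts:
                found.add(str(path))
    return sorted(found)


# usage on a throw-away folder tree (file names are illustrative)
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["a/x.all", "a/b/x.wcd", "a/notes.txt"]:
        p = Path(tmp, rel)
        p.parent.mkdir(parents=True, exist_ok=True)
        p.touch()
    found = find_files_sketch([tmp], [".all", "wcd"])
    found_rel = [Path(f).relative_to(tmp).as_posix() for f in found]
print(found_rel)  # ['a/b/x.wcd', 'a/x.all']
```

Sorting the result, as the notebook does with `files.sort()`, keeps the order stable between runs, which makes the printed file list reproducible.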

### 2. Get the cache file paths

In [4]:
# caching the files the first time they are read speeds up subsequent loading

# get_cache_file_paths maps each raw data file to its own cache file:
cacheFilePaths = index_functions.get_cache_file_paths(file_paths=files)

# By default, the cache files are stored in a newly created "cache" folder
# in the notebook's directory

for f, c in cacheFilePaths.items():
    print(f"File: {f}")
    print(f"    Cache: {c}")

File: ../unittest_data/kongsberg/a/c/519880139284219668.all
    Cache: /ssd/src/themachinethatgoesping/tutorials/demo/cache/root_/ssd/src/themachinethatgoesping/tutorials/unittest_data/kongsberg/a/c/519880139284219668.all.tmtgp.cache
File: ../unittest_data/kongsberg/a/c/519880139284219668.wcd
    Cache: /ssd/src/themachinethatgoesping/tutorials/demo/cache/root_/ssd/src/themachinethatgoesping/tutorials/unittest_data/kongsberg/a/c/519880139284219668.wcd.tmtgp.cache
File: ../unittest_data/kongsberg/a/f/ALL/6516408039690331208.all
    Cache: /ssd/src/themachinethatgoesping/tutorials/demo/cache/root_/ssd/src/themachinethatgoesping/tutorials/unittest_data/kongsberg/a/f/ALL/6516408039690331208.all.tmtgp.cache
File: ../unittest_data/kongsberg/a/f/WCD/6516408039690331208.wcd
    Cache: /ssd/src/themachinethatgoesping/tutorials/demo/cache/root_/ssd/src/themachinethatgoesping/tutorials/unittest_data/kongsberg/a/f/WCD/6516408039690331208.wcd.tmtgp.cache
File: ../unittest_data/kongsberg/a/y/-133393
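The printed paths suggest that each cache path is built by resolving the source file to an absolute path and mirroring that path below `cache/root_` with a `.tmtgp.cache` suffix. The function below reproduces this naming pattern for POSIX paths; it is inferred from the output above rather than taken from the library source, and `cache_path_sketch` / `cache_paths_sketch` are hypothetical names.

```python
# Hypothetical sketch: reproduce the cache naming pattern seen in the output
# above (POSIX paths only). Inferred from the printed paths, not library code.
import posixpath
from typing import Dict, List


def cache_path_sketch(file_path: str, notebook_dir: str) -> str:
    # resolve the (possibly relative) source path against the notebook folder
    abs_path = posixpath.normpath(posixpath.join(notebook_dir, file_path))
    # mirror the absolute path below <notebook_dir>/cache/root_ and add the suffix
    return posixpath.join(notebook_dir, "cache") + "/root_" + abs_path + ".tmtgp.cache"


def cache_paths_sketch(file_paths: List[str], notebook_dir: str) -> Dict[str, str]:
    # one cache file per source file, keyed by the source path as given
    return {f: cache_path_sketch(f, notebook_dir) for f in file_paths}


notebook_dir = "/ssd/src/themachinethatgoesping/tutorials/demo"
paths = cache_paths_sketch(["../unittest_data/kongsberg/a/c/519880139284219668.all"], notebook_dir)
for f, c in paths.items():
    print(f"File: {f}")
    print(f"    Cache: {c}")
```

Mirroring the full absolute path keeps cache files for identically named sources in different folders (such as the `.all`/`.wcd` pairs above) from colliding.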

In [5]:
# -- Investigate the created cache --
# Here we print the created cache files to understand how big they are
print("\n-- Cache file statistics --")
index_functions.print_cache_file_statistics(cacheFilePaths)

# Note: if this is the first time you run the code,
# the cache files do not exist yet and there are no statistics to print


-- Cache file statistics --
FilePackageIndex: 0.03 'MB' / 0.15 %
NavigationInterpolatorLatLon: 0.1 'MB' / 0.45 %
FilePackageCache<RuntimeParameters>: 0.0 'MB' / 0.01 %
FilePackageCache<WaterColumnInformation>: 0.37 'MB' / 1.72 %
FilePackageCache<SystemInformation>: 0.01 'MB' / 0.04 %
- Combined -: 0.51 'MB' / 2.36 %
- Source files -: 21.56 'MB' / 100.0 %
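Each percentage appears to be the size of a cache entry relative to the combined size of the source files. With the rounded values shown, the combined cache takes about 0.51 / 21.56 ≈ 2.4 % of the source size; the printed 2.36 % is presumably computed from exact byte counts. A hypothetical helper that derives such percentages from byte counts (not the code behind `print_cache_file_statistics`) could look like this:

```python
# Hypothetical sketch: express named cache entry sizes as a percentage of the
# total source-file size, similar in spirit to print_cache_file_statistics.
from typing import Dict


def cache_statistics(entry_bytes: Dict[str, int], source_bytes: int) -> Dict[str, float]:
    # percentage of the source size used by each entry
    stats = {name: 100.0 * size / source_bytes for name, size in entry_bytes.items()}
    # percentage used by all entries together
    stats["- Combined -"] = 100.0 * sum(entry_bytes.values()) / source_bytes
    return stats


# illustrative byte counts (made up), not the real cache sizes
example = {"FilePackageIndex": 30_000, "NavigationInterpolatorLatLon": 100_000}
for name, pct in cache_statistics(example, source_bytes=20_000_000).items():
    print(f"{name}: {round(pct, 2)} %")
```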


### 3. Load data with cache file paths

In [6]:
# load the data with the cache files (if this is the first time you run it, the cache files will be created):
from themachinethatgoesping.echosounders import kongsbergall
fh = kongsbergall.KongsbergAllFileHandler(files, file_cache_paths = cacheFilePaths)


indexing files ⢀ 98% [00m:00s<00m:00s] [Found: 1168 datagrams in 16 files (21MB)]                                          
Initializing datagramdata interface ⠈ 0% [00m:00s<00m:00s]           
Initializing ping interface ⢀ 87% [00m:00s<00m:00s] [Done]                                              
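Step 3 relies on a create-or-load pattern: if a cache file exists, its content is loaded; otherwise the expensive indexing runs once and its result is written. The sketch below shows that pattern with `pickle`; `load_or_create` and the pickle format are assumptions for illustration, and the library's actual cache format is not shown here.

```python
# Hypothetical sketch of the create-or-load caching pattern using pickle.
# It is not how themachinethatgoesping stores its cache files.
import os
import pickle
import tempfile
from typing import Any, Callable


def load_or_create(cache_path: str, build: Callable[[], Any]) -> Any:
    if os.path.exists(cache_path):
        # cache hit: reuse the stored result instead of rebuilding it
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # cache miss: run the expensive step once and store its result
    result = build()
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def build_index():
    # stands in for the (slow) datagram indexing
    calls.append(1)
    return {"datagrams": 1168}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache", "example.pkl")
    first = load_or_create(path, build_index)   # builds and writes the cache
    second = load_or_create(path, build_index)  # loads from the cache
print(len(calls), first == second)  # 1 True
```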


### 4. Compare loading times

In [7]:
from time import time

# compare loading times for repeated file loading
t1 = time()
# load data without using the cache
fh = kongsbergall.KongsbergAllFileHandler(files)
t2 = time()
# load data using the cache files
fh = kongsbergall.KongsbergAllFileHandler(files, file_cache_paths = cacheFilePaths)
t3 = time()

print("\n-- Compare loading times --")
print(f"Time without cache: {round(t2-t1,3)} seconds")
print(f"Time with cache:    {round(t3-t2,3)} seconds")

indexing files ⠐ 100% [00m:00s<00m:00s] [..9880139284219668.all (1/16)]                               
indexing files ⠠ 100% [00m:00s<00m:00s] [..2459809945665151.wcd (16/16)]                                
indexing files ⢀ 100% [00m:00s<00m:00s] [Found: 1168 datagrams in 16 files (21MB)]                                          
Initializing datagramdata interface ⠈ 0% [00m:00s<00m:00s]           
Initializing ping interface ⢀ 87% [00m:00s<00m:00s] [Done]                                              
indexing files ⢀ 98% [00m:00s<00m:00s] [Found: 1168 datagrams in 16 files (21MB)]                                          
Initializing datagramdata interface ⠈ 0% [00m:00s<00m:00s]           
Initializing ping interface ⢀ 87% [00m:00s<00m:00s] [Done]                                              

-- Compare loading times --
Time without cache: 0.014 seconds
Time with cache:    0.008 seconds
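The timings above come from a single run each and last only a few milliseconds, so they are sensitive to noise. A more robust comparison repeats each measurement and keeps the fastest result, for example with the standard-library `timeit` module (which uses `time.perf_counter` by default). `load_without_cache` and `load_with_cache` below are hypothetical placeholders doing dummy work.

```python
# Sketch: compare two loading functions more robustly than single time() calls.
# load_without_cache / load_with_cache are hypothetical placeholders.
import timeit
from typing import Callable


def best_time(func: Callable[[], object], repeat: int = 5, number: int = 3) -> float:
    # run `func` `number` times per trial, repeat the trial, keep the fastest mean
    times = timeit.repeat(func, repeat=repeat, number=number)
    return min(times) / number


def load_without_cache():
    return sum(i * i for i in range(20_000))  # stands in for full indexing


def load_with_cache():
    return sum(i * i for i in range(2_000))   # stands in for reading the cache


t_no_cache = best_time(load_without_cache)
t_cache = best_time(load_with_cache)
print(f"Time without cache: {t_no_cache:.4f} seconds")
print(f"Time with cache:    {t_cache:.4f} seconds")
```

In the notebook, the placeholders could be replaced by `lambda: kongsbergall.KongsbergAllFileHandler(files)` and `lambda: kongsbergall.KongsbergAllFileHandler(files, file_cache_paths=cacheFilePaths)`.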


### 5. Investigate cache files

In [8]:
# -- Investigate the created cache --
# Here we print the created cache files to understand how big they are
print("\n-- Cache file statistics --")
index_functions.print_cache_file_statistics(cacheFilePaths)


-- Cache file statistics --
FilePackageIndex: 0.03 'MB' / 0.15 %
NavigationInterpolatorLatLon: 0.1 'MB' / 0.45 %
FilePackageCache<RuntimeParameters>: 0.0 'MB' / 0.01 %
FilePackageCache<WaterColumnInformation>: 0.37 'MB' / 1.72 %
FilePackageCache<SystemInformation>: 0.01 'MB' / 0.04 %
- Combined -: 0.51 'MB' / 2.36 %
- Source files -: 21.56 'MB' / 100.0 %
