### `themachinethatgoesping` tutorial series
# Tutorial 3: multiple files and file caching

In this tutorial, we show how to open multiple files at once and how to speed up repeated file opening using caching.

`themachinethatgoesping` concepts covered:
- "File Handler" object
- data loading for multiple files
- file caching

## Summary

In [18]:
%matplotlib widget
import os

from matplotlib import pyplot as plt
from themachinethatgoesping.echosounders import index_functions
from themachinethatgoesping.echosounders import kongsbergall
from time import time

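# folders that are searched for data files
folders = []
folders.append("../../unittest_data")
# NOTE: the next line overrides the list above with a larger local dataset; adjust the path to your own data
folders = ['/home/data/turbeams/TURBEAMS_data_crunshing/campaigns/TURBEAMS_April_2023/']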

# list raw data files
files = index_functions.find_files(folders, [".all","wcd"])
files.sort()

# -- File caching --
# The steps so far were the same as in the previous demo; now we create the cache file paths using the get_cache_file_paths function
cacheFilePaths = index_functions.get_cache_file_paths(file_paths=files)
index_functions.print_cache_file_statistics(cacheFilePaths)
index_functions.remove_name_from_cache(cacheFilePaths, "FilePackageIndex")
# cacheFilePaths is a dictionary with the same keys as files, but the values are the paths to the cache files
# Passing cacheFilePaths to the FileHandler causes the FileHandler to either
# - create the cache files (if they don't exist) 
# - or load them (if they do exist)
fh = kongsbergall.KongsbergAllFileHandler(files, file_cache_paths = cacheFilePaths)

# compare loading times in repeated file loading
t1 = time()
# without using the cache
fh = kongsbergall.KongsbergAllFileHandler(files)
t2 = time()
# using the cache
fh = kongsbergall.KongsbergAllFileHandler(files, file_cache_paths = cacheFilePaths)
t3 = time()

print("\n-- Compare loading times --")
print(f"Time without cache: {round(t2-t1,3)} seconds")
print(f"Time with cache:    {round(t3-t2,3)} seconds")

# -- Investigate the created cache --
# Here we print the created cache files to understand how big they are
print("\n-- Cache file statistics --")
index_functions.print_cache_file_statistics(cacheFilePaths)

Found 288 files
indexing files ⠠ 100% [00m:05s<00m:00s] [..1_180312_Belgica.wcd (286/288)]                                  
indexing files ⢀ 100% [00m:05s<00m:00s] [..1_181812_Belgica.wcd (288/288)]                                  
indexing files ⡀ 100% [00m:05s<00m:00s] [Found: 5130471 datagrams in 288 files (25571MB)]                                                 
Initializing navigation ⠁ 96% [00m:05s<00m:00s] [141/144]                   
Initializing ping interface ⠁ 99% [00m:14s<00m:00s] [Done]                                              
indexing files ⠂ 100% [00m:04s<00m:00s] [..1_181812_Belgica.all (287/288)]                                  
indexing files ⠁ 100% [00m:04s<00m:00s] [..1_181812_Belgica.wcd (288/288)]                                  
indexing files ⠈ 100% [00m:04s<00m:00s] [Found: 5130471 datagrams in 288 files (25571MB)]                                                 
Initializing navigation ⠄ 96% [00m:03s<00m:00s] [141/144]                   
Initializin

## Step-by-step

### 0. Basic Setup

In [19]:
# First you have to import themachinethatgoesping
# here we import it as 'Ping', note the capital P
import themachinethatgoesping as Ping

# set the data folders where data files can be found (../../unittest_data holds a couple of very small test files)
data_folders = []
data_folders.append("../../unittest_data")

from themachinethatgoesping.echosounders import index_functions

# Find all Kongsberg files in the list of data_folders
kongsberg_files = index_functions.find_files(data_folders, [".all","wcd"])

Found 16 files
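
The search step above can be pictured with a plain-Python sketch. This is not the implementation of `index_functions.find_files`; the helper name `find_files_by_suffix` and the case-insensitive matching are assumptions made purely for illustration. It walks each folder recursively and keeps the files whose names end with one of the given endings.

```python
import tempfile
from pathlib import Path


def find_files_by_suffix(folders, suffixes):
    """Hypothetical helper: recursively collect files whose names end with one of `suffixes`."""
    # Assumption: endings are compared case-insensitively
    suffixes = tuple(s.lower() for s in suffixes)
    found = []
    for folder in folders:
        for path in Path(folder).rglob("*"):
            if path.is_file() and path.name.lower().endswith(suffixes):
                found.append(str(path))
    return sorted(found)


# Small self-contained demo on a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "survey").mkdir()
    for name in ["line1.all", "line1.wcd", "notes.txt"]:
        (Path(tmp) / "survey" / name).touch()

    demo_files = find_files_by_suffix([tmp], [".all", "wcd"])
    print(f"Found {len(demo_files)} files")
```

Returning a sorted list keeps the file order reproducible between runs, which is also why the summary cell calls `files.sort()`.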


### 2. Find/create cacheFilePaths data

In [20]:
# Caching files when they are read for the first time speeds up subsequent loading

# each file has a corresponding cache file, by default:
cacheFilePaths_kongsberg = index_functions.get_cache_file_paths(kongsberg_files)

# By default, the cache files are stored in the same directory 
# as the notebook in a newly created "cache" folder

for f, c in cacheFilePaths_kongsberg.items():
    print(f"File: {f}")
    print(f"    Cache: {c}")

File: ../../unittest_data/kongsberg/simon/7287506659992808476.all
    Cache: /ssd/src/themachinethatgoesping/tutorials/tutorials/0_basic_concepts/cache/root_/ssd/src/themachinethatgoesping/tutorials/unittest_data/kongsberg/simon/7287506659992808476.all.tmtgp.cache
File: ../../unittest_data/kongsberg/simon/7287506659992808476.wcd
    Cache: /ssd/src/themachinethatgoesping/tutorials/tutorials/0_basic_concepts/cache/root_/ssd/src/themachinethatgoesping/tutorials/unittest_data/kongsberg/simon/7287506659992808476.wcd.tmtgp.cache
File: ../../unittest_data/kongsberg/a/y/-1333931979274893952.wcd
    Cache: /ssd/src/themachinethatgoesping/tutorials/tutorials/0_basic_concepts/cache/root_/ssd/src/themachinethatgoesping/tutorials/unittest_data/kongsberg/a/y/-1333931979274893952.wcd.tmtgp.cache
File: ../../unittest_data/kongsberg/a/y/-1333931979274893952.all
    Cache: /ssd/src/themachinethatgoesping/tutorials/tutorials/0_basic_concepts/cache/root_/ssd/src/themachinethatgoesping/tutorials/unittest_
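
Judging from the paths printed above, each cache file sits under a `cache/root_` folder that mirrors the absolute path of the source file, with `.tmtgp.cache` appended to the file name. A minimal sketch of that mapping, assuming POSIX-style paths; `sketch_cache_path` is a hypothetical helper and not part of `themachinethatgoesping`, whose actual rules may differ:

```python
import os
from pathlib import Path


def sketch_cache_path(source_path, cache_dir="cache"):
    """Hypothetical helper: mirror the absolute source path below <cache_dir>/root_."""
    absolute = Path(os.path.abspath(source_path))
    # Strip the filesystem root ("/") so the path can be nested under the cache folder
    below_root = absolute.relative_to(absolute.anchor)
    target = Path(os.path.abspath(cache_dir)) / "root_" / below_root
    # Append the cache extension to the full file name (the original extension is kept)
    return target.with_name(target.name + ".tmtgp.cache")


example = sketch_cache_path("/data/survey/line1.all", cache_dir="/work/cache")
print(example)
```

Mirroring the full absolute path keeps cache files for identically named source files in different folders from colliding.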

In [21]:
# -- Investigate the created cache --
# Here we print the created cache files to understand how big they are
print("\n-- Cache file statistics --")
index_functions.print_cache_file_statistics(cacheFilePaths_kongsberg)

# Note, if this is the first time you run the code, 
# the cache files do not exist yet, and there are no statistics to print


-- Cache file statistics --
FilePackageIndex: 0.03 'MB' / 0.15 %
NavigationInterpolatorLatLon: 0.1 'MB' / 0.45 %
FilePackageCache<RuntimeParameters>: 0.0 'MB' / 0.01 %
FilePackageCache<WaterColumnInformation>: 0.37 'MB' / 1.72 %
FilePackageCache<SystemInformation>: 0.01 'MB' / 0.04 %
- Combined -: 0.51 'MB' / 2.36 %
- Source files -: 21.56 'MB' / 100.0 %


### 3. Load data with cache file paths

In [22]:
# load the data with the cache files (if this is the first time you run it, the cache files will be created):
from themachinethatgoesping.echosounders import kongsbergall
fh = kongsbergall.KongsbergAllFileHandler(kongsberg_files, file_cache_paths = cacheFilePaths_kongsberg)


indexing files ⢀ 98% [00m:00s<00m:00s] [Found: 1168 datagrams in 16 files (21MB)]                                          
Initializing datagramdata interface ⠈ 0% [00m:00s<00m:00s]           
Initializing ping interface ⢀ 87% [00m:00s<00m:00s] [Done]                                              
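
The "create the cache if it does not exist, otherwise load it" behaviour described above is a common pattern. The sketch below shows it with the standard library only; `build_index` and `load_or_create_index` are hypothetical stand-ins, and `pickle` is an assumption for illustration, not the format used by the `.tmtgp.cache` files.

```python
import pickle
import tempfile
from pathlib import Path


def build_index(source_path):
    """Stand-in for an expensive indexing step: record the byte offset of every line."""
    offsets, position = [], 0
    with open(source_path, "rb") as f:
        for line in f:
            offsets.append(position)
            position += len(line)
    return offsets


def load_or_create_index(source_path, cache_path):
    """Return (index, came_from_cache); the cache file is written on the first call."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with open(cache_path, "rb") as f:
            return pickle.load(f), True
    index = build_index(source_path)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with open(cache_path, "wb") as f:
        pickle.dump(index, f)
    return index, False


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "data.txt"
    source.write_bytes(b"ping 1\nping 2\nping 3\n")
    cache = Path(tmp) / "cache" / "data.txt.cache"

    first_index, first_cached = load_or_create_index(source, cache)
    second_index, second_cached = load_or_create_index(source, cache)
    print(f"first call from cache: {first_cached}, second call from cache: {second_cached}")
```

This sketch never checks whether the source file changed after the cache was written; a production cache would typically also compare file size or modification time.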


### 4. Compare loading times

In [23]:
# compare loading times in repeated file loading
from time import time

t1 = time()
# load data without using the cache
fh = kongsbergall.KongsbergAllFileHandler(kongsberg_files)
t2 = time()
# load data using the cache files
fh = kongsbergall.KongsbergAllFileHandler(kongsberg_files, file_cache_paths = cacheFilePaths_kongsberg)
t3 = time()

print("\n-- Compare loading times --")
print(f"Time without cache: {round(t2-t1,3)} seconds")
print(f"Time with cache:    {round(t3-t2,3)} seconds")

indexing files ⠐ 100% [00m:00s<00m:00s] [..7506659992808476.all (1/16)]                               
indexing files ⠠ 100% [00m:00s<00m:00s] [..2155654265012286.wcd (16/16)]                                
indexing files ⢀ 100% [00m:00s<00m:00s] [Found: 1168 datagrams in 16 files (21MB)]                                          
Initializing datagramdata interface ⠈ 0% [00m:00s<00m:00s]           
Initializing ping interface ⢀ 87% [00m:00s<00m:00s] [Done]                                              
indexing files ⢀ 98% [00m:00s<00m:00s] [Found: 1168 datagrams in 16 files (21MB)]                                          
Initializing datagramdata interface ⠈ 0% [00m:00s<00m:00s]           
Initializing ping interface ⢀ 87% [00m:00s<00m:00s] [Done]                                              

-- Compare loading times --
Time without cache: 0.017 seconds
Time with cache:    0.007 seconds
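
With the small test files, both timings are only a few milliseconds, so a single measurement is noisy. One way to make such a comparison more robust is to repeat each measurement and keep the fastest run, using `time.perf_counter`, which is intended for measuring short durations. In the sketch below, `best_of`, `slow_load` and `cached_load` are hypothetical stand-ins for illustration, not library functions.

```python
from time import perf_counter


def best_of(func, repeats=5):
    """Run `func` several times and return the shortest wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = perf_counter()
        func()
        durations.append(perf_counter() - start)
    return min(durations)


def slow_load():
    """Stand-in for loading without a cache: redo the work on every call."""
    return sum(i * i for i in range(200_000))


_cached = {}


def cached_load():
    """Stand-in for loading with a cache: compute once, then reuse the stored result."""
    if "result" not in _cached:
        _cached["result"] = slow_load()
    return _cached["result"]


cached_load()  # fill the cache once, like the first load with file_cache_paths
t_slow = best_of(slow_load)
t_fast = best_of(cached_load)

print(f"Time without cache: {t_slow:.6f} seconds")
print(f"Time with cache:    {t_fast:.6f} seconds")
```

The same `best_of` helper could wrap the two `KongsbergAllFileHandler` calls from the cell above by passing them as `lambda` functions.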


### 5. Investigate cache files

In [24]:
# -- Investigate the created cache --
# Here we print the created cache files to understand how big they are
print("\n-- Cache file statistics --")
index_functions.print_cache_file_statistics(cacheFilePaths_kongsberg)


-- Cache file statistics --
FilePackageIndex: 0.03 'MB' / 0.15 %
NavigationInterpolatorLatLon: 0.1 'MB' / 0.45 %
FilePackageCache<RuntimeParameters>: 0.0 'MB' / 0.01 %
FilePackageCache<WaterColumnInformation>: 0.37 'MB' / 1.72 %
FilePackageCache<SystemInformation>: 0.01 'MB' / 0.04 %
- Combined -: 0.51 'MB' / 2.36 %
- Source files -: 21.56 'MB' / 100.0 %
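
The percentages above relate the size of the cached data to the size of the source files. A comparable overall figure can be computed from file sizes on disk. This is a rough sketch, not how `print_cache_file_statistics` works internally; `size_report` is a hypothetical helper, and it only reports the combined size, not the per-name breakdown shown above.

```python
import os
import tempfile


def size_report(source_paths, cache_paths):
    """Hypothetical helper: return (source_bytes, cache_bytes, cache size in percent of sources)."""
    source_bytes = sum(os.path.getsize(p) for p in source_paths)
    # Cache files that have not been created yet are skipped
    cache_bytes = sum(os.path.getsize(p) for p in cache_paths if os.path.exists(p))
    percent = 100.0 * cache_bytes / source_bytes if source_bytes else 0.0
    return source_bytes, cache_bytes, percent


# Self-contained demo with dummy files of known size
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "line1.all")
    cache = os.path.join(tmp, "line1.all.cache")
    missing = os.path.join(tmp, "line2.all.cache")  # never created
    with open(source, "wb") as f:
        f.write(b"\0" * 1000)
    with open(cache, "wb") as f:
        f.write(b"\0" * 25)

    source_bytes, cache_bytes, percent = size_report([source], [cache, missing])
    print(f"- Combined -: {cache_bytes} bytes / {percent} %")
    print(f"- Source files -: {source_bytes} bytes / 100.0 %")
```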
