### `themachinethatgoesping` tutorial series
# Tutorial 2: Introduction

In this tutorial, we show how to speed up repeated file opening using caching.

`themachinethatgoesping` concepts covered:
- "File Handler" object
- data loading
- file caching

## Summary

In [1]:
%matplotlib widget
import os

from matplotlib import pyplot as plt
from themachinethatgoesping.echosounders import index_functions
from themachinethatgoesping.echosounders import kongsbergall
from time import time

folders = []
folders.append("../unittest_data")

# list raw data files
files = index_functions.find_files(folders, [".all","wcd"])
files.sort()

# -- File caching --
# The steps so far were the same as in the previous demo; now we set up the cache file paths using the get_cache_file_paths function
cacheFilePaths = index_functions.get_cache_file_paths(file_paths=files)
index_functions.print_cache_file_statistics(cacheFilePaths)
index_functions.remove_name_from_cache(cacheFilePaths, "FilePackageIndex")
# cacheFilePaths is a dictionary with the same keys as files, but the values are the paths to the cache files
# Passing cacheFilePaths to the FileHandler causes the FileHandler to either
# - create the cache files (if they don't exist) 
# - or load them (if they do exist)
fh = kongsbergall.KongsbergAllFileHandler(files, file_cache_paths = cacheFilePaths)

# compare loading times in repeated file loading
t1 = time()
# without using the cache
fh = kongsbergall.KongsbergAllFileHandler(files)
t2 = time()
# using the cache
fh = kongsbergall.KongsbergAllFileHandler(files, file_cache_paths = cacheFilePaths)
t3 = time()

print("\n-- Compare loading times --")
print(f"Time without cache: {round(t2-t1,3)} seconds")
print(f"Time with cache:    {round(t3-t2,3)} seconds")

# -- Investigate the created cache --
# Here we print the created cache files to understand how big they are
print("\n-- Cache file statistics --")
index_functions.print_cache_file_statistics(cacheFilePaths)

Found 18 files
FilePackageIndex: 0.04 'MB' / 0.15 %
NavigationInterpolatorLatLon: 0.09 'MB' / 0.33 %
FilePackageCache<WaterColumnInformation>: 0.51 'MB' / 1.93 %
FilePackageCache<SystemInformation>: 0.01 'MB' / 0.04 %
- Combined -: 0.65 'MB' / 2.46 %
- Source files -: 26.58 'MB' / 100.0 %
indexing files ⠐ 100% [00m:00s<00m:00s] [..6328335172073169.all (1/18)]                               
indexing files ⠠ 100% [00m:00s<00m:00s] [..3858047591065953.wcd (18/18)]                                
indexing files ⢀ 100% [00m:00s<00m:00s] [Found: 1509 datagrams in 18 files (26MB)]                                          
Initializing ping interface ⢀ 90% [00m:00s<00m:00s] [Done]                                              
indexing files ⠐ 100% [00m:00s<00m:00s] [..6328335172073169.all (1/18)]                               
indexing files ⠠ 100% [00m:00s<00m:00s] [..3858047591065953.wcd (18/18)]                                
indexing files ⢀ 100% [00m:00s<00m:00s] [Found: 1509 datagrams i



## Step-by-step
### 1. Find raw data files (see previous demo)

In [2]:
# define a list of folder(s) to search for raw data files
# notes: 
#   - subdirectories will be searched as well
#   - you can add multiple folders by appending them to the list
#   - pairs of files (e.g. .all and .wcd) don't have to be in the same folder
folders = []
folders.append("../unittest_data")

# find all Kongsberg files in the list of folders
from themachinethatgoesping.echosounders import index_functions
files = index_functions.find_files(folders, [".all","wcd"])

# show files found
print(f"The output is a {type(files)} object with {len(files)} elements:")
files.sort()
for i, file in enumerate(files):
    print(f"({i}/{len(files)}) {file}")

Found 18 files
The output is a <class 'list'> object with 18 elements:
(0/18) ../unittest_data/kongsberg/a/c/8136328335172073169.all
(1/18) ../unittest_data/kongsberg/a/c/8136328335172073169.wcd
(2/18) ../unittest_data/kongsberg/a/f/ALL/7940434004712898291.all
(3/18) ../unittest_data/kongsberg/a/f/WCD/7940434004712898291.wcd
(4/18) ../unittest_data/kongsberg/a/y/-6430362035178526648.all
(5/18) ../unittest_data/kongsberg/a/y/-6430362035178526648.wcd
(6/18) ../unittest_data/kongsberg/e/-7731314027977193437.all
(7/18) ../unittest_data/kongsberg/e/76411649188412698.all
(8/18) ../unittest_data/kongsberg/g/-2784638328592650682.all
(9/18) ../unittest_data/kongsberg/g/-2784638328592650682.wcd
(10/18) ../unittest_data/kongsberg/he/-3740211369500593285.all
(11/18) ../unittest_data/kongsberg/he/-3740211369500593285.wcd
(12/18) ../unittest_data/kongsberg/simon/-4564033532462129271.all
(13/18) ../unittest_data/kongsberg/simon/-4564033532462129271.wcd
(14/18) ../unittest_data/kongsberg/turbeams/-786
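
To make the search step more concrete, here is a minimal standard-library sketch of a recursive search by file ending. The helper `find_files_by_ending` is hypothetical and is not how `index_functions.find_files` is implemented; it only mirrors the behaviour described in the notes above (subdirectories are searched, several folders can be given, paired files may sit in different folders).

```python
import os


def find_files_by_ending(folders, endings):
    """Recursively collect files whose names end with one of `endings`.

    Hypothetical helper for illustration only; not part of themachinethatgoesping.
    """
    endings = tuple(endings)  # str.endswith accepts a tuple of suffixes
    found = []
    for folder in folders:
        # os.walk descends into all subdirectories of `folder`
        for dirpath, _dirnames, filenames in os.walk(folder):
            for name in filenames:
                if name.endswith(endings):
                    found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Called as `find_files_by_ending(["../unittest_data"], [".all", "wcd"])`, this sketch would return a sorted list of matching paths, assuming the folder exists.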

### 2. Find/create cacheFilePaths data

In [3]:
# caching files when they are read the first time speeds up subsequent loading

# by default, each file gets a corresponding cache file:
cacheFilePaths = index_functions.get_cache_file_paths(file_paths=files)

# By default, the cache files are stored in the same directory 
# as the notebook in a newly created "cache" folder

for f, c in cacheFilePaths.items():
    print(f"File: {f}")
    print(f"    Cache: {c}")

File: ../unittest_data/kongsberg/a/c/8136328335172073169.all
    Cache: /ssd/src/themachinethatgoesping/tutorials/demo/cache/root_/ssd/src/themachinethatgoesping/tutorials/unittest_data/kongsberg/a/c/8136328335172073169.all.tmtgp.cache
File: ../unittest_data/kongsberg/a/c/8136328335172073169.wcd
    Cache: /ssd/src/themachinethatgoesping/tutorials/demo/cache/root_/ssd/src/themachinethatgoesping/tutorials/unittest_data/kongsberg/a/c/8136328335172073169.wcd.tmtgp.cache
File: ../unittest_data/kongsberg/a/f/ALL/7940434004712898291.all
    Cache: /ssd/src/themachinethatgoesping/tutorials/demo/cache/root_/ssd/src/themachinethatgoesping/tutorials/unittest_data/kongsberg/a/f/ALL/7940434004712898291.all.tmtgp.cache
File: ../unittest_data/kongsberg/a/f/WCD/7940434004712898291.wcd
    Cache: /ssd/src/themachinethatgoesping/tutorials/demo/cache/root_/ssd/src/themachinethatgoesping/tutorials/unittest_data/kongsberg/a/f/WCD/7940434004712898291.wcd.tmtgp.cache
File: ../unittest_data/kongsberg/a/y/-64
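
The printed paths suggest a simple layout: the absolute path of each source file is mirrored below a `cache/root_` folder, and `.tmtgp.cache` is appended to the file name. The sketch below reproduces that pattern with `pathlib`. It is inferred from the output above only; `sketch_cache_path` is a hypothetical name, and the actual logic of `get_cache_file_paths` may differ.

```python
from pathlib import Path


def sketch_cache_path(source_file, cache_root="cache"):
    """Mirror the absolute source path below <cache_root>/root_ and append a suffix.

    Hypothetical illustration of the naming pattern seen in the output;
    not the library's implementation.
    """
    source = Path(source_file).resolve()
    # drop the filesystem anchor ("/" on POSIX) so the path can be nested
    relative = source.relative_to(source.anchor)
    target = Path(cache_root).resolve() / "root_" / relative
    # keep the original file name and add the cache suffix
    return target.with_name(target.name + ".tmtgp.cache")
```

One consequence of mirroring the absolute path is that two source files with the same name in different folders cannot collide in the cache folder.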

In [4]:
# -- Investigate the created cache --
# Here we print the created cache files to understand how big they are
print("\n-- Cache file statistics --")
index_functions.print_cache_file_statistics(cacheFilePaths)

# Note, if this is the first time you run the code, 
# the cache files do not exist yet, and there are no statistics to print


-- Cache file statistics --
FilePackageIndex: 0.04 'MB' / 0.15 %
NavigationInterpolatorLatLon: 0.09 'MB' / 0.33 %
FilePackageCache<WaterColumnInformation>: 0.51 'MB' / 1.93 %
FilePackageCache<SystemInformation>: 0.01 'MB' / 0.04 %
- Combined -: 0.65 'MB' / 2.46 %
- Source files -: 26.58 'MB' / 100.0 %


### 3. Load data with cache file paths

In [5]:
# load the data with the cache files (if this is the first time you run it, the cache files will be created):
from themachinethatgoesping.echosounders import kongsbergall
fh = kongsbergall.KongsbergAllFileHandler(files, file_cache_paths = cacheFilePaths)


indexing files ⢀ 99% [00m:00s<00m:00s] [Found: 1509 datagrams in 18 files (26MB)]                                          
Initializing ping interface ⢀ 90% [00m:00s<00m:00s] [Done]                                              
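
The "create the cache if it is missing, otherwise load it" behaviour is a common pattern. A generic sketch using `pickle` is shown below; `load_or_create` and its `build` argument are hypothetical names, and the file format is an assumption made for illustration. The format and internals of the real `.tmtgp.cache` files are not described in this tutorial.

```python
import pickle
from pathlib import Path


def load_or_create(cache_path, build):
    """Return the cached object if present, else build it and write the cache.

    Generic illustration of the pattern; not the FileHandler's implementation.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # cache hit: skip the expensive build step
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    # cache miss: do the expensive work once and store the result
    result = build()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

In this sketch the first call pays the full cost of `build()` plus the cost of writing the cache file; only later calls benefit from it.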




### 4. Compare loading times

In [6]:
# compare loading times in repeated file loading
t1 = time()
# load data without using the cache
fh = kongsbergall.KongsbergAllFileHandler(files)
t2 = time()
# load data using the cache files
fh = kongsbergall.KongsbergAllFileHandler(files, file_cache_paths = cacheFilePaths)
t3 = time()

print("\n-- Compare loading times --")
print(f"Time without cache: {round(t2-t1,3)} seconds")
print(f"Time with cache:    {round(t3-t2,3)} seconds")

indexing files ⠐ 100% [00m:00s<00m:00s] [..6328335172073169.all (1/18)]                               
indexing files ⠠ 100% [00m:00s<00m:00s] [..3858047591065953.wcd (18/18)]                                
indexing files ⢀ 100% [00m:00s<00m:00s] [Found: 1509 datagrams in 18 files (26MB)]                                          
Initializing ping interface ⢀ 90% [00m:00s<00m:00s] [Done]                                              
indexing files ⢀ 99% [00m:00s<00m:00s] [Found: 1509 datagrams in 18 files (26MB)]                                          
Initializing ping interface ⢀ 90% [00m:00s<00m:00s] [Done]                                              

-- Compare loading times --
Time without cache: 0.014 seconds
Time with cache:    0.008 seconds
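
With times this short, a single measurement can be dominated by noise such as operating-system file caching. A possible refinement, sketched below, is to repeat each call several times and keep the fastest run, using `time.perf_counter` from the standard library. The helper `best_of` is a hypothetical name and is not part of `themachinethatgoesping`.

```python
from time import perf_counter


def best_of(func, repeats=5):
    """Return the shortest wall-clock time (in seconds) of `repeats` calls to func."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        func()
        timings.append(perf_counter() - start)
    # the minimum is the run least disturbed by other system activity
    return min(timings)
```

For the comparison above, one could pass `lambda: kongsbergall.KongsbergAllFileHandler(files)` and `lambda: kongsbergall.KongsbergAllFileHandler(files, file_cache_paths = cacheFilePaths)` to `best_of` and print both results.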




### 5. Investigate cache files

In [7]:
# -- Investigate the created cache --
# Here we print the created cache files to understand how big they are
print("\n-- Cache file statistics --")
index_functions.print_cache_file_statistics(cacheFilePaths)


-- Cache file statistics --
FilePackageIndex: 0.04 'MB' / 0.15 %
NavigationInterpolatorLatLon: 0.09 'MB' / 0.33 %
FilePackageCache<WaterColumnInformation>: 0.51 'MB' / 1.93 %
FilePackageCache<SystemInformation>: 0.01 'MB' / 0.04 %
- Combined -: 0.65 'MB' / 2.46 %
- Source files -: 26.58 'MB' / 100.0 %
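
The percentages in this output relate cache size to source file size. A rough way to compute such a ratio for arbitrary lists of paths is sketched below with `os.path.getsize`. The helper `size_ratio` is hypothetical, and it uses 1 MB = 10^6 bytes as an assumption; `print_cache_file_statistics` may count differently and also breaks the total down per cache entry, which this sketch does not.

```python
import os


def size_ratio(source_files, cache_files):
    """Return (source MB, cache MB, cache size as a percentage of source size).

    Hypothetical helper for illustration; assumes 1 MB = 10**6 bytes.
    """
    source_bytes = sum(os.path.getsize(p) for p in source_files)
    # cache files that have not been created yet are skipped
    cache_bytes = sum(os.path.getsize(p) for p in cache_files if os.path.exists(p))
    percent = 100.0 * cache_bytes / source_bytes if source_bytes else 0.0
    return source_bytes / 1e6, cache_bytes / 1e6, percent
```

For the test data above, the combined cache is only a few percent of the source size, so the disk cost of caching is small compared with the raw files.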
