# Pre-process the water column data
This notebook performs the following steps:
* Read the ASCII files exported from Sonarscope
* Pre-process the data:
    * Format the columns to match the Entwine header
    * Remove points with missing values
    * Crop the data points to the Belgian Exclusive Economic Zone (EEZ) to remove spatial outliers
    * Calculate RGBA values for visualizing the data in the Potree viewer
* Export the result to CSV files for ingestion into Entwine
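The steps above are implemented in `preprocess_mbes_file` from `timbers_code`, whose code is not shown in this notebook. The following is a minimal pandas sketch of the same sequence on an in-memory frame, not the actual implementation. The function names `preprocess_sketch` and `db_to_rgba`, the colour scheme (a linear grayscale ramp between a hypothetical `vmin` of -60 dB and `vmax` of 0 dB, with alpha equal to the gray level), and the column names `Red`, `Green`, `Blue` and `Alpha` are all illustrative assumptions.

```python
import io

import pandas as pd

# Parameters mirroring the notebook cells below; the bbox order is assumed.
COLUMNS_MAP = {"Lon": "X", "Lat": "Y", "Depth": "Z", "Value": "value_db"}
BBOX = (2.23833, 51.08931, 3.3704, 51.87611)  # (min_lon, min_lat, max_lon, max_lat)


def db_to_rgba(value_db: pd.Series, vmin: float, vmax: float) -> pd.DataFrame:
    """Hypothetical colour scheme: linear grayscale ramp, alpha equal to the gray level."""
    scaled = ((value_db - vmin) / (vmax - vmin)).clip(0.0, 1.0)
    level = (scaled * 255).round().astype("uint8")
    return pd.DataFrame(
        {"Red": level, "Green": level, "Blue": level, "Alpha": level},
        index=value_db.index,
    )


def preprocess_sketch(df: pd.DataFrame, columns_map: dict, bbox: tuple,
                      vmin: float = -60.0, vmax: float = 0.0) -> pd.DataFrame:
    # Format: keep only the mapped columns and rename them to the Entwine header.
    out = df[list(columns_map)].rename(columns=columns_map)
    # Remove points with a missing value in any column.
    out = out.dropna()
    # Crop to the bounding box (edges inclusive) to drop spatial outliers.
    min_x, min_y, max_x, max_y = bbox
    inside = out["X"].between(min_x, max_x) & out["Y"].between(min_y, max_y)
    out = out[inside]
    # Append colour columns derived from the backscatter value in dB.
    return pd.concat([out, db_to_rgba(out["value_db"], vmin, vmax)], axis=1)


# Hypothetical raw rows: one outside the box (Lon 5.0), one with a missing Lat.
raw = pd.DataFrame({
    "Lon": [2.5, 3.0, 5.0, 2.8],
    "Lat": [51.3, 51.5, 51.5, None],
    "Depth": [-10.0, -20.0, -15.0, -12.0],
    "Value": [-30.0, 0.0, -45.0, -50.0],
})
processed = preprocess_sketch(raw, COLUMNS_MAP, BBOX)

# Export; in practice a file path would replace the in-memory buffer.
buffer = io.StringIO()
processed.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Tying alpha to the signal strength is one design choice for a dense water-column cloud: weak returns become nearly transparent, so stronger targets stand out.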

In [1]:
import dask
from dask.distributed import Client
from timbers_code.mbes_preprocessing import preprocess_mbes_file
from timbers_code.utils import get_bbox_fom_marineregions
import pandas as pd

## Set up the Dask client
You can monitor the status of the computation in the Dask dashboard; its URL is listed in the client output.

In [2]:
client = Client()
client

| Client | Cluster |
|---|---|
| Scheduler: tcp://127.0.0.1:49714 | Workers: 4 |
| Dashboard: http://127.0.0.1:8787/status | Cores: 12 |
| | Memory: 16.98 GB |


## Set the input parameters
* index: CSV file with the input and output file paths for each survey line. The multibeam echosounder (MBES) data are available on request.
* columns_map: mapping from the Sonarscope header to the Entwine header
* row_count_file: output path of the CSV file with the row counts of each input file
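`DataFrame.rename` silently skips mapping keys that are not present in the frame (its `errors` parameter defaults to `"ignore"`), so an export with a slightly different header would pass through with its original column name. A minimal sketch of making such a mismatch fail loudly, using hypothetical empty frames (`good_export`, `bad_export`) in place of real Sonarscope exports; whether `preprocess_mbes_file` already performs a check like this is not shown in the notebook.

```python
import pandas as pd

# Same mapping as in the notebook cell below.
columns_map = {"Lon": "X", "Lat": "Y", "Depth": "Z", "Value": "value_db"}

# Hypothetical empty frames standing in for two Sonarscope exports.
good_export = pd.DataFrame(columns=["Lon", "Lat", "Depth", "Value"])
bad_export = pd.DataFrame(columns=["Longitude", "Lat", "Depth", "Value"])

# Default behaviour (errors="ignore"): the missing "Lon" key is skipped silently.
silent = bad_export.rename(columns=columns_map)
print(list(silent.columns))  # ['Longitude', 'Y', 'Z', 'value_db']

# errors="raise" turns the header mismatch into a KeyError instead.
strict = good_export.rename(columns=columns_map, errors="raise")
print(list(strict.columns))  # ['X', 'Y', 'Z', 'value_db']
try:
    bad_export.rename(columns=columns_map, errors="raise")
    mismatch_detected = False
except KeyError:
    mismatch_detected = True
print(mismatch_detected)  # True
```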

In [3]:
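# Index of survey lines; only rows 18 onwards (by position) are processed in this run
index = pd.read_csv('index_mbes/Input_Output_21092.csv').iloc[18:, :]
# Map the Sonarscope column names to the names expected by Entwine
columns_map = {'Lon': 'X', 'Lat': 'Y', 'Depth': 'Z', 'Value': 'value_db'}
row_count_file = 'index_mbes/row_counts_21092.csv'
index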

| Unnamed: 0 | SurveyLines | input_path | output_path |
|---|---|---|---|
| 18 | Line0026_0095_6565_21092 | F:/VLIZ/TimbersWCdata_ALL/ASCII_Export_2021020... | data/tmp/20210204_VLIZ_TIMBERS/0026_20210204_1... |
| 19 | Line0027_0095_6565_21092 | F:/VLIZ/TimbersWCdata_ALL/ASCII_Export_2021020... | data/tmp/20210204_VLIZ_TIMBERS/0027_20210204_1... |
| 20 | Line0029_0095_6565_21092 | F:/VLIZ/TimbersWCdata_ALL/ASCII_Export_2021020... | data/tmp/20210204_VLIZ_TIMBERS/0029_20210204_1... |
| 21 | Line0030_0095_6565_21092 | F:/VLIZ/TimbersWCdata_ALL/ASCII_Export_2021020... | data/tmp/20210204_VLIZ_TIMBERS/0030_20210204_1... |
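The `Unnamed: 0` column in the output above most likely comes from a pandas index that was written into the CSV file along with the data. A minimal sketch, on a hypothetical two-line index file, of reading such a file with `index_col=0` so that the column becomes the row index instead of a data column:

```python
import io

import pandas as pd

# Hypothetical index file written by DataFrame.to_csv with its index (empty first header).
INDEX_CSV = (
    ",SurveyLines,input_path,output_path\n"
    "0,LineA,raw/line_a.txt,out/line_a.csv\n"
    "1,LineB,raw/line_b.txt,out/line_b.csv\n"
)

# Without index_col, the unnamed first column is read as a regular column.
plain = pd.read_csv(io.StringIO(INDEX_CSV))
print(list(plain.columns))  # ['Unnamed: 0', 'SurveyLines', 'input_path', 'output_path']

# With index_col=0, the first column becomes the row index.
index = pd.read_csv(io.StringIO(INDEX_CSV), index_col=0)
print(list(index.columns))  # ['SurveyLines', 'input_path', 'output_path']
```

Note that `.iloc[18:, :]` in the cell above selects by position, so it behaves the same whichever way the file is read.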


## Get a cropping box from Marine Regions in the CRS of the point data
MRGID 3293 corresponds to the Belgian EEZ (http://marineregions.org/mrgid/3293). The point data are in longitude/latitude, so the box is requested in EPSG:4326.

In [4]:
bbox = get_bbox_fom_marineregions(mrgid=3293, srs="EPSG:4326")
bbox

(2.23833, 51.08931, 3.3704, 51.87611)
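Judging by the values, the returned tuple is ordered (min_lon, min_lat, max_lon, max_lat). How `get_bbox_fom_marineregions` derives it is not shown here; the following is a minimal sketch, assuming it takes the extreme coordinates of the EEZ polygon's vertices. The helpers `bbox_of` and `in_bbox` are hypothetical, and a toy triangle stands in for the real EEZ geometry.

```python
from typing import Iterable, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def bbox_of(vertices: Iterable[Tuple[float, float]]) -> BBox:
    """Return the bounding box of (lon, lat) polygon vertices."""
    lons, lats = zip(*vertices)
    return (min(lons), min(lats), max(lons), max(lats))


def in_bbox(lon: float, lat: float, bbox: BBox) -> bool:
    """True if the point lies inside the box or on its edge."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


# Hypothetical polygon, not the real Belgian EEZ.
triangle = [(2.0, 51.0), (3.0, 51.0), (2.5, 52.0)]
box = bbox_of(triangle)
print(box)  # (2.0, 51.0, 3.0, 52.0)

# Inside the box but outside the triangle: a bbox crop keeps this point.
print(in_bbox(2.1, 51.9, box))  # True
# East of the box: removed as a spatial outlier.
print(in_bbox(4.0, 51.5, box))  # False
```

The trade-off is that a bounding box is a cheap filter for gross spatial outliers, but points inside the rectangle and outside the EEZ itself are kept; a point-in-polygon test would be needed to crop to the exact boundary.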

## Run the pre-processing

In [5]:
# Build one lazy Dask task per survey line; nothing runs until dask.compute is called
tasks = []
for input_path, output_path in index[['input_path', 'output_path']].itertuples(index=False):
    task = dask.delayed(preprocess_mbes_file)(raw_file_path=input_path,
                                              processed_file_path=output_path,
                                              columns_map=columns_map,
                                              crop_bbox=bbox)
    tasks.append(task)

In [6]:
%%time
results = dask.compute(*tasks)

Wall time: 3h 42min 13s
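`dask.delayed` wraps each call to `preprocess_mbes_file` in a lazy task, and `dask.compute` runs the tasks in parallel on the cluster, returning one result per task in the order the tasks were passed. For readers without a Dask cluster, the same fan-out pattern can be sketched with the standard library's `concurrent.futures` instead. This swaps a thread pool in for the Dask scheduler, and the worker `clean_and_count` (which only drops rows with empty fields and counts rows) is a hypothetical stand-in for `preprocess_mbes_file`.

```python
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def clean_and_count(input_path: str, output_path: str) -> Dict[str, object]:
    """Hypothetical worker: copy complete rows and report row counts."""
    rows_in = rows_out = 0
    with open(input_path, newline="") as src, open(output_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))  # copy the header unchanged
        for row in reader:
            rows_in += 1
            if all(field.strip() for field in row):  # drop rows with empty fields
                writer.writerow(row)
                rows_out += 1
    return {"input_path": input_path, "rows_in": rows_in, "rows_out": rows_out}


def run_all(pairs: List[Tuple[str, str]], max_workers: int = 4) -> List[Dict[str, object]]:
    # One task per survey line, analogous to one dask.delayed call per row of index.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(clean_and_count, src, dst) for src, dst in pairs]
        # Results are collected in submission order, like dask.compute(*tasks).
        return [future.result() for future in futures]


# Usage with two small hypothetical input files.
with tempfile.TemporaryDirectory() as tmp:
    pairs = []
    for name, body in [("a", "X,Y\n1,2\n3,\n"), ("b", "X,Y\n5,6\n7,8\n")]:
        src = os.path.join(tmp, f"{name}.csv")
        with open(src, "w") as fh:
            fh.write(body)
        pairs.append((src, os.path.join(tmp, f"{name}_out.csv")))
    results = run_all(pairs)

print([(r["rows_in"], r["rows_out"]) for r in results])
```

A list of dictionaries like `results` can be passed directly to `pd.DataFrame`. Threads keep the sketch simple; for CPU-heavy parsing, a process pool (or Dask's default process-based local workers) sidesteps Python's global interpreter lock.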


## Save the row counts

In [7]:
# One row-count record per input file, in the same order as the tasks
pd.DataFrame(results).to_csv(row_count_file, index=False)