# Tsdat Out-of-the-box tutorial
This self-explanatory notebook walks you through the example pipeline predefined in the `example_pipeline` module.

# Prerequisites
We assume that
- the conda environment is set up correctly
- the required dependencies defined in `requirements-dev.txt` are installed.
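
As a quick sanity check of the assumptions above, the sketch below reports which packages cannot be found by the import system. The helper name `missing_packages` is hypothetical, and the package list only covers the top-level packages imported later in this notebook, not everything in `requirements-dev.txt`.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names in `names` that the import system cannot locate."""
    # find_spec returns None for a top-level package that is not installed.
    return [name for name in names if find_spec(name) is None]


# Top-level packages imported later in this notebook (an illustrative subset).
required = ["tsdat", "xarray", "pandas"]
missing = missing_packages(required)
if missing:
    print("missing packages:", ", ".join(missing))
else:
    print("all required packages are importable")
```

If anything is reported as missing, the environment most likely needs the dependencies from `requirements-dev.txt` installed again.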

# Working path configuration
In the following section, we set the working path to the project root so that the paths of the YAML files in `config/` are resolved correctly.
TODO: make relative path resolve more robust.
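
One possible way to approach the TODO above is to search upward for a directory that marks the project root instead of counting parent levels. The sketch below assumes that the root is the first ancestor containing a `pipelines/` directory, as in the layout used here; the helper name `find_project_root` and the choice of marker are our own assumptions, not part of tsdat.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "pipelines") -> Optional[Path]:
    """Walk upward from `start`; return the first directory containing `marker`."""
    start = start.resolve()
    # Check `start` itself first, then each ancestor up to the filesystem root.
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None  # no ancestor contains the marker directory


root = find_project_root(Path.cwd())
print("project root:", root)
```

Because the search does not depend on how deep the notebook sits, it keeps working if the notebook is moved or if the cell is re-run after the working path has already changed.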

In [1]:
import os
from pathlib import Path

import warnings
warnings.filterwarnings('ignore')

In [14]:
# inspect current path and working path
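# Record the starting path only once, so that re-running this cell after
# os.chdir() does not overwrite it.
try:
    if not path_original:
        path_original: str = os.getcwd()
except NameError:  # first run: path_original is not defined yet
    path_original: str = os.getcwd()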
print("current path: \n", path_original)
print("current working path: \n", os.getcwd())

# # command-line alternative
# # on Unix/macOS/Linux
# ! pwd
# # on Windows (`cd` with no arguments prints the current directory)
# ! cd

current path: 
 /home/kefei/sandbox/jupyter-test/ingest-template/pipelines/example_pipeline
current working path: 
 /home/kefei/sandbox/jupyter-test/ingest-template


In [15]:
# retrieve the root path (two levels up), i.e., <path_to_git_clone>/ingest-template/
root_path = Path(path_original).parent.parent.absolute()
print("root path: \n", root_path)

# change working path to root_path
os.chdir(root_path)
# double-check
print("current path: \n", path_original)
print("(new) working path: \n", os.getcwd())

root path: 
 /home/kefei/sandbox/jupyter-test/ingest-template
current path: 
 /home/kefei/sandbox/jupyter-test/ingest-template/pipelines/example_pipeline
(new) working path: 
 /home/kefei/sandbox/jupyter-test/ingest-template
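
Calling `os.chdir` changes the working path for the rest of the notebook session. If that side effect is unwanted, a context manager can restore the previous path automatically. This is a minimal sketch; the name `working_directory` is hypothetical, and the parent of the current directory merely stands in for the project root.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory(path):
    """Temporarily switch the working directory to `path`, restoring it on exit."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield Path.cwd()
    finally:
        # Runs even if the body raises, so the original path is always restored.
        os.chdir(previous)


# Usage: the parent of the current directory stands in for the project root.
with working_directory(Path.cwd().parent) as cwd:
    print("inside the block:", cwd)
print("after the block:", Path.cwd())
```

On Python 3.11 and later, the standard library's `contextlib.chdir` offers similar behavior.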


# Pipeline steps
- define configuration file path
- define pipeline configuration
- instantiate the pipeline
- (optional) validate pipeline output

In [13]:
import xarray as xr
from pathlib import Path
from tsdat import PipelineConfig, assert_close

In [12]:
# define configuration file path
config_path = Path("pipelines/example_pipeline/config/pipeline.yaml")
print("config_path: \n", config_path)
print("absolute config_path: \n", config_path.absolute())


config_path: 
 pipelines/example_pipeline/config/pipeline.yaml
absolute config_path: 
 /home/kefei/sandbox/jupyter-test/ingest-template/pipelines/example_pipeline/config/pipeline.yaml
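
Since `config_path` is relative, it only resolves if the working path is the project root. The sketch below checks this explicitly and fails early with a readable message; `resolve_config` is a hypothetical helper, not part of tsdat.

```python
from pathlib import Path


def resolve_config(root: Path, relative: str) -> Path:
    """Join `relative` onto `root` and raise early if the file does not exist."""
    path = (root / relative).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"configuration file not found: {path}")
    return path


try:
    checked_path = resolve_config(
        Path.cwd(), "pipelines/example_pipeline/config/pipeline.yaml"
    )
    print("checked config path:", checked_path)
except FileNotFoundError as err:
    # Most likely the working path is not the project root.
    print(err)
```

Checking the path before loading the configuration makes a wrong working path easier to spot than an error raised later during loading.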


In [11]:
# define pipeline configuration
config = PipelineConfig.from_yaml(config_path)
config



In [7]:
# instantiate the pipeline
pipeline = config.instantiate_pipeline()

In [8]:
# (optional) validate pipeline output

input_file = "pipelines/example_pipeline/test/data/input/buoy.z06.00.20201201.000000.waves.csv"
expected_file = "pipelines/example_pipeline/test/data/expected/morro.buoy_z06-waves.a1.20201201.000000.nc"

dataset = pipeline.run([input_file])
expected: xr.Dataset = xr.open_dataset(expected_file)  # type: ignore

# assert_close(dataset, expected, check_fill_value=False, check_attrs=False)

In [9]:
# inspect input file
import pandas as pd
df_input = pd.read_csv(input_file)
df_input.head()

Unnamed: 0,DataTimeStamp,WaveType,ZCN,Havg,Tavg,Hmax,Tmax,MaxCrest,Hsig,Tsig,...,MeanSpread,PeakPeriod,PeakDirection,PeakSpread,TP5,HM0,Te,DurationMs,F1,F2
0,2020-12-01 00:00:00,0.0,173.0,1.45,6.199,3.94,,,2.289,8.8,...,29.6,20.0,,,16.6,2.589,,,,
1,2020-12-01 00:20:00,0.0,185.0,1.47,5.9,3.569,,,2.299,8.0,...,30.1,18.2,,,14.8,2.589,,,,
2,2020-12-01 00:40:00,0.0,171.0,1.57,6.099,3.44,,,2.39,8.899,...,27.6,14.3,,,13.5,2.64,,,,
3,2020-12-01 01:00:00,0.0,166.0,1.57,6.3,3.71,,,2.46,9.1,...,29.1,18.2,,,18.1,2.75,,,,
4,2020-12-01 01:20:00,0.0,165.0,1.61,6.199,3.359,,,2.47,9.1,...,29.2,18.2,,,16.0,2.779,,,,


In [10]:
# inspect pipeline output
dataset