FCPG Tools Refactoring - Abstract Base Classes (ABC) v1
========================================================
**By:** @xaviernogueira

**Design philosophy:**
* Single responsibility - all functions should do a single task. Their functionality should not be repeated in other functions.
* Object oriented - Python class objects will be used to produce cleaner looking code as well as enable storage of relevant parameters between steps. Rasters will be stored in memory rather than being constantly written to disk.
* Modular - while the existing installation of FCPGTools pulls all tools in as functions, version 2.0.0 will split them across multiple modules/classes. This allows for lighter-weight imports that avoid the GDAL and TauDEM dependencies. It also allows for experimentation with new geoprocessing engines (e.g., [`pysheds`](https://github.com/mdbartos/pysheds)).
* Modern Python formatting - all functionality will be written following the [PEP8](https://peps.python.org/pep-0008/) style guide to match modern programming conventions.

**New features:**
* Multi-band support - since most hydrology relevant parameter grids are multi-band (with bands representing the time axis), all functions should work effectively the same regardless of how many bands are present. This can be handled by switching to an [`xarray`](https://docs.xarray.dev/en/stable/) tech stack.
* Pipeline facilitation - there should be opportunities to automate large parts of the workflow with no intervention. This will require certain function parameters to be pulled from raster metadata. Additionally, this requires replacing the existing design, where rasters are read and written by [rasterio](https://rasterio.readthedocs.io/en/latest/) at each step, with a design **where raster objects can be held in memory between steps.**
* Performance optimization as a default - while some functions give the user an opportunity to input the number of cores they want to use on their computer, this optional parameter will likely go unused by more novice end users. Therefore I propose a workflow where a simple boolean parameter, `optimize`, controls whether multi-processing is used. If `optimize=True`, the program should automatically identify the number of cores to use and allocate computation resources accordingly.
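
The `optimize` flag described in the last bullet could be resolved into a core count along these lines. This is only a sketch under stated assumptions: the function name `resolve_core_count` and the `reserve` parameter are hypothetical and not part of FCPGTools.

```python
import os


def resolve_core_count(optimize: bool = True, reserve: int = 1) -> int:
    """Return the number of worker processes to use.

    `reserve` (hypothetical) leaves some cores free for the OS and other work.
    """
    if not optimize:
        return 1  # multi-processing disabled: run serially
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, available - reserve)


print(resolve_core_count(optimize=False))
print(resolve_core_count(optimize=True))
```

Keeping the decision in one helper means every tool that accepts `optimize` behaves the same way, and novice users never need to know their core count.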

**Core raster functionality:**
* Create a Flow Direction Raster (FDR) from a DEM (can/should we support more formats than just .tif?).
* Convert ESRI FDR encodings to a D8 TauDEM encoding.
* Reproject/resample/clip an arbitrary parameter raster to match our FDR (currently uses GDALWarp).
* "Binarize" categorical rasters to allow for category accumulation calculations.
* Create a Flow Accumulation Cell (FAC) Raster (from the FDR) AND parameter grid accumulation rasters.
* Create a Flow Conditioned Parameter Grid by dividing the parameter grid accumulation raster by the FAC.
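
The final step above (accumulated parameter divided by FAC) can be sketched with plain NumPy arrays. This is a minimal illustration, not the FCPGTools implementation: the function name `make_fcpg`, the `(band, y, x)` layout, and the choice to return NaN where the FAC is not positive are all assumptions.

```python
import numpy as np


def make_fcpg(param_acc, fac):
    """Divide accumulated parameter values by flow accumulation, band by band.

    param_acc: array of shape (band, y, x); fac: array of shape (y, x).
    Cells where fac <= 0 are treated as NoData and come back as NaN (assumption).
    """
    param_acc = np.asarray(param_acc, dtype=float)
    fac = np.asarray(fac, dtype=float)
    out = np.full(param_acc.shape, np.nan)
    valid = fac > 0  # 2D mask, broadcast across every band
    np.divide(param_acc, fac, out=out, where=valid)
    return out


fac = [[1, 2], [4, 0]]
param_acc = [
    [[3, 4], [8, 5]],  # band 0 (e.g., first time step)
    [[1, 6], [2, 7]],  # band 1
]
print(make_fcpg(param_acc, fac))
```

Because the 2D FAC broadcasts against the leading band axis, the same call works for one band or many, which is the multi-band behavior the design asks for.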

**Boundary condition functionality:**
* Find basin pour points (i.e., outflows and their coordinates) using max FAC values and HUC basin shapefiles (read as geoDataFrames).
* Update boundary conditions in a downstream basin using the accumulation values of the extracted upstream pour points.
    * Requires finding upstream pour points (their location and FAC values), and currently involves using a JSON dictionary to propagate values into the downstream raster.
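
A rough sketch of that hand-off, using nested lists so it has no dependencies. Everything here is illustrative: the record keys (`basin`, `inlet_row`, `inlet_col`, `value`), the basin label, and the helper names are hypothetical, and the inlet cell in the downstream grid is simply assumed to be known.

```python
import json


def max_cell(grid):
    """Return (row, col, value) of the largest value in a 2D nested list."""
    value, row, col = max(
        (value, row, col)
        for row, line in enumerate(grid)
        for col, value in enumerate(line)
    )
    return row, col, value


def seed_inflows(weights, records):
    """Copy `weights` and add each upstream value at its inlet cell."""
    seeded = [list(line) for line in weights]
    for rec in records:
        seeded[rec["inlet_row"]][rec["inlet_col"]] += rec["value"]
    return seeded


# The upstream pour point is taken to be the cell with the largest FAC value
upstream_fac = [[1, 2], [1, 5]]
_, _, outflow = max_cell(upstream_fac)

# Serialize the record, as it might be passed between basins
payload = json.dumps(
    [{"basin": "upstream_A", "inlet_row": 0, "inlet_col": 1, "value": outflow}]
)

# Seed the downstream weight grid before accumulation is run on it
downstream_weights = [[1, 1], [1, 1]]
seeded = seed_inflows(downstream_weights, json.loads(payload))
print(seeded)
```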

# GeoSpatialEngine ABCs

**Description:** 

The idea here is that there is a set of functions that are deemed core/lite. **These functions are made concrete in different tech stacks**, inheriting from the Abstract Base Class (think ODM2 API implementation). The `GeoSpatialEngineFull` ABC inherits from the lite ABC, and is made concrete (i.e., inherited from) using additional engines.
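
As a small illustration of how this layering behaves in Python, the sketch below uses stand-in class names (`EngineLite`, `EngineFull`, and so on are hypothetical, not the final API). It shows that a subclass which leaves an abstract method unimplemented cannot be instantiated.

```python
import abc


class EngineLite(abc.ABC):
    @abc.abstractmethod
    def d8_fdr(self, dem):
        """Core/lite functionality."""


class EngineFull(EngineLite):
    @abc.abstractmethod
    def clip(self, raster):
        """Additional functionality only full engines provide."""


class LiteOnlyEngine(EngineLite):
    def d8_fdr(self, dem):
        return f"fdr({dem})"


class IncompleteFullEngine(EngineFull):
    def d8_fdr(self, dem):
        return f"fdr({dem})"
    # clip() is deliberately left unimplemented


lite = LiteOnlyEngine()
print(lite.d8_fdr("dem.tif"))

try:
    IncompleteFullEngine()
except TypeError as err:
    # abc refuses to instantiate a class with unimplemented abstract methods
    print(f"Refused: {err}")
```

The practical upshot is that an engine which claims to be "full" but forgets a method fails at construction time rather than midway through a pipeline.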

**Notes:** 
* Reading and writing raster files should *ideally* be done in a separate IOEngine ABC. An issue we may run into here is that some engines (e.g., GDAL) work directly with file paths via the command line, so they would not necessarily need an `xarray` read/write layer. **(bring up in a meeting)**.

## GeoSpatialEngineLite - ABC

**Core functionalities:**
* Prepare Flow Direction Raster:
    * Create an FDR from a DEM.
    * Convert FDR Formats (i.e., ESRI to TauDEM).
* Prepare parameter grids:
    * Reproject/resample/clip a raster to match a Flow Direction Raster.
    * One-hot-encode a categorical parameter raster (could this be done in bands instead of files?).
* Create Flow Accumulation Raster (FAC):
    * Create a FAC from an FDR.
    * Find basin pour points (outflow cells) and their accumulation values (cell or parameter).
    * Create a parameter grid accumulation raster (can be multi-dimensional along a time axis).
    * Create both FAC and parameter accumulation rasters **given boundary conditions (i.e., upstream basin pour points)**
* Flow Conditioned Parameter Grid (FCPG):
    * Create a multi-dimensional FCPG raster for a parameter grid stack.
    
**Notes:** 
* For `d8_fdr()`, returning an `xarray.DataArray` is the default behavior, although one can also save the intermediate files if desired.
* If we go beyond TauDEM, should we support (and have some automatic way to verify) other D8 cell value encodings?
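
One possible way to verify D8 cell value meanings automatically is to compare the set of values present in a raster against the known code sets. The sketch below works on nested lists to stay dependency-free; the helper names are hypothetical. It assumes the usual encodings: ESRI uses powers of two (1 = east, increasing clockwise to 128 = northeast), while TauDEM uses 1 to 8 (1 = east, increasing counter-clockwise).

```python
# ESRI code -> TauDEM code for the same flow direction
ESRI_TO_TAUDEM = {1: 1, 2: 8, 4: 7, 8: 6, 16: 5, 32: 4, 64: 3, 128: 2}
D8_CODES = {
    "ESRI": set(ESRI_TO_TAUDEM),
    "TauDEM": set(ESRI_TO_TAUDEM.values()),
}


def guess_d8_format(grid, nodata=None):
    """Return 'ESRI' or 'TauDEM' if the values fit exactly one encoding, else None."""
    values = {v for line in grid for v in line if v != nodata}
    matches = [name for name, codes in D8_CODES.items() if values <= codes]
    return matches[0] if len(matches) == 1 else None


def recode_esri_to_taudem(grid, nodata=None):
    """Recode every cell, leaving NoData cells untouched."""
    return [
        [v if v == nodata else ESRI_TO_TAUDEM[v] for v in line]
        for line in grid
    ]


esri_fdr = [[1, 128], [64, 16]]
print(guess_d8_format(esri_fdr))
print(recode_esri_to_taudem(esri_fdr))
```

Note the check is only a heuristic: a raster containing nothing but the values 1, 2, 4, and 8 fits both encodings, so the function returns `None` and the format would have to come from metadata or the user.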

In [6]:
import abc
import xarray as xr
from typing import Union

In [9]:
class GeoSpatialEngineLite(abc.ABC):
    #  Prepare flow direction raster (FDR)
    @abc.abstractmethod
    def d8_fdr(self, dem: Union[xr.DataArray, str], out_path: str = None,
               out_format: str = 'TauDEM') -> xr.DataArray:
        """
        Creates a flow direction raster from a DEM. Can either save the raster or keep in memory.
        :param dem: (str .tif path or xr.DataArray) the DEM from which to make the FDR.
        :param out_path: (optional, str) a valid string path to save the FDR.
        :param out_format: (str, default=TauDEM) type of D8 flow direction encoding for output.
        :returns: the FDR as a xarray DataArray object.
        """
        pass

    @abc.abstractmethod
    def convert_d8(self, d8_raster: Union[xr.DataArray, str], out_path: str = None,
                   in_format: str = 'ESRI', out_format: str = 'TauDEM') -> xr.DataArray:
        """
        Recodes a D8 FDR between different formats (default is ESRI -> TauDEM).
        :param d8_raster: (DataArray or str path) a D8 FDR.
        :param out_path: (str, default=None) allows the output raster to be saved at a string path.
        :param in_format: (str, default=ESRI) input raster's D8 flow direction encoding type.
        :param out_format: (str, default=TauDEM) type of D8 flow direction encoding for output.
        :returns: the recoded D8 FDR as a xarray DataArray object.
        """
        pass

    def find_cell_downstream(self, d8_raster: Union[xr.DataArray, str], coords: tuple) -> tuple:
        pass

    # prepare parameter grid rasters
    @abc.abstractmethod
    def clip(self, in_raster: Union[xr.DataArray, str], match_raster: Union[xr.DataArray, str] = None,
             out_path: str = None, extent_shapefile: str = None,
             custom_bbox: list = None) -> xr.DataArray:
        """
        
        :param in_raster:
        :param match_raster:
        :param out_path:
        :param extent_shapefile: 
        :param custom_bbox:
        :returns: ...  as a xarray DataArray object.
        """
        pass

    @abc.abstractmethod
    def reproject(self, in_raster: Union[xr.DataArray, str], match_raster: Union[xr.DataArray, str] = None,
                  out_path: str = None, custom_crs: str = None) -> xr.DataArray:
        """
        
        :param in_raster:
        :param match_raster:
        :param out_path:
        :param custom_crs: 
        :returns: ...  as a xarray DataArray object.
        """
        pass

    @abc.abstractmethod
    def resample(self, in_raster: Union[xr.DataArray, str], match_raster: Union[xr.DataArray, str] = None,
                 out_path: str = None, custom_cell_size: Union[float, int] = None) -> xr.DataArray:
        """
        
        :param in_raster:
        :param match_raster:
        :param out_path:
        :param custom_cell_size: 
        :returns: ...  as a xarray DataArray object.
        """
        pass

    @abc.abstractmethod
    def mask(self) -> xr.DataArray:
        """
        Primarily for masking rasters (i.e., FAC) by basin shapefiles, converting out-of-mask raster
        values to NoData. A cell value can also be used to create a mask for integer rasters.
        :param in_raster:
        :param mask_shapefile:
        :param out_path:
        :param mask_cell_value:
        :param inverse_mask: 
        :returns: ...  as a xarray DataArray object.
        """
        pass
    
    @abc.abstractmethod
    def binarize_categorical_rasters(self) -> xr.DataArray:
        """
        :returns: ...  as a xarray DataArray object.
        """
        pass
    
    # create/analyse flow accumulation rasters
    @abc.abstractmethod
    def fac_from_fdr(self, pour_points: dict = None) -> xr.DataArray:
        """
        :returns: ...  as a xarray DataArray object.
        """
        pass

    @abc.abstractmethod
    def parameter_accumulation(self) -> xr.DataArray:
        """
        :returns: ...  as a xarray DataArray object.
        """
        pass

    @abc.abstractmethod
    def find_pour_points(self) -> dict:
        """
        :returns: (dict)
        """
        pass

    # make FCPG raster
    @abc.abstractmethod
    def create_fcpg(self) -> xr.DataArray:
        """
        :returns: ...  as a xarray DataArray object.
        """
        pass

    # other raster functions
    @abc.abstractmethod
    def sample_raster(self, raster: xr.DataArray, coords: tuple) -> Union[float, int]:
        """
        :returns: (float or int) the raster value at coords.
        """
        pass

    @abc.abstractmethod
    def get_min_cell(self, raster: xr.DataArray) -> list[tuple, Union[float, int]]:
        """
        :returns: (list) [coords:tuple, value:Union[float, int]]
        """
        pass

    @abc.abstractmethod
    def get_max_cell(self, raster: xr.DataArray) -> list[tuple, Union[float, int]]:
        """
        :returns: (list) [coords:tuple, value:Union[float, int]]
        """
        pass

    @abc.abstractmethod
    def update_cell_values(self, raster: xr.DataArray, coords: tuple, value: Union[float, int]) -> xr.DataArray:
        """
        :returns: ...  as a xarray DataArray object.
        """
        pass
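
`find_cell_downstream()` has no docstring yet, so here is one possible reading of it, sketched on a nested list. The assumptions are loud ones: cells are addressed by `(row, col)` index rather than map coordinates, row 0 is the northern edge, the raster uses the TauDEM encoding (1 = east, counting counter-clockwise), and `None` is returned when the flow leaves the grid or the cell is NoData.

```python
# TauDEM D8 code -> (row offset, col offset), with row 0 at the north edge
TAUDEM_OFFSETS = {
    1: (0, 1),    # east
    2: (-1, 1),   # northeast
    3: (-1, 0),   # north
    4: (-1, -1),  # northwest
    5: (0, -1),   # west
    6: (1, -1),   # southwest
    7: (1, 0),    # south
    8: (1, 1),    # southeast
}


def find_cell_downstream(d8_grid, cell):
    """Return the (row, col) that `cell` drains into, or None."""
    row, col = cell
    offset = TAUDEM_OFFSETS.get(d8_grid[row][col])
    if offset is None:
        return None  # NoData or unrecognized code
    new_row, new_col = row + offset[0], col + offset[1]
    inside = 0 <= new_row < len(d8_grid) and 0 <= new_col < len(d8_grid[0])
    return (new_row, new_col) if inside else None


fdr = [[1, 7], [3, 5]]
print(find_cell_downstream(fdr, (0, 0)))
print(find_cell_downstream(fdr, (0, 1)))
```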

## GeoSpatialEngineFull - ABC

#########################################
base_classes.py
class GeoSpatialEngineLite:
    def d_8(dem: str) -> xarray.DataArray:
class GeoSpatialEngineFull(GeoSpatialEngineLite):
    def d_8():  # <---- inherited from GeoSpatialEngineLite
        """"""
    def clip():
    def reproject():
    def resample():
#########################################
engine.py
class TauGDALSpatialEngineFull(GeoSpatialEngineFull):
    def d_8():
        taudem.d8
    def clip():
        gdal.clip
    def reproject():
        gdal.reproject
    def resample():
        gdal.resample
class PyShedsSpatialEngineFull(GeoSpatialEngineFull):
    def d_8():
        pysheds.d8
    def clip():
        pysheds.clip
    def reproject():
        pysheds.reproject
    def resample():
        pysheds.resample
class WhiteBoxSpatialEngineFull(GeoSpatialEngineFull):
    def d_8():
        whitebox.d8
    def clip():
        whitebox.clip
    def reproject():
        whitebox.reproject
    def resample():
        whitebox.resample
#########################################
main.py
whitebox_engine = WhiteBoxSpatialEngineFull()
pysheds_engine = PyShedsSpatialEngineFull()
decay(pysheds_engine, resample_param(whitebox_engine))
#########################################
tools.py
resample_param(engine) -> xarray.DataArray:
    engine.clip()
    engine.reproject()
    engine.resample()
decay(engine, raster: xarray.DataArray)
#########################################
alternative approach
resample_param(engine: str = 'GDAL'):
    if engine == 'GDAL':
        do this
    if engine == 'PySheds':
        do this other approach
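
If the string-based alternative were pursued, the chain of `if` statements could be replaced with a small registry so that adding an engine does not mean editing `resample_param()`. This is a sketch only: the registry, the decorator name `register_engine`, and the placeholder implementations (which just return labelled strings) are all hypothetical.

```python
from typing import Callable, Dict

# Maps a lower-cased engine name to the function that implements resampling
_RESAMPLERS: Dict[str, Callable[[str], str]] = {}


def register_engine(name: str):
    """Decorator that records an implementation under an engine name."""
    def decorator(func):
        _RESAMPLERS[name.lower()] = func
        return func
    return decorator


@register_engine("GDAL")
def _resample_gdal(raster: str) -> str:
    return f"gdal:{raster}"  # placeholder for the GDAL code path


@register_engine("PySheds")
def _resample_pysheds(raster: str) -> str:
    return f"pysheds:{raster}"  # placeholder for the PySheds code path


def resample_param(raster: str, engine: str = "GDAL") -> str:
    try:
        implementation = _RESAMPLERS[engine.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown engine {engine!r}; choose from {sorted(_RESAMPLERS)}"
        ) from None
    return implementation(raster)


print(resample_param("param.tif"))
print(resample_param("param.tif", engine="PySheds"))
```

Compared with passing engine objects, this keeps call sites simple but gives up the compile-time style guarantees of the ABC approach, since nothing checks that each engine implements every tool.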

In [None]:
class GeoSpatialEngineLite:
    def d8_fdr(self):
        # currently TauDEM
        pass


class GeoSpatialEngineFull(GeoSpatialEngineLite):
    def clip(self):
        # currently GDAL
        pass

    def reproject(self):
        # currently GDAL
        pass

    def resample(self):
        # currently GDAL
        pass


class PyShedsSpatialEngineFull(GeoSpatialEngineFull):
    def d8_fdr(self):
        # flow direction function
        pass


class GDALSpatialEngineFull(GeoSpatialEngineFull):
    def clip(self):
        # currently GDAL
        pass

    def reproject(self):
        # currently GDAL
        pass

    def resample(self):
        # currently GDAL
        pass

# Earlier stuff

In [1]:
import abc  # for abstract base class design
import os  # for file manipulations
import xarray as xr  # for in memory raster manipulation
import rioxarray  # for on disk manipulations (read/write)
from typing import Union  # for better type-hints

## Utility functions

In [2]:
# TODO: make a decorator to verify in and output paths (for now, a plain helper function)
def verify_path_dir(in_path: str, make_dir: bool = True) -> Union[bool, None]:
    """
    Verifies that the directory of an input path exists (for iterative use).
    If not, but the next higher level directory does exist (and make_dir=True), the directory is created.
    :param in_path: str - a file path name.
    :param make_dir: bool (defaults to True) - whether to make the directory if easily possible.
    :returns: True if the directory exists or was created, False if it is missing and was not created,
        or None if in_path is not a string or the directory could not be made.
    """
    status = False

    # find if dir_path exists, or can be made
    if isinstance(in_path, str):
        dir_path = os.path.dirname(in_path)
        if not os.path.exists(dir_path):
            if make_dir:
                if os.path.exists(os.path.dirname(dir_path)):
                    try:
                        print(f'Creating output directory: {dir_path}')
                        os.makedirs(dir_path)
                        status = True
                    except Exception as e:
                        print(f'Could not make {dir_path} due to the following exception {e}')
                        return None
        else:
            status = True
    else:
        print(f'ERROR in :py:func:verify_path_dir() - in_path parameter is a {type(in_path)}, not a string!')
        return None

    return status
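
The comment at the top of this cell mentions a decorator, so here is one way that could look. It is a sketch under assumptions: the decorator name `ensure_out_dir` and the demo function `save_text` are hypothetical, the wrapped function must receive `out_path` as a keyword argument, and unlike `verify_path_dir()` it creates every missing parent directory rather than only one level.

```python
import functools
import os
import tempfile


def ensure_out_dir(func):
    """Create the parent directory of `out_path` (if given) before calling func."""
    @functools.wraps(func)
    def wrapper(*args, out_path=None, **kwargs):
        if out_path is not None:
            parent = os.path.dirname(os.path.abspath(out_path))
            os.makedirs(parent, exist_ok=True)
        return func(*args, out_path=out_path, **kwargs)
    return wrapper


@ensure_out_dir
def save_text(text, out_path=None):
    """Stand-in for a raster tool that optionally writes its result to disk."""
    if out_path is None:
        return None
    with open(out_path, "w") as handle:
        handle.write(text)
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "new_dir", "out.txt")
    save_text("hello", out_path=target)
    print(os.path.exists(target))
```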

In [3]:
# place to test out functions
test_path = r'C:\Users\xrnogueira\Documents\FCPGtools\try_this\PR_test.ipynb'
#  verify_path_dir(test_path, make_dir=True)

## `ReadWriteEngine` - Input/Output Engine Class
**Notes:**
* The idea here is that all writing and reading to `xarray` can be done using class methods.
* Come back and add ways to write raster attributes using [`DataArray.rio.to_raster()`](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_array.RasterArray.to_raster).

In [4]:
class ReadWriteEngine(abc.ABC):

    @abc.abstractmethod
    def open_raster(self, in_raster_path: str) -> Union[xr.DataArray, None]:
        """
        Reads a raster file from path into an xarray DataArray.
        :param in_raster_path: (str) a valid path to a georeferenced raster.
        :return: ([xr.DataArray, None]) a DataArray object if a valid path is given,
        or None if in_raster_path does not exist.
        """
        if os.path.exists(in_raster_path):
            # use rioxarray to open a .tif as a xr.DataArray
            # other (experimental?) option: xr.open_dataarray(in_raster_path, decode_coords='all', engine='rasterio')
            raster = rioxarray.open_rasterio(in_raster_path, parse_coordinates=True)
            return raster

        else:
            print(f'ERROR: {in_raster_path} does not exist.')
            return None

    @abc.abstractmethod
    def get_raster_info(self, in_raster: Union[str, xr.DataArray]) -> dict:
        """
        Get raster information (used to inform geoprocessing parameters to match parameter grids to FDR)
        :param in_raster: either a raster path string or a DataArray
        :returns: a dictionary with key raster attributes
        """
        pass

    @abc.abstractmethod
    def write_raster(self, in_raster: xr.DataArray, out_path: str) -> Union[str, None]:
        """
        Writes an xarray DataArray to a raster file.
        :param in_raster: (xr.DataArray) the georeferenced raster to write.
        :param out_path: (str) a .tif path to write the raster to.
        :return: (str) out_path that the raster was written to (if successful), otherwise None.
        """
        # check if the output directory exists
        out_dir = os.path.dirname(out_path)
        if os.path.exists(out_dir):
            pass

        # if the output directory does not exist, make it (if we can find the higher level directory)
        elif os.path.exists(os.path.dirname(out_dir)):
            os.makedirs(out_dir)

        # if we can't find a place to make the output directory, report an error
        else:
            print(f'ERROR: Directory {os.path.dirname(out_dir)} does not exist and/or cannot be made.')
            return None

        # export the DataArray to a GeoTIFF raster via the rioxarray .rio accessor (add tags or kwargs later?)
        try:
            in_raster.rio.to_raster(out_path, driver='GTiff', compute=True)
            return out_path
        except Exception as e:
            print(f'Could not save raster to {out_path}\n Exception: {e}')
            return None

## `GeoSpatialEngineFull` Class - uses `xarray`
**Notes:**
* I am starting with defining the complete geospatial engine, then we can consider which aspects can be pulled in to `GeoSpatialEngineLite`.
    * There will be duplicate techniques to do the same thing between pysheds vs taudem as well as GDAL vs xarray.
* **Choose between an `xarray.DataArray` vs `xarray.Dataset` implementation!**
* The idea here is that `xarray` data comes in, and `xarray` data comes out! **All writing and reading is stored in the `IOEngine` class.**

**Class methods:**
* Get a parameter grid aligned (splits `resampleParam()` into 3 functions):
    * `reproject_raster()` - uses [`rioxarray.reproject_match()`](https://corteva.github.io/rioxarray/stable/rioxarray.html?highlight=write_crs#rioxarray.raster_array.RasterArray.reproject_match) OR [`rioxarray.reproject()`](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_dataset.RasterDataset.reproject) to reproject a raster.
    * `resample_raster()` - uses 
    * `clip_raster()`

In [58]:
class GeoSpatialEngineLite(abc.ABC):

    # NOTE: On second thought would this work better as a function?
    class GDALWarp():
        """
        GDALWarp wrapper to align a raster (via reprojection and/or resampling and/or clipping) with another raster.
        Alternatively, the GDALWarp command can be customized beyond default behavior pythonically.
        Use GDALWarp.execute() to execute on the command line via subprocesses.
        If the existing parameters don't achieve the use case, one can pass in a custom gdal command via .execute(custom_cmd:str)
        """

        def add_resample(self, add: bool, xsize: Union[int, float], ysize: Union[int, float] = None) -> str:
            if add:
                if ysize is None:
                    ysize = xsize
                return f' -tr {xsize} {ysize}'
            else:
                return ''

        def add_reproject(self, add: bool, fdrcrs: str) -> str:
            if add:
                return f' -t_srs {fdrcrs}'
            else:
                return ''

        def add_clip(self, add: bool, xminmax: tuple, yminmax: tuple) -> str:
            if add:
                return f' -te {xminmax[0]} {yminmax[0]} {xminmax[1]} {yminmax[1]}'
            else:
                return ''

        def execute(self, in_raster: str, out_raster: str, match_raster: str, optimize_cores: bool = True,
                    reproject: bool = True, resample: bool = True, clip: bool = True,
                    custom_cmd: str = None, new_dtype: str = None, new_cellsize: Union[float, int] = None,
                    new_nodata: int = None, override_crs: str = None) -> str:
            """
            Executes GDALWarp cmd
            """

            # get params from raster info or override
            # mock params...
            cores, resample_method,  nodata, dtype, in_raster, out_raster = [1, 'bilinear', -1, 'int8', 'test_in.tif', 'test_out.tif']

            # build the end of the command
            end_str = ' -co "PROFILE=GeoTIFF" -co "TILED=YES" -co "SPARSE_OK=TRUE" -co "COMPRESS=LZW" -co' \
                f' "ZLEVEL=9" -co "NUM_THREADS={cores}" -co "BIGTIFF=IF_SAFER" -r {resample_method} -dstnodata ' \
                f'{nodata} -ot {dtype} {in_raster} {out_raster}'

            # add each warp command
            resample_str = self.add_resample(bool(resample), 999, 999)
            project_str = self.add_reproject(bool(reproject), fdrcrs='TESTCRS')
            clip_str = self.add_clip(bool(clip), xminmax=(0, 9), yminmax=(0, 9))

            # build final parameter -  then execute!
            gdal_cmd = 'gdalwarp -overwrite' + resample_str + project_str + clip_str + end_str

            return gdal_cmd

    @abc.abstractmethod
    def recode_d8_fdr(self) -> str:
        """
        Recode a 8 directional Flow Direction Raster (FDR). Default is ESRI-TauDEM.
        :param in_raster:
        :param recode_dict: (optional, dict) an 8 item dictionary like {2: 4, 3: 6, ...}
        that allows for custom raster recoding.
        """
        pass

In [59]:
in_test = r'test_in.tif'
out_test = r'test_out.tif'
match_test = r'test_match.tif'

engine = GeoSpatialEngineLite.GDALWarp()

engine.execute(in_test, out_test, match_test)

# see if we need to get "" around some cmd keywords?

# Notes:
# resample, reproject, clip could be functions in the GeoSpatialEngine, e.g.:
# def clip(engine='gdal'): ...

'gdalwarp -overwrite -tr 999 999 -t_srs TESTCRS -te 0 0 9 9 -co "PROFILE=GeoTIFF" -co "TILED=YES" -co "SPARSE_OK=TRUE" -co "COMPRESS=LZW" -co "ZLEVEL=9" -co "NUM_THREADS=1" -co "BIGTIFF=IF_SAFER" -r bilinear -dstnodata -1 -ot int8 test_in.tif test_out.tif'
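
On the quoting question raised in the cell above: one option is to build the command as a list of arguments rather than one string, since `subprocess.run()` passes list items through without shell parsing. The sketch below is an assumption about how that might look, not the current implementation; the function name `build_gdalwarp_args` and its parameters are hypothetical, and it only builds the list without running `gdalwarp`.

```python
import shlex
from typing import List, Optional, Sequence


def build_gdalwarp_args(
    in_raster: str,
    out_raster: str,
    cell_size: Optional[float] = None,
    crs: Optional[str] = None,
    bounds: Optional[Sequence[float]] = None,
    resample_method: str = "bilinear",
) -> List[str]:
    """Assemble gdalwarp arguments; each list item is passed verbatim."""
    args = ["gdalwarp", "-overwrite"]
    if cell_size is not None:
        args += ["-tr", str(cell_size), str(cell_size)]
    if crs is not None:
        args += ["-t_srs", crs]
    if bounds is not None:  # (xmin, ymin, xmax, ymax)
        args += ["-te", *map(str, bounds)]
    args += ["-co", "TILED=YES", "-co", "COMPRESS=LZW", "-r", resample_method]
    args += [in_raster, out_raster]
    return args


args = build_gdalwarp_args(
    "my dem.tif", "out.tif", cell_size=30, crs="EPSG:5070", bounds=(0, 0, 9, 9)
)
print(shlex.join(args))  # shell-quoted form, useful for logging only
# subprocess.run(args, check=True) would execute it, assuming gdalwarp is on PATH
```

With the list form, a path containing spaces stays a single argument, so no manual `""` wrapping of keywords or paths is needed.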

In [None]:
class GeoSpatialEngineFull(abc.ABC):

    # define basic raster functions to prep parameter grids W/O GDAL
    @abc.abstractmethod
    def reproject_raster(self, in_raster: Union[str, xr.DataArray],
                         out_path: str = None) -> xr.DataArray:
        """
        Reproject a raster using GDAL warp
        """
        pass

    @abc.abstractmethod
    def resample_raster(self, in_raster: Union[str, xr.DataArray],
                        out_path: str = None) -> xr.DataArray:
        """
        v1 - use GDAL warp
        :returns: xarray DataArray resampled
        """
        pass

    @abc.abstractmethod
    def clip_raster(self, in_raster: Union[str, xr.DataArray],
                    out_path: str = None) -> xr.DataArray:
        pass

## Tools/stand-alone functions
* `requires_full_engine()` - a decorator function that verifies if a given function is available in the engine being used.

In [None]:
import functools
from typing import Any, Callable


def requires_full_engine(func: Callable) -> Callable:
    """
    A decorator that verifies the engine passed to a function is a full engine
    """
    @functools.wraps(func)
    def full_engine_function(engine, *args, **kwargs) -> Any:
        if not isinstance(engine, GeoSpatialEngineFull):
            raise ValueError(f'Invalid engine type. Function {func.__name__} requires GeoSpatialEngineFull')
        return func(engine, *args, **kwargs)
    return full_engine_function


def get_cores() -> int:
    """
    Finds the # of cores available for multiprocessing
    """
    # os.cpu_count() can return None, so fall back to a single core
    return os.cpu_count() or 1
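
To show how the decorator would be used in practice, here is a self-contained sketch with stand-in classes. `LiteEngine`, `FullEngine`, and the tool function `resample_param` are placeholders for illustration, not the real engine classes.

```python
import functools


class LiteEngine:
    """Stand-in for a lite engine."""


class FullEngine(LiteEngine):
    """Stand-in for a full engine."""


def requires_full_engine(func):
    """Reject calls whose first argument is not a FullEngine."""
    @functools.wraps(func)
    def wrapper(engine, *args, **kwargs):
        if not isinstance(engine, FullEngine):
            raise ValueError(
                f"Invalid engine type. Function {func.__name__} requires FullEngine"
            )
        return func(engine, *args, **kwargs)
    return wrapper


@requires_full_engine
def resample_param(engine, raster):
    return f"{type(engine).__name__} resampled {raster}"


print(resample_param(FullEngine(), "param.tif"))

try:
    resample_param(LiteEngine(), "param.tif")
except ValueError as err:
    print(err)
```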