FCPG Tools Refactoring - Abstract Base Classes (ABC) v1
========================================================
**By:** @xaviernogueira

**Design philosophy:**
* Single responsibility - all functions should do a single task. Their functionality should not be repeated in other functions.
* Object oriented - Python class objects will be used to produce cleaner looking code as well as enable storage of relevant parameters between steps. Rasters will be stored in memory rather than being constantly written to disk.
* Modular - while the existing installation of FCPGTools pulls all tools in as functions, version 2.0.0 will have multiple modules/classes containing functions. This allows for lighter-weight imports that avoid the GDAL and TauDEM dependencies. It also allows for experimentation with new geoprocessing engines (e.g., [`pysheds`](https://github.com/mdbartos/pysheds)).
* Modern Python formatting - all functionality will be written following the [PEP8](https://peps.python.org/pep-0008/) style guide to match modern programming conventions.

**New features:**
* Multi-band support - since most hydrology-relevant parameter grids are multi-band (with bands representing the time axis), all functions should work effectively the same regardless of how many bands are present. This can be handled by switching to an [`xarray`](https://docs.xarray.dev/en/stable/) tech stack.
* Pipeline facilitation - there should be opportunities to automate large parts of the workflow with no intervention. This will require certain function parameters to be pulled from raster metadata. It also requires replacing the existing design, where rasters are read and written by [rasterio](https://rasterio.readthedocs.io/en/latest/), with a design **where raster objects can be held in memory between steps.**
* Performance optimization as a default - while some functions let the user specify the number of cores to use, this optional parameter will likely go unused by more novice end users. Therefore I propose a workflow where a single boolean parameter, `optimize`, controls whether multi-processing is used. If `optimize=True`, the program should automatically identify the number of cores to use and allocate computation resources accordingly.
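
A minimal sketch of how the `optimize` flag could be turned into a worker count, using only the standard library. The function name `resolve_worker_count` and the `reserve` parameter are hypothetical and not part of FCPGTools; leaving one core free is an assumption, not a decided policy.

```python
import os


def resolve_worker_count(optimize: bool, reserve: int = 1) -> int:
    """
    Pick how many worker processes to use (hypothetical helper).
    :param optimize: (bool) if False, run single-process.
    :param reserve: (int) cores to leave free for the rest of the system (assumed default).
    :return: (int) number of workers, always at least 1.
    """
    if not optimize:
        return 1

    # os.cpu_count() can return None when the count cannot be determined
    available = os.cpu_count() or 1
    return max(1, available - reserve)


workers_off = resolve_worker_count(optimize=False)
workers_on = resolve_worker_count(optimize=True)
```

The returned value could then be passed to whichever multi-processing backend is eventually chosen.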

**Core raster functionality:**
* Create a Flow Direction Raster (FDR) from a DEM (can/should we support more formats than just .tif?).
* Convert ESRI FDR encodings to a D8 TauDEM encoding.
* Reproject/resample/clip an arbitrary parameter raster to match our FDR (currently uses GDALWarp).
* "Binarize" categorical rasters to allow for category accumulation calculations.
* Create a Flow Accumulation Cell (FAC) Raster (from the FDR) AND parameter grid accumulation rasters.
* Create a Flow Conditioned Parameter Grid by dividing the parameter grid accumulation raster by the FAC.
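
The final step in the list above, combined with the multi-band goal, can be sketched as a single broadcast division. This sketch uses plain `numpy` arrays as a stand-in so that it has no geospatial dependencies; the same expression also works on `xr.DataArray` objects, which broadcast by dimension name. The function name `make_fcpg` is hypothetical, and treating cells with a FAC of zero as missing is an assumption.

```python
import numpy as np


def make_fcpg(param_accum: np.ndarray, fac: np.ndarray) -> np.ndarray:
    """
    Divide an accumulated parameter grid by the FAC grid (hypothetical helper).
    :param param_accum: (np.ndarray) accumulated parameter values, shape (band, y, x).
    :param fac: (np.ndarray) flow accumulation cell counts, shape (y, x).
    :return: (np.ndarray) FCPG values, shape (band, y, x).
    """
    fac = fac.astype('float64')
    # assumption: cells with FAC == 0 are treated as missing to avoid dividing by zero
    safe_fac = np.where(fac > 0, fac, np.nan)
    # (band, y, x) / (y, x) broadcasts the FAC grid across every band
    return param_accum / safe_fac


# two bands (e.g., two time steps) over a 2x2 grid
param_accum = np.array([
    [[2.0, 4.0], [6.0, 8.0]],
    [[1.0, 1.0], [3.0, 0.0]],
])
fac = np.array([[1, 2], [3, 0]])
fcpg = make_fcpg(param_accum, fac)
```

Because the band axis is handled by broadcasting, the function body does not change when the number of bands changes.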

**Boundary condition functionality:**
* Find basin pour points (i.e., outflows and their coordinates) using max FAC values and HUC basin shapefiles (read as GeoDataFrames).
* Update boundary conditions in a downstream basin using the extracted upstream pour points' accumulation values.
    * Requires finding upstream pour points (their location and FAC values), and currently involves using a JSON dictionary to propagate values into the downstream raster.
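
A rough sketch of the pour point search, assuming the HUC basin polygon has already been rasterized into a boolean mask aligned with the FAC grid (that rasterization step is not shown). It uses `numpy` only; `find_pour_point` and the JSON layout are hypothetical and do not reflect the existing FCPGTools dictionary format.

```python
import json

import numpy as np


def find_pour_point(fac: np.ndarray, basin_mask: np.ndarray,
                    x_coords: np.ndarray, y_coords: np.ndarray) -> dict:
    """
    Locate the maximum-FAC cell inside a basin (hypothetical helper).
    :param fac: (np.ndarray) flow accumulation values, shape (y, x).
    :param basin_mask: (np.ndarray) boolean mask, True inside the basin.
    :param x_coords: (np.ndarray) x coordinate of each column.
    :param y_coords: (np.ndarray) y coordinate of each row.
    :return: (dict) the pour point coordinates and its FAC value.
    """
    # ignore everything outside the basin
    masked = np.where(basin_mask, fac, np.nan)
    row, col = np.unravel_index(np.nanargmax(masked), masked.shape)
    return {
        'x': float(x_coords[col]),
        'y': float(y_coords[row]),
        'fac': float(masked[row, col]),
    }


fac = np.array([[1.0, 2.0, 3.0],
                [1.0, 5.0, 9.0],
                [1.0, 20.0, 4.0]])
# the bottom row lies outside the basin, so its value of 20 is ignored
basin_mask = np.array([[True, True, True],
                       [True, True, True],
                       [False, False, False]])
x_coords = np.array([100.0, 110.0, 120.0])
y_coords = np.array([50.0, 40.0, 30.0])

pour_point = find_pour_point(fac, basin_mask, x_coords, y_coords)
# serialize so the value can be handed to the downstream basin step
pour_point_json = json.dumps({'upstream_basin': pour_point})
```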

In [1]:
import abc  # for abstract base class design
import os  # for path checks and directory creation
import xarray as xr  # for in memory raster manipulation
import rioxarray  # for on disk manipulations (read/write)
from typing import Union  # for better type-hints

# `IOEngine` - Input/Output Engine Class
**Notes:**
* The idea here is that all reading into and writing from `xarray` objects can be done using class methods.
* TODO: come back and add ways to write raster attributes (tags) using [`DataArray.rio.to_raster()`](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_array.RasterArray.to_raster).

In [None]:
class IOEngine(abc.ABC):

    @abc.abstractmethod
    def open_raster(self, in_raster_path: str) -> Union[xr.DataArray, None]:
        """
        Reads a raster file from path into an xarray DataArray.
        :param in_raster_path: (str) a valid path to a georeferenced raster.
        :return: ([xr.DataArray, None]) a DataArray object if a valid path is given,
        or None if in_raster_path does not exist.
        """
        if os.path.exists(in_raster_path):
            # use rioxarray to open a .tif as a xr.DataArray
            # alternative to evaluate (experimental?): xr.open_dataarray(in_raster_path, decode_coords='all', engine='rasterio')
            raster = rioxarray.open_rasterio(in_raster_path, parse_coordinates=True)
            return raster

        else:
            print(f'ERROR: {in_raster_path} does not exist.')
            return None

    @abc.abstractmethod
    def write_raster(self, in_raster: xr.DataArray, out_path: str) -> Union[str, None]:
        """
        Writes an xarray DataArray to a GeoTIFF raster file.
        :param in_raster: (xr.DataArray) a georeferenced raster held in memory.
        :param out_path: (str) a .tif path to write the raster to.
        :return: ([str, None]) out_path that the raster was written to (if successful),
        or None if the raster could not be written.
        """
        # check if the output directory exists (a bare file name means the current directory)
        out_dir = os.path.dirname(out_path) or '.'
        if os.path.exists(out_dir):
            pass

        # if the output directory does not exist, make it (if we can find the higher level directory)
        elif os.path.exists(os.path.dirname(out_dir)):
            os.makedirs(out_dir)

        # if we can't find a place to make the output directory, report an error
        else:
            print(f'ERROR: Directory {os.path.dirname(out_dir)} does not exist and/or cannot be made.')
            return None

        # export the DataArray to a GeoTIFF raster via the rioxarray accessor (add tags or kwargs later?)
        try:
            in_raster.rio.to_raster(out_path, driver='GTiff', compute=True)
            return out_path
        except Exception as e:
            print(f'Could not save raster to {out_path}\n Exception: {e}')
            return None
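
Because both methods are decorated with `@abc.abstractmethod`, `IOEngine` cannot be instantiated directly; a concrete subclass has to override them. The dependency-free sketch below illustrates that behavior with a stripped-down stand-in. The names `RasterIO` and `InMemoryIO` are hypothetical, and a dictionary replaces the file system so the example runs without `rioxarray`.

```python
import abc


class RasterIO(abc.ABC):
    """Stand-in for IOEngine with the same two abstract methods."""

    @abc.abstractmethod
    def open_raster(self, in_raster_path: str):
        ...

    @abc.abstractmethod
    def write_raster(self, in_raster, out_path: str) -> str:
        ...


class InMemoryIO(RasterIO):
    """Concrete implementation that stores rasters in a dict instead of on disk."""

    def __init__(self):
        self._store = {}

    def open_raster(self, in_raster_path: str):
        # mirror the IOEngine contract: None when the path is unknown
        return self._store.get(in_raster_path)

    def write_raster(self, in_raster, out_path: str) -> str:
        self._store[out_path] = in_raster
        return out_path


# instantiating the abstract class raises TypeError
try:
    RasterIO()
    abstract_instantiated = True
except TypeError:
    abstract_instantiated = False

io_engine = InMemoryIO()
written_path = io_engine.write_raster([[1, 2], [3, 4]], 'fdr.tif')
round_trip = io_engine.open_raster(written_path)
```

Swapping the storage backend this way is one reason to keep all reading and writing behind a single interface.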

# `GeoSpatialEngineFull` Class - uses `xarray`
**Notes:**
* I am starting by defining the complete geospatial engine; then we can consider which aspects can be pulled into `GeoSpatialEngineLite`.
    * There will be duplicate techniques to do the same thing between pysheds vs TauDEM as well as GDAL vs xarray.
* **Choose between an `xarray.DataArray` vs `xarray.Dataset` implementation!**
* The idea here is that `xarray` data comes in, and `xarray` data comes out! **All writing and reading is handled by the `IOEngine` class.**

**Class methods:**
* Get a parameter grid aligned (splits `resampleParam()` into 3 functions):
    * `reproject_raster()` - uses [`rioxarray.reproject_match()`](https://corteva.github.io/rioxarray/stable/rioxarray.html?highlight=write_crs#rioxarray.raster_array.RasterArray.reproject_match) OR [`rioxarray.reproject()`](https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_dataset.RasterDataset.reproject) to reproject a raster.
    * `resample_raster()` - uses 
    * `clip_raster()`

In [None]:
class GeoSpatialEngineFull(abc.ABC):

    # define basic raster functions to prep parameter grids W/O GDAL
    @abc.abstractmethod
    def reproject_raster(self, in_raster: Union[str, xr.DataArray],
                         out_path: str = None) -> xr.DataArray:
        """
        Reproject a raster using GDAL warp
        """
        pass

    @abc.abstractmethod
    def resample_raster(self, in_raster: Union[str, xr.DataArray],
                        out_path: str = None) -> xr.DataArray:
        """
        v1 - use GDAL warp
        :returns: xarray DataArray resampled
        """
        pass

    @abc.abstractmethod
    def clip_raster(self, in_raster: Union[str, xr.DataArray],
                    out_path: str = None) -> xr.DataArray:
        pass

# `GeoSpatialEngineDepreciated` Class - Uses an improved version of the existing tech stack, mirroring our updated engine architecture.
* The existing tech stack uses rasterio, GDAL, and TauDEM.
* This can be used to sketch out existing functionality (in a modular way).

In [None]:
class GeoSpatialEngineDepreciated(abc.ABC):

    # define basic raster functions to prep parameter grids with the existing GDAL-based stack
    @abc.abstractmethod
    def reproject_raster(self, in_raster: Union[str, xr.DataArray],
                         out_path: str = None) -> xr.DataArray:
        """
        Reproject a raster using GDAL warp
        """
        pass

    @abc.abstractmethod
    def resample_raster(self, in_raster: Union[str, xr.DataArray],
                        out_path: str = None) -> xr.DataArray:
        """
        v1 - use GDAL warp
        :returns: xarray DataArray resampled
        """
        pass

    @abc.abstractmethod
    def clip_raster(self, in_raster: Union[str, xr.DataArray],
                    out_path: str = None) -> xr.DataArray:
        pass

# Tools/stand-alone functions
* `requires_full_engine()` - a decorator function that verifies if a given function is available in the engine being used.

In [None]:
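import functools
from typing import Any, Callable

def requires_full_engine(func: Callable) -> Callable:
    """
    A decorator that checks that the engine passed to func is a full engine.
    """
    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def full_engine_function(engine, *args, **kwargs) -> Any:
        # GeoSpatialEngineFull is the full engine class defined above
        if not isinstance(engine, GeoSpatialEngineFull):
            raise ValueError(f'Invalid engine type. Function {func.__name__} requires GeoSpatialEngineFull')
        return func(engine, *args, **kwargs)
    return full_engine_function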
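
A self-contained usage sketch of the decorator pattern above. The classes `DemoLiteEngine` and `DemoFullEngine` and the function `accumulate_flow` are hypothetical stand-ins, and the decorator is repeated in compact form (as `requires_demo_full_engine`) so the block runs on its own.

```python
import functools


class DemoLiteEngine:
    """Stand-in for a lightweight engine."""


class DemoFullEngine(DemoLiteEngine):
    """Stand-in for the full engine."""


def requires_demo_full_engine(func):
    """Reject calls whose first argument is not a DemoFullEngine."""
    @functools.wraps(func)
    def wrapper(engine, *args, **kwargs):
        if not isinstance(engine, DemoFullEngine):
            raise ValueError(
                f'Invalid engine type. Function {func.__name__} requires DemoFullEngine'
            )
        return func(engine, *args, **kwargs)
    return wrapper


@requires_demo_full_engine
def accumulate_flow(engine, n_cells: int) -> int:
    """Placeholder for a function that only the full engine supports."""
    return n_cells * 2


full_result = accumulate_flow(DemoFullEngine(), 21)

# the lite engine is rejected before the wrapped function runs
try:
    accumulate_flow(DemoLiteEngine(), 21)
    lite_error = None
except ValueError as e:
    lite_error = str(e)
```

Using `functools.wraps` keeps `accumulate_flow.__name__` intact, so the error message and any introspection still refer to the original function.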
