# Xavier's Sandbox Notebook

**Key Definitions:**
* FDR = Flow Direction Raster.
* FAC = Flow Accumulation Raster.
* Pour Point = The outlet point of a hydrologic unit/basin, i.e., the cell with the maximum FAC value (`FAC.max()`).

But first run `conda-develop "C:\PATH\FCPGtools"` to get FCPGtools importable in the `/sandbox/` directory.

In [1]:
import FCPGtools as fc
import os
import rasterio as rs
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import subprocess

# Key Rasters

### ESRI NHDPlus Flow Direction Raster (FDR)
An example (from [`NHDPlusV21_MA_02_02a_FdrFac_01.7z`](https://www.epa.gov/waterdata/nhdplus-mid-atlantic-data-vector-processing-unit-02)) of an 8-bit flow direction raster (8 cardinal directions) using the [NHDPlus](https://www.epa.gov/waterdata/get-nhdplus-national-hydrography-dataset-plus-data#v2datamap) raster coding convention (see below).

**Cardinality:**
* 1 -> E (note that this is the only direction whose value does not need to be reclassified for TauDEM)
* 2 -> SE
* 4 -> S
* 8 -> SW
* 16 -> W
* 32 -> NW
* 64 -> N
* 128 -> NE

![Central PA FDR](imgs\esri_nhdplus_fdr_example.png)

# FDR Conversion Functions

## Reclassify ESRI flow direction raster (FDR) values into TauDEM FDR compatible values.

In [5]:
def tauDrainDir(inRast, outRast, band=1, updateDict={}, verbose=False):
    """Reclassify ESRI flow direction values into TauDEM flow direction values.

    Parameters
    ----------
    inRast : str
        Path to a raster encoded with ESRI flow direction values.
    outRast : str
        Path to output a raster with flow directions encoded for TauDEM. File will be
        overwritten if it already exists.
    band : int (optional)
        Band to read the flow direction grid from if inRast is multiband, defaults to 1.
    updateDict : dict (optional)
        Dictionary of Rasterio raster options used to create outRast. Defaults have been supplied, 
        but may not work in all situations and input file formats.
    verbose : bool (optional)
        Print output, defaults to False.

    Returns
    -------
    outRast : raster
        Reclassified flow direction raster at the path specified above.
    """

    # assert that inRast is a file, if not return 'inRast not found'

    # use rasterio to open inRast, assert that the specified band exists

    # save a copy of the input metadata via inRast.profile.copy()

    # make a tauDir .copy() of the input raster and remap NHDPlus flow directions to TauDEM flow directions
    # i.e., 2 -> 8, 4 -> 7, etc.

    # use updateDict to update the metadata

    # use rasterio to write tauDir to the outRast .tif path

    return None
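
The remapping step described in the stub above can be sketched on an in-memory array. This is a minimal sketch, not the FCPGtools implementation: the name `reclassify_esri_to_taudem` and the `-1` nodata value are hypothetical, and only the `2 -> 8` and `4 -> 7` pairs come from the notes above. The remaining pairs assume TauDEM's D8 convention (1 = east, increasing counter-clockwise) and should be checked against the TauDEM documentation.

```python
import numpy as np

# ESRI/NHDPlus D8 code -> TauDEM D8 code.
# Assumption: TauDEM uses 1 = E and counts counter-clockwise up to 8 = SE.
ESRI_TO_TAUDEM = {1: 1, 2: 8, 4: 7, 8: 6, 16: 5, 32: 4, 64: 3, 128: 2}


def reclassify_esri_to_taudem(esri_fdr, nodata_out=-1):
    """Return an int8 array of TauDEM codes; unmapped cells become nodata_out."""
    esri_fdr = np.asarray(esri_fdr)
    # Start from an all-nodata grid so unknown input values never leak through.
    tau = np.full(esri_fdr.shape, nodata_out, dtype=np.int8)
    for esri_code, tau_code in ESRI_TO_TAUDEM.items():
        tau[esri_fdr == esri_code] = tau_code
    return tau


# Hypothetical 3x3 grid; 255 stands in for a nodata value.
example = np.array([[1, 2, 4], [8, 16, 32], [64, 128, 255]])
print(reclassify_esri_to_taudem(example))
```

Writing into a fresh array rather than remapping in place avoids re-mapping a cell twice, since some TauDEM codes (2, 4, 8) are also valid ESRI codes.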

## Create a flow accumulation raster (FAC) from a TauDEM FDR
Wrapper for [TauDEM's AreaD8 (8 cardinal directions) accumulation area function](https://hydrology.usu.edu/taudem/taudem5/help53/D8ContributingArea.html). The function currently returns nothing; it writes the result to the `accumRast` path by running the TauDEM command-line tool.

**Note:** The TauDEM function allows the sum of all upslope cells to be calculated, OR a weight can be added based on some other grid with the cmd line `[ -wg < wgfile >]` where `<wgfile>` is some parameter grid. Could we integrate this better?

In [9]:
def tauFlowAccum(fdr, accumRast, cores=1, mpiCall='mpiexec', mpiArg='-n', verbose=False) -> None:
    """Wrapper for TauDEM AreaD8 :cite:`TauDEM` to produce a flow accumulation grid.

    Parameters
    ----------
    fdr : str
        Path to a flow direction raster in TauDEM format.
    accumRast : str
        Path to output the flow accumulation raster.
    cores : int (optional)
        Number of cores to use. Defaults to 1.
    mpiCall : str (optional)
        The command to use for mpi, defaults to mpiexec.
    mpiArg : str (optional)
        Argument flag passed to mpiCall, which is followed by the cores parameter, defaults to '-n'.
    verbose : bool (optional)
        Print output, defaults to False.

    Returns
    -------
    accumRast : raster
        Raster of accumulated parameter values at the path specified above.
    """
    # format a dictionary with taudem inputs

    # construct the TauDEM AreaD8 command: '{mpiCall} {mpiArg} {cores} aread8 -p {fdr} -ad8 {outFl} -nc'.format(**tauParams)

    # use subprocess to call the command, which writes to accumRast.

    return None
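
As a rough illustration of the command-building step, the sketch below assembles the AreaD8 call as an argument list instead of a formatted string, and optionally appends the `-wg` weight grid mentioned in the note above. The helper names `build_aread8_cmd` and `run_aread8` and the `weightRast` parameter are hypothetical; the flags (`-p`, `-ad8`, `-wg`, `-nc`) are the ones already shown in this notebook.

```python
import subprocess


def build_aread8_cmd(fdr, accumRast, cores=1, mpiCall='mpiexec', mpiArg='-n', weightRast=None):
    """Build the AreaD8 call as an argument list (no shell quoting needed)."""
    cmd = [mpiCall, mpiArg, str(cores), 'aread8', '-p', fdr, '-ad8', accumRast]
    if weightRast is not None:
        cmd += ['-wg', weightRast]  # optional weight grid
    cmd.append('-nc')  # skip the edge contamination check
    return cmd


def run_aread8(cmd, verbose=False):
    """Run the command; raises CalledProcessError if the process exits non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    if verbose:
        print(result.stdout)
    return result


# Hypothetical file names; nothing is executed here, the command is only printed.
print(build_aread8_cmd('fdr_tau.tif', 'fac.tif', cores=4))
```

Keeping command construction separate from execution makes the command easy to inspect without TauDEM installed, and capturing `stdout` would let a wrapper return or log the tool's output instead of discarding it.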

# Parameter grid prep functions

## Resample + reproject + clip a raster to match a FDR raster

**GDAL Warp Parameters:**
```
{'inParam': inParam,
 'outParam': outParam,
 'fdr': fdr,
 'cores': str(cores),
 'resampleMethod': resampleMethod,
 'xsize': xsize,
 'ysize': ysize,
 'fdrXmin': fdrXmin,
 'fdrXmax': fdrXmax,
 'fdrYmin': fdrYmin,
 'fdrYmax': fdrYmax,
 'fdrcrs': fdrcrs,
 'nodata': paramNoData,
 'datatype': outType}
```
**Note:** `cores` currently defaults to 1, so users need to know how many cores their computer has to benefit from parallelism. It could be useful to detect the core count and use all of them when a boolean parameter is true (e.g., `optimize_cores: bool = True`).
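
One possible shape for the core-count idea in the note above, using only the standard library. The function name `resolve_cores` is hypothetical, and `optimize_cores` is the parameter name floated in the note, not an existing FCPGtools argument.

```python
import os


def resolve_cores(cores=1, optimize_cores=False):
    """Return the number of cores to hand to GDAL/TauDEM."""
    if optimize_cores:
        # os.cpu_count() can return None when the count cannot be determined.
        return os.cpu_count() or 1
    return max(1, int(cores))


print(resolve_cores(cores=2))
print(resolve_cores(optimize_cores=True))
```

Falling back to 1 keeps the current default behavior on machines where the core count is unavailable.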

In [6]:
def resampleParam(inParam, fdr, outParam, resampleMethod="bilinear", cores=1, forceProj=False,
                  forceProj4="\"+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs\"",
                  verbose=False):
    """
    Parameters
    ----------
    inParam : str
        Path to the input parameter data raster
    fdr : str
        Path to the flow direction raster
    outParam : str
        Path to the output file for the resampled parameter raster.
    resampleMethod : str (optional)
        resampling method, either 'bilinear' or 'near' for nearest neighbor. Bilinear should
        generally be used for continuous data sets such as precipitation while nearest neighbor
        should generally be used for categorical datasets such as land cover type. Defaults to bilinear.
    cores : int (optional)
        The number of cores to use. Defaults to 1.
    forceProj : bool (optional)
        Force the projection of the flow direction raster. This can be useful if the flow
        direction raster has an unusual projection. Defaults to False. If set to True,
        forceProj4 should also be specified or the default proj4 string for USGS Albers
        will be used, see below.
    forceProj4 : str (optional)
        Proj4 string used to force the projection of the flow direction raster. This defaults to
        USGS Albers, but is not used unless the forceProj parameter is set to True.
    verbose : bool (optional)
        Print output, defaults to False.

    Returns
    -------
    outParam : raster
        Resampled, reprojected, and clipped parameter raster.
    """
    # open the FDR raster with rasterio and get the coordinate system, cell size, and bounding coordinates

    # open the input raster and get the same attributes as the FDR raster (+ first band dtype)

    # get the output dtype. If it's 8-bit, convert to 16-bit using a possibly outdated GDAL `outType = 'Byte'`

    # check if resampling or reprojection is required; if not, copy the metadata + raster as is

    # if things need to be changed, run a `gdalwarp` command via `subprocess.run(cmd, shell=True)`
    # placeholder only: the real dict needs the keys listed under "GDAL Warp Parameters" above
    warpParams = {'dict of': 'gdal warp parameters'}

    cmd = 'gdalwarp -overwrite -tr {xsize} {ysize} -t_srs {fdrcrs} -te {fdrXmin} {fdrYmin} {fdrXmax} {fdrYmax} \
    -co "PROFILE=GeoTIFF" -co "TILED=YES" -co "SPARSE_OK=TRUE" -co "COMPRESS=LZW" -co "ZLEVEL=9" -co "NUM_THREADS={cores}" \
     -co "BIGTIFF=IF_SAFER" -r {resampleMethod} -dstnodata {nodata} -ot {datatype} {inParam} {outParam}'.format(
                **warpParams)

    result = subprocess.run(cmd, shell=True)
    result.stdout
    # note that result.stdout is evaluated here but neither returned nor saved to a variable
    # (it is also None unless output is captured, e.g. with capture_output=True)
    return None
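
The `gdalwarp` call above could also be assembled as an argument list, which avoids `shell=True` and the escaped quotes around the proj4 string. This is a sketch under assumptions: `build_gdalwarp_cmd` is a hypothetical helper, the dictionary keys mirror the "GDAL Warp Parameters" listing, and all example values (file names, extent, `EPSG:5070`) are made up for illustration.

```python
def build_gdalwarp_cmd(p):
    """Build a gdalwarp argument list from a warpParams-style dict."""
    return [
        'gdalwarp', '-overwrite',
        '-tr', str(p['xsize']), str(p['ysize']),
        '-t_srs', p['fdrcrs'],
        '-te', str(p['fdrXmin']), str(p['fdrYmin']), str(p['fdrXmax']), str(p['fdrYmax']),
        '-co', 'PROFILE=GeoTIFF', '-co', 'TILED=YES', '-co', 'SPARSE_OK=TRUE',
        '-co', 'COMPRESS=LZW', '-co', 'ZLEVEL=9',
        '-co', 'NUM_THREADS={}'.format(p['cores']),
        '-co', 'BIGTIFF=IF_SAFER',
        '-r', p['resampleMethod'],
        '-dstnodata', str(p['nodata']),
        '-ot', p['datatype'],
        p['inParam'], p['outParam'],
    ]


# Hypothetical values, for illustration only.
warpParams = {
    'inParam': 'precip.tif', 'outParam': 'precip_resampled.tif', 'fdr': 'fdr.tif',
    'cores': '1', 'resampleMethod': 'bilinear', 'xsize': 30.0, 'ysize': 30.0,
    'fdrXmin': 0.0, 'fdrXmax': 3000.0, 'fdrYmin': 0.0, 'fdrYmax': 3000.0,
    'fdrcrs': 'EPSG:5070', 'nodata': -9999, 'datatype': 'Float32',
}
cmd = build_gdalwarp_cmd(warpParams)
print(' '.join(cmd))
# To execute (requires GDAL on PATH): subprocess.run(cmd, check=True)
```

With a list, each argument reaches `gdalwarp` verbatim, so a proj4 string containing spaces can be passed as a single element without extra quoting.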

## Convert a categorical raster into a set of binary rasters (raster "one-hot-encoding")

As it sounds: a categorical raster (e.g., land cover) with N unique values is converted into N binary rasters, where each cell is 1 (class present), 0 (class absent), or -1 (nodata). **Output raster paths are returned as a list**.

Main function is `cat2bin(inCat, outWorkspace, par=True, verbose=False)`, which just parallelizes `binarizeCat(...)`.

**Note:** Is a list really the best data structure to return binary rasters? Could a dictionary be better?

Also the multiprocessing is interesting and appears to be used in multiple places.

In [8]:
def binarizeCat(val, data, nodata, outWorkspace, baseName, ext, profile, verbose=False):
    """
    Turn a categorical raster (e.g. land cover type) into A SINGLE BINARY RASTER PER UNIQUE CATEGORY/val, 
    0 for areas where that class is not present, 1 for areas where that class is present, and -1 for
    regions of no data in the supplied raster. Used in :py:func:`cat2bin`.

    Parameters
    ----------
    data : np.array
        Numpy array of raster data to convert to binary.
    val : int
        Raster value to extract binary for from data.
    nodata : int or float
        Raster no data value.
    outWorkspace : str
        Path to folder to save binary output rasters to.
    baseName : str
        Base name for the output rasters.
    ext : str
        File extension for output rasters.
    profile : dict
        Rasterio metadata dictionary describing the properties used to create the output raster.
    verbose : bool (optional)
        Print output, defaults to False.

    Returns
    -------
    catRaster : str
        Filepath to the binary raster created."""
    # catData = data.copy()
    # catData[(data != val) & (data != nodata)] = 0
    # catData[data == val] = 1
    # catData[data == nodata] = -1  # Use -1 as no data value
    # catData = catData.astype('int8')

    # writes raster and returns its path
    return str


def cat2bin(inCat, outWorkspace, par=True, verbose=False) -> list:
    """
    TLDR: Basically applies binarizeCat() in parallel and handles metadata writing.

    Description -
    Turn a categorical raster (e.g. land cover type) into a set of binary rasters, one for each category in the
    supplied raster, zero for areas where that class is not present, 1 for areas where that class is present,
    and -1 for regions of no data in the supplied raster. Wrapper on :py:func:`binarizeCat`.

    Parameters
    ----------
    inCat : str
        Input categorical parameter raster.
    outWorkspace : str
        Workspace to save binary raster output files.
    par : bool (optional)
        Use parallel processing to generate binary rasters, defaults to True.
    verbose : bool (optional)
        Print output, defaults to False.

    Returns
    -------
    fileList : list
        List of filepaths to output files."""

    from functools import partial
    # Open raster with rasterio and copy metadata

    # use numpy to find unique raster values and drop nodata categories

    # use multiprocessing processPool for multi-process raster processing (if par=True)

    # use the binarizeCat() function to produce a list of binary rasters
    # fileList = pool.map(partial(binarizeCat, data=dat, nodata=nodata,
    # outWorkspace=outWorkspace, baseName=baseName, ext=ext, profile=profile), cats)

    # basically pool.map(partial(binarizeCat), cats) parallelizes val=cat for all N cats.
    return list
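
To make the dictionary question above concrete, here is a minimal in-memory sketch of the binarization logic that returns a dict keyed by category value. It is not the FCPGtools code: `binarize_categories` is a hypothetical name, it works on arrays rather than writing files, it runs serially, and the land cover values are invented.

```python
import numpy as np


def binarize_categories(data, nodata):
    """Return {category value: int8 array of 1 / 0 / -1} for each category in data."""
    data = np.asarray(data)
    # Unique values, minus the nodata value, are the categories to encode.
    cats = [v for v in np.unique(data).tolist() if v != nodata]
    out = {}
    for val in cats:
        cat = np.zeros(data.shape, dtype=np.int8)  # 0 where the class is absent
        cat[data == val] = 1                       # 1 where the class is present
        cat[data == nodata] = -1                   # -1 marks nodata
        out[val] = cat
    return out


# Hypothetical land cover grid with 255 as nodata.
landcover = np.array([[11, 21, 21], [255, 11, 42]])
binary = binarize_categories(landcover, nodata=255)
print(sorted(binary))
print(binary[21])
```

A dict keeps each output tied to its category value, so callers do not have to recover the category from a file name or list position. The same idea would apply to output paths instead of arrays.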

# Shapefile functions

## Get a geoDF with HUC4 level basins (larger) identified from up- and down- stream HUC12 level basins (subwatersheds)

In [None]:
def makePourBasins(wbd, fromHUC4, toHUC4, HUC12Key='HUC12', ToHUCKey='ToHUC') -> gpd.GeoDataFrame:
    """Make geodataframe of HUC12 basins flowing from fromHUC4 to toHUC4.

    Parameters
    ----------
    wbd : GeoDataframe
        HUC12-level geodataframe projected to the same coordinate reference system (CRS) as the flow accumulation (FAC) and flow direction (FDR) grids being used.
    fromHUC4 : str
        HUC4 string for the upstream basin (e.g., '1407').
    toHUC4 : str
        HUC4 string for the downstream basin (e.g., '1501').
    HUC12Key : str (optional)
        Column name for HUC codes to process down to HUC4 codes, defaults to 'HUC12'.
    ToHUCKey : str (optional)
        Column name for the column that indicates the downstream HUC for each row of the dataframe, defaults to 'ToHUC'.

    Returns
    -------
    pourBasins : GeoDataframe
        HUC12-level geodataframe of units that drain from fromHUC4 to toHUC4.
    """
    # NOTE: getHUC4() is literally just in:str -> out:str[:4], which pulls the HUC4 prefix out of the HUC12 ID in the wbd['HUC12'] column

    # wbd['HUC4'] = wbd[HUC12Key].map(getHUC4) - gets the HUC12 basin's HUC4 level basin membership
    # wbd['ToHUC4'] = wbd[ToHUCKey].map(getHUC4) - gets the output HUC12 basin's (i.e., 'toHUC') HUC4 level equivalency

    # returns a copy of geodataframe rows where HUC4 = param:fromHUC4 and toHUC4 = param:toHUC4
    return gpd.GeoDataFrame
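
The filtering described in the comments above can be tried on a plain pandas `DataFrame`. A `GeoDataFrame` is a `DataFrame` subclass, so the same column operations should carry over. The names `get_huc4` and `select_pour_basins` are hypothetical stand-ins, and the HUC12 codes are invented for the example.

```python
import pandas as pd


def get_huc4(huc):
    """Return the HUC4 prefix (first four characters) of a HUC code string."""
    return huc[:4]


def select_pour_basins(wbd, fromHUC4, toHUC4, HUC12Key='HUC12', ToHUCKey='ToHUC'):
    """Return rows whose HUC12 lies in fromHUC4 and drains to a HUC12 in toHUC4."""
    wbd = wbd.copy()  # avoid adding helper columns to the caller's frame
    wbd['HUC4'] = wbd[HUC12Key].map(get_huc4)
    wbd['ToHUC4'] = wbd[ToHUCKey].map(get_huc4)
    return wbd.loc[(wbd['HUC4'] == fromHUC4) & (wbd['ToHUC4'] == toHUC4)].copy()


# Made-up HUC12 codes: only the first row crosses from 1407 into 1501.
wbd = pd.DataFrame({
    'HUC12': ['140700061105', '140700061104', '150100010101'],
    'ToHUC': ['150100010101', '140700061105', '150100010102'],
})
print(select_pour_basins(wbd, '1407', '1501')['HUC12'].tolist())
```

Copying the frame first is a design choice of this sketch: it keeps the input free of the extra `HUC4` and `ToHUC4` columns.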

## Find pour points for each HUC4 basin using the basin geoDF, FAC, and FDR
**Note:** Pour points are the **outlet** of a basin, defined as the cell(s) with the maximum Flow Accumulation Raster (FAC) value.

In [None]:
def findPourPoints(pourBasins, upfacfl, upfdrfl, plotBasins=False):
    """Finds unique pour points between two HUC4s.

    Parameters
    ----------
    pourBasins : GeoDataframe
        GeoDataframe of the HUC12 basins that flow into the downstream HUC4. Used to clip the upstream FAC grid to
         identify pour points.
    upfacfl : str
        Path to the upstream flow accumulation grid.
    upfdrfl : str
        Path to the upstream tauDEM flow direction grid.
    plotBasins : bool (Optional)
        Boolean to make plots of upstream HUC12s and identified pour points. Defaults to False.

    Returns
    -------
    finalPoints : list
        List of tuples containing (x,y,w). These pour points have not been incremented downstream and can be used to
         query accumulated (but not FCPGed) upstream parameter grids for information to cascade down to the next
          hydrologic region / geospatial tile downstream.
    """

    # make an empty list to store pour points: pourPoints = []

    # iterate over each HUC12 basin in the geoDF that flows into the downstream HUC4 basin. --------------
    # get the HUC boundary and make a shape out of it using getFeatures() -> list of coordinates

    # apply a mask on the FAC raster using the HUC boundary coordinates -> np.array and affine transform

    # find the array indices of the maximum flow accumulation value (could be multiple, not georeferenced)

    # get coordinates of the max FAC points by transforming their np indices via the affine transform

    # zip the coordinates of all max FAC points / pour points
    # ------------------------------

    # with the list of xy pour points, verify that their downstream cell (via FindDownstreamCellTauDir()) IS NoData!
    # Note: this makes sure the pour point (i.e., max accumulation cell) is at the edge of the basin and not some pit.

    # if no pour point is found with a NoData cell adjacent -> just append the pour point as is? (seems poorly handled...)

    # get unique pour points and return in a list of tuples finalPoints = [(x, y, w), ...]

    return list  # of (x, y, w) points where w = FAC value
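
The core of the steps outlined above (locate the maximum FAC cell, convert its index to coordinates, and check that it drains off the grid) can be sketched with NumPy alone. Everything here is an assumption made for illustration: the helper names, the six-number transform tuple standing in for a rasterio affine transform, the use of cell centers, the tiny grids, and the TauDEM offsets (1 = east, counter-clockwise). It does not reproduce FCPGtools' masking or `FindDownstreamCellTauDir()`.

```python
import numpy as np

# Assumed TauDEM D8 code -> (row offset, col offset); 1 = E, counter-clockwise to 8 = SE.
TAUDEM_OFFSETS = {1: (0, 1), 2: (-1, 1), 3: (-1, 0), 4: (-1, -1),
                  5: (0, -1), 6: (1, -1), 7: (1, 0), 8: (1, 1)}


def max_fac_cells(fac, nodata):
    """Return (row, col) indices of every cell equal to the maximum valid FAC value."""
    if not (fac != nodata).any():
        return []  # nothing but nodata
    valid = np.where(fac == nodata, -np.inf, fac.astype(float))
    return [(int(r), int(c)) for r, c in np.argwhere(valid == valid.max())]


def cell_center_xy(row, col, transform):
    """Map an array index to cell-center coordinates.

    transform = (a, b, c, d, e, f): x = a*col + b*row + c, y = d*col + e*row + f.
    """
    a, b, c, d, e, f = transform
    return (a * (col + 0.5) + b * (row + 0.5) + c,
            d * (col + 0.5) + e * (row + 0.5) + f)


def drains_off_grid(row, col, fdr, fdr_nodata):
    """True if the downstream cell is outside the array or nodata in the FDR."""
    d_row, d_col = TAUDEM_OFFSETS[int(fdr[row, col])]  # expects a valid code 1..8
    r, c = row + d_row, col + d_col
    inside = 0 <= r < fdr.shape[0] and 0 <= c < fdr.shape[1]
    return (not inside) or bool(fdr[r, c] == fdr_nodata)


# Hypothetical 3x3 grids with -1 as nodata and 30 m cells.
fac = np.array([[1, 2, -1], [1, 4, 9], [1, 1, -1]])
fdr = np.array([[1, 8, -1], [1, 1, 1], [2, 2, -1]])
transform = (30.0, 0.0, 1000.0, 0.0, -30.0, 2000.0)

points = []
for row, col in max_fac_cells(fac, nodata=-1):
    if drains_off_grid(row, col, fdr, fdr_nodata=-1):
        x, y = cell_center_xy(row, col, transform)
        points.append((x, y, int(fac[row, col])))
print(points)
```

In this sketch a maximum cell that does not drain off the grid is simply skipped, which is one alternative to appending it as is.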