
# Getting Started with the WaterTAP Parameter Sweep Tool

<img src="assets_parameter_sweep_demo/watertap-logo.png" alt="NAWI logo" width="200" align="right"/>

*NAWI Analysis Team*\
*Kinshuk Panda, Ben Knueven, Alexander Dudchenko*

Other contributors to the tool:\
*Ethan Young, Jeffery Allen, Samuel Helman*


*10/12/2023*


Please cite this notebook as:

```bibtex
@misc{howtouse_parameter_sweep,
    title = "Getting Started with the WaterTAP Parameter Sweep Tool (Citation Only)",
    keywords = "eagle, IDAES, NAWI, parallel programming, parameter sweep, pyomo, sampling, water treatment, WaterTAP",
    author = "Kinshuk Panda and Ben Knueven and Alexander Dudchenko and Ethan Young and Jeffrey Allen and Samuel Helman",
    year = "2023",
    language = "American English",
    series = "Presented at the Advanced Process Systems Engineering Stakeholder Summit, 11-12 October 2023, Falls Church, Virginia",
    type = "Other",
}
```

# Outline

This demo will briefly describe and demonstrate how the parameter sweep tool in WaterTAP can be used for various technoeconomic analyses.

1. Introduction to parameter sweep
2. Software demonstration
3. Advanced features

## Introduction

* **Parameter sweep** is a tool for performing optimization, sensitivity analyses, and uncertainty quantification.
* Experiments are run repeatedly, each time with a different set of input parameters, to see how the inputs affect the output quantities of interest.
* **Output metrics** can include levelized cost of water (LCOW), component cost, specific energy consumption, energy efficiency, water recovery rate, gained output ratio, and bulk temperature difference, among others.
* **Input parameters** are technology-dependent. Examples include feed water salinity, membrane permeability, membrane area, pumping efficiency, thermal conductivity, operating temperature, number of stages, labor, and capital expenditure.



## Main Features of the Parameter Sweep Tool

* **Modular** : It can work with any Pyomo model, i.e., any WaterTAP flowsheet.
* **Scalable** : Analyses can be run in parallel on a personal computer, an HPC system, or the cloud. Output is stored in HDF5.
* **Flexible** : It can be customized and combined to create complex analysis workflows.

## Types of Parameter Sweeps

* In its current form, the tool lets a user run three types of parameter sweeps:
    - **Simple parameter sweep**
    - Recursive parameter sweep
    - Differential parameter sweep
* Samples for the parameter sweep can be generated from a probability distribution or an *n*-dimensional Euclidean space, where *n* is the number of parameters in the sweep. Latin hypercube sampling is also enabled.
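The three sampling styles mentioned above can be illustrated with a minimal, standard-library-only sketch. It does not use the WaterTAP sampling classes; the helper names (`linear_samples`, `grid_samples`, `uniform_samples`, `latin_hypercube_samples`) and the bounds are hypothetical and chosen only for illustration.

```python
import itertools
import random


def linear_samples(lower, upper, num):
    """Return `num` evenly spaced values from lower to upper, inclusive."""
    if num == 1:
        return [lower]
    step = (upper - lower) / (num - 1)
    return [lower + i * step for i in range(num)]


def grid_samples(bounds, num_per_dim):
    """Cartesian product of evenly spaced values: num_per_dim ** n points."""
    axes = [linear_samples(lo, hi, num_per_dim) for lo, hi in bounds]
    return list(itertools.product(*axes))


def uniform_samples(bounds, num, rng):
    """Draw `num` independent points from uniform distributions."""
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(num)]


def latin_hypercube_samples(bounds, num, rng):
    """Place exactly one point in each equal-width stratum of every dimension."""
    columns = []
    for lo, hi in bounds:
        width = (hi - lo) / num
        # One draw inside each stratum, then shuffle to decouple the dimensions.
        column = [lo + (i + rng.random()) * width for i in range(num)]
        rng.shuffle(column)
        columns.append(column)
    return list(zip(*columns))


# Hypothetical bounds for two swept parameters (n = 2).
bounds = [(0.0, 1.0), (10.0, 20.0)]
rng = random.Random(0)

grid = grid_samples(bounds, num_per_dim=3)        # 3 ** 2 = 9 grid points
mc = uniform_samples(bounds, num=5, rng=rng)      # 5 random points
lhs = latin_hypercube_samples(bounds, num=5, rng=rng)  # 5 stratified points
```

The grid grows exponentially with the number of parameters, whereas the random and Latin hypercube sample counts are chosen directly by the user.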

### Simple Parameter Sweep

![ParameterSweep](assets_parameter_sweep_demo/Parameter_Sweep_Flowchart.png)
*Generate samples from a distribution or a Euclidean space and solve flowsheets with those input values.*

## Primary Requirements for Running a Parameter Sweep

* Function to construct the Pyomo model of the flowsheet
* Function to create the sweep parameters
* Initialization and optimization functions (the reinitialization function is deprecated)
* Parallel computing information
* Output file information
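To show how these pieces fit together, here is a toy sketch in plain Python. It is not the WaterTAP API: a dictionary stands in for the Pyomo model, the cost formula is made up, and every `toy_*` name is hypothetical. Only the division of labor between the functions mirrors the list above.

```python
def toy_build_model():
    """Stand-in for a flowsheet builder: a dict plays the role of the model."""
    return {"pressure": 50.0, "area": 100.0, "cost": None}


def toy_build_sweep_params(model, num_samples=3):
    """Map each swept input to the values it should take (hypothetical bounds)."""
    lower, upper = 40.0, 60.0
    step = (upper - lower) / (num_samples - 1)
    return {"pressure": [lower + i * step for i in range(num_samples)]}


def toy_build_outputs(model):
    """Name the quantities to record after every solve."""
    return {"cost": lambda m: m["cost"]}


def toy_optimize(model):
    """Stand-in for the solve step: a made-up cost formula, not real physics."""
    model["cost"] = 0.01 * model["pressure"] + 50.0 / model["area"]


def toy_simple_sweep(build_model, build_sweep_params, build_outputs, optimize):
    model = build_model()
    sweep_params = build_sweep_params(model)
    outputs = build_outputs(model)
    results = []
    # Each parameter is varied on its own; with one swept parameter, as here,
    # this loop is the whole sweep.
    for name, values in sweep_params.items():
        for value in values:
            model[name] = value   # set the swept input
            optimize(model)       # re-solve with the new input
            row = {name: value}
            row.update({key: get(model) for key, get in outputs.items()})
            results.append(row)
    return results


toy_results = toy_simple_sweep(
    toy_build_model, toy_build_sweep_params, toy_build_outputs, toy_optimize
)
for row in toy_results:
    print(row)
```

Passing the builder functions, rather than a finished model, is what lets each parallel worker construct its own copy of the model.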

## Test Flowsheet

* We use a simple **RO system with an energy recovery device** to demonstrate the capabilities of the parameter sweep tool. 
* It comprises a high-pressure pump connected to a steady-state, zero-dimensional RO process model. A turbine-type isothermal energy recovery device model is connected downstream of the RO model.
* The feed water is an NaCl solution and is modeled using a property package within WaterTAP.
* Outputs that can be measured from this flowsheet include LCOW, product flow rate and concentration, volumetric recovery, water recovery, and specific energy consumption.



<div>
    <img src="assets_parameter_sweep_demo/RO_ERD_flowsheet.png" alt="RO with energy recovery device" width="60%" height="auto" align="center"/>
</div>

In [None]:
# Make the necessary imports
from pprint import pprint
from IPython import get_ipython
from watertap.core.solvers import get_solver
from watertap.flowsheets.RO_with_energy_recovery.RO_with_energy_recovery import (
    optimize,
)
from watertap.flowsheets.RO_with_energy_recovery.monte_carlo_sampling_RO_ERD import (
    build_model,
    build_outputs,
)
from parameter_sweep import (
    LinearSample,
    ParameterSweep,
)
from assets_parameter_sweep_demo.parameter_sweep_demo_script import (
    build_sweep_params,
    create_recursive_parameter_sweep_object,
    create_differential_parameter_sweep_object,
)

In [None]:
def create_parameter_sweep_object(num_samples, num_procs):

    solver = get_solver()
    kwargs_dict = {
        # Arguments being used in the demo
        "h5_results_file_name": "ps_demo.h5", # Resulting output file name
        "build_model": build_model, # Function that builds the flowsheet model
        "build_model_kwargs": dict(read_model_defauls_from_file=False,
                                   defaults_fname="default_configuration.yaml"),
        "build_sweep_params": build_sweep_params, # Function for building sweep param dictionary
        "build_sweep_params_kwargs": dict(num_samples=num_samples,
                                          scenario="A_comp_vs_B_comp_vs_LCOW"),
        "build_outputs": build_outputs, # Function that builds the outputs to save
        "build_outputs_kwargs": {},
        "optimize_function": optimize, # Function that optimizes the flowsheet
        "optimize_kwargs": {"solver": solver, "check_termination": False},
        "initialize_function": None,
        "initialize_kwargs": {},
        "parallel_back_end": "MultiProcessing", # ConcurrentFutures, MPI, Ray available
        "number_of_subprocesses": num_procs,
        
        # Additional useful keyword arguments
        "csv_results_file_name": None, # For storing results as CSV
        "h5_parent_group_name": None,  # Useful for loop tool
        "update_sweep_params_before_init": False,
        "initialize_before_sweep": False,
        "reinitialize_function": None,
        "reinitialize_kwargs": {},
        "reinitialize_before_sweep": False,
        "probe_function": None,
        
        # Post-processing arguments
        "interpolate_nan_outputs": False,
        
        # Advanced Users
        "debugging_data_dir": None,
        "log_model_states": False,
        "custom_do_param_sweep": None, # Advanced users only!
        "custom_do_param_sweep_kwargs": {},
        
        # GUI-related
        "publish_progress": False, # Compatibility with WaterTAP GUI
        "publish_address": "http://localhost:8888",
    }
    ps = ParameterSweep(**kwargs_dict)
    return ps, kwargs_dict

In [None]:
num_samples = 4
num_procs = 4
ps, kwargs_dict = create_parameter_sweep_object(num_samples, num_procs)

In [None]:
results_array, results_dict = ps.parameter_sweep(
    kwargs_dict["build_model"],
    kwargs_dict["build_sweep_params"],
    build_outputs=kwargs_dict["build_outputs"],
    build_outputs_kwargs=kwargs_dict["build_outputs_kwargs"],
    num_samples=num_samples,
    seed=None,
    build_model_kwargs=kwargs_dict["build_model_kwargs"],
    build_sweep_params_kwargs=kwargs_dict["build_sweep_params_kwargs"],
)

In [None]:
pprint(results_dict)

In [None]:
pprint(results_array)

## Plotting Results

The resulting H5 files can then be processed to generate plots, e.g., 2D scatter plots, map plots, etc.

<table><tr>
    <td> <img src="assets_parameter_sweep_demo/scatter_LCOW_vs_Acomp.jpg" alt="LCOW vs water permeability" width="100%"/> </td>
    <td> <img src="assets_parameter_sweep_demo/maps_LCOW_recovery_NaCl_loading.jpg" alt="LCOW vs water recovery and NaCl concentration" width="100%"/> </td>
</tr></table>


# Advanced Usage and Features

## Types of Parameter Sweeps

* In its current form, the tool lets a user run three types of parameter sweeps:
    - Simple parameter sweep
    - **Recursive parameter sweep**
    - Differential parameter sweep
* Samples for the parameter sweep can be generated from a probability distribution or an *n*-dimensional Euclidean space, where *n* is the number of parameters in the sweep. Latin hypercube sampling is also enabled.

### Recursive Parameter Sweep


<div>
    <img src="assets_parameter_sweep_demo/Recursive_Parameter_Sweep_flowchart.png" alt="RecursiveParameterSweep" width="100%" align="center"/>
</div>

*Run the simple parameter sweep in recursion if some runs fail to ensure that a user-specified number of sample results are generated. This involves resampling the input parameter space to compensate for the failed runs.*
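The resampling idea can be sketched in a few lines of plain Python. This is a conceptual illustration only: `toy_solve`, its failure rule, and the choice to draw exactly as many new samples as are still missing are assumptions, and the actual resampling strategy inside WaterTAP may differ.

```python
import random


def toy_solve(x):
    """Stand-in for a flowsheet solve; it fails (returns None) when x > 0.7."""
    return None if x > 0.7 else 2.0 * x


def toy_recursive_sweep(num_required, rng, max_rounds=100):
    """Resample until `num_required` successful results are collected."""
    successes = []
    rounds = 0
    while len(successes) < num_required and rounds < max_rounds:
        rounds += 1
        missing = num_required - len(successes)
        # Draw only as many new samples as there are results still missing.
        for x in [rng.random() for _ in range(missing)]:
            y = toy_solve(x)
            if y is not None:
                successes.append((x, y))
    return successes, rounds


toy_rps_results, toy_rps_rounds = toy_recursive_sweep(
    num_required=20, rng=random.Random(1)
)
print(len(toy_rps_results), "successful samples after", toy_rps_rounds, "rounds")
```

The `max_rounds` guard keeps the loop from running forever if every sample in the space fails.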

In [None]:
num_samples = 20
num_procs = 2
rps, rps_kwargs_dict = create_recursive_parameter_sweep_object(num_samples, num_procs)
rps_results_array, rps_results_dict = rps.parameter_sweep(
    rps_kwargs_dict["build_model"],
    rps_kwargs_dict["build_sweep_params"],
    build_outputs=rps_kwargs_dict["build_outputs"],
    build_outputs_kwargs=rps_kwargs_dict["build_outputs_kwargs"],
    num_samples=num_samples,
    seed=None,
    build_model_kwargs=rps_kwargs_dict["build_model_kwargs"],
    build_sweep_params_kwargs=rps_kwargs_dict["build_sweep_params_kwargs"],
)

In [None]:
pprint(rps_results_array)

In [None]:
pprint(rps_results_dict)

## Types of Parameter Sweeps

* In its current form, the tool lets a user run three types of parameter sweeps:
    - Simple parameter sweep
    - Recursive parameter sweep
    - **Differential parameter sweep**
* Samples for the parameter sweep can be generated from a probability distribution or an *n*-dimensional Euclidean space, where *n* is the number of parameters in the sweep. Latin hypercube sampling is also enabled.

### Differential Parameter Sweep

<div>
    <img src="assets_parameter_sweep_demo/Differential_Parameter_Sweep_Flowchart.png" alt="DifferentialParameterSweep"  width="90%" align="center"/>
</div>

* The differential parameter sweep gathers sensitivity data when the input parameter space has more than one dimension.
* It comprises two types of sweeps: an outer "nominal" sweep and an inner "differential" sweep.
* The nominal sweep is a simple parameter sweep over the sampled space.
* The differential sweep is a simple parameter sweep that occurs at every nominal value, in which one of the sweep parameters is perturbed while the others are kept fixed at their nominal values.
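The nominal/differential structure described above can be sketched with a toy response function. This is not the `DifferentialParameterSweep` API: the function `toy_output`, the relative step size, and the forward finite-difference estimate are all assumptions made for illustration.

```python
def toy_output(params):
    """Made-up response surface standing in for a solved flowsheet."""
    return params["a"] ** 2 + 3.0 * params["b"]


def toy_differential_sweep(nominal_points, rel_step=0.01):
    rows = []
    for nominal in nominal_points:          # outer "nominal" sweep
        base = toy_output(nominal)
        row = {"nominal": dict(nominal), "output": base, "sensitivity": {}}
        for name in nominal:                # inner "differential" sweep
            perturbed = dict(nominal)       # others stay at nominal values
            delta = rel_step * nominal[name]
            perturbed[name] = nominal[name] + delta
            # Forward finite difference of the output w.r.t. this parameter.
            row["sensitivity"][name] = (toy_output(perturbed) - base) / delta
        rows.append(row)
    return rows


toy_nominal_points = [{"a": 1.0, "b": 2.0}, {"a": 2.0, "b": 4.0}]
toy_dps_rows = toy_differential_sweep(toy_nominal_points)
for row in toy_dps_rows:
    print(row)
```

Each nominal point costs one extra solve per swept parameter, so the total number of solves grows with both the sample count and the dimension of the parameter space.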

In [None]:
num_samples = 2
num_procs = 1

model, dps, dkwargs_dict = create_differential_parameter_sweep_object(num_samples, num_procs)
dps_results_array, dps_results_dict = dps.parameter_sweep(
    dkwargs_dict["build_model"],
    dkwargs_dict["build_sweep_params"],
    build_outputs=dkwargs_dict["build_outputs"],
    build_outputs_kwargs=dkwargs_dict["build_outputs_kwargs"],
    num_samples=num_samples,
    seed=None,
    build_model_kwargs=dkwargs_dict["build_model_kwargs"],
    build_sweep_params_kwargs=dkwargs_dict["build_sweep_params_kwargs"],
)

In [None]:
pprint(dps_results_array)

In [None]:
pprint(dps_results_dict)

## Loop Tool

* The parameter sweep tool can be invoked from a looping tool, also within WaterTAP, that allows a user to iteratively run different design configurations:
    - Different build options
    - Different flowsheet initialization options
    - Different solve constraints
    - e.g., different pressure exchanger types in RO
* YAML-based initial setup.
* The loop tool uses HDF5 format to systematically store outputs from the various parameter sweeps.
* The loop tool, in combination with differential parameter sweep, can be used to conduct high-impact stochastic value of innovation analysis (see [Dudchenko et al.](https://doi.org/10.1073/pnas.2022196118)). 

## Parallel Manager

The parallel manager allows the end-user to select which parallel backend to run their parameter sweep with. Currently supported parallel backends include:

* Message Passing Interface (MPI)
* Python multiprocessing
* Python concurrent futures
* Ray Core
* Serial execution

The parallel manager provides a unified API to use the parallel backends.

### Motivation

* Most users will run parameter sweeps on a shared-memory system and are familiar with Python multiprocessing and concurrent futures.
    - Use concurrent futures when Pyomo model initialization and reinitialization are trivial.
    - Use Python multiprocessing when initialization/reinitialization is computationally intensive.
* MPI enables distributed parallel computing on an HPC system.
* Ray is an MPI alternative for distributed parallel computing that has a simpler API.

Supporting multiple parallel backends necessitates an abstraction layer, so that the onus of supporting these parallel paradigms does not fall on the average WaterTAP developer. *Parallel Manager is that abstraction layer.*
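The idea of an abstraction layer can be illustrated with a minimal sketch. It is not the WaterTAP parallel manager: the class names, the factory function `create_toy_backend`, and the backend keys are hypothetical. A thread pool is used so the sketch runs in any environment; a process pool would be the more typical choice for CPU-bound solves.

```python
from concurrent.futures import ThreadPoolExecutor


class ToySerialBackend:
    """Run every task in the calling process, one after another."""

    def map(self, func, items):
        return [func(item) for item in items]


class ToyConcurrentFuturesBackend:
    """Run tasks through a concurrent.futures executor."""

    def __init__(self, num_workers):
        self.num_workers = num_workers

    def map(self, func, items):
        with ThreadPoolExecutor(max_workers=self.num_workers) as pool:
            # Executor.map preserves the input order of the results.
            return list(pool.map(func, items))


def create_toy_backend(name, num_workers=1):
    """Hypothetical factory: callers pick a backend by name only."""
    if name == "Serial":
        return ToySerialBackend()
    if name == "ConcurrentFutures":
        return ToyConcurrentFuturesBackend(num_workers)
    raise ValueError(f"Unknown backend: {name}")


def toy_task(x):
    """Stand-in for solving one sample."""
    return x * x


samples = list(range(8))
serial_out = create_toy_backend("Serial").map(toy_task, samples)
parallel_out = create_toy_backend("ConcurrentFutures", num_workers=4).map(
    toy_task, samples
)
print(serial_out == parallel_out)
```

Because both backends expose the same `map` method, the code that runs the sweep does not need to know which one is in use.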

### Currently Supported Parallel Features

* Gather, all gather - Gather values on one specific process or on all processes
* Scatter - Scatter values from one process to all other processes
* Broadcast - Broadcast a data structure to all processes
* All reduce - Reduce values and distribute to all processes
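The semantics of these collective operations can be sketched without any parallel library by letting a list index play the role of the process rank. This is a single-process simulation for illustration only; the function names are hypothetical, and the chunked `toy_scatter` reflects one common way of dividing samples among processes rather than a confirmed detail of the tool.

```python
def toy_scatter(values, num_procs):
    """Split the root's values into one contiguous chunk per process."""
    base, extra = divmod(len(values), num_procs)
    chunks, start = [], 0
    for rank in range(num_procs):
        size = base + (1 if rank < extra else 0)
        chunks.append(values[start:start + size])
        start += size
    return chunks


def toy_broadcast(obj, num_procs):
    """Every process ends up with the same object."""
    return [obj for _ in range(num_procs)]


def toy_gather(local_values):
    """The root collects one value per process, ordered by rank."""
    return list(local_values)


def toy_all_gather(local_values):
    """Every process receives the full rank-ordered list."""
    return [list(local_values) for _ in local_values]


def toy_all_reduce(local_values):
    """Sum the local values and hand the total to every process."""
    total = sum(local_values)
    return [total for _ in local_values]


num_procs = 4
chunks = toy_scatter(list(range(10)), num_procs)    # work split by rank
local_sums = [sum(chunk) for chunk in chunks]       # each rank's local work
on_root = toy_gather(local_sums)
everywhere = toy_all_reduce(local_sums)
print(chunks, on_root, everywhere)
```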

### Benchmarking Results

Next, we look at benchmarking results from running the parameter sweep tool.

#### Python Concurrent Futures

<table><tr>
    <td> 
        <img src="assets_parameter_sweep_demo/RO-ERD_Mac_2500_time.png" alt="Strong Scaling for RO" width="100%"/>
        <center><em>Compute Time</em></center>
    </td>
    <td> 
        <img src="assets_parameter_sweep_demo/RO-ERD_Mac_2500_speedup.png" alt="RO-ERD Speedup" width="100%"/>
        <center><em>Speedup</em></center>
    </td>
</tr></table>
<center><em>Strong Scaling Results for RO with Energy Recovery Device on a Mac</em></center>
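For readers unfamiliar with the terms, strong scaling keeps the problem size fixed while the number of processes grows. Speedup on *p* processes is the one-process time divided by the *p*-process time, and parallel efficiency is speedup divided by *p*. The sketch below computes both; the timings are made-up placeholders for illustration and are not the measurements plotted above.

```python
def strong_scaling(timings):
    """Compute speedup and efficiency from {num_procs: wall_time_seconds}."""
    t1 = timings[1]  # reference time on a single process
    table = {}
    for procs, wall_time in sorted(timings.items()):
        speedup = t1 / wall_time
        table[procs] = {"speedup": speedup, "efficiency": speedup / procs}
    return table


# Hypothetical timings (seconds) for a fixed number of samples.
hypothetical_timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}
scaling_table = strong_scaling(hypothetical_timings)
for procs, row in scaling_table.items():
    print(procs, round(row["speedup"], 2), round(row["efficiency"], 2))
```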

#### MPI on NREL's Eagle HPC

<table><tr>
    <td> 
        <img src="assets_parameter_sweep_demo/RO-ERD_Eagle_Strong_Scaling_100K.png" alt="Strong Scaling for RO" width="100%"/>
        <center><em>RO with Energy Recovery Device</em></center>
    </td>
    <td> 
        <img src="assets_parameter_sweep_demo/LSRRO_Eagle_strong_10K.png" alt="LSRRO Strong Scaling" width="100%"/>
        <center><em>LSRRO</em></center>
    </td>
</tr></table>
<center><em>Strong Scaling Results</em></center>

# Future Work

* Closer integration with the WaterTAP GUI.
* Integration with plotting tools.

# Useful Documentation Links

* [How to explore a model with parameter sweep](https://watertap.readthedocs.io/en/latest/how_to_guides/how_to_use_parameter_sweep.html#how-to-explore-a-model-with-parameter-sweep)
* [Monte Carlo testing with the Parameter Sweep](https://watertap.readthedocs.io/en/latest/how_to_guides/how_to_use_parameter_sweep_monte_carlo.html#monte-carlo-testing-with-the-parameter-sweep)
* [How to Run Differential Parameter Sweep](https://watertap.readthedocs.io/en/latest/how_to_guides/how_to_run_differential_parameter_sweep.html#how-to-run-differential-parameter-sweep)
* [How to use loopTool to explore flowsheets](https://watertap.readthedocs.io/en/latest/how_to_guides/how_to_use_loopTool_to_explore_flowsheets.html#how-to-use-looptool-to-explore-flowsheets)
* [MPI Parallel Usage](https://watertap.readthedocs.io/en/latest/technical_reference/tools/parameter_sweep.html#parallel-usage)

# Appendix

## Example Slurm Job Submission Script on NREL's Kestrel

### Slurm Batch File

```bash
#!/bin/bash 
#SBATCH --nodes=1  # Run the tasks on the same node
#SBATCH --ntasks-per-node=104 # Tasks per node to be run
#SBATCH --time=1:00:00   # Required, estimate 5 minutes
#SBATCH --account=hpcapps # Required
#SBATCH --partition=debug
#SBATCH --mail-user=kinshuk.panda@nrel.gov
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-type=REQUEUE

cd /home/kpanda/NAWI/watertap/tutorials/parameter_sweep_demo
module purge
module load craype-x86-spr
module load gcc/13.1.0 anaconda3/2022.05 netlib-lapack/3.11.0-gcc
conda activate /projects/hpcapps/kpanda/conda-envs/watertap

mkdir -p outputs
N_SAMPLES=5000
NPROCS=100

python parameter_sweep_demo_script.py $N_SAMPLES $NPROCS > outputs/fout_mp_${N_SAMPLES}_${NPROCS} 2> outputs/errout_mp_${N_SAMPLES}_${NPROCS}
```


### Parameter Sweep Script

In [None]:
from IPython.display import Code

Code(filename='assets_parameter_sweep_demo/parameter_sweep_demo_script.py', language="python")