# efficiencymap_pipeline

**Short Introduction**  
The `efficiencymap_pipeline` function is designed to **analyze and compare vehicle efficiency maps** derived from both on-road driving data and **dynamometer measurements**. It merges, filters, and normalizes data across multiple files, optionally **smooths** the results, and can **cluster dynamometer measurements** for **comparisons** against on-road points. Finally, it can **visualize efficiency maps** via scatter and contour plots, showing differences in efficiency between the two data sources. Users can also enable higher-fidelity interpolation to generate **finely quantized** efficiency contours.

---

## Parameters

- **data_cache** (`dict` or `None`)  
  A dictionary containing pre-loaded data keyed by filenames. If `None`, raw data is read from disk.

- **files** (`list` of `str`)  
  List of file paths to load and combine into a single efficiency dataset. Provide either `data_cache` or `files`.

- **gear** (`str`)  
  Identifies which gear (e.g., `'1'`, `'2'`, or `'vw'`) should be processed; also determines which files and full-load curves are used.

- **efficiency_limit_lower**, **efficiency_limit_upper** (`float`, defaults = 0, 1)  
  Range of efficiency values retained for analysis.

- **soc_limit_lower**, **soc_limit_upper** (`float`, defaults = 0, 1)  
  Range of State of Charge (SoC) values retained for analysis.

- **remove_neutral_gear** (`bool`, default=True)  
  If `True`, removes any rows flagged as “neutral” gear.

- **smoothing_kwargs** (`dict`, optional)  
  Arguments controlling time-domain smoothing of signals (used if `columns_to_smooth` is provided). Options:
  - `None`
  - `{'filter_type': 'moving_average', 'window_size': 3}`
  - `{'filter_type': 'exponential_moving_average', 'alpha': 0.4}`
  - `{'filter_type': 'savitzky_golay', 'window_size': 5, 'polyorder': 2}`
  - `{'filter_type': 'lowpass',  'cutoff_frequency': 0.1, 'order': 2}`

- **columns_to_smooth** (`list`, optional)  
  Specifies which columns (e.g., `'engine_rpm'`, `'vehicle_speed'`) to time-smooth.

- **substract_auxiliary_power** (`bool`, default=True)  
  If `True`, subtracts auxiliary power (e.g., `dcdc_power_hv`) to refine electrical power calculations, when available.

- **which_full_load_curve** (`str`, default='driving_data')  
  Determines how the full-load curve (boundary of max torque vs. rpm) is obtained:
  - `'driving_data'`  
  - `'dynamo_data'`  
  - `'adjusted'`  
  - `'overlap'`  

- **twoD_smoothing_kwargs** (`dict` or `None`)  
  When not `None`, triggers 2D smoothing (in torque vs. rpm) of efficiency values.
  **Example `twoD_smoothing_kwargs` configurations for `smooth_2d_data`:**

1. *Inverse Distance Weighting (IDW)*  
    ```python
    {
        "method": "idw",
        "power": 2,                 # Power parameter for IDW (default=2)
        "num_neighbors": 10,        # Number of nearest neighbors (default=10)
        "outlier_detection": True,  # Whether to detect outliers among neighbors
        "threshold_multiplier": 2   # Threshold in stdev units to exclude outliers
    }
    ```
2. *Gaussian Filter on a Grid*
    ```python
    {
        "method": "gaussian_filter",
        "sigma": 2,       # Std dev for Gaussian kernel
        "grid_size": 100  # Interpolation grid size in each dimension
    }
    ```
3. *Griddata Interpolation*
    ```python
    {
        "method": "griddata",
        "interp_method": "cubic"  # Could also be 'linear' or 'nearest'
    }
    ```
4. *Regression-Based Smoothing*
    ```python
    {
      "method": "regression",
      "model": "random_forest",   # Internally replaced by RandomForestRegressor
      "model_params": {
          "n_estimators": 50,     # Example RF hyperparameter
          "max_depth": 10
      }
    }
    ```
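
To make the `idw` option more concrete, here is a minimal pure-Python sketch of inverse distance weighting over (rpm, torque) points. It is not the actual `smooth_2d_data` implementation: the function name `idw_smooth` is hypothetical, outlier detection is left out, and it assumes both axes have already been scaled to comparable ranges.

```python
import math


def idw_smooth(points, values, power=2, num_neighbors=10):
    """Hypothetical helper: replace each value by the inverse-distance-weighted
    mean of its nearest neighbours (the point itself is excluded)."""
    smoothed = []
    for i, (x_i, y_i) in enumerate(points):
        # Distance from point i to every other point, paired with that point's value.
        neighbours = [
            (math.hypot(x_i - x_j, y_i - y_j), values[j])
            for j, (x_j, y_j) in enumerate(points)
            if j != i
        ]
        neighbours.sort(key=lambda pair: pair[0])
        nearest = neighbours[:num_neighbors]
        if not nearest:
            # A single isolated point has nothing to average with.
            smoothed.append(values[i])
            continue
        # Weight = 1 / distance**power; the epsilon guards against duplicate points.
        weights = [1.0 / (d ** power + 1e-12) for d, _ in nearest]
        weighted_sum = sum(w * v for w, (_, v) in zip(weights, nearest))
        smoothed.append(weighted_sum / sum(weights))
    return smoothed


# Four consistent corner points and one low outlier in the middle.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
values = [0.9, 0.9, 0.9, 0.9, 0.5]
smoothed = idw_smooth(points, values, power=2, num_neighbors=4)

# Analogue of the reported `mean_abs_change`: how much smoothing moved the values.
mean_abs_change = sum(abs(s - v) for s, v in zip(smoothed, values)) / len(values)
```

In this sketch a larger `power` makes close neighbours dominate, while a larger `num_neighbors` averages over a wider region of the map.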
- **high_fidelity_interpolation** (`bool`, default=False)  
  If `True`, performs extra quantization & interpolation for higher-resolution efficiency maps.

- **n_quantize_bins** (`int`, default=15)  
  Binning resolution for quantized efficiency (used if `high_fidelity_interpolation` is `True`).

- **at_middle_of_bin** (`bool`, default=False)  
  Whether to use bin centers (instead of bin edges) as the representative values when quantizing torque and rpm.

- **n_interpolation_bins** (`int`, default=15)  
  Number of bins used for final interpolation if `high_fidelity_interpolation` is `True`.

- **global_offset** (`float`, default=0)  
  Constant added to all efficiency values before analysis (useful for compensating a known systematic shift).

- **generate_plots** (`bool`, default=False)  
  If `True`, generates scatter plots, contour plots, and cluster visuals.

- **verbose** (`bool`, default=False)  
  If `True`, logs detailed progress and intermediate outcomes.

- **filename_prefix** (`str`, default='efficiency_map')  
  Prefix for saved plots and data files.

- **figsize_1**, **figsize_2**, **figsize_3** (`tuple`, default=(10, 8))  
  Figure sizes for different plot types.

- **marker_size** (`int`, default=10)  
  Size of scatter plot markers.
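
As an illustration of how the `smoothing_kwargs` options might act on a single signal, the sketch below implements two of the listed filter types in plain Python. The helper `smooth_signal` is hypothetical and only mirrors the documented dictionary keys; it assumes a trailing moving-average window and the standard exponential moving average recursion, which may differ from what the pipeline does internally.

```python
def smooth_signal(samples, smoothing_kwargs=None):
    """Hypothetical helper: apply one of the documented filter types to a list of samples."""
    if smoothing_kwargs is None:
        return list(samples)

    filter_type = smoothing_kwargs["filter_type"]

    if filter_type == "moving_average":
        window = smoothing_kwargs["window_size"]
        out = []
        for i in range(len(samples)):
            # Trailing window, shortened at the start of the signal.
            chunk = samples[max(0, i - window + 1): i + 1]
            out.append(sum(chunk) / len(chunk))
        return out

    if filter_type == "exponential_moving_average":
        alpha = smoothing_kwargs["alpha"]
        out = []
        for x in samples:
            # y[t] = alpha * x[t] + (1 - alpha) * y[t-1], seeded with the first sample.
            out.append(x if not out else alpha * x + (1 - alpha) * out[-1])
        return out

    raise ValueError(f"unsupported filter_type: {filter_type!r}")


signal = [0.0, 10.0, 0.0, 10.0]
ma = smooth_signal(signal, {"filter_type": "moving_average", "window_size": 2})
ema = smooth_signal(signal, {"filter_type": "exponential_moving_average", "alpha": 0.5})
```

Under this convention, an `alpha` close to 1 follows the raw signal closely, while a small `alpha` smooths more heavily.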

---

## Return Values

A **4-element tuple**:
1. **`mean_abs_diff`** (`float`)  
   Mean absolute difference in efficiency (on-road vs. dynamometer), computed for data points within the full-load curve region.  
2. **`rmse`** (`float`)  
   Root-mean-square error in efficiency for that same set of points.  
3. **`elapsed_time`** (`float`)  
   Total runtime of the pipeline (in seconds).  
4. **`mean_abs_change`** (`float`)  
   Mean absolute change in efficiency values introduced by optional 2D smoothing.  
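
For reference, the first two return values correspond to standard error metrics. The sketch below shows how they are conventionally computed for paired efficiency values. The function name `compare_efficiencies` and the sample numbers are made up for illustration, and how the pipeline pairs on-road points with dynamometer points is not shown here.

```python
import math


def compare_efficiencies(on_road, dynamo):
    """Hypothetical helper: mean absolute difference and RMSE of paired efficiencies."""
    if len(on_road) != len(dynamo):
        raise ValueError("both inputs must contain the same number of points")
    diffs = [a - b for a, b in zip(on_road, dynamo)]
    mean_abs_diff = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mean_abs_diff, rmse


# Illustrative values only, expressed as fractions (0.90 means 90 %).
on_road = [0.90, 0.85, 0.80]
dynamo = [0.88, 0.86, 0.84]
mad, rmse_val = compare_efficiencies(on_road, dynamo)
```

RMSE is never smaller than the mean absolute difference, so a large gap between the two indicates that a few points deviate much more than the rest.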

```python
from modules.parametric_pipelines import efficiencymap_pipeline
from modules.data_handler import get_can_files

# No pre-loaded cache: the pipeline reads the raw files from disk.
data_cache = None

# Folder containing the CAN files; adjust exclude_keywords as needed.
files = get_can_files(folder='data/mydata', exclude_keywords=[])

# Process design parameters
efficiency_limit_lower = 0.4
efficiency_limit_upper = 1
# SoC limits are given in percent here; the documented defaults (0, 1) suggest
# a fraction, so match the scale of your SoC signal.
soc_limit_lower = 10
soc_limit_upper = 90
remove_neutral_gear = True
smoothing_kwargs = {'filter_type': 'exponential_moving_average', 'alpha': 0.8}
columns_to_smooth = ['hv_battery_current', 'hv_battery_voltage', 'rear_motor_torque', 'engine_rpm']
substract_auxiliary_power = True
which_full_load_curve = 'overlap'
twoD_smoothing_kwargs = {  # IDW configuration
    'method': 'idw',
    'power': 0.8,
    'num_neighbors': 50,
    'outlier_detection': True,
    'threshold_multiplier': 1
}

n_quantize_bins = 30
at_middle_of_bin = False
n_interpolation_bins = 30
global_offset = 0
generate_plots = True
verbose = False
filename_prefix = 'em_pipeline_results'
gear = '1'  # documented as a string, e.g. '1', '2', or 'vw'
figsize_1 = (6, 5)
figsize_2 = (8, 5)
figsize_3 = (10, 6)
marker_size = 5

mean_abs_diff, rmse, elapsed_time, mean_abs_change = efficiencymap_pipeline(
    data_cache=data_cache,
    files=files,
    gear=gear,
    efficiency_limit_lower=efficiency_limit_lower,
    efficiency_limit_upper=efficiency_limit_upper,
    soc_limit_lower=soc_limit_lower,
    soc_limit_upper=soc_limit_upper,
    remove_neutral_gear=remove_neutral_gear,
    smoothing_kwargs=smoothing_kwargs,
    columns_to_smooth=columns_to_smooth,
    substract_auxiliary_power=substract_auxiliary_power,
    which_full_load_curve=which_full_load_curve,
    twoD_smoothing_kwargs=twoD_smoothing_kwargs,
    high_fidelity_interpolation=True,
    n_quantize_bins=n_quantize_bins,
    at_middle_of_bin=at_middle_of_bin,
    n_interpolation_bins=n_interpolation_bins,
    global_offset=global_offset,
    generate_plots=generate_plots,
    verbose=verbose,
    filename_prefix=filename_prefix,
    figsize_1=figsize_1,
    figsize_2=figsize_2,
    figsize_3=figsize_3,
    marker_size=marker_size
)

# Both metrics are fractions; scale by 100 to report them in percent.
print(f'Mean abs diff: {round(100 * mean_abs_diff, 2)}%')
print(f'RMSE: {round(100 * rmse, 2)}%')
```
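
The call above enables `high_fidelity_interpolation` with `n_quantize_bins = 30` and `at_middle_of_bin = False`. One plausible reading of these two settings is sketched below: each rpm or torque value is snapped to an equal-width bin and represented either by the lower edge or by the center of that bin. The helper `quantize` and the 0 to 9000 rpm range are assumptions for illustration, not the pipeline's actual binning code.

```python
def quantize(value, lower, upper, n_bins, at_middle_of_bin=False):
    """Hypothetical helper: snap `value` to one of `n_bins` equal-width bins on [lower, upper]."""
    width = (upper - lower) / n_bins
    # Clamp the index so that out-of-range values land in the first or last bin.
    index = min(int((value - lower) / width), n_bins - 1)
    index = max(index, 0)
    edge = lower + index * width
    # Represent the bin by its center or by its lower edge.
    return edge + width / 2 if at_middle_of_bin else edge


# Assumed rpm range of 0 to 9000 split into 30 bins of 300 rpm each.
rpm_edge = quantize(2600, 0, 9000, 30)
rpm_center = quantize(2600, 0, 9000, 30, at_middle_of_bin=True)
```

With these assumed numbers, 2600 rpm falls into the bin that starts at 2400 rpm, so the two variants differ by half a bin width.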