TODO: 
----- > Motif 

# Layer-wise Relevance Propagation

> This notebook has five main parts:
> - Introduction to Layer-wise Relevance Propagation
> - Dimensionality reduction
>
>   Following the steps of the 04_dr notebooks.
>   - Gets the embeddings (or latent space) of a multivariate time series from an encoder (e.g. an autoencoder).
>   - Uses the obtained embeddings as input for a dimensionality reduction algorithm to generate projections of the embeddings (as 04.. does).
>
> - Compute clusters
>   Following the steps of the 04_dr notebooks. Maybe this should also live in another notebook?
>   - The projections are clustered via HDBSCAN.
> - Simple anomaly detector
>   - Uses basic statistics to obtain an anomaly score and visualize anomalies in a dynamic plot. Just for clarity.
> - Layer-wise relevance propagation
>   - Different implementations to apply LRP and check the importance of each feature in obtaining the embeddings.
>   - The same check for selected points in the projections plot (assuming random selection): the importance of each feature in obtaining the associated part of the embeddings.

> <span style="color:red; display:block;">
>  TODO: Save in 04_... an Artifact as in the previous nbs_pipeline notebooks and divide this notebook in two (one for clustering and another for layer propagation). Should we also split 04 into two notebooks?
> </span>

## Introduction to Layer-wise Relevance Propagation
Layer-wise Relevance Propagation (LRP) is an XAI technique introduced in 2015 by [Bach et al.](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0130140&ref=blog.paperspace.com) for computer vision deep learning models. It has since been used extensively in other DL domains to improve explainability.

This method belongs to the category of attribution methods. According to [Towards Better Understanding Attribution Methods](https://openaccess.thecvf.com/content/CVPR2022/html/Rao_Towards_Better_Understanding_Attribution_Methods_CVPR_2022_paper.html) and [Opportunities and Challenges in Explainable Artificial Intelligence (XAI): A Survey](https://arxiv.org/pdf/2006.11371.pdf), attribution methods can be classified into three main groups: backpropagation-based methods, activation-based methods and perturbation-based methods.

TODO: Improve/review the classification using https://arxiv.org/pdf/2006.11371.pdf

### Backpropagation-based or Gradient-based methods
These methods use backpropagation to calculate the relevance of input features, based on each feature's contribution to the model's output. They typically rely on gradients:
- with respect to the input
  - [DeepLift: Propagating Activation Differences](https://arxiv.org/abs/1704.02685). Decomposes the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons to every input feature. It compares the activation of each neuron to its ```reference activation``` and assigns contribution scores according to the difference.
  - [Guided BackPropagation/Guided saliency](https://arxiv.org/abs/1412.6806). "Variant of the [deconvolution approach](https://link.springer.com/chapter/10.1007/978-3-319-10590-1_53) for visualizing features learned by CNNs, which can also be applied to a broad range of network structures. Under this approach, the use of max-pooling in convolutional neural networks for small images is questioned and the replacement of max-pooling layers by a convolutional layer with increased stride is proposed, resulting in no loss of accuracy on several image recognition benchmarks." 
- with respect to intermediate layers
  - Saliency Maps. "Visualizing gradients, neural activation of individual layers using DeConv nets, guided backpropagation, etc. as images."
     - [NormGrad](https://openaccess.thecvf.com/content_CVPR_2020/html/Rebuffi_There_and_Back_Again_Revisiting_Backpropagation_Saliency_Methods_CVPR_2020_paper.html). "Based on the spatial contribution of gradients of convolutional weights"
       - Saliency maps combination at different layers to test the ability of saliency methods to extract complementary information at different network levels
        - Class-sensitivity metric and meta-learning inspired paradigm applicable to any saliency method for improving sensitivity to the output class being explained
     - [Compute gradient of the class score with respect to the input case](https://arxiv.org/pdf/1312.6034.pdf). Visualisation techniques for Convolutional Networks used for classification.
     - [Salient Deconvolutional Networks](https://link.springer.com/chapter/10.1007/978-3-319-46466-4_8).
  -  [Salient Relevance Maps](https://sciencedirect.com/science/article/pii/S0262885619300149)
  - [Excitation Backprop](https://link.springer.com/article/10.1007/s11263-017-1059-x). Top-down attention of a CNN classifier for generating task-specific attention map.
  
  - [FullGrad: Full-Gradient Representation for NN Visualization](https://proceedings.neurips.cc/paper/2019/hash/80537a945c7aaa788ccfcdf1b99b5d8f-Abstract.html). Decomposes the NN response into input sensitivity and per-neuron sensitivity components. 
- With respect to the last layer
  - [GradCam](https://openaccess.thecvf.com/content_iccv_2017/html/Selvaraju_Grad-CAM_Visual_Explanations_ICCV_2017_paper.html), [GradCam++](https://ieeexplore.ieee.org/abstract/document/8354201)
- With respect to different layers
  - [LayerCAM: Hierarchical Class Activation Maps](https://ieeexplore.ieee.org/abstract/document/9462463)
  
LRP can also be considered a backpropagation-based method. However, it propagates relevance using dedicated rules instead of relying on the gradients.
- [LRP: Layer-wise Relevance Propagation](https://iphome.hhi.de/samek/pdf/MonXAI19.pdf). "Operates by propagating the prediction backward in the neural network, using a set of purposely designed propagation rules".
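
As a rough illustration of what "relying on the gradients" means in this family of methods, the following sketch computes a plain input gradient and the derived gradient × input attribution for a tiny bias-free ReLU network. The network, its random weights and the helper names (`forward`, `input_gradient`) are hypothetical and written only for this example; the gradient is derived by hand so that the block needs nothing but NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 6))   # input (4 features) -> hidden (6 units)
W2 = rng.normal(size=(6, 1))   # hidden -> scalar output

def forward(x):
    """Bias-free network: output = relu(x @ W1) @ W2."""
    h_pre = x @ W1
    h = np.maximum(h_pre, 0.0)
    return h_pre, (h @ W2).item()

def input_gradient(x):
    """Gradient of the scalar output with respect to the input features."""
    h_pre, _ = forward(x)
    relu_mask = (h_pre > 0).astype(float)   # derivative of ReLU
    return W1 @ (relu_mask * W2[:, 0])      # chain rule through both layers

x = rng.normal(size=4)
_, y = forward(x)
grad = input_gradient(x)        # sensitivity of the output to each feature
grad_x_input = grad * x         # gradient x input attribution
print(grad_x_input, grad_x_input.sum(), y)
```

Because this toy network has no biases, the gradient × input scores add up to the network output, which is the same kind of conservation that LRP enforces through its propagation rules.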


### Activation-based methods

These methods weight the activation maps of the final convolutional layer to assign importance:
  - weighted by their gradients:
    - [GradCam](https://openaccess.thecvf.com/content_iccv_2017/html/Selvaraju_Grad-CAM_Visual_Explanations_ICCV_2017_paper.html), [GradCam++](https://ieeexplore.ieee.org/abstract/document/8354201)
    - [LayerCAM: Hierarchical Class Activation Maps](https://ieeexplore.ieee.org/abstract/document/9462463)
    - [Gap CAM: Using Global Average Pooling in CNNs for generating class activation maps](https://openaccess.thecvf.com/content_cvpr_2016/html/Zhou_Learning_Deep_Features_CVPR_2016_paper.html)
  - Estimating their importance to the classification score 
    - [Ablation-cam](https://scholar.google.com/scholar?hl=es&as_sdt=0%2C5&q=Saurabh+Desai+and+Harish+G.+Ramaswamy.+Ablation-CAM%3A+Visual+Explanations+for+Deep+Convolutional+Network+via+Gradient-free+Localization.+In+WACV%2C+pages+983%E2%80%93991%2C+2020.&btnG=). Visual explanations for deep CNN via gradient-free localization. "Uses Ablation analysis to determine the importance (weights) of individual feature map units wrt class. Highlights the important regions in the image for predicting the concept. ..."
    - [Flow restriction](https://arxiv.org/abs/2001.00396). Adds noise to intermediate feature maps to restrict the flow of information and quantify how much information image regions provide.
<span style="color:red">TODO: for long time series it might make sense to apply lighter algorithms first, check which sections are most relevant, and shorten the time series by selecting only the most important parts. </span>
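
To make the gradient-weighted variant concrete, here is a minimal sketch of the Grad-CAM weighting step adapted to one-dimensional (time) feature maps. It assumes the activations and the gradients of the class score with respect to those activations are already available as arrays of shape `(channels, time)`; in a real model they would come from the framework's autograd. The function name `grad_cam_1d` and the toy values are hypothetical.

```python
import numpy as np

def grad_cam_1d(activations, gradients):
    """Grad-CAM style map for feature maps of shape (channels, time)."""
    # One weight per channel: average the gradients over the time axis.
    weights = gradients.mean(axis=1)
    # Weighted sum of the activation maps, keeping only positive evidence.
    return np.maximum((weights[:, None] * activations).sum(axis=0), 0.0)

acts = np.array([[0.0, 1.0, 2.0, 0.0],
                 [1.0, 0.0, 0.0, 1.0]])
grads = np.array([[0.5, 0.5, 0.5, 0.5],
                  [-1.0, -1.0, -1.0, -1.0]])

cam = grad_cam_1d(acts, grads)
print(cam)  # time steps 1 and 2 are highlighted
```

The second channel has a negative weight, so the time steps where only that channel is active are clipped to zero by the final ReLU.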
      

### Perturbation-based methods

These methods treat the network as a black box and assign importance by observing how the output changes when the input is perturbed. The explanation is generated by iteratively probing a trained ML model with those variations of the input. This can be done using different techniques:

- Occluding parts of the image
  - [Rise: Randomised Input Sampling for Explanation of black-box models](https://arxiv.org/abs/1806.07421). "Generates an importance map indicating how salient each pixel is for the model's prediction." "Estimates importance empirically by probing the model with randomly masked versions of the input image and obtaining the corresponding outputs."
  - [LIME: Learning an Interpretable Model locally around the prediction](https://dl.acm.org/doi/abs/10.1145/2939672.2939778). "Explains the prediction of any classifier in an interpretable and faithful manner"
  - [Loss derivative back-propagation](https://link.springer.com/chapter/10.1007/978-3-319-10590-1_53). Classification of labelled images. "We train these models using a large set of N labeled images {x,y}, where label y_i is a discrete variable indicating the true class. A cross-entropy loss function, suitable for image classification, is used to compare ŷ_i and y_i. The parameters of the networks are trained by back-propagating the derivative of the loss with respect to the parameters throughout the network, and updating the parameters via stochastic gradient descent"
  - [DeconvNet: DeConvolution networks for convolution visualizations](https://link.springer.com/chapter/10.1007/978-3-319-10590-1_53). Activates the neurons of individual layers by occluding the input instance and visualizes them using DeConv Nets.
- Optimising for a mask that maximizes/minimizes class confidence
  - [Real Time Image Saliency for black box classifiers](https://proceedings.neurips.cc/paper_files/paper/2017/hash/0060ef47b12160b9198302ebdb144dcf-Abstract.html). Saliency detection method. 
  - [Interpretable explanation of black boxes by Meaningful Perturbation](https://openaccess.thecvf.com/content_iccv_2017/html/Fong_Interpretable_Explanations_of_ICCV_2017_paper.html). Goal: find the part of an image most responsible for a classifier decision. Model-agnostic and testable. Interpretable image perturbations.

- Selecting specific features
   - [SHAP: A Unified Approach to Interpreting Model Predictions](https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html). Probes feature correlations by removing features in a game-theoretic framework.
   - [Prediction Difference Analysis](https://arxiv.org/abs/1702.04595). Removes individual features and finds the positive and negative correlation of individual features towards the output.
- Feature replacements. [Interpreting Black Box Models via Hypothesis Testing](https://dl.acm.org/doi/abs/10.1145/3412815.3416889). Counterfactual replacements of features to study feature importance.
> TODO: REVIEW the main article. It proposes a way of evaluating attribution algorithms. Is it worth reviewing? I think it is out of scope.
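
The occlusion idea translates directly to time series windows. The sketch below is a minimal illustration, not any of the cited methods: it replaces one patch of one variable at a time with a baseline value and records how much the output of a black-box function changes. `occlusion_importance`, `toy_model`, the patch length and the zero baseline are all assumptions made for this example.

```python
import numpy as np

def occlusion_importance(model_fn, window, patch_len, baseline=0.0):
    """Occlude patches of a (n_vars, n_steps) window and measure the output change."""
    reference = model_fn(window)
    importance = np.zeros_like(window, dtype=float)
    n_vars, n_steps = window.shape
    for var in range(n_vars):
        for start in range(0, n_steps, patch_len):
            occluded = window.copy()
            occluded[var, start:start + patch_len] = baseline
            # A larger change in the output means a more important patch.
            importance[var, start:start + patch_len] = abs(reference - model_fn(occluded))
    return importance

def toy_model(window):
    """Hypothetical black box that only looks at variable 0, steps 2 and 3."""
    return float(window[0, 2:4].sum())

window = np.ones((2, 6))
imp = occlusion_importance(toy_model, window, patch_len=2)
print(imp)
```

Only the patch that the toy model actually reads receives a non-zero score. The cost is one model evaluation per patch, which is the reason these methods tend to be more expensive than a single backward pass.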

## Dimensionality reduction

Generate projections of the embeddings.

In [None]:
#Weight & Biases
import wandb

#Yaml
from yaml import load, FullLoader

#Embeddings
from dvats.all import *
from tsai.data.preparation import prepare_forecasting_data
from tsai.data.validation import get_forecasting_splits
from fastcore.all import *

#Dimensionality reduction
from tsai.imports import *

#Clustering
import hdbscan
import utils.config as cfg_
import time
import seaborn as sns


In [None]:
check_memory_usage = True
print_flag = True
show_time_series_flag = False

In [None]:
if check_memory_usage:
    import nbs_pipeline.utils.memory as mem
    import torch 
    gpu_device = torch.cuda.current_device()
    mem.gpu_memory_status(gpu_device)

In [None]:
#Get W&B API
api = wandb.Api()

### Get configuration parameters

This notebook needs to restore the encoder model fitted in notebook `02x`, as well as the data and configuration.

In [None]:
import os
path = os.path.expanduser("~/work/nbs_pipeline/")
name="05-xai-lrp"
runname = name
os.environ["WANDB_NOTEBOOK_NAME"] = path+name+".ipynb"

In [None]:
config_lrp = cfg_.get_artifact_config_xai_lrp(False)
if print_flag: cfg_.show_attrdict(config_lrp)

### W&B initialization

In [None]:
run_lrp = wandb.init(
    entity           = config_lrp.wandb_entity,
    project          = config_lrp.wandb_project if config_lrp.use_wandb else 'work-nbs', 
    group            = config_lrp.wandb_group,
    allow_val_change = config_lrp.allow_val_change, 
    job_type         = config_lrp.job_type, 
    mode             = 'online' if config_lrp.use_wandb else 'disabled',
    anonymous        = 'never' if config_lrp.use_wandb else 'must',
    config           =  config_lrp,
    resume           = 'allow',
    name = runname
)
config_lrp = wandb.config # Object for storing hyperparameters

### Restore the encoder model and its associated configuration
This model is necessary for recovering the projections.

In [None]:
embs, emb_artifact, emb_config = get_embeddings(config_lrp, run_lrp, api, print_flag)

In [None]:
if print_flag: 
    print(embs.shape)
    cfg_.show_attrdict(emb_config)
    print(emb_artifact.name)

In [None]:
config_dr, _ = cfg_.get_artifact_config_dimensionality_reduction(False)

In [None]:
cfg_.show_attrdict(config_dr)

In [None]:
df, df_artifact, enc_artifact, enc_learner = get_dataset(
    config_lrp, emb_config, config_dr,
    run_lrp, api, print_flag
)

In [None]:
if show_time_series_flag:
    # Show time series plot
    fig, ax = plt.subplots(1, figsize=(15,5), )
    cmap = matplotlib.colormaps.get_cmap('viridis')
    df.plot(color=cmap(0.05), ax=ax) # or use colormap=cmap
    # rect = Rectangle((5000, -4.2), 3000, 8.4, facecolor='lightgrey', alpha=0.5)
    # ax.add_patch(rect)
    plt.tight_layout()
    plt.legend()
    plt.show()

## Check embeddings and prepare enc_input

### Main parameters

In [None]:
config_lrp['stride']

In [None]:
stride = enc_artifact.metadata['stride']
print(stride)
batch_size = df_artifact['batch_size']
w = enc_artifact.metadata['w']

In [None]:
t_start = time.time()
enc_input, _ = prepare_forecasting_data(df, fcst_history = w)
t_end = time.time()
t = t_end - t_start
print("SW start | " , t_start, " | end ", t_end, "total (secs): ", t)
print(enc_input.shape)

In [None]:
if check_memory_usage: mem.gpu_memory_status(gpu_device)

In [None]:
if config_lrp['stride'] != stride:
    if stride != 1:
        print("Not implemented. Are the training and current stride values compatible?")
    else:
        stride = config_lrp['stride']
        if print_flag: print("stride --> ", stride)
        embs = embs[::stride]

In [None]:
if print_flag:
    print("w", w)
    print("stride", stride)
    print("batch_size", batch_size)
    print("enc_input ~", enc_input.shape)
    print("enc_artifact", enc_artifact.name)
    print("embs ~", embs.shape)

### Dimensions check

In [None]:
#Dimensions check
num_inputs = np.ceil(enc_input.shape[0]/stride)
num_embs = embs.shape[0]
test_eq(num_inputs, num_embs )
print(num_inputs, num_embs)
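
The check above relies on the fact that taking every `stride`-th element of `n` windows leaves `ceil(n / stride)` of them. The following sketch reproduces that count on a toy series with NumPy's `sliding_window_view`. It is a stand-in written for illustration: the toy shapes are arbitrary, and the exact number of windows produced by `prepare_forecasting_data` may differ (for example because of the forecast horizon).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

series = np.arange(20, dtype=float).reshape(10, 2)   # 10 time steps, 2 variables
w_demo, stride_demo = 4, 3

# All windows of length w_demo along the time axis: (n_windows, n_vars, w_demo)
windows = sliding_window_view(series, window_shape=w_demo, axis=0)
n_windows = windows.shape[0]                          # 10 - 4 + 1 = 7

strided = windows[::stride_demo]                      # keep windows 0, 3 and 6
expected = int(np.ceil(n_windows / stride_demo))
print(windows.shape, strided.shape[0], expected)
```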

Average embeddings in the time dimension, if needed

## Dimensionality reduction using UMAP

In [None]:
# Ensure there are no NaN rows (Macu's attempt. The commented-out cell below is the original, but it fails because of NaN values with sunspot)
embs_no_nan = embs[~np.isnan(embs).any(axis=1)]
embs_no_nan.shape

In [None]:
embs_no_nan.shape
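
One thing to keep in mind when rows are dropped like this: the positions in the filtered array no longer coincide with the original window indices. A small sketch of how the mapping could be kept, using a toy array (`embs_demo`, `valid_mask` and `kept_rows` are illustrative names, not variables used elsewhere in this notebook):

```python
import numpy as np

embs_demo = np.array([[0.1, 0.2],
                      [np.nan, 0.3],
                      [0.5, 0.6],
                      [0.7, np.nan]])

valid_mask = ~np.isnan(embs_demo).any(axis=1)   # True for rows without NaN
kept_rows = np.flatnonzero(valid_mask)          # original indices of the kept rows
embs_clean = embs_demo[valid_mask]

# Row i of embs_clean corresponds to window kept_rows[i] of the original data.
print(kept_rows, embs_clean.shape)
```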

In [None]:
config_dr

In [None]:
prjs = get_prjs(embs_no_nan, config_dr, config_lrp, False)

In [None]:
beep(0.15)
beep(0.15)
beep(0.15)


In [None]:
prjs[0:10] # In R: head(res[1,],10)

Save the projections as an artifact

In [None]:
if config_lrp.use_wandb: 
    run_lrp.log_artifact(
        ReferenceArtifact(
            prjs, 
            'projections-xai-LRP-mvp-SWV', 
            type='projections', 
            metadata=dict(run_lrp.config)
        ), 
        aliases=f'run-{run_lrp.project}-{run_lrp.id}'
    )

## Create Precomputed Clusters

In order to integrate precomputed clusters into the embedding space, it's necessary to log artifacts that include the labels of the newly created clusters. 

The cluster creation process is presented below. This creation procedure can be modified according to specific needs. However, the structure of the new artifact must be preserved (it must be a numpy.ndarray and the number of elements must be equal to the number of points in the embedding space).

In [None]:
print(f'HDBSCAN supported metrics: {list(hdbscan.dist_metrics.METRIC_MAPPING.keys())}')

In [None]:
# Define HDBSCAN parameters
hdbscan_kwargs = {
    'min_cluster_size' : 7, #100, #100,
    'min_samples' : 3,
    'cluster_selection_epsilon' : 0.0001,
}
metric_kwargs = {
    'metric' : 'euclidean' #'jaccard'
}

In [None]:
# Create clusters using HDBSCAN
clusters = hdbscan.HDBSCAN(**hdbscan_kwargs, **metric_kwargs).fit(prjs)
clusters_labels = clusters.labels_
list(Counter(clusters_labels).items())
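
When reading the label counts, note that HDBSCAN assigns the label `-1` to points it considers noise, so that entry is not a cluster. A minimal sketch of how the counts could be summarised, using a made-up label array instead of the real `clusters_labels`:

```python
from collections import Counter

import numpy as np

labels_demo = np.array([0, 0, 1, -1, 1, 1, -1, 0])   # toy labels, -1 = noise

counts = Counter(labels_demo.tolist())
n_clusters = len(set(counts) - {-1})                  # real clusters only
noise_fraction = counts[-1] / labels_demo.size        # share of noise points
print(counts, n_clusters, noise_fraction)
```

A high noise fraction is a hint that `min_cluster_size` or `min_samples` may be too strict for the projection at hand.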

### Check cluster score

In [None]:
score = cluster_score(prjs, clusters_labels, True)

In [None]:
# Testing artifact structure 
test_eq_type(type(clusters_labels), np.ndarray)
test_eq(clusters_labels.size, prjs.shape[0])

In [None]:
# Create and log 'clusters_labels' artifact
clusters_ar = ReferenceArtifact(
    obj  = clusters_labels, 
    name = 'clusters_labels-xai-lrp-mvp-SWV',
    type = 'clusters'
)
clusters_ar.metadata, clusters_ar.manifest.entries.values()

In [None]:
run_lrp.log_artifact(clusters_ar, aliases=['hdbscan_jaccard'])

In [None]:
beep(0.25)
beep(0.25)
beep(0.25)

## Dynamic plot for determining whether a window of the time series is anomalous

#### Get Anomaly Score

In [None]:
# Create clusters using HDBSCAN
clusters = hdbscan.HDBSCAN(**hdbscan_kwargs, **metric_kwargs).fit(prjs)
clusters_labels = clusters.labels_
list(Counter(clusters_labels).items())

In [None]:
#anomaly_scores = detector(prjs_umap, clusters_labels)
anomaly_scores = clusters.outlier_scores_
if print_flag: print(anomaly_scores)

#### Check anomaly scores distribution

In [None]:
import seaborn as sns
plot_anomaly_scores_distribution(anomaly_scores)

## Select a threshold

In [None]:
print(anomaly_scores.shape)
print("min ", np.min(anomaly_scores))
print("max ", np.max(anomaly_scores))
anomaly_scores_mean = np.mean(anomaly_scores)
print("mean ", anomaly_scores_mean)
anomaly_scores_std = np.std(anomaly_scores)
print("std ", anomaly_scores_std)

In [None]:
threshold = pd.Series(clusters.outlier_scores_).quantile(0.9)
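
As an illustration of how such a quantile threshold could be used, the sketch below flags the windows whose score exceeds the 0.9 quantile. The scores are made up, and the mapping from window index to time step assumes that window `i` starts at step `i * stride`, which is an assumption of this example rather than something checked here.

```python
import numpy as np

scores_demo = np.array([0.05, 0.10, 0.02, 0.90, 0.15,
                        0.08, 0.12, 0.03, 0.07, 0.60])
stride_demo = 2   # hypothetical stride between consecutive windows

threshold_demo = np.quantile(scores_demo, 0.9)          # linear interpolation, like pandas
anomalous_idx = np.flatnonzero(scores_demo > threshold_demo)
anomalous_starts = anomalous_idx * stride_demo          # assumed first time step of each window
print(threshold_demo, anomalous_idx, anomalous_starts)
```

With a 0.9 quantile roughly one window in ten is flagged regardless of how the scores are distributed, so the threshold ranks windows rather than testing them against an absolute notion of anomaly.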

## LRP

LRP is an explainable artificial intelligence (XAI) technique. It is used to explain the predictions of models that are structured as neural networks. It operates by propagating the prediction backward through the neural network using a set of purposely designed local ```propagation rules```.

### LRP structure

- Define the Neural Network (NN) model. It is composed of different layers, each with its own neurons.
- Backpropagation
  - Propagation starts at the top layer (usually, the output layer of the NN)
  - Local Propagation Rules

    In each neuron, specific local ```propagation rules``` are applied to calculate how much ```relevance``` or importance should be passed to the next layer. The ```relevance``` is a real number defined via the ```propagation rule```.
  - Propagation
  
    The relevances calculated in the layer are transmitted downwards to the next layer.

      - Conservation property: the relevance received by each neuron is fully redistributed to the neurons in the layer below, so the total amount of relevance is preserved from layer to layer.
  - Repeat layer by layer. The propagation continues until it reaches the input features of the NN. At this point, each input feature has received a relevance score that reflects its contribution to the NN prediction. 
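
The steps above can be sketched with NumPy for a tiny two-layer network. This is a minimal illustration under simplifying assumptions: dense layers without biases, random weights, and the epsilon rule applied to every layer. The function `lrp_epsilon` and all variable names are written for this example and do not correspond to the API of any LRP library.

```python
import numpy as np

def lrp_epsilon(a_in, W, R_out, eps=1e-9):
    """Redistribute the relevance R_out of a bias-free dense layer to its inputs.

    a_in: input activations (n_in,), W: weights (n_in, n_out), R_out: (n_out,).
    """
    z = a_in @ W                                      # pre-activations of the layer
    z_stab = z + eps * np.where(z >= 0, 1.0, -1.0)    # stabiliser avoids division by zero
    s = R_out / z_stab
    return a_in * (W @ s)                             # R_i = a_i * sum_j w_ij * s_j

rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 8))
W2 = rng.normal(size=(8, 3))

# Forward pass
x = rng.normal(size=5)
h = np.maximum(x @ W1, 0.0)
out = h @ W2

# Propagation starts at the top layer: only the selected output carries relevance.
target = int(np.argmax(out))
R_out = np.zeros_like(out)
R_out[target] = out[target]

# Repeat layer by layer until the input features are reached.
R_hidden = lrp_epsilon(h, W2, R_out)
R_input = lrp_epsilon(x, W1, R_hidden)

print(R_input)
print(R_input.sum(), out[target])   # conservation: both values agree
```

The final line shows the conservation property: up to the small stabiliser, the input relevances add up to the output value that was explained. With biases, part of the relevance would be absorbed by them and the sums would no longer match exactly.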
   


### Comparison with other techniques
LRP distinguishes itself from other explainability techniques in two ways:
- Other techniques are often more computationally expensive, as many of them involve multiple neural network evaluations.
- Some alternative techniques replace the gradient with a coarser estimate of effect. That involves optimising some local surrogate model or the explanation itself.

In contrast, LRP leverages the graph structure of deep neural networks to compute explanations quickly and reliably​​.

### Limitations

The main limitation of LRP is the way it handles feature contributions. The way positive and negative contributions are treated during the propagation phase may limit how much the relevances can grow.

This may result in a less detailed representation of how the input features affect the NN output. However, it also helps to provide more stable explanations.

It also means that LRP focuses on global trends or the most influential features rather than on fine-grained details or minor variations...

...**and that is exactly what we are looking for here!**

Great!
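
One possible reading of the point about positive and negative contributions can be checked numerically on a single neuron. In the sketch below the two contributions almost cancel: the epsilon rule then produces very large relevances of opposite sign, while the alpha-beta rule (here with alpha=2 and beta=1) treats the positive and negative parts separately and keeps the magnitudes bounded. The functions are simplified, single-neuron versions written for this example, and the alpha-beta sketch assumes that both positive and negative contributions are present.

```python
import numpy as np

def rule_epsilon(z, R, eps=1e-9):
    """Epsilon rule for one neuron: share R proportionally to the contributions z."""
    total = z.sum()
    return z / (total + eps * (1.0 if total >= 0 else -1.0)) * R

def rule_alpha_beta(z, R, alpha=2.0, beta=1.0):
    """Alpha-beta rule for one neuron: positive and negative parts handled separately."""
    z_pos = np.clip(z, 0.0, None)
    z_neg = np.clip(z, None, 0.0)
    pos = z_pos / z_pos.sum() if z_pos.sum() > 0 else np.zeros_like(z)
    neg = z_neg / z_neg.sum() if z_neg.sum() < 0 else np.zeros_like(z)
    return (alpha * pos - beta * neg) * R

z = np.array([1.0, -0.99])    # two contributions that almost cancel
R = 1.0                       # relevance arriving at the neuron

r_eps = rule_epsilon(z, R)
r_ab = rule_alpha_beta(z, R)
print(r_eps, r_eps.sum())     # large values of opposite sign, still summing to about 1
print(r_ab, r_ab.sum())       # magnitudes bounded by alpha and beta
```

Both rules conserve the incoming relevance here, but only the alpha-beta variant limits how large the individual scores can become, at the price of a coarser picture of the contributions.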

### LRP implementations
To choose a library for integrating LRP into DeepVATS, the libraries available on ```GitHub``` and ```PyPI``` have been reviewed.


| Library | Links | Base Framework | Supported Data Types | Associated Paper |
|---------|------------|----------------|----------------------|------------------|
| iNNvestigate | [GitHub](https://github.com/albermax/innvestigate) [PyPI](https://pypi.org/project/innvestigate/)| Various (Keras, TensorFlow, PyTorch - added in 2019) | Not specified. The paper talks about pixels and classifiers. Assumed to be used for vision classification models | [Paper](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0130140&ref=blog.paperspace.com) |
| Zennit | [PyPI](https://pypi.org/project/zennit/) | PyTorch | Not specified (Adaptable to different data types) | [Paper](https://arxiv.org/abs/2106.13200) |
| PyTorchRelevancePropagation | [GitHub](https://github.com/kaifishr/PyTorchRelevancePropagation) | PyTorch | Not specified | Not available |
| TorchLRP | [GitHub](https://github.com/fhvilshoj/TorchLRP) | PyTorch | Not specified | Not available, not in PyPI => Not relevant for this study|
| Layerwise-Relevance-Propagation-for-LSTMs | [GitHub](https://github.com/alewarne/Layerwise-Relevance-Propagation-for-LSTMs) | TensorFlow | Time Series (Specifically for LSTMs) | Not available |
| LRP Toolbox | [GitHub](https://github.com/sebastian-lapuschkin/lrp_toolbox) | Matlab, Python (no PyTorch in requirements), Caffe | Not specified. Examples show images and text. | Not available |
| lrp-pf-auc | [PyPI](https://pypi.org/project/lrp-pf-auc/) [Zenodo](https://zenodo.org/records/6821295) | Python (no PyTorch in requirements) | Not specified | Not available |
| keras-explain | [PyPI](https://pypi.org/project/keras-explain/) | Keras | Not specified | Not available |
| captum | [PyPI]() [GitHub]() | PyTorch | Not specified | [Paper](https://arxiv.org/abs/2009.07896) |

Thus, there are three options that can potentially be used with our models and have associated papers: ```iNNvestigate```, ```Zennit``` and ```captum```. For simplicity, as we are focused on ```PyTorch``` and ```LRP```, ```Zennit``` has been tested for integrating LRP into ```DeepVATS```. Its basic use is shown in the next section, following its [tutorial](https://zennit.readthedocs.io/en/latest/getting-started.html).

> Attempts in Google Colab to use captum and zennit with tabular data: [GCollab](https://colab.research.google.com/drive/1Bt-csfh1M-ttU2ww6akY_BwRJUA8uND5?authuser=0#scrollTo=lqgM6bIzHb6I)

<span style="color:red; display:inline-block;">

> TODO: Possible analysis for the paper (or another paper): analyse whether iNNvestigate and captum can be included in DeepVATS and whether it is worth it in order to visualize other models in the future. At first sight, Zennit seemed the easiest to follow, and it allows a clear selection of rules using three simple classes to handle the model.
</span>.

https://colab.research.google.com/drive/1Bt-csfh1M-ttU2ww6akY_BwRJUA8uND5?authuser=0#scrollTo=QKjyonIIRS_h

### Zennit: Get Started

[Zennit](https://zennit.readthedocs.io/en/latest/getting-started.html) is a library that implements **propagation-based attribution methods** by *overwriting the gradient of PyTorch modules* in PyTorch's auto-differentiation engine (the component that automatically computes the gradients of composite functions). Zennit uses this engine to modify the way gradients are computed during the attribution process, which makes it possible to apply propagation-based attribution methods.

*Zennit only works on models that are implemented strictly with PyTorch modules, including the activation functions.*

#### Introduction



##### Attribution process
The "attribution process" in the context of neural networks and machine learning refers to the technique of determining how different parts of the input to a model contribute to its output. 
    
The goal is to explain a model's decisions or predictions by identifying which input features are responsible for the final prediction and how much these features influence it. This process is crucial for understanding, interpreting, and trusting machine learning models, especially those that are complex and opaque, such as deep neural networks.

Some key considerations about the attribution process include:
    
- **Interpretability:** Provides a clear insight into why a model makes certain decisions, which is especially important in fields where decisions need to be explainable and justifiable, such as in medicine or banking.

- **Identification of Important Features:** Helps understand which features are most influential for the model's predictions, which can be useful for feature engineering or gaining a better understanding of the problem under study.

- **Attribution Techniques:** There are different methods for conducting the attribution process, such as Layer-wise Relevance Propagation (LRP), Shapley Decomposition, Grad-CAM, and others. Each of these methods has its own advantages, limitations, and suitable use cases.

- **Applications in Various Fields:** The attribution process is applied in a variety of fields, from image recognition and natural language processing to disease prediction and financial decision-making.

  In summary, the attribution process is a fundamental part of analyzing machine learning models, providing transparency and understanding in how models make predictions or decisions based on input data.

##### Propagation-based attribution methods
These are explainable-AI techniques that propagate the contribution of the output neurons back to the input layer. Essentially, these methods attempt to explain how the input features of a model contribute to its final prediction. In the context of Zennit, these methods modify the gradients of the PyTorch modules during the auto-differentiation process to compute these contributions.

#### Main Zennit structures
The most important high-level structures in Zennit are ```Composites```, ```Attributors```, and ```Canonizers```.

##### Composites
Structures that map ```Rules``` to modules (torch.nn, MVP) based on their properties and context in order to modify their gradient. The most common composites for ```LRP``` are implemented in ```zennit.composites```.
That is:

- Map ```Rules``` to modules. Each module (convolutional layers, ReLU activation layers, ...) may need a specific way of computing its contribution to the module output. ```Composites``` assign different rules to those modules to define how those contributions are obtained.
- Based on their properties and context. The assignment of these rules is not random: it depends on the characteristics of each module and its context inside the model.
- This mapping/assignment changes the way the gradient is computed during backpropagation.
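
The mapping idea can be sketched in plain Python. The block below is a conceptual illustration only and does not use Zennit's API: `Layer`, `layer_map`, `assign_rules` and the rule names as strings are hypothetical stand-ins. It assigns a rule to each layer according to its kind (the "properties") and optionally treats the first layer differently (the "context").

```python
from dataclasses import dataclass

@dataclass
class Layer:
    """Hypothetical stand-in for a model module."""
    name: str
    kind: str   # e.g. "conv", "linear", "activation"

# Ordered (kind, rule) pairs; the first matching entry wins.
layer_map = [
    ("activation", "pass"),
    ("conv", "zplus"),
    ("linear", "epsilon"),
]

def assign_rules(layers, layer_map, first_layer_rule=None):
    """Return {layer name: rule name}; layers without a match get no rule."""
    assigned = {}
    for position, layer in enumerate(layers):
        if position == 0 and first_layer_rule is not None:
            assigned[layer.name] = first_layer_rule   # context: special first layer
            continue
        for kind, rule in layer_map:
            if layer.kind == kind:                    # property: type of the layer
                assigned[layer.name] = rule
                break
    return assigned

model_layers = [
    Layer("conv1", "conv"), Layer("relu1", "activation"),
    Layer("conv2", "conv"), Layer("relu2", "activation"),
    Layer("fc", "linear"),
]
rules = assign_rules(model_layers, layer_map, first_layer_rule="flat")
print(rules)
```

The resulting assignment mirrors the description of a composite that uses a flat rule for the first layer, zplus for the other convolutional layers and epsilon for the fully connected ones.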


[Predefined composites](https://zennit.readthedocs.io/en/latest/reference/zennit.composites.html#module-zennit.composites): 
<span style="color:red; display:block;">
The relevance ratings were requested from ChatGPT, understood as: too tied to images (will expect 3D input), apparently useful, certainly usable. This has to be reviewed by researching each of them a bit more when fixing them and when writing the article. Keep in mind that we want something especially generic. Alternatively, propose the useful ones and let the user decide which one to use.
</span>
| Composite Name            | Description                                                        | Reference                                                                                     | Relevance for Time Series (MVP) |
|---------------------------|--------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|---------------------------------|
| BetaSmooth                | Explicit composite to modify ReLU gradients to smooth softplus gradients [Dombrowski et al., 2019]. | Dombrowski, A.-K., Alber, M., Anders, C. J., Ackermann, M., Müller, K.-R., & Kessel, P. (2019). Explanations can be manipulated and geometry is to blame. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, 13567–13578. [Link](https://proceedings.neurips.cc/paper/2019/hash/bb836c01cdc9120a9c984c525e4b1a4a-Abstract.html) | ![#FFA500](https://via.placeholder.com/15/FFA500/000000?text=+) Review |
| DeconvNet                 | Modifying gradients of all ReLUs according to DeconvNet [Zeiler and Fergus, 2014]. | Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I, volume 8689 of Lecture Notes in Computer Science, 818–833. Springer. [Link](https://doi.org/10.1007/978-3-319-10590-1_53) | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant |
| EpsilonAlpha2Beta1        | Alpha2-beta1 rule for convolutional and epsilon rule for fully connected layers. | -                                     | ![#008000](https://via.placeholder.com/15/008000/000000?text=+) Useful |
| EpsilonAlpha2Beta1Flat    | Flat rule for first linear layer, alpha2-beta1 for convolutional, epsilon for fully connected layers. | -                                     | ![#008000](https://via.placeholder.com/15/008000/000000?text=+) Useful |
| EpsilonGammaBox           | ZBox rule for first convolutional layer, gamma for following convolutional, epsilon for fully connected layers. | -                                     | ![#FFA500](https://via.placeholder.com/15/FFA500/000000?text=+) Review |
| EpsilonPlus               | Zplus rule for convolutional layers and epsilon rule for fully connected layers. | -                                     | ![#008000](https://via.placeholder.com/15/008000/000000?text=+) Useful |
| EpsilonPlusFlat           | Flat rule for any first linear layer, zplus for other convolutional, epsilon for other fully connected layers. | -                                     | ![#008000](https://via.placeholder.com/15/008000/000000?text=+) Useful |
| ExcitationBackprop        | Implementing ExcitationBackprop [Zhang et al., 2016]. | Zhang, J., Lin, Z. L., Brandt, J., Shen, X., & Sclaroff, S. (2016). Top-down neural attention by excitation backprop. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part IV, volume 9908 of Lecture Notes in Computer Science, 543–559. Springer. [Link](https://doi.org/10.1007/978-3-319-46493-0_33) | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant |
| GuidedBackprop            | Modifying gradients of all ReLUs according to GuidedBackprop [Springenberg et al., 2015]. | Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. A. (2015). Striving for simplicity: the all convolutional net. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Workshop Track Proceedings. [Link](http://arxiv.org/abs/1412.6806) | ![#FFA500](https://via.placeholder.com/15/FFA500/000000?text=+) Review |
| LayerMapComposite         | A Composite for which hooks are specified by a mapping from module types to hooks. | -                                     | ![#008000](https://via.placeholder.com/15/008000/000000?text=+) Useful |
| MixedComposite            | A Composite for which hooks are specified by a list of composites.  | -                                     | ![#FFA500](https://via.placeholder.com/15/FFA500/000000?text=+) Review |
| NameLayerMapComposite     | A Composite for which hooks are specified by both a mapping from module names and module types to hooks. | -                                     | ![#FFA500](https://via.placeholder.com/15/FFA500/000000?text=+) Review |
| NameMapComposite          | A Composite for which hooks are specified by a mapping from module names to hooks. | -                                     | ![#FFA500](https://via.placeholder.com/15/FFA500/000000?text=+) Review |
| SpecialFirstLayerMapComposite | A Composite for which hooks are specified by a mapping from module types to hooks, with a separate mapping applied to the first layer. | -                                     | ![#FFA500](https://via.placeholder.com/15/FFA500/000000?text=+) Review |


##### [Predefined rules](https://zennit.readthedocs.io/en/latest/how-to/use-rules-composites-and-canonizers.html#rules)
| Rule Name              | Description                                                        | Relevance for Time Series (MVP) | Advantages                                      | Disadvantages                                 |
|------------------------|--------------------------------------------------------------------|---------------------------------|-------------------------------------------------|-----------------------------------------------|
| AlphaBeta              | Treats positive and negative contributions separately, weighted by alpha and beta. | ![#FFA500](https://via.placeholder.com/15/FFA500/000000?text=+) To Review | Explicit control over how much negative evidence is kept. | Requires choosing alpha and beta.              |
| Epsilon                | A stable rule, often used as a default for many layers.            | ![#008000](https://via.placeholder.com/15/008000/000000?text=+) Useful    | Simple and stable.                               | Might not capture all relevant features.      |
| Flat                   | Suitable for input layers; provides a basic relevance mapping.     | ![#FFA500](https://via.placeholder.com/15/FFA500/000000?text=+) To Review | Beneficial for input layers.                     | Less informative for deeper layers.            |
| Gamma                  | Favours positive contributions by adding gamma times the positive part of the weights. | ![#FFA500](https://via.placeholder.com/15/FFA500/000000?text=+) To Review | Dampens noisy negative contributions.            | Sensitive to the choice of gamma.              |
| ZBox                   | Rule for the first layer when the inputs are bounded in a known interval [low, high]. | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant | Makes use of the known input bounds.            | Only applicable to the input layer.            |
| ZPlus                  | Focuses on positive contributions from the layers.                 | ![#FFA500](https://via.placeholder.com/15/FFA500/000000?text=+) To Review | Focuses on positive contributions.               | Neglects negative contributions.               |
| ZB                     | A balanced approach to attributing relevance.                     | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant | Provides a balanced attribution.                | Requires careful calibration and tuning.       |
| WSquare                | Emphasises the importance of weights in the network.              | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant | Highlights weight significance.                  | Not universally suitable for all networks.     |
| WSquareFlat            | A combination of WSquare and Flat rules.                           | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant | Combines features of WSquare and Flat rules.     | Limited in scope and application.              |
| GuidedBackpropReLU     | Alters ReLU gradients for visualisation in convolutional networks. | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant | Focused on visualisation in CNNs.                | Specific to CNNs and similar architectures.    |
| PatternAttribution     | Considers layer-wise patterns for attribution.                     | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant | Accounts for layer-specific patterns.            | Complexity due to need for precomputed patterns. |
| PatternNet             | Provides detailed layer-wise analysis based on patterns.           | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant | Detailed layer analysis.                         | Requires extensive pre-computation.            |



Taking ChatGPT's comments into account, and given that we want to explain the attributions along the whole path through several layers, the table would rather look like this:

| Rule Name              | Description                                                        | Relevance for Time Series (MVP) | Advantages                                      | Disadvantages                                 |
|------------------------|--------------------------------------------------------------------|---------------------------------|-------------------------------------------------|-----------------------------------------------|
| AlphaBeta              | Treats positive and negative contributions separately, weighted by alpha and beta. | ![#FFA500](https://via.placeholder.com/15/FFA500/000000?text=+) To Review | Explicit control over how much negative evidence is kept. | Requires choosing alpha and beta.              |
| Epsilon                | A stable rule, often used as a default for many layers.            | ![#008000](https://via.placeholder.com/15/008000/000000?text=+) Useful    | Simple and stable.                               | Might not capture all relevant features.      |
| Flat                   | Suitable for input layers; provides a basic relevance mapping.     | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant | Beneficial for input layers.                     | Less informative for deeper layers.            |
| Gamma                  | Favours positive contributions by adding gamma times the positive part of the weights. | ![#FFA500](https://via.placeholder.com/15/FFA500/000000?text=+) To Review | Dampens noisy negative contributions.            | Sensitive to the choice of gamma.              |
| ZBox                   | Rule for the first layer when the inputs are bounded in a known interval [low, high]. | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant | Makes use of the known input bounds.            | Only applicable to the input layer.            |
| ZPlus                  | Focuses on positive contributions from the layers.                 | ![#FFA500](https://via.placeholder.com/15/FFA500/000000?text=+) To Review | Focuses on positive contributions.               | Neglects negative contributions.               |
| ZB                     | A balanced approach to attributing relevance.                     | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant | Provides a balanced attribution.                | Requires careful calibration and tuning.       |
| WSquare                | Emphasises the importance of weights in the network.              | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant | Highlights weight significance.                  | Not universally suitable for all networks.     |
| WSquareFlat            | A combination of WSquare and Flat rules.                           | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant | Combines features of WSquare and Flat rules.     | Limited in scope and application.              |
| GuidedBackpropReLU     | Alters ReLU gradients for visualisation in convolutional networks. | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant | Focused on visualisation in CNNs.                | Specific to CNNs and similar architectures.    |
| PatternAttribution     | Considers layer-wise patterns for attribution.                     | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant | Accounts for layer-specific patterns.            | Complexity due to need for precomputed patterns. |
| PatternNet             | Provides detailed layer-wise analysis based on patterns.           | ![#808080](https://via.placeholder.com/15/808080/000000?text=+) Not Relevant | Detailed layer analysis.                         | Requires extensive pre-computation.            |
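
To make the Epsilon rule from the tables more concrete, here is a minimal NumPy sketch for a single dense layer. It is only an illustration of the redistribution step, not Zennit's implementation; the function name `lrp_epsilon_dense` and the toy weights are assumptions made for this example.

```python
import numpy as np

def lrp_epsilon_dense(a, W, b, R_out, eps=1e-6):
    """Redistribute the output relevance R_out to the inputs of z = a @ W + b."""
    z = a @ W + b                                     # pre-activations, shape (n_out,)
    z_stab = z + eps * np.where(z >= 0, 1.0, -1.0)    # epsilon stabiliser, keeps the sign of z
    s = R_out / z_stab                                # relevance per unit of pre-activation
    return a * (W @ s)                                # relevance per input, shape (n_in,)

a = np.array([1.0, 2.0, 0.5])                          # toy input activations
W = np.array([[0.5, -1.0], [0.25, 0.5], [1.0, 1.0]])   # toy weights, shape (n_in, n_out)
b = np.zeros(2)                                        # no bias, so relevance is conserved
R_out = a @ W + b                                      # start from the layer output itself
R_in = lrp_epsilon_dense(a, W, b, R_out)
print(R_in, R_in.sum(), R_out.sum())
```

With a zero bias the relevance is conserved up to the stabiliser: the sum over the inputs matches the sum over the outputs. A larger `eps` absorbs more relevance, which is the stability versus detail trade-off mentioned in the table.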


##### Attributors

[Attributors](https://zennit.readthedocs.io/en/latest/how-to/write-custom-attributors.html) provide an additional layer of abstraction over the context of Composites. 

They are used to directly produce attributions, which may or may not be computed with gradients modified by Composites. 
More information on Attributors, examples and their use can be found in [Using Attributors](https://zennit.readthedocs.io/en/latest/how-to/use-attributors.html).

Attributors can be used to implement non-layer-wise or only partly layer-wise attribution methods. For this, it is enough to define a subclass of ```zennit.attribution.Attributor``` and implement its ```forward()``` and, optionally, its ```__init__()``` methods.
We are focused on Layer-wise Relevance Propagation. However, in order to check the tool, attributors based on gradient methods are also tried below.

##### [Canonizers](https://zennit.readthedocs.io/en/latest/how-to/use-rules-composites-and-canonizers.html)

Zennit implements propagation-based attribution methods by overwriting the gradient of PyTorch modules within PyTorch’s auto-differentiation engine. There are three building blocks in Zennit to achieve attributions: Rules, Composites and Canonizers. In short, Rules specify how to overwrite the gradient, Composites map rules to modules, and Canonizers transform some module types and configurations to a canonical form, necessary in some cases.

For some modules and operations, Layer-wise Relevance Propagation (LRP) is not implementation-invariant, e.g. ```BatchNorm -> Dense -> ReLU``` will be attributed differently than ```Dense -> BatchNorm -> ReLU```. 

Therefore, LRP needs a canonical form of the model, which is implemented in Canonizers. 
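
As an illustration of what such a canonical form can look like, the following NumPy sketch folds an inference-mode batch norm into the dense layer that precedes it, so that both versions compute the same function with a single layer left to attribute. This is only a sketch of the idea; the helper name `merge_batch_norm` and the random toy parameters are assumptions, not Zennit code.

```python
import numpy as np

def merge_batch_norm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-mode batch norm into the dense layer that precedes it."""
    scale = gamma / np.sqrt(var + eps)             # per-output-channel scale
    return W * scale, (b - mean) * scale + beta    # merged weight and merged bias

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                                   # 5 samples, 3 inputs
W, b = rng.normal(size=(3, 2)), rng.normal(size=2)            # dense layer parameters
gamma, beta = rng.normal(size=2), rng.normal(size=2)          # batch norm affine parameters
mean, var = rng.normal(size=2), rng.uniform(0.5, 2.0, size=2) # batch norm running statistics

# Dense followed by batch norm, computed in two steps
two_step = gamma * ((x @ W + b) - mean) / np.sqrt(var + 1e-5) + beta
# The same function computed by the single merged dense layer
W_m, b_m = merge_batch_norm(W, b, gamma, beta, mean, var)
one_step = x @ W_m + b_m
print(np.allclose(two_step, one_step))
```

The outputs are identical, but relevance now only has to be propagated through one linear layer.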


### Compute LRP relevance 
> Following the Starting Guide & using the relevant composites, attributors and canonizers according to ChatGPT. Attributors should not be needed at first, but let's check how they work.

#### Submodules summary for rules decision

Before computing the LRP relevance, let's check the layers in the model so that we can better decide which rules to use according to their nature.

In [None]:
_, _ = learner_module_leaves_subtables(enc_learner, True)

#### Import the data to be analysed
> TODO: Adapt so that only a specific range of the time series is used, according to a selected section of the ProjectionPoints plot

In [None]:
from tsai.data.core import get_ts_dls
from tsai.basics import *

In [None]:
lrp_input = enc_input #[ ::enc_artifact.metadata['stride']]
lrp_input.shape

In [None]:
w = 30
stride = 5
print(enc_artifact.metadata['stride'])
#stride = enc_artifact.metadata['stride']
splits = get_forecasting_splits(
        df = df, 
        fcst_history = w,
        fcst_horizon = 1,
        stride = enc_artifact.metadata['stride'], 
        test_size = 0.2,
        show_plot = True
    )

In [None]:
tfms = [ToFloat(), None]
batch_tfms = [
    TSStandardize(
        by_sample=enc_artifact.metadata['norm_by_sample'],
        use_single_batch=enc_artifact.metadata['norm_use_single_batch']
    )
]
dls = get_ts_dls(
    lrp_input, 
    splits=splits, 
    tfms=tfms, 
    bs=enc_artifact.metadata['batch_size'], 
    batch_tfms=batch_tfms
)
dls.show_at(0)

##### Prepare the data

In [None]:
lrp_input = enc_input
print(lrp_input.shape)
lrp_input_strided = lrp_input[::enc_artifact.metadata['stride']]
print(lrp_input_strided.shape)
print(type(lrp_input_strided))
rows_with_nan = np.isnan(lrp_input_strided).any(axis=(1,2))
lrp_input_strided = lrp_input_strided[~rows_with_nan]
lrp_input_strided.shape
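
The striding and NaN filtering above can be checked on a toy array. This is a small sketch with made-up data; the shape convention `(n_windows, n_features, window_length)` is assumed from the `axis=(1,2)` reduction used in the cell above.

```python
import numpy as np

# Toy array of windows with shape (n_windows, n_features, window_length)
windows = np.arange(4 * 2 * 3, dtype=float).reshape(4, 2, 3)
windows[1, 0, 2] = np.nan                        # a single missing value in window 1

strided = windows[::2]                           # keep every 2nd window (stride = 2)
has_nan = np.isnan(windows).any(axis=(1, 2))     # one flag per window
clean = windows[~has_nan]                        # drop the windows that contain a NaN
kept_ids = np.flatnonzero(~has_nan)              # original positions of the kept windows
print(strided.shape, clean.shape, kept_ids)
```

Note that dropping windows shifts the positions of all the following ones, so something like `kept_ids` is needed if a row of the filtered array has to be mapped back to its original window.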

In [None]:
# Build the input directly on the GPU and make sure it requires a gradient
lrp_input_torch = torch.tensor(
    lrp_input_strided, dtype=torch.float32, device='cuda', requires_grad=True
)
lrp_input_torch.shape

#### Load the trained model

In [None]:
#window_size = enc_artifact.metadata['w']
#window_size

In [None]:
#batch_size = enc_artifact.metadata['batch_size']
#batch_size

In [None]:
# Ensure model to be in GPU
model = enc_learner.model.to('cuda')

#### Option 1: composite (basic version)

> Using ```EpsilonPlusFlat``` composite

Create a composite instance

In [None]:
from zennit.composites import EpsilonPlusFlat
import copy

In [None]:
# create a composite instance
composite = EpsilonPlusFlat()

In [None]:
# compute the output and relevance within the composite's context
def compute_output_and_gradient(model, input_data):
    model.eval()
    # Keep a copy of the weights to rule out that the composite leaves the model
    # modified (it should not, since the context manager reverts its changes)
    original_state_dict = copy.deepcopy(model.state_dict())

    with composite.context(model) as modified_model:
        output = modified_model(input_data)
        # A single backward pass is enough; grad_outputs must match the output shape
        relevance = torch.autograd.grad(
            output, input_data,
            grad_outputs=torch.ones_like(output)
        )
    model.load_state_dict(original_state_dict)
    return output, relevance

In [None]:
# TODO: Check why NaNs appear
output, relevance = compute_output_and_gradient(model, lrp_input_torch)
relevance = relevance[0]
relevance.shape
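
As a side check on the backward pass, the sketch below uses a toy model (an assumption, not the encoder) whose output shape differs from its input shape. The `grad_outputs` argument of `torch.autograd.grad` has to be shaped like the output, while the returned gradient is shaped like the input, which is what lets us read it as a per-input relevance.

```python
import torch

torch.manual_seed(0)
# Toy stand-in for the encoder: 3 input features, 4 output channels
toy_model = torch.nn.Sequential(
    torch.nn.Conv1d(3, 4, kernel_size=3, padding=1),
    torch.nn.ReLU(),
).eval()

x = torch.randn(2, 3, 8, requires_grad=True)     # (samples, features, time steps)
output = toy_model(x)                            # (2, 4, 8), not the shape of x
grad, = torch.autograd.grad(output, x, grad_outputs=torch.ones_like(output))
print(output.shape, grad.shape)
```

When the model output happens to have the same shape as its input, `torch.ones_like` of either tensor gives the same result.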

##### Reduce relevance dimensions to get an array with a relevance per each feature

In [None]:
def reduce_dimensions(relevance):
    # Mean over the time axis: one value per sample and feature
    print(relevance.shape)
    importances = relevance.nanmean(dim=2)
    print(importances.shape)
    # Global mean over the samples: one value per feature
    importances_mean = importances.nanmean(dim=0).detach().cpu().numpy()
    print(importances_mean.shape)
    # Shift so that the smallest value is zero when there are negative values
    min_importance = np.nanmin(importances_mean)
    if min_importance < 0:
        importances_mean = importances_mean - min_importance
    print(np.nanmin(importances_mean))
    print(importances_mean)
    # Express each feature as a fraction of the total
    importances_sum = np.nansum(importances_mean)
    importances_percentage = importances_mean / importances_sum
    print(importances_percentage)
    return importances_percentage

In [None]:
importances = reduce_dimensions(relevance)
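
To see what this reduction does, here is the same sequence of steps on a tiny hand-made relevance array, written in NumPy. The values are invented for the example, and the shape `(n_samples, n_features, window_length)` is assumed.

```python
import numpy as np

# Toy relevance with shape (n_samples=2, n_features=3, window_length=2)
relevance = np.array([
    [[1.0, 3.0], [-1.0, -1.0], [2.0, 2.0]],
    [[3.0, 1.0], [-3.0, -1.0], [0.0, 4.0]],
])

per_sample = np.nanmean(relevance, axis=2)             # mean over time: (2, 3)
per_feature = np.nanmean(per_sample, axis=0)           # mean over samples: [2.0, -1.5, 2.0]
shifted = per_feature - min(per_feature.min(), 0.0)    # shift negatives: [3.5, 0.0, 3.5]
shares = shifted / shifted.sum()                       # fractions: [0.5, 0.0, 0.5]
print(shares)
```

The shift always maps the most negative feature to a share of exactly zero, even when its magnitude is comparable to the others, which is worth keeping in mind when reading the bar charts below.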

##### Visualize

###### Get feature names

In [None]:
features_names = list(df.columns)
features_names

###### Create an auxiliary function to draw a bar chart
> TODO: Think about what a heatmap would look like and implement it if possible

In [None]:
import matplotlib.pyplot as plt

In [None]:
def plot_features_importance(title, features_names, importances_percentage):
    # Create the bar chart
    plt.figure(figsize=(10, 6))
    plt.bar(features_names, importances_percentage)
    plt.xlabel('Features')
    plt.ylabel('Importance')
    plt.title(title + '| Features importance (%)')
    plt.xticks(rotation=45)
    plt.show()

In [None]:
plot_features_importance ('LRP', features_names, importances)

Thus, the two most meaningful features are:

In [None]:
def get_meaningful_ids(importances, features_names, numvars = 2):
    meaningful_ids = np.argsort(importances)[-numvars:]
    meaningful_features = [(i,features_names[i], importances[i]) for i in meaningful_ids]
    return meaningful_features, meaningful_ids

In [None]:
meaningful_features, meaningful_ids = get_meaningful_ids(importances, features_names, 2)
meaningful_features
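
A small example of the `np.argsort` selection used above, with invented importances and hypothetical feature names. `argsort` sorts in ascending order, so the last `numvars` indices come out from less to more important; reversing them gives a most-important-first ranking.

```python
import numpy as np

importances_toy = np.array([0.10, 0.50, 0.05, 0.35])
names_toy = ["f0", "f1", "f2", "f3"]                  # hypothetical feature names

top_ascending = np.argsort(importances_toy)[-2:]      # indices of the 2 largest, ascending
top_descending = top_ascending[::-1]                  # most important first
ranked = [(int(i), names_toy[i], float(importances_toy[i])) for i in top_descending]
print(ranked)
```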

#### Option 2: attributors 
> Using the ```SmoothGrad``` attributor as in the example
>
> Allows the use of other attribution-based XAI techniques
> 
> Not relevant as we want to use LRP, but tried as a check

In [None]:
from zennit.attribution import SmoothGrad

In [None]:
attributor = SmoothGrad(model, noise_level=0.1, n_iter=10)
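
The idea behind SmoothGrad can be sketched in a few lines of NumPy: average the gradient over several noisy copies of the input. This is only an illustration with an analytic gradient; the helper `smooth_grad`, the toy `grad_fn` and the choice of scaling the noise by the input range are assumptions of this sketch, not a description of Zennit's internals.

```python
import numpy as np

def smooth_grad(grad_fn, x, noise_level=0.1, n_iter=10, seed=0):
    """Average grad_fn over n_iter noisy copies of x."""
    rng = np.random.default_rng(seed)
    sigma = noise_level * (x.max() - x.min())     # noise scale relative to the input range
    grads = [grad_fn(x + rng.normal(scale=sigma, size=x.shape)) for _ in range(n_iter)]
    return np.mean(grads, axis=0)

x_toy = np.array([0.0, 1.0, 2.0])
grad_fn = lambda v: 2.0 * v                       # gradient of f(v) = sum(v ** 2)
smoothed = smooth_grad(grad_fn, x_toy, n_iter=2000)
print(smoothed)
```

For this toy function the gradient is linear, so the smoothed gradient stays close to the plain gradient `2 * x_toy`; for a real network the averaging is what removes part of the noise in the attribution.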

In [None]:
def compute_output_and_relevance_atributor(model, attributor, input):
    # we do not need a composite to compute vanilla SmoothGrad
    model.eval()
    with attributor:
        # gradient / relevance; the callable builds the output gradient from the output
        output, relevance = attributor(input, torch.ones_like)
    print('Attributor:', relevance[:2], relevance.shape)
    return output, relevance

In [None]:
_, relevance_smooth_grad = compute_output_and_relevance_atributor(model, attributor, lrp_input_torch)
print(relevance_smooth_grad.shape)

In [None]:
importances_attributor = reduce_dimensions(relevance_smooth_grad)

In [None]:
plot_features_importance('LRP | Attributor Smooth Grad', features_names, importances_attributor)

In [None]:
meaningful_features_attributors, meaningful_ids_attributors = get_meaningful_ids(importances_attributor, features_names)
meaningful_features_attributors

In [None]:
# Remember to compare
meaningful_features

#### Option 3: Canonizers & Attributor & Rule
> Using the ```SequentialMergeBatchNorm``` canonizer

> and the ```Gradient``` attributor

> and the ```EpsilonGammaBox``` composite, which bundles the rules

In [None]:
from zennit.canonizers import SequentialMergeBatchNorm
from zennit.attribution import Gradient
from zennit.composites import EpsilonGammaBox

In [None]:
model.eval()
canonizers = [SequentialMergeBatchNorm()]
composite = EpsilonGammaBox(low=-3., high=3., canonizers=canonizers)
attributor = Gradient(model, composite)

In [None]:
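def compute_output_and_relevance_canonizer_atributor(model, composite, attributor, input):
    # The attributor was built with the composite, so entering its context already
    # registers the composite and removes it on exit; registering it here as well
    # would register its hooks twice
    with attributor:
        output, relevance = attributor(input, torch.ones_like)
    return output, relevance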

In [None]:
_, relevance_canonizer_atributor_rule = compute_output_and_relevance_canonizer_atributor(model, composite, attributor, lrp_input_torch)
importances_canonizer_atributor_rule = reduce_dimensions(relevance_canonizer_atributor_rule)

In [None]:
plot_features_importance('LRP | Canonizer SequentialMergeBatchNorm', features_names, importances_canonizer_atributor_rule)

In [None]:
meaningful_features_canonizer_attributor_rule, meaningful_ids_canonizer_attributor_rule = get_meaningful_ids(importances_canonizer_atributor_rule, features_names)
meaningful_features_canonizer_attributor_rule

In [None]:
print("Summarise")
print("Composite meaningful features: ", meaningful_features)
print("Attributor meaningful features: ", meaningful_features_attributors)
print("Canonizer & Attributor & Rule meaningful features: ", meaningful_features_canonizer_attributor_rule)

#### Option 4: Canonizer & Rule
> The best option for our goal. Next step: select specific rules for our model (MVP)

> Using the ```SequentialMergeBatchNorm``` canonizer

> and the ```EpsilonGammaBox``` composite, which bundles the rules

In [None]:
from zennit.canonizers import SequentialMergeBatchNorm
from zennit.composites import EpsilonGammaBox

In [None]:
model.eval()
canonizer2 = SequentialMergeBatchNorm()
# canonizers must be passed as a list
composite2 = EpsilonGammaBox(low=-3., high=3., canonizers=[canonizer2])

In [None]:
# compute the output and gradient with the composite registered on the model
def compute_output_and_gradient_canonizer_rule(model, composite, input_data):
    model.eval()

    # Register the composite received as argument (not a global one)
    composite.register(model)
    try:
        output = model(input_data)
        relevance, = torch.autograd.grad(
            output, input_data,
            grad_outputs=torch.ones_like(output)
        )
    finally:
        # Always restore the original model, even if the pass fails
        composite.remove()
    return output, relevance

In [None]:
_, relevance_canonizer_rule = compute_output_and_gradient_canonizer_rule(model, composite2, lrp_input_torch)

In [None]:
importances_canonizer_rule = reduce_dimensions(relevance_canonizer_rule)

In [None]:
plot_features_importance('LRP | Canonizer SequentialMergeBatchNorm', features_names, importances_canonizer_rule)

In [None]:
meaningful_features_canonizer_rule, meaningful_ids_canonizer_rule = get_meaningful_ids(importances_canonizer_rule, features_names)
meaningful_features_canonizer_rule

In [None]:
print("Summarise")
print("Composite meaningful features: ", meaningful_features)
print("Attributor meaningful features: ", meaningful_features_attributors)
print("Canonizer & Attributor & Rule meaningful features: ", meaningful_features_canonizer_attributor_rule)
print("Canonizer & Rule meaningful features: ", meaningful_features_canonizer_rule)

## Final decision

For MVP, the module types are:

- Add
- BatchNorm1d
- Concat
- Conv1d
- Dropout
- MaxPool1d
- ReLU

The main ones for the LRP analysis are the following, summarised to check which rules and composites may be used:


| Layer Type  | Rule Recommendation | Composite Recommendation | Canonizer Recommendation | Notes |
|-------------|---------------------|--------------------------|-------------------------|-------|
| BatchNorm1d | - | - | SequentialMergeBatchNorm() to merge each batch norm into the linear layer that precedes it | Needed because LRP is not implementation-invariant with respect to batch norms. |
| Conv1d      | Epsilon rule | Epsilon - * | - | Further investigation needed to select the specific composite. |
| MaxPool1d   | - | Epsilon | - | Epsilon is stable, no change needed. |
| ReLU        | - | BetaSmooth; DeconvNet | - | DeconvNet may be more suited to visual computing. |

Thus, the final implementation should look like the ```compute_output_and_gradient_canonizer_rule``` version. The next step is to select the parameters for the epsilon composite.

## Embedding selection

In this section, select a subset of the points in the plot. By default, some random indices are selected first.

In [None]:
#!pip install -U plotly
#! pip install -U kaleido

In [None]:
#! mamba install canvas -c conda-forge | Did not work; use pip install canvas if needed | Under review
#! mamba install -y -c conda-forge ipympl==0.9.3 #0.5.1
#! mamba update -y -c conda-forge nbdime
#! conda install -y -c conda-forge ipywidgets
#! jupyter nbextension enable --py --sys-prefix ipympl

In [None]:
from IPython.display import display, clear_output, HTML as IPHTML
from ipywidgets import Button, Output, VBox, HBox, HTML, Layout, FloatSlider

import plotly.graph_objs as go
import plotly.offline as py
import plotly.io as pio
! pip install kaleido
import kaleido

### Default selection

In [None]:
selected_prjs_points_total = min(prjs.shape[0], 10)
selected_indices = np.random.permutation(lrp_input_torch.size(0))[:selected_prjs_points_total]
print(selected_indices)

### Interactive plot

In [None]:
# TODO: Ideally, things should be moved into functions / W&B so that the notebook starts right here, taking prjs from the projections array computed in the previous notebook instead of recomputing it

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import ipywidgets as widgets
from IPython.display import display
from functools import partial

In [None]:
# Create an instance of the InteractiveAnomalyPlot class
anomaly_plotter = InteractiveAnomalyPlot(selected_indices, threshold, True, w)

In [None]:
threshold

In [None]:
anomaly_plotter.plot_projections_clusters_interactive(
    prjs,clusters_labels, umap_parameters(config_dr, config_lrp), anomaly_scores, print_flag = False)

In [None]:
anomaly_plotter.selected_indices

In [None]:
#anomaly_scores

### Compute LRP

In [None]:
lrp_input_subset_torch = lrp_input_torch[selected_indices]
lrp_input_subset_torch.shape

In [None]:
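model.eval()
canonizer_subset = SequentialMergeBatchNorm()
# The function expects a composite, so wrap the canonizer in one
composite_subset = EpsilonGammaBox(low=-3., high=3., canonizers=[canonizer_subset])

In [None]:
_, relevance_subset = compute_output_and_gradient_canonizer_rule(model, composite_subset, lrp_input_subset_torch)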

In [None]:
importances_subset = reduce_dimensions(relevance_subset)

In [None]:
plot_features_importance('LRP | Canonizer SequentialMergeBatchNorm', features_names, importances_subset)

In [None]:
#meaningful_features_subset, meaningful_features_subset_ids = get_meaningful_ids(importances_canonizer_rule, features_names)
meaningful_features_subset, meaningful_features_subset_ids = get_meaningful_ids(importances_subset, features_names)

In [None]:
meaningful_features_subset

### Linking back points of the 2D projection to the original time series and plotting the associated windows
The variable ```selected_indices``` contains the indices of the points selected in the previous 2D projection. From these indices, we get the corresponding windows in the original space.
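
The mapping from a window index to rows of the original time series can be sketched as follows. It assumes that windows are built from the first row onwards with a constant stride and that no window has been dropped in between (for instance by the NaN filtering); the helper name `window_bounds` is hypothetical.

```python
def window_bounds(window_index, w, stride):
    """Return the [start, end) row range of the original series covered by a window."""
    start = window_index * stride
    return start, start + w

# Example with a window length of 30 and a stride of 5
bounds = [window_bounds(i, w=30, stride=5) for i in (0, 3, 10)]
print(bounds)
```

If windows were removed before computing the projections, the selected indices first have to be mapped back to the original window positions.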

In [None]:
anomaly_plotter.selected_indices

In [None]:
lrp_input.shape

In [None]:
# Create the interactive plot
ts_plot_interactive(df, anomaly_plotter.selected_indices, meaningful_features_subset_ids, w, stride = stride)

In [None]:
#| hide
run_lrp.finish()

In [None]:
beep(0.025)
beep(0.025)
beep(0.025)
print("Execution ended")