# Semi-Automated Embracing Approach Validation

This notebook walks through all the steps needed to validate the Semi-Automated Embracing Approach for (Privacy) Threat Modelling.

## Setup

Run the following cells before starting with your validation.

*Install the dependencies only once, then comment out the install cell after it has run. Module imports, however, must be executed on every run.*

In [None]:
#!pip install -r requirements.txt

In [None]:
import os
import pandas as pd
from itables import init_notebook_mode

from embracing import compute_semantic_similarity_scores, compute_semantic_similarity_scores_between_files

init_notebook_mode(all_interactive=True)

## Compute Semantic Similarity Scores



In [None]:
preliminary_threat_list_path = './data/validation/tool_final_threats.csv'
tool_min_score, tool_max_score = compute_semantic_similarity_scores(preliminary_threat_list_path, './data/tool_final_threats_ss_scores.csv')

In [None]:
preliminary_threat_list_path = './data/validation/vehits_final_threats.csv'
validator_min_score, validator_max_score = compute_semantic_similarity_scores(preliminary_threat_list_path, './data/vehits_final_threats_ss_scores.csv')

## Compute Semantic Similarity Scores for Validation

The following code computes the semantic similarity score between each final threat in the automated (tool) list and each threat in the validation list. For each tool threat, the maximum score is then taken as the comparison metric.
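
The "maximum over all pairs" idea described above can be illustrated with a minimal sketch. It is not the implementation inside `compute_semantic_similarity_scores_between_files`, whose internals are not shown in this notebook. The helper names `cosine_similarity` and `best_match_scores` are hypothetical, the toy 3-dimensional vectors only stand in for whatever text representation the tool actually uses, and cosine similarity is an assumed choice of measure.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors in this sketch
    return dot / (norm_u * norm_v)


def best_match_scores(tool_vectors, validator_vectors):
    """For each tool vector, return (index of best validator match, max score).

    Assumes validator_vectors is non-empty.
    """
    results = []
    for t in tool_vectors:
        scores = [cosine_similarity(t, v) for v in validator_vectors]
        best_idx = max(range(len(scores)), key=scores.__getitem__)
        results.append((best_idx, scores[best_idx]))
    return results


# Hypothetical toy vectors standing in for threat representations.
tool_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
validator_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

matches = best_match_scores(tool_vectors, validator_vectors)
for tool_idx, (val_idx, score) in enumerate(matches):
    print(f"tool threat {tool_idx} -> validator threat {val_idx} (score {score:.2f})")
```

Keeping the index of the best match alongside the score makes it possible to inspect which validation threat each tool threat was paired with, not only how similar the pair was.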

In [None]:
tool_output_path = './data/validation/tool_final_threats.csv'
validator_source_output_path = './data/validation/vehits_final_threats.csv'
output_comparison_ss_scores_path = './data/validation/comparison_ss_scores.csv'
min_score, max_score = compute_semantic_similarity_scores_between_files(tool_output_path, validator_source_output_path, output_comparison_ss_scores_path)

### Statistics


In [None]:
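# Load the comparison scores (assumes the comparison cell wrote this CSV and that it has a 'score' column).
ss_scores_df = pd.read_csv('./data/validation/comparison_ss_scores.csv')
# Recompute the score range only if the comparison cell was not run in this session.
if 'min_score' not in globals() or 'max_score' not in globals():
    min_score, max_score = ss_scores_df['score'].round(2).min(), ss_scores_df['score'].round(2).max()
ss_scores_df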

In [None]:
density_plot = ss_scores_df.plot.density(title=f'Similarity score range is between {min_score} and {max_score}')
density_plot.set_xlim(min_score, max_score)
hist_plot = ss_scores_df.plot.hist(title=f'Average similarity score is {ss_scores_df["score"].mean().round(2)}')


In [None]:
# Keep only rows scoring at most 0.999999, i.e. drop (near-)exact matches.
filtered_df = ss_scores_df.loc[ss_scores_df['score'] <= 0.999999]
filtered_df