# TEAM 3

This notebook walks through all the steps of the TEAM 3 algorithm within the SPADA methodology for threat modelling.

#### Setup

Run the following cells before starting your analysis.

*Install the dependencies only once, then comment the cell out again. Module imports, on the other hand, must be run in every session.*

In [None]:
#!pip install -r requirements.txt

In [None]:
import os
import pandas as pd
from ipywidgets import widgets
from itables import init_notebook_mode

from embracing_utils import compute_semantic_similarity_scores, merge_csv_files, filter_dataframe_by_threshold, find_synset_relations, synset_relations

init_notebook_mode(all_interactive=True)

### Input Threats Upload and Semantic Similarity Computation

Please upload the input threats list in CSV format (suggested path: `./data/TEAM 3/<round_no>/<file>.csv`).

Repeat this operation at the start of every new round: set the `round_no` variable to the name of the current round and the `input_threat_list_path` variable to the path of the threat list.

In [None]:
round_no = 'third'  # CHANGE ME!
input_threat_list_path = f"./data/TEAM 3/{round_no}/input_threats.csv"  # CHANGE ME!
# Create a folder for the current round (no error if it already exists)
os.makedirs(f"./data/TEAM 3/{round_no}", exist_ok=True)

You can now run the next cell, with the correct path, to compute the semantic similarity scores.

In [None]:
min_score, max_score = compute_semantic_similarity_scores(
    input_threat_list_path, f"./data/TEAM 3/{round_no}/semantic_similarity_scores.csv")

Alternatively, especially if the number of input threats is large, you can compute the semantic similarity scores by running the `compute_tuple_similarity_scores.py` script in a Terminal:

```sh
python3 compute_tuple_similarity_scores.py --k <k> --in_path ./data/input_threats.csv [--scores_path ./data/input_threats_semantic_similarity_scores.csv] --out_path ./data/output_similarity_scores.csv
```

Where:

- `--k <k>`: Specifies the tuple cardinality for similarity scoring.
- `--in_path`: Path to the input threats CSV file.
- `--scores_path`: Path to the precomputed semantic similarity scores CSV (only needed if k > 2).
- `--out_path`: Path to the output file for the computed similarity scores.

The script computes the semantic similarity scores and stores them at the output path provided (e.g., `./data/TEAM 3/{input_threats_filename}_ss_scores.csv`).
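
To give a feel for why the script is suggested for large inputs, the sketch below counts and enumerates the k-tuples a scorer would have to process. It assumes that tuples are unordered combinations of distinct threats, which the text above does not state explicitly. `count_tuples` and `enumerate_tuples` are hypothetical helpers and are not part of `compute_tuple_similarity_scores.py`; the threat labels are made up.

```python
from itertools import combinations
from math import comb


def count_tuples(n_threats: int, k: int) -> int:
    """Number of unordered k-tuples that can be drawn from n_threats items."""
    return comb(n_threats, k)


def enumerate_tuples(threats, k):
    """List every unordered k-tuple of the given threat labels."""
    return list(combinations(threats, k))


# Made-up threat labels, for illustration only.
threats = ["data tampering", "data leakage", "denial of service", "spoofed sensor"]
pairs = enumerate_tuples(threats, 2)

print(f"{len(pairs)} pairs for {len(threats)} threats")
print(f"200 threats: {count_tuples(200, 2)} pairs, {count_tuples(200, 3)} triples")
```

Under this assumption the workload grows quickly with both the number of threats and `k`, which is consistent with moving the computation to a terminal for larger lists.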

Once the script terminates, you can load the semantic similarity scores as follows:

In [None]:
ss_scores_df = pd.read_csv(f"./data/TEAM 3/{round_no}/semantic_similarity_scores.csv")
print(f'Total number of scored threat pairs for {round_no} round: {len(ss_scores_df.index)}')

### Embraceable Candidates Elicitation

Set your desired semantic similarity score threshold by running the following cell and adjusting the slider according to the number of final threats you are aiming for.

In [None]:
ss_threshold = widgets.FloatSlider(value=0.5,min=min_score, max=max_score, step=0.01, description='threshold:', readout_format='.2f')
ss_threshold

In [None]:
embraceable_candidates = filter_dataframe_by_threshold(ss_scores_df, 'score', ss_threshold.value)
print(f'The list of embraceable candidates above the threshold {round(ss_threshold.value, 2)} contains {len(embraceable_candidates)} pairs.')
embraceable_candidates
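
The effect of the threshold can be explored on a small table before settling on a value. The sketch below is only a guess at what a threshold filter of this kind does: it keeps the rows whose score is greater than or equal to the threshold. `filter_by_threshold` is a hypothetical stand-in, not the `filter_dataframe_by_threshold` helper from `embracing_utils`, and the data is invented.

```python
import pandas as pd


def filter_by_threshold(df: pd.DataFrame, column: str, threshold: float) -> pd.DataFrame:
    """Keep the rows whose value in `column` is >= threshold (assumed behaviour)."""
    return df[df[column] >= threshold]


# Invented pairs and scores, for illustration only.
scores = pd.DataFrame(
    {
        "sentence1": ["data tampering", "data leakage", "denial of service"],
        "sentence2": ["data manipulation", "spoofed sensor", "service disruption"],
        "score": [0.91, 0.42, 0.67],
    }
)

candidates = filter_by_threshold(scores, "score", 0.6)

# Number of surviving pairs for a few thresholds: a higher threshold keeps fewer pairs.
counts = {t: len(filter_by_threshold(scores, "score", t)) for t in (0.4, 0.6, 0.8)}
print(counts)
```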

Once you have reached the desired number of embraceable candidates, run the following cell to store the data and proceed to the next step.

In [None]:
embraceable_candidates.to_csv(
    f"./data/TEAM 3/{round_no}/embraceable_candidates_{round(ss_threshold.value, 2)}.csv",
    index=True,
    encoding="utf-8",
)
merge_csv_files(
    input_threat_list_path,
    f"./data/TEAM 3/{round_no}/embraceable_candidates_{round(ss_threshold.value, 2)}.csv",
    f"./data/{round_no}/tmp.csv",
)
embraceable_candidates = pd.read_csv(
    f"./data/TEAM 3/{round_no}/embraceable_candidates_{round(ss_threshold.value, 2)}.csv"
)
embraceable_candidates.set_index("index", drop=True, inplace=True)

### Threat Embracing

You can now investigate all the threat pairs whose score is greater than or equal to the chosen threshold. To display the table, run the following cell. Then iterate over the candidate pairs and annotate each embracing decision in an external (Excel-like) sheet.

In [None]:
embraceable_candidates

For each row of interest, specify its index and run the following cell to focus the analysis on that threat pair.

In [None]:
# Please specify the index of the pair you want to embrace.
index_to_embrace = 70  # CHANGE ME!
embraceable_candidates.iloc[embraceable_candidates.index==index_to_embrace]
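
Note that the index used here is a label, not a row position. The sketch below, built on an invented table, shows that a boolean mask on the index (as in the cell above) and a label lookup with `.loc` select the same row, whereas a positional lookup with the same number does not.

```python
import pandas as pd

# Invented candidate pairs; the index labels are not contiguous after filtering.
pairs = pd.DataFrame(
    {
        "sentence1": ["data tampering", "denial of service", "data leakage"],
        "sentence2": ["data manipulation", "service disruption", "information disclosure"],
        "score": [0.91, 0.67, 0.88],
    },
    index=pd.Index([12, 70, 95], name="index"),
)

by_mask = pairs.iloc[pairs.index == 70]  # boolean mask over the index labels
by_label = pairs.loc[[70]]               # label-based lookup, one-row DataFrame

# Position 70 does not exist in a three-row table, so a positional lookup fails.
try:
    pairs.iloc[[70]]
    positional_ok = True
except IndexError:
    positional_ok = False

print(by_mask.equals(by_label), positional_ok)
```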

Now that the analysis is focused on a specific threat pair, run the following cell to retrieve any automatically identified synset relations.

In [None]:
# Select the focused pair once and read both threat labels from it.
pair = embraceable_candidates.iloc[embraceable_candidates.index==index_to_embrace]
s1 = pair['sentence1'].values[0]
s2 = pair['sentence2'].values[0]
is_partof, is_typeof = find_synset_relations(s1, s2)
print(f'\nPart of relation(s) found: {is_partof}\tType of relation(s) found: {is_typeof}')
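
As a reminder of what the two relations mean, the toy sketch below checks a pair of terms against two tiny hand-written dictionaries: "type of" corresponds to a hypernym link and "part of" to a holonym link. The dictionaries, the `toy_relations` function, and its boolean return values are all invented for illustration; they say nothing about how `find_synset_relations` is implemented or what it returns.

```python
# Toy lexical relations, invented purely for illustration (not WordNet data).
HYPERNYMS = {  # term -> more general terms ("is a type of")
    "ransomware": {"malware"},
    "malware": {"software"},
}
HOLONYMS = {  # term -> wholes the term belongs to ("is a part of")
    "sensor": {"device"},
    "firmware": {"device"},
}


def toy_relations(term_a: str, term_b: str) -> tuple:
    """Return (is_part_of, is_type_of) for the two terms, in either direction."""
    is_part_of = term_b in HOLONYMS.get(term_a, set()) or term_a in HOLONYMS.get(term_b, set())
    is_type_of = term_b in HYPERNYMS.get(term_a, set()) or term_a in HYPERNYMS.get(term_b, set())
    return is_part_of, is_type_of


print(toy_relations("ransomware", "malware"))
print(toy_relations("device", "sensor"))
print(toy_relations("sensor", "malware"))
```

When such a relation holds between two threat labels, one label is typically the more general of the two, which is useful input when deciding how to word the embraced threat.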

To help choose the most appropriate wording and level of detail, run the last cell for an overview of the synset relations of the nouns identified in both threat labels.

In [None]:
synset_dict1 = synset_relations(embraceable_candidates.iloc[embraceable_candidates.index==index_to_embrace]['sentence1'].values[0])
synset_dict2 = synset_relations(embraceable_candidates.iloc[embraceable_candidates.index==index_to_embrace]['sentence2'].values[0])

focus_df = pd.concat([pd.DataFrame.from_dict(synset_dict1['terms']), pd.DataFrame.from_dict(synset_dict2['terms'])])
if not focus_df.empty:
    focus_df['synonyms'] = focus_df.get('synonyms').str.slice(0,3)
    focus_df['hypernyms [L1]'] = focus_df.get('hypernyms [L1]').str.slice(0,1)
    focus_df['hyponyms [L1]'] = focus_df.get('hyponyms [L1]').str.slice(0,1)
    focus_df['meronyms'] = focus_df.get('meronyms').str.slice(0,1)
    focus_df['holonyms'] = focus_df.get('holonyms').str.slice(0,1)
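    # A bare expression inside an `if` block is not rendered by Jupyter, so display it explicitly.
    display(focus_df)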
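
The `.str.slice` calls above shorten each cell rather than each column. The sketch below illustrates this on invented data, assuming the relation columns hold Python lists (if they held plain strings, the same call would cut characters instead of list items).

```python
import pandas as pd

# Invented terms and synonyms; each cell of "synonyms" is assumed to be a list.
terms = pd.DataFrame(
    {
        "term": ["server", "attack"],
        "synonyms": [["host", "waiter", "serving dish", "node"], ["onslaught"]],
    }
)

# Keep at most the first three synonyms per term; shorter lists are left unchanged.
terms["synonyms"] = terms["synonyms"].str.slice(0, 3)
print(terms)
```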

#### Store Synset Relations Extraction from Embraceable Candidates

You can also store the synset relations in batch by running the following cell, should you need them for a future analysis.

In [None]:
import contextlib

# Create a folder for the current round (no error if it already exists)
os.makedirs(f'./data/{round_no}/synset_relations', exist_ok=True)

for i, pair in embraceable_candidates.iterrows():
    s1, s2 = pair['sentence1'], pair['sentence2']

    # Capture everything printed while comparing the pair; the file is closed
    # and stdout is restored when the `with` block ends, even on errors.
    path = f'./data/{round_no}/synset_relations/pair_comparison_{i}.txt'
    with open(path, 'w', encoding='utf-8') as report, contextlib.redirect_stdout(report):
        is_partof, is_typeof = find_synset_relations(s1, s2)
        print(f'\nPart of relation(s) found: {is_partof}\tType of relation(s) found: {is_typeof}')

    synset_dict1 = synset_relations(s1)
    synset_dict2 = synset_relations(s2)

    focus_df = pd.concat([pd.DataFrame.from_dict(synset_dict1['terms']), pd.DataFrame.from_dict(synset_dict2['terms'])])
    if not focus_df.empty:
        # Keep only the first few entries of each relation to keep the table compact.
        focus_df['synonyms'] = focus_df.get('synonyms').str.slice(0,3)
        focus_df['hypernyms [L1]'] = focus_df.get('hypernyms [L1]').str.slice(0,1)
        focus_df['hyponyms [L1]'] = focus_df.get('hyponyms [L1]').str.slice(0,1)
        focus_df['meronyms'] = focus_df.get('meronyms').str.slice(0,1)
        focus_df['holonyms'] = focus_df.get('holonyms').str.slice(0,1)
        focus_df.to_csv(f'./data/{round_no}/synset_relations/single_terms_{i}.csv', index=False, encoding='utf-8')
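
The pattern of saving a helper's printed output to a file can be tried in isolation. The sketch below uses `contextlib.redirect_stdout` from the standard library together with a temporary folder; `noisy_step` is a hypothetical stand-in for any function that prints its findings, not one of the `embracing_utils` helpers.

```python
import contextlib
import os
import tempfile


def noisy_step(label: str) -> int:
    """Stand-in for a helper that prints its findings (hypothetical)."""
    print(f"relations for {label}")
    return len(label)


out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "pair_comparison_0.txt")

# Everything printed inside the block goes to the file; stdout is restored
# and the file is closed when the block ends, even if an exception is raised.
with open(path, "w", encoding="utf-8") as report, contextlib.redirect_stdout(report):
    result = noisy_step("pair 0")

with open(path, encoding="utf-8") as report:
    captured = report.read()

print(repr(captured), result)
```

Compared with swapping `sys.stdout` by hand, the context manager leaves no open file handles behind and cannot leave stdout redirected after an error.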