# Risk Atlas Nexus: Preparing taxonomy mappings

## Goal: prepare your own mapping files 
This notebook explains how mapping files work and helps you prepare your own mapping file for the risks in a given taxonomy.

## Dependencies

Tip: Make sure you have followed the installation instructions for the risk_atlas_nexus library.

```
git clone git@github.com:IBM/risk-atlas-nexus.git
cd risk-atlas-nexus
python -m venv vrisk-atlas-nexus
source vrisk-atlas-nexus/bin/activate
pip install -e .
```


In [None]:
import os

from sssom_schema import Mapping, MappingSet
from sssom.sssom_document import MappingSetDocument 
from sssom.util import MappingSetDataFrame 
from sssom.writers import write_table
from curies import Converter
from enum import Enum

from risk_atlas_nexus import RiskAtlasNexus


## Introduction

### How are mappings stored in Risk Atlas Nexus?
To express semantically meaningful mappings between risks from different taxonomies, Risk Atlas Nexus uses
the [Simple Standard for Sharing Ontological Mappings (SSSOM)](https://academic.oup.com/database/article/doi/10.1093/database/baac035/6591806).
The mappings are maintained in SSSOM TSV files and are converted to LinkML data
YAML using Python helper scripts.

### Anatomy of a TSV file
An SSSOM/TSV file contains **one** mapping set object, composed of **two** different parts:
- the metadata block, which contains essentially all the slots of a [MappingSet](https://mapping-commons.github.io/sssom/MappingSet/) class except the mappings slot;
- the mappings block (also called the TSV section), which contains the individual mappings.
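
The two parts can be told apart mechanically: in the SSSOM/TSV format, metadata lines are prefixed with `#`, and the first unprefixed line is the header of the tab-separated table. The following is a minimal sketch using only the standard library. The sample content, the `example.org` URL, and the helper name `split_sssom_tsv` are illustrative assumptions, not part of Risk Atlas Nexus or `sssom-py`.

```python
import csv
import io

# Hypothetical SSSOM/TSV content: "#"-prefixed metadata lines (YAML once the
# prefix is removed), followed by a tab-separated table with a header row.
SAMPLE_TSV = (
    "#curie_map:\n"
    "#  new_prefix: https://github.com/ibm/risk-atlas-nexus\n"
    "#  skos: http://www.w3.org/2004/02/skos/core#\n"
    "#mapping_set_id: https://example.org/my_mapping.tsv\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "new_prefix:my-risk-1-id\tskos:relatedMatch\t"
    "ibm-risk-atlas:atlas-harmful-output\tsemapv:ManualMappingCuration\n"
)


def split_sssom_tsv(text):
    """Split SSSOM/TSV text into (metadata YAML text, list of row dicts)."""
    metadata_lines = []
    table_lines = []
    in_metadata = True
    for line in text.splitlines():
        if in_metadata and line.startswith("#"):
            # Drop the leading "#" to recover the YAML metadata line.
            metadata_lines.append(line[1:])
        else:
            # The first unprefixed line ends the metadata block.
            in_metadata = False
            if line.strip():
                table_lines.append(line)
    reader = csv.DictReader(io.StringIO("\n".join(table_lines)), delimiter="\t")
    return "\n".join(metadata_lines), list(reader)


metadata_text, rows = split_sssom_tsv(SAMPLE_TSV)
print(metadata_text)
print(rows[0]["subject_id"], rows[0]["predicate_id"], rows[0]["object_id"])
```

For real files, prefer the parsers shipped with `sssom-py`; this sketch only illustrates the layout described above.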

#### Find out more
- Read about [The SSSOM/TSV serialisation format](https://mapping-commons.github.io/sssom/spec-formats-tsv/)

## Scenario: prepare new mapping file
Consider a case where you would like to generate mappings between your new list of risks and the risks already in the nexus graph.
In this notebook we can see how to prepare a TSV file either:
 1. Manually
 2. Experimental: semi-automatically, with the aid of library functions as shown below

Note: In both cases, it is strongly recommended that mappings be carefully reviewed before they are used or contributed to the Risk Atlas Nexus project.


# Helper functions

A few utility functions to generate the mapping block output are provided below.

In [None]:
class MappingMethod(Enum):
    SEMANTIC = "SEMANTIC"
    RITS_INFERENCE = "RITS_INFERENCE"

def prepare_mapping_metadata(cm) -> MappingSet:
    """Build the metadata block (a MappingSet without mappings) from the parsed YAML dict."""
    mapping_set_metadata = MappingSet(license=cm["license"], curie_map=cm["curie_map"], mapping_set_id=cm["mapping_set_id"], mapping_set_description=cm["mapping_set_description"], mapping_date=cm["mapping_date"])
    return mapping_set_metadata

def prepare_mapping_block(new_risks, existing_risks, new_prefix, mapping_method=MappingMethod.SEMANTIC):
    """Ask the library to propose mappings from new_risks to existing_risks."""
    ran = RiskAtlasNexus()
    mappings = ran.generate_proposed_mappings(new_risks=new_risks, existing_risks=existing_risks, inference_engine=None, new_prefix=new_prefix, mapping_method=mapping_method)
    return mappings

def combine_blocks_and_write_to_file(cm, metadata, mappings, path):
    """Attach the mappings to the metadata and write the result as an SSSOM/TSV file."""
    metadata.mappings = mappings
    # The converter expands and compacts CURIEs using the prefixes in the curie map
    converter = Converter.from_prefix_map(cm["curie_map"])
    document = MappingSetDocument(mapping_set=metadata, converter=converter)
    print("\n# The mapping set document instance has been prepared.")
    msdf = MappingSetDataFrame.from_mapping_set_document(document)
    print("\n# The mapping set dataframe instance has been prepared.")
    with open(path, "w") as tmp_file:
        write_table(msdf, tmp_file)


## Creating a TSV file:

### Prepare the metadata block
The variables declared in YAML in the cell below are used to create a MappingSet instance, which holds the mapping metadata. Edit them for your case, and remember to add your new prefix to the CURIE map (`curie_map`).

In [None]:
import yaml
cm = yaml.safe_load("""
curie_map:
 nistai: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
 ibmairisk: https://www.ibm.com/docs/en/watsonx/saas?topic=
 semapv: https://w3id.org/semapv/vocab/
 skos: http://www.w3.org/2004/02/skos/core#
 new_prefix: https://github.com/ibm/risk-atlas-nexus
mapping_set_id: https://github.com/IBM/risk-atlas-nexus/tree/main/src/data/mappings/my_mapping.tsv
mapping_set_description: Mapping from IBM AI Risk Atlas to NIST RMF Gen AI Profile
license: https://www.apache.org/licenses/LICENSE-2.0.html
mapping_date: "2025-01-29"
""")

print(f"\n# The YAML you will use has been prepared.") 
print(cm)

In [None]:
ms_metadata = prepare_mapping_metadata(cm)
print(f"\n# The mapping set metadata instance has been prepared.") 
ms_metadata

## Manual creation
You can choose to prepare a list of mappings manually, to populate the mapping block. These should be in [Mapping](https://mapping-commons.github.io/sssom/Mapping/) format.

In [None]:
# Prepare the mapping block (manual_mb)
m1 = Mapping(predicate_id='skos:relatedMatch', mapping_justification='semapv:ManualMappingCuration', subject_id='new_prefix:my-risk-1-id', subject_label='Violent Crimes', object_id='ibm-risk-atlas:atlas-harmful-output', object_label='Harmful output', author_id=['my_author_email_address'], mapping_date='2025-03-31', comment='A sample mapping')
m2 = Mapping(predicate_id='rdfs:seeAlso', mapping_justification='semapv:ManualMappingCuration', subject_id='new_prefix:my-risk-2-id', subject_label='Nonviolent Crimes', object_id='ibm-risk-atlas:atlas-harmful-output', object_label='Harmful output', author_id=['my_author_email_address'], mapping_date='2025-03-31', comment='A sample mapping')
manual_mb = [m1, m2]

# bring it together with metadata and write to file
tmp_path = os.path.join("test_write_sssom_dataframe_manual.tsv")
combine_blocks_and_write_to_file(cm=cm, metadata=ms_metadata, mappings=manual_mb, path=tmp_path)


## Auto creation

Alternatively, you can prepare a list of mappings semi-automatically, using library methods, to populate the mapping block. This takes as input two lists of risks which are to be mapped to each other.

Two methods are available to propose mappings:
- Semantic (queries an embedding of available risks)
- Inference (LLM query to find if risks might be related)
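
To give an intuition for similarity-based proposals, here is a toy sketch that scores each new risk against the existing risks and keeps the best match above a threshold. It deliberately swaps the embedding model for a simple bag-of-words cosine similarity so that it runs with the standard library alone. It is not the library's implementation, and the function names, risk identifiers, descriptions, and threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word occurrences (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propose_candidates(new_risks, existing_risks, threshold=0.3):
    """Return (new_id, best_existing_id, score) for matches at or above threshold.

    Both arguments are dicts of risk id -> description (hypothetical shape).
    """
    existing_vectors = {rid: bag_of_words(t) for rid, t in existing_risks.items()}
    proposals = []
    for new_id, text in new_risks.items():
        vector = bag_of_words(text)
        best_id, best_score = None, 0.0
        for rid, existing_vector in existing_vectors.items():
            score = cosine_similarity(vector, existing_vector)
            if score > best_score:
                best_id, best_score = rid, score
        if best_id is not None and best_score >= threshold:
            proposals.append((new_id, best_id, round(best_score, 3)))
    return proposals


# Made-up descriptions, for illustration only
new = {"my-risk-1": "violent crimes and harmful content"}
existing = {
    "atlas-harmful-output": "model produces harmful output or harmful content",
    "atlas-data-bias": "historical bias in training data",
}
print(propose_candidates(new, existing))
```

Whatever method produces the candidates, they remain proposals: a score above the threshold indicates textual or semantic closeness, not a verified relationship between the risks.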


In [None]:
# Set up Risk Atlas Nexus with all risks or the subset of risks you want to map to.
# In this case, specify IBM AI Risk Atlas only
ran = RiskAtlasNexus()
all_risks = ran.get_all_risks(taxonomy="ibm-risk-atlas")

print(f"\n# The taxonomy ibm-risk-atlas has {len(all_risks)} risks you can map to.") # 67
print(all_risks[:2])

# Next, load only the risks you are proposing to map
# (this needs a second Risk Atlas Nexus instance if they are not yet in the Atlas)

# If your risks exist already in the Atlas, just do:
my_taxonomy_name = "ailuminate-v1.0" # for example
risks_to_map = ran.get_all_risks(taxonomy=my_taxonomy_name)

# else if they do not yet exist, use the lines below
'''
# Create an instance which extends the graph with your custom definitions
my_base_dir='<my_user_input_dir_path>' # path where your custom yaml is
my_extended_ran = RiskAtlasNexus(base_dir=my_base_dir)
risks_to_map = my_extended_ran.get_all_risks(taxonomy="<my-taxonomy-name>")
'''

print(f"\n# The taxonomy {my_taxonomy_name} has {len(risks_to_map)} risks to map.")
print(risks_to_map[:2])

auto_semantic_mb = prepare_mapping_block(risks_to_map, all_risks, "new_prefix", "SEMANTIC")
print(f"\n# The mapping set block instance has been prepared.") 

# bring it together with metadata and write to file
tmp_path = os.path.join("test_write_sssom_dataframe_automatic_semantic.tsv")
combine_blocks_and_write_to_file(cm=cm, metadata=ms_metadata, mappings=auto_semantic_mb, path=tmp_path)


# Next steps

1. Verify the mappings
2. Lift them to YAML format
3. Save them in the [knowledge graph data mapping folder](https://github.com/ibm/risk-atlas-nexus/src/risk_atlas_nexus/data/knowledge_graph/mapping/)
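
As one possible aid for step 1, the sketch below lists CURIE prefixes that appear in a set of mappings but are missing from the CURIE map. It works on plain dictionaries and the standard library. The helper name `undeclared_prefixes`, the sample data, and the `other_prefix` prefix are hypothetical. Whether an undeclared prefix is actually an error depends on your tooling, since some prefixes may be built in, so treat the result as a prompt for manual review.

```python
# Fields of a mapping that are expected to hold CURIEs
CURIE_FIELDS = ("subject_id", "predicate_id", "object_id", "mapping_justification")


def undeclared_prefixes(curie_map, mappings):
    """Return the sorted prefixes used in mappings but absent from curie_map."""
    used = set()
    for mapping in mappings:
        for field in CURIE_FIELDS:
            value = mapping.get(field, "")
            # Skip empty values and full IRIs; only "prefix:local" forms count
            if ":" in value and not value.startswith(("http://", "https://")):
                used.add(value.split(":", 1)[0])
    return sorted(used - set(curie_map))


# Hypothetical data, for illustration only
sample_curie_map = {
    "new_prefix": "https://github.com/ibm/risk-atlas-nexus",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "semapv": "https://w3id.org/semapv/vocab/",
}
sample_mappings = [
    {
        "subject_id": "new_prefix:my-risk-1-id",
        "predicate_id": "skos:relatedMatch",
        "object_id": "other_prefix:some-risk-id",
        "mapping_justification": "semapv:ManualMappingCuration",
    }
]
print(undeclared_prefixes(sample_curie_map, sample_mappings))
```

A check like this only covers the prefixes. The semantic correctness of each proposed mapping still needs a human reviewer.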
