# Risk Atlas Nexus: Preparing taxonomy mappings

## Goal: prepare your own mapping files 
This notebook explains how mapping files are structured and helps you prepare your own mapping file for risks from a given taxonomy.

## Dependencies

Tip: Ensure you have followed the installation instructions for the `risk_atlas_nexus` library:

```
git clone git@github.com:IBM/risk-atlas-nexus.git
cd risk-atlas-nexus
python -m venv vrisk-atlas-nexus
source vrisk-atlas-nexus/bin/activate
pip install -e .
```


In [1]:
#!pip install txtai
import pandas as pd
from txtai import Embeddings
from sssom_schema import Mapping, MappingSet, Prefix, PrefixPrefixName
from sssom.sssom_document import MappingSetDocument 
from sssom.util import MappingSetDataFrame 
from curies import Converter

from risk_atlas_nexus.ai_risk_ontology.datamodel.ai_risk_ontology import Risk
from risk_atlas_nexus import RiskAtlasNexus




## Introduction

### How are mappings stored in Risk Atlas Nexus?
To express semantically meaningful mappings between risks from different taxonomies, Risk Atlas Nexus uses the [Simple Standard for Sharing Ontological Mappings (SSSOM)](https://academic.oup.com/database/article/doi/10.1093/database/baac035/6591806). The mappings are maintained in SSSOM TSV files and are converted to LinkML data YAML using Python helper scripts.

### Anatomy of a TSV file
An SSSOM/TSV file contains **one** mapping set object, composed of **two** different parts:
- the metadata block, which contains essentially all the slots of a [MappingSet](https://mapping-commons.github.io/sssom/MappingSet/) class except the mappings slot;
- the mappings block (also called the TSV section), which contains the individual mappings.
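
To make the two-part layout concrete, the sketch below splits a small SSSOM/TSV string into its metadata block (the leading lines prefixed with `#`) and its mappings block, using only the standard library. The sample content and the helper name `split_sssom_tsv` are illustrative assumptions, not part of Risk Atlas Nexus; for real files, the parsers in the `sssom` package are the better choice.

```python
import csv
import io

# Illustrative sample: a tiny SSSOM/TSV document (all values are made up).
SAMPLE_TSV = (
    "#curie_map:\n"
    "#  ex: https://example.org/risks/\n"
    "#  skos: http://www.w3.org/2004/02/skos/core#\n"
    "#  semapv: https://w3id.org/semapv/vocab/\n"
    "#mapping_set_id: https://example.org/mappings/sample.tsv\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "ex:risk-a\tskos:relatedMatch\tex:risk-b\tsemapv:ManualMappingCuration\n"
)


def split_sssom_tsv(text):
    """Return (metadata_yaml, rows) for an SSSOM/TSV string.

    Hypothetical helper: the metadata block is the run of leading lines
    that start with '#'; everything after it is a regular TSV table.
    """
    lines = text.splitlines()
    n_meta = 0
    while n_meta < len(lines) and lines[n_meta].startswith("#"):
        n_meta += 1
    # Drop the leading '#' to recover the embedded YAML text.
    metadata_yaml = "\n".join(line[1:] for line in lines[:n_meta])
    table = "\n".join(lines[n_meta:])
    rows = list(csv.DictReader(io.StringIO(table), delimiter="\t"))
    return metadata_yaml, rows


metadata_yaml, rows = split_sssom_tsv(SAMPLE_TSV)
print(metadata_yaml)
print(rows[0]["subject_id"], rows[0]["predicate_id"], rows[0]["object_id"])
```

The recovered metadata text is plain YAML, so it could be passed to `yaml.safe_load` to obtain the same kind of dictionary that this notebook builds by hand further down.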

#### Find out more
- Read about [The SSSOM/TSV serialisation format](https://mapping-commons.github.io/sssom/spec-formats-tsv/)


The overall workflow for adding mappings is:

1. Prepare a TSV file either
    1. manually or
    2. semi-automatically, with aid of a notebook.
2. Verify the mappings
3. Prepare the YAML mapping files either
    1. manually or
    2. automatically with `make lift_mappings_from_tsv`.
       Ensure the entries comply with [the schema](../ontology/index.md).
4. Save them in the [knowledge graph data mapping folder](https://github.com/ibm/risk-atlas-nexus/src/risk_atlas_nexus/data/knowledge_graph/mapping/).
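
Part of the verification in step 2 can be automated. The sketch below is one possible check, not a Risk Atlas Nexus feature: it flags rows where a core column is empty or where a CURIE uses a prefix that is not declared in the `curie_map`. The helper name `find_mapping_problems`, the choice of columns treated as mandatory, and the sample rows are all assumptions made for illustration.

```python
import csv
import io

# Core columns this sketch treats as mandatory for every mapping row.
REQUIRED_COLUMNS = ("subject_id", "predicate_id", "object_id", "mapping_justification")


def find_mapping_problems(rows, known_prefixes):
    """Return a list of human-readable problems found in mapping rows.

    Hypothetical helper: checks that the required columns are filled and
    that every CURIE uses a prefix declared in the curie_map.
    """
    problems = []
    for number, row in enumerate(rows, start=1):
        for column in REQUIRED_COLUMNS:
            value = (row.get(column) or "").strip()
            if not value:
                problems.append(f"row {number}: missing {column}")
                continue
            prefix = value.split(":", 1)[0]
            if ":" not in value or prefix not in known_prefixes:
                problems.append(f"row {number}: undeclared prefix in {column}: {value}")
    return problems


# Illustrative data: the second row has an undeclared prefix and an empty justification.
table = (
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "ex:risk-a\tskos:relatedMatch\tex:risk-b\tsemapv:ManualMappingCuration\n"
    "other:risk-c\tskos:broadMatch\tex:risk-d\t\n"
)
rows = list(csv.DictReader(io.StringIO(table), delimiter="\t"))
problems = find_mapping_problems(rows, known_prefixes={"ex", "skos", "semapv"})
for problem in problems:
    print(problem)
```

A check like this only catches structural slips; whether a proposed `skos:relatedMatch` is actually meaningful still needs a human reviewer.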


## Helper functions

The utility functions below build the metadata block and the mappings block.

In [2]:
from typing import Dict, List
from enum import Enum
from linkml_runtime.utils.metamodelcore import URI
import datetime
import re

class MappingMethod(Enum):
    SEMANTIC = "SEMANTIC"
    INFERENCE = "INFERENCE"


def prepare_mapping_metadata(cm) -> MappingSet:
    """Build the metadata block: a MappingSet that does not yet hold any mappings."""
    return MappingSet(
        license=cm["license"],
        curie_map=cm["curie_map"],
        mapping_set_id=cm["mapping_set_id"],
        mapping_set_description=cm["mapping_set_description"],
        mapping_date=cm["mapping_date"],
    )


def prepare_mapping_block(new_risks, existing_risks, new_prefix, mapping_method=MappingMethod.SEMANTIC):
    """Propose mappings from new_risks to existing_risks (no inference engine is passed)."""
    ran = RiskAtlasNexus()
    return ran.generate_proposed_mappings(
        new_risks=new_risks,
        existing_risks=existing_risks,
        inference_engine=None,
        new_prefix=new_prefix,
        mapping_method=mapping_method,
    )


## Prepare the metadata block
The variables declared in YAML in the cell below are used to create a `MappingSet` instance that holds the mapping metadata. Edit them for your case, and add the prefix of your new taxonomy to `curie_map`.

In [3]:
import yaml
cm = yaml.safe_load("""
curie_map:
 nistai: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
 ibmairisk: https://www.ibm.com/docs/en/watsonx/saas?topic=
 semapv: https://w3id.org/semapv/vocab/
 skos: http://www.w3.org/2004/02/skos/core#
mapping_set_id: https://github.com/IBM/risk-atlas-nexus/tree/main/src/data/mappings/ibm2nistgenai.tsv
mapping_set_description: Mapping from IBM AI Risk Atlas to NIST RMF Gen AI Profile
license: https://www.apache.org/licenses/LICENSE-2.0.html
mapping_date: "2025-01-29"
""")

print(f"\n# The YAML you will use has been prepared.") 
print(cm)


# The YAML you will use has been prepared.
{'curie_map': {'nistai': 'https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf', 'ibmairisk': 'https://www.ibm.com/docs/en/watsonx/saas?topic=', 'semapv': 'https://w3id.org/semapv/vocab/', 'skos': 'http://www.w3.org/2004/02/skos/core#'}, 'mapping_set_id': 'https://github.com/IBM/risk-atlas-nexus/tree/main/src/data/mappings/ibm2nistgenai.tsv', 'mapping_set_description': 'Mapping from IBM AI Risk Atlas to NIST RMF Gen AI Profile', 'license': 'https://www.apache.org/licenses/LICENSE-2.0.html', 'mapping_date': '2025-01-29'}
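
The `curie_map` above matters because every identifier in the mappings block is a CURIE of the form `prefix:local_id`, which is only resolvable if its prefix is declared. A minimal sketch of that expansion, assuming plain string concatenation; the helper `expand_curie` is hypothetical, and later in this notebook the `curies` `Converter` takes on this role for real data.

```python
# Prefix map copied from the YAML above.
curie_map = {
    "nistai": "https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf",
    "ibmairisk": "https://www.ibm.com/docs/en/watsonx/saas?topic=",
    "semapv": "https://w3id.org/semapv/vocab/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand_curie(curie, prefix_map):
    """Expand 'prefix:local_id' into a full URI (hypothetical helper)."""
    prefix, separator, local_id = curie.partition(":")
    if not separator or prefix not in prefix_map:
        raise KeyError(f"No prefix declared for {curie!r}")
    return prefix_map[prefix] + local_id


print(expand_curie("skos:relatedMatch", curie_map))

# A prefix that is missing from the map cannot be expanded.
try:
    expand_curie("new_prefix:ail-violent-crimes", curie_map)
except KeyError as error:
    print(f"Add the missing prefix to curie_map first: {error}")
```

This is why the new taxonomy's prefix belongs in `curie_map` before the mapping set is written out.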


In [4]:
ms_metadata = prepare_mapping_metadata(cm)
print(f"\n# The mapping set metadata instance has been prepared.") 
ms_metadata


# The mapping set metadata instance has been prepared.


MappingSet(mapping_set_id='https://github.com/IBM/risk-atlas-nexus/tree/main/src/data/mappings/ibm2nistgenai.tsv', license='https://www.apache.org/licenses/LICENSE-2.0.html', curie_map={'nistai': Prefix(prefix_name='nistai', prefix_url='https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf'), 'ibmairisk': Prefix(prefix_name='ibmairisk', prefix_url='https://www.ibm.com/docs/en/watsonx/saas?topic='), 'semapv': Prefix(prefix_name='semapv', prefix_url='https://w3id.org/semapv/vocab/'), 'skos': Prefix(prefix_name='skos', prefix_url='http://www.w3.org/2004/02/skos/core#')}, mappings=[], mapping_set_version=None, mapping_set_source=[], mapping_set_title=None, mapping_set_description='Mapping from IBM AI Risk Atlas to NIST RMF Gen AI Profile', creator_id=[], creator_label=[], subject_type=None, subject_source=None, subject_source_version=None, object_type=None, object_source=None, object_source_version=None, mapping_provider=None, mapping_tool=None, mapping_tool_version=None, mapping_date='20


## Prepare the mappings block

In [5]:
# Set up Risk Atlas Nexus with all risks or the subset of risks you want to map to.
# In this case, specify IBM AI Risk Atlas only
ran = RiskAtlasNexus()
all_risks = ran.get_all_risks(taxonomy="ibm-risk-atlas")

print(f"\n# The taxonomy ibm-risk-atlas has {len(all_risks)} risks you can map to.") # 67
print(all_risks[:2])


[2025-03-31 21:54:44:543] - INFO - RiskAtlasNexus - Created RiskAtlasNexus instance. Base_dir: None



# The taxonomy ibm-risk-atlas has 67 risks you can map to.
[Risk(id='atlas-non-disclosure', name='Non-disclosure', description='Content might not be clearly disclosed as AI generated.', url='https://www.ibm.com/docs/en/watsonx/saas?topic=SSYOK8/wsj/ai-risk-atlas/non-disclosure.html', dateCreated=datetime.date(2024, 3, 6), dateModified=datetime.date(2025, 2, 10), isDetectedBy=None, hasRelatedAction=None, isDefinedByTaxonomy='ibm-risk-atlas', isPartOf='ibm-risk-atlas-misuse', closeMatch=None, exactMatch=None, broadMatch=['nist-human-ai-configuration'], narrowMatch=None, relatedMatch=None, detectsRiskConcept=None, tag='non-disclosure', type='output', phase=None, descriptor='specific', concern='Users must be notified when they are interacting with an AI system. Not disclosing the AI-authored content can result in a lack of transparency.'), Risk(id='atlas-data-transparency', name='Lack of training data transparency', description="Without accurate documentation on how a model's data was col

In [6]:
# Load only the risks you are proposing to map.

# If they already exist in the Atlas, just do:
my_taxonomy_name = "ailuminate-v1.0"
risks_to_map = ran.get_all_risks(taxonomy=my_taxonomy_name)

# Otherwise, use the lines below.
'''
# Create an instance which extends the graph with your custom definitions
my_base_dir='<my_user_input_dir_path>'
my_extended_ran = RiskAtlasNexus(base_dir=my_base_dir)
risks_to_map = my_extended_ran.get_all_risks(taxonomy="<my-taxonomy-name>")
'''

print(f"\n# The taxonomy {my_taxonomy_name} has {len(risks_to_map)} risks to map.")
print(risks_to_map[:2])


# The taxonomy ailuminate-v1.0 has 12 risks to map.
[Risk(id='ail-violent-crimes', name='Violent Crimes', description='Acceptable responses may describe or contain information about violent crimes. Unacceptable responses will enable,  encourage, or endorse the commission of violent crimes, such as the following: Unlawful violence toward people  (e.g., terrorism, genocide, murder, hate-motivated violent crimes, child abuse, assault, battery, or kidnapping),  Unlawful violence toward animals (e.g., animal abuse).', url=None, dateCreated=None, dateModified=None, isDetectedBy=None, hasRelatedAction=None, isDefinedByTaxonomy='ailuminate-v1.0', isPartOf='ail-physical-hazards', closeMatch=None, exactMatch=None, broadMatch=None, narrowMatch=None, relatedMatch=None, detectsRiskConcept=None, tag=None, type=None, phase=None, descriptor=None, concern=None), Risk(id='ail-sex-related-crimes', name='Sex-Related Crimes', description='Acceptable responses may describe or contain information

In [7]:
mb = prepare_mapping_block(risks_to_map, all_risks, "new_prefix", "SEMANTIC")
print(f"\n# The mapping set block instance has been prepared.") 
mb

[2025-03-31 21:54:44:713] - INFO - RiskAtlasNexus - Created RiskAtlasNexus instance. Base_dir: None



# The mapping set block instance has been prepared.


[Mapping(predicate_id='skos:relatedMatch', mapping_justification='semapv:SemanticSimilarityThresholdMatching', subject_id='new_prefix:ail-violent-crimes', subject_label='Violent Crimes', subject_category=None, predicate_label=None, predicate_modifier=None, object_id='ibm-risk-atlas:atlas-harmful-output', object_label=' Harmful output', object_category=None, author_id=['Risk_Atlas_Nexus_System'], author_label=[], reviewer_id=[], reviewer_label=[], creator_id=[], creator_label=[], license=None, subject_type=None, subject_source=None, subject_source_version=None, object_type=None, object_source=None, object_source_version=None, mapping_provider=None, mapping_source=None, mapping_cardinality=None, mapping_tool=None, mapping_tool_version=None, mapping_date='2025-03-31', publication_date=None, confidence=None, curation_rule=[], curation_rule_text=[], subject_match_field=[], object_match_field=[], match_string=[], subject_preprocessing=[], object_preprocessing=[], similarity_score=54.0, simil
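
The `semapv:SemanticSimilarityThresholdMatching` justification in the output above indicates that a candidate is proposed when a similarity score clears a threshold. The toy sketch below illustrates that idea only: it assumes a bag-of-words cosine similarity in place of an embedding model, a hypothetical threshold of 0.3, and made-up risk descriptions. It is not how `generate_proposed_mappings` is implemented.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(re.findall(r"[a-z]+", text_a.lower()))
    counts_b = Counter(re.findall(r"[a-z]+", text_b.lower()))
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propose_matches(new_risks, existing_risks, threshold=0.3):
    """Pair each new risk with its best-scoring existing risk, if above threshold."""
    proposals = []
    for new_id, new_text in new_risks.items():
        best_id, best_score = max(
            ((old_id, cosine_similarity(new_text, old_text))
             for old_id, old_text in existing_risks.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            proposals.append((new_id, "skos:relatedMatch", best_id, round(best_score, 2)))
    return proposals


# Made-up descriptions, for illustration only.
new_risks = {"ex:violent-content": "responses that encourage violent crimes"}
existing_risks = {
    "ex:harmful-output": "model responses that encourage violent or harmful acts",
    "ex:non-disclosure": "content not disclosed as AI generated",
}
proposals = propose_matches(new_risks, existing_risks)
print(proposals)
```

Raising the threshold trades recall for precision: fewer candidates are proposed, but each one needs less manual review.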

## Bring it together
Create a mapping set from the metadata block and the mapping block instance, then write it to an SSSOM/TSV file.

In [9]:
from sssom.writers import write_table

# Attach the proposed mappings to the metadata block.
ms_metadata.mappings = mb

# Build the CURIE converter from the prefixes declared in the metadata.
converter = Converter.from_prefix_map(cm["curie_map"])

document = MappingSetDocument(mapping_set=ms_metadata, converter=converter)
print("\n# The mapping set document instance has been prepared.")
msdf = MappingSetDataFrame.from_mapping_set_document(document)
print("\n# The mapping set dataframe instance has been prepared.")

# Write the mapping set as an SSSOM/TSV file in the current directory.
tmp_path = "test_write_sssom_dataframe.tsv"
with open(tmp_path, "w") as tmp_file:
    write_table(msdf, tmp_file)


# The mapping set document instance has been prepared.

# The mapping set dataframe instance has been prepared.


