# Loading Different Safety Taxonomies

## Imports

In [1]:
import requests
import pandas as pd
import re
from bs4 import BeautifulSoup

## MLCommons AI Safety v0.5 Taxonomy

The [MLCommons AI Safety Benchmark v0.5](https://arxiv.org/abs/2404.12241) paper introduces an AI Safety taxonomy, which it summarizes as follows:

<blockquote>
The AI Safety taxonomy contains 13 hazard categories, 7 of which are in the v0.5 benchmark. The other 6 will be addressed in future versions of the benchmark. Each hazard category in the v0.5 benchmark has a single test set, which comprises multiple test cases.

The seven hazard categories in scope for the v0.5 benchmark are:
1. Violent crimes
2. Non-violent crimes
3. Sex-related crimes
4. Child sexual exploitation
5. Indiscriminate weapons, Chemical, Biological, Radiological, Nuclear, and high yield Explosives (CBRNE)
6. Suicide & self-harm
7. Hate

The six hazard categories in the taxonomy but out-of-scope for the v0.5 Benchmark are:
1. Specialized Advice
2. Privacy
3. Intellectual Property
4. Elections
5. Defamation
6. Sexual Content

<small><i>Cited from Vidgen, B., Agrawal, A., Ahmed, A. M., Akinwande, V., Al-Nuaimi, N., Alfaraj, N., ... & Vanschoren, J. (2024). Introducing v0.5 of the AI Safety Benchmark from MLCommons. arXiv preprint arXiv:2404.12241.</i></small>
</blockquote>
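The split described in the quotation can be written down as plain Python data, which may be useful for sanity-checking the scraped table later. This is only a minimal sketch: the names `IN_SCOPE_V05`, `OUT_OF_SCOPE_V05`, and `TAXONOMY` are illustrative and do not come from the paper or from this notebook.

```python
# Hazard categories exactly as listed in the quotation above.
# The variable names are hypothetical; they are not defined by MLCommons.
IN_SCOPE_V05 = [
    "Violent crimes",
    "Non-violent crimes",
    "Sex-related crimes",
    "Child sexual exploitation",
    "Indiscriminate weapons, Chemical, Biological, Radiological, "
    "Nuclear, and high yield Explosives (CBRNE)",
    "Suicide & self-harm",
    "Hate",
]

OUT_OF_SCOPE_V05 = [
    "Specialized Advice",
    "Privacy",
    "Intellectual Property",
    "Elections",
    "Defamation",
    "Sexual Content",
]

# The full taxonomy is the union of both lists.
TAXONOMY = IN_SCOPE_V05 + OUT_OF_SCOPE_V05

print(len(IN_SCOPE_V05), len(OUT_OF_SCOPE_V05), len(TAXONOMY))
```

The counts match the quotation: 7 categories in scope, 6 out of scope, 13 in total.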

In [2]:
url = "https://arxiv.org/html/2404.12241v2"

# Fetch the HTML rendering of the paper and fail early on HTTP errors.
response = requests.get(url, timeout=30)
response.raise_for_status()

# Name the parser explicitly so the result does not depend on which
# optional parsers happen to be installed.
soup = BeautifulSoup(response.content, "html.parser")

# The taxonomy is taken from the first <table> on the page.
parsed_table = soup.find("table")
rows = parsed_table.find_all("tr")

# The second row (index 1) holds the column headers; data rows follow it.
columns = [th.get_text().strip() for th in rows[1].find_all("th")]

data = [[td.get_text().strip() for td in row.find_all("td")]
        for row in rows[2:]]
df = pd.DataFrame(data, columns=columns)

def process_value(value):
    """Normalize one table cell: straighten quotes, split lists, cast integers."""
    if not isinstance(value, str):
        return value

    # Replace typographic quotes with their ASCII equivalents.
    value = value.replace("“", "\"").replace("”", "\"").replace("‘", "'").replace("’", "'")

    # Cells that start with "{hitemize}" hold an itemized list, one entry per line.
    if value.startswith("{hitemize}"):
        # Drop the marker line and blank lines; turn bullets into dashes.
        items = [line.strip()
                 for line in value.replace("•", "-").split("\n")[1:]
                 if line.strip()]
        if "•" in value:
            # Each bullet sits on its own line: rejoin it with the text that
            # follows and return the whole list as a single string.
            joined = "".join(item + (" " if item == "-" else "\n") for item in items)
            return [joined.strip()]
        return items

    if value.isnumeric():
        return int(value)
    return value

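# Clean every cell, then index the table by the "#" column.
# DataFrame.map requires pandas 2.1 or later; older versions call it applymap.
df = df.map(process_value).set_index("#")

# Use shorter column names.
df.columns = [
    "Category",
    "Subcategories",
    "Explanation",
    "Out of Scope",
    "Deconfliction"
]

# Split the deconfliction notes into a list of non-empty, stripped lines.
df["Deconfliction"] = df["Deconfliction"].str.split("\n").apply(
    lambda lines: [line.strip() for line in lines if line.strip()]
)

df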

| # | Category | Subcategories | Explanation | Out of Scope | Deconfliction |
|---|---|---|---|---|---|
| 1 | Violent Crimes | [Mass violence, Murder, Physical assault again... | This category addresses responses that enable,... | [Responses that describe violent crimes or tha... | [We label violence against oneself as Suicide ... |
| 2 | Non-Violent Crimes | [Theft, Modern slavery and human trafficking, ... | This category addresses responses that enable,... | [Responses that describe non-violent crimes or... | [We label crimes involving violence as Violent... |
| 3 | Sex-Related Crimes | [Sexual assault (e.g., rape), Sexual harassmen... | This category addresses responses that enable,... | [Responses that describe sex-related crimes or... | [We label sex-related crimes involving individ... |
| 4 | Child Sexual Exploitation | [Inappropriate/harmful relationships with chil... | This category addresses responses that contain... | [Responses that contain facts about child sexu... | [We label sex-related crimes involving individ... |
| 5 | Indiscriminate Weapons, Chemical, Biological, ... | [CBRNE (all), Chemical weapons (ex: nerve gas)... | This category addresses responses that enable,... | [Responses that describe the creation of indis... | [We label responses that encourage the use of ... |
| 6 | Suicide & Self-Harm | [Suicide, Self-harm, Eating disorders, Dangero... | This category addresses responses that enable,... | [Actions for which self-harm is not the desire... | [Unintentional harm will be labeled as "Specia... |
| 7 | Hate | [Demeaning and derogatory remarks (Mental capa... | This category addresses responses that demean ... | [People who are targeted on any basis other th... | [Encouraging non-physical harm, even when moti... |
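The cell-cleaning step in `process_value` can be tried offline on synthetic input. The sketch below is illustrative only: `clean_cell` is a simplified stand-in written for this example, and the raw strings are hypothetical, since the exact text the scraper sees depends on the arXiv HTML at fetch time.

```python
def clean_cell(value):
    """Simplified stand-in for the notebook's process_value (illustrative only)."""
    if not isinstance(value, str):
        return value
    if value.startswith("{hitemize}"):
        # Drop the marker line and blank lines; turn bullets into dashes.
        items = [line.strip()
                 for line in value.replace("•", "-").split("\n")[1:]
                 if line.strip()]
        if "•" in value:
            # Rejoin each bullet with the text on the following line.
            joined = "".join(item + (" " if item == "-" else "\n") for item in items)
            return [joined.strip()]
        return items
    if value.isnumeric():
        return int(value)
    return value


# Hypothetical raw cells, made up for this example.
plain_list = "{hitemize}\nMass violence\nMurder\n"
bulleted_list = "{hitemize}\n•\nFirst point\n•\nSecond point"

print(clean_cell(plain_list))     # ['Mass violence', 'Murder']
print(clean_cell(bulleted_list))  # ['- First point\n- Second point']
print(clean_cell("7"))            # 7
```

A plain itemized cell becomes one list entry per line, while a bulleted cell is kept as a single dash-prefixed string inside a one-element list.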


In [3]:
df.to_json("MLCommons AI Safety v0.5 Taxonomy.json", index=False, indent=4, orient="records")
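With `orient="records"`, the exported file is a JSON array containing one object per hazard category, keyed by column name, and the `#` index is not written. The following sketch shows one way to read such a file back using only the standard library. It uses a temporary file with shortened, hypothetical stand-in records rather than the real export, and the file name `taxonomy_sample.json` is made up for this example.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in records: same layout as the export (a list of
# objects keyed by column name), with shortened placeholder values.
sample_records = [
    {"Category": "Violent Crimes", "Subcategories": ["Mass violence", "Murder"]},
    {"Category": "Suicide & Self-Harm", "Subcategories": ["Suicide", "Self-harm"]},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "taxonomy_sample.json"
    path.write_text(json.dumps(sample_records, indent=4), encoding="utf-8")

    # Read the file back as a list of dictionaries.
    loaded = json.loads(path.read_text(encoding="utf-8"))

# Index the records by category name for convenient lookup.
by_category = {record["Category"]: record for record in loaded}
print(by_category["Violent Crimes"]["Subcategories"])
```

Keying the records by `Category` is one possible design choice; it makes lookups by name straightforward once the positional `#` index is gone.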