## Naive way
- OpenAI (`gpt-4o`): not working; the model refuses with "I'm sorry, I can't assist with that request."
- Grok: working; the outputs still need to be evaluated.


In [None]:
import os
from pathlib import Path
from typing import Dict, List

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

# -------------------------
# Config
# -------------------------
DEFS_GLOB = "prompts/antisemitism_definitions/*.md"
MODEL = "gpt-4o"  # change if needed
OUT_CSV = "antisemitism_placeholders_dataset.csv"

NUM_POSTS_PER_DEFINITION = 1
TASK_PROMPT = f"""Generate {NUM_POSTS_PER_DEFINITION} social media post templates that is antisemitic. IMPORTANT, make sure this post complies with the definition below of antisemitism.
First, read the definition carefully. Then, generate the post. 
keep the posts short (1-2 sentences), and make them as realistic as possible. 
The posts should be in English and should not contain any hashtags or mentions. Each post should be different in terms of content and style."""


def call_llm(messages: List[Dict[str, str]], provider: str, model: str) -> str:
    """Send a chat request to `provider` and return the reply text."""
    if provider == "grok":
        # xAI serves an OpenAI-compatible API, so the same client class is reused.
        client = OpenAI(
            api_key=os.getenv("GROK_API_KEY"),
            base_url="https://api.x.ai/v1",
        )
    elif provider == "openai":
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
    else:
        raise ValueError(f"Unknown provider: {provider}")
    resp = client.chat.completions.create(
        model=model,
        temperature=0.2,
        messages=messages,
        max_tokens=1000,
    )
    return resp.choices[0].message.content


def load_definitions(definitions_dir: str) -> Dict[str, str]:
    """Load antisemitism definitions from markdown files, keyed by file name (stem)."""
    definitions = {}
    definitions_path = Path(definitions_dir)

    if not definitions_path.exists():
        raise FileNotFoundError(f"Directory {definitions_dir} not found")

    for md_file in definitions_path.glob("*.md"):
        with open(md_file, "r", encoding="utf-8") as f:
            definitions[md_file.stem] = f.read().strip()
    return definitions


definitions_dir = "prompts/antisemitism_definitions"
definitions = load_definitions(definitions_dir)
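This notebook defines `load_definitions` twice: here it takes a directory and strips each file, while the masking cell further down takes a glob pattern and leaves surrounding whitespace in place. A single loader could cover both. The following is a minimal sketch, assuming UTF-8 Markdown files; the `pattern` parameter and the sorting are additions of mine, not the notebook's. Note that sorting would change which definition `list(definitions.items())[:1]` picks, since `Path.glob` returns files in filesystem order.

```python
import tempfile
from pathlib import Path
from typing import Dict


def load_definitions(definitions_dir: str, pattern: str = "*.md") -> Dict[str, str]:
    """Map each definition file's stem (e.g. "jerusalem_decl") to its stripped text."""
    path = Path(definitions_dir)
    if not path.is_dir():
        raise FileNotFoundError(f"Directory {definitions_dir} not found")
    # Path.glob order depends on the filesystem; sorting makes [:1] slices reproducible.
    return {
        md_file.stem: md_file.read_text(encoding="utf-8").strip()
        for md_file in sorted(path.glob(pattern))
    }


# Demo with throwaway files (names are made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "b_def.md").write_text("  second  \n", encoding="utf-8")
    (Path(tmp) / "a_def.md").write_text("first\n", encoding="utf-8")
    defs = load_definitions(tmp)

print(list(defs))  # ['a_def', 'b_def']
```

The masking cell's `load_definitions(DEFS_GLOB)` call would then become `load_definitions("prompts/antisemitism_definitions")`.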


In [32]:


for def_name, def_text in list(definitions.items())[:1]:
    print(f"Definition: {def_name}\n")
    messages = [
        {"role": "system", "content": "You are helping build a safe research dataset for content moderation."},
        {"role": "user", "content": f"{TASK_PROMPT}\n\nDefinition:\n{def_text}"}
    ]
    print(messages)
    out = call_llm(messages, "openai", "gpt-4o")
    print(f"Generated post for {def_name}:\n{out}\n\n")

Definition: jerusalem_decl

[{'role': 'system', 'content': 'You are helping build a safe research dataset for content moderation.'}, {'role': 'user', 'content': 'Generate 1 social media post templates that is antisemitic. IMPORTANT, make sure this post complies with the definition below of antisemitism.\nFirst, read the definition carefully. Then, generate the post. \nkeep the posts short (1-2 sentences), and make them as realistic as possible. \nThe posts should be in English and should not contain any hashtags or mentions. Each post should be different in terms of content and style.\n\nDefinition:\n# Jerusalem Declaration on Antisemitism (JDA)\n\n## Preamble\nThe Jerusalem Declaration on Antisemitism (JDA) is the work of international scholars in Antisemitism Studies, Jewish, Holocaust, Israel, Palestine, and Middle East Studies.  \nIt responds to the **IHRA Definition** (2016), which has been criticized as unclear and controversial.  \nThe JDA aims to provide:  \n- A **clearer core 

In [None]:


for def_name, def_text in list(definitions.items())[:1]:
    print(f"Definition: {def_name}\n")
    messages = [
        {"role": "system", "content": "You are helping build a safe research dataset for content moderation."},
        {"role": "user", "content": f"{TASK_PROMPT}\n\nDefinition:\n{def_text}"}
    ]
    print(messages)
    out = call_llm(messages, "grok", "grok-4")
    print(f"Generated post for {def_name}:\n{out}\n\n")

Definition: jerusalem_decl

[{'role': 'system', 'content': 'You are helping build a safe research dataset for content moderation.'}, {'role': 'user', 'content': 'Generate 1 social media post templates that is antisemitic. IMPORTANT, make sure this post complies with the definition below of antisemitism.\nFirst, read the definition carefully. Then, generate the post. \nkeep the posts short (1-2 sentences), and make them as realistic as possible. \nThe posts should be in English and should not contain any hashtags or mentions. Each post should be different in terms of content and style.\n\nDefinition:\n# Jerusalem Declaration on Antisemitism (JDA)\n\n## Preamble\nThe Jerusalem Declaration on Antisemitism (JDA) is the work of international scholars in Antisemitism Studies, Jewish, Holocaust, Israel, Palestine, and Middle East Studies.  \nIt responds to the **IHRA Definition** (2016), which has been criticized as unclear and controversial.  \nThe JDA aims to provide:  \n- A **clearer core 

In [31]:
for def_name, def_text in list(definitions.items())[:1]:
    print(f"Definition: {def_name}\n")
    messages = [
        {"role": "system", "content": "You are helping build a safe research dataset for content moderation."},
        {"role": "user", "content": f"{TASK_PROMPT}\n\nDefinition:\n{def_text}"}
    ]
    print(messages)
    out = call_llm(messages, "grok", "grok-3-mini")
    print(f"Generated post for {def_name}:\n{out}\n\n")

Definition: jerusalem_decl

[{'role': 'system', 'content': 'You are helping build a safe research dataset for content moderation.'}, {'role': 'user', 'content': 'Generate 1 social media post templates that is antisemitic. IMPORTANT, make sure this post complies with the definition below of antisemitism.\nFirst, read the definition carefully. Then, generate the post. \nkeep the posts short (1-2 sentences), and make them as realistic as possible. \nThe posts should be in English and should not contain any hashtags or mentions. Each post should be different in terms of content and style.\n\nDefinition:\n# Jerusalem Declaration on Antisemitism (JDA)\n\n## Preamble\nThe Jerusalem Declaration on Antisemitism (JDA) is the work of international scholars in Antisemitism Studies, Jewish, Holocaust, Israel, Palestine, and Middle East Studies.  \nIt responds to the **IHRA Definition** (2016), which has been criticized as unclear and controversial.  \nThe JDA aims to provide:  \n- A **clearer core 
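The three cells above differ only in the `(provider, model)` pair, and `call_llm` builds a new client on every call. One way to keep provider settings in one place is a small registry. This sketch assumes only the two providers and settings already used in `call_llm` (`https://api.x.ai/v1` with `GROK_API_KEY` for Grok, SDK defaults for OpenAI); `PROVIDERS`, `client_kwargs`, and `get_client` are hypothetical names.

```python
import os
from functools import lru_cache
from typing import Dict, Optional, Tuple

# Hypothetical registry: provider -> (base_url, name of the API-key env var).
# A base_url of None means "use the OpenAI SDK's default endpoint".
PROVIDERS: Dict[str, Tuple[Optional[str], str]] = {
    "openai": (None, "OPENAI_API_KEY"),
    "grok": ("https://api.x.ai/v1", "GROK_API_KEY"),
}


def client_kwargs(provider: str) -> Dict[str, Optional[str]]:
    """Keyword arguments for OpenAI(...) for the given provider."""
    if provider not in PROVIDERS:
        raise ValueError(f"Unknown provider: {provider}")
    base_url, key_var = PROVIDERS[provider]
    kwargs: Dict[str, Optional[str]] = {"api_key": os.getenv(key_var)}
    if base_url is not None:
        kwargs["base_url"] = base_url
    return kwargs


@lru_cache(maxsize=None)
def get_client(provider: str):
    """Build one client per provider and reuse it for later calls."""
    from openai import OpenAI  # imported lazily so client_kwargs works without the SDK
    return OpenAI(**client_kwargs(provider))
```

`get_client("grok").chat.completions.create(...)` would then replace the per-call `OpenAI(...)` construction; `lru_cache` means each provider's client is built once per session.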

## Masking harmful words

In [None]:
from dotenv import load_dotenv
from typing import List, Dict
from pathlib import Path
import glob, json, csv, re

load_dotenv()
try:
    from openai import OpenAI
except ImportError as e:
    raise RuntimeError("Install the OpenAI python package: pip install openai") from e

# Created outside the try block so a missing OPENAI_API_KEY is reported as such,
# not as a missing package.
client = OpenAI()

# -------------------------
# Config
# -------------------------
DEFS_GLOB = "prompts/antisemitism_definitions/*.md"
MODEL = "gpt-4o"  # text-only output
OUT_CSV = "generated_datasets/antisemitism_masked_templates_dataset.csv"
NUM_POSTS_PER_DEFINITION = 3  # change as you like

SYSTEM_MSG = (
    "You are assisting with an academic content-moderation study. "
    "Never produce explicit hate speech, slurs, or calls to harm. "
    "All potentially hateful spans must be REDACTED with placeholders like "
    "[TARGET], [SLUR], [HARM], [STEREOTYPE]. Output JSON only."
)

TASK_PROMPT = f"""
Generate {NUM_POSTS_PER_DEFINITION} research-safe templates that would clearly violate the antisemitism definition,
but with ALL harmful spans MASKED. Do not include real group names, insults, or threats.
Use placeholders ONLY: [TARGET], [SLUR], [HARM], [STEREOTYPE], [CONSPIRACY], [DEHUMANIZATION].

Return a JSON object with a key "items" that is a list. Each list item must have:
- "template": the masked post text (2–3 sentences, no hashtags or mentions)
- "masked_spans": list of masked tokens you used
- "violation_types": list of strings (e.g., ["dehumanization", "conspiracy", "stereotype"])
- "severity": integer 1–5
- "explanation": 1 short sentence explaining why this would violate the definition

Important:
- Do not unmask anything. Use placeholders only.
- Keep posts realistic in tone but fully masked.
- Forget everything you know about antisemitism. Everything you should know about it is in the definition below.
"""

# -------------------------
# Helpers
# -------------------------
def load_definitions(glob_pat: str) -> Dict[str, str]:
    defs = {}
    for p in sorted(glob.glob(glob_pat)):
        path = Path(p)
        with open(path, "r", encoding="utf-8") as f:
            defs[path.stem] = f.read()
    return defs

def call_llm(messages: List[Dict[str, str]], model: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        temperature=0.2,
        # JSON mode helps keep outputs structured
        response_format={"type": "json_object"},
        messages=messages,
        max_tokens=1200,
    )
    return resp.choices[0].message.content

def preflight_moderation(text: str):
    """Optional: check prompt text with OpenAI moderation.

    Returns the first moderation result (with `flagged`, `categories`,
    `category_scores`), or {} if the call fails.
    """
    try:
        mod = client.moderations.create(
            model="omni-moderation-latest",  # current moderation model
            input=text,
        )
        return mod.results[0]
    except Exception:
        return {}

def redaction_sanity_check(s: str) -> bool:
    """Make sure output stays masked. Add your own protected-terms if needed."""
    forbidden_patterns = [
        r"\b(jew|jews|zionist|zionists)\b",  # expand for your own guardrails
        # add specific slurs you want to block from appearing
    ]
    return not any(re.search(pat, s, flags=re.I) for pat in forbidden_patterns)

# -------------------------
# Main
# -------------------------
definitions = load_definitions(DEFS_GLOB)

rows = []
for def_name, def_text in definitions.items():
    prompt = f"{TASK_PROMPT}\n\nDefinition:\n{def_text.strip()}"
    # Optional moderation preflight on the prompt itself (the result is currently unused)
    _ = preflight_moderation(prompt)

    messages = [
        {"role": "system", "content": SYSTEM_MSG},
        {"role": "user", "content": prompt},
    ]

    raw = call_llm(messages, MODEL)

    try:
        data = json.loads(raw)
        items = data.get("items", [])
    except Exception:
        # Fallback if the model responded without perfect JSON
        items = []

    # Validate and collect
    for it in items:
        template = (it.get("template") or "").strip()
        if not template:
            continue
        if not redaction_sanity_check(template):
            # skip anything that slipped through unmasked
            continue
        rows.append({
            "definition": def_name,
            "post_template": template,
            "masked_spans": json.dumps(it.get("masked_spans", []), ensure_ascii=False),
            "violation_types": json.dumps(it.get("violation_types", []), ensure_ascii=False),
            "severity": it.get("severity"),
            "explanation": (it.get("explanation") or "").strip(),
            "model": MODEL,
        })

# Save CSV (create the output folder first; open() fails if it is missing)
fieldnames = ["definition", "post_template", "masked_spans", "violation_types",
              "severity", "explanation", "model"]
Path(OUT_CSV).parent.mkdir(parents=True, exist_ok=True)
with open(OUT_CSV, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

print(f"Wrote {len(rows)} masked templates to {OUT_CSV}")


Wrote 9 masked templates to antisemitism_masked_templates_dataset.csv
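The masking cell drops a whole definition's output whenever `json.loads` fails, and it assumes `data["items"]` is a list of dicts. With `response_format={"type": "json_object"}` the reply should already be a JSON object, so the following is a defensive sketch rather than a required fix. `parse_items` is a hypothetical helper, and stripping a Markdown code fence is an assumption about one common failure mode, not documented API behavior.

```python
import json
import re
from typing import Any, Dict, List

# Optional Markdown code fence (three backticks, maybe tagged "json") around the reply.
_FENCE_RE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)


def parse_items(raw: str) -> List[Dict[str, Any]]:
    """Return the dict entries under "items", or [] if the reply is unusable."""
    text = (raw or "").strip()
    fenced = _FENCE_RE.match(text)
    if fenced:
        text = fenced.group(1)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    items = data.get("items") if isinstance(data, dict) else None
    if not isinstance(items, list):
        return []
    # Dropping non-dict entries means later it.get(...) calls cannot fail.
    return [it for it in items if isinstance(it, dict)]


print(parse_items('{"items": [{"template": "x"}]}'))  # [{'template': 'x'}]
print(parse_items("not json"))                         # []
```

In the main loop, `items = parse_items(raw)` would replace the `try`/`except` block.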

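`redaction_sanity_check` only looks for four exact words. Because `\b(jew|jews|zionist|zionists)\b` needs a word boundary right after the match, it does not catch "jewish", and it never checks that the bracketed tokens are ones the prompt allowed. Below is a minimal sketch of a stricter check, assuming the placeholder set listed in `TASK_PROMPT`; `masked_template_ok`, `ALLOWED_PLACEHOLDERS`, and the widened patterns are illustrative names, not part of the notebook.

```python
import re

# Placeholder names the masking prompt allows (copied from TASK_PROMPT).
ALLOWED_PLACEHOLDERS = {"TARGET", "SLUR", "HARM", "STEREOTYPE", "CONSPIRACY", "DEHUMANIZATION"}

# Widened terms: the original \b(jew|jews|zionist|zionists)\b has no match in
# "jewish", because there is no word boundary between "jew" and "ish".
FORBIDDEN_PATTERNS = [
    r"\bjew(?:s|ish)?\b",
    r"\bzionists?\b",
]


def masked_template_ok(s: str) -> bool:
    """True if no forbidden term appears and every [TOKEN] is an allowed placeholder."""
    if any(re.search(p, s, flags=re.I) for p in FORBIDDEN_PATTERNS):
        return False
    tokens = re.findall(r"\[([^\]]+)\]", s)
    return all(t in ALLOWED_PLACEHOLDERS for t in tokens)


print(masked_template_ok("[TARGET] and [HARM]."))  # True
print(masked_template_ok("[NAME] is here"))         # False
```

The trailing `\b` still keeps unrelated words such as "jewelry" from being rejected.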

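`preflight_moderation` is called on every prompt, but its result is assigned to `_` and discarded, and on failure it returns `{}` rather than a result object. If the preflight is meant to gate anything, those two cases need to be told apart. A small sketch follows; `moderation_verdict` is a hypothetical helper, and the `SimpleNamespace` objects only stand in for real results from `client.moderations.create`, which expose a boolean `flagged` field.

```python
from types import SimpleNamespace
from typing import Any, Optional


def moderation_verdict(result: Any) -> Optional[bool]:
    """True/False from a moderation result, or None if the preflight call failed."""
    # preflight_moderation returns {} when client.moderations.create raised.
    if isinstance(result, dict) or not hasattr(result, "flagged"):
        return None
    return bool(result.flagged)


# SimpleNamespace objects stand in for real moderation results here.
print(moderation_verdict({}))                             # None
print(moderation_verdict(SimpleNamespace(flagged=True)))  # True
```

In the main loop, `verdict = moderation_verdict(preflight_moderation(prompt))` could then be logged or used to skip a definition; returning `None` keeps "the check failed" distinct from "not flagged".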
In [5]:
import pandas as pd
import json
import re
import random
from pathlib import Path

# -------------------------
# Config
# -------------------------
IN_CSV = "generated_datasets/antisemitism_masked_templates_dataset.csv"   # from the earlier step
OUT_CSV = "generated_datasets/antisemitism_masked_templates_instantiated.csv"
VARIANTS_PER_ROW = 1
RANDOM_SEED = 42

# Your masked lexicon from earlier. You can also load this from a JSON file.
MASK_LEXICON = {
    "[TARGET]": [
        "jews", "jewish-people", "big-noses", "rich-people", "bankers"
    ],
    "[SLUR]": [
        "kike", "globalist", "zionist", "Anuddah Shoah"
    ],
    "[HARM]": [
        "threaten-violence", "exclude-socially", "deny-rights", "shun-community", "damage-property"
    ],
    "[STEREOTYPE]": [
        "greedy-financiers", "disloyal-citizens", "immoral-schemers", "power-hungry", "overly-clever"
    ],
    "[CONSPIRACY]": [
        "secret-world-control", "market-manipulators", "media-controllers", "political-puppeteers", "health-saboteurs"
    ],
    "[DEHUMANIZATION]": [
        "rat-comparison", "puppet-masters", "plague-carriers", "vermin-scum", "subhuman-beings"
    ]
}

# Placeholders you expect to see in templates
PLACEHOLDER_KEYS = list(MASK_LEXICON.keys())
PLACEHOLDER_RE = re.compile(r"\[(TARGET|SLUR|HARM|STEREOTYPE|CONSPIRACY|DEHUMANIZATION)\]")

# Pattern that any bracketed token left in the output must match (checked in instantiate_once)
ALLOWED_TOKEN_RE = re.compile(r"\[[A-Z0-9_]+\]")

random.seed(RANDOM_SEED)


def choose_replacements(lexicon, same_within_template=True):
    """
    Pick one replacement per placeholder type if same_within_template is True.
    Otherwise leave decisions to per-occurrence selection inside replacer.
    """
    if not same_within_template:
        return {}
    picks = {}
    for ph, opts in lexicon.items():
        if opts:
            picks[ph] = random.choice(opts)
    return picks


def instantiate_once(template, lexicon, same_within_template=True):
    """
    Replace placeholders in a single template using the given lexicon.
    Returns the instantiated text and a dict of replacements used.
    """
    per_template = choose_replacements(lexicon, same_within_template=same_within_template)
    used = {}

    def _repl(m):
        key = "[" + m.group(1) + "]"  # turn TARGET into [TARGET]
        # choose from per-template map or sample on the fly
        if same_within_template:
            repl = per_template.get(key, key)
        else:
            choices = lexicon.get(key, [])
            repl = random.choice(choices) if choices else key
        used.setdefault(key, []).append(repl)
        return repl

    out = PLACEHOLDER_RE.sub(_repl, template)

    # Sanity check: any bracketed token still left in the output must look like a
    # placeholder ([A-Z0-9_]+). The inserted replacement words are not checked.
    for token in re.findall(r"\[[^\]]+\]", out):
        if not ALLOWED_TOKEN_RE.fullmatch(token):
            raise ValueError(f"Unsafe token detected: {token}")
    return out, used


def expand_dataset(df, variants_per_row=1, same_within_template=True):
    """
    For each row with a 'post_template' column, create N variants with random masked replacements.
    """
    rows = []
    for _, r in df.iterrows():
        template = str(r.get("post_template", "")).strip()
        if not template:
            continue
        for v in range(variants_per_row):
            instantiated, used = instantiate_once(template, MASK_LEXICON, same_within_template)
            rows.append({
                "definition": r.get("definition"),
                "source_template": template,
                "post_instantiated": instantiated,
                "replacements_used": json.dumps(used, ensure_ascii=False),
                "model": r.get("model"),
                "variant_id": v + 1,
            })
            print(f"Instantiated variant {v+1} for definition {r.get('definition')}:\n- {template}\n- {instantiated}\n")
    return pd.DataFrame(rows)



if not Path(IN_CSV).exists():
    raise FileNotFoundError(f"Input CSV not found: {IN_CSV}")
df = pd.read_csv(IN_CSV)
out_df = expand_dataset(df, variants_per_row=VARIANTS_PER_ROW, same_within_template=True)
out_df.to_csv(OUT_CSV, index=False)
print(f"Wrote {len(out_df)} rows to {OUT_CSV}")




Instantiated variant 1 for definition IHRA:
- The [TARGET] are behind the [CONSPIRACY] to control the world's [STEREOTYPE]. They have always been [STEREOTYPE] and cannot be trusted.
- The jews are behind the market-manipulators to control the world's disloyal-citizens. They have always been disloyal-citizens and cannot be trusted.

Instantiated variant 1 for definition IHRA:
- It's clear that [TARGET] are responsible for the [STEREOTYPE] in our society. They have a history of [CONSPIRACY] and should be held accountable.
- It's clear that jews are responsible for the power-hungry in our society. They have a history of secret-world-control and should be held accountable.

Instantiated variant 1 for definition IHRA:
- [TARGET] have always been [STEREOTYPE] and their actions in [CONSPIRACY] prove they are [DEHUMANIZATION]. We must stop them before they cause more [HARM].
- jews have always been overly-clever and their actions in health-saboteurs prove they are rat-comparison. We must stop 

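`instantiate_once` relies on the global `random.seed(RANDOM_SEED)` and on `PLACEHOLDER_RE`, whose alternatives are typed out by hand and must stay in sync with the keys of `MASK_LEXICON`. The following is a minimal sketch of an alternative that builds the pattern from the lexicon keys and takes an explicit `random.Random` instance, so a run is reproducible regardless of other code that draws from `random`. The signature and the neutral demo lexicon (`[COLOR]`, `[ANIMAL]`) are illustrative assumptions, not the notebook's API.

```python
import random
import re
from typing import Dict, List, Tuple


def instantiate_once(
    template: str,
    lexicon: Dict[str, List[str]],
    rng: random.Random,
    same_within_template: bool = True,
) -> Tuple[str, Dict[str, List[str]]]:
    """Fill the lexicon's placeholders in `template`; other [TOKENS] are left as-is."""
    if not lexicon:
        return template, {}
    # Build the pattern from the lexicon keys so the two can never drift apart.
    pattern = re.compile("|".join(re.escape(key) for key in lexicon))
    # With same_within_template, pick once per placeholder type and reuse it.
    fixed = {k: rng.choice(v) for k, v in lexicon.items() if v} if same_within_template else {}
    used: Dict[str, List[str]] = {}

    def _repl(m: "re.Match[str]") -> str:
        key = m.group(0)
        if same_within_template:
            repl = fixed.get(key, key)
        else:
            options = lexicon[key]
            repl = rng.choice(options) if options else key
        used.setdefault(key, []).append(repl)
        return repl

    return pattern.sub(_repl, template), used


# Neutral, made-up lexicon for illustration only.
demo_lexicon = {"[COLOR]": ["red", "blue"], "[ANIMAL]": ["cat"]}
text, used = instantiate_once("A [COLOR] [ANIMAL] and a [COLOR] hat.", demo_lexicon, random.Random(42))
print(text)
```

`expand_dataset` could create one `random.Random(RANDOM_SEED)` up front and pass it to every call.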
In [27]:
twitter_df = pd.read_csv("GoldStanderDataSet.csv", encoding="cp1252")
twitter_df.head()


| Unnamed: 0 | TweetID | Username | Text | CreateDate | Biased | Keyword |
|---|---|---|---|---|---|---|
| 0 | 1228740093357092865 | Celtic_Films | AIPAC should be registered as a foreign agent ... | 2020-02-15 17:57:21+00:00 | 1 | Israel |
| 1 | 1239547900012589056 | zariths__ | RT @qiss0rkid: go to israel pls , we don't nee... | 2020-03-16 13:43:43+00:00 | 1 | Israel |
| 2 | 1216559517887954945 | kelownascott | The world, including Canada, has given Israel ... | 2020-01-13 03:16:06+00:00 | 1 | Israel |
| 3 | 1217123508754534400 | KeishaJake | These children avoid walking over the US &amp;... | 2020-01-14 16:37:12+00:00 | 0 | Israel |
| 4 | 1232258532273090560 | amit2nirvana | RT @IDF: The last 24 hours in Israel: https://... | 2020-02-25 10:58:23+00:00 | 0 | Israel |
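Before using this gold-standard set for evaluation, it helps to know how balanced the `Biased` labels are; in pandas that is `twitter_df["Biased"].value_counts()`. The `Unnamed: 0` column looks like a saved index, which `pd.read_csv(..., index_col=0)` would turn back into the index. The stdlib sketch below counts labels without pandas; `label_counts` and the sample rows are made up for illustration, and the `cp1252` default only mirrors the `read_csv` call above.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path


def label_counts(csv_path: str, label_col: str = "Biased", encoding: str = "cp1252") -> Counter:
    """Count rows per value of `label_col` (values stay strings, e.g. "0" and "1")."""
    with open(csv_path, newline="", encoding=encoding) as f:
        return Counter(row[label_col] for row in csv.DictReader(f))


# Made-up rows with the same kind of columns as the gold-standard file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text(
        "TweetID,Text,Biased\n1,example one,1\n2,example two,0\n3,example three,1\n",
        encoding="cp1252",
    )
    counts = label_counts(str(sample))

print(counts)  # Counter({'1': 2, '0': 1})
```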
