Prompts taken from work by Antonio Formato

See: https://github.com/format81/GenAI-STIX2.1-Generator/blob/main/GenAI_Stix2_1_Generator.ipynb

This notebook scrapes a web page and feeds the text into TrustGraph prompts.

The prompts are configured in TrustGraph's prompt-manager.  The prompt configuration I used is in prompt-configuration.txt.  It should replace all of the prompt configuration in docker-compose.yaml.

# Install

In [1]:
!pip install trustgraph-base
!pip install bs4
!pip install pandas


In [2]:
# URL to use, same as notebook above
url = "https://cloud.google.com/blog/topics/threat-intelligence/untangling-iran-apt42-operations/"

# Web-scraper

Just copied from Antonio Formato's notebook

In [3]:
# Web scraper
import requests
from bs4 import BeautifulSoup
import pandas as pd

def scrape_text(url):
    # Send a browser-like User-Agent to avoid issues when scraping most websites
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"}

    # Send a GET request to the URL; the timeout stops the call hanging forever
    response = requests.get(url, headers=headers, timeout=30)

    # Raise an exception on an HTTP error (4xx/5xx) rather than returning an
    # error string, which would otherwise be fed to the prompts as page text
    response.raise_for_status()

    # Create a BeautifulSoup object and specify the parser
    soup = BeautifulSoup(response.content, "html.parser")

    # Return the text of the soup object
    return soup.get_text()


text = scrape_text(url)

In [4]:
print(text[:1000] + "...")

Uncharmed: Untangling Iran's APT42 Operations | Google Cloud BlogJump to ContentCloudBlogContact sales Get started for free CloudBlogSolutions & technologyAI & Machine LearningAPI ManagementApplication DevelopmentApplication ModernizationChrome EnterpriseComputeContainers & KubernetesData AnalyticsDatabasesDevOps & SREMaps & GeospatialSecuritySecurity & IdentityThreat IntelligenceInfrastructureInfrastructure ModernizationNetworkingProductivity & CollaborationSAP on Google CloudStorage & Data TransferSustainabilityEcosystemIT LeadersIndustriesFinancial ServicesHealthcare & Life SciencesManufacturingMedia & EntertainmentPublic SectorRetailSupply ChainTelecommunicationsPartnersStartups & SMBTraining & CertificationsInside Google CloudGoogle Cloud Next & EventsGoogle Maps PlatformGoogle WorkspaceDevelopers & PractitionersTransform with Google CloudContact sales Get started for free Threat IntelligenceUncharmed: Untangling Iran's APT42 OperationsMay 1, 2024Mandiant Written by: Ofir Rozmann,
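
The output above shows navigation labels run together with the article text, because the page text is joined without separators and menu content is kept. The following is a minimal sketch of one way to reduce that noise. It is not what the original notebook does: it swaps in the standard library's `html.parser` in place of BeautifulSoup, and `VisibleTextExtractor`, `extract_visible_text` and the list of skipped tags are hypothetical names and choices made for illustration.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping tags that rarely hold article content."""

    # Assumption: these tags are treated as page chrome; adjust per site
    SKIP_TAGS = {"script", "style", "nav", "header", "footer", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside any skipped tag
        self._chunks = []      # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is outside every skipped tag
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        # Join with spaces so adjacent elements do not run together
        return " ".join(self._chunks)


def extract_visible_text(html):
    """Return the text of an HTML string with page chrome removed."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

This would be applied to the raw HTML (for example `response.text` inside `scrape_text`) rather than to the already-flattened `text`. Whether the shorter input improves the extracted STIX objects has not been tested here.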

# Initialise API

This assumes TrustGraph is running on the same machine as the Jupyter notebook.  The STIX special-purpose prompts also need to be configured as described above.

In [5]:
import trustgraph.api as tg

api = tg.Api()

In [6]:
import json

## Extract STIX Domain Objects

This just calls the prompt engine, passing the scraped text to the `stix-sdo` prompt.

In [7]:
sdo = api.prompt(
    "stix-sdo",
    {
        "text": text
    }
)

In [8]:
stix_sdo = json.dumps(sdo, indent=4)
print(stix_sdo)

[
    {
        "type": "intrusion-set",
        "spec_version": "2.1",
        "id": "intrusion-set--34c5172b-f793-4a39-942a-928862051178",
        "created": "2024-05-01T00:00:00Z",
        "modified": "2024-05-01T00:00:00Z",
        "name": "APT42",
        "description": "APT42, an Iranian state-sponsored cyber espionage actor, is known for its extensive credential harvesting operations and cloud-based intrusions. The group frequently targets Western and Middle Eastern NGOs, media, academia, legal services, and activists. APT42 leverages social engineering, custom malware like NICECURL and TAMECAT, and built-in cloud features to achieve its objectives.",
        "aliases": [
            "CALANQUE",
            "Charming Kitten",
            "Mint Sandstorm",
            "Phosphorus",
            "TA453",
            "Yellow Garuda",
            "ITG18"
        ],
        "primary_motivation": "espionage",
        "secondary_motivations": [
            "collection"
        ]
    },
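
Because the objects are generated by an LLM, it may be worth checking that each `id` is well-formed before using it. STIX 2.1 identifiers have the form `<type>--<UUID>`. The sketch below checks only that form; `check_stix_ids` is a hypothetical helper, and it assumes the prompt result (`sdo` above) is a list of dictionaries, as the printed output suggests.

```python
import uuid


def check_stix_ids(objects):
    """Return (index, problem) tuples for objects whose id looks malformed."""
    problems = []
    for i, obj in enumerate(objects):
        obj_type = obj.get("type")
        obj_id = str(obj.get("id") or "")

        # A STIX identifier is "<object-type>--<UUID>"
        prefix, sep, suffix = obj_id.partition("--")
        if not sep:
            problems.append((i, "id is missing the '--' separator"))
            continue

        if prefix != obj_type:
            problems.append(
                (i, f"id prefix {prefix!r} does not match type {obj_type!r}")
            )

        # uuid.UUID raises ValueError for strings that are not UUIDs
        try:
            uuid.UUID(suffix)
        except ValueError:
            problems.append((i, f"id suffix {suffix!r} is not a valid UUID"))

    return problems
```

Calling `check_stix_ids(sdo)` returns an empty list when every identifier passes. This is not a full STIX validation: required properties and vocabularies are not checked.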


## STIX Cyber-observable Objects

This just calls the prompt engine, passing the scraped text to the `stix-sco` prompt.

In [9]:
sco = api.prompt(
    "stix-sco",
    {
        "text": text
    }
)

In [10]:
stix_sco = json.dumps(sco, indent=4)
print(stix_sco)

[
    {
        "type": "domain-name",
        "spec_version": "2.1",
        "id": "domain-name--09966076-857b-4726-8008-87e793a92c75",
        "value": "washinqtonpost.press"
    },
    {
        "type": "domain-name",
        "spec_version": "2.1",
        "id": "domain-name--92c84bca-4c93-4a83-811b-1f8a520958a6",
        "value": "ksview.top"
    },
    {
        "type": "domain-name",
        "spec_version": "2.1",
        "id": "domain-name--e59930a5-2035-4137-8a21-6f8832133422",
        "value": "honest-halcyon-fresher.buzz"
    },
    {
        "type": "domain-name",
        "spec_version": "2.1",
        "id": "domain-name--f0848c9a-8025-4a27-8145-502f2a40f252",
        "value": "sites.google.com"
    },
    {
        "type": "domain-name",
        "spec_version": "2.1",
        "id": "domain-name--8a740a38-080f-4a23-a748-80a8f2a08a74",
        "value": "n9.cl"
    },
    {
        "type": "domain-name",
        "spec_version": "2.1",
        "id": "domain-name--2389f238-2389-

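The observables come back as one flat list, so a quick way to review them is to group the values by object type. This is a minimal sketch with a hypothetical helper name, `group_sco_values`. It assumes `sco` is a list of dictionaries and only looks at objects that carry a `value` property, as the `domain-name` objects above do. Observables described through other properties are skipped.

```python
from collections import defaultdict


def group_sco_values(objects):
    """Map each SCO type to a sorted list of its distinct 'value' entries."""
    grouped = defaultdict(set)
    for obj in objects:
        # Skip anything without both a type and a value property
        if "type" in obj and "value" in obj:
            grouped[obj["type"]].add(obj["value"])
    # Sets remove duplicates; sorting makes the result stable to read
    return {sco_type: sorted(values) for sco_type, values in grouped.items()}
```

For example, `group_sco_values(sco)["domain-name"]` would list each extracted domain once, which makes repeated or unexpected entries easier to spot.
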
## STIX 2.1 Relationship Objects

This just calls the prompt engine with the `stix-sro` prompt.  The JSON-formatted SDO and SCO output from the earlier steps is passed in along with the scraped text.

In [11]:
sro = api.prompt(
    "stix-sro",
    {
        "text": text,
        "stix_sdo": stix_sdo,
        "stix_sco": stix_sco,
    }
)

In [12]:
stix_sro = json.dumps(sro, indent=4)
print(stix_sro)

[
    {
        "type": "relationship",
        "spec_version": "2.1",
        "id": "relationship--38294a77-765b-4633-b525-32f712a39a78",
        "created": "2024-05-01T00:00:00Z",
        "modified": "2024-05-01T00:00:00Z",
        "relationship_type": "uses",
        "source_ref": "intrusion-set--34c5172b-f793-4a39-942a-928862051178",
        "target_ref": "malware--a641e44f-b5b8-4901-a054-420f278f7442"
    },
    {
        "type": "relationship",
        "spec_version": "2.1",
        "id": "relationship--92886205-1178-4a39-942a-34c5172bf793",
        "created": "2024-05-01T00:00:00Z",
        "modified": "2024-05-01T00:00:00Z",
        "relationship_type": "uses",
        "source_ref": "intrusion-set--34c5172b-f793-4a39-942a-928862051178",
        "target_ref": "malware--8b275078-520a-4e78-a769-609814f276f8"
    },
    {
        "type": "relationship",
        "spec_version": "2.1",
        "id": "relationship--558832f7-12a3-4633-9425-9a78c37745f8",
        "created": "2024-05-01T