# Analytic Scheme of Maneuver Generator

This notebook uses analytic plans to generate an analytic scheme of maneuver.

## Background

As described in TC 3-12.2.4.1, "The Analytic Scheme of Maneuver is the plan to collect and analyze technical data to meet specific information requirements. It identifies what data to analyze, how to analyze it, and why it is being analyzed." The analytic scheme of maneuver, or ASOM, consists of the following components:

* **Priority information requirement (PIR)**
* **Indicator**
* **Evidence**
* **Data**
* **NAI**
* **Analytic**



## Environment Setup

This section sets up the environment. It installs the required packages, imports modules, initializes helper functions, and defines global variables. It then mounts Google Drive to the runtime and changes into the folder that holds the analytic plan (technique) files.

### Install Packages

In [None]:
!pip install -U -q "google" 1> /dev/null

### Import Modules

In [None]:
from google.colab import userdata
from google.colab import drive
import json
import os
import datetime
import pandas as pd
import re
from collections import defaultdict

### Initialize Helper Functions

The first function, `log`, prints a message to the console prefixed with the current timestamp in ISO 8601 format. It prints nothing when the global `verbose` flag is `False`.

In [None]:
def log(message, end="\n", flush=True):
    """
    Logs a message to the console, prepended with the current timestamp
    in ISO 8601 format. Prints nothing when the global `verbose` flag is False.

    Args:
        message (str): The string message to log.
        end (str): String appended after the message (passed to print).
        flush (bool): Whether to flush the output stream (passed to print).
    """

    # Format the current local date and time as an ISO 8601 timestamp
    timestamp = datetime.datetime.now().isoformat()

    # Construct the final log string using an f-string for clean formatting
    log_string = f"[{timestamp}] {message}"

    # Print the log string only if logging is turned on (verbose = True).
    # `verbose` is only read here, so no `global` declaration is needed.
    if verbose:
        print(log_string, end=end, flush=flush)

The second function, `build_asom`, accepts an attack chain (a dictionary that maps MITRE ATT&CK tactics to lists of techniques) and returns the analytic plans that correspond to those tactic and technique pairs.

In [None]:
def build_asom(attack_chain):
    """
    Builds a list of JSON objects (asom) by processing technique files
    based on the provided attack_chain.

    Args:
        attack_chain (dict): A dictionary where keys are MITRE ATT&CK tactics
                             and values are lists of techniques.

    Returns:
        list: A list of JSON objects (asom) that match the criteria.
    """
    asom = []
    current_directory = os.getcwd() # Technique files are read from the working directory

    for tactic, techniques in attack_chain.items():
        for technique in techniques:
            # The technique name is used verbatim as the file name (no sanitizing).
            # Technique name: "Txxxx - Technique Name"
            # File name:      "Txxxx - Technique Name.json"
            file_name = f"{technique}.json"
            file_path = os.path.join(current_directory, file_name)

            if os.path.exists(file_path):
                try:
                    with open(file_path, 'r') as f:
                        technique_data_list = json.load(f)

                    # Ensure technique_data_list is a list
                    if not isinstance(technique_data_list, list):
                        print(f"Warning: Content of {file_name} is not a list. Skipping.")
                        continue

                    for item in technique_data_list:
                        # Ensure item is a dictionary and has at least one key
                        if not isinstance(item, dict) or not item:
                            print(f"Warning: Invalid item format in {file_name}. Skipping item: {item}")
                            continue

                        # The PIR is the first key in the item dictionary
                        pir_key = next(iter(item)) # Gets the first key

                        # Check if the parent tactic is in the PIR key
                        # The tactic in attack_chain is like "TA0001 - Initial Access"
                        # We should check if "TA0001" is in the pir_key
                        tactic_id = tactic.split(" - ")[0] # Extracts "TA0001"
                        if tactic_id in pir_key:
                            asom.append(item)
                except json.JSONDecodeError:
                    print(f"Error decoding JSON from file: {file_name}")
                except Exception as e:
                    print(f"An error occurred while processing {file_name}: {e}")
            else:
                print(f"File not found for technique: {file_name}")
    return asom
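
The loader above implies a particular layout for each technique file: a JSON list of single-key objects, where each key is a PIR that ends in a tactic signature. The following is a minimal sketch of that layout and of the tactic filter. It assumes the nested keys (`Indicators`, `Data`, `Data Platform`, `NAI`, `Action`) that `format_asom` reads later in this notebook. The sample file contents and the helper name `select_pirs_for_tactic` are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical contents of "T1078 - Valid Accounts.json": one PIR per tactic.
sample_plans = [
    {
        "Has the adversary gained initial access using valid accounts? (TA0001 - Initial Access)": {
            "Indicators": {
                "T1078 - Valid Accounts": {
                    "Successful login from a new or rare source IP address.": {
                        "Data": "Windows Event ID 4624",
                        "Data Platform": "Servers",
                        "NAI": "Insert site-specific NAI here",
                        "Action": "Flag logins from rare source IPs.",
                    }
                }
            }
        }
    },
    {
        "Has the adversary achieved privilege escalation using valid accounts? (TA0004 - Privilege Escalation)": {
            "Indicators": {"T1078 - Valid Accounts": {}}
        }
    },
]


def select_pirs_for_tactic(plans, tactic):
    """Keep the plans whose PIR key mentions the tactic ID (e.g. "TA0001")."""
    tactic_id = tactic.split(" - ")[0]
    return [plan for plan in plans if plan and tactic_id in next(iter(plan))]


# Round-trip through a JSON file, as build_asom does for each technique.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "T1078 - Valid Accounts.json")
    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(sample_plans, f)
    with open(file_path, "r", encoding="utf-8") as f:
        loaded_plans = json.load(f)

initial_access = select_pirs_for_tactic(loaded_plans, "TA0001 - Initial Access")
print(len(initial_access))  # 1
```

Because the filter matches on the tactic ID, a technique such as T1078 that appears under several tactics contributes only the PIRs written for the tactic being processed.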

### Define Global Variables

In [None]:
# Toggle logging on (verbose = True)/off (verbose = False)
verbose = True
# verbose = False

In [None]:
# Rate limits: https://ai.google.dev/gemini-api/docs/rate-limits
# Pricing: https://ai.google.dev/gemini-api/docs/pricing
# Usage: https://console.cloud.google.com/apis/api/generativelanguage.googleapis.com/metrics?project=gen-lang-client-0497172401
# Note that this notebook is designed to be run in Google Colab. The line below reads the Gemini API key for AI Studio,
# which is configured in the Secrets tab on the left side of the Colab window.
os.environ["GEMINI_API_KEY"] = userdata.get("GOOGLE_API_KEY")
log("Gemini API key loaded.")

[2025-05-08T17:56:52.094818] Gemini API key loaded.


### Mount Google Drive

In [None]:
# Mount Google Drive and move into the Google AI Studio folder
DRIVE_PATH = "/content/drive"
TECHNIQUES_PATH = "/content/drive/MyDrive/Google AI Studio/techniques"

drive.mount(DRIVE_PATH)
log(f"Google Drive mounted to {DRIVE_PATH}")

os.chdir(TECHNIQUES_PATH)
log(f"Changed directory to {TECHNIQUES_PATH}")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
[2025-05-08T17:56:52.609538] Google Drive mounted to /content/drive
[2025-05-08T17:56:52.611136] Changed directory to /content/drive/MyDrive/Google AI Studio/techniques


## Test Analytic Scheme of Maneuver Generation

This section generates a small analytic scheme of maneuver from a test attack chain.

In [None]:
# Example attack_chain data structure
attack_chain_data = {
    "TA0001 - Initial Access": [
        "T1190 - Exploit Public-Facing Application",
        "T1566 - Phishing",
        "T1078 - Valid Accounts"
    ],
    "TA0004 - Privilege Escalation": [
        "T1078 - Valid Accounts"
    ]
}

log("Building ASOM...")
resulting_asom = build_asom(attack_chain_data)
log("Finished.")

[2025-05-08T17:56:52.618387] Building ASOM...
[2025-05-08T17:56:52.634378] Finished.


In [None]:
def format_asom(asom_input_list):
    """
    Formats the ASOM data into a Pandas DataFrame with hierarchical indexing,
    including a column for the evidence description text.

    Args:
        asom_input_list (list): A list of JSON-like objects (dictionaries)
                                 as produced by the build_asom function.

    Returns:
        pandas.DataFrame: A DataFrame with the formatted data.
    """
    table_data = []
    pir_overall_index = 0

    for pir_object in asom_input_list:
        if not pir_object or not isinstance(pir_object, dict):
            print(f"Warning: Skipping invalid PIR object: {pir_object}")
            continue

        pir_overall_index += 1
        # Get the first key (PIR question); the check above guarantees a non-empty dict
        pir_text_key = next(iter(pir_object))

        pir_content = pir_object[pir_text_key]

        if not isinstance(pir_content, dict):
            print(f"Warning: Skipping PIR object with invalid content for PIR key '{pir_text_key}'")
            continue

        indicators_section = pir_content.get("Indicators", {})
        technique_sub_index = 0

        if not isinstance(indicators_section, dict):
            # A non-dict section has no .items(), so skip this PIR instead of raising below
            print(f"Warning: 'Indicators' section is not a dictionary for PIR '{pir_text_key}'. Skipping indicators for this PIR.")
            continue
        if not indicators_section:
            # An empty section generates no technique/evidence rows for this PIR.
            # If a row for the PIR itself is desired in such cases, additional logic would be needed here.
            print(f"Info: PIR '{pir_text_key}' has no 'Indicators' section or it's empty.")

        for technique_name, evidences_dict in indicators_section.items():
            technique_sub_index += 1
            current_technique_index_str = f"{pir_overall_index}.{technique_sub_index}"

            if not isinstance(evidences_dict, dict):
                print(f"Warning: Expected a dictionary of evidences for technique '{technique_name}' in PIR '{pir_text_key}', but got {type(evidences_dict)}. Skipping this technique's evidences.")
                # Optionally, create a row for the technique with blank evidence details
                # For now, this technique will not produce evidence rows.
                continue

            evidence_sub_index = 0
            if not evidences_dict: # If a technique has no evidence entries
                print(f"Info: Technique '{technique_name}' has no evidence entries under PIR '{pir_text_key}'.")
                # No rows are added in this case. If a placeholder row for the
                # technique itself is desired, add it here using the output column names.
                # Example:
                # table_data.append({
                #    "PIR Index": pir_overall_index, "PIR": pir_text_key,
                #    "Indicator Index": current_technique_index_str, "Indicator": technique_name,
                #    "Evidence Index": pd.NA, "Evidence": "No specific evidence listed",
                #    "Data": pd.NA, "Data Platform": pd.NA, "NAI": pd.NA, "Action": pd.NA
                # })

            for evidence_description_text, evidence_details in evidences_dict.items():
                evidence_sub_index += 1
                current_evidence_index_str = f"{current_technique_index_str}.{evidence_sub_index}"

                # Pull the tactic ID from the "(TAxxxx - Name)" signature. A regex tolerates
                # other parentheses or " - " sequences in the PIR text and a missing signature.
                tactic_match = re.search(r"\((TA\d{4})", pir_text_key)

                # Fields shared by well-formed and malformed evidence entries
                row = {
                    "PIR Index": pir_overall_index,
                    "PIR": pir_text_key,
                    "Tactic ID": tactic_match.group(1) if tactic_match else pd.NA,
                    "Indicator Index": current_technique_index_str,
                    "Indicator": technique_name,
                    "Technique ID": technique_name.split(" - ")[0],
                    "Evidence Index": current_evidence_index_str,
                }

                if not isinstance(evidence_details, dict):
                    print(f"Warning: Evidence details for '{evidence_description_text}' under technique '{technique_name}' is not a dictionary. Creating row with limited info.")
                    row.update({
                        "Evidence": evidence_description_text + " (Error: Malformed details)",
                        "Data": pd.NA, # Using pandas NA for missing data
                        "Data Platform": pd.NA,
                        "NAI": pd.NA,
                        "Action": pd.NA
                    })
                else:
                    row.update({
                        "Evidence": evidence_description_text,
                        "Data": evidence_details.get("Data", ""),
                        "Data Platform": evidence_details.get("Data Platform", ""),
                        "NAI": evidence_details.get("NAI", ""),
                        "Action": evidence_details.get("Action", "")
                    })
                table_data.append(row)

    df = pd.DataFrame(table_data)

    # Define the column order for the DataFrame
    column_order = [
        "PIR Index", "PIR", "Tactic ID", "Indicator Index", "Indicator", "Technique ID",
        "Evidence Index", "Evidence", "Data", "Data Platform", "NAI", "Action"
    ]

    # Ensure all columns are present in the DataFrame, adding them with NA if they are missing
    # This is important if table_data was empty, or if some rows missed keys.
    for col in column_order:
        if col not in df.columns:
            df[col] = pd.NA

    return df[column_order]
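
The core of `format_asom` is a three-level flattening: each PIR, indicator, and evidence entry becomes one row carrying a dotted index. The sketch below shows only that step in plain Python, without pandas or the validation branches, so the numbering can be checked by hand. The sample input, the generator name `flatten`, and the pattern name `TACTIC_ID_PATTERN` are hypothetical.

```python
import re

# Hypothetical single-PIR input in the nested shape format_asom expects.
asom = [
    {
        "Has the adversary gained initial access via phishing? (TA0001 - Initial Access)": {
            "Indicators": {
                "T1566 - Phishing": {
                    "Email with links to known malicious sites.": {"Data": "Zeek smtp.log"},
                    "Email with a spoofed sender address.": {"Data": "Zeek smtp.log"},
                }
            }
        }
    }
]

# Matches the tactic ID inside a "(TAxxxx - Name)" signature.
TACTIC_ID_PATTERN = re.compile(r"\((TA\d{4})\b")


def flatten(asom_list):
    """Yield one flat row per evidence entry, numbered PIR.indicator.evidence."""
    for pir_number, pir_object in enumerate(asom_list, start=1):
        pir_text, pir_content = next(iter(pir_object.items()))
        match = TACTIC_ID_PATTERN.search(pir_text)
        indicators = pir_content.get("Indicators", {})
        for ind_number, (indicator, evidences) in enumerate(indicators.items(), start=1):
            for ev_number, (evidence, details) in enumerate(evidences.items(), start=1):
                yield {
                    "PIR": pir_text,
                    "Tactic ID": match.group(1) if match else None,
                    "Indicator Index": f"{pir_number}.{ind_number}",
                    "Technique ID": indicator.split(" - ")[0],
                    "Evidence Index": f"{pir_number}.{ind_number}.{ev_number}",
                    "Evidence": evidence,
                    "Data": details.get("Data", ""),
                }


rows = list(flatten(asom))
for row in rows:
    print(row["Evidence Index"], row["Tactic ID"], row["Technique ID"], row["Data"])
```

A list of such row dictionaries can be passed directly to `pd.DataFrame`, which is how the function above builds its table.
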

In [None]:
formatted_df = format_asom(resulting_asom)
# To display the full content of cells if they are long
with pd.option_context('display.max_rows', None,
                        'display.max_columns', None,
                        'display.width', 1000,
                        'display.max_colwidth', None):
    display(formatted_df)

Unnamed: 0,PIR Index,PIR,Tactic ID,Indicator Index,Indicator,Technique ID,Evidence Index,Evidence,Data,Data Platform,NAI,Action
0,1,Has the adversary gained initial access to the network via exploiting a public-facing application? (TA0001 - Initial Access),TA0001,1.1,T1190 - Exploit Public-Facing Application,T1190,1.1.1,"Error messages or unusual entries in application/web server logs indicative of attempted exploitation (e.g., SQL injection errors, command execution attempts, path traversal).",Zeek http.log,Network devices,Insert site-specific NAI here,"Monitor Zeek http.log for unusual request patterns, error codes (e.g., 4xx, 5xx in unexpected volume/context), or suspicious strings in URI/headers/body (e.g., SQL keywords, command execution syntax). Use frequency analysis on request parameters or paths to identify anomalies."
1,1,Has the adversary gained initial access to the network via exploiting a public-facing application? (TA0001 - Initial Access),TA0001,1.1,T1190 - Exploit Public-Facing Application,T1190,1.1.2,"Alerts from Network Intrusion Detection Systems (NIDS) or Web Application Firewalls (WAF) flagging exploit patterns (e.g., known CVE exploits, injection attempts, unusual request structures).",Zeek signatures.log; Zeek conn.log,Network devices,Insert site-specific NAI here,Analyze Zeek signatures.log for alerts triggered by suspicious network traffic patterns targeting public-facing services. Correlate alerts with Zeek conn.log for associated sessions. Use correlation analysis to link alerts across multiple connections or source IPs.
2,1,Has the adversary gained initial access to the network via exploiting a public-facing application? (TA0001 - Initial Access),TA0001,1.1,T1190 - Exploit Public-Facing Application,T1190,1.1.3,Unusual process creation or network connections originating from a public-facing server immediately following suspicious external network traffic targeting that server.,Windows Event ID 4688; Windows Event ID 5156; Zeek conn.log; Zeek http.log,"Servers, Network devices",Insert site-specific NAI here,"Correlate Zeek logs (conn.log for connections to public services, http.log for web requests) showing suspicious activity (e.g., high request volume from a single IP, connections to unusual ports, potential exploit patterns) targeting a public-facing server with Windows Event IDs 4688 and 5156 on that server within a narrow time window. Look for processes spawned by web server processes (e.g., w3wp.exe, apache2) or unexpected outbound connections (5156) originating from the server. Use time series analysis to identify unusual spikes in host activity correlated with spikes in suspicious incoming network traffic."
3,1,Has the adversary gained initial access to the network via exploiting a public-facing application? (TA0001 - Initial Access),TA0001,1.1,T1190 - Exploit Public-Facing Application,T1190,1.1.4,"Anomalous traffic patterns or protocol violations targeting public-facing non-HTTP services (e.g., SMB, SSH, RDP, database ports) not typically exposed or used externally.",Zeek conn.log; Windows Event ID 4624; Windows Event ID 4625,"Network devices, Servers",Insert site-specific NAI here,"Filter Zeek conn.log for connections from external IPs targeting public-facing systems on ports associated with services not intended for external exposure (e.g., SMB 445, SSH 22, RDP 3389, common database ports like 1433, 3306). Identify source IPs initiating connections. Correlate with Windows Event IDs 4624/4625 on target servers if authentication attempts occur. Use frequency analysis or outlier detection on connection attempts by source IP or target port to flag unusual scanning or targeted attacks against these services. Analyze connection duration using descriptive statistics."
4,2,Has the adversary gained initial access to the network via phishing? (TA0001 - Initial Access),TA0001,2.1,T1566 - Phishing,T1566,2.1.1,Email traffic containing links to known malicious sites or attachments detected as potentially malicious.,"Zeek smtp.log, Zeek http.log, Zeek conn.log",Network devices,Insert site-specific NAI here,Analyze Zeek smtp.log for emails containing URLs. Extract URLs and compare against threat intelligence feeds of known malicious sites. Analyze Zeek http.log/conn.log for connections originating from internal hosts to external IPs matching threat intel or associated with suspicious URLs found in emails. Use frequency analysis of URLs or entropy calculations on URL strings to identify unusual link patterns.
5,2,Has the adversary gained initial access to the network via phishing? (TA0001 - Initial Access),TA0001,2.1,T1566 - Phishing,T1566,2.1.2,"Emails with spoofed sender addresses or exhibiting anomalous header patterns (e.g., missing DKIM/SPF, unexpected mail routes).",Zeek smtp.log,Network devices,Insert site-specific NAI here,"Analyze Zeek smtp.log for email headers. Extract 'From', 'Return-Path', 'Received' headers. Check SPF and DKIM status if available in logs. Identify emails where the 'From' domain does not align with the 'Return-Path' or originating server, or where SPF/DKIM fail. Calculate sender domain entropy or use statistical analysis on sender behavior to flag anomalies."
6,3,Has the adversary gained initial access using valid accounts? (TA0001 - Initial Access),TA0001,3.1,T1078 - Valid Accounts,T1078,3.1.1,Successful login to an internal resource using a valid account from an unusual geographic location or new/rare source IP address.,"Windows Event ID 4624, Zeek conn.log","Servers, Network devices",Insert site-specific NAI here,Collect successful login events (WinEvent 4624) and network connections (Zeek conn.log). Extract source IP and destination resource. Geolocate source IPs. Identify logins from new or rare IPs/countries using frequency analysis or clustering of historical login locations. Investigate logins from flagged locations.
7,3,Has the adversary gained initial access using valid accounts? (TA0001 - Initial Access),TA0001,3.1,T1078 - Valid Accounts,T1078,3.1.2,"Successful login to external services (VPN, OWA, RDP) using valid credentials from an untrusted source.","Windows Event ID 4624, Zeek conn.log","Servers, Network devices",Insert site-specific NAI here,"Monitor successful logins to externally facing services (WinEvent 4624 on access gateways, Zeek conn.log for sessions). Identify source IP addresses. Compare source IPs against threat intelligence feeds and lists of known proxies/TOR exit nodes. Calculate frequency distribution of source IPs per service and flag rare or new sources."
8,4,Has the adversary achieved privilege escalation using valid accounts? (TA0004 - Privilege Escalation),TA0004,4.1,T1078 - Valid Accounts,T1078,4.1.1,Successful login or attempted login by a standard user account to a sensitive server or high-privilege system.,"Windows Event ID 4624, Windows Event ID 4688","Servers, Endpoints",Insert site-specific NAI here,Monitor successful login events (WinEvent 4624) on critical servers or systems designated as high-privilege targets. Filter for login attempts by accounts that do not belong to authorized administrative groups for that system. Correlate with process creation events (WinEvent 4688) initiated by these accounts on sensitive systems. Use correlation analysis to link unauthorized logins to subsequent suspicious activity.
9,4,Has the adversary achieved privilege escalation using valid accounts? (TA0004 - Privilege Escalation),TA0004,4.1,T1078 - Valid Accounts,T1078,4.1.2,Multiple distinct valid accounts logged into the same system simultaneously or in quick succession.,"Windows Event ID 4624, Windows Event ID 4648","Servers, Endpoints",Insert site-specific NAI here,"Analyze successful login events (WinEvent 4624) on endpoints and servers. Group events by target system and time window (e.g., 5 minutes). Count the number of distinct user accounts logging into the same system within the time window. Flag systems with an unusually high number of distinct simultaneous or near-simultaneous logins (e.g., using outlier detection based on historical login counts per system)."


In [None]:
# Export the full ASOM to an Excel file
formatted_df.to_excel("test_asom_full.xlsx")

In [None]:
def get_representative_pirs_by_tactic_simplified(df_with_pirs: pd.DataFrame) -> dict:
    """
    Groups PIRs by MITRE Tactic and generates a representative PIR string for each group
    using a simplified templating logic.

    Args:
        df_with_pirs (pd.DataFrame): DataFrame containing a 'PIR' column with the
                                     PIR strings.

    Returns:
        dict: A dictionary mapping each Tactic ID (str) to its representative
              PIR string (str).
    """
    if 'PIR' not in df_with_pirs.columns:
        print("Error: DataFrame must contain a 'PIR' column.")
        return {}

    df = df_with_pirs.copy()

    def extract_tactic_details_from_pir(pir_string: str):
        """
        Extracts Tactic ID, Tactic Name, and the full original signature from a PIR string.
        """
        if pd.isna(pir_string):
            return None, None, None  # tactic_id, tactic_name, original_signature

        # Regex to capture "(TAXXXX - Tactic Name)"
        match_full = re.search(r"\((TA\d{4})\s*-\s*([^)]+?)\)", pir_string)
        if match_full:
            tactic_id = match_full.group(1)      # e.g., TA0001
            tactic_name = match_full.group(2).strip()  # e.g., Initial Access
            original_signature = match_full.group(0) # e.g., (TA0001 - Initial Access)
            return tactic_id, tactic_name, original_signature

        # Fallback for patterns like "(TAXXXX)" if name is missing in signature
        match_simple = re.search(r"\((TA\d{4})\)", pir_string)
        if match_simple:
            tactic_id = match_simple.group(1)
            original_signature = match_simple.group(0)
            # Tactic name is not present in this signature type
            return tactic_id, None, original_signature

        return None, None, None

    # Apply extraction to get Tactic ID, Tactic Name, and original signature
    extracted_info = df['PIR'].apply(
        lambda x: pd.Series(extract_tactic_details_from_pir(x),
                            index=['Tactic_ID_Extracted', 'Tactic_Name_Extracted', 'Original_Signature'])
    )
    df = pd.concat([df, extracted_info], axis=1)

    representative_pirs_map = {}

    # Group by extracted Tactic ID
    # Consider only rows where Tactic_ID_Extracted is not NaN
    valid_tactic_groups = df[df['Tactic_ID_Extracted'].notna()].groupby('Tactic_ID_Extracted')

    for tactic_id, group in valid_tactic_groups:
        unique_pirs = group['PIR'].unique().tolist()

        # Attempt to get a consistent Tactic Name for the group
        # Prioritize non-null Tactic Names if there's variation (shouldn't be if signature is consistent)
        tactic_names_in_group = group['Tactic_Name_Extracted'].dropna().unique()
        tactic_name_for_template = tactic_names_in_group[0] if len(tactic_names_in_group) > 0 else None

        if len(unique_pirs) == 1:
            representative_pirs_map[tactic_id] = unique_pirs[0]
        else:  # More than one unique PIR for this Tactic ID
            if tactic_name_for_template: # Check if we have a tactic name for the template
                lowercase_tactic_name = tactic_name_for_template.lower()
                # Construct the standard signature part for the template
                standard_signature_for_template = f"({tactic_id} - {tactic_name_for_template})"
                templated_pir = f"Did the adversary conduct {lowercase_tactic_name}? {standard_signature_for_template}"
                representative_pirs_map[tactic_id] = templated_pir
            else:
                # Fallback if Tactic Name is not available for the template, use the first unique PIR
                representative_pirs_map[tactic_id] = unique_pirs[0]
                print(f"Warning: Tactic Name not found for Tactic ID {tactic_id} with multiple PIRs. Using original PIR as representative.")

    return representative_pirs_map
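
The grouping and templating rule can be traced on three PIR strings taken from the test output above. This is a plain-Python sketch of the same rule rather than the pandas implementation; the variable names are illustrative.

```python
import re
from collections import defaultdict

# Same signature pattern as the function above: "(TAxxxx - Tactic Name)".
SIGNATURE = re.compile(r"\((TA\d{4})\s*-\s*([^)]+?)\)")

pirs = [
    "Has the adversary gained initial access to the network via phishing? (TA0001 - Initial Access)",
    "Has the adversary gained initial access using valid accounts? (TA0001 - Initial Access)",
    "Has the adversary achieved privilege escalation using valid accounts? (TA0004 - Privilege Escalation)",
]

# Collect the distinct PIRs and the tactic name seen for each tactic ID.
pirs_by_tactic = defaultdict(list)
tactic_names = {}
for pir in pirs:
    match = SIGNATURE.search(pir)
    if match is None:
        continue  # PIRs without a signature are left out of the map
    tactic_id, tactic_name = match.group(1), match.group(2).strip()
    tactic_names[tactic_id] = tactic_name
    if pir not in pirs_by_tactic[tactic_id]:
        pirs_by_tactic[tactic_id].append(pir)

# One PIR per tactic: keep it. Several: replace them with a templated question.
representative = {}
for tactic_id, unique_pirs in pirs_by_tactic.items():
    if len(unique_pirs) == 1:
        representative[tactic_id] = unique_pirs[0]
    else:
        name = tactic_names[tactic_id]
        representative[tactic_id] = (
            f"Did the adversary conduct {name.lower()}? ({tactic_id} - {name})"
        )

print(representative["TA0001"])
# Did the adversary conduct initial access? (TA0001 - Initial Access)
```

TA0004 has a single PIR in this sample, so its original wording is kept unchanged.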

In [None]:
representative_pirs_map = get_representative_pirs_by_tactic_simplified(formatted_df)

In [None]:
def update_pirs(original_df, representative_pirs_map):
    """
    Replaces PIR strings in a DataFrame with their generalized versions.

    Args:
        original_df (pd.DataFrame): The DataFrame with a 'PIR' column
                                     containing original PIR strings.
        representative_pirs_map (dict): A dictionary mapping Tactic IDs (e.g., 'TA0001')
                                        to their generalized PIR strings.

    Returns:
        pd.DataFrame: A copy of the DataFrame with updated PIRs.
    """
    if 'PIR' not in original_df.columns:
        print("Error: DataFrame must contain a 'PIR' column.")
        # Return a single DataFrame so the return type matches the success path
        return original_df

    final_df = original_df.copy()

    # Helper function to extract Tactic ID from a PIR string
    def extract_tactic_id(pir_string):
        if pd.isna(pir_string):
            return None
        # This regex aims to find (TAXXXX) within the string
        match = re.search(r"\((TA\d{4}).*?\)", pir_string)
        if match:
            return match.group(1) # Returns TAXXXX
        return None

    # 1. Extract Tactic ID from original PIRs to use for mapping
    final_df['Tactic_ID_for_mapping'] = final_df['PIR'].apply(extract_tactic_id)

    # 2. Replace the 'PIR' column with the generalized PIRs
    # The .map() function looks up each 'Tactic_ID_for_mapping' in the representative_pirs_map
    # .fillna(final_df['PIR']) ensures that if a Tactic ID doesn't have a corresponding
    # generalized PIR in the map, the original PIR string is retained.
    final_df['PIR'] = final_df['Tactic_ID_for_mapping'].map(representative_pirs_map).fillna(final_df['PIR'])

    # Clean up the temporary mapping column
    final_df = final_df.drop(columns=['Tactic_ID_for_mapping'])

    return final_df


In [None]:
# Replace each PIR with the representative PIR for its tactic
updated_df = update_pirs(formatted_df, representative_pirs_map)

In [None]:
def update_sequential_pir_index(input_df: pd.DataFrame) -> pd.DataFrame:
    """
    Updates the 'PIR Index' column in the DataFrame to sequentially number
    groups of adjacent identical strings in the 'PIR' column.

    The DataFrame should already have its 'PIR' column populated with the
    strings that need to be grouped (e.g., generalized PIRs).
    Any existing 'PIR Index' column will be overwritten.

    Args:
        input_df (pd.DataFrame): The DataFrame with a 'PIR' column.

    Returns:
        pd.DataFrame: The DataFrame with the 'PIR Index' column updated.
    """
    if 'PIR' not in input_df.columns:
        print("Error: DataFrame must contain a 'PIR' column to generate the new PIR Index.")
        # Return an unmodified copy if the PIR column is missing
        return input_df.copy()


    df = input_df.copy()

    # NaN handling: PIRs are expected to be strings, but because NaN != NaN
    # evaluates to True, every NaN row (and the row after it) starts a new group.
    # If NaNs should share a group instead, fill them with a placeholder first:
    # df['PIR'] = df['PIR'].fillna('__NAN_PLACEHOLDER__')

    # Identify rows where the 'PIR' value changes from the previous row.
    # The first row will always mark a change (as shift() produces NaN).
    pir_value_changed = (df['PIR'] != df['PIR'].shift())

    # The cumsum() of this boolean series creates a unique sequential ID for each
    # block of adjacent identical PIRs. True becomes 1, False becomes 0 in sum.
    # This will naturally start the 'PIR Index' from 1.
    df['PIR Index'] = pir_value_changed.cumsum()

    return df
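
The `shift()` and `cumsum()` comparison numbers runs of adjacent identical values. The sketch below expresses the same idea as a plain Python loop so the result can be checked by hand. The helper name `number_adjacent_runs` and the sample values are hypothetical.

```python
def number_adjacent_runs(values):
    """Return a 1-based run number for each value; a new run starts whenever
    the value differs from the one immediately before it."""
    run_numbers = []
    current_run = 0
    previous = object()  # sentinel that never equals a real value
    for value in values:
        if value != previous:
            current_run += 1
        run_numbers.append(current_run)
        previous = value
    return run_numbers


pirs = ["A", "A", "B", "B", "B", "A"]
print(number_adjacent_runs(pirs))  # [1, 1, 2, 2, 2, 3]
```

Note that the final "A" receives a new number rather than rejoining group 1. Only adjacent rows are grouped, so rows that share a PIR need to be contiguous in the DataFrame before this step runs.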

In [None]:
def update_hierarchical_indicator_index(input_df: pd.DataFrame,
                                        pir_index_col: str = 'PIR Index',
                                        indicator_text_col: str = 'Indicator',
                                        output_indicator_index_col: str = 'Indicator Index') -> pd.DataFrame:
    """
    Updates or creates a hierarchical 'Indicator Index' column in the DataFrame.
    The index is in the format 'PIR Index.Sub-Index'. The Sub-Index restarts
    for each new 'PIR Index' group and increments for blocks of adjacent
    identical values in the 'indicator_text_col' within that PIR Index group.

    Args:
        input_df (pd.DataFrame): The input DataFrame.
        pir_index_col (str): Name of the column containing the main PIR group index.
        indicator_text_col (str): Name of the column containing the indicator text
                                  strings (e.g., "T1190 - Exploit Public-Facing Application")
                                  to be grouped for sub-indexing. Default is 'Indicator'.
        output_indicator_index_col (str): Name of the column where the new
                                          hierarchical indicator index will be stored.
                                          Default is 'Indicator Index'.

    Returns:
        pd.DataFrame: The DataFrame with the 'Indicator Index' column updated or created.
    """
    df = input_df.copy()

    if pir_index_col not in df.columns:
        print(f"Error: DataFrame must contain the PIR group index column '{pir_index_col}'.")
        return df
    if indicator_text_col not in df.columns:
        print(f"Error: DataFrame must contain the indicator text column '{indicator_text_col}'.")
        return df

    # --- Data Cleaning & Preparation for 'indicator_text_col' ---
    # This is crucial if strings that should group together have subtle differences
    # (e.g., extra whitespace, case differences if they should be ignored).
    processed_indicator_text_col = '_processed_' + indicator_text_col

    # Convert to string and strip whitespace as a basic cleaning step.
    # After astype(str), NaN becomes "nan" and None becomes "None". If missing
    # values need specific handling, do it here (e.g., fillna with a placeholder).
    df[processed_indicator_text_col] = df[indicator_text_col].astype(str).str.strip()
    # Example for case-insensitive grouping (optional):
    # df[processed_indicator_text_col] = df[indicator_text_col].astype(str).str.strip().str.lower()


    # Helper function to calculate the sub-index within each PIR Index group
    def calculate_sub_index_within_group(indicator_series: pd.Series) -> pd.Series:
        """
        For a series of indicator texts within a single PIR Index group,
        this calculates a sub-index (1, 2, 3...) for blocks of identical indicators.
        """
        indicator_changed_in_group = (indicator_series != indicator_series.shift())
        return indicator_changed_in_group.cumsum()

    temp_sub_index_col = '_temp_indicator_sub_index'

    # Apply sub-index calculation to each group formed by 'PIR Index',
    # using the (potentially processed) 'indicator_text_col'.
    df[temp_sub_index_col] = df.groupby(pir_index_col)[processed_indicator_text_col].transform(calculate_sub_index_within_group)

    # Construct the final hierarchical 'Indicator Index' string
    df[output_indicator_index_col] = df[pir_index_col].astype(str) + '.' + df[temp_sub_index_col].astype(str)

    # Drop temporary columns
    df = df.drop(columns=[temp_sub_index_col, processed_indicator_text_col])

    return df
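
The sub-index calculation above relies on a shift-and-cumsum pattern. The sketch below illustrates it on a small, made-up DataFrame; the column names mirror the notebook, but the rows and the helper name `block_number` are assumptions for illustration only.

```python
import pandas as pd

# Toy data: column names mirror the notebook, values are illustrative only.
toy = pd.DataFrame({
    "PIR Index": [1, 1, 1, 1, 2, 2],
    "Indicator": ["T1190", "T1190", "T1566", "T1078", "T1078", "T1078"],
})

def block_number(series: pd.Series) -> pd.Series:
    # True wherever the value differs from the row above it
    # (the first row of each group always counts as a change).
    changed = series != series.shift()
    # The running total of change flags numbers each block 1, 2, 3, ...
    return changed.cumsum()

# Number the blocks separately inside each PIR group.
sub_index = toy.groupby("PIR Index")["Indicator"].transform(block_number)
toy["Indicator Index"] = toy["PIR Index"].astype(str) + "." + sub_index.astype(str)
print(toy["Indicator Index"].tolist())
```

Because the pattern numbers consecutive blocks, an indicator that reappears later in the same PIR group after a different indicator would receive a new sub-index. Sorting the rows by indicator within each PIR beforehand avoids that if it is not the desired behavior.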

In [None]:
def update_final_evidence_index(input_df: pd.DataFrame,
                                indicator_index_col: str = 'Indicator Index',
                                output_evidence_index_col: str = 'Evidence Index') -> pd.DataFrame:
    """
    Updates or creates the 'Evidence Index' column in the DataFrame.
    The index is hierarchical, in the format 'Indicator Index.n'
    (e.g., "1.1.1", "1.1.2"), where 'n' is the 1-based sequential
    position of the evidence row within its parent 'Indicator Index' group.

    Args:
        input_df (pd.DataFrame): The input DataFrame which must contain the
                                 column specified by `indicator_index_col`.
                                 Each row is assumed to be a distinct piece of evidence.
        indicator_index_col (str): Name of the column containing the hierarchical
                                   indicator index (e.g., a column with values
                                   like "1.1", "1.2" from the previous step).
        output_evidence_index_col (str): Name of the column where the new,
                                         most granular evidence index will be stored.

    Returns:
        pd.DataFrame: The DataFrame with the 'Evidence Index' column updated or created.
    """
    df = input_df.copy()

    if indicator_index_col not in df.columns:
        print(f"Error: DataFrame must contain the '{indicator_index_col}' column to generate Evidence Index.")
        return df

    # Calculate 'n', the sequential position of each evidence item (row)
    # within its group defined by the 'indicator_index_col'.
    # pandas.core.groupby.GroupBy.cumcount() is 0-based, so we add 1 for a 1-based sequence.
    # This assumes each row at this stage is a unique piece of evidence to be numbered.
    df['evidence_sequence_n'] = df.groupby(indicator_index_col).cumcount() + 1

    # Construct the final 'Evidence Index' string by appending '.n'
    # to the existing 'Indicator Index' string.
    df[output_evidence_index_col] = df[indicator_index_col].astype(str) + '.' + df['evidence_sequence_n'].astype(str)

    # Drop the temporary sequence helper column
    df = df.drop(columns=['evidence_sequence_n'])

    return df
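
As a quick illustration of the numbering step, the sketch below applies `groupby(...).cumcount()` to a few made-up indicator indexes. The values are assumptions chosen only to show the resulting format.

```python
import pandas as pd

# Illustrative indicator indexes; each row stands for one piece of evidence.
toy = pd.DataFrame({"Indicator Index": ["1.1", "1.1", "1.2", "2.1", "2.1", "2.1"]})

# cumcount() numbers rows 0, 1, 2, ... within each group, so add 1
# to obtain a 1-based position.
position = toy.groupby("Indicator Index").cumcount() + 1

# Append the position to the parent index to form the evidence index.
toy["Evidence Index"] = toy["Indicator Index"] + "." + position.astype(str)
print(toy["Evidence Index"].tolist())
```

Unlike the indicator sub-index, this numbering does not depend on rows being adjacent: `cumcount()` counts occurrences within the whole group, in row order.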

In [None]:
# Now that we have rewritten the PIR names, update the PIR indexes.
updated_df = update_sequential_pir_index(updated_df)

Unnamed: 0,PIR Index,PIR,Tactic ID,Indicator Index,Indicator,Technique ID,Evidence Index,Evidence,Data,Data Platform,NAI,Action
0,1,Did the adversary conduct initial access? (TA0...,TA0001,1.1,T1190 - Exploit Public-Facing Application,T1190,1.1.1,Error messages or unusual entries in applicati...,Zeek http.log,Network devices,Insert site-specific NAI here,Monitor Zeek http.log for unusual request patt...
1,1,Did the adversary conduct initial access? (TA0...,TA0001,1.2,T1190 - Exploit Public-Facing Application,T1190,1.1.2,Alerts from Network Intrusion Detection System...,Zeek signatures.log; Zeek conn.log,Network devices,Insert site-specific NAI here,Analyze Zeek signatures.log for alerts trigger...
2,1,Did the adversary conduct initial access? (TA0...,TA0001,1.3,T1190 - Exploit Public-Facing Application,T1190,1.1.3,Unusual process creation or network connection...,Windows Event ID 4688; Windows Event ID 5156; ...,"Servers, Network devices",Insert site-specific NAI here,Correlate Zeek logs (conn.log for connections ...
3,1,Did the adversary conduct initial access? (TA0...,TA0001,1.4,T1190 - Exploit Public-Facing Application,T1190,1.1.4,Anomalous traffic patterns or protocol violati...,Zeek conn.log; Windows Event ID 4624; Windows ...,"Network devices, Servers",Insert site-specific NAI here,Filter Zeek conn.log for connections from exte...
4,1,Did the adversary conduct initial access? (TA0...,TA0001,1.5,T1566 - Phishing,T1566,2.1.1,Email traffic containing links to known malici...,"Zeek smtp.log, Zeek http.log, Zeek conn.log",Network devices,Insert site-specific NAI here,Analyze Zeek smtp.log for emails containing UR...
5,1,Did the adversary conduct initial access? (TA0...,TA0001,1.6,T1566 - Phishing,T1566,2.1.2,Emails with spoofed sender addresses or exhibi...,Zeek smtp.log,Network devices,Insert site-specific NAI here,Analyze Zeek smtp.log for email headers. Extra...
6,1,Did the adversary conduct initial access? (TA0...,TA0001,1.7,T1078 - Valid Accounts,T1078,3.1.1,Successful login to an internal resource using...,"Windows Event ID 4624, Zeek conn.log","Servers, Network devices",Insert site-specific NAI here,Collect successful login events (WinEvent 4624...
7,1,Did the adversary conduct initial access? (TA0...,TA0001,1.8,T1078 - Valid Accounts,T1078,3.1.2,"Successful login to external services (VPN, OW...","Windows Event ID 4624, Zeek conn.log","Servers, Network devices",Insert site-specific NAI here,Monitor successful logins to externally facing...
8,2,Has the adversary achieved privilege escalatio...,TA0004,2.1,T1078 - Valid Accounts,T1078,4.1.1,Successful login or attempted login by a stand...,"Windows Event ID 4624, Windows Event ID 4688","Servers, Endpoints",Insert site-specific NAI here,Monitor successful login events (WinEvent 4624...
9,2,Has the adversary achieved privilege escalatio...,TA0004,2.2,T1078 - Valid Accounts,T1078,4.1.2,Multiple distinct valid accounts logged into t...,"Windows Event ID 4624, Windows Event ID 4648","Servers, Endpoints",Insert site-specific NAI here,Analyze successful login events (WinEvent 4624...


In [None]:
# Next, update the indicator index based on the new PIR indexes.
updated_df = update_hierarchical_indicator_index(updated_df)

Unnamed: 0,PIR Index,PIR,Tactic ID,Indicator Index,Indicator,Technique ID,Evidence Index,Evidence,Data,Data Platform,NAI,Action
0,1,Did the adversary conduct initial access? (TA0...,TA0001,1.1,T1190 - Exploit Public-Facing Application,T1190,1.1.1,Error messages or unusual entries in applicati...,Zeek http.log,Network devices,Insert site-specific NAI here,Monitor Zeek http.log for unusual request patt...
1,1,Did the adversary conduct initial access? (TA0...,TA0001,1.1,T1190 - Exploit Public-Facing Application,T1190,1.1.2,Alerts from Network Intrusion Detection System...,Zeek signatures.log; Zeek conn.log,Network devices,Insert site-specific NAI here,Analyze Zeek signatures.log for alerts trigger...
2,1,Did the adversary conduct initial access? (TA0...,TA0001,1.1,T1190 - Exploit Public-Facing Application,T1190,1.1.3,Unusual process creation or network connection...,Windows Event ID 4688; Windows Event ID 5156; ...,"Servers, Network devices",Insert site-specific NAI here,Correlate Zeek logs (conn.log for connections ...
3,1,Did the adversary conduct initial access? (TA0...,TA0001,1.1,T1190 - Exploit Public-Facing Application,T1190,1.1.4,Anomalous traffic patterns or protocol violati...,Zeek conn.log; Windows Event ID 4624; Windows ...,"Network devices, Servers",Insert site-specific NAI here,Filter Zeek conn.log for connections from exte...
4,1,Did the adversary conduct initial access? (TA0...,TA0001,1.2,T1566 - Phishing,T1566,2.1.1,Email traffic containing links to known malici...,"Zeek smtp.log, Zeek http.log, Zeek conn.log",Network devices,Insert site-specific NAI here,Analyze Zeek smtp.log for emails containing UR...
5,1,Did the adversary conduct initial access? (TA0...,TA0001,1.2,T1566 - Phishing,T1566,2.1.2,Emails with spoofed sender addresses or exhibi...,Zeek smtp.log,Network devices,Insert site-specific NAI here,Analyze Zeek smtp.log for email headers. Extra...
6,1,Did the adversary conduct initial access? (TA0...,TA0001,1.3,T1078 - Valid Accounts,T1078,3.1.1,Successful login to an internal resource using...,"Windows Event ID 4624, Zeek conn.log","Servers, Network devices",Insert site-specific NAI here,Collect successful login events (WinEvent 4624...
7,1,Did the adversary conduct initial access? (TA0...,TA0001,1.3,T1078 - Valid Accounts,T1078,3.1.2,"Successful login to external services (VPN, OW...","Windows Event ID 4624, Zeek conn.log","Servers, Network devices",Insert site-specific NAI here,Monitor successful logins to externally facing...
8,2,Has the adversary achieved privilege escalatio...,TA0004,2.1,T1078 - Valid Accounts,T1078,4.1.1,Successful login or attempted login by a stand...,"Windows Event ID 4624, Windows Event ID 4688","Servers, Endpoints",Insert site-specific NAI here,Monitor successful login events (WinEvent 4624...
9,2,Has the adversary achieved privilege escalatio...,TA0004,2.1,T1078 - Valid Accounts,T1078,4.1.2,Multiple distinct valid accounts logged into t...,"Windows Event ID 4624, Windows Event ID 4648","Servers, Endpoints",Insert site-specific NAI here,Analyze successful login events (WinEvent 4624...


In [None]:
# Finally, update the evidence index based on the new PIR and indicator indexes.
updated_df = update_final_evidence_index(updated_df, indicator_index_col = "Indicator Index", output_evidence_index_col = "Evidence Index")
# display(updated_df)

In [None]:
def create_visually_spanned_df(input_df: pd.DataFrame) -> pd.DataFrame:
    """
    Transforms a DataFrame by setting a MultiIndex on specified hierarchical
    columns to achieve a visual row-spanning effect when displayed.

    The "merging" of cells is a visual effect provided by Pandas' MultiIndex
    display. The underlying DataFrame data remains in a 2D structure.

    Args:
        input_df (pd.DataFrame): The input DataFrame. Expected to have the
                                 hierarchical columns 'PIR Index', 'PIR',
                                 'Indicator Index', and 'Indicator',
                                 followed by more granular data columns.

    Returns:
        pd.DataFrame: A new DataFrame ('final_df') with a MultiIndex set on the
                      hierarchical columns, sorted for proper visual spanning.
    """
    df = input_df.copy()

    # Define the columns that will form the hierarchical MultiIndex.
    # The order of these columns is crucial as it defines the levels of the hierarchy
    # and how the visual spanning will appear.
    hierarchical_cols = [
        'PIR Index',
        'PIR',
        'Indicator Index',
        'Indicator',
    ]

    # Verify that these columns exist in the input DataFrame.
    # If any are missing, they will be excluded from the MultiIndex.
    existing_hierarchical_cols = [col for col in hierarchical_cols if col in df.columns]

    if not existing_hierarchical_cols:
        print("Warning: None of the specified hierarchical columns for indexing "
              "were found in the DataFrame. Returning the original DataFrame.")
        return df

    # The remaining columns will be the data columns associated with the
    # most granular level of the hierarchy.
    # data_cols = [col for col in df.columns if col not in existing_hierarchical_cols] # Not strictly needed for set_index

    # For the visual spanning to work correctly, the DataFrame MUST be sorted
    # by the columns that will form the MultiIndex, in the specified order.
    # This ensures that identical values are adjacent before set_index is called.
    df = df.sort_values(by=existing_hierarchical_cols)

    # Set the MultiIndex.
    # The columns listed in existing_hierarchical_cols will be moved from
    # the DataFrame's columns to its index.
    final_df = df.set_index(existing_hierarchical_cols)

    # While sort_values before set_index is the primary sorting,
    # sorting the index itself can ensure canonical order if needed,
    # though usually redundant if sorted before.
    # final_df = final_df.sort_index()

    return final_df

final_df (with MultiIndex for visual spanning):


Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Evidence Index,Evidence,Data,NAI,Action
PIR Index,PIR,Indicator Index,Indicator,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,Did the adversary conduct initial access? (TA0001 - Initial Access),1.1,T1190 - Exploit Public-Facing Application,1.1.1,Error messages or unusual ...,Zeek http.log,Insert site-specific NAI here,Monitor Zeek http.log for ...
1,Did the adversary conduct initial access? (TA0001 - Initial Access),1.1,T1190 - Exploit Public-Facing Application,1.1.2,Alerts from Network Intrus...,Zeek signatures.log; Zeek ...,Insert site-specific NAI here,Analyze Zeek signatures.lo...
1,Did the adversary conduct initial access? (TA0001 - Initial Access),1.1,T1190 - Exploit Public-Facing Application,1.1.3,Unusual process creation o...,Windows Event ID 4688; Win...,Insert site-specific NAI here,Correlate Zeek logs (conn....
1,Did the adversary conduct initial access? (TA0001 - Initial Access),1.1,T1190 - Exploit Public-Facing Application,1.1.4,Anomalous traffic patterns...,Zeek conn.log; Windows Eve...,Insert site-specific NAI here,Filter Zeek conn.log for c...
1,Did the adversary conduct initial access? (TA0001 - Initial Access),1.2,T1566 - Phishing,1.2.1,Email traffic containing l...,"Zeek smtp.log, Zeek http.l...",Insert site-specific NAI here,Analyze Zeek smtp.log for ...
1,Did the adversary conduct initial access? (TA0001 - Initial Access),1.2,T1566 - Phishing,1.2.2,Emails with spoofed sender...,Zeek smtp.log,Insert site-specific NAI here,Analyze Zeek smtp.log for ...
1,Did the adversary conduct initial access? (TA0001 - Initial Access),1.3,T1078 - Valid Accounts,1.3.1,Successful login to an int...,"Windows Event ID 4624, Zee...",Insert site-specific NAI here,Collect successful login e...
1,Did the adversary conduct initial access? (TA0001 - Initial Access),1.3,T1078 - Valid Accounts,1.3.2,Successful login to extern...,"Windows Event ID 4624, Zee...",Insert site-specific NAI here,Monitor successful logins ...
2,Has the adversary achieved privilege escalation using valid accounts? (TA0004 - Privilege Escalation),2.1,T1078 - Valid Accounts,2.1.1,Successful login or attemp...,"Windows Event ID 4624, Win...",Insert site-specific NAI here,Monitor successful login e...
2,Has the adversary achieved privilege escalation using valid accounts? (TA0004 - Privilege Escalation),2.1,T1078 - Valid Accounts,2.1.2,Multiple distinct valid ac...,"Windows Event ID 4624, Win...",Insert site-specific NAI here,Analyze successful login e...
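
The spanning effect in the table above comes from how pandas renders a sorted MultiIndex, not from any change to the data. A minimal sketch on made-up rows (the values are assumptions for illustration):

```python
import pandas as pd

# Illustrative rows using a subset of the notebook's column names.
toy = pd.DataFrame({
    "PIR Index": [1, 1, 2],
    "Indicator Index": ["1.1", "1.1", "2.1"],
    "Evidence Index": ["1.1.1", "1.1.2", "2.1.1"],
})

levels = ["PIR Index", "Indicator Index"]

# Sort first so identical values are adjacent, then move the
# hierarchical columns into the index.
spanned = toy.sort_values(by=levels).set_index(levels)

# When printed, repeated outer index labels are left blank ("sparsified").
print(spanned)

# The index itself still stores one full tuple per row.
print(spanned.index.tolist())
```

The blanking of repeated labels is controlled by the pandas display option `display.multi_sparse`, which is enabled by default.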


In [None]:
# Now we can perform the final transformation on the ASOM, merging like cells to
# produce the finished ASOM.
final_df = create_visually_spanned_df(updated_df)

with pd.option_context('display.max_rows', None, # Show all rows to see full effect
                        'display.width', 1200,    # Adjust width as needed
                        'display.max_colwidth', 30,
                        'display.expand_frame_repr', False # Prevent wrapping if too wide for console
                        ):
    # display(final_df)
    # Display only a subset of the data columns rather than the full set.
    # The hierarchical columns are now index levels, so they are always shown;
    # filter() only selects among the remaining columns.
    display(final_df.filter(items=["Evidence Index", "Evidence", "Data", "NAI", "Action"]))

In [None]:
# Export the finished ASOM to an Excel file
final_df.to_excel("test_asom_merged.xlsx")

## Example Analytic Scheme of Maneuver Generation

This section generates an example analytic scheme of maneuver for a real threat actor based on information from CrowdStrike's threat intelligence platform.

In [None]:
def generate_attack_chain_data(actor_data_list):
    """
    Transforms a list of actor data objects into a nested dictionary
    organizing techniques under tactics.

    Args:
        actor_data_list (list): A list of dictionaries, where each dictionary
                                represents an entry with tactic and technique info.

    Returns:
        dict: A dictionary where keys are formatted tactic strings
              (e.g., "TA0001 - Initial Access") and values are lists of
              formatted technique strings (e.g., "T1078.001 - Default Accounts").
    """
    attack_chain_data = defaultdict(list)

    if not isinstance(actor_data_list, list):
        print("Error: Input actor_data must be a list.")
        return {}

    for item in actor_data_list:
        if not isinstance(item, dict):
            print(f"Warning: Skipping non-dictionary item in actor_data_list: {item}")
            continue

        tactic_id = item.get('tactic_id')
        tactic_name = item.get('tactic_name')
        technique_id_full = item.get('technique_id')
        technique_name = item.get('technique_name')

        # Ensure all necessary fields are present
        if not all([tactic_id, tactic_name, technique_id_full, technique_name]):
            print(f"Warning: Skipping item due to missing essential fields (tactic_id, tactic_name, technique_id, technique_name): {item}")
            continue

        # Format the Tactic Key: "TAXXXX - Tactic Name"
        try:
            formatted_tactic_id = str(tactic_id).upper()
            tactic_key = f"{formatted_tactic_id} - {str(tactic_name)}"
        except Exception as e:
            print(f"Warning: Could not format tactic key for item {item}. Error: {e}")
            continue

        # Format the Technique Value: "TXXXX - Technique Name"
        try:
            formatted_technique_id_full = str(technique_id_full).upper()
            technique_value = f"{formatted_technique_id_full} - {str(technique_name)}"
        except Exception as e:
            print(f"Warning: Could not format technique value for item {item}. Error: {e}")
            continue

        # Add the technique to the list for the corresponding tactic, avoiding duplicates
        if technique_value not in attack_chain_data[tactic_key]:
            attack_chain_data[tactic_key].append(technique_value)

    # Convert defaultdict to a regular dict for the final output (optional, but common)
    return dict(attack_chain_data)
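
To show the shape of the output, here is a condensed, self-contained sketch of the same grouping logic applied to a few sample records. The helper name `group_techniques_by_tactic` and the sample list are assumptions for illustration; the records only reuse the field names the function above reads.

```python
from collections import defaultdict

# Illustrative records; the last one is deliberately incomplete.
sample = [
    {"tactic_id": "ta0001", "tactic_name": "Initial Access",
     "technique_id": "t1190", "technique_name": "Exploit Public-Facing Application"},
    {"tactic_id": "ta0001", "tactic_name": "Initial Access",
     "technique_id": "t1566.001", "technique_name": "Spearphishing Attachment"},
    {"tactic_id": "ta0002", "tactic_name": "Execution",
     "technique_id": "t1059.001", "technique_name": "PowerShell"},
    {"tactic_id": "ta0002", "tactic_name": "Execution"},
]

REQUIRED = ("tactic_id", "tactic_name", "technique_id", "technique_name")

def group_techniques_by_tactic(records):
    """Hypothetical condensed version of the grouping step."""
    grouped = defaultdict(list)
    for rec in records:
        fields = [rec.get(key) for key in REQUIRED]
        # Skip records that lack any of the required fields.
        if not all(fields):
            continue
        tactic_id, tactic_name, technique_id, technique_name = fields
        tactic_key = f"{tactic_id.upper()} - {tactic_name}"
        technique = f"{technique_id.upper()} - {technique_name}"
        # Keep each technique only once per tactic.
        if technique not in grouped[tactic_key]:
            grouped[tactic_key].append(technique)
    return dict(grouped)

result = group_techniques_by_tactic(sample)
print(result)
```

Sub-technique identifiers such as `t1566.001` are upper-cased as a whole, so they appear in the output as `T1566.001`.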

This is an export of the ATT&CK data from CrowdStrike's threat intelligence platform. CrowdStrike supplies this in a format for MITRE ATT&CK Navigator overlays, but we can repurpose it for our use case here by processing it with `generate_attack_chain_data`, defined above.

In [None]:
actor_data = [
  {
    "id": "ta0001_t1078.001",
    "tactic_id": "ta0001",
    "tactic_name": "Initial Access",
    "technique_id": "t1078.001",
    "technique_name": "Default Accounts",
    "observables": [
      "Access to victims via compromised accounts"
    ]
  },
  {
    "id": "ta0001_t1133",
    "tactic_id": "ta0001",
    "tactic_name": "Initial Access",
    "technique_id": "t1133",
    "technique_name": "External Remote Services",
    "observables": [
      "Use of VPN services"
    ]
  },
  {
    "id": "ta0001_t1189",
    "tactic_id": "ta0001",
    "tactic_name": "Initial Access",
    "technique_id": "t1189",
    "technique_name": "Drive-by Compromise",
    "reports": [
      "CSMR-20006"
    ],
    "observables": [
      "Unspecified use of drive-by compromise"
    ]
  },
  {
    "id": "ta0001_t1190",
    "tactic_id": "ta0001",
    "tactic_name": "Initial Access",
    "technique_id": "t1190",
    "technique_name": "Exploit Public-Facing Application",
    "observables": [
      "Exploitation of unidentified public-facing applications"
    ]
  },
  {
    "id": "ta0001_t1566.001",
    "tactic_id": "ta0001",
    "tactic_name": "Initial Access",
    "technique_id": "t1566.001",
    "technique_name": "Spearphishing Attachment",
    "observables": [
      "KRYPTONITE PANDA has reportedly used spear phishing messages to deliver weaponized document files"
    ]
  },
  {
    "id": "ta0001_t1566.002",
    "tactic_id": "ta0001",
    "tactic_name": "Initial Access",
    "technique_id": "t1566.002",
    "technique_name": "Spearphishing Link",
    "observables": [
      "KRYPTONITE PANDA may use malicious hyperlinks to weaponized document files "
    ]
  },
  {
    "id": "ta0002_t1047",
    "tactic_id": "ta0002",
    "tactic_name": "Execution",
    "technique_id": "t1047",
    "technique_name": "Windows Management Instrumentation",
    "observables": [
      "KRYPTONITE PANDA has used WMI based scripts to deploy GreenCrash RAT. The WMI script mof.txt configures a persistence mechanism using an event-consumer binding. The event is a timer set to five minute intervals, and the consumer is a JavaScript function that will execute the command regsvr32 /s %systemroot%\\conhost.dll"
    ]
  },
  {
    "id": "ta0002_t1059",
    "tactic_id": "ta0002",
    "tactic_name": "Execution",
    "technique_id": "t1059",
    "technique_name": "Command and Scripting Interpreter",
    "observables": [
      "KRYPTONITE PANDA GreenCrash RAT provides remote shell functionality either by:\ncopying cmd.exe to %TEMP%\\svchost.exe and using this to execute received commands - or - copying cmd.exe to %WINDIR%\\Temp\\system and executing it as remote shell\nKRYPTONITE PANDA has used command line interaction to execute GreenCrash RAT loader executables"
    ]
  },
  {
    "id": "ta0002_t1059.001",
    "tactic_id": "ta0002",
    "tactic_name": "Execution",
    "technique_id": "t1059.001",
    "technique_name": "PowerShell",
    "reports": [
      "CSA-17205",
      "CSA-200976",
      "CSMR-20008",
      "CSWR-19046"
    ],
    "observables": [
      "\"KRYPTONITE PANDA has used VBA macros contained within .dotm files to load the DADSTACHE and DADJOKE implants. Malicious macros have Decoded base64-encoded URLs hosting a second-stage downloaders and used the URLDownloadToFileA function to retrieve externally hosted content. Retrieved content has been written to the %APPDATA%\\Microsoft\\Office\\ directory\nKRYPTONITE PANDA has used WMI based scripts to deploy GreenCrash RAT\nKRYPTONITE PANDA has used Visual basic macros to deploy secondary payloads\nUsed PowerShell"
    ]
  },
  {
    "id": "ta0002_t1106",
    "tactic_id": "ta0002",
    "tactic_name": "Execution",
    "technique_id": "t1106",
    "technique_name": "Native API",
    "observables": [
      "KRYPTONITE PANDA has used the Scripting.FileSystemObject function to initialize a DLL search order hijacking chain"
    ]
  },
  {
    "id": "ta0002_t1203",
    "tactic_id": "ta0002",
    "tactic_name": "Execution",
    "technique_id": "t1203",
    "technique_name": "Exploitation for Client Execution",
    "observables": [
      "KRYPTONITE PANDA has targeted known client software vulnerabilities including CVE-2017-0199, CVE-2017-8759, and CVE-2018-0802"
    ]
  },
  {
    "id": "ta0002_t1204.002",
    "tactic_id": "ta0002",
    "tactic_name": "Execution",
    "technique_id": "t1204.002",
    "technique_name": "Malicious File",
    "observables": [
      "KRYPTONITE PANDA has used weaponized malicious documents which require user interaction in order to deploy additional payloads\nKRYPTONITE PANDA has used weaponized malicious documents with embedded Visual Basic macros"
    ]
  },
  {
    "id": "ta0003_t1053.002",
    "tactic_id": "ta0003",
    "tactic_name": "Persistence",
    "technique_id": "t1053.002",
    "technique_name": "At",
    "observables": [
      "KRYPTONITE PANDA has used at.exe and schtasks.exe to persist GreenCrash RAT on compromised systems"
    ]
  },
  {
    "id": "ta0003_t1053.005",
    "tactic_id": "ta0003",
    "tactic_name": "Persistence",
    "technique_id": "t1053.005",
    "technique_name": "Scheduled Task",
    "observables": [
      "KRYPTONITE PANDA has used scheduleded tasks with the filename AdobeSvc to execute legitimate executables as part of DLL search order hijacking "
    ]
  },
  {
    "id": "ta0003_t1505.003",
    "tactic_id": "ta0003",
    "tactic_name": "Persistence",
    "technique_id": "t1505.003",
    "technique_name": "Web Shell",
    "observables": [
      "KRYPTONITE PANDA has used multiple webshells including China Chopper and Angel Shell on compromised web servers for persistence\nKRYPTONITE PANDA has used a JScript evaluator webshell with the filename app_offline.DISABLED.aspx"
    ]
  },
  {
    "id": "ta0003_t1574.001",
    "tactic_id": "ta0003",
    "tactic_name": "Persistence",
    "technique_id": "t1574.001",
    "technique_name": "DLL Search Order Hijacking"
  },
  {
    "id": "ta0003_t1574.002",
    "tactic_id": "ta0003",
    "tactic_name": "Persistence",
    "technique_id": "t1574.002",
    "technique_name": "DLL Side-Loading",
    "reports": [
      "CSA-250446"
    ],
    "observables": [
      "KRYPTONITE PANDA has used legitimate applications to side-load malicious DLLs"
    ]
  },
  {
    "id": "ta0005_t1027",
    "tactic_id": "ta0005",
    "tactic_name": "Defense Evasion",
    "technique_id": "t1027",
    "technique_name": "Obfuscated Files or Information",
    "observables": [
      "KRYPTONITE PANDA has used AES-128 to encode a DADSTACHE payload\nKRYPTONITE PANDA has used Base64 encoding to obfuscate VBScript loaders\nKRYPTONITE PANDA has obfuscated GreenCrash RAT payloads using XOR\nKRYPTONITE PANDA has used the single-byte XOR key 0xD7 to obfuscate downloader configurations"
    ]
  },
  {
    "id": "ta0005_t1027.002",
    "tactic_id": "ta0005",
    "tactic_name": "Defense Evasion",
    "technique_id": "t1027.002",
    "technique_name": "Software Packing",
    "observables": [
      "KRYPTONITE PANDA has used the .NET packer ConfuserEx to obfuscate .NET binaries"
    ]
  },
  {
    "id": "ta0005_t1027.003",
    "tactic_id": "ta0005",
    "tactic_name": "Defense Evasion",
    "technique_id": "t1027.003",
    "technique_name": "Steganography",
    "observables": [
      "Stored stolen data on GitHub disguised as benign images"
    ]
  },
  {
    "id": "ta0005_t1027.004",
    "tactic_id": "ta0005",
    "tactic_name": "Defense Evasion",
    "technique_id": "t1027.004",
    "technique_name": "Compile After Delivery",
    "observables": [
      "KRYPTONITE PANDA has used Base64 encoded second stage loaders run after retrieval on host"
    ]
  },
  {
    "id": "ta0005_t1036",
    "tactic_id": "ta0005",
    "tactic_name": "Defense Evasion",
    "technique_id": "t1036",
    "technique_name": "Masquerading",
    "observables": [
      "KRYPTONITE PANDA has renamed DADSTACHE DLLs using the filename LogiMail.dll\nKRYPTONITE PANDA has placed DADSTACHE implant files in the %APPDATA%\\Microsoft\\Office\\ directory\nKRYPTONITE PANDA has renamed DADJOKE DLLs using the filename mpsvc.dll\nKRYPTONITE PANDA has placed a GreenCrash loader in the C:\\Temp\\dell\\ directory\nKRYPTONITE PANDA has placed implant files in the c:\\ProgramData\\VMware\\logs directory"
    ]
  },
  {
    "id": "ta0005_t1070.004",
    "tactic_id": "ta0005",
    "tactic_name": "Defense Evasion",
    "technique_id": "t1070.004",
    "technique_name": "File Deletion",
    "observables": [
      "KRYPTONITE PANDA JJDOOR dropper delpys a secondary VBScript which contains functionality to delete installer files"
    ]
  },
  {
    "id": "ta0005_t1070.006",
    "tactic_id": "ta0005",
    "tactic_name": "Defense Evasion",
    "technique_id": "t1070.006",
    "technique_name": "Timestomp",
    "observables": [
      "KRYPTONITE PANDA multitool contains functionality to modify file timestamps"
    ]
  },
  {
    "id": "ta0005_t1140",
    "tactic_id": "ta0005",
    "tactic_name": "Defense Evasion",
    "technique_id": "t1140",
    "technique_name": "Deobfuscate/Decode Files or Information",
    "observables": [
      "KRYPTONITE PANDA has used VBA macros to decode base-64 encoded URLs hosting second stage content\nKRYPTONITE PANDA has used an AES-128 key based on the sample SHA256 hash to decode a DADSTACHE payload\nKRYPTONITE PANDA has used Base64 to decode VBScript loaders\nKRYPTONITE PANDA has  decoded GreenCrash RAT payload files with the XOR key 43 72 CD 1D 01 65 9F 7A D0 47 65 1D 9A 60 3C 5F\nKRYPTONITE PANDA GreenCrash RAT loader uses XXTEA algorithm to decode a configuration file"
    ]
  },
  {
    "id": "ta0005_t1218.005",
    "tactic_id": "ta0005",
    "tactic_name": "Defense Evasion",
    "technique_id": "t1218.005",
    "technique_name": "Mshta",
    "observables": [
      "KRYPTONITE PANDA has used malicious HTA files to load secondary payloads including Cobalt Strike and JJDOOR"
    ]
  },
  {
    "id": "ta0005_t1221",
    "tactic_id": "ta0005",
    "tactic_name": "Defense Evasion",
    "technique_id": "t1221",
    "technique_name": "Template Injection",
    "observables": [
      "KRYPTONITE PANDA has used weaponized Microsoft Office document which take advantage of the .docx hyperlink feature to retrieve and execute additional payloads"
    ]
  },
  {
    "id": "ta0005_t1574.001",
    "tactic_id": "ta0005",
    "tactic_name": "Defense Evasion",
    "technique_id": "t1574.001",
    "technique_name": "DLL Search Order Hijacking",
    "observables": [
      "KRYPTONITE PANDA has used search order hijacking in order to execute DADSTACHE DLL payloads\nKRYPTONITE PANDA has used search order hijacking to execute a custom keylogger via the legitimate executable debug.exe, and the DLL dbgeng.dll"
    ]
  },
  {
    "id": "ta0005_t1620",
    "tactic_id": "ta0005",
    "tactic_name": "Defense Evasion",
    "technique_id": "t1620",
    "technique_name": "Reflective Code Loading",
    "reports": [
      "CSA-250446"
    ],
    "observables": [
      "KRYPTONITE PANDA likely loaded .NET assemblies for in-memory execution"
    ]
  },
  {
    "id": "ta0006_t1003.003",
    "tactic_id": "ta0006",
    "tactic_name": "Credential Access",
    "technique_id": "t1003.003",
    "technique_name": "NTDS",
    "reports": [
      "CSA-250446"
    ],
    "observables": [
      "KRYPTONITE PANDA obtained the AD domain database NTDS.dit to access credential hashes"
    ]
  },
  {
    "id": "ta0006_t1040",
    "tactic_id": "ta0006",
    "tactic_name": "Credential Access",
    "technique_id": "t1040",
    "technique_name": "Network Sniffing",
    "observables": [
      "KRYPTONITE PANDA deployed the publicly available Responder credential sniffing tool, with the filename Intel.exe"
    ]
  },
  {
    "id": "ta0007_t1016",
    "tactic_id": "ta0007",
    "tactic_name": "Discovery",
    "technique_id": "t1016",
    "technique_name": "System Network Configuration Discovery",
    "observables": [
      "KRYPTONITE PANDA GreenCrash RAT contains network enumeration functionality"
    ]
  },
  {
    "id": "ta0007_t1046",
    "tactic_id": "ta0007",
    "tactic_name": "Discovery",
    "technique_id": "t1046",
    "technique_name": "Network Service Discovery",
    "observables": [
      "KRYPTONITE PANDA has used a freely available PHP based port scanning tool\nKRYPTONITE PANDA multitool contains network port scanning functionality"
    ]
  },
  {
    "id": "ta0007_t1082",
    "tactic_id": "ta0007",
    "tactic_name": "Discovery",
    "technique_id": "t1082",
    "technique_name": "System Information Discovery",
    "observables": [
      "KRYPTONITE PANDA JJDOOR captures host enumeration data\nKRYPTONITE PANDA GreenCrash RAT contains host enumeration functionality"
    ]
  },
  {
    "id": "ta0008_t1021.002",
    "tactic_id": "ta0008",
    "tactic_name": "Lateral Movement",
    "technique_id": "t1021.002",
    "technique_name": "SMB/Windows Admin Shares",
    "observables": [
      "KRYPTONITE PANDA multitool contains functionality to administer Windows administrative functions over SMB"
    ]
  },
  {
    "id": "ta0009_t1056.001",
    "tactic_id": "ta0009",
    "tactic_name": "Collection",
    "technique_id": "t1056.001",
    "technique_name": "Keylogging",
    "observables": [
      "KRYPTONITE PANDA deployed a custom keylogger tool with the filename dbgeng.dll "
    ]
  },
  {
    "id": "ta0009_t1074.001",
    "tactic_id": "ta0009",
    "tactic_name": "Collection",
    "technique_id": "t1074.001",
    "technique_name": "Local Data Staging",
    "observables": [
      "Data staged in unspecified locations for data exfil"
    ]
  },
  {
    "id": "ta0009_t1074.002",
    "tactic_id": "ta0009",
    "tactic_name": "Collection",
    "technique_id": "t1074.002",
    "technique_name": "Remote Data Staging",
    "observables": [
      "Data moved from compromised hosts to central host prior to exfil"
    ]
  },
  {
    "id": "ta0009_t1560",
    "tactic_id": "ta0009",
    "tactic_name": "Collection",
    "technique_id": "t1560",
    "technique_name": "Archive Collected Data",
    "observables": [
      "Archived data for exfiltration"
    ]
  },
  {
    "id": "ta0010_t1041",
    "tactic_id": "ta0010",
    "tactic_name": "Exfiltration",
    "technique_id": "t1041",
    "technique_name": "Exfiltration Over C2 Channel",
    "observables": [
      "Unspecified use"
    ]
  },
  {
    "id": "ta0010_t1567.002",
    "tactic_id": "ta0010",
    "tactic_name": "Exfiltration",
    "technique_id": "t1567.002",
    "technique_name": "Exfiltration to Cloud Storage",
    "reports": [
      "CSIT-22252"
    ],
    "observables": [
      "KRYPTONITE PANDA exfiltrated stolen data to GitHub and Dropbox "
    ]
  },
  {
    "id": "ta0011_t1001.003",
    "tactic_id": "ta0011",
    "tactic_name": "Command and Control",
    "technique_id": "t1001.003",
    "technique_name": "Protocol Impersonation",
    "observables": [
      "KRYPTONITE PANDA JJDOOR implant uses Base64 to encode content within command and control traffic"
    ]
  },
  {
    "id": "ta0011_t1043",
    "tactic_id": "ta0011",
    "tactic_name": "Command and Control",
    "technique_id": "t1043",
    "technique_name": "Commonly Used Port",
    "observables": [
      "KRYPTONITE PANDA Green Rash RAT has used TCP ports 443 and 8080 for command and control communications"
    ]
  },
  {
    "id": "ta0011_t1071.001",
    "tactic_id": "ta0011",
    "tactic_name": "Command and Control",
    "technique_id": "t1071.001",
    "technique_name": "Web Protocols",
    "observables": [
      "KRYPTONITE PANDA JJDOOR implant uses HTTP for command and control communications"
    ]
  },
  {
    "id": "ta0011_t1095",
    "tactic_id": "ta0011",
    "tactic_name": "Command and Control",
    "technique_id": "t1095",
    "technique_name": "Non-Application Layer Protocol",
    "observables": [
      "KRYPTONITE PANDA GreenCrash RAT uses the Windows Socket Library to send and receive a PLib8-compressed data messages between the RAT and the C2 server."
    ]
  },
  {
    "id": "ta0011_t1102",
    "tactic_id": "ta0011",
    "tactic_name": "Command and Control",
    "technique_id": "t1102",
    "technique_name": "Web Service",
    "observables": [
      "KRYPTONITE PANDA has used Simple Object Access Protocol (SOAP) to retrieve additional payloads"
    ]
  },
  {
    "id": "ta0011_t1102.001",
    "tactic_id": "ta0011",
    "tactic_name": "Command and Control",
    "technique_id": "t1102.001",
    "technique_name": "Dead Drop Resolver",
    "observables": [
      "KRYPTONITE PANDA has used wordpress services as dead drop resolvers for GreenCrash RAT tasking data"
    ]
  },
  {
    "id": "ta0011_t1105",
    "tactic_id": "ta0011",
    "tactic_name": "Command and Control",
    "technique_id": "t1105",
    "technique_name": "Ingress Tool Transfer",
    "observables": [
      "KRYPTONITE PANDA has used the URLDownloadToFileA function to retrieve externally hosted content\nKRYPTONITE PANDA has used Simple Object Access Protocol (SOAP) to retrieve additional payloads\nKRYPTONITE PANDA has used a custom downloader with the filename rcdll.dll to retrieve external payloads\nKRYPTONITE PANDA has used a custom downloader JavaScript downloader with the filename 0.js"
    ]
  },
  {
    "id": "ta0011_t1571",
    "tactic_id": "ta0011",
    "tactic_name": "Command and Control",
    "technique_id": "t1571",
    "technique_name": "Non-Standard Port",
    "observables": [
      "KRYPTONITE PANDA Green Rash RAT has used TCP port 8084 for command and control communications"
    ]
  },
  {
    "id": "ta0011_t1572",
    "tactic_id": "ta0011",
    "tactic_name": "Command and Control",
    "technique_id": "t1572",
    "technique_name": "Protocol Tunneling",
    "observables": [
      "Unspecified use"
    ]
  },
  {
    "id": "ta0042_t1583.001",
    "tactic_id": "ta0042",
    "tactic_name": "Resource Development",
    "technique_id": "t1583.001",
    "technique_name": "Domains",
    "observables": [
      "Creation of typosquatting domains",
      "KRYPTONITE PANDA has registered domain names relevant to target scope for use as command-and-control and hosting infrastructure including: \nairbusocean[.]com, teledynegroup[.]com, scsnewstoday[.]com, www.thyssenkrupp-marinesystems[.]org"
    ]
  },
  {
    "id": "ta0042_t1583.002",
    "tactic_id": "ta0042",
    "tactic_name": "Resource Development",
    "technique_id": "t1583.002",
    "technique_name": "DNS Server",
    "observables": [
      "KRYPTONITE PANDA employs multiple DNS name servers as part of infrastructure management including:\nGoDaddy, Namecheap, CloudFlare"
    ]
  },
  {
    "id": "ta0042_t1583.003",
    "tactic_id": "ta0042",
    "tactic_name": "Resource Development",
    "technique_id": "t1583.003",
    "technique_name": "Virtual Private Server",
    "reports": [
      "CSA-250446"
    ],
    "observables": [
      "KRYPTONITE PANDA typically hosts VPS infrastructure at Vultr (AS20473) and Digital Ocean (AS14061)"
    ]
  },
  {
    "id": "ta0042_t1587.003",
    "tactic_id": "ta0042",
    "tactic_name": "Resource Development",
    "technique_id": "t1587.003",
    "technique_name": "Digital Certificates",
    "observables": [
      "KRYPTONITE PANDA has used self signed certificates with the subject and issuer field set to C=US, ST=Texas, L=Austin, O=Development, CN=localhost"
    ]
  },
  {
    "id": "ta0043_t1589.001",
    "tactic_id": "ta0043",
    "tactic_name": "Reconnaissance",
    "technique_id": "t1589.001",
    "technique_name": "Credentials",
    "observables": [
      "Stole credentials"
    ]
  }
]
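
The records above share a regular shape: each has `id`, `tactic_id`, `tactic_name`, `technique_id`, `technique_name`, and `observables`, with an optional `reports` list. Some `observables` entries pack several statements into one string separated by `\n`. The following is a minimal sketch of one way to read that structure, grouping techniques under their tactic and splitting the newline-joined observables. The function name `summarize_by_tactic` and the variable `sample_records` are hypothetical and are not part of the notebook. The three sample records are copied from the output above.

```python
from collections import defaultdict

# Three records copied from the output above (sample_records is a
# hypothetical name used only for this illustration).
sample_records = [
    {
        "id": "ta0007_t1046",
        "tactic_id": "ta0007",
        "tactic_name": "Discovery",
        "technique_id": "t1046",
        "technique_name": "Network Service Discovery",
        "observables": [
            "KRYPTONITE PANDA has used a freely available PHP based port scanning tool\n"
            "KRYPTONITE PANDA multitool contains network port scanning functionality"
        ],
    },
    {
        "id": "ta0007_t1082",
        "tactic_id": "ta0007",
        "tactic_name": "Discovery",
        "technique_id": "t1082",
        "technique_name": "System Information Discovery",
        "observables": [
            "KRYPTONITE PANDA JJDOOR captures host enumeration data\n"
            "KRYPTONITE PANDA GreenCrash RAT contains host enumeration functionality"
        ],
    },
    {
        "id": "ta0010_t1567.002",
        "tactic_id": "ta0010",
        "tactic_name": "Exfiltration",
        "technique_id": "t1567.002",
        "technique_name": "Exfiltration to Cloud Storage",
        "reports": ["CSIT-22252"],
        "observables": [
            "KRYPTONITE PANDA exfiltrated stolen data to GitHub and Dropbox "
        ],
    },
]


def summarize_by_tactic(records):
    """Map each tactic name to its techniques and their individual observables."""
    by_tactic = defaultdict(list)
    for record in records:
        observables = []
        for entry in record.get("observables", []):
            # Assumption: a newline separates independent observables.
            observables.extend(
                line.strip() for line in entry.split("\n") if line.strip()
            )
        by_tactic[record["tactic_name"]].append(
            {
                "technique_id": record["technique_id"],
                "technique_name": record["technique_name"],
                "reports": record.get("reports", []),  # "reports" is optional
                "observables": observables,
            }
        )
    return dict(by_tactic)


summary = summarize_by_tactic(sample_records)
for tactic, techniques in summary.items():
    print(tactic)
    for technique in techniques:
        count = len(technique["observables"])
        print(f"  {technique['technique_id']} {technique['technique_name']}: {count} observable(s)")
```

The newline split is a heuristic. In a few records, such as the Domains entry, the newline separates a sentence from the list it introduces, so those entries would need to stay unsplit.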

Using the processed attack chain data, generate an ASOM for KRYPTONITE PANDA. This code exports two versions: the full ASOM, in which every row is distinct and which is better suited to further machine processing, and the finished ASOM, in which like cells are merged to create the nested structure depicted in TC 3-12.2.4.1.

In [None]:
attack_chain_data = generate_attack_chain_data(actor_data)

log("Building ASOM...")
resulting_asom = build_asom(attack_chain_data)
log("Finished.")

formatted_df = format_asom(resulting_asom)

representative_pirs_map = get_representative_pirs_by_tactic_simplified(formatted_df)

# Perform the update and grouping
updated_df = update_pirs(formatted_df, representative_pirs_map)

updated_df = update_sequential_pir_index(updated_df)
updated_df = update_hierarchical_indicator_index(updated_df)
updated_df = update_final_evidence_index(updated_df, indicator_index_col = "Indicator Index", output_evidence_index_col = "Evidence Index")

# Export the full ASOM
updated_df.to_excel("kryptonite-panda_full.xlsx")

final_df = create_visually_spanned_df(updated_df)

print("final_df (with MultiIndex for visual spanning):")
# Displaying the DataFrame with to_string() often shows the MultiIndex spanning.
# In Jupyter notebooks, `display(final_df)` or a bare `final_df` also works well.
with pd.option_context(
    "display.max_rows", None,            # Show all rows to see the full effect
    "display.width", 1200,               # Adjust width as needed
    "display.max_colwidth", 30,
    "display.expand_frame_repr", False,  # Prevent wrapping if too wide for console
):
    display(final_df.filter(items=["PIR Index", "PIR", "Indicator Index", "Indicator", "Evidence Index", "Evidence", "Data", "NAI", "Action"]))

In [None]:
# Export the merged ASOM
final_df.to_excel("kryptonite-panda_merged.xlsx")
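
The difference between the full and merged exports can be illustrated without pandas. The sketch below takes flat rows in which every cell is filled (the "full" form) and blanks a parent cell whenever it, and every column to its left, repeats the previous row, which approximates the nested look of the merged form. This is not the implementation of `create_visually_spanned_df`, whose definition lies outside this section. The function name `blank_repeated_cells` and the placeholder cell values are hypothetical. Only the column names `PIR`, `Indicator`, and `Evidence` come from the notebook.

```python
def blank_repeated_cells(rows, columns):
    """Return copies of rows with repeated leading cells emptied.

    A cell in `columns` is blanked only when it and all columns to its
    left match the previous row, so a new parent always restarts its children.
    """
    merged = []
    previous = None
    for row in rows:
        out = dict(row)  # copy so the full rows stay untouched
        if previous is not None:
            for column in columns:
                if row[column] != previous[column]:
                    break  # first difference: keep this and later columns
                out[column] = ""
        merged.append(out)
        previous = row
    return merged


# Placeholder values for illustration only (not real ASOM content).
full_rows = [
    {"PIR": "PIR 1", "Indicator": "Indicator 1.1", "Evidence": "Evidence 1.1.1"},
    {"PIR": "PIR 1", "Indicator": "Indicator 1.1", "Evidence": "Evidence 1.1.2"},
    {"PIR": "PIR 1", "Indicator": "Indicator 1.2", "Evidence": "Evidence 1.2.1"},
    {"PIR": "PIR 2", "Indicator": "Indicator 2.1", "Evidence": "Evidence 2.1.1"},
]

merged_rows = blank_repeated_cells(full_rows, columns=["PIR", "Indicator"])
for row in merged_rows:
    print(f"{row['PIR']:<8}{row['Indicator']:<16}{row['Evidence']}")
```

The full rows remain the better input for filtering or joining, since each one carries its own PIR and indicator. The blanked form is for reading only.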