## Generate Policies
 - Read control markdown files: Walks the local data/controls/markdown/by-family/ directory and reads each NIST SP 800-53 control markdown file
 - Generate policies: Sends each control to an Amazon Bedrock model using a prompt template and receives a RAG-optimized organizational policy in markdown
 - Write policy files: Saves each policy under data/policies/markdown/ in both all-controls/ and by-family/ subfolders named by run timestamp and model ID
 - Upload markdown files: Recursively uploads all generated policy markdown files to the S3 bucket under the policies/markdown/ prefix, maintaining directory structure

In [1]:
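from datetime import datetime
from IPython.display import Markdown, display
from typing import Any, Dict
import boto3
from botocore.exceptions import ClientError  # caught in invoke_bedrock_model
import json
import os
from pathlib import Path
import shutil
import sys
import time  # used for the throttling wait in invoke_bedrock_model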

In [2]:
BUCKET = '183023889407-us-east-1-compliance-rule-generator'
S3_PREFIX_POLICY_MARKDOWN = 'policies/markdown/'

FOLDER_HOME: Path = Path('/home/sagemaker-user')
FOLDER_CONTROL_MARKDOWN: Path = FOLDER_HOME / 'data/controls/markdown/by-family/'
FOLDER_POLICY_MARKDOWN: Path = FOLDER_HOME / 'data/policies/markdown/'

REGION = os.getenv("AWS_REGION", "us-east-1")
s3 = boto3.client("s3", region_name=REGION)
bedrock_runtime = boto3.client('bedrock-runtime', region_name=REGION)

# Bedrock Model Configuration
MODELS = {
    'premium': 'us.anthropic.claude-opus-4-5-20251101-v1:0', # not available
    'good': 'us.anthropic.claude-sonnet-4-5-20250929-v1:0', # times out
    'balanced': 'us.anthropic.claude-sonnet-4-20250514-v1:0',
    'fast_cheap': 'us.anthropic.claude-haiku-4-5-20251001-v1:0',
    'aws_native': 'amazon.nova-premier-v1:0'
}
MODEL_ID = MODELS['balanced']
MAX_TOKENS = 4096
TEMPERATURE = 0.7
PROMPT_TEMPLATE = """
<system>
You are a cybersecurity compliance expert who creates concise, machine-parseable policy documents optimized for RAG systems. Your policies focus on clear rules, explicit validation logic, and scenario patterns that enable LLMs to accurately evaluate compliance scenarios.
</system>

<task>
Generate a concise, RAG-optimized policy document based on the **provided NIST SP 800-53 control**. Focus ONLY on the essential components needed for automated scenario evaluation. The policy will be part of a knowledgebase used by LLMs to determine whether complex scenarios involving multiple policies are compliant or non-compliant.
</task>

<context>
Organization: Large technology company (10,000+ employees) with hybrid cloud infrastructure and regulatory requirements (SOX, FedRAMP, FISMA, PCI-DSS)

RAG Usage: The policy will be chunked and embedded in a vector database. When users present scenarios, relevant policy chunks will be retrieved to determine compliance. The policy must be structured to maximize accurate retrieval and evaluation.
</context>

<output_structure>
Generate the policy using EXACTLY this markdown structure:

markdown
POLICY: [Control ID]: [Control Name]

METADATA
| Field | Value |
|-------|-------|
| Policy ID | POL_[Control ID] |
| NIST Control | [Control ID]: [Control Name] |
| Version | 1.0 |
| Owner | [Appropriate Role] |
| Keywords | keyword1, keyword2, keyword3 |

1. POLICY STATEMENT
Brief (2-3 sentence) summary of what this policy requires.

2. SCOPE
| Entity | In Scope | Notes |
|--------|----------|-------|
| [Entity types] | [YES/NO/CONDITIONAL] | [Brief condition] |

3. KEY ROLES
| Role | Key Responsibilities |
|------|---------------------|
| [Role 1] | [Bullet list of key responsibilities] |
| [Role 2] | [Bullet list of key responsibilities] |

4. RULES
[RULE-01] [Concise rule statement with MUST/MUST NOT/SHALL/SHALL NOT]
[VALIDATION] IF [condition] THEN [outcome]

[RULE-02] [Concise rule statement with MUST/MUST NOT/SHALL/SHALL NOT]
[VALIDATION] IF [condition] THEN [outcome]

[Continue with all essential rules]

5. REQUIRED PROCEDURES
- [PROC-01] [Procedure name] - [Brief description]
- [PROC-02] [Procedure name] - [Brief description]

6. REVIEW REQUIREMENTS
- Policy review frequency: [Timeframe]
- Procedure review frequency: [Timeframe]
- Triggering events: [List of events requiring review]

7. SCENARIO PATTERNS
[SCENARIO-01: Name]
IF [condition1]
AND [condition2]
THEN [compliance outcome]
[Violation severity if applicable]

[SCENARIO-02: Name]
IF [condition1]
AND [condition2]
THEN [compliance outcome]
[Violation severity if applicable]

[Add 3-5 total scenarios that test key policy aspects]

8. COMPLIANCE MAPPING
| Requirement | Rule Reference |
|-------------|---------------|
| [Assessment objective 1] | [RULE-XX] |
| [Assessment objective 2] | [RULE-XX] |
</output_structure>

<formatting_rules>
1. Keep it concise: Each rule should be 1-2 sentences maximum
2. Use precise modal verbs: MUST, MUST NOT, SHALL, SHALL NOT, SHOULD, MAY
3. Include measurable criteria: Specific timeframes, percentages, thresholds
4. Write machine-parseable validation logic: Use IF-THEN format with clear operators
5. Create realistic scenarios: Focus on common compliance questions and edge cases
6. Map to assessment objectives: Ensure every NIST assessment objective links to a rule
7. Use unique identifiers: Each rule needs a unique ID for cross-referencing
</formatting_rules>

<examples>
Example Rule and Validation:

[RULE-03] Access revocation for terminated employees MUST be completed within 24 hours of HR notification for standard terminations and within 1 hour for involuntary terminations.
[VALIDATION] IF employee_status = "terminated" AND termination_type = "standard" AND revocation_time > 24_hours THEN violation
[VALIDATION] IF employee_status = "terminated" AND termination_type = "involuntary" AND revocation_time > 1_hour THEN critical_violation

Example Scenario Pattern:

[SCENARIO-04: Contractor Access After Project]
IF user_type = "contractor"
AND project_end_date < current_date
AND active_access = TRUE
AND exception_documented = FALSE
THEN compliance = FALSE
violation_severity = "Moderate"
</examples>

<input>
Generate a RAG-optimized policy document for the following NIST control:

NIST CONTROL: {{CONTROL_ID}}: {{CONTROL_NAME}}
Family: {{FAMILY}}
Class: SP800-53

Control Statement:
{{CONTROL_STATEMENT}}

Implementation Guidance:
{{IMPLEMENTATION_GUIDANCE}}

Assessment Objectives:
{{ASSESSMENT_OBJECTIVES}}

Related Controls:
{{RELATED_CONTROLS}}
</input>

**provided NIST SP 800-53 control**:
{control_markdown}
"""

In [3]:
def upload_file(local_path: Path, bucket: str, prefix: str):
    """
    Upload a local file to S3.
    """
    key = prefix.rstrip("/") + "/" + local_path.name
    s3.upload_file(str(local_path), bucket, key)
    

In [4]:
def download_file(bucket: str, key: str, local_path: Path):
    """
    Download an S3 object to a local path.
    """
    local_path.parent.mkdir(parents=True, exist_ok=True)
    s3.download_file(bucket, key, str(local_path))
    

In [4]:
def upload_directory_to_s3(source_path: Path, bucket: str, s3_prefix: str):
    """
    Recursively upload all files from source_path to S3 maintaining directory structure.
    """
    source_path = Path(source_path)
    s3_prefix = s3_prefix.rstrip("/")
    
    for file_path in source_path.rglob("*"):
        if file_path.is_file():
            # Calculate relative path from source
            relative_path = file_path.relative_to(source_path)
            # Create S3 key maintaining directory structure
            s3_key = f"{s3_prefix}/{relative_path}".replace("\\", "/")
            
            s3.upload_file(str(file_path), bucket, s3_key)
            print(f"Uploaded: {relative_path} -> s3://{bucket}/{s3_key}")


In [6]:
def print_directory_tree(base_path: Path, max_depth=None):
    def _print_tree(path, prefix="", depth=0):
        if max_depth is not None and depth > max_depth:
            return
        
        items = sorted(path.iterdir(), key=lambda x: (x.is_file(), x.name))
        
        for i, item in enumerate(items):
            is_last = i == len(items) - 1
            current_prefix = "└── " if is_last else "├── "
            print(f"{prefix}{current_prefix}{item.name}")
            
            if item.is_dir():
                next_prefix = prefix + ("    " if is_last else "│   ")
                _print_tree(item, next_prefix, depth + 1)
    
    print(base_path.name + "/")
    _print_tree(base_path)


In [7]:
def invoke_bedrock_model(prompt: str, model_id: str = MODEL_ID, max_tokens: int = MAX_TOKENS) -> Dict[str, Any]:
    """Call AWS Bedrock with the given prompt - works with Claude and Nova models"""
    try:
        # Determine model type and prepare appropriate request body
        if 'anthropic' in model_id.lower() or 'claude' in model_id.lower():
            # Claude model format
            request_body = {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
                "temperature": TEMPERATURE
            }
        elif 'nova' in model_id.lower():
            # Nova models format
            request_body = {
                "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
                "inferenceConfig": {"maxTokens": max_tokens, "temperature": TEMPERATURE}
            }
        else:
            # Generic format fallback - this will need to be tweaked if other models are used
            request_body = {
                "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
                "max_tokens": max_tokens,
                "temperature": TEMPERATURE
            }
        
        # Invoke the model
        response = bedrock_runtime.invoke_model(modelId=model_id, body=json.dumps(request_body))
        response_body = json.loads(response['body'].read())
        
        # Extract text based on model type, currently for anthropic and nova
        if 'nova' in model_id.lower():
            result_text = response_body['output']['message']['content'][0]['text']
        else:
            result_text = response_body['content'][0]['text']
        
        return {
            'success': True,
            'result': result_text,
            'usage': response_body.get('usage', {}),
            'model_id': model_id
        }
    
    except ClientError as e:
        if e.response['Error']['Code'] == 'ThrottlingException':
            print("  ⚠ Rate limit hit, waiting 5 seconds...")
            time.sleep(5)
            return invoke_bedrock_model(prompt, model_id, max_tokens)
        else:
            print(f"  ✗ Bedrock error: {e}")
            return {'success': False, 'error': str(e)}
    
    except Exception as e:
        print(f"  ✗ Unexpected error: {e}")
        return {'success': False, 'error': str(e)}
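The throttling branch above retries by calling itself after a fixed 5-second wait, with no limit on the number of attempts. One possible alternative is a bounded retry with exponential backoff, sketched below. `ThrottledError`, `call_with_backoff`, and `flaky_call` are hypothetical names used only for this illustration; in the notebook the condition to catch would still be a `ClientError` whose error code is `ThrottlingException`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a Bedrock ThrottlingException."""


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on ThrottledError with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise ValueError("max_attempts must be at least 1")


# Usage: a fake call that is throttled twice, then succeeds.
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ThrottledError()
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = call_with_backoff(flaky_call, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production version might also add random jitter to the delay.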

In [8]:
def control_key(s) -> tuple[str, int, list[int]]:
    """Convert control ID to tuple for proper sorting/comparison.  
    Enables > = < comparisons of control IDs like AC-1.2.3 vs RQ-5.
    Returns (family, main_num, all_nums) for lexicographic comparison.
    e.g.: control_key("AC-1.2.3") → ("AC", 1, [1, 2, 3])
    """
    parts = s.split('-')
    return (parts[0], int(parts[1].split('.')[0]), [int(x) for x in parts[1].split('.')])
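A short usage sketch shows why the tuple key matters. The control IDs below are illustrative, and `control_key` is repeated so the snippet runs on its own. A plain string sort would place `AC-10` before `AC-2`, while the tuple key orders the numeric parts as numbers.

```python
def control_key(s: str) -> tuple[str, int, list[int]]:
    # Same logic as the notebook cell above, repeated so this snippet runs alone.
    parts = s.split('-')
    return (parts[0], int(parts[1].split('.')[0]), [int(x) for x in parts[1].split('.')])


control_ids = ["AC-10", "AC-2.1", "SR-12", "AC-2", "AC-1"]  # illustrative IDs

# Numeric-aware ordering versus naive string ordering.
ordered = sorted(control_ids, key=control_key)
naive = sorted(control_ids)
print(ordered)
print(naive)

# Keep only controls at or after a starting point, as generate_policies does.
remaining = [c for c in control_ids if control_key(c) >= control_key("AC-2.1")]
print(remaining)
```

Because the family string is compared first, a starting control such as `SR-12` also skips every family that sorts alphabetically before `SR`.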

In [9]:
def generate_policies(control_root:Path=FOLDER_CONTROL_MARKDOWN,
    policy_root:Path=FOLDER_POLICY_MARKDOWN, model_id:str=MODEL_ID, start_with_control:str=""):
    """
    Traverses NIST controls in local filesystem, generating a corresponding organizational policy
    for each control in a parallel directory structure.
    """   

    # Create two copies of each policy - one set grouped together, another grouped by
    # NIST control category. Each set of generated policies will be subfoldered by date and model id.
    subfolder_name = datetime.now().strftime(f"%Y-%m-%d:%H:%M:%SET_{model_id}")
    all_policies_root: Path = policy_root / "all-controls" / subfolder_name
    all_policies_root.mkdir(parents=True, exist_ok=True)
    policies_by_family_root: Path = policy_root / "by-family" / subfolder_name
    policies_by_family_root.mkdir(parents=True, exist_ok=True)

    for root, dirs, files in os.walk(control_root):
        for control_file_name in files:
            control_absolute_file_path = os.path.join(root, control_file_name)
     
            # Skip any control that sorts before the specified starting control ID
            if start_with_control != "":  # a starting control ID has been specified
                control_id = Path(control_file_name).stem  # e.g. "AC-2.1.md" -> "AC-2.1"
                if control_key(control_id) < control_key(start_with_control):
                    continue

            control_relative_dir_path = os.path.relpath(root, control_root)
            with open(control_absolute_file_path, 'r') as f:
                control_markdown = f.read()
            # Pass the model_id argument through rather than the global default
            policy_markdown = create_policy_from_control(control_markdown=control_markdown,
                model_id=model_id)
            for policy_absolute_file_path in [Path(f"{all_policies_root}/{control_relative_dir_path}/policy_{control_file_name}"),
                Path(f"{policies_by_family_root}/{control_relative_dir_path}/policy_{control_file_name}")]:
                
                policy_absolute_file_path.parent.mkdir(parents=True, exist_ok=True)
                policy_absolute_file_path.write_text(policy_markdown)
                print(f"Wrote: {policy_absolute_file_path}")
                # print("-------------------Control-------------------")
                # display(Markdown(control_markdown))
                # print("-------------------Policy-------------------")
                # display(Markdown(policy_markdown))
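The run subfolder name combines a timestamp, the literal text `ET`, and the model ID. A minimal sketch of that naming follows, using a hypothetical helper `run_subfolder_name` that is not in the notebook. It formats the timestamp separately from the model ID, so that a `%` character in a model ID could never be interpreted as a `strftime` directive.

```python
from datetime import datetime


def run_subfolder_name(now: datetime, model_id: str) -> str:
    """Build a '<timestamp>ET_<model id>' folder name for one generation run."""
    # Format the timestamp alone, then append the model ID afterwards.
    timestamp = now.strftime("%Y-%m-%d:%H:%M:%S")
    # "ET" is a literal label; no timezone conversion happens here.
    return f"{timestamp}ET_{model_id}"


name = run_subfolder_name(datetime(2026, 1, 7, 13, 53, 16),
                          "us.anthropic.claude-sonnet-4-20250514-v1:0")
print(name)
```

Since `datetime.now()` returns the host's local time without timezone information, the `ET` label is only accurate if the host clock is set to Eastern Time.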

In [10]:
def create_policy_from_control(control_markdown:str, model_id:str=MODEL_ID) -> str:
    """
    Given markdown of one NIST control, returns a markdown organizational policy based on that control.
    """
    prompt = PROMPT_TEMPLATE.format(control_markdown=control_markdown)
    
    print(f"Calling Bedrock model: {MODEL_ID}")
    result = invoke_bedrock_model(prompt, model_id=MODEL_ID, max_tokens=MAX_TOKENS)
    return result['result']
    

In [5]:
BUCKET = '183023889407-us-east-1-compliance-rule-generator'
S3_PREFIX_POLICY_MARKDOWN = 'policies/markdown/'

FOLDER_HOME: Path = Path('/home/sagemaker-user')
# FOLDER_CONTROL_MARKDOWN: Path = FOLDER_HOME / 'data/controls/markdown/by-family/'
FOLDER_POLICY_MARKDOWN: Path = FOLDER_HOME / 'data/policies/markdown/'
upload_directory_to_s3(source_path=FOLDER_POLICY_MARKDOWN, bucket=BUCKET, s3_prefix=S3_PREFIX_POLICY_MARKDOWN)

Uploaded: by-family-main/access-control/policy_AC-1.md -> s3://183023889407-us-east-1-compliance-rule-generator/policies/markdown/by-family-main/access-control/policy_AC-1.md
Uploaded: by-family-main/access-control/policy_AC-2.md -> s3://183023889407-us-east-1-compliance-rule-generator/policies/markdown/by-family-main/access-control/policy_AC-2.md
Uploaded: by-family-main/access-control/policy_AC-2.1.md -> s3://183023889407-us-east-1-compliance-rule-generator/policies/markdown/by-family-main/access-control/policy_AC-2.1.md
Uploaded: by-family-main/access-control/policy_AC-2.2.md -> s3://183023889407-us-east-1-compliance-rule-generator/policies/markdown/by-family-main/access-control/policy_AC-2.2.md
Uploaded: by-family-main/access-control/policy_AC-2.3.md -> s3://183023889407-us-east-1-compliance-rule-generator/policies/markdown/by-family-main/access-control/policy_AC-2.3.md
Uploaded: by-family-main/access-control/policy_AC-2.4.md -> s3://183023889407-us-east-1-compliance-rule-generator

In [11]:
def main():
    print(f"Running Python: {sys.version.split()[0]}")
    generate_policies(control_root=FOLDER_CONTROL_MARKDOWN, policy_root=FOLDER_POLICY_MARKDOWN,
        model_id=MODEL_ID, start_with_control="SR-12")
    # save generated policy markdown files to s3, mirroring directory structure
    upload_directory_to_s3(source_path=FOLDER_POLICY_MARKDOWN, bucket=BUCKET, s3_prefix=S3_PREFIX_POLICY_MARKDOWN)


In [12]:
main()

Running Python: 3.11.11
Calling Bedrock model: us.anthropic.claude-sonnet-4-20250514-v1:0
Wrote: /home/sagemaker-user/data/policies/markdown/all-controls/2026-01-07:13:53:16ET_us.anthropic.claude-sonnet-4-20250514-v1:0/supply-chain-risk-management/policy_SR-12.md
Wrote: /home/sagemaker-user/data/policies/markdown/by-family/2026-01-07:13:53:16ET_us.anthropic.claude-sonnet-4-20250514-v1:0/supply-chain-risk-management/policy_SR-12.md
Uploaded: all-controls/2026-01-02:18:01:02ET_us.anthropic.claude-sonnet-4-20250514-v1:0/access-control/policy_AC-1.md -> s3://183023889407-us-east-1-compliance-rule-generator/policies/markdown/all-controls/2026-01-02:18:01:02ET_us.anthropic.claude-sonnet-4-20250514-v1:0/access-control/policy_AC-1.md
Uploaded: all-controls/2026-01-02:18:01:02ET_us.anthropic.claude-sonnet-4-20250514-v1:0/access-control/policy_AC-2.md -> s3://183023889407-us-east-1-compliance-rule-generator/policies/markdown/all-controls/2026-01-02:18:01:02ET_us.anthropic.claude-sonnet-4-2025051