## ScenarioJudger

 - Reads a JSON file of compliance scenarios from S3, in the format:
```json
{
  "scenarios": [
    {
      "scenario-id": "scenario-id-1",
      "scenario-detail": "A new employee, Sarah Johnson, joins the IT department...",
      "is-compliant": false,
      "non-compliant-reason": "The scenario violates..." 
    },
    {
      "scenario-id": "scenario-id-2", 
      "scenario-detail": "TechCorp implements a comprehensive incident response procedure...",
      "is-compliant": true,
      "non-compliant-reason": "" 
    }
  ]
}
```
 - Evaluates the veracity of each scenario-detail against RAG-retrieved, NIST-based policies in a Bedrock knowledge base, comparing its determination against "is-compliant" in the JSON.
 - When its determination differs, generates json records:
```json
{
  "scenarios": [
    {
      "scenario-id": "scenario-id-1",
      "scenario-detail": "A new employee, Sarah Johnson, joins the IT department...",
      "is-compliant": false,
      "non-compliant-reason": "The scenario violates...",
      "judged-compliant": true,
      "judged-compliant-reason": "Considered the rules AC...  and scenario is not in violation...",
      "llm-judge": "us.anthropic.claude-sonnet-4-20250514-v1:0",
      "judged-dtm": "<ISO 8601 timestamp>"
    },
    {
      "scenario-id": "scenario-id-2",
      "scenario-detail": "TechCorp implements a comprehensive incident response procedure...",
      "is-compliant": true,
      "non-compliant-reason": "",
      "judged-compliant": false,
      "judged-compliant-reason": "Scenario violates access control policy...",
      "llm-judge": "us.anthropic.claude-sonnet-4-20250514-v1:0",
      "judged-dtm": "<ISO 8601 timestamp>"
    }
  ]
}
```
 - Stores json records back to S3
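
The comparison step in the list above can be sketched as a small filter. This is a minimal illustration, assuming each judged record carries both `is-compliant` and `judged-compliant` as in the sample JSON; the helper name `find_disagreements` is hypothetical and does not appear in the notebook.

```python
from typing import Dict, List


def find_disagreements(judged: List[Dict]) -> List[Dict]:
    """Return records whose judged verdict differs from the labeled verdict."""
    return [
        s for s in judged
        # Skip records that never received a verdict
        if "judged-compliant" in s and s["judged-compliant"] != s["is-compliant"]
    ]


sample = [
    {"scenario-id": "scenario-id-1", "is-compliant": False, "judged-compliant": True},
    {"scenario-id": "scenario-id-2", "is-compliant": True, "judged-compliant": True},
]
disagreements = find_disagreements(sample)
print([s["scenario-id"] for s in disagreements])  # ['scenario-id-1']
```

Records without a `judged-compliant` key are left out rather than counted as disagreements, which keeps an incomplete run from inflating the mismatch list.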


In [1]:
# Import required libraries
import boto3  # AWS SDK for Python
import datetime
import json   # JSON handling
import time   # For rate limiting between API calls
from compliance_calculator import compliance_calculator, CALCULATOR_TOOL
from pathlib import Path
from typing import List, Dict  # Type hints

# ============================================================================
# CONFIGURATION SECTION - Update these values
# ============================================================================

FOLDER_HOME: Path = Path('/home/sagemaker-user')
FOLDER_JUDGED_SCENARIOS: Path = FOLDER_HOME / 'data/judged_scenarios/'
INPUT_BUCKET = '183023889407-us-east-1-compliance-rule-generator'
INPUT_PREFIX = 'scenarios/'  # Folder path in S3 where scenarios are stored
S3_PREFIX_POLICY_MARKDOWN_ALL = 'policies/markdown/all-policies-main/'
OUTPUT_BUCKET = '183023889407-us-east-1-compliance-rule-generator'
OUTPUT_PREFIX = 'scenarios-judged/'  # Folder path for results

# AWS Region
AWS_REGION = 'us-east-1'
# AWS Bedrock Knowledge Base containing NIST policies
KNOWLEDGE_BASE_ID = 'T8EW10IU3Z'

MAX_TOKENS = 4096
TEMPERATURE = 0.7

# Available Bedrock model ARNs with performance notes
MODELS = {
    'premium': 'arn:aws:bedrock:us-east-1:183023889407:inference-profile/global.anthropic.claude-opus-4-5-20251101-v1:0', # not available
    'good': 'arn:aws:bedrock:us-east-1:183023889407:inference-profile/global.anthropic.claude-sonnet-4-5-20250929-v1:0', # times out
    'balanced': 'arn:aws:bedrock:us-east-1:183023889407:inference-profile/us.anthropic.claude-sonnet-4-20250514-v1:0',  # recommended
    'fast_cheap': 'arn:aws:bedrock:us-east-1:183023889407:inference-profile/us.anthropic.claude-haiku-4-5-20251001-v1:0',
    'aws_native_premier': 'arn:aws:bedrock:us-east-1:183023889407:inference-profile/us.amazon.nova-premier-v1:0',
    'aws_native_pro': 'arn:aws:bedrock:us-east-1:183023889407:inference-profile/us.amazon.nova-pro-v1:0'
}
MODEL_ARN = MODELS['balanced']  # Default model selection ('premium' is marked not available above)

# JSON tool configuration for Bedrock Converse API
# Forces the model to return structured JSON with specific schema
TOOL_CONFIG = {
    "tools": [
        {
            "toolSpec": {
                "name": "judged_scenario_json",
                "description": "Return judged compliance scenarios as JSON",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "scenarios": {
                                "type": "array",
                                "items": {
                                    "type": "object",
                                    "properties": {
                                        "judged-compliant": {"type": "boolean"},
                                        "judged-compliant-reason": {"type": "string"}
                                    },
                                    "required": ["judged-compliant", "judged-compliant-reason"]
                                }
                            }
                        },
                        "required": ["scenarios"]
                    }
                }
            }
        },
        {
            "toolSpec": {
                "name": "compliance_calculator",
                "description": "Calculate and compare values with time, money, data, and percentage units",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "expression": {"type": "string", "description": "Expression like '800ms < 1s' or '4m > 3b'"}
                        },
                        "required": ["expression"]
                    }
                }
            }
        }
    ]
}
# CALCULATOR_TOOL["toolSpec"] references the calculator tool definition from compliance_calculator.py

# Initialize AWS Bedrock clients
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name=AWS_REGION)  # For knowledge base retrieval
bedrock_runtime = boto3.client('bedrock-runtime', region_name=AWS_REGION)  # For model inference
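
The `compliance_calculator` tool is imported from `compliance_calculator.py`, which is not shown in this notebook. To make the tool description ("Expression like '800ms < 1s'") concrete, here is a rough sketch of how a unit-aware comparison could work. Everything in it is an assumption for illustration: the function name `compare_durations`, the unit table, the 30-day month, and the 365-day year are hypothetical, it handles time units only, and the real module may behave differently.

```python
import operator
import re

# Hypothetical unit table: every duration is normalized to seconds.
_SECONDS = {
    "ms": 0.001, "s": 1, "second": 1, "minute": 60, "hour": 3600,
    "day": 86400, "week": 604800,
    "month": 30 * 86400,   # assumption: 30-day month
    "year": 365 * 86400,   # assumption: 365-day year
}
_OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq,
        "<": operator.lt, ">": operator.gt}
# <number> <unit> <operator> <number> <unit>; two-character operators are tried first
_PATTERN = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*(<=|>=|==|<|>)"
    r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$"
)


def _to_seconds(value: str, unit: str) -> float:
    unit = unit.lower()
    if unit not in _SECONDS and unit.endswith("s"):
        unit = unit[:-1]  # "months" -> "month"
    return float(value) * _SECONDS[unit]  # raises KeyError for unknown units


def compare_durations(expression: str) -> bool:
    """Evaluate a single comparison between two durations."""
    match = _PATTERN.match(expression)
    if match is None:
        raise ValueError(f"Unsupported expression: {expression!r}")
    left_value, left_unit, op, right_value, right_unit = match.groups()
    return _OPS[op](_to_seconds(left_value, left_unit),
                    _to_seconds(right_value, right_unit))


print(compare_durations("800ms < 1s"))
print(compare_durations("95 days <= 90 days"))
```

Normalizing both sides to one base unit before comparing is what lets mixed-unit expressions such as `800ms < 1s` be evaluated without relying on the model's arithmetic.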

In [10]:
def load_scenarios_from_s3(input_bucket: str = INPUT_BUCKET, input_prefix: str = INPUT_PREFIX, object_name: str = "scenarios.json") -> List[Dict]:
    """
    Load scenarios from S3 JSON file.
    """
    s3 = boto3.client('s3')
    response = s3.get_object(Bucket=input_bucket, Key=input_prefix+object_name)
    json_data = json.loads(response['Body'].read().decode('utf-8'))
    return json_data["scenarios"]


In [11]:
def save_scenarios_to_s3(scenarios: List[Dict], output_bucket: str = OUTPUT_BUCKET, output_prefix: str = OUTPUT_PREFIX, object_name: str = "scenarios.json"):
    """
    Save judged scenarios to S3 as a JSON object with a top-level "scenarios" key.
    """
    s3 = boto3.client('s3')
    json_data = json.dumps({"scenarios": scenarios}, indent=2)
    s3.put_object(Bucket=output_bucket, Key=output_prefix+object_name, Body=json_data)

In [12]:
def retrieve_policies_by_id(bucket: str, folder: str, policy_ids: List[str]) -> str:
    """
    Retrieve specific policy documents from S3 and join them into one string.
    """

    s3 = boto3.client('s3')
    policies = []
    for policy_id in policy_ids:
        response = s3.get_object(Bucket=bucket, Key=folder + policy_id + ".md")
        content = response['Body'].read().decode('utf-8')
        policies.append(f"{policy_id}:\n{content}")
    
    return "\n\n".join(policies)
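
The `policy_ids` passed to `retrieve_policies_by_id` are parsed later in the notebook from a `Policies referenced: ...` line inside each `scenario-detail`. A minimal standalone sketch of that parsing step follows; the helper name `extract_policy_ids` and the example IDs are illustrative and not taken from the notebook's data.

```python
import re
from typing import List

POLICY_LINE = re.compile(r"Policies referenced: (.+)")


def extract_policy_ids(scenario_detail: str) -> List[str]:
    """Return the comma-separated IDs after 'Policies referenced:', or []."""
    match = POLICY_LINE.search(scenario_detail)
    if match is None:
        return []
    # '.' does not match a newline, so the capture ends at the end of that line
    return [p.strip() for p in match.group(1).split(",") if p.strip()]


# Illustrative scenario text; the IDs are made up for this example
detail = "A contractor is given admin access.\nPolicies referenced: AC-2, IA-5"
print(extract_policy_ids(detail))  # ['AC-2', 'IA-5']
```

Because each ID becomes part of an S3 key (`folder + policy_id + ".md"`), any trailing punctuation on the line, such as a final period, would end up in the key and cause the lookup to fail.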
    

In [13]:
def judge_scenarios(
    source_scenarios: List[Dict],
    model_arn: str, 
    kb_id: str = KNOWLEDGE_BASE_ID,
    tempurature: float = TEMPERATURE,
    max_scenarios: int = None
) -> List[Dict]:
    """
    Process scenarios and add judgment fields.
    """

    # Limit scenarios if max_scenarios is specified
    scenarios_to_process = source_scenarios[:max_scenarios] if max_scenarios else source_scenarios
    
    # Extract model ID from ARN (Converse API requires model ID, not full ARN)
    model_id = model_arn.split('/')[-1] if '/' in model_arn else model_arn
    
    import re  # local import, hoisted out of the loop below

    judged_scenarios = []
    for scenario in scenarios_to_process:
        judged_scenario = scenario.copy()

        # Extract policy IDs from the scenario text and fetch the matching policy documents from S3
        policy_match = re.search(r'Policies referenced: (.+)', scenario["scenario-detail"])
        if policy_match:
            policy_ids = [p.strip() for p in policy_match.group(1).split(',')]
            retrieved_policies = retrieve_policies_by_id(INPUT_BUCKET, S3_PREFIX_POLICY_MARKDOWN_ALL, policy_ids)
        else:
            print(f"No policies referenced in scenario: {scenario['scenario-detail']}")
            continue
                
        prompt = f"""
        You are **ComplianceEvaluator**, an expert AI compliance analyst specializing in NIST 800-53 controls and policies. 
        Your mission is to judge organizational policy scenarios against reference policies stored in your knowledge base.
                
        **Your Expertise:**
        - Deep understanding of all NIST 800-53 Rev. 5 control families (AC, AT, AU, CA, CM, CP, IA, IR, MA, MP, PE, PL, PM, PS, PT, RA, SA, SC, SI, SR)
        - Policy-to-control mapping and compliance evaluation
        - Evidence-focused assessment methodology

        **Task:** Judge if the scenario complies with ALL referenced policies from your knowledge base.

        **Avoid judging scenarios based on cost-benefit principles or concentration percentages.**
    
        **Note that non-US citizens cannot obtain US security clearances.**

        **CRITICAL: For ANY numerical comparison involving timeframes, values, or thresholds:**
        - ALWAYS use the compliance_calculator tool to verify comparisons; do not do mental math.
        - If a scenario meets or exceeds (performs better than) policy requirements, it is COMPLIANT. For example, if a policy requires "within 24 hours" and the scenario shows "within 18 hours", this is COMPLIANT (18 < 24).
        - If a scenario does not meet policy requirements, it is NON-COMPLIANT. For example, if a policy requires "at least quarterly (90 days)" and the scenario shows "95 days", this is NON-COMPLIANT (95 > 90).

        **Response Format:**
        {{
          "judged-compliant": true/false, true if you determined the scenario is compliant with the organizational
        policies stored in your knowledge base; false if the scenario is not compliant.
          "judged-compliant-reason": "Empty if compliant. If the scenario is not compliant, explain very briefly why, citing
          exactly the policy ID(s) it violates, followed by the extracted policy text that indicates non-compliance."
        }}

        **Evaluate scenario against this policy data**:
        {retrieved_policies}

        **Here is the actual compliance scenario to judge**:
        {scenario["scenario-detail"]}
        """
        messages=[{"role": "user", "content": [{"text": prompt}]}]
        input_tokens = 0
        output_tokens = 0
        
        while True:
            response = bedrock_runtime.converse(
                modelId=model_id,
                messages=messages,
                toolConfig=TOOL_CONFIG,
                inferenceConfig={
                    "maxTokens": MAX_TOKENS,
                    "temperature": tempurature
                }
            )

            # track per-scenario token usage
            usage = response.get('usage', {})
            input_tokens += usage.get('inputTokens', 0)
            output_tokens += usage.get('outputTokens', 0)
            
            if response['stopReason'] != 'tool_use':
                # The model ended its turn without requesting a tool; stop looping
                break

            assistant_content = response['output']['message']['content']
            tool_results = []
            verdict_received = False
            for content_block in assistant_content:
                if 'toolUse' not in content_block:
                    continue
                tool_use = content_block['toolUse']

                if tool_use['name'] == 'compliance_calculator':
                    expression = tool_use['input']['expression']
                    result = compliance_calculator(expression)
                    print("=" * 60)
                    print(f"Compliance calculator expression: {expression}")
                    print(f"Compliance calculator result: {result}")
                    print("=" * 60)
                    # toolResult text blocks must be strings, so stringify the result
                    tool_results.append({
                        "toolResult": {
                            "toolUseId": tool_use['toolUseId'],
                            "content": [{"text": str(result)}]
                        }
                    })
                elif tool_use['name'] == 'judged_scenario_json':
                    verdict = tool_use['input']['scenarios'][0]
                    judged_scenario["judged-compliant"] = verdict['judged-compliant']
                    judged_scenario["judged-compliant-reason"] = verdict['judged-compliant-reason']
                    verdict_received = True
                    break

            # Stop once the verdict is recorded, or when there is nothing to send back
            if verdict_received or not tool_results:
                break

            # Return calculator results so the model can continue its reasoning
            messages.append({"role": "assistant", "content": assistant_content})
            messages.append({"role": "user", "content": tool_results})

        judged_scenario["judged-dtm"] = datetime.datetime.now().isoformat()
        judged_scenario["llm-judge"] = model_id
        judged_scenario["llm-judge-temp"] = tempurature
        judged_scenario["llm-judge-input-tokens"] = input_tokens
        judged_scenario["llm-judge-output-tokens"] = output_tokens
        judged_scenario["llm-judge-total-tokens"] = input_tokens + output_tokens
        judged_scenarios.append(judged_scenario)
    
    return judged_scenarios
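
The tool-use loop in `judge_scenarios` runs until the model either returns a verdict or stops calling tools. One way to bound that loop is sketched below, using a stub in place of `bedrock_runtime.converse` so it runs without AWS access. The names `run_tool_loop`, `MAX_TURNS`, and `fake_converse_factory` are hypothetical; the message shapes mirror the ones already used in the notebook.

```python
from typing import Callable, Dict, List, Optional

MAX_TURNS = 5  # hypothetical cap, not a value from the notebook


def run_tool_loop(converse: Callable[[List[Dict]], Dict],
                  messages: List[Dict],
                  calculator: Callable[[str], object]) -> Optional[Dict]:
    """Drive tool calls until a verdict arrives or MAX_TURNS is reached."""
    for _ in range(MAX_TURNS):
        response = converse(messages)
        if response["stopReason"] != "tool_use":
            return None  # model stopped without returning a verdict
        content = response["output"]["message"]["content"]
        tool_results = []
        for block in content:
            tool_use = block.get("toolUse")
            if tool_use is None:
                continue
            if tool_use["name"] == "judged_scenario_json":
                return tool_use["input"]["scenarios"][0]
            if tool_use["name"] == "compliance_calculator":
                result = calculator(tool_use["input"]["expression"])
                tool_results.append({"toolResult": {
                    "toolUseId": tool_use["toolUseId"],
                    "content": [{"text": str(result)}],
                }})
        if not tool_results:
            return None
        # Feed calculator output back so the model can continue
        messages.append({"role": "assistant", "content": content})
        messages.append({"role": "user", "content": tool_results})
    return None


def fake_converse_factory() -> Callable[[List[Dict]], Dict]:
    """Stub that first asks for the calculator, then returns a verdict."""
    replies = iter([
        {"stopReason": "tool_use", "output": {"message": {"content": [
            {"toolUse": {"name": "compliance_calculator", "toolUseId": "t1",
                         "input": {"expression": "18 hours <= 24 hours"}}}]}}},
        {"stopReason": "tool_use", "output": {"message": {"content": [
            {"toolUse": {"name": "judged_scenario_json", "toolUseId": "t2",
                         "input": {"scenarios": [{"judged-compliant": True,
                                                  "judged-compliant-reason": ""}]}}}]}}},
    ])
    return lambda messages: next(replies)


history = [{"role": "user", "content": [{"text": "..."}]}]
verdict = run_tool_loop(fake_converse_factory(), history, lambda expr: True)
print(verdict)
```

Returning `None` when no verdict arrives lets the caller decide whether to retry, skip the scenario, or record it as unjudged, instead of failing later on a missing `judged-compliant` key.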

In [14]:
def save_scenarios_to_file(scenarios: List[Dict], output_path: Path):
    """
    Print judged scenarios and write them, with summary counts, to a local JSON file.
    """

    # Print scenarios to console for immediate review
    print(json.dumps(scenarios, indent=2))

    # Create parent directories if they don't exist
    output_path.parent.mkdir(parents=True, exist_ok=True)
    
    # Save to file with summary statistics alongside the scenarios.
    # .get() is used for 'judged-compliant' because a scenario may lack a verdict.
    with open(output_path, 'w') as f:
        json.dump({
            'total_scenarios': len(scenarios),
            'compliant_count': sum(1 for s in scenarios if s['is-compliant']),
            'non_compliant_count': sum(1 for s in scenarios if not s['is-compliant']),
            'judged_compliant_count': sum(1 for s in scenarios if s.get('judged-compliant') is True),
            'judged_non_compliant_count': sum(1 for s in scenarios if s.get('judged-compliant') is False),
            'scenarios': scenarios
        }, f, indent=2)
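
The counts written above describe each verdict column separately. Since several judge models are run over the same scenarios, a per-run agreement rate may also be useful. A small sketch, assuming the same record keys as the saved file; the function name `agreement_summary` is hypothetical.

```python
from typing import Dict, List, Union


def agreement_summary(scenarios: List[Dict]) -> Dict[str, Union[int, float]]:
    """Summarize how often the judge matched the labeled verdict."""
    judged = [s for s in scenarios if "judged-compliant" in s]
    agree = sum(1 for s in judged if s["judged-compliant"] == s["is-compliant"])
    return {
        "judged": len(judged),
        "unjudged": len(scenarios) - len(judged),
        "agree": agree,
        "disagree": len(judged) - agree,
        # Guard against division by zero when nothing was judged
        "agreement_rate": agree / len(judged) if judged else 0.0,
    }


records = [
    {"is-compliant": True, "judged-compliant": True},
    {"is-compliant": False, "judged-compliant": False},
    {"is-compliant": False, "judged-compliant": True},
    {"is-compliant": True},  # no verdict recorded
]
print(agreement_summary(records))
```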

In [15]:
def main():
    
    judger_models = [
        {
            'name': 'claude_3_5_sonnet',
            'arn': 'arn:aws:bedrock:us-east-1:183023889407:inference-profile/us.anthropic.claude-3-5-sonnet-20241022-v2:0',
            'temperature': 0.1
        },
        {
            'name': 'claude_4_sonnet',
            'arn': 'arn:aws:bedrock:us-east-1:183023889407:inference-profile/us.anthropic.claude-sonnet-4-20250514-v1:0',
            'temperature': 0.0
        },
        {
            'name': 'claude_opus_4_5',
            'arn': 'arn:aws:bedrock:us-east-1:183023889407:inference-profile/global.anthropic.claude-opus-4-5-20251101-v1:0',
            'temperature': 0.0
        },
        {
            'name': 'nova_premier',
            'arn': 'arn:aws:bedrock:us-east-1:183023889407:inference-profile/us.amazon.nova-premier-v1:0',
            'temperature': 0.1
        },
        {
            'name': 'nova_pro',
            'arn': 'arn:aws:bedrock:us-east-1:183023889407:inference-profile/us.amazon.nova-pro-v1:0',
            'temperature': 0.2
        }
    ]
    
    source_scenarios_file = "scenarios.json"  
    source_scenarios = load_scenarios_from_s3(INPUT_BUCKET, INPUT_PREFIX, source_scenarios_file)

    for model in judger_models:
        print(f"Processing with model: {model['name']} (temp: {model['temperature']})")
    
        judged_scenarios = judge_scenarios(
            source_scenarios=source_scenarios,
            model_arn=model['arn'],
            kb_id=KNOWLEDGE_BASE_ID,
            tempurature=model['temperature'],
            max_scenarios=5
        )
        
        judged_scenarios_file = f"judged_scenarios_{model['name']}_temp{model['temperature']}.json"
        
        save_scenarios_to_file(judged_scenarios, FOLDER_JUDGED_SCENARIOS / judged_scenarios_file)
        save_scenarios_to_s3(judged_scenarios, OUTPUT_BUCKET, OUTPUT_PREFIX, judged_scenarios_file)

        print(f"Completed {model['name']}: {len(judged_scenarios)} scenarios processed")


In [16]:
main()

Compliance calculator expression: 10 months <= 12 months
Compliance calculator result: True
Compliance calculator expression: 8 months <= 12 months
Compliance calculator result: True
Compliance calculator expression: 3 days <= 5 days
Compliance calculator result: True
Compliance calculator expression: 10 years >= 7 years
Compliance calculator result: True
Compliance calculator expression: 24 hours <= 48 hours
Compliance calculator result: True
Compliance calculator expression: 15 days <= 30 days
Compliance calculator result: True
Compliance calculator expression: 8 years >= 7 years
Compliance calculator result: True
Compliance calculator expression: 18 months <= 24 months
Compliance calculator result: True
Compliance calculator expression: 11 months <= 12 months
Compliance calculator result: True
Compliance calculator expression: 6 months <= 12 months
Compliance calculator result: True
Compliance calculator expression: 20 days <= 30 days
Compliance calculator result: True
Compliance ca