# Marketplace Validator
- **Goal**: Validate lender listings (credit cards, loans) against Marketplace Standards and suggest compliant rewrites.

- **Process**:

    1. Load the baseline prompt and sample listings.

    2. Call a Validator LLM (Gemini) per listing; save structured JSON outputs.

    3. Call a Judge LLM to score each validator output against the standards.

    4. Aggregate into a leaderboard with pass rate and criterion scores.

    5. Swap in an improved prompt and compare results.

## Assumptions and constraints
- **Standards compliance** is assessed via an LLM (not a deterministic rules engine).

- **Self-evaluation bias**: Same model family may be used for validation and judging; acceptable for fast iteration, but cross-model judging is preferable for final scoring.

- **Input shape**: Listings are plain text; the prompt template contains a `{{ listing }}` placeholder.
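As a minimal sketch of the input shape described above, the placeholder substitution can be checked up front. `render_prompt`, `PLACEHOLDER`, and the sample strings are hypothetical names used only for illustration; the notebook itself performs the replacement inline in `process_listings`.

```python
PLACEHOLDER = "{{ listing }}"


def render_prompt(prompt_template: str, listing: str) -> str:
    """Insert one listing into the prompt template.

    Raises ValueError when the placeholder is missing, so a malformed
    template fails loudly instead of sending the standards to the model
    without any listing attached.
    """
    if PLACEHOLDER not in prompt_template:
        raise ValueError(f"Template has no {PLACEHOLDER!r} placeholder")
    return prompt_template.replace(PLACEHOLDER, listing)


# Hypothetical template and listing, for illustration only.
template = "Validate this listing against the standards:\n{{ listing }}"
rendered = render_prompt(template, "0% APR for 12 months on purchases.")
print(rendered)
```

Checking for the placeholder matters because `str.replace` is a silent no-op when the search string is absent.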

## Imports and installs

The code imports the standard-library modules, pandas, tqdm, and the Gemini SDK (`google-genai`) used throughout the notebook.

In [1]:
# `from google import genai` is provided by the google-genai package
# !pip install -q google-genai pandas tqdm

In [29]:
import pandas as pd
import time
import json
from typing import Optional, Any, List

In [3]:
import os
import glob
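# Do not hardcode API keys in the notebook. Set GEMINI_API_KEY in your shell
# (or a local, untracked .env file) before launching Jupyter.
if not os.environ.get('GEMINI_API_KEY'):
    raise RuntimeError("GEMINI_API_KEY is not set")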

In [4]:
from google import genai
from google.genai.errors import APIError

In [28]:

import pathlib
from tqdm.auto import tqdm
import logging
logger = logging.getLogger(__name__) 
logger.setLevel(logging.INFO) 

## Helper functions

- `read_txt` reads a single text file (used for prompts).

- `write_file` saves the results of a validation run (standards prompt, original listing, and AI output) into a JSON file for later analysis.

- `read_listings` reads all `.txt` files in a given directory (`data/sample_listings/`) to get the credit card listings for validation.

- `load_client` initializes the `genai.Client`.

In [None]:

def read_txt(file_path: str) -> Optional[str]:
    """
    Reads the entire content of a text file and returns it as a string.

    This function attempts to open and read a file specified by `file_path`.
    It uses a context manager (`with open(...)`) to ensure the file is properly
    closed even if errors occur.

    Args:
        file_path (str): The full path to the text file to be read.

    Returns:
        str or None: The entire content of the file as a single string, or
                     None if a FileNotFoundError occurs (e.g., the file
                     doesn't exist at the specified path).
    """
    try:
        # Read as UTF-8 so prompts behave the same on every platform
        with open(file_path, 'r', encoding='utf-8') as file:
            # read() reads the entire content of the file
            file_content = file.read()

    except FileNotFoundError:
        logger.error(f"Error: The file '{file_path}' was not found.")
        return None
    return file_content

In [None]:
def write_file(path: str, result_prompt: Any, listing: Any, text: Any) -> None:
    """
    Writes a dictionary containing structured data to a file in JSON format.

    Args:
        path (str): The full path to the output file (e.g., 'results/data.json').
                    The file will be overwritten if it already exists ('w' mode).
        result_prompt (Any): Data to be stored under the "standards" key.
                             Here, the full validator prompt (standards plus the embedded listing).
        listing (Any): Data to be stored under the "listing" key.
                       Here, the original listing text.
        text (Any): Data to be stored under the "actual_output" key.
                    Here, the validator model's raw response text.

    Returns:
        None: The function handles file writing and logging internally.
    """
    with open(path, "w", encoding="utf-8") as f:
        f.write(json.dumps({
            "standards": result_prompt,
            "listing": listing,
            "actual_output": text
        }, ensure_ascii=False) + "\n")

    logger.info(f"Wrote to {path}")

In [None]:
def read_listings(sample_listings_directory_path: str) -> List[str]:
    """
    Reads the content of all text files (*.txt) within a specified directory.

    Args:
        sample_listings_directory_path (str): The path to the directory 
                                              containing the text files.

    Returns:
        list[str]: A list where each element is the string content of one 
                   successfully read text file. Files that cause errors are 
                   skipped, and their content is not included in the list.
    """
    listings = []
    # Sort so listings are processed in a deterministic order across platforms
    file_paths = sorted(glob.glob(os.path.join(sample_listings_directory_path, '*.txt')))
    for file_path in file_paths:
        try:
            # Open the file in read mode ('r')
            with open(file_path, 'r', encoding='utf-8') as f:
                content = f.read()
                listings.append(content)
                logger.info(f"Successfully read: {file_path}")
        except FileNotFoundError:
            logger.warning(f"Error: File not found at {file_path}")
        except Exception as e:
            logger.warning(f"Error reading {file_path}: {e}")
    return listings

In [None]:
def load_client():
    """
    Initializes and returns the Gemini API client.

    Returns:
        genai.Client or None: An initialized instance of the Gemini client 
                              if successful, otherwise returns None.
    """
    try:
        # Initialize the client. Assumes GEMINI_API_KEY is set in the environment.
        client = genai.Client()
        return client
    except Exception:
        logger.error("ERROR: Could not initialize Gemini Client. Ensure GEMINI_API_KEY is set.")
        return None
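Since `load_client` assumes that `GEMINI_API_KEY` is already present in the environment, it can help to check for the key explicitly before building the client. The sketch below is one way to do that; `get_api_key` and its `env` parameter are hypothetical and not part of the notebook or the Gemini SDK.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None,
                name: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook.")
    return key


# Illustration with a dummy value (never commit a real key).
print(get_api_key({"GEMINI_API_KEY": "dummy-key"}))
```

Failing early with a named variable gives a clearer message than a generic client initialization error.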

## Config

In [None]:
sample_listings_directory_path = 'data/sample_listings/'
prompt_path = 'prompts/marketplace_validator_prompt.txt'
ROOT = "data/validation_original_prompt"   
MODEL = "gemini-2.5-flash"                 
OUT_CSV = "leaderboard.csv"
OUT_JSON = "details.json"
MAX_RETRIES = 3
TEMPERATURE = 0
model = MODEL

## Inputs   

**Data Preparation**: The original prompt is read into the `prompt` variable, and the sample listings are loaded into the `listings` list.

In [None]:
# Read original prompt 
prompt = read_txt(prompt_path)

In [None]:
# Read the listings
listings = read_listings(sample_listings_directory_path)

Successfully read: data/sample_listings\listing_01.txt
Successfully read: data/sample_listings\listing_02.txt
Successfully read: data/sample_listings\listing_03.txt
Successfully read: data/sample_listings\listing_04.txt
Successfully read: data/sample_listings\listing_06.txt
Successfully read: data/sample_listings\listing_07.txt
Successfully read: data/sample_listings\listing_08.txt
Successfully read: data/sample_listings\listing_09.txt
Successfully read: data/sample_listings\listing_10.txt
Successfully read: data/sample_listings\listing_11.txt
Successfully read: data/sample_listings\listing_12.txt
Successfully read: data/sample_listings\listing_13.txt
Successfully read: data/sample_listings\listing_14.txt
Successfully read: data/sample_listings\listing_15.txt
Successfully read: data/sample_listings\listing_16.txt
Successfully read: data/sample_listings\listing_17.txt
Successfully read: data/sample_listings\listing_18.txt
Successfully read: data/sample_listings\listing_19.txt
Successful

## Evaluate the listings with prompt

This section executes the marketplace validation using the original, provided prompt (`prompts/marketplace_validator_prompt.txt`).

In [None]:
def process_listings(listings: list[str], prompt_template: str, output_filepath: str = "data/"):
    """
    Loops through a list of credit card listings, generates a prompt for each, 
    calls the Gemini API for validation, and writes the results to a separate, 
    uniquely named JSON file.

    Args:
        listings (list[str]): A list where each element is the content of a listing file.
        prompt_template (str): The template string containing the validation standards and the 
                                placeholder '{{ listing }}'.
        output_filepath (str): This is used as the base path for the output directory 
                                (e.g., 'data/').
    """
    client = load_client()
    if client is None:
        # load_client already logged the reason; nothing can be validated without a client
        return

    logger.info(f"Starting API validation for {len(listings)} listings...")

    # Treat output_filepath as the output directory (with or without a trailing
    # slash) and create it if needed. os.path.dirname() would drop the last
    # path component when there is no trailing slash.
    output_dir = output_filepath.rstrip('/\\') or '.'
    os.makedirs(output_dir, exist_ok=True)

    for i, listing in enumerate(listings):
        logger.info(f"\n--- Processing Listing {i+1}/{len(listings)} ---")

        # Prepare prompt
        result_prompt = prompt_template.replace('{{ listing }}', listing)
        
        ai_response_text = "API_CALL_FAILED"
        max_retries = 5
        wait_time = 2  # Exponential backoff start time

        # Get response with exponential backoff
        for attempt in range(max_retries):
            try:
                logger.info(f"  Attempt {attempt + 1}: Calling Gemini...")
                
                response = client.models.generate_content(
                    model=MODEL,  # defined in the Config section
                    contents=result_prompt
                )
                
                ai_response_text = response.text
                logger.info(f"  SUCCESS! Response received.")
                break  # Exit retry loop on success
                
            except APIError as e:
                # Handle transient errors (e.g., rate limits)
                if attempt < max_retries - 1:
                    logger.info(f"  API Error ({e}). Retrying in {wait_time}s...")
                    time.sleep(wait_time)
                    wait_time *= 2  # Double wait time
                else:
                    logger.info("  Max retries reached. Saving error message and proceeding.")
                    ai_response_text = f"API_ERROR_MAX_RETRIES: {e}"
            
            except Exception as e:
                # Handle unexpected, non-retryable errors
                logger.info(f"  UNEXPECTED ERROR: {e}. Skipping this listing.")
                ai_response_text = f"UNEXPECTED_ERROR: {e}"
                break
        
        # Write to a unique file for this listing
        unique_filename = f"listing_validation_{i+1}.json"
        unique_filepath = os.path.join(output_dir, unique_filename)

        try:
            # We now call the newly defined write_file function
            write_file(unique_filepath, result_prompt, listing, ai_response_text)
            logger.info(f"  Results for Listing {i+1} saved.")
        except Exception as e:
             # Catching general exception here in case of a file system error
             logger.info(f"FATAL ERROR saving results: {e}")
             return

    logger.info("\n--- Validation Loop Complete ---")
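The retry loop inside `process_listings` can also be expressed as a small reusable helper. The following is a minimal sketch of exponential backoff under stated assumptions: `call_with_backoff` and its parameters are hypothetical names, and the `sleep` argument is injectable only so the behaviour can be demonstrated without waiting.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    initial_wait: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with doubling waits."""
    wait = initial_wait
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            sleep(wait)
            wait *= 2  # double the wait before the next attempt
    raise ValueError("max_retries must be at least 1")


# Demonstration with a function that fails twice, then succeeds.
calls = {"n": 0}
waits: List[float] = []


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = call_with_backoff(flaky, sleep=waits.append)
print(result, waits)  # ok [2.0, 4.0]
```

In the notebook, `fn` would wrap the `generate_content` call and `retry_on` would be `(APIError,)`, so that unexpected errors are not retried.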

In [None]:
process_listings(listings, prompt, output_filepath = "data/validation_original_prompt")

Starting API validation for 24 listings...

--- Processing Listing 1/24 ---
  Attempt 1: Calling Gemini...
  SUCCESS! Response received.
Wrote data/listings.jsonl
  Results for Listing 1 saved.

--- Processing Listing 2/24 ---
  Attempt 1: Calling Gemini...
  SUCCESS! Response received.
Wrote data/listings.jsonl
  Results for Listing 2 saved.

--- Processing Listing 3/24 ---
  Attempt 1: Calling Gemini...
  SUCCESS! Response received.
Wrote data/listings.jsonl
  Results for Listing 3 saved.

--- Processing Listing 4/24 ---
  Attempt 1: Calling Gemini...
  SUCCESS! Response received.
Wrote data/listings.jsonl
  Results for Listing 4 saved.

--- Processing Listing 5/24 ---
  Attempt 1: Calling Gemini...
  SUCCESS! Response received.
Wrote data/listings.jsonl
  Results for Listing 5 saved.

--- Processing Listing 6/24 ---
  Attempt 1: Calling Gemini...
  SUCCESS! Response received.
Wrote data/listings.jsonl
  Results for Listing 6 saved.

--- Processing Listing 7/24 ---
  Attempt 1: Calli

## LLM as a judge

This section evaluates the validator outputs using the LLM-as-a-Judge approach.

- The `JUDGE_SYSTEM` prompt is loaded from a file (`prompts/judge_prompt.txt`).

- The `build_user_prompt` function prepares a structured context (standards, listing, actual output) for the judge model.

- The `call_gemini_judge` function submits the structured context, along with the judge's system prompt, to the Gemini model for evaluation. It uses `temperature: 0` for deterministic scoring and includes retry logic. It expects a JSON response with "verdict" and "scores".

- The `evaluate_folder` function orchestrates the judge process: it finds all output files from the `process_listings` step, calls the `judge_fn` (`call_gemini_judge`) for each one, and compiles the results into a per-file DataFrame (`df_a`), an aggregated leaderboard (`lb_a`), and a list of detailed per-file records (`details_a`).
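The parsing step of the judge call (strip an optional Markdown code fence, parse the JSON, check for "verdict" and "scores") can be isolated and tested without any API access. This is a sketch only: `parse_judge_response` is a hypothetical helper, and the sample response is invented for illustration rather than taken from a real judge run.

```python
import json
from typing import Any, Dict

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def parse_judge_response(text: str) -> Dict[str, Any]:
    """Parse a judge reply into a dict with 'verdict' and 'scores' keys."""
    cleaned = text.strip()
    # Strip an optional Markdown code fence (with or without a "json" tag).
    if cleaned.startswith(FENCE):
        cleaned = cleaned[len(FENCE):]
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
        if cleaned.endswith(FENCE):
            cleaned = cleaned[:-len(FENCE)]
        cleaned = cleaned.strip()
    payload = json.loads(cleaned)
    if not isinstance(payload, dict):
        raise ValueError("Judge response is not a JSON object")
    missing = {"verdict", "scores"} - payload.keys()
    if missing:
        raise ValueError(f"Judge response is missing keys: {sorted(missing)}")
    return payload


# Invented sample reply, wrapped in a fence as models sometimes do.
raw = FENCE + 'json\n{"verdict": "pass", "scores": {"coverage": 5}}\n' + FENCE
parsed = parse_judge_response(raw)
print(parsed["verdict"], parsed["scores"])
```

Keeping the parser separate from the API call means malformed replies can be reproduced from saved text instead of by re-querying the model.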

In [14]:
JUDGE_SYSTEM = read_txt("prompts/judge_prompt.txt")

In [None]:
def build_user_prompt(standards: str, listing: str, actual_output: str) -> str:
    """
    Constructs a structured context string for the LLM by embedding key data 
    fields after JSON-serializing them.

    Args:
        standards (str): The prompt defining the criteria or expected output 
                         (the "standards").
        listing (str): The raw input content being analyzed (the "listing").
        actual_output (str): The result or previous output generated by a process.

    Returns:
        str: A single formatted string containing the three inputs under 
             a "Context:" header, with each input value safely enclosed 
             by JSON serialization.
    """
    # Keep it compact & safe-JSON
    return (
        "Context:\n"
        f'"standards": {json.dumps(standards)}\n'
        f'"listing": {json.dumps(listing)}\n'
        f'"actual_output": {json.dumps(actual_output)}\n'
    )

In [None]:
def call_gemini_judge(standards: str, listing: str, actual_output: str,
                      retries: int = MAX_RETRIES):
    """
    Calls the Gemini model (acting as a 'judge') to evaluate and score
    the provided data against defined standards, incorporating retry logic.


    Args:
        standards (str): The criteria or scoring rubric used for the evaluation.
        listing (str): The raw input content being evaluated.
        actual_output (str): The result or proposed improvement generated in a
                            previous step.
        retries (int, optional): The maximum number of attempts to call the API.
                                Defaults to MAX_RETRIES (assumed global).

    Returns:
        Dict[str, Any]: The parsed and validated JSON response payload from the 
                        model, expected to contain "verdict" and "scores" keys.

    Raises:
        Exception: Re-raises the last exception if all retry attempts fail
                (e.g., API error, JSONDecodeError, or dictionary key error).
    """

    user_prompt = build_user_prompt(standards, listing, actual_output)
    full_prompt = JUDGE_SYSTEM + "\n\n" + user_prompt
    client = load_client()
    for attempt in range(retries):
        try:
            
            resp = client.models.generate_content(
                model=MODEL,
                contents=full_prompt,
                config={"temperature": TEMPERATURE},
            )
            text = (resp.text or "").strip()
            # Models sometimes wrap JSON in a Markdown code fence; strip it if present.
            if text.startswith("```"):
                text = text.removeprefix("```json").removeprefix("```")
                text = text.removesuffix("```").strip()
            payload = json.loads(text)
            # Light validation
            _ = payload["verdict"]
            _ = payload["scores"]
            return payload
        except Exception as e:
            if attempt == retries - 1:
                raise
            time.sleep(1.2 * (attempt + 1))

In [17]:
def find_cases(root: str):
    """
    Recursively finds all JSON files under `root` and returns a list of dicts:
    {evaluator, path}
    evaluator is currently always 'default'; subfolder names are not used.
    """
    root_path = pathlib.Path(root)
    assert root_path.exists(), f"Path not found: {root}"
    cases = []
    for f in root_path.rglob("*.json"):
        cases.append({"evaluator": "default", "path": str(f)})
    return cases


In [None]:
def _agg_mean(s):
    """
    Calculates the mean of a series after robustly cleaning and converting to numeric type.

    Args:
        s (Union[pd.Series, list, tuple]): The data series or list of values 
                                           for which the mean should be calculated.

    Returns:
        float: The calculated mean, rounded to 3 decimal places, or 0.0 if any 
               exception occurs during processing (e.g., all values are non-numeric).
    """
    try:
        return round(float(pd.Series(s).dropna().astype(float).mean()), 3)
    except Exception:
        return 0.0

def _make_leaderboard(df: pd.DataFrame) -> pd.DataFrame:
    """
    Generates a structured leaderboard DataFrame by aggregating evaluation scores.

    Args:
        df (pd.DataFrame): The input DataFrame containing raw evaluation results.
                           It is expected to have columns: 'evaluator', 'file',
                           'verdict', 'coverage', 'correctness', 'helpfulness',
                           'structure', and 'rewrite_quality'.

    Returns:
        pd.DataFrame: A leaderboard, sorted in descending order primarily by
                      'pass_rate', then by 'avg_correctness', and finally by
                      'avg_coverage'. The returned columns include:
                      - 'evaluator'
                      - 'num_cases' (count of cases evaluated)
                      - 'pass_rate' (percentage of 'pass' verdicts)
                      - 'avg_coverage', 'avg_correctness', 'avg_helpfulness',
                        'avg_structure', 'avg_rewrite_quality' (all calculated mean scores)
                      - 'gold_accuracy' (TBD; not yet aggregated, so it appears only
                        as an empty column when the input DataFrame is empty)
    """

    if df.empty:
        return pd.DataFrame(
            columns=["evaluator","num_cases","pass_rate","avg_coverage",
                     "avg_correctness","avg_helpfulness","avg_structure",
                     "avg_rewrite_quality","gold_accuracy"]
        )
    lb = (
        df.groupby("evaluator")
             .agg(
                 num_cases=("file", "count"),
                 pass_rate=("verdict", lambda s: round(100.0 * (s == "pass").mean(), 2)),
                 avg_coverage=("coverage", _agg_mean),
                 avg_correctness=("correctness", _agg_mean),
                 avg_helpfulness=("helpfulness", _agg_mean),
                 avg_structure=("structure", _agg_mean),
                 avg_rewrite_quality=("rewrite_quality", _agg_mean),
             )
             .reset_index()
             .sort_values(["pass_rate","avg_correctness","avg_coverage"], ascending=False)
    )
    return lb

def evaluate_folder(
    root: str,
    judge_fn,                      # e.g., call_gemini_judge
    save_csv: Optional[str] = None,
    save_json: Optional[str] = None,
    show_progress: bool = True,
):
    """
    Run LLM-as-a-Judge over all JSON files found by find_cases(root).

    Returns:
        df_records: per-file flat records (DataFrame)
        leaderboard: aggregated metrics by evaluator (DataFrame)
        details: per-file rich dicts suitable for inspection or saving (list)
    """
    cases = find_cases(root)
    logger.info(f"Found {len(cases)} JSON files in {root}.")

    records, details = [], []

    iterator = tqdm(cases) if show_progress else cases
    for c in iterator:
        try:
            with open(c["path"], "r", encoding="utf-8") as f:
                obj = json.load(f)
        except Exception as e:
            logger.info(f"[WARN] Failed to read {c['path']}: {e}")
            continue

        standards = obj.get("standards", "")
        listing = obj.get("listing", "")
        actual_output = obj.get("actual_output", "")

        if not (standards and listing and actual_output):
            logger.info(f"[WARN] Missing fields in {c['path']}")
            continue

        # Call your judge function with safety net
        try:
            judge = judge_fn(standards, listing, actual_output)
        except Exception as e:
            # Record an error verdict so you can spot failures without breaking the run
            judge = {
                "verdict": "error",
                "scores": {
                    "coverage": None, "correctness": None, "helpfulness": None,
                    "structure": None, "rewrite_quality": None
                },
                "error": str(e)
            }


        details.append({
            "evaluator": c["evaluator"],
            "file": c["path"],
            "judge": judge,
        })

        scores = judge.get("scores", {}) or {}
        records.append({
            "evaluator": c["evaluator"],
            "file": c["path"],
            "verdict": judge.get("verdict"),
            "coverage": scores.get("coverage"),
            "correctness": scores.get("correctness"),
            "helpfulness": scores.get("helpfulness"),
            "structure": scores.get("structure"),
            "rewrite_quality": scores.get("rewrite_quality"),
            "improvements":scores.get("improvements")
            #"gold_verdict": gold
        })

    df = pd.DataFrame(records)
    leaderboard = _make_leaderboard(df)

    if save_csv:
        leaderboard.to_csv(save_csv, index=False)
    if save_json:
        with open(save_json, "w", encoding="utf-8") as f:
            json.dump(details, f, ensure_ascii=False, indent=2)

    return df, leaderboard, details
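To make the leaderboard arithmetic concrete, the sketch below approximates the two aggregations used by `_make_leaderboard` in plain Python: the pass rate counts every non-"pass" verdict (including "error") as a non-pass, and the mean ignores missing scores. The helper names and the four records are hypothetical and exist only for illustration.

```python
from typing import List, Optional


def pass_rate(verdicts: List[Optional[str]]) -> float:
    """Percentage of verdicts equal to 'pass', rounded to 2 decimals."""
    if not verdicts:
        return 0.0
    passed = sum(v == "pass" for v in verdicts)
    return round(100.0 * passed / len(verdicts), 2)


def mean_score(values: List[Optional[float]]) -> float:
    """Mean of the non-missing scores, rounded to 3 decimals (0.0 if none)."""
    present = [float(v) for v in values if v is not None]
    return round(sum(present) / len(present), 3) if present else 0.0


# Invented records: two passes, one fail, one judge error with no scores.
records = [
    {"verdict": "pass", "coverage": 5},
    {"verdict": "pass", "coverage": 4},
    {"verdict": "fail", "coverage": 3},
    {"verdict": "error", "coverage": None},
]
print(pass_rate([r["verdict"] for r in records]))    # 50.0
print(mean_score([r["coverage"] for r in records]))  # 4.0
```

Note the asymmetry: an "error" record lowers the pass rate but does not affect the score averages, so a run with many judge failures can show a low pass rate alongside high average scores.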

In [66]:
# Using your Gemini judge function
df_a, lb_a, details_a = evaluate_folder(
    root="data/validation_original_prompt",
    judge_fn=call_gemini_judge,              # or call_gemini_judge_safe
    save_csv="data/reporting/leaderboard_original.csv",
    save_json="data/reporting/details_original.json",
)

Found 24 JSON files in data/validation_original_prompt.



The final leaderboard for the original prompt shows:

In [78]:
lb_a


| | evaluator | num_cases | pass_rate | avg_coverage | avg_correctness | avg_helpfulness | avg_structure | avg_rewrite_quality |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | default | 24 | 83.33 | 4.958 | 4.958 | 5 | 5 | 4.875 |


## Improved prompt

The improved prompt differs from the original in the following ways:

- **Enhanced Persona and Authority**: It assigns the LLM the role of a **Senior Compliance and Content Auditor specializing in UK financial listings (FCA regulations)**. This demands a stricter, more professional, and regulatory-focused evaluation.

- **Mandatory Comprehensive Check**: It requires the model to **systematically evaluate against all six main sections (1-6)** of the standards, preventing premature stopping after finding initial violations.

- **Quantitative Analysis Requirements**: The prompt forces objective data collection and reporting for specific standards:

    - Calculate and note the **exact word count** of the entire listing.

    - For Language/Clarity issues (Standard 2.1), calculate the **sentence length** and check for **active voice**.

- **Strict Output Fidelity**: It enforces a precise JSON schema, including the new `quant_data` field, which must contain the measurable data (e.g., "Sentence Length: [X] words").

- **Prioritization**: It instructs the model to prioritize violations in **Section 1 (Regulatory)** and issues affecting the critical **first 100 words** of the listing.
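Because the quantitative requirements above ask the model to report word counts and sentence lengths, a deterministic cross-check can be useful when reviewing `quant_data`. The sketch below is an assumption-laden illustration: the helper names are hypothetical, words are taken to be whitespace-separated tokens, and sentences are split after `.`, `!`, or `?` followed by whitespace. The Marketplace Standards may define these differently.

```python
import re
from typing import List


def word_count(text: str) -> int:
    """Count whitespace-separated tokens in the text."""
    return len(text.split())


def sentence_lengths(text: str) -> List[int]:
    """Word count of each sentence, splitting after ., ! or ? plus whitespace."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


# Invented listing text, for illustration only.
sample = "Apply today. Rates start from 19.9% APR representative. Terms apply!"
print(word_count(sample))        # 10
print(sentence_lengths(sample))  # [2, 6, 2]
```

Comparing these numbers with the values the model reports is one way to spot cases where the LLM's own counting is off, which is a known weak point of language models.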

**Re-running Validation**: The `process_listings` function is called again with the `improved_prompt` and a new output path (`data/validation_improved_prompt/`).

In [25]:
improved_prompt = read_txt('prompts/improved_marketplace_validator_prompt.txt')
process_listings(listings, improved_prompt, output_filepath = "data/validation_improved_prompt/")

Starting API validation for 24 listings...

--- Processing Listing 1/24 ---
  Attempt 1: Calling Gemini...
  SUCCESS! Response received.
Wrote data/listings.jsonl
  Results for Listing 1 saved.

--- Processing Listing 2/24 ---
  Attempt 1: Calling Gemini...
  SUCCESS! Response received.
Wrote data/listings.jsonl
  Results for Listing 2 saved.

--- Processing Listing 3/24 ---
  Attempt 1: Calling Gemini...
  SUCCESS! Response received.
Wrote data/listings.jsonl
  Results for Listing 3 saved.

--- Processing Listing 4/24 ---
  Attempt 1: Calling Gemini...
  SUCCESS! Response received.
Wrote data/listings.jsonl
  Results for Listing 4 saved.

--- Processing Listing 5/24 ---
  Attempt 1: Calling Gemini...
  SUCCESS! Response received.
Wrote data/listings.jsonl
  Results for Listing 5 saved.

--- Processing Listing 6/24 ---
  Attempt 1: Calling Gemini...
  SUCCESS! Response received.
Wrote data/listings.jsonl
  Results for Listing 6 saved.

--- Processing Listing 7/24 ---
  Attempt 1: Calli

**Re-running Judge Evaluation**: The `evaluate_folder` function is run on the new output folder, generating the `lb_i` leaderboard.

In [26]:
# Using your Gemini judge function
df_i, lb_i, details_i = evaluate_folder(
    root="data/validation_improved_prompt",
    judge_fn=call_gemini_judge,              # or call_gemini_judge_safe
    save_csv="data/reporting/leaderboard_improved.csv",
    save_json="data/reporting/details_improved.json",
)

Found 24 JSON files in data/validation_improved_prompt.



**Improved Prompt Results**

In [27]:
lb_i

| | evaluator | num_cases | pass_rate | avg_coverage | avg_correctness | avg_helpfulness | avg_structure | avg_rewrite_quality |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | default | 24 | 33.33 | 4.458 | 4.667 | 4.958 | 4.708 | 4.917 |


## Comparison of Original vs. Improved Prompt Performance

| Metric | Original Prompt | Improved Prompt | Difference (Improved - Original) |
| :--- | :--- | :--- | :--- |
| **Pass Rate** | 83.33% | 33.33% | -50.00 percentage points |
| **Avg. Coverage** | 4.958 | 4.458 | -0.500 |
| **Avg. Correctness** | 4.958 | 4.667 | -0.291 |
| **Avg. Helpfulness** | 5.000 | 4.958 | -0.042 |
| **Avg. Structure** | 5.000 | 4.708 | -0.292 |
| **Avg. Rewrite Quality** | 4.875 | 4.917 | +0.042 |
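The difference column can be reproduced directly from the two leaderboards. The snippet below is a small sketch that hardcodes the values reported in `lb_a` and `lb_i`; in the notebook, the same numbers could be read from the DataFrames instead.

```python
# Values copied from the two leaderboards reported in this notebook.
original = {
    "pass_rate": 83.33, "avg_coverage": 4.958, "avg_correctness": 4.958,
    "avg_helpfulness": 5.0, "avg_structure": 5.0, "avg_rewrite_quality": 4.875,
}
improved = {
    "pass_rate": 33.33, "avg_coverage": 4.458, "avg_correctness": 4.667,
    "avg_helpfulness": 4.958, "avg_structure": 4.708, "avg_rewrite_quality": 4.917,
}

# Improved minus original, rounded to avoid floating-point noise.
diff = {metric: round(improved[metric] - original[metric], 3) for metric in original}

for metric, delta in diff.items():
    print(f"{metric:<20} {delta:+.3f}")
```

The pass-rate difference is in percentage points, while the other rows are differences in average judge scores.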

**Strictness vs. Accuracy**: The most significant change is the **drop in the Pass Rate from 83.33% to 33.33%** (20 of 24 cases down to 8 of 24). This suggests the **Improved Prompt** is significantly **stricter** in its validation criteria: it likely instructs the model to apply the Marketplace Standards with less tolerance for ambiguity or minor deviations. Because the verdict is the judge's assessment of each validator output, this reading is an inference from the scores rather than something the leaderboard measures directly.

**Tradeoff: Pass Rate vs. Quality**: The averages are taken over all 24 cases, so the drops in `avg_coverage`, `avg_correctness`, and `avg_structure` indicate that the judge found the **actual output** (the validator's response) slightly less comprehensive or well structured overall when measured against the (likely stricter) **standards** embedded in the Improved Prompt.

**Tradeoff: Rewrite Quality**: The **Average Rewrite Quality** slightly **improved** (4.875 to 4.917). Although fewer outputs passed with the Improved Prompt, the **suggested changes or rewrites** were rated marginally higher by the LLM-as-a-Judge. This is consistent with a stricter prompt yielding a lower pass rate but higher-quality corrections, though a difference of 0.042 across 24 cases is small and should be treated as tentative.

**Overall Conclusion**: The Improved Validator Prompt trades off a higher overall pass rate for a **stricter validation** process. If the goal of the marketplace is to have **zero tolerance** for minor standard violations, the Improved Prompt is better because it identifies and flags more issues. If the goal is a rapid, high-throughput validation that only catches major errors, the Original Prompt might be better.