# Hallucination Detection

**This method follows the RefChecker design and is split into an Extractor and a Checker. However, I could not apply the GitHub implementation (https://github.com/amazon-science/RefChecker/tree/main/refchecker) directly, for the following reasons:**

1. The GitHub implementation relies on the OpenAI API, which requires a proxy when used from China.
2. RefChecker does not locate hallucinations, i.e. it does not report their positions in the text.
3. RefChecker compares against its own set of references, whereas my references are limited: some are inaccessible, and what I do have can only serve as context.

**Given these issues, I adopt only RefChecker's conceptual framework and some of its key prompts, which I adapt and optimize. My method primarily uses the Anthropic API (Claude).**

**My approach:**
1. **Claim Extraction:** The Anthropic API processes both `model_output_text` and `model_input` to extract claims. Each claim merges the triples produced by a triple-structured knowledge breakdown. (This step uses prompts from RefChecker's Extractor component.)

2. **Primary Reference Acquisition:** Each claim from Step 1 is sent back through the Anthropic API for self-verification, which returns a factually consistent statement and, where possible, supplemental content. These statements are combined with the external knowledge I retrieved earlier through the Google API to form the complete set of references for the verification step.

3. **Verification of Claims, Model Input, Model Output Text, and References:** The extracted claims, the references from Step 2, and `model_input` are checked against each other. The Anthropic API, with specific prompts, identifies hallucinated words in `model_output_text` and estimates a hallucination probability for each.

4. **Soft Label Generation:** The hallucinated words identified in Step 3 are mapped to their positions in `model_output_text`, and the estimated probabilities are attached to produce the `soft_labels`.
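
To make the target format concrete, here is a minimal sketch of a `soft_labels` list and of one way hard labels could be derived from it. The helper `to_hard_labels` and the 0.5 threshold are assumptions for illustration only; the notebook itself relies on `recompute_hard_labels` from `scorer`, whose implementation is not shown here.

```python
def to_hard_labels(soft_labels, threshold=0.5):
    """Keep spans whose probability exceeds the threshold as [start, end] pairs."""
    return [
        [label["start"], label["end"]]
        for label in soft_labels
        if label["prob"] > threshold
    ]

# Hypothetical example: character offsets are 0-based, with an exclusive end.
text = "Paris is the capital of Germany."
soft_labels = [
    {"start": 24, "end": 31, "prob": 0.9},  # "Germany"
    {"start": 0, "end": 5, "prob": 0.2},    # "Paris"
]
hard_labels = to_hard_labels(soft_labels)
```

Under these assumptions only the "Germany" span survives the threshold, while the low-probability "Paris" span stays in `soft_labels` alone.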

In [None]:
import json
import os
import anthropic
from tqdm import tqdm
import requests
import httpx
from RefChecker.refchecker.extractor import extractor_prompts
from scorer import recompute_hard_labels
import glob

In [None]:
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

In [None]:
os.environ['ANTHROPIC_API_KEY'] = 'sk-an' # set the Anthropic key

# Proxy settings (needed to reach the API from China)
proxies = {
    "http": "http://127.0.0.1:10809",
    "https": "http://127.0.0.1:10809",
    "socks5": "socks5://127.0.0.1:10808"  # SOCKS5 proxy
}

# Session with the proxies applied
session = requests.Session()
session.proxies.update(proxies)

# Read the Anthropic API key from the environment
api_key = os.getenv("ANTHROPIC_API_KEY")

## 1. Extracting Claims (Extractor) - Each claim is a merger of triple-structured knowledge.

In [None]:
LLM_TRIPLET_EXTRACTION_PROMPT_Q = extractor_prompts.LLM_TRIPLET_EXTRACTION_PROMPT_Q
LLM_Triplet_To_Claim_PROMPT_Q = extractor_prompts.LLM_Triplet_To_Claim_PROMPT_Q
LLM_CLAIM_EXTRACTION_PROMPT_Q = extractor_prompts.LLM_CLAIM_EXTRACTION_PROMPT_Q

In [None]:
url_anthropic = "https://api.anthropic.com/v1/messages"
def extract_triplets_to_claims(question, model_output_text):
    prompt = LLM_CLAIM_EXTRACTION_PROMPT_Q.format(q=question, r=model_output_text)
    headers = {
        "x-api-key": api_key,
        "content-type": "application/json",
        "anthropic-version": "2023-06-01"
    }

    data = {
        "model": "claude-3-5-haiku-20241022",
        "max_tokens": 1000,
        "temperature": 0,
        "system": "You are an AI assistant, you can help to extract claims from a model-generated response for a question. In addition, you should attribute the claims to the sentences followed by the sentence ids.",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": prompt
                    }
                ]
            }
        ]
    }

    try:
        # Skip SSL certificate verification (requests go through a local proxy)
        response = requests.post(
            url_anthropic,
            headers=headers,
            json=data,
            proxies=proxies,
            timeout=30,
            verify=False,
        )

        if response.status_code == 200:
            result = response.json()
            # Return one claim per non-empty line of the response.
            lines = result['content'][0]['text'].split('\n')
            return [line.strip() for line in lines if line.strip()]
        else:
            print(f"extract_triplets_to_claims Error: {response.status_code}")
            return []

    except requests.exceptions.RequestException as e:
        print("Request failed:", e)
        return []
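
The response is split on newlines, so list markers or blank lines in the model's reply can end up inside the claims. A minimal sketch of a cleanup step follows. The helper name `clean_claim_lines` is hypothetical, and the assumption that claims may arrive with bullets or numbering is mine, not something the RefChecker prompt guarantees.

```python
import re

def clean_claim_lines(raw_text):
    """Split an LLM response into claim strings, dropping blanks and list markers."""
    claims = []
    for line in raw_text.split("\n"):
        line = line.strip()
        # Remove a leading bullet ("- ", "* ") or numbering ("1. ", "2) ").
        line = re.sub(r"^(?:[-*]\s+|\d+[.)]\s+)", "", line)
        if line:
            claims.append(line)
    return claims
```

A claim that genuinely starts with a number followed by a period would lose that prefix, so the pattern may need adjusting for the actual output format.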

## 2. Obtain Complete References

In [None]:
def get_reference_for_claim(claim):
    prompt = f"""
    Please verify the following claim, expand on it, and provide additional relevant factual information:
    Claim: {claim}

    If the claim is accurate, complete, and not a hallucination, return the original claim.
    If the claim is inaccurate, partial, or lacking detail, return a corrected, more detailed, and comprehensive factual statement.
    """
    headers = {
        "x-api-key": api_key,
        "content-type": "application/json",
        "anthropic-version": "2023-06-01"
    }

    data = {
        "model": "claude-3-5-haiku-20241022",
        "max_tokens": 1000,
        "temperature": 0,
        "system": "You are an AI assistant who checks the factual accuracy of claims and returns corrected references if necessary.",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": prompt
                    }
                ]
            }
        ]
    }

    try:
        response = requests.post(
            url_anthropic,
            headers=headers,
            json=data,
            proxies=proxies,
            timeout=30,
            verify=False
        )

        if response.status_code == 200:
            result = response.json()
            return result['content'][0]['text']
        else:
            print(f"Error verifying claim: {response.status_code}")
            return claim

    except requests.exceptions.RequestException as e:
        print("Verification request failed:", e)
        return claim
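
Requests through the proxy can time out or be dropped, in which case the function above falls back to the unverified claim. One possible mitigation is to retry with exponential backoff. The sketch below is generic: `call_with_retries` is a hypothetical helper, and it assumes the wrapped function lets exceptions propagate, which `get_reference_for_claim` as written does not, since it catches them internally.

```python
import time

def call_with_retries(func, *args, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(*args); on an exception, wait and retry with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except Exception as exc:
            # Give up and re-raise once the last attempt has failed.
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `time.sleep` applies.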

In [None]:
def extract_and_get_references(claims, context):
    references = []
    for claim in claims:
        verified_reference = get_reference_for_claim(claim)
        references.append(verified_reference)

    final_reference = " ".join(references) + " " + context

    return final_reference
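
The loop above issues one API call per claim, one after another, so latency grows with the number of claims. A minimal sketch of running those calls concurrently with the standard library follows. `get_references_parallel` and its `max_workers` default are assumptions, and API rate limits may require a smaller pool.

```python
from concurrent.futures import ThreadPoolExecutor

def get_references_parallel(claims, fetch_reference, max_workers=4):
    """Apply fetch_reference to each claim concurrently, preserving claim order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input claims.
        return list(pool.map(fetch_reference, claims))
```

In this notebook the call would look like `get_references_parallel(claims, get_reference_for_claim)`, with the results joined to the context exactly as before.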

## 3. Validate claims, `model_input`, `model_output_text`, and References (Checker)

The validation results should be mapped back to the `model_output_text`, marking hallucination positions and probabilities, and outputting them as `soft_labels`.

In [None]:
def extract_hallucination_positions(model_output_text, hallucination_results):
    # Parse the JSON list returned by the checker
    try:
        hallucination_results = json.loads(hallucination_results)
    except json.JSONDecodeError:
        print("Failed to decode JSON. Returning empty labels.")
        return {"soft_labels": []}

    soft_labels = []

    # Find every occurrence of each hallucinated word in the original text
    for result in hallucination_results:
        word = result['word']
        prob = result['prob']

        # An empty string matches at every position and would loop forever
        if not word:
            continue

        start = 0
        while True:
            start = model_output_text.find(word, start)
            if start == -1:
                break
            end = start + len(word)

            # Record the character span and its hallucination probability
            soft_labels.append({
                "start": start,
                "end": end,
                "prob": prob
            })
            start = end

    return {"soft_labels": soft_labels}
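
Strict `json.loads` fails whenever the model wraps its answer in a Markdown code fence or adds a sentence around the list. That is one plausible cause of decode failures, though not a confirmed one for this run. The sketch below shows a more tolerant parser. `parse_json_list` is a hypothetical helper, not part of the notebook or of RefChecker.

```python
import json
import re

# Three backticks, built indirectly so this example can sit inside a fenced block.
FENCE = "`" * 3

def parse_json_list(raw_text):
    """Return the first JSON list found in raw_text, or [] if none parses."""
    # Remove Markdown code-fence markers, with or without a "json" language tag.
    text = re.sub(re.escape(FENCE) + r"(?:json)?", "", raw_text)
    # Keep only the span from the first "[" to the last "]".
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return []
    return parsed if isinstance(parsed, list) else []
```

Logging the raw response whenever this returns an empty list would help show what the model actually produced.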


In [None]:
def triplets_and_references_checker(claims, model_output_text, references, question):
    prompt = f"""
    Evaluate the model output text for hallucinations by comparing it to the provided references, existing facts, claims, and question (model input). Identify any hallucinated or potentially inaccurate parts of the entire model output text. Highlight each hallucinated word and assign it a hallucination probability within the `model output text`.

    ### Question (Model Input)
    {question}

    ### Claims
    {claims}

    ### References
    {references}

    ### Model Output Text
    {model_output_text}

    ### Instructions
    1. Compare each claim with the provided references, the question, and existing facts (internal knowledge).
    2. If a claim cannot be fully supported by the references, identify the hallucinated words and mark them in the `model output text`.
    3. Return character-level offsets and assign hallucination probabilities.
    4. If the claim is fully supported, no hallucination should be labeled.
    5. Assign hallucination probabilities based on the following criteria:
       - **0.7 - 1.0**: Fully fabricated or highly speculative content with no supporting evidence.
       - **0.4 - 0.7**: Partially incorrect or speculative content, but some evidence supports parts of the claim.
       - **0.1 - 0.4**: Minor inaccuracies, such as spelling errors, wrong formatting, or small factual deviations.
    6. Ensure that the hallucinated words do not overlap or repeat. If overlapping occurs, merge them or separate them appropriately.
    7. Ensure the words are shown in the `model output text`.
    8. Highlight text in `model output text` that could potentially be a hallucination even if not explicitly listed in the claims.
    9. Return **all the hallucinated words or phrases** and assign each a hallucination probability (between 0 and 1).
    10. Do not filter out hallucinations based on low probability. Return results for any potential hallucination.
    11. Do not include any explanations, summaries, or additional text. **Return the JSON list directly.**
    12. Ensure all potential hallucinations are listed, even those with probabilities as low as 0.1.

    ### Output Example
    Only return results with all hallucinated words or phrases and their probability **strictly in the following JSON format**:
    [
        {{"word": "<example_word>", "prob": <probability>}},
        {{"word": "<another_word>", "prob": <probability>}}
    ]

    """

    headers = {
        "x-api-key": api_key,
        "content-type": "application/json",
        "anthropic-version": "2023-06-01"
    }

    data = {
        "model": "claude-3-5-haiku-20241022",
        "max_tokens": 1000,
        "temperature": 0,
        "system": "You are an AI assistant who checks the factual accuracy of claims and returns position and probability of the hallucination from model output text",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": prompt
                    }
                ]
            }
        ]
    }

    try:
        response = requests.post(
            url_anthropic,
            headers=headers,
            json=data,
            proxies=proxies,
            timeout=30,
            verify=False
        )
        if response.status_code == 200:
            raw_labels = response.json()['content'][0]['text']
            return extract_hallucination_positions(model_output_text, raw_labels)
        else:
            print(f"Error during hallucination detection: {response.status_code}")
            return {"soft_labels": []}
    except requests.RequestException as e:
        print("Hallucination detection request failed:", e)
        return {"soft_labels": []}
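
The three API calls in this notebook repeat the same headers and payload structure. A minimal sketch of factoring that into one place follows. `build_messages_request` is a hypothetical helper; the field names and values simply mirror those already used above.

```python
def build_messages_request(api_key, system_prompt, user_prompt,
                           model="claude-3-5-haiku-20241022", max_tokens=1000):
    """Build the headers and JSON payload shared by the notebook's API calls."""
    headers = {
        "x-api-key": api_key,
        "content-type": "application/json",
        "anthropic-version": "2023-06-01",
    }
    payload = {
        "model": model,
        "max_tokens": max_tokens,
        "temperature": 0,
        "system": system_prompt,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": user_prompt}]}
        ],
    }
    return headers, payload
```

Each function would then pass the returned `headers` and `payload` to `requests.post`, keeping only its own prompt and its own handling of the response.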

## Main Logic

In [None]:
def hallucination_detect(question, model_output_text, context):
    claims = extract_triplets_to_claims(question, model_output_text)
    references = extract_and_get_references(claims, context)
    hallucination_results = triplets_and_references_checker(claims, model_output_text, references, question)
    # extract soft_labels
    soft_labels = hallucination_results.get("soft_labels", [])

    # compute hard_labels
    hard_labels = recompute_hard_labels(soft_labels)

    return soft_labels, hard_labels
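
Because every occurrence of every returned word is mapped back to the text, spans can overlap, for example when the model returns both a phrase and a word inside it. A minimal sketch of merging such spans before computing hard labels follows. `merge_overlapping_labels` is a hypothetical helper, and keeping the highest probability of each group is a design assumption rather than something the task defines.

```python
def merge_overlapping_labels(soft_labels):
    """Merge overlapping spans, keeping the highest probability of each group."""
    merged = []
    for label in sorted(soft_labels, key=lambda l: (l["start"], l["end"])):
        if merged and label["start"] < merged[-1]["end"]:
            # Overlaps the previous span: extend it and keep the larger probability.
            last = merged[-1]
            last["end"] = max(last["end"], label["end"])
            last["prob"] = max(last["prob"], label["prob"])
        else:
            # Copy so the caller's dictionaries are not modified.
            merged.append(dict(label))
    return merged
```

Spans that merely touch, where one ends exactly where the next begins, are left separate under this rule.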

## Apply on My Dataset

In [None]:
# process the dataset and save the results
def process_dataset(input_folder, output_folder):
    os.makedirs(output_folder, exist_ok=True)
    input_files = glob.glob(os.path.join(input_folder, "*.jsonl"))

    with tqdm(total=len(input_files), desc="Processing Files", unit="file") as file_progress:
        for file_path in input_files:
            with open(file_path, 'r', encoding='utf-8') as f:
                data = [json.loads(line) for line in f]

            output_data = []

            with tqdm(total=len(data), desc=f"Processing {os.path.basename(file_path)}", unit="entry", leave=False) as entry_progress:
                for entry in data:
                    try:
                        question = entry.get("model_input", "")
                        model_output_text = entry.get("model_output_text", "")
                        context = entry.get("context_googlecse", "")

                        soft_labels, hard_labels = hallucination_detect(
                            question, model_output_text, context
                        )

                        output_entry = {
                            "id": entry.get("id"),
                            "lang": entry.get("lang"),
                            "model_input": entry.get("model_input"),
                            "model_output_text": entry.get("model_output_text"),
                            "model_id": entry.get("model_id"),
                            "soft_labels": soft_labels,
                            "hard_labels": hard_labels,
                            "model_output_logits": entry.get("model_output_logits"),
                            "model_output_tokens": entry.get("model_output_tokens")
                        }

                        output_data.append(output_entry)

                    except Exception as e:
                        print(f"Error processing entry {entry.get('id')}: {e}")

                    entry_progress.update(1)

            output_file = os.path.join(output_folder, os.path.basename(file_path))
            with open(output_file, 'w', encoding='utf-8') as f:
                for item in output_data:
                    f.write(json.dumps(item, ensure_ascii=False) + '\n')

            file_progress.update(1)
            print(f"Processed and saved: {output_file}")

In [None]:
# jsonl files with context(Google API)
input_folder = "data/val/exknowledge/"
# output detected jsonl files
output_folder = "data/val/detect_2/"

process_dataset(input_folder, output_folder)

[Output condensed: per-entry progress lines omitted.]

Processing mushroom.ar-val.v2.jsonl: 100%|██████████| 50/50 [18:42<00:00, 19.02s/entry]
Processing Files:  10%|█         | 1/10 [18:43<2:48:27, 1123.01s/file]
Processed and saved: data/val/detect_2/mushroom.ar-val.v2.jsonl

Processing mushroom.de-val.v2.jsonl: 100%|██████████| 50/50 [20:23<00:00, 35.09s/entry]
Processing Files:  20%|██        | 2/10 [39:06<2:37:34, 1181.84s/file]
Processed and saved: data/val/detect_2/mushroom.de-val.v2.jsonl

Processing mushroom.en-val.v2.jsonl:  90%|█████████ | 45/50 [25:11<01:53, 22.64s/entry]
Processed and saved: data/val/detect_2/mushroom.en-val.v2.jsonl

Processing mushroom.es-val.v2.jsonl:  58%|█████▊    | 29/50 [21:40<15:14, 43.54s/entry]

Messages printed repeatedly between progress lines during the run:

Failed to decode JSON. Returning empty labels.
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1149)')))



Processing mushroom.es-val.v2.jsonl:  60%|██████    | 30/50 [21:53<11:26, 34.33s/entry][A
Processing mushroom.es-val.v2.jsonl:  62%|██████▏   | 31/50 [24:51<24:32, 77.49s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.es-val.v2.jsonl:  64%|██████▍   | 32/50 [29:54<43:33, 145.18s/entry][A
Processing mushroom.es-val.v2.jsonl:  66%|██████▌   | 33/50 [30:09<30:05, 106.19s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.es-val.v2.jsonl:  68%|██████▊   | 34/50 [33:34<36:12, 135.80s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.es-val.v2.jsonl:  70%|███████   | 35/50 [34:44<29:01, 116.10s/entry][A
Processing mushroom.es-val.v2.jsonl:  72%|███████▏  | 36/50 [35:17<21:15, 91.07s/entry] [A
Processing mushroom.es-val.v2.jsonl:  74%|███████▍  | 37/50 [36:01<16:40, 76.96s/entry][A
Processing mushroom.es-val.v2.jsonl:  76%|███████▌  | 38/50 [38:11<18:34, 92.87s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.es-val.v2.jsonl:  78%|███████▊  | 39/50 [39:42<16:55, 92.30s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.es-val.v2.jsonl:  80%|████████  | 40/50 [39:59<11:37, 69.71s/entry][A
Processing mushroom.es-val.v2.jsonl:  82%|████████▏ | 41/50 [40:15<08:02, 53.60s/entry][A
Processing mushroom.es-val.v2.jsonl:  84%|████████▍ | 42/50 [40:34<05:46, 43.34s/entry][A
Processing mushroom.es-val.v2.jsonl:  86%|████████▌ | 43/50 [43:26<09:33, 81.99s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.es-val.v2.jsonl:  88%|████████▊ | 44/50 [44:26<07:30, 75.16s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.es-val.v2.jsonl:  90%|█████████ | 45/50 [45:50<06:29, 77.96s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.es-val.v2.jsonl:  92%|█████████▏| 46/50 [46:03<03:53, 58.29s/entry][A
Processing mushroom.es-val.v2.jsonl:  94%|█████████▍| 47/50 [46:14<02:12, 44.28s/entry][A
Processing mushroom.es-val.v2.jsonl:  96%|█████████▌| 48/50 [47:46<01:56, 58.48s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.es-val.v2.jsonl:  98%|█████████▊| 49/50 [49:28<01:11, 71.71s/entry][A
Processing mushroom.es-val.v2.jsonl: 100%|██████████| 50/50 [49:54<00:00, 57.95s/entry][A
Processing Files:  40%|████      | 4/10 [1:55:17<3:20:40, 2006.75s/file]               [A

Processed and saved: data/val/detect_2/mushroom.es-val.v2.jsonl



Processing mushroom.fi-val.v2.jsonl:   0%|          | 0/50 [00:00<?, ?entry/s][A
Processing mushroom.fi-val.v2.jsonl:   2%|▏         | 1/50 [00:20<16:41, 20.44s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fi-val.v2.jsonl:   4%|▍         | 2/50 [00:43<17:43, 22.15s/entry][A
Processing mushroom.fi-val.v2.jsonl:   6%|▌         | 3/50 [01:21<22:46, 29.08s/entry][A
Processing mushroom.fi-val.v2.jsonl:   8%|▊         | 4/50 [01:34<17:38, 23.02s/entry][A
Processing mushroom.fi-val.v2.jsonl:  10%|█         | 5/50 [02:58<33:34, 44.78s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fi-val.v2.jsonl:  12%|█▏        | 6/50 [03:30<29:39, 40.45s/entry][A
Processing mushroom.fi-val.v2.jsonl:  14%|█▍        | 7/50 [04:06<28:00, 39.07s/entry][A
Processing mushroom.fi-val.v2.jsonl:  16%|█▌        | 8/50 [04:30<23:54, 34.15s/entry][A
Processing mushroom.fi-val.v2.jsonl:  18%|█▊        | 9/50 [04:51<20:36, 30.16s/entry][A
Processing mushroom.fi-val.v2.jsonl:  20%|██        | 10/50 [05:21<20:01, 30.04s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fi-val.v2.jsonl:  22%|██▏       | 11/50 [06:18<24:58, 38.42s/entry][A
Processing mushroom.fi-val.v2.jsonl:  24%|██▍       | 12/50 [06:41<21:17, 33.62s/entry][A
Processing mushroom.fi-val.v2.jsonl:  26%|██▌       | 13/50 [07:04<18:50, 30.55s/entry][A
Processing mushroom.fi-val.v2.jsonl:  28%|██▊       | 14/50 [07:39<19:00, 31.69s/entry][A
Processing mushroom.fi-val.v2.jsonl:  30%|███       | 15/50 [08:12<18:45, 32.15s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fi-val.v2.jsonl:  32%|███▏      | 16/50 [09:20<24:20, 42.95s/entry][A
Processing mushroom.fi-val.v2.jsonl:  34%|███▍      | 17/50 [09:37<19:17, 35.07s/entry][A

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.fi-val.v2.jsonl:  36%|███▌      | 18/50 [11:24<30:18, 56.83s/entry][A
Processing mushroom.fi-val.v2.jsonl:  38%|███▊      | 19/50 [12:00<26:02, 50.41s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fi-val.v2.jsonl:  40%|████      | 20/50 [12:30<22:10, 44.35s/entry][A
Processing mushroom.fi-val.v2.jsonl:  42%|████▏     | 21/50 [13:16<21:40, 44.84s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fi-val.v2.jsonl:  44%|████▍     | 22/50 [13:39<17:52, 38.29s/entry][A
Processing mushroom.fi-val.v2.jsonl:  46%|████▌     | 23/50 [13:50<13:37, 30.27s/entry][A
Processing mushroom.fi-val.v2.jsonl:  48%|████▊     | 24/50 [14:19<12:51, 29.68s/entry][A
Processing mushroom.fi-val.v2.jsonl:  50%|█████     | 25/50 [14:42<11:34, 27.80s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fi-val.v2.jsonl:  52%|█████▏    | 26/50 [15:26<13:04, 32.67s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fi-val.v2.jsonl:  54%|█████▍    | 27/50 [15:50<11:27, 29.90s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fi-val.v2.jsonl:  56%|█████▌    | 28/50 [16:35<12:37, 34.45s/entry][A
Processing mushroom.fi-val.v2.jsonl:  58%|█████▊    | 29/50 [17:08<11:57, 34.18s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fi-val.v2.jsonl:  60%|██████    | 30/50 [17:51<12:17, 36.87s/entry][A
Processing mushroom.fi-val.v2.jsonl:  62%|██████▏   | 31/50 [18:40<12:45, 40.27s/entry][A
Processing mushroom.fi-val.v2.jsonl:  64%|██████▍   | 32/50 [19:07<10:57, 36.55s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fi-val.v2.jsonl:  66%|██████▌   | 33/50 [19:21<08:22, 29.54s/entry][A
Processing mushroom.fi-val.v2.jsonl:  68%|██████▊   | 34/50 [19:38<06:53, 25.85s/entry][A
Processing mushroom.fi-val.v2.jsonl:  70%|███████   | 35/50 [20:16<07:24, 29.62s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fi-val.v2.jsonl:  72%|███████▏  | 36/50 [21:44<10:58, 47.04s/entry][A
Processing mushroom.fi-val.v2.jsonl:  74%|███████▍  | 37/50 [23:05<12:23, 57.17s/entry][A
Processing mushroom.fi-val.v2.jsonl:  76%|███████▌  | 38/50 [23:27<09:19, 46.60s/entry][A
Processing mushroom.fi-val.v2.jsonl:  78%|███████▊  | 39/50 [23:39<06:41, 36.47s/entry][A
Processing mushroom.fi-val.v2.jsonl:  80%|████████  | 40/50 [24:32<06:54, 41.41s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fi-val.v2.jsonl:  82%|████████▏ | 41/50 [25:14<06:12, 41.38s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fi-val.v2.jsonl:  84%|████████▍ | 42/50 [26:14<06:16, 47.02s/entry][A
Processing mushroom.fi-val.v2.jsonl:  86%|████████▌ | 43/50 [26:38<04:40, 40.12s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fi-val.v2.jsonl:  88%|████████▊ | 44/50 [26:50<03:09, 31.65s/entry][A
Processing mushroom.fi-val.v2.jsonl:  90%|█████████ | 45/50 [27:03<02:10, 26.02s/entry][A
Processing mushroom.fi-val.v2.jsonl:  92%|█████████▏| 46/50 [27:56<02:16, 34.16s/entry][A
Processing mushroom.fi-val.v2.jsonl:  94%|█████████▍| 47/50 [28:50<02:00, 40.24s/entry][A
Processing mushroom.fi-val.v2.jsonl:  96%|█████████▌| 48/50 [29:03<01:03, 31.90s/entry][A
Processing mushroom.fi-val.v2.jsonl:  98%|█████████▊| 49/50 [30:09<00:42, 42.16s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fi-val.v2.jsonl: 100%|██████████| 50/50 [31:22<00:00, 51.39s/entry][A
Processing Files:  50%|█████     | 5/10 [2:26:40<2:43:29, 1961.86s/file]               [A

Processed and saved: data/val/detect_2/mushroom.fi-val.v2.jsonl



Processing mushroom.fr-val.v2.jsonl:   0%|          | 0/50 [00:00<?, ?entry/s][A
Processing mushroom.fr-val.v2.jsonl:   2%|▏         | 1/50 [01:58<1:36:54, 118.67s/entry][A
Processing mushroom.fr-val.v2.jsonl:   4%|▍         | 2/50 [02:15<46:52, 58.59s/entry]   [A
Processing mushroom.fr-val.v2.jsonl:   6%|▌         | 3/50 [03:07<43:28, 55.50s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fr-val.v2.jsonl:   8%|▊         | 4/50 [03:29<32:36, 42.53s/entry][A
Processing mushroom.fr-val.v2.jsonl:  10%|█         | 5/50 [06:09<1:03:33, 84.73s/entry][A
Processing mushroom.fr-val.v2.jsonl:  12%|█▏        | 6/50 [06:57<53:06, 72.42s/entry]  [A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fr-val.v2.jsonl:  14%|█▍        | 7/50 [07:10<37:48, 52.76s/entry][A
Processing mushroom.fr-val.v2.jsonl:  16%|█▌        | 8/50 [07:57<35:38, 50.93s/entry][A
Processing mushroom.fr-val.v2.jsonl:  18%|█▊        | 9/50 [08:24<29:46, 43.58s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fr-val.v2.jsonl:  20%|██        | 10/50 [08:51<25:42, 38.56s/entry][A
Processing mushroom.fr-val.v2.jsonl:  22%|██▏       | 11/50 [09:57<30:32, 46.98s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fr-val.v2.jsonl:  24%|██▍       | 12/50 [11:19<36:25, 57.51s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fr-val.v2.jsonl:  26%|██▌       | 13/50 [11:59<32:06, 52.07s/entry][A
Processing mushroom.fr-val.v2.jsonl:  28%|██▊       | 14/50 [12:27<26:54, 44.85s/entry][A
Processing mushroom.fr-val.v2.jsonl:  30%|███       | 15/50 [12:48<21:57, 37.65s/entry][A
Processing mushroom.fr-val.v2.jsonl:  32%|███▏      | 16/50 [13:18<20:01, 35.34s/entry][A
Processing mushroom.fr-val.v2.jsonl:  34%|███▍      | 17/50 [13:37<16:51, 30.65s/entry][A
Processing mushroom.fr-val.v2.jsonl:  36%|███▌      | 18/50 [14:06<15:58, 29.97s/entry][A
Processing mushroom.fr-val.v2.jsonl:  38%|███▊      | 19/50 [14:45<16:54, 32.73s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fr-val.v2.jsonl:  40%|████      | 20/50 [15:27<17:50, 35.69s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fr-val.v2.jsonl:  42%|████▏     | 21/50 [15:54<15:51, 32.82s/entry][A
Processing mushroom.fr-val.v2.jsonl:  44%|████▍     | 22/50 [16:26<15:17, 32.78s/entry][A
Processing mushroom.fr-val.v2.jsonl:  46%|████▌     | 23/50 [16:42<12:27, 27.68s/entry][A
Processing mushroom.fr-val.v2.jsonl:  48%|████▊     | 24/50 [17:38<15:42, 36.23s/entry][A

Hallucination detection request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1149)')))
Request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.fr-val.v2.jsonl:  50%|█████     | 25/50 [18:20<15:46, 37.87s/entry][A
Processing mushroom.fr-val.v2.jsonl:  52%|█████▏    | 26/50 [20:15<24:21, 60.89s/entry][A
Processing mushroom.fr-val.v2.jsonl:  54%|█████▍    | 27/50 [21:15<23:18, 60.79s/entry][A
Processing mushroom.fr-val.v2.jsonl:  56%|█████▌    | 28/50 [21:59<20:26, 55.74s/entry][A

Error verifying claim: 502



Processing mushroom.fr-val.v2.jsonl:  58%|█████▊    | 29/50 [23:26<22:47, 65.10s/entry][A

Failed to decode JSON. Returning empty labels.
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1149)')))
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.fr-val.v2.jsonl:  60%|██████    | 30/50 [25:04<24:59, 74.97s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fr-val.v2.jsonl:  62%|██████▏   | 31/50 [25:37<19:43, 62.31s/entry][A
Processing mushroom.fr-val.v2.jsonl:  64%|██████▍   | 32/50 [25:51<14:21, 47.87s/entry][A
Processing mushroom.fr-val.v2.jsonl:  66%|██████▌   | 33/50 [28:03<20:44, 73.20s/entry][A

Hallucination detection request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.fr-val.v2.jsonl:  68%|██████▊   | 34/50 [29:02<18:21, 68.86s/entry][A
Processing mushroom.fr-val.v2.jsonl:  70%|███████   | 35/50 [29:43<15:07, 60.51s/entry][A
Processing mushroom.fr-val.v2.jsonl:  72%|███████▏  | 36/50 [30:33<13:21, 57.27s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fr-val.v2.jsonl:  74%|███████▍  | 37/50 [32:13<15:14, 70.31s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fr-val.v2.jsonl:  76%|███████▌  | 38/50 [32:37<11:16, 56.36s/entry][A
Processing mushroom.fr-val.v2.jsonl:  78%|███████▊  | 39/50 [33:37<10:29, 57.26s/entry][A
Processing mushroom.fr-val.v2.jsonl:  80%|████████  | 40/50 [34:11<08:23, 50.30s/entry][A
Processing mushroom.fr-val.v2.jsonl:  82%|████████▏ | 41/50 [34:44<06:46, 45.19s/entry][A
Processing mushroom.fr-val.v2.jsonl:  84%|████████▍ | 42/50 [35:10<05:15, 39.45s/entry][A

Failed to decode JSON. Returning empty labels.
Request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.fr-val.v2.jsonl:  86%|████████▌ | 43/50 [35:44<04:25, 37.90s/entry][A
Processing mushroom.fr-val.v2.jsonl:  88%|████████▊ | 44/50 [36:39<04:18, 43.01s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.fr-val.v2.jsonl:  90%|█████████ | 45/50 [37:03<03:06, 37.29s/entry][A
Processing mushroom.fr-val.v2.jsonl:  92%|█████████▏| 46/50 [37:40<02:28, 37.06s/entry][A
Processing mushroom.fr-val.v2.jsonl:  94%|█████████▍| 47/50 [38:40<02:11, 43.90s/entry][A

Hallucination detection request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.fr-val.v2.jsonl:  96%|█████████▌| 48/50 [38:55<01:10, 35.29s/entry][A

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.fr-val.v2.jsonl:  98%|█████████▊| 49/50 [40:11<00:47, 47.59s/entry][A
Processing mushroom.fr-val.v2.jsonl: 100%|██████████| 50/50 [40:31<00:00, 39.27s/entry][A
Processing Files:  60%|██████    | 6/10 [3:07:11<2:21:26, 2121.52s/file]               [A

Processed and saved: data/val/detect_2/mushroom.fr-val.v2.jsonl



Processing mushroom.hi-val.v2.jsonl:   0%|          | 0/50 [00:00<?, ?entry/s][A
Processing mushroom.hi-val.v2.jsonl:   2%|▏         | 1/50 [00:20<16:32, 20.25s/entry][A
Processing mushroom.hi-val.v2.jsonl:   4%|▍         | 2/50 [00:37<14:41, 18.37s/entry][A
Processing mushroom.hi-val.v2.jsonl:   6%|▌         | 3/50 [00:52<13:25, 17.14s/entry][A
Processing mushroom.hi-val.v2.jsonl:   8%|▊         | 4/50 [01:07<12:21, 16.13s/entry][A

Request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.hi-val.v2.jsonl:  10%|█         | 5/50 [01:54<20:33, 27.41s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.hi-val.v2.jsonl:  12%|█▏        | 6/50 [03:02<30:12, 41.19s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.hi-val.v2.jsonl:  14%|█▍        | 7/50 [03:44<29:34, 41.27s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.hi-val.v2.jsonl:  16%|█▌        | 8/50 [04:00<23:17, 33.27s/entry][A
Processing mushroom.hi-val.v2.jsonl:  18%|█▊        | 9/50 [04:22<20:25, 29.90s/entry][A
Processing mushroom.hi-val.v2.jsonl:  20%|██        | 10/50 [04:35<16:25, 24.63s/entry][A
Processing mushroom.hi-val.v2.jsonl:  22%|██▏       | 11/50 [04:51<14:08, 21.76s/entry][A
Processing mushroom.hi-val.v2.jsonl:  24%|██▍       | 12/50 [05:19<15:01, 23.72s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.hi-val.v2.jsonl:  26%|██▌       | 13/50 [05:34<12:58, 21.05s/entry][A
Processing mushroom.hi-val.v2.jsonl:  28%|██▊       | 14/50 [05:49<11:32, 19.24s/entry][A
Processing mushroom.hi-val.v2.jsonl:  30%|███       | 15/50 [06:12<12:00, 20.59s/entry][A
Processing mushroom.hi-val.v2.jsonl:  32%|███▏      | 16/50 [06:54<15:14, 26.89s/entry][A
Processing mushroom.hi-val.v2.jsonl:  34%|███▍      | 17/50 [07:16<13:54, 25.28s/entry][A
Processing mushroom.hi-val.v2.jsonl:  36%|███▌      | 18/50 [07:28<11:25, 21.41s/entry][A
Processing mushroom.hi-val.v2.jsonl:  38%|███▊      | 19/50 [07:41<09:41, 18.76s/entry][A
Processing mushroom.hi-val.v2.jsonl:  40%|████      | 20/50 [07:55<08:40, 17.37s/entry][A
Processing mushroom.hi-val.v2.jsonl:  42%|████▏     | 21/50 [08:13<08:32, 17.66s/entry][A

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by ProxyError('Unable to connect to proxy', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host.', None, 10054, None)))
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by ProxyError('Unable to connect to proxy', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001CCCDE6B8E0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it.')))



Processing mushroom.hi-val.v2.jsonl:  44%|████▍     | 22/50 [08:52<11:13, 24.05s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.hi-val.v2.jsonl:  46%|████▌     | 23/50 [09:06<09:30, 21.15s/entry][A
Processing mushroom.hi-val.v2.jsonl:  48%|████▊     | 24/50 [09:19<08:02, 18.55s/entry][A
Processing mushroom.hi-val.v2.jsonl:  50%|█████     | 25/50 [09:35<07:24, 17.78s/entry][A
Processing mushroom.hi-val.v2.jsonl:  52%|█████▏    | 26/50 [09:52<07:04, 17.68s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.hi-val.v2.jsonl:  54%|█████▍    | 27/50 [10:36<09:49, 25.62s/entry][A

Hallucination detection request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.hi-val.v2.jsonl:  56%|█████▌    | 28/50 [11:01<09:19, 25.44s/entry][A
Processing mushroom.hi-val.v2.jsonl:  58%|█████▊    | 29/50 [11:49<11:16, 32.20s/entry][A

Hallucination detection request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.hi-val.v2.jsonl:  60%|██████    | 30/50 [12:12<09:46, 29.31s/entry][A

Failed to decode JSON. Returning empty labels.
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by ProxyError('Unable to connect to proxy', RemoteDisconnected('Remote end closed connection without response')))



Processing mushroom.hi-val.v2.jsonl:  62%|██████▏   | 31/50 [12:29<08:06, 25.58s/entry][A
Processing mushroom.hi-val.v2.jsonl:  64%|██████▍   | 32/50 [12:47<07:02, 23.45s/entry][A
Processing mushroom.hi-val.v2.jsonl:  66%|██████▌   | 33/50 [13:10<06:34, 23.22s/entry][A

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.hi-val.v2.jsonl:  68%|██████▊   | 34/50 [13:54<07:51, 29.46s/entry][A

Request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.hi-val.v2.jsonl:  70%|███████   | 35/50 [14:32<07:58, 31.90s/entry][A
Processing mushroom.hi-val.v2.jsonl:  72%|███████▏  | 36/50 [14:50<06:28, 27.77s/entry][A
Processing mushroom.hi-val.v2.jsonl:  74%|███████▍  | 37/50 [15:27<06:37, 30.58s/entry][A
Processing mushroom.hi-val.v2.jsonl:  76%|███████▌  | 38/50 [15:42<05:12, 26.00s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.hi-val.v2.jsonl:  78%|███████▊  | 39/50 [16:25<05:42, 31.16s/entry][A
Processing mushroom.hi-val.v2.jsonl:  80%|████████  | 40/50 [16:39<04:19, 25.93s/entry][A
Processing mushroom.hi-val.v2.jsonl:  82%|████████▏ | 41/50 [16:53<03:21, 22.40s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.hi-val.v2.jsonl:  84%|████████▍ | 42/50 [17:12<02:51, 21.40s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.hi-val.v2.jsonl:  86%|████████▌ | 43/50 [17:33<02:27, 21.14s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.hi-val.v2.jsonl:  88%|████████▊ | 44/50 [17:56<02:11, 21.84s/entry][A

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.hi-val.v2.jsonl:  90%|█████████ | 45/50 [20:34<05:13, 62.69s/entry][A
Processing mushroom.hi-val.v2.jsonl:  92%|█████████▏| 46/50 [20:57<03:22, 50.56s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.hi-val.v2.jsonl:  94%|█████████▍| 47/50 [21:10<01:58, 39.40s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.hi-val.v2.jsonl:  96%|█████████▌| 48/50 [21:29<01:06, 33.28s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.hi-val.v2.jsonl:  98%|█████████▊| 49/50 [21:51<00:30, 30.05s/entry][A
Processing mushroom.hi-val.v2.jsonl: 100%|██████████| 50/50 [22:12<00:00, 27.06s/entry][A
Processing Files:  70%|███████   | 7/10 [3:29:23<1:33:10, 1863.43s/file]               [A

Processed and saved: data/val/detect_2/mushroom.hi-val.v2.jsonl



Processing mushroom.it-val.v2.jsonl:   0%|          | 0/50 [00:00<?, ?entry/s][A
Processing mushroom.it-val.v2.jsonl:   2%|▏         | 1/50 [00:22<18:17, 22.40s/entry][A
Processing mushroom.it-val.v2.jsonl:   4%|▍         | 2/50 [01:11<30:41, 38.36s/entry][A
Processing mushroom.it-val.v2.jsonl:   6%|▌         | 3/50 [01:48<29:28, 37.63s/entry][A
Processing mushroom.it-val.v2.jsonl:   8%|▊         | 4/50 [02:02<21:42, 28.32s/entry][A
Processing mushroom.it-val.v2.jsonl:  10%|█         | 5/50 [02:35<22:23, 29.85s/entry][A

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.it-val.v2.jsonl:  12%|█▏        | 6/50 [04:19<40:25, 55.12s/entry][A
Processing mushroom.it-val.v2.jsonl:  14%|█▍        | 7/50 [04:44<32:25, 45.23s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.it-val.v2.jsonl:  16%|█▌        | 8/50 [04:55<24:07, 34.46s/entry][A
Processing mushroom.it-val.v2.jsonl:  18%|█▊        | 9/50 [05:09<19:13, 28.12s/entry][A
Processing mushroom.it-val.v2.jsonl:  20%|██        | 10/50 [05:39<18:59, 28.48s/entry][A
Processing mushroom.it-val.v2.jsonl:  22%|██▏       | 11/50 [05:51<15:23, 23.67s/entry][A
Processing mushroom.it-val.v2.jsonl:  24%|██▍       | 12/50 [06:01<12:21, 19.50s/entry][A
Processing mushroom.it-val.v2.jsonl:  26%|██▌       | 13/50 [06:32<14:00, 22.72s/entry][A
Processing mushroom.it-val.v2.jsonl:  28%|██▊       | 14/50 [06:48<12:34, 20.97s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.it-val.v2.jsonl:  30%|███       | 15/50 [07:45<18:33, 31.80s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.it-val.v2.jsonl:  32%|███▏      | 16/50 [07:59<14:58, 26.44s/entry][A
Processing mushroom.it-val.v2.jsonl:  34%|███▍      | 17/50 [09:17<23:04, 41.96s/entry][A
Processing mushroom.it-val.v2.jsonl:  36%|███▌      | 18/50 [09:36<18:39, 34.98s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.it-val.v2.jsonl:  38%|███▊      | 19/50 [09:56<15:46, 30.52s/entry][A
Processing mushroom.it-val.v2.jsonl:  40%|████      | 20/50 [10:12<12:57, 25.93s/entry][A
Processing mushroom.it-val.v2.jsonl:  42%|████▏     | 21/50 [10:24<10:35, 21.90s/entry][A
Processing mushroom.it-val.v2.jsonl:  44%|████▍     | 22/50 [10:41<09:35, 20.55s/entry][A
Processing mushroom.it-val.v2.jsonl:  46%|████▌     | 23/50 [10:54<08:11, 18.19s/entry][A
Processing mushroom.it-val.v2.jsonl:  48%|████▊     | 24/50 [11:07<07:07, 16.45s/entry][A
Processing mushroom.it-val.v2.jsonl:  50%|█████     | 25/50 [11:29<07:38, 18.34s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.it-val.v2.jsonl:  52%|█████▏    | 26/50 [11:55<08:11, 20.46s/entry][A
Processing mushroom.it-val.v2.jsonl:  54%|█████▍    | 27/50 [12:06<06:48, 17.74s/entry][A
Processing mushroom.it-val.v2.jsonl:  56%|█████▌    | 28/50 [12:33<07:31, 20.53s/entry][A
Processing mushroom.it-val.v2.jsonl:  58%|█████▊    | 29/50 [13:06<08:27, 24.18s/entry][A
Processing mushroom.it-val.v2.jsonl:  60%|██████    | 30/50 [13:37<08:45, 26.28s/entry][A
Processing mushroom.it-val.v2.jsonl:  62%|██████▏   | 31/50 [14:20<09:57, 31.44s/entry][A
Processing mushroom.it-val.v2.jsonl:  64%|██████▍   | 32/50 [14:38<08:09, 27.17s/entry][A
Processing mushroom.it-val.v2.jsonl:  66%|██████▌   | 33/50 [14:53<06:42, 23.70s/entry][A
Processing mushroom.it-val.v2.jsonl:  68%|██████▊   | 34/50 [15:05<05:21, 20.11s/entry][A
Processing mushroom.it-val.v2.jsonl:  70%|███████   | 35/50 [15:23<04:51, 19.43s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.it-val.v2.jsonl:  72%|███████▏  | 36/50 [15:45<04:42, 20.16s/entry][A
Processing mushroom.it-val.v2.jsonl:  74%|███████▍  | 37/50 [16:03<04:13, 19.50s/entry][A
Processing mushroom.it-val.v2.jsonl:  76%|███████▌  | 38/50 [16:19<03:43, 18.59s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.it-val.v2.jsonl:  78%|███████▊  | 39/50 [17:30<06:16, 34.20s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.it-val.v2.jsonl:  80%|████████  | 40/50 [18:03<05:37, 33.77s/entry][A
Processing mushroom.it-val.v2.jsonl:  82%|████████▏ | 41/50 [18:45<05:26, 36.25s/entry][A
Processing mushroom.it-val.v2.jsonl:  84%|████████▍ | 42/50 [19:14<04:33, 34.17s/entry][A
Processing mushroom.it-val.v2.jsonl:  86%|████████▌ | 43/50 [19:29<03:18, 28.43s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.it-val.v2.jsonl:  88%|████████▊ | 44/50 [19:44<02:26, 24.39s/entry][A
Processing mushroom.it-val.v2.jsonl:  90%|█████████ | 45/50 [19:56<01:43, 20.64s/entry][A
Processing mushroom.it-val.v2.jsonl:  92%|█████████▏| 46/50 [20:18<01:24, 21.17s/entry][A
Processing mushroom.it-val.v2.jsonl:  94%|█████████▍| 47/50 [20:35<00:59, 19.78s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.it-val.v2.jsonl:  96%|█████████▌| 48/50 [21:25<00:58, 29.07s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.it-val.v2.jsonl:  98%|█████████▊| 49/50 [21:56<00:29, 29.59s/entry][A
Processing mushroom.it-val.v2.jsonl: 100%|██████████| 50/50 [22:44<00:00, 34.97s/entry][A
Processing Files:  80%|████████  | 8/10 [3:52:07<56:49, 1704.54s/file]                 [A

Failed to decode JSON. Returning empty labels.
Processed and saved: data/val/detect_2/mushroom.it-val.v2.jsonl



Processing mushroom.sv-val.v2.jsonl:   0%|          | 0/49 [00:00<?, ?entry/s][A
Processing mushroom.sv-val.v2.jsonl:   2%|▏         | 1/49 [00:17<14:06, 17.65s/entry][A
Processing mushroom.sv-val.v2.jsonl:   4%|▍         | 2/49 [00:45<18:18, 23.36s/entry][A
Processing mushroom.sv-val.v2.jsonl:   6%|▌         | 3/49 [01:22<22:41, 29.59s/entry][A
Processing mushroom.sv-val.v2.jsonl:   8%|▊         | 4/49 [01:48<21:17, 28.39s/entry][A
Processing mushroom.sv-val.v2.jsonl:  10%|█         | 5/49 [02:05<17:49, 24.30s/entry][A
Processing mushroom.sv-val.v2.jsonl:  12%|█▏        | 6/49 [02:17<14:29, 20.22s/entry][A
Processing mushroom.sv-val.v2.jsonl:  14%|█▍        | 7/49 [02:34<13:25, 19.18s/entry][A
Processing mushroom.sv-val.v2.jsonl:  16%|█▋        | 8/49 [02:46<11:27, 16.76s/entry][A
Processing mushroom.sv-val.v2.jsonl:  18%|█▊        | 9/49 [03:07<12:05, 18.13s/entry][A
Processing mushroom.sv-val.v2.jsonl:  20%|██        | 10/49 [03:21<10:52, 16.74s/entry][A
Processing mushr

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.sv-val.v2.jsonl:  35%|███▍      | 17/49 [06:59<20:27, 38.37s/entry][A
Processing mushroom.sv-val.v2.jsonl:  37%|███▋      | 18/49 [07:36<19:37, 37.98s/entry][A
Processing mushroom.sv-val.v2.jsonl:  39%|███▉      | 19/49 [07:56<16:18, 32.61s/entry][A
Processing mushroom.sv-val.v2.jsonl:  41%|████      | 20/49 [08:17<14:04, 29.12s/entry][A

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.sv-val.v2.jsonl:  43%|████▎     | 21/49 [09:05<16:10, 34.67s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.sv-val.v2.jsonl:  45%|████▍     | 22/49 [09:50<17:01, 37.83s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.sv-val.v2.jsonl:  47%|████▋     | 23/49 [10:08<13:44, 31.71s/entry][A

Request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by ProxyError('Unable to connect to proxy', RemoteDisconnected('Remote end closed connection without response')))



Processing mushroom.sv-val.v2.jsonl:  49%|████▉     | 24/49 [10:18<10:34, 25.38s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.sv-val.v2.jsonl:  51%|█████     | 25/49 [10:33<08:55, 22.30s/entry][A
Processing mushroom.sv-val.v2.jsonl:  53%|█████▎    | 26/49 [11:31<12:37, 32.94s/entry][A
Processing mushroom.sv-val.v2.jsonl:  55%|█████▌    | 27/49 [11:43<09:47, 26.70s/entry][A
Processing mushroom.sv-val.v2.jsonl:  57%|█████▋    | 28/49 [11:58<08:04, 23.09s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.sv-val.v2.jsonl:  59%|█████▉    | 29/49 [12:14<06:59, 21.00s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.sv-val.v2.jsonl:  61%|██████    | 30/49 [12:32<06:21, 20.07s/entry][A
Processing mushroom.sv-val.v2.jsonl:  63%|██████▎   | 31/49 [12:46<05:30, 18.38s/entry][A
Processing mushroom.sv-val.v2.jsonl:  65%|██████▌   | 32/49 [12:58<04:35, 16.22s/entry][A

extract_triplets_to_claims Error: 529



Processing mushroom.sv-val.v2.jsonl:  67%|██████▋   | 33/49 [13:10<03:58, 14.93s/entry][A
Processing mushroom.sv-val.v2.jsonl:  69%|██████▉   | 34/49 [13:54<05:54, 23.66s/entry][A
Processing mushroom.sv-val.v2.jsonl:  71%|███████▏  | 35/49 [14:09<04:54, 21.05s/entry][A
Processing mushroom.sv-val.v2.jsonl:  73%|███████▎  | 36/49 [14:21<04:01, 18.57s/entry][A
Processing mushroom.sv-val.v2.jsonl:  76%|███████▌  | 37/49 [14:35<03:25, 17.16s/entry][A
Processing mushroom.sv-val.v2.jsonl:  78%|███████▊  | 38/49 [15:03<03:43, 20.34s/entry][A
Processing mushroom.sv-val.v2.jsonl:  80%|███████▉  | 39/49 [15:35<03:58, 23.87s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.sv-val.v2.jsonl:  82%|████████▏ | 40/49 [15:50<03:09, 21.09s/entry][A
Processing mushroom.sv-val.v2.jsonl:  84%|████████▎ | 41/49 [16:06<02:36, 19.56s/entry][A
Processing mushroom.sv-val.v2.jsonl:  86%|████████▌ | 42/49 [16:34<02:35, 22.23s/entry][A
Processing mushroom.sv-val.v2.jsonl:  88%|████████▊ | 43/49 [17:21<02:57, 29.58s/entry][A
Processing mushroom.sv-val.v2.jsonl:  90%|████████▉ | 44/49 [17:35<02:05, 25.07s/entry][A
Processing mushroom.sv-val.v2.jsonl:  92%|█████████▏| 45/49 [17:58<01:36, 24.18s/entry][A
Processing mushroom.sv-val.v2.jsonl:  94%|█████████▍| 46/49 [18:16<01:07, 22.59s/entry][A
Processing mushroom.sv-val.v2.jsonl:  96%|█████████▌| 47/49 [18:47<00:50, 25.02s/entry][A
Processing mushroom.sv-val.v2.jsonl:  98%|█████████▊| 48/49 [19:39<00:33, 33.01s/entry][A
Processing mushroom.sv-val.v2.jsonl: 100%|██████████| 49/49 [19:52<00:00, 27.13s/entry][A
Processing Files:  90%|█████████ | 9/10 [4:12:00<25:44, 1544.52s/file]                 [

Processed and saved: data/val/detect_2/mushroom.sv-val.v2.jsonl



Processing mushroom.zh-val.v2.jsonl:   0%|          | 0/50 [00:00<?, ?entry/s][A
Processing mushroom.zh-val.v2.jsonl:   2%|▏         | 1/50 [00:22<18:22, 22.50s/entry][A
Processing mushroom.zh-val.v2.jsonl:   4%|▍         | 2/50 [02:40<1:12:25, 90.53s/entry][A

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by ProxyError('Unable to connect to proxy', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host.', None, 10054, None)))
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by ProxyError('Unable to connect to proxy', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001CCCDE63AC0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it.')))
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by ProxyError('Unable to connect to proxy', RemoteDisconnected('Remote end closed connection without response')))
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by ProxyErro


Processing mushroom.zh-val.v2.jsonl:   6%|▌         | 3/50 [06:41<2:04:34, 159.04s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:   8%|▊         | 4/50 [12:06<2:52:11, 224.60s/entry][A

Failed to decode JSON. Returning empty labels.
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.zh-val.v2.jsonl:  10%|█         | 5/50 [15:03<2:35:43, 207.63s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:  12%|█▏        | 6/50 [17:35<2:18:11, 188.43s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:  14%|█▍        | 7/50 [18:10<1:39:11, 138.40s/entry][A

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1149)')))



Processing mushroom.zh-val.v2.jsonl:  16%|█▌        | 8/50 [19:40<1:26:12, 123.16s/entry][A

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1149)')))



Processing mushroom.zh-val.v2.jsonl:  18%|█▊        | 9/50 [21:25<1:20:06, 117.23s/entry][A
Processing mushroom.zh-val.v2.jsonl:  20%|██        | 10/50 [23:10<1:15:48, 113.72s/entry][A
Processing mushroom.zh-val.v2.jsonl:  22%|██▏       | 11/50 [25:22<1:17:24, 119.08s/entry][A

Failed to decode JSON. Returning empty labels.
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by ProxyError('Unable to connect to proxy', RemoteDisconnected('Remote end closed connection without response')))
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.zh-val.v2.jsonl:  24%|██▍       | 12/50 [29:01<1:34:45, 149.62s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:  26%|██▌       | 13/50 [29:20<1:07:51, 110.05s/entry][A

Request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.zh-val.v2.jsonl:  28%|██▊       | 14/50 [29:54<52:17, 87.14s/entry]   [A

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out.



Processing mushroom.zh-val.v2.jsonl:  30%|███       | 15/50 [34:12<1:20:43, 138.39s/entry][A

Hallucination detection request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.zh-val.v2.jsonl:  32%|███▏      | 16/50 [34:51<1:01:29, 108.52s/entry][A

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by ProxyError('Unable to connect to proxy', RemoteDisconnected('Remote end closed connection without response')))
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.zh-val.v2.jsonl:  34%|███▍      | 17/50 [39:05<1:23:52, 152.50s/entry][A

Failed to decode JSON. Returning empty labels.
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by ProxyError('Unable to connect to proxy', RemoteDisconnected('Remote end closed connection without response')))



Processing mushroom.zh-val.v2.jsonl:  36%|███▌      | 18/50 [43:57<1:43:34, 194.21s/entry][A

Failed to decode JSON. Returning empty labels.
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1149)')))
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by SSLError(SSLZeroReturnError(6, 'TLS/SSL connection has been closed (EOF) (_ssl.c:1149)')))



Processing mushroom.zh-val.v2.jsonl:  38%|███▊      | 19/50 [48:39<1:54:02, 220.74s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:  40%|████      | 20/50 [51:07<1:39:24, 198.80s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:  42%|████▏     | 21/50 [51:22<1:09:27, 143.71s/entry][A

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.zh-val.v2.jsonl:  44%|████▍     | 22/50 [52:37<57:27, 123.14s/entry]  [A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:  46%|████▌     | 23/50 [53:02<42:05, 93.53s/entry] [A
Processing mushroom.zh-val.v2.jsonl:  48%|████▊     | 24/50 [55:11<45:08, 104.16s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:  50%|█████     | 25/50 [56:52<43:01, 103.26s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:  52%|█████▏    | 26/50 [57:14<31:31, 78.79s/entry] [A
Processing mushroom.zh-val.v2.jsonl:  54%|█████▍    | 27/50 [58:17<28:28, 74.27s/entry][A

Failed to decode JSON. Returning empty labels.
Request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.zh-val.v2.jsonl:  56%|█████▌    | 28/50 [58:57<23:24, 63.83s/entry][A
Processing mushroom.zh-val.v2.jsonl:  58%|█████▊    | 29/50 [1:01:13<29:57, 85.59s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:  60%|██████    | 30/50 [1:01:27<21:20, 64.01s/entry][A
Processing mushroom.zh-val.v2.jsonl:  62%|██████▏   | 31/50 [1:02:14<18:40, 58.98s/entry][A

Failed to decode JSON. Returning empty labels.
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by ProxyError('Unable to connect to proxy', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host.', None, 10054, None)))
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Max retries exceeded with url: /v1/messages (Caused by ProxyError('Unable to connect to proxy', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001CCCDE28AC0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it.')))



Processing mushroom.zh-val.v2.jsonl:  64%|██████▍   | 32/50 [1:04:56<26:59, 89.97s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:  66%|██████▌   | 33/50 [1:05:16<19:28, 68.75s/entry][A
Processing mushroom.zh-val.v2.jsonl:  68%|██████▊   | 34/50 [1:07:02<21:21, 80.07s/entry][A
Processing mushroom.zh-val.v2.jsonl:  70%|███████   | 35/50 [1:08:49<22:02, 88.18s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:  72%|███████▏  | 36/50 [1:09:05<15:28, 66.33s/entry][A

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.zh-val.v2.jsonl:  74%|███████▍  | 37/50 [1:14:59<33:07, 152.87s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:  76%|███████▌  | 38/50 [1:15:54<24:38, 123.25s/entry][A
Processing mushroom.zh-val.v2.jsonl:  78%|███████▊  | 39/50 [1:18:58<25:57, 141.63s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:  80%|████████  | 40/50 [1:19:12<17:13, 103.33s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:  82%|████████▏ | 41/50 [1:19:25<11:26, 76.27s/entry] [A
Processing mushroom.zh-val.v2.jsonl:  84%|████████▍ | 42/50 [1:24:23<19:01, 142.67s/entry][A

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)
Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out.



Processing mushroom.zh-val.v2.jsonl:  86%|████████▌ | 43/50 [1:26:43<16:34, 142.08s/entry][A

Verification request failed: HTTPSConnectionPool(host='api.anthropic.com', port=443): Read timed out. (read timeout=30)



Processing mushroom.zh-val.v2.jsonl:  88%|████████▊ | 44/50 [1:29:29<14:54, 149.09s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:  90%|█████████ | 45/50 [1:34:38<16:24, 196.99s/entry][A

Failed to decode JSON. Returning empty labels.
Error verifying claim: 529
Error verifying claim: 529
Error verifying claim: 529



Processing mushroom.zh-val.v2.jsonl:  92%|█████████▏| 46/50 [1:36:16<11:10, 167.50s/entry][A

Error during hallucination detection: 529



Processing mushroom.zh-val.v2.jsonl:  94%|█████████▍| 47/50 [1:37:44<07:10, 143.51s/entry][A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl:  96%|█████████▌| 48/50 [1:39:44<04:32, 136.37s/entry][A
Processing mushroom.zh-val.v2.jsonl:  98%|█████████▊| 49/50 [1:39:57<01:39, 99.55s/entry] [A

Failed to decode JSON. Returning empty labels.



Processing mushroom.zh-val.v2.jsonl: 100%|██████████| 50/50 [1:40:55<00:00, 87.13s/entry][A
Processing Files: 100%|██████████| 10/10 [5:52:56<00:00, 2117.65s/file]                  [A

Processed and saved: data/val/detect_2/mushroom.zh-val.v2.jsonl
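
Many requests in the log above fail with read timeouts, proxy errors, or status 529 (which Anthropic documents as an overloaded error). Each such failure appears to leave an entry without labels. One possible mitigation is to retry transient failures with exponential backoff. The following is only a minimal sketch, not the code that produced this log: `call_with_retry`, `RetryableError`, `should_retry`, and the zero-argument `send` callable are hypothetical names introduced for illustration, and the set of retryable status codes is an assumption.

```python
import random
import time

# Assumption: status codes worth retrying; 529 is the one seen in the log above.
RETRYABLE_STATUS = {429, 500, 502, 503, 529}


class RetryableError(Exception):
    """Raised by the caller's request function when a retry is worthwhile."""


def should_retry(status_code):
    """Return True if the HTTP status code is treated as transient."""
    return status_code in RETRYABLE_STATUS


def call_with_retry(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                    sleep=time.sleep):
    """Call send() until it succeeds or max_attempts is exhausted.

    send is a hypothetical zero-argument callable that performs one request
    and raises RetryableError on timeouts, proxy errors, or transient statuses.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except RetryableError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay,
            # plus up to 10% jitter so parallel workers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
```

In such a setup, the request wrapper would catch the timeout and connection exceptions of the HTTP client, check `should_retry` on the response status, and raise `RetryableError` so that only transient failures are retried.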


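The message "Failed to decode JSON. Returning empty labels." recurs throughout the log. A common cause is that the model wraps its JSON in explanatory text, although the log does not confirm that this is what happened here. Under that assumption, a more tolerant parser could recover some of these entries. The sketch below uses a hypothetical helper, `parse_json_lenient`, and the field names in the usage example are illustrative only.

```python
import json
import re


def parse_json_lenient(text):
    """Return the first JSON object or array that parses from text, else None."""
    decoder = json.JSONDecoder()
    # Try every '{' or '[' as a possible start of a JSON value.
    for match in re.finditer(r"[\[{]", text):
        try:
            # raw_decode parses one value starting at the given index and
            # ignores whatever follows it.
            value, _end = decoder.raw_decode(text, match.start())
            return value
        except json.JSONDecodeError:
            continue
    return None


# Hypothetical response with prose around the JSON payload.
reply = 'Sure, here are the labels: [{"word": "Paris", "prob": 0.9}] Hope this helps.'
labels = parse_json_lenient(reply)
```

Returning `None` rather than an empty list lets the caller distinguish "no hallucination found" from "response could not be parsed", which matters when deciding whether to retry the request.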



## Evaluation
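
For orientation, the two metrics reported in this section can be sketched in plain Python. This is an illustration under assumptions, not the code in `scorer.py`: it assumes `hard_labels` are `[start, end)` character spans and `soft_labels` are dictionaries with `start`, `end`, and `prob` keys, and the function names are hypothetical. The real scorer may handle edge cases differently, for example when a probability vector is constant.

```python
def char_indices(hard_labels):
    """Expand [start, end) spans into a set of character indices."""
    return {i for start, end in hard_labels for i in range(start, end)}


def iou(ref_hard, pred_hard):
    """Intersection over union of hallucinated character indices."""
    ref, pred = char_indices(ref_hard), char_indices(pred_hard)
    if not ref and not pred:
        return 1.0  # assumption: two empty label sets count as full agreement
    return len(ref & pred) / len(ref | pred)


def soft_vector(soft_labels, text_len):
    """Turn soft-label spans into one probability per character."""
    vec = [0.0] * text_len
    for span in soft_labels:
        for i in range(span["start"], span["end"]):
            vec[i] = span["prob"]
    return vec


def average_ranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None  # undefined when either vector is constant
    return cov / (vx * vy) ** 0.5
```

An entry whose labels were returned empty after a failed request contributes an IoU of 0 whenever the reference contains any hallucinated span, so request failures pull the mean IoU down directly.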

In [None]:
import pandas as pd
import json
import os
from scorer import load_jsonl_file_to_records, score_iou, score_cor, main, recompute_hard_labels
import argparse as ap
import ast

In [None]:
def evaluate_iou_and_cor(val_dir, detect_dir, output_file=None):
    """
    Evaluate IoU and Spearman correlation between the reference (val) and detected (detect) files.

    :param val_dir: Directory containing the ground truth files (e.g., data/val/val/)
    :param detect_dir: Directory containing the detected files (e.g., data/val/detect_2/)
    :param output_file: Optional path; if given, results are appended to this file
    """
    # List the validation files in a stable (alphabetical) order
    val_files = sorted(os.listdir(val_dir))

    # Ensure that we are comparing the same files (same lang)
    for val_file in val_files:
        # Skip non-JSONL files
        if not val_file.endswith('.jsonl'):
            continue

        # Check if the corresponding detect file exists
        detect_file_path = os.path.join(detect_dir, val_file)

        if not os.path.exists(detect_file_path):
            print(f"Warning: {detect_file_path} not found, skipping.")
            continue

        # Load ground truth (val) and detected (detect) data
        ref_dicts = load_jsonl_file_to_records(os.path.join(val_dir, val_file))
        pred_dicts = load_jsonl_file_to_records(detect_file_path)

        # Calculate IoU and Spearman correlation
        try:
            ious, cors = main(ref_dicts, pred_dicts)
        except IndexError as e:
            print(f"IndexError occurred for file: {val_file}, skipping this file. Error: {e}")
            continue

        # Print or save the results
        print(f"Results for {val_file}:")
        print(f"  Mean IoU: {ious.mean():.8f}")
        print(f"  Mean Spearman Correlation: {cors.mean():.8f}")

        # Optionally, save the results to a file
        if output_file:
            with open(output_file, 'a', encoding='utf-8') as f:
                f.write(f"Results for {val_file}:\n")
                f.write(f"  Mean IoU: {ious.mean():.8f}\n")
                f.write(f"  Mean Spearman Correlation: {cors.mean():.8f}\n\n")

val_dir = 'data/val/val/'
detect_dir = 'data/val/detect_2/'
output_file = 'evaluation_results3.txt'
evaluate_iou_and_cor(val_dir, detect_dir, output_file)

Results for mushroom.ar-val.v2.jsonl:
  Mean IoU: 0.38188029
  Mean Spearman Correlation: 0.32331063
Results for mushroom.de-val.v2.jsonl:
  Mean IoU: 0.41459801
  Mean Spearman Correlation: 0.39725764
Results for mushroom.en-val.v2.jsonl:
  Mean IoU: 0.36303494
  Mean Spearman Correlation: 0.40384660
Results for mushroom.es-val.v2.jsonl:
  Mean IoU: 0.36105925
  Mean Spearman Correlation: 0.36397433
Results for mushroom.fi-val.v2.jsonl:
  Mean IoU: 0.31845450
  Mean Spearman Correlation: 0.33442559
Results for mushroom.fr-val.v2.jsonl:
  Mean IoU: 0.30727328
  Mean Spearman Correlation: 0.35717161
Results for mushroom.hi-val.v2.jsonl:
  Mean IoU: 0.39672349
  Mean Spearman Correlation: 0.42207792
Results for mushroom.it-val.v2.jsonl:
  Mean IoU: 0.29420150
  Mean Spearman Correlation: 0.35644268
Results for mushroom.sv-val.v2.jsonl:
  Mean IoU: 0.51249971
  Mean Spearman Correlation: 0.42565442
Results for mushroom.zh-val.v2.jsonl:
  Mean IoU: 0.23721623
  Mean Spearman Correlation: 0