# 🔧 AI Telco Troubleshooting Challenge - Track 3

**Question**: Can a fine-tuned LLM detect and explain unseen network failures? For Track 3 specifically: can we build a specialised edge-cloud LLM to troubleshoot network faults?

**Target**: Improve the accuracy of Qwen2.5-1.5B-Instruct on telco troubleshooting questions from the TeleLogs data.

**Objective**: Fine-tune a lightweight LLM (Qwen2.5-1.5B-Instruct) to diagnose 5G network throughput issues from drive-test data.

### Installation

In [None]:
%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    import torch; v = re.match(r"[0-9]{1,}\.[0-9]{1,}", str(torch.__version__)).group(0)
    xformers = "xformers==" + ("0.0.33.post1" if v=="2.9" else "0.0.32.post2" if v=="2.8" else "0.0.29.post3")
    !pip install --no-deps bitsandbytes accelerate {xformers} peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth
!pip install transformers==4.56.2
!pip install --no-deps trl==0.22.2

### Unsloth

Initialize an Unsloth model here.

We will use **unsloth/Qwen2.5-1.5B-Instruct-unsloth-bnb-4bit** because Unsloth's [Dynamic 4-bit Quants](https://unsloth.ai/blog/dynamic-4bit) are selectively quantized, which Unsloth reports greatly improves accuracy over standard 4-bit quantization.

In [None]:
from unsloth import FastLanguageModel
import torch

# Based on Fig. 5 in the research paper (question + answer + reasoning trace)
max_seq_length = 16384 # Any length works; Unsloth handles RoPE scaling internally

dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen2.5-1.5B-Instruct-unsloth-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2026.1.4: Fast Qwen2 patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2026.1.4 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
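
The trainable-parameter count that Unsloth reports in the training banner (18,464,768) can be reproduced by hand. The sketch below assumes the layer shapes from the public Qwen2.5-1.5B configuration (hidden size 1536, MLP size 8960, 28 layers, 2 key/value heads of dimension 128). Those shapes are not stated in this notebook, so treat them as an assumption. Each LoRA adapter on a `d_in x d_out` projection adds `r * (d_in + d_out)` weights.

```python
# Assumed shapes from the public Qwen2.5-1.5B config (not shown in this notebook).
HIDDEN = 1536       # hidden_size
KV_DIM = 2 * 128    # num_key_value_heads * head_dim
MLP = 8960          # intermediate_size
LAYERS = 28         # num_hidden_layers
R = 16              # LoRA rank passed to get_peft_model

# (input dim, output dim) of every projection listed in target_modules
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) next to a frozen weight."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in PROJECTIONS.values())
total = per_layer * LAYERS

print(f"{per_layer:,} LoRA parameters per layer")
print(f"{total:,} LoRA parameters in total")
# Share of the 1,562,179,072 parameters reported by Unsloth
print(f"{total / 1_562_179_072:.2%} of all parameters")
```

Doubling `r` doubles the adapter size, so the rank is the main knob for trading memory against adapter capacity.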


<a name="Data"></a>
### Data Prep
We now use the TeleLogs dataset from [netop](https://huggingface.co/datasets/netop/TeleLogs), loaded here from a CSV copy on Google Drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import pandas as pd
from datasets import Dataset

data_path = "/content/drive/MyDrive/The AI Telco Troubleshooting Challenge/data/sft_data.csv"

# Load CSV
df = pd.read_csv(data_path)

# Convert to Hugging Face Dataset
dataset = Dataset.from_pandas(df)

print("\nHugging Face Dataset features:")
print(dataset.features)


Hugging Face Dataset features:
{'ID': Value('string'), 'question': Value('string'), 'answer': Value('string'), 'reasoning_trace': Value('string')}


In [None]:
# Prompt template - stop at Reasoning so model learns to generate it
base_prompt = r"""Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Reasoning:
"""

# Target output format (what the model should generate)
target_format = r"""{}

### Response:
\boxed{{{}}}"""

EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    questions = examples["question"]
    traces    = examples["reasoning_trace"]
    answers   = examples["answer"]

    texts = []
    for question, trace, answer in zip(questions, traces, answers):
        # Split question into instruction and input
        given_pos = question.find("Given:")
        if given_pos != -1:
            instruction = question[:given_pos].strip()
            input_text  = question[given_pos:].strip()
        else:
            instruction = question.strip()
            input_text  = ""

        # Prompt (what model sees)
        prompt = base_prompt.format(
            instruction,
            input_text
        )

        # Target (what model should generate)
        target = target_format.format(
            trace,
            answer
        )

        # Full training text
        text = prompt + target + EOS_TOKEN

        texts.append(text)

    return {"text": texts}


# Apply formatting
dataset = dataset.map(formatting_prompts_func, batched=True)

Map:   0%|          | 0/3342 [00:00<?, ? examples/s]
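
To make the training format concrete, here is a self-contained sketch that applies the same split-and-format logic to a single made-up question. The question text, the reasoning trace, and the `<EOS>` placeholder are illustrative assumptions; in the notebook the real end-of-sequence token comes from `tokenizer.eos_token`.

```python
BASE_PROMPT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context.\n"
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{}\n\n### Input:\n{}\n\n### Reasoning:\n"
)
TARGET_FORMAT = "{}\n\n### Response:\n\\boxed{{{}}}"
EOS = "<EOS>"  # placeholder for tokenizer.eos_token


def split_question(question: str) -> tuple:
    """Split at the first 'Given:' marker into (instruction, input)."""
    pos = question.find("Given:")
    if pos == -1:
        return question.strip(), ""
    return question[:pos].strip(), question[pos:].strip()


def build_training_text(question: str, trace: str, answer: str) -> str:
    """Prompt (what the model sees) + target (what it must generate) + EOS."""
    instruction, input_text = split_question(question)
    prompt = BASE_PROMPT.format(instruction, input_text)
    target = TARGET_FORMAT.format(trace, answer)
    return prompt + target + EOS


# Toy example, not taken from the dataset
toy_question = (
    "Identify the root cause of the throughput drop.\n\n"
    "Given:\n- Serving cell RSRP is -95 dBm."
)
text = build_training_text(toy_question, "Coverage distance exceeds 1 km.", "C2")
print(text)
```

Because the prompt stops at `### Reasoning:`, the model learns to write the reasoning trace first and only then the boxed answer.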

In [None]:
# Preview one sample
dataset[0]

{'ID': 'ID_A274LU3WJ6',
 'question': "Analyze the 5G wireless network drive-test user plane data and engineering parameters.\nIdentify the reason for the throughput dropping below 600Mbps in certain road sections.\nFrom the following 8 potential root causes, select the most likely one and enclose its number in \\boxed{{}} in the final answer.\n\nC1: The serving cell's downtilt angle is too large, causing weak coverage at the far end.\nC2: The serving cell's coverage distance exceeds 1km, resulting in over-shooting.\nC3: A neighboring cell provides higher throughput.\nC4: Non-colocated co-frequency neighboring cells cause severe overlapping coverage.\nC5: Frequent handovers degrade performance.\nC6: Neighbor cell and serving cell have the same PCI mod 30, leading to interference.\nC7: Test vehicle speed exceeds 40km/h, impacting user throughput.\nC8: Average scheduled RBs are below 160, affecting throughput.\n\nGiven:\n- The default electronic downtilt value is 255, representing a downt

<a name="Train"></a>
### Train the model
Now let's train our model. We train for one full epoch (`num_train_epochs = 1`), which works out to 209 optimizer steps on this dataset. For a quick smoke test, set `max_steps = 60` instead.

We choose 1 epoch for two main reasons:
 - Paired with a low learning rate, it reduces the risk of catastrophic forgetting.
 - It gives the model a solid foundation by aligning its outputs with high-quality labeled examples, narrowing the output distribution to reasonable responses.

In [None]:
from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    packing = True,
    args = SFTConfig(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 4,
        warmup_steps = 0,
        num_train_epochs = 1,
        learning_rate = 1.5e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use TrackIO/WandB etc
    ),
)

Unsloth: Tokenizing ["text"] (num_proc=1):   0%|          | 0/3342 [00:00<?, ? examples/s]

🦥 Unsloth: Padding-free auto-enabled, enabling faster training.


In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.741 GB.
8.928 GB of memory reserved.


In [None]:
# Let's train
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 3,342 | Num Epochs = 1 | Total steps = 209
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 4 x 1) = 16
 "-____-"     Trainable parameters = 18,464,768 of 1,562,179,072 (1.18% trained)


Step,Training Loss
1,1.0158
2,1.0068
3,1.0235
4,1.014
5,1.0135
6,0.9891
7,0.9854
8,0.9534
9,0.9162
10,0.9239
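
The step count in the training banner follows from the batch settings: 4 sequences per device times 4 accumulation steps gives an effective batch of 16, and one epoch over 3,342 examples needs ceil(3342 / 16) = 209 optimizer steps. The sketch below reproduces that arithmetic and shows how a linear schedule with no warmup decays the learning rate from 1.5e-4 towards zero. The `linear_lr` helper is a hypothetical illustration of the schedule shape, not the Transformers implementation.

```python
import math

NUM_EXAMPLES = 3342      # "Num examples" in the training banner
PER_DEVICE_BATCH = 4     # per_device_train_batch_size
GRAD_ACCUM = 4           # gradient_accumulation_steps
NUM_GPUS = 1
BASE_LR = 1.5e-4         # learning_rate

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
total_steps = math.ceil(NUM_EXAMPLES / effective_batch)


def linear_lr(step: int, total: int, base_lr: float = BASE_LR) -> float:
    """Learning rate after `step` optimizer updates (linear decay, no warmup)."""
    return base_lr * max(0.0, (total - step) / total)


print(f"effective batch = {effective_batch}, total steps = {total_steps}")
for step in (0, total_steps // 2, total_steps):
    print(f"step {step:3d}: lr = {linear_lr(step, total_steps):.2e}")
```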


In [None]:
trainer_stats

In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

8528.1838 seconds used for training.
142.14 minutes used for training.
Peak reserved memory = 12.674 GB.
Peak reserved memory for training = 3.746 GB.
Peak reserved memory % of max memory = 85.978 %.
Peak reserved memory for training % of max memory = 25.412 %.


<a name="Evaluation"></a>
### Evaluation
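Let's run the model! The cell below samples a few questions from `train.csv`, generates with the same prompt format used in training, and extracts the predicted code. If `sft_data.csv` overlaps with `train.csv`, treat the score as a sanity check rather than a held-out estimate.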



In [None]:
# ────────────────────────────────────────────────
#  Quick Evaluation
# ────────────────────────────────────────────────

from unsloth import FastLanguageModel
import torch
import random
import pandas as pd
import re
from typing import Tuple

# Enable fast inference
model = FastLanguageModel.for_inference(model)

# Load dataset (for evaluation)
df = pd.read_csv("/content/drive/MyDrive/The AI Telco Troubleshooting Challenge/data/train.csv")
print(f"Loaded {len(df)} examples from train.csv")

# ─── The exact prompt template used during training ────────────────────────
base_prompt = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Reasoning:
"""

EOS_TOKEN = tokenizer.eos_token

# ────────────────────────────────────────────────
#  Robust answer extraction
# ────────────────────────────────────────────────
def extract_answer(generated: str, verbose: bool = False) -> Tuple[str, str]:
    """
    Enhanced extraction to handle varied formats: C1-C8, M1-M5, P1-P8, 1-8, A-I, etc.

    Returns: Tuple[str, str]: (predicted_answer, extraction_method_used)
    """
    text = generated.strip()

    if verbose:
        print(f"\n[DEBUG] Full generated text length: {len(text)} chars")

    # ════════════════════════════════════════════════════════════
    # Priority 1: \boxed{...} - Most strict, most reliable
    # ════════════════════════════════════════════════════════════
    boxed_pattern = r'\\boxed\s*\{([^}]+?)\}'
    boxed_matches = list(re.finditer(boxed_pattern, text, re.IGNORECASE))

    if boxed_matches:
        # Check all boxed instances, prefer the last valid one
        for match in reversed(boxed_matches):
            content = match.group(1).strip()
            if verbose:
                print(f"[DEBUG] Found \\boxed{{{content}}}")

            # Flexible match: [A-Z]?[1-9] or [A-I] (upper case)
            code_match = re.search(r'\b([A-Z]?\d|[A-I])\b', content, re.IGNORECASE)
            if code_match:
                return code_match.group(1).upper(), "boxed"

            # Prefix like "Option B" or "2:"
            code_prefix = re.match(r'^\s*([A-Z]?\d|[A-I])\b', content, re.IGNORECASE)
            if code_prefix:
                return code_prefix.group(1).upper(), "boxed_prefix"

        if verbose:
            print(f"[DEBUG] Found \\boxed{{}} but no valid code inside")

    # ════════════════════════════════════════════════════════════
    # Priority 2: ### Response: followed by code
    # ════════════════════════════════════════════════════════════
    response_pattern = r'###\s*Response\s*:\s*([^\n]+)'
    response_matches = re.findall(response_pattern, text, re.IGNORECASE)

    if response_matches:
        for block in reversed(response_matches):
            if verbose:
                print(f"[DEBUG] Found ### Response: {block.strip()}")

            code_match = re.search(r'\b([A-Z]?\d|[A-I])\b', block, re.IGNORECASE)
            if code_match:
                return code_match.group(1).upper(), "response_section"

    # ════════════════════════════════════════════════════════════
    # Priority 3: <answer>...</answer> tags
    # ════════════════════════════════════════════════════════════
    answer_pattern = r'<answer>\s*([^<]+?)\s*</answer>'
    answer_blocks = re.findall(answer_pattern, text, re.IGNORECASE | re.DOTALL)

    if answer_blocks:
        for block in reversed(answer_blocks):
            if verbose:
                print(f"[DEBUG] Found <answer>{block.strip()}</answer>")

            code_match = re.search(r'\b([A-Z]?\d|[A-I])\b', block, re.IGNORECASE)
            if code_match:
                return code_match.group(1).upper(), "answer_tag"

    # ════════════════════════════════════════════════════════════
    # Priority 4: Final answer keywords followed by code
    # ════════════════════════════════════════════════════════════
    final_keywords = [
        r'final\s+answer\s*:?\s*([A-Z]?\d|[A-I])\b',
        r'answer\s*:?\s*([A-Z]?\d|[A-I])\b',
        r'therefore\s*,?\s*(?:the\s+)?(?:answer\s+is\s*)?([A-Z]?\d|[A-I])\b',
        r'conclusion\s*:?\s*([A-Z]?\d|[A-I])\b',
        r'select\s*:?\s*([A-Z]?\d|[A-I])\b',
        r'choose\s*:?\s*([A-Z]?\d|[A-I])\b',
    ]

    for pattern in final_keywords:
        matches = list(re.finditer(pattern, text, re.IGNORECASE))
        if matches:
            last = matches[-1].group(1).upper()
            if verbose:
                print(f"[DEBUG] Keyword match: {last}")
            return last, "keyword"

    # ════════════════════════════════════════════════════════════
    # Priority 5: Last sentence containing "<code> is" or "<code> because"
    # ════════════════════════════════════════════════════════════
    sentences = re.split(r'[.!?]\s+', text)
    for sentence in reversed(sentences):
        if re.search(r'\b([A-Z]?\d|[A-I])\s+(?:is|because|explains|accounts)\b', sentence, re.IGNORECASE):
            code_match = re.search(r'\b([A-Z]?\d|[A-I])\b', sentence, re.IGNORECASE)
            if code_match:
                return code_match.group(1).upper(), "contextual_sentence"

    # ════════════════════════════════════════════════════════════
    # Priority 6: Last standalone code in text (weakest fallback)
    # ════════════════════════════════════════════════════════════
    code_matches = re.findall(r'\b([A-Z]?\d|[A-I])\b', text, re.IGNORECASE)

    if code_matches:
        last_code = code_matches[-1].upper()
        unique_codes = set(c.upper() for c in code_matches)

        if len(unique_codes) > 1 and verbose:
            print(f"[DEBUG] Multiple codes: {unique_codes}, using last: {last_code}")

        return last_code, "last_occurrence"

    # ════════════════════════════════════════════════════════════
    # No valid code found
    # ════════════════════════════════════════════════════════════
    if verbose:
        print("[WARNING] No valid code (e.g., C4, M2, 3, B) found in output")

    return "NOT_FOUND", "none"

# ────────────────────────────────────────────────
#  Generate using the exact same prompt format as training
# ────────────────────────────────────────────────

def predict_one(question: str, verbose: bool = False) -> Tuple[str, str, str]:
    # Same split logic as in formatting_prompts_func
    given_pos = question.find("Given:")
    if given_pos != -1:
        instruction = question[:given_pos].strip()
        input_text  = question[given_pos:].strip()
    else:
        instruction = question.strip()
        input_text = ""

    # Prepare prompt; the model should generate from "### Reasoning:"
    prompt = base_prompt.format(instruction, input_text)

    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            temperature=0.7,
            top_p=0.95,
            do_sample=True,
            use_cache=True,
            pad_token_id=tokenizer.eos_token_id,
            repetition_penalty=1.1,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode **only** newly generated tokens
    generated_ids = outputs[0][inputs.input_ids.shape[1]:]
    generated = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()

    predicted, method = extract_answer(generated, verbose=verbose)

    return generated, predicted, method

# ────────────────────────────────────────────────
#  Evaluation loop
# ────────────────────────────────────────────────

def evaluate_samples(n_samples: int = 8, seed: int = 42, verbose: bool = False):
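    random.seed(seed)
    # DataFrame.sample ignores random.seed(); pass the seed explicitly for reproducible sampling
    samples = df.sample(n_samples, random_state=seed)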

    correct = 0
    wrong = 0
    not_found = 0
    method_counts = {}

    print(f"\n{'═'*90}")
    print(f"Evaluating on {n_samples} random examples (seed={seed})")
    print(f"{'═'*90}\n")

    for idx, row in enumerate(samples.itertuples(), 1):
        q_id = row.ID
        question = row.question
        gt = row.answer.strip().upper()

        print(f"Example {idx}/{n_samples}  ID: {q_id}  GT: \033[92m{gt}\033[0m")

        generated, predicted, method = predict_one(question, verbose=verbose)

        method_counts[method] = method_counts.get(method, 0) + 1

        if predicted == "NOT_FOUND":
            not_found += 1
            status = "❌ NOT FOUND"
            color = "\033[93m"
        elif predicted == gt:
            correct += 1
            status = "✓ CORRECT"
            color = "\033[92m"
        else:
            wrong += 1
            status = "✗ WRONG"
            color = "\033[91m"

        print(f"Pred: {color}{predicted}\033[0m  [{method}] {status}")
        print("-"*80)

        # Show the tail of the generation for inspection
        preview_len = 600
        preview = generated[-preview_len:] if len(generated) > preview_len else generated
        print(f"Generation (tail):\n{preview}")
        print("-"*90 + "\n")

        if idx % 4 == 0:
            torch.cuda.empty_cache()

    # Summary
    total = n_samples
    print(f"\n{'═'*90}")
    print("SUMMARY")
    print(f"{'═'*90}")
    print(f"Correct:    {correct:2d} / {total}  ({correct/total:.1%})")
    print(f"Wrong:      {wrong:2d} / {total}  ({wrong/total:.1%})")
    print(f"Not found:  {not_found:2d} / {total}  ({not_found/total:.1%})")
    print("\nExtraction methods:")
    for m, cnt in sorted(method_counts.items(), key=lambda x: -x[1]):
        print(f"  {m:18} : {cnt:2d} ({cnt/total:.1%})")
    print(f"{'═'*90}")

# ── Run evaluation ────────────────────────────────────────────────

evaluate_samples(n_samples=10, seed=42, verbose=False)

Loaded 2400 examples from train.csv

══════════════════════════════════════════════════════════════════════════════════════════
Evaluating on 10 random examples (seed=42) - new prompt format
══════════════════════════════════════════════════════════════════════════════════════════

Example 1/10  ID: ID_XM3UNZKQ3B  GT: C2
Pred: 3  [last_occurrence] ✗ WRONG
--------------------------------------------------------------------------------
Generation tail:
es, so it should be sufficient for the given distance.
   - Evidence: No significant drop in throughput when moving away from the serving cell.

2. **Over-shootin
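
The wrong prediction above was extracted by the `last_occurrence` fallback, which means none of the stricter patterns matched and the extractor took the last code-like token in the text (here a bare `3`). One possible guard, sketched below, is to accept only codes that actually appear as option labels in the question. The helper names `allowed_codes` and `strict_last_code` and the toy strings are hypothetical; the regular expression is the one used in `extract_answer`.

```python
import re

# Same code pattern as in extract_answer
CODE_RE = re.compile(r'\b([A-Z]?\d|[A-I])\b', re.IGNORECASE)


def allowed_codes(question: str) -> set:
    """Collect option labels such as 'C1:' that start a line in the question."""
    return {m.upper() for m in re.findall(r'^\s*([A-Z]\d):', question, re.MULTILINE)}


def strict_last_code(text: str, allowed: set) -> str:
    """Last code-like token that is also a valid option, else NOT_FOUND."""
    hits = [m.upper() for m in CODE_RE.findall(text) if m.upper() in allowed]
    return hits[-1] if hits else "NOT_FOUND"


# Toy strings, not taken from the dataset
question = (
    "C1: The downtilt angle is too large.\n"
    "C2: Coverage distance exceeds 1km.\n\n"
    "Given:\n- drive-test rows"
)
noisy = "1. Downtilt looks fine.\n\n2. Over-shooting is unlikely.\n\n3. Neighbor"

allowed = allowed_codes(question)
naive_last = CODE_RE.findall(noisy)[-1].upper()

print("naive fallback :", naive_last)
print("strict fallback:", strict_last_code(noisy, allowed))
print("valid answer   :", strict_last_code("Evidence points to C2 as the cause.", allowed))
```

With this guard, a generation that never names a valid option is counted as not found instead of being scored against a list number.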

<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [None]:
from google.colab import userdata

user_token = userdata.get('HF_TOKEN')

# Local saving
model.save_pretrained("qwen2.5_1.5B_lora_model")
tokenizer.save_pretrained("qwen2.5_1.5B_lora_model")

# Online saving
model.push_to_hub("yehoshua00/qwen2.5_1.5B_lora_model", token = user_token)
tokenizer.push_to_hub("yehoshua00/qwen2.5_1.5B_lora_model", token = user_token)


Saved model to https://huggingface.co/yehoshua00/qwen2.5_1.5B_lora_model



No files have been modified since last commit. Skipping to prevent empty commit.


### Saving to float16 for vLLM

We also save to `float16` directly and use `push_to_hub_merged` to upload the merged model to our Hugging Face account.

In [None]:
# Merge to 16bit
if True: model.save_pretrained_merged("Qwen2.5-RCA-1.5B", tokenizer, save_method = "merged_16bit",)
if True: model.push_to_hub_merged("yehoshua00/Qwen2.5-RCA-1.5B", tokenizer, save_method = "merged_16bit", token = user_token)

In [None]:
# NOTE: This will restart the session.
# The fine-tuned model is already pushed to Hugging Face under our account.
!pip install -q vllm transformers accelerate

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# %%writefile inference.py

# """
# Google Colab Compatible Submission Script
# For AI Telco Troubleshooting Challenge - Track 3
# Supports phase_1_test.csv and/or phase_2_test.csv
# Updated: January 2025 – more robust answer extraction
# """

# import pandas as pd
# import torch
# from transformers import AutoModelForCausalLM, AutoTokenizer
# import re
# from tqdm.auto import tqdm
# import gc
# import os
# from typing import Tuple, List

# # ────────────────────────────────────────────────
# #  Configuration
# # ────────────────────────────────────────────────

# MODEL_PATH = "yehoshua00/Qwen2.5-RCA-1.5B"

# BASE_DIR = "/content/drive/MyDrive/The AI Telco Troubleshooting Challenge/data"

# PATHS = {
#     "phase1": os.path.join(BASE_DIR, "phase_1_test.csv"),
#     "phase2": os.path.join(BASE_DIR, "phase_2_test.csv"),
#     "sample": os.path.join(BASE_DIR, "SampleSubmission.csv"),
#     "output": "/content/drive/MyDrive/submission.csv",
# }

# BASE_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

# ### Instruction:
# {}

# ### Input:


# ### Response:
# """

# # ────────────────────────────────────────────────
# #  Robust Answer Extraction
# # ────────────────────────────────────────────────

# def extract_answer(generated: str, verbose: bool = False) -> Tuple[str, str]:
#     """
#     Enhanced extraction trying to find C1–C8 / M1–M5 / P1–P8 / 1–8 / A–I etc.
#     Returns (predicted_code, method_name)
#     """
#     text = generated.strip()

#     if verbose:
#         print(f"[DEBUG] Generated text length: {len(text)} chars")

#     # 1. \boxed{...}
#     boxed_pattern = r'\\boxed\s*(?:\{|\s*)([^}]+?)(?:\s*\}|$)'
#     boxed_matches = list(re.finditer(boxed_pattern, text, re.IGNORECASE))

#     if boxed_matches:
#         for m in reversed(boxed_matches):
#             content = m.group(1).strip()
#             if verbose:
#                 print(f"[DEBUG] \\boxed{{{content}}}")

#             # Try to find code-like pattern inside
#             code = re.search(r'\b([A-Z]?\d|[A-I])\b', content, re.IGNORECASE)
#             if code:
#                 return code.group(1).upper(), "boxed_content"

#             # Maybe the whole content is the code
#             if re.fullmatch(r'[A-Z]?\d|[A-I]', content.strip(), re.IGNORECASE):
#                 return content.strip().upper(), "boxed_whole"

#     # 2. <answer> ... </answer>
#     answer_blocks = re.findall(r'<answer>\s*([^<]+?)\s*</answer>', text, re.IGNORECASE | re.DOTALL)
#     if answer_blocks:
#         for block in reversed(answer_blocks):
#             code = re.search(r'\b([A-Z]?\d|[A-I])\b', block, re.IGNORECASE)
#             if code:
#                 return code.group(1).upper(), "answer_tag"

#     # 3. Keyword patterns (final answer :, therefore ..., etc.)
#     keyword_patterns = [
#         r'(?:final\s+answer|answer|conclusion|therefore|most\s+likely|root\s+cause)\s*[:=]\s*([A-Z]?\d|[A-I])\b',
#         r'(?:select|choose)\s*([A-Z]?\d|[A-I])\b',
#     ]

#     for pat in keyword_patterns:
#         m = re.search(pat, text, re.IGNORECASE)
#         if m:
#             return m.group(1).upper(), "keyword"

#     # 4. Last sentence / phrase with "is X" or "because X"
#     sentences = re.split(r'[.!?]\s+', text)
#     for sent in reversed(sentences):
#         m = re.search(r'\b([A-Z]?\d|[A-I])\s+(?:is|because|explains|caused|indicates)\b', sent, re.IGNORECASE)
#         if m:
#             return m.group(1).upper(), "context_sentence"

#     # 5. Last code-like token in the whole text (weak fallback)
#     all_codes = re.findall(r'\b([A-Z]?\d|[A-I])\b', text, re.IGNORECASE)
#     if all_codes:
#         last_code = all_codes[-1].upper()
#         if verbose and len(set(c.upper() for c in all_codes)) > 1:
#             print(f"[DEBUG] Multiple codes found, picked last: {last_code}")
#         return last_code, "last_code_fallback"

#     if verbose:
#         print("[WARNING] No valid code pattern found → returning default")

#     return "NOT_FOUND", "default_fallback"


# class RCASubmissionGenerator:
#     def __init__(self, model_name: str, device: str = "cuda"):
#         print(f"Loading model: {model_name}")
#         self.device = device

#         self.tokenizer = AutoTokenizer.from_pretrained(
#             model_name,
#             trust_remote_code=True,
#             padding_side="left"
#         )

#         self.model = AutoModelForCausalLM.from_pretrained(
#             model_name,
#             torch_dtype=torch.float16,
#             device_map="auto",
#             trust_remote_code=True,
#             low_cpu_mem_usage=True
#         )
#         self.model.eval()

#         if self.tokenizer.pad_token_id is None:
#             self.tokenizer.pad_token_id = self.tokenizer.eos_token_id

#         print("Model & tokenizer loaded.")

#     @torch.inference_mode()
#     def generate_multiple(self, question: str, n: int = 4) -> List[str]:
#         prompt = BASE_PROMPT.format(question.strip())

#         inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)

#         answers = []

#         for _ in range(n):
#             out = self.model.generate(
#                 **inputs,
#                 max_new_tokens=280,
#                 temperature=0.75,
#                 top_p=0.92,
#                 do_sample=True,
#                 pad_token_id=self.tokenizer.pad_token_id,
#                 eos_token_id=self.tokenizer.eos_token_id,
#             )

#             generated = self.tokenizer.decode(
#                 out[0][inputs.input_ids.shape[1]:],
#                 skip_special_tokens=True
#             ).strip()

#             code, method = extract_answer(generated)
#             answers.append(code)

#         return answers

#     def create_submission(self):
#         print("\nLoading test data...")

#         test_dfs = []

#         # The submission requires all the entries but the evaluation
#         # is done only on the ones related to phase 2.
#         # for key, path in [("phase1", PATHS["phase1"]), ("phase2", PATHS["phase2"])]:
#         for key, path in [("phase2", PATHS["phase2"])]:
#             if os.path.isfile(path):
#                 df = pd.read_csv(path)
#                 print(f"→ Loaded {key}: {len(df)} rows")
#                 test_dfs.append(df)

#         if not test_dfs:
#             raise FileNotFoundError("No phase_1_test.csv nor phase_2_test.csv found.")

#         test_df = pd.concat(test_dfs, ignore_index=True)
#         test_df = test_df.drop_duplicates(subset=['ID'], keep='first')
#         print(f"Total unique questions: {len(test_df)}")

#         sample_df = pd.read_csv(PATHS["sample"])
#         print(f"Sample submission template: {len(sample_df)} rows")

#         print("\nGenerating predictions...")

#         predictions = {}

#         for idx, row in tqdm(test_df.iterrows(), total=len(test_df), desc="Questions"):
#             test_id = row["ID"]
#             q = row["question"]

#             codes = self.generate_multiple(q, n=4)

#             for i, code in enumerate(codes, 1):
#                 row_id = f"{test_id}_{i}"
#                 predictions[row_id] = code

#             # Light memory management
#             if (idx + 1) % 8 == 0:
#                 torch.cuda.empty_cache()
#                 gc.collect()

#         # Build submission
#         submission_rows = []

#         for rid in sample_df["ID"]:
#             base_id, _ = rid.rsplit("_", 1)
#             submission_rows.append({
#                 "ID": rid,
#                 "Qwen3-32B": "placeholder",
#                 "Qwen2.5-7B-Instruct": "placeholder",
#                 "Qwen2.5-1.5B-Instruct": predictions.get(rid, "placeholder"),
#             })

#         sub_df = pd.DataFrame(submission_rows)

#         # Raw save
#         sub_df.to_csv(PATHS["output"], index=False)
#         print(f"Raw predictions saved → {PATHS['output']}")

#         # Final formatting
#         col = "Qwen2.5-1.5B-Instruct"
#         sub_df[col] = sub_df[col].apply(
#             lambda x: f"Based on the provided data, the most likely root cause is...: \\boxed{{{x}}}"
#             if x != "placeholder" else x
#         )

#         sub_df.to_csv(PATHS["output"], index=False)
#         print(f"Formatted submission saved → {PATHS['output']}")

#         self._print_validation(sub_df)

#         return sub_df

#     def _print_validation(self, df: pd.DataFrame):
#         print("\n" + "═" * 60)
#         print(" VALIDATION SUMMARY ".center(60, "═"))
#         print("═" * 60)

#         track_col = "Qwen2.5-1.5B-Instruct"

#         print(f"Rows: {len(df)}")
#         # print(f"Unique base IDs: {df['ID'].str.replace(r'_\d$', '', regex=True).nunique()}")
#         # print(f"Expected rows (√ó4): {df['ID'].str.replace(r'_\d$', '', regex=True).nunique() * 4}")

#         preds = df[df[track_col] != "placeholder"][track_col]
#         print(f"\nReal predictions: {len(preds)} / {len(df)}")

#         if len(preds) > 0:
#             print("\nAnswer distribution:")
#             print(preds.str.extract(r'\\boxed\{([^}]+)\}', expand=False).value_counts().sort_index())

#         print("\n" + "═" * 60)


# def main():
#     print("═" * 70)
#     print("  AI Telco Troubleshooting Challenge - Track 3 Submission Generator  ")
#     print("  Model:", MODEL_PATH.split("/")[-1])
#     print("═" * 70)

#     try:
#         generator = RCASubmissionGenerator(MODEL_PATH)
#         generator.create_submission()
#     except Exception as e:
#         print("\nERROR during execution:")
#         print(str(e))
#         raise

#     print("\nDone. Please upload submission.csv to the platform.\n")


# if __name__ == "__main__":
#     main()

In [None]:
%%writefile inference.py

"""
Google Colab Compatible Submission Script
For AI Telco Troubleshooting Challenge - Track 3
Generates predictions for phase_2_test.csv; all other template rows keep "placeholder"
Updated: January 2025 - Aligned with training format
"""

import pandas as pd
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import re
from tqdm.auto import tqdm
import gc
import os
from typing import Tuple, List

# ────────────────────────────────────────────────
#  Configuration
# ────────────────────────────────────────────────

MODEL_PATH = "yehoshua00/Qwen2.5-RCA-1.5B"

BASE_DIR = "/content/drive/MyDrive/The AI Telco Troubleshooting Challenge/data"

PATHS = {
    "phase1": os.path.join(BASE_DIR, "phase_1_test.csv"),
    "phase2": os.path.join(BASE_DIR, "phase_2_test.csv"),
    "sample": os.path.join(BASE_DIR, "SampleSubmission.csv"),
    "output": "/content/drive/MyDrive/submission.csv",
}

# ─── EXACT same prompt as training/eval ────────────────────────
BASE_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Reasoning:
"""

# ────────────────────────────────────────────────
#  Robust Answer Extraction (same as eval)
# ────────────────────────────────────────────────

def extract_answer(generated: str, verbose: bool = False) -> Tuple[str, str]:
    """
    Enhanced extraction to handle: C1-C8, M1-M5, P1-P8, 1-8, A-I, etc.
    Returns: (predicted_code, extraction_method)
    """
    text = generated.strip()

    if verbose:
        print(f"[DEBUG] Generated text length: {len(text)} chars")

    # Priority 1: \boxed{...}
    boxed_pattern = r'\\boxed\s*\{([^}]+)\}'
    boxed_matches = list(re.finditer(boxed_pattern, text, re.IGNORECASE))

    if boxed_matches:
        for m in reversed(boxed_matches):
            content = m.group(1).strip()
            if verbose:
                print(f"[DEBUG] \\boxed{{{content}}}")

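            # A bare code such as "C3" is returned as-is. This check must run
            # first: any content it accepts is also matched by the search below.
            if re.fullmatch(r'[A-Z]?\d|[A-I]', content, re.IGNORECASE):
                return content.upper(), "boxed_whole"

            # Otherwise take the first code-like token inside the braces.
            code = re.search(r'\b([A-Z]?\d|[A-I])\b', content, re.IGNORECASE)
            if code:
                return code.group(1).upper(), "boxed"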

    # Priority 2: ### Response: section
    response_pattern = r'###\s*Response\s*:\s*([^\n]+)'
    response_matches = re.findall(response_pattern, text, re.IGNORECASE)

    if response_matches:
        for block in reversed(response_matches):
            code = re.search(r'\b([A-Z]?\d|[A-I])\b', block, re.IGNORECASE)
            if code:
                return code.group(1).upper(), "response_section"

    # Priority 3: Keywords (final answer, conclusion, etc.)
    keyword_patterns = [
        r'(?:final\s+answer|answer|conclusion|therefore|root\s+cause)\s*[:=]?\s*([A-Z]?\d|[A-I])\b',
        r'(?:select|choose)\s*([A-Z]?\d|[A-I])\b',
    ]

    for pat in keyword_patterns:
        m = re.search(pat, text, re.IGNORECASE)
        if m:
            return m.group(1).upper(), "keyword"

    # Priority 4: Last code token (weak fallback)
    all_codes = re.findall(r'\b([A-Z]?\d|[A-I])\b', text, re.IGNORECASE)
    if all_codes:
        last_code = all_codes[-1].upper()
        if verbose and len(set(c.upper() for c in all_codes)) > 1:
            print(f"[DEBUG] Multiple codes found, picked last: {last_code}")
        return last_code, "last_code_fallback"

    if verbose:
        print("[WARNING] No valid code pattern found")

    return "NOT_FOUND", "default_fallback"


class RCASubmissionGenerator:
    def __init__(self, model_name: str, device: str = "cuda"):
        print(f"Loading model: {model_name}")
        self.device = device

        self.tokenizer = AutoTokenizer.from_pretrained(
            model_name,
            trust_remote_code=True,
            padding_side="left"
        )

        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            torch_dtype=torch.float16,
            device_map="auto",
            trust_remote_code=True,
            low_cpu_mem_usage=True
        )
        self.model.eval()

        if self.tokenizer.pad_token_id is None:
            self.tokenizer.pad_token_id = self.tokenizer.eos_token_id

        print("Model & tokenizer loaded.")

    @torch.inference_mode()
    def generate_multiple(self, question: str, n: int = 4) -> List[str]:
        """Generate n predictions for a single question"""

        # Same question splitting logic as training
        given_pos = question.find("Given:")
        if given_pos != -1:
            instruction = question[:given_pos].strip()
            input_text = question[given_pos:].strip()
        else:
            instruction = question.strip()
            input_text = ""

        # Use EXACT same prompt format as training
        prompt = BASE_PROMPT.format(instruction, input_text)

        # device_map="auto" decides where the model lives, so follow the model's device
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)

        answers = []

        for _ in range(n):
            out = self.model.generate(
                **inputs,
                max_new_tokens=512,  # Match training/eval
                temperature=0.7,     # Match training/eval
                top_p=0.95,          # Match training/eval
                do_sample=True,
                pad_token_id=self.tokenizer.pad_token_id,
                eos_token_id=self.tokenizer.eos_token_id,
                repetition_penalty=1.1,
            )

            generated = self.tokenizer.decode(
                out[0][inputs.input_ids.shape[1]:],
                skip_special_tokens=True
            ).strip()

            code, method = extract_answer(generated)
            answers.append(code)

        return answers

    def create_submission(self):
        print("\nLoading test data...")

        test_dfs = []

        # Load phase 2 test data
        for key, path in [("phase2", PATHS["phase2"])]:
            if os.path.isfile(path):
                df = pd.read_csv(path)
                print(f"→ Loaded {key}: {len(df)} rows")
                test_dfs.append(df)

        if not test_dfs:
            raise FileNotFoundError("No phase_2_test.csv found.")

        test_df = pd.concat(test_dfs, ignore_index=True)
        # Reset the index so idx in the loop below is a contiguous row position
        test_df = test_df.drop_duplicates(subset=['ID'], keep='first').reset_index(drop=True)
        print(f"Total unique questions: {len(test_df)}")

        sample_df = pd.read_csv(PATHS["sample"])
        print(f"Sample submission template: {len(sample_df)} rows")

        print("\nGenerating predictions...")

        predictions = {}

        for idx, row in tqdm(test_df.iterrows(), total=len(test_df), desc="Questions"):
            test_id = row["ID"]
            q = row["question"]

            codes = self.generate_multiple(q, n=4)

            for i, code in enumerate(codes, 1):
                row_id = f"{test_id}_{i}"
                predictions[row_id] = code

            # Memory management
            if (idx + 1) % 8 == 0:
                torch.cuda.empty_cache()
                gc.collect()

        # Build submission
        submission_rows = []

        for rid in sample_df["ID"]:
            submission_rows.append({
                "ID": rid,
                "Qwen3-32B": "placeholder",
                "Qwen2.5-7B-Instruct": "placeholder",
                "Qwen2.5-1.5B-Instruct": predictions.get(rid, "placeholder"),
            })

        sub_df = pd.DataFrame(submission_rows)

        # Format final predictions
        col = "Qwen2.5-1.5B-Instruct"
        sub_df[col] = sub_df[col].apply(
            lambda x: f"Based on the provided data, the most likely root cause is...: \\boxed{{{x}}}"
            if x != "placeholder" else x
        )

        sub_df.to_csv(PATHS["output"], index=False)
        print(f"\nSubmission saved → {PATHS['output']}")

        self._print_validation(sub_df)

        return sub_df

    def _print_validation(self, df: pd.DataFrame):
        print("\n" + "═" * 60)
        print(" VALIDATION SUMMARY ".center(60, "═"))
        print("═" * 60)

        track_col = "Qwen2.5-1.5B-Instruct"

        print(f"Total rows: {len(df)}")

        preds = df[df[track_col] != "placeholder"][track_col]
        print(f"Real predictions: {len(preds)} / {len(df)}")

        if len(preds) > 0:
            print("\nAnswer distribution:")
            codes = preds.str.extract(r'\\boxed\{([^}]+)\}', expand=False)
            print(codes.value_counts().sort_index())

        print("\n" + "═" * 60)


def main():
    print("═" * 70)
    print("  AI Telco Troubleshooting Challenge - Track 3 Submission  ")
    print(f"  Model: {MODEL_PATH.split('/')[-1]}")
    print("═" * 70)

    try:
        generator = RCASubmissionGenerator(MODEL_PATH)
        generator.create_submission()
    except Exception as e:
        print("\nERROR during execution:")
        print(str(e))
        raise

    print("\nDone! Upload submission.csv to the platform.\n")


if __name__ == "__main__":
    main()

In [None]:
!python inference.py
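
The fallback patterns in `extract_answer` combine `[A-I]` with `re.IGNORECASE`, so a lone article such as "a" in the reasoning text can be picked up as an answer code. The sketch below is one way to check for that. It assumes the model always emits answer codes in upper case; `STRICT_CODE`, `LOOSE_CODE`, and `last_code` are hypothetical names used for illustration and are not part of `inference.py`.

```python
import re

# Pattern used by the weak fallback in inference.py (case-insensitive).
LOOSE_CODE = re.compile(r'\b([A-Z]?\d|[A-I])\b', re.IGNORECASE)

# Hypothetical stricter variant: upper-case only, so the English article
# "a" is not mistaken for option A. Assumes codes are emitted in upper case.
STRICT_CODE = re.compile(r'\b([A-Z]?\d|[A-I])\b')


def last_code(pattern: re.Pattern, text: str) -> str:
    """Return the last code-like token in text, or NOT_FOUND."""
    matches = pattern.findall(text)
    return matches[-1].upper() if matches else "NOT_FOUND"


# Made-up model output for illustration only.
sample = "The serving cell shows C3 symptoms, which points to a handover issue."

print(last_code(LOOSE_CODE, sample))   # A   (the article "a" wins)
print(last_code(STRICT_CODE, sample))  # C3
```

If the fine-tuned model sometimes writes lower-case codes, the stricter pattern would miss them, so it is worth checking on a few real generations before swapping it in.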

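Before uploading, it can help to confirm that the file has the shape the script intends: four rows per question ID and a `\boxed{...}` answer in every filled cell. This is a minimal sketch, assuming the `<question_id>_<1..4>` ID convention used in `create_submission`. `check_submission` and the demo rows are hypothetical and not part of `inference.py`.

```python
import csv
import io
import re
from collections import Counter

BOXED = re.compile(r'\\boxed\{([^}]+)\}')
TRACK_COL = "Qwen2.5-1.5B-Instruct"


def check_submission(csv_text: str, per_question: int = 4) -> dict:
    """Summarise a submission CSV: incomplete questions and unboxed cells."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Count rows per base question ID (everything before the last underscore).
    per_base = Counter(row["ID"].rsplit("_", 1)[0] for row in rows)
    incomplete = sorted(b for b, n in per_base.items() if n != per_question)
    # Every filled (non-placeholder) cell should carry a \boxed{...} answer.
    unboxed = [
        row["ID"] for row in rows
        if row[TRACK_COL] != "placeholder" and not BOXED.search(row[TRACK_COL])
    ]
    return {"rows": len(rows), "incomplete": incomplete, "unboxed": unboxed}


# Made-up rows for illustration only.
demo = (
    "ID,Qwen3-32B,Qwen2.5-7B-Instruct,Qwen2.5-1.5B-Instruct\n"
    "q1_1,placeholder,placeholder,Root cause: \\boxed{C3}\n"
    "q1_2,placeholder,placeholder,Root cause: \\boxed{C3}\n"
    "q1_3,placeholder,placeholder,Root cause: \\boxed{C1}\n"
    "q1_4,placeholder,placeholder,Root cause: \\boxed{C3}\n"
    "q2_1,placeholder,placeholder,C1\n"
    "q2_2,placeholder,placeholder,placeholder\n"
)

print(check_submission(demo))
```

For the real file, the text could be read from the path stored in `PATHS["output"]` and passed to the same function.
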
<div align="center">

<img src="https://capsule-render.vercel.app/api?type=rect&color=0:0f2027,100:203a43&height=2&section=header"/>

### 🧠 Intelligent 5G Network Fault Diagnosis  
**Qwen2.5 (1.5B Instruct) • Fine-Tuned with Unsloth (LoRA)**

<br/>

<a href="https://unsloth.ai">
  <img src="https://img.shields.io/badge/Training-Unsloth-0f62fe?style=flat-square">
</a>
<a href="https://huggingface.co/Qwen">
  <img src="https://img.shields.io/badge/Model-Qwen2.5--Instruct-5a5a5a?style=flat-square">
</a>
<a href="#">
  <img src="https://img.shields.io/badge/Task-Root%20Cause%20Analysis-2e7d32?style=flat-square">
</a>

<br/><br/>

<table>
<tr>
<td align="center">📡<br/><sub>5G KPIs</sub></td>
<td align="center">⚙️<br/><sub>LoRA Fine-Tuning</sub></td>
<td align="center">🧩<br/><sub>Fault Classification</sub></td>
<td align="center">📈<br/><sub>Low-Compute</sub></td>
</tr>
</table>

<br/>

<sub>
This project demonstrates parameter-efficient fine-tuning of a compact LLM  
for automated diagnosis of **5G radio access network performance degradation**.
</sub>

<br/>

<a href="https://docs.unsloth.ai">Unsloth Documentation</a> •
<a href="https://github.com/unslothai/unsloth">Unsloth GitHub</a>

<br/><br/>

<img src="https://capsule-render.vercel.app/api?type=rect&color=0:203a43,100:0f2027&height=2&section=footer"/>

</div>
