# Math Question Answer Verification Competition

## Starter Code

Adapted from the [official Unsloth notebook](https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing#scrollTo=MKX_XKs_BNZR).

In [1]:
# %%capture
# This cell will take time
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

Collecting unsloth
  Downloading unsloth-2024.11.6-py3-none-any.whl.metadata (59 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 59.6/59.6 kB 2.5 MB/s eta 0:00:00
Collecting unsloth-zoo>=2024.11.1 (from unsloth)
  Downloading unsloth_zoo-2024.11.5-py3-none-any.whl.metadata (16 kB)
Collecting xformers>=0.0.27.post2 (from unsloth)
  Downloading xformers-0.0.28.post3-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (1.0 kB)
Collecting bitsandbytes (from unsloth)
  Downloading bitsandbytes-0.44.1-py3-none-manylinux_2_24_x86_64.whl.metadata (3.5 kB)
Collecting triton>=3.0.0 (from unsloth)
  Downloading triton-3.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.3 kB)
Collecting tyro (from unsloth)
  Downloading tyro-0.8.14-py3-none-any.whl.metadata (8.4 kB)
Collecting datasets>=2.16.0 (from unsloth)
  Download

Found existing installation: unsloth 2024.11.6
Uninstalling unsloth-2024.11.6:
  Successfully uninstalled unsloth-2024.11.6
Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-q82_g4yq/unsloth_5087dfd6c2ae4b8a8fcea9eaab3ea72f
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-q82_g4yq/unsloth_5087dfd6c2ae4b8a8fcea9eaab3ea72f
  Resolved https://github.com/unslothai/unsloth.git to commit d8ff860c842095f4729fdd1d5aedf567a9e2c4da
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: unsloth
  Building wheel for unsloth (pyproject.toml) ... done
  Created wheel for unsloth: filename=unsloth-2024.11.6-py3-none-a

In [2]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


In [3]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

==((====))==  Unsloth 2024.11.6: Fast Llama patching. Transformers = 4.46.2.
   \\   /|    GPU: NVIDIA L4. Max memory: 22.168 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1+cu124. CUDA = 8.9. CUDA Toolkit = 12.4.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/230 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/345 [00:00<?, ?B/s]

## Load model and wrap with LoRA adapters

In [4]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
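    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      "lm_head",
                      # The four names below are not Llama module names; they likely
                      # match nothing and trigger the "custom modules" warning in the output.
                      "multi_head_attention",
                      "attention_output",
                      "cls_output",
                      "attention_bias",
                     ],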
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth: You added custom modules, but Unsloth hasn't optimized for this.
Beware - your finetuning might be noticeably slower!
Unsloth: Offloading output_embeddings to disk to save VRAM


  offloaded_W = torch.load(filename, map_location = "cpu", mmap = True)
Unsloth 2024.11.6 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


Unsloth: Training lm_head in mixed precision to save VRAM
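The training banner further down reports 567,279,616 trainable parameters. The short calculation below is a sketch of where that number plausibly comes from. The layer shapes are assumptions taken from the published Llama 3.1 8B configuration, and it assumes that `lm_head` is trained in full while the seven projection layers get rank-16 LoRA adapters.

```python
# Shapes assumed from the published Llama 3.1 8B configuration.
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size (MLP width)
KV_DIM = 1024          # num_key_value_heads (8) * head_dim (128)
VOCAB = 128256         # vocab_size
LAYERS = 32            # num_hidden_layers
RANK = 16              # r passed to get_peft_model

# (in_features, out_features) of each adapted linear layer in one decoder block.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """Parameters in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


adapter_params = LAYERS * sum(
    lora_params(i, o, RANK) for i, o in PROJECTIONS.values()
)
# Assumption: lm_head is trained as a full weight matrix, not through LoRA.
lm_head_params = VOCAB * HIDDEN
total = adapter_params + lm_head_params

print(f"{adapter_params:,} + {lm_head_params:,} = {total:,}")
```

Under these assumptions the adapters account for about 42M parameters and `lm_head` for about 525M, so dropping `lm_head` from `target_modules` would shrink the trainable set by more than 90%.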


## Competition dataset

In [5]:
# download and load competition dataset

from datasets import load_dataset
dataset = load_dataset("ad6398/nyu-dl-teach-maths-comp")

# print and see dataset
dataset['train'][0]

README.md:   0%|          | 0.00/2.09k [00:00<?, ?B/s]

train-00000-of-00002.parquet:   0%|          | 0.00/195M [00:00<?, ?B/s]

train-00001-of-00002.parquet:   0%|          | 0.00/195M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/3.65M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/1000000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/10000 [00:00<?, ? examples/s]

{'question': 'What is the radius of the circle inscribed in triangle $ABC$ if $AB = 22, AC=12,$ and $BC=14$? Express your answer in simplest radical form.',
 'is_correct': True,
 'answer': '3.16227766016838',
 'solution': "The circle is inscribed in a triangle, and we know the sides of the triangle.\nTo use the inradius formula, we need to know the area of the triangle.\nWe can use Heron's formula to calculate the area.\n<llm-code>\nimport math\nfrom sympy import *\n\nAB, AC, BC = 22, 12, 14\n\n# Calculate the semiperimeter and area using Heron's formula\ns = (AB + AC + BC) / 2\nK = sqrt(s * (s - AB) * (s - AC) * (s - BC))\n\nprint(K)\n</llm-code>\n<llm-code-output>\n75.8946638440411\n</llm-code-output>\nLet's now use the formula for the radius of the inscribed circle.\n<llm-code>\nr = K / s\nprint(r)\n</llm-code>\n<llm-code-output>\n3.16227766016838\n</llm-code-output>\nThe answer is \\boxed{3.16227766016838}"}

In [6]:
prompt = """You are a great mathematician and you are tasked with finding if an answer to a given maths question is correct or not.
Based on the question, the provided answer, and the explanation, your response should be 'True' if correct, otherwise 'False'. Below is Question and Answer.


### Question:
{}

### Answer:
{}

### Explanation:
{}

### Output:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    questions = examples["question"]
    answers   = examples["answer"]
    solutions = examples["solution"]
    labels    = examples["is_correct"]
    texts = []
    for question, answer, solution, label in zip(questions, answers, solutions, labels):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = prompt.format(question, answer, solution, label) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
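The formatting step can be checked on a toy batch without loading the model. The sketch below uses an abbreviated copy of the template and a placeholder EOS string (an assumption; the notebook takes it from `tokenizer.eos_token`). The helper name `format_batch` and the toy data are hypothetical.

```python
# Abbreviated template; the notebook's version also carries the instruction text.
TEMPLATE = """### Question:
{}

### Answer:
{}

### Explanation:
{}

### Output:
{}"""

EOS = "<|end_of_text|>"  # placeholder; the notebook uses tokenizer.eos_token


def format_batch(batch, with_label=True):
    """Fill the template for every example in a column-oriented batch."""
    texts = []
    for i, question in enumerate(batch["question"]):
        # Training text ends with the label plus EOS; inference text leaves the slot empty.
        label = str(batch["is_correct"][i]) + EOS if with_label else ""
        texts.append(
            TEMPLATE.format(question, batch["answer"][i], batch["solution"][i], label)
        )
    return {"text": texts}


toy = {
    "question": ["What is 2 + 2?"],
    "answer": ["4"],
    "solution": ["2 + 2 = 4"],
    "is_correct": [True],
}
train_text = format_batch(toy)["text"][0]
infer_text = format_batch(toy, with_label=False)["text"][0]
print(train_text)
```

The training text is the inference prompt followed by `True` or `False` and the EOS marker, which is what lets the fine-tuned model stop right after emitting the label.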

In [7]:
# Process the training dataset and generate prompt for each datapoint
train_dataset = dataset['train'].map(formatting_prompts_func, batched = True,)

# train_dataset_full = dataset['train'].shuffle(seed=42)
# validation_size = int(0.01 * len(train_dataset_full))
# train_dataset = train_dataset_full.select(range(len(train_dataset_full) - validation_size))
# validation_dataset = train_dataset_full.select(range(len(train_dataset_full) - validation_size, len(train_dataset_full)))

# print(f"Main training set size: {len(train_dataset)}")
# print(f"Validation set size: {len(validation_dataset)}")

Map:   0%|          | 0/1000000 [00:00<?, ? examples/s]
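The commented-out lines in the cell above hold out 1% of the training data for validation. A library-free sketch of the same idea follows; `split_indices` is a hypothetical helper, and the resulting index lists could be passed to `Dataset.select` as the commented code does.

```python
import random


def split_indices(n_examples, val_fraction=0.01, seed=42):
    """Shuffle indices deterministically and carve off a validation slice."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    n_val = int(val_fraction * n_examples)
    # First n_val shuffled indices are validation, the rest are training.
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(1_000, val_fraction=0.01)
print(len(train_idx), len(val_idx))
```

Splitting before the prompt-formatting `map` keeps the validation rows available both with labels (for scoring) and without them (for building inference prompts).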

In [8]:
# Print a sample training example
train_dataset['text'][0]

"You are a great mathematician and you are tasked with finding if an answer to a given maths question is correct or not.\nBased on the question, the provided answer, and the explanation, your response should be 'True' if correct, otherwise 'False'. Below is Question and Answer.\n\n\n### Question:\nWhat is the radius of the circle inscribed in triangle $ABC$ if $AB = 22, AC=12,$ and $BC=14$? Express your answer in simplest radical form.\n\n### Answer:\n3.16227766016838\n\n### Explanation:\nThe circle is inscribed in a triangle, and we know the sides of the triangle.\nTo use the inradius formula, we need to know the area of the triangle.\nWe can use Heron's formula to calculate the area.\n<llm-code>\nimport math\nfrom sympy import *\n\nAB, AC, BC = 22, 12, 14\n\n# Calculate the semiperimeter and area using Heron's formula\ns = (AB + AC + BC) / 2\nK = sqrt(s * (s - AB) * (s - AC) * (s - BC))\n\nprint(K)\n</llm-code>\n<llm-code-output>\n75.8946638440411\n</llm-code-output>\nLet's now use t

## SFT

In [9]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

training_args = TrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 8,
        warmup_steps = 5,
#         num_train_epochs = 1, # Set this for 1 full training run.
#         warmup_ratio = 0.1,
        max_steps = 25,
        learning_rate = 1e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
#         gradient_checkpointing=True,
#         max_grad_norm=0.5,
    )

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 4,
    packing = False, # Can make training 5x faster for short sequences.
    args = training_args
)

Map (num_proc=4):   0%|          | 0/1000000 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs
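It helps to put the training budget in numbers. The sketch below uses the values from `TrainingArguments` above plus the eight `trainer.train()` calls made in the training cell. Whether each call draws fresh examples depends on how the trainer reseeds its sampler, so the last figure is an upper bound rather than a measured value.

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 1
max_steps = 25
train_calls = 8            # the training cell calls trainer.train() eight times
dataset_size = 1_000_000

# One optimizer step consumes this many examples.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
examples_per_call = effective_batch * max_steps
# Upper bound: assumes every call sees examples it has not seen before.
max_examples_seen = examples_per_call * train_calls
fraction_seen = max_examples_seen / dataset_size

print(f"effective batch size: {effective_batch}")
print(f"examples per train() call: {examples_per_call}")
print(f"at most {max_examples_seen} examples ({fraction_seen:.2%} of the data)")
```

Even in the best case the run touches well under 1% of the 1,000,000 training rows, so the falling training loss says little about generalization without a held-out check.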


In [10]:
for i in range(8):
  trainer_stats = trainer.train()

# iteration = 0
# while True:
#     trainer_stats = trainer.train()

#     FastLanguageModel.for_inference(model)

#     # Evaluate on validation set
#     val_texts_batch = validation_dataset.shuffle(seed=3407).select(range(500))
#     inputs = tokenizer(val_texts_batch['text'], return_tensors="pt", padding=True, truncation=True).to("cuda")
#     outputs = model.generate(**inputs, max_new_tokens=64)
#     predictions = tokenizer.batch_decode(outputs, skip_special_tokens=True)

#     # NOTE: this counts predictions ending in "True"; it never compares them with is_correct
#     correct_preds = sum(1 for pred in predictions if pred.strip().endswith("True"))
#     val_accuracy = correct_preds / len(predictions)

#     iteration += 1
#     print(f"Iteration {iteration}: Validation Accuracy: {val_accuracy:.4f}")

#     if val_accuracy >= 0.7:
#         print("Achieved target accuracy on validation set.")
#         break

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,000,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 8
\        /    Total batch size = 8 | Total steps = 25
 "-____-"     Number of trainable parameters = 567,279,616


Step,Training Loss
1,1.5846
2,1.6023
3,1.6484
4,1.4196
5,1.3349
6,1.2158
7,1.2389
8,1.0951
9,0.8845
10,0.824


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,000,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 8
\        /    Total batch size = 8 | Total steps = 25
 "-____-"     Number of trainable parameters = 567,279,616


Step,Training Loss
1,0.6813
2,0.6866
3,0.7258
4,0.5586
5,0.5397
6,0.5161
7,0.7194
8,0.646
9,0.5687
10,0.562


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,000,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 8
\        /    Total batch size = 8 | Total steps = 25
 "-____-"     Number of trainable parameters = 567,279,616


Step,Training Loss
1,0.5505
2,0.5424
3,0.5765
4,0.417
5,0.4059
6,0.3911
7,0.5625
8,0.5088
9,0.4494
10,0.4515


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,000,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 8
\        /    Total batch size = 8 | Total steps = 25
 "-____-"     Number of trainable parameters = 567,279,616


Step,Training Loss
1,0.4314
2,0.4269
3,0.4395
4,0.3083
5,0.2999
6,0.2802
7,0.4209
8,0.3779
9,0.3413
10,0.3465


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,000,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 8
\        /    Total batch size = 8 | Total steps = 25
 "-____-"     Number of trainable parameters = 567,279,616


Step,Training Loss
1,0.3063
2,0.317
3,0.3038
4,0.2116
5,0.2032
6,0.1766
7,0.2773
8,0.2485
9,0.2244
10,0.2409


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,000,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 8
\        /    Total batch size = 8 | Total steps = 25
 "-____-"     Number of trainable parameters = 567,279,616


Step,Training Loss
1,0.187
2,0.208
3,0.1859
4,0.1345
5,0.13
6,0.1052
7,0.1813
8,0.1538
9,0.1442
10,0.1673


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,000,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 8
\        /    Total batch size = 8 | Total steps = 25
 "-____-"     Number of trainable parameters = 567,279,616


Step,Training Loss
1,0.1215
2,0.1385
3,0.1204
4,0.088
5,0.1035
6,0.0964
7,0.149
8,0.1036
9,0.0986
10,0.1265


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,000,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 8
\        /    Total batch size = 8 | Total steps = 25
 "-____-"     Number of trainable parameters = 567,279,616


Step,Training Loss
1,0.087
2,0.1
3,0.0804
4,0.0628
5,0.0692
6,0.0605
7,0.1163
8,0.1052
9,0.0945
10,0.0979
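The commented-out validation loop in the training cell counts how many predictions end in `True`, which measures the share of positive predictions rather than accuracy. A minimal sketch of a label-aware metric is shown below; `parse_label` and `accuracy` are hypothetical helpers, and the sketch assumes the model's verdict is the last non-empty line of the generated text.

```python
def parse_label(generated_text):
    """Return True/False from the last non-empty line, or None if it is neither."""
    lines = [line.strip() for line in generated_text.splitlines() if line.strip()]
    if not lines:
        return None
    if lines[-1] == "True":
        return True
    if lines[-1] == "False":
        return False
    return None


def accuracy(generated_texts, labels):
    """Fraction of generations whose parsed label equals the ground truth."""
    if len(generated_texts) != len(labels):
        raise ValueError("generated_texts and labels must have the same length")
    if not labels:
        return 0.0
    # Unparseable generations (None) never equal a boolean label, so they count as wrong.
    correct = sum(parse_label(t) == y for t, y in zip(generated_texts, labels))
    return correct / len(labels)


demo_texts = ["### Output:\nTrue", "False\n", "not sure"]
demo_labels = [True, True, False]
print(accuracy(demo_texts, demo_labels))
```

For the score to be meaningful, the validation prompts should be built with an empty output slot, as the test prompts are, so that the label is not part of the model's input.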


## Inference

In [11]:
# Sample inference data point
test_dataset = dataset['test']

sample_ques = test_dataset['question'][0]
sample_ans = test_dataset['answer'][0]
sample_solu = test_dataset['solution'][0]


In [12]:
# Running inference on a single test example
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
input_prompt = prompt.format(
        sample_ques, # question
        sample_ans, # given answer
        sample_solu, # explanation
        "", # output - leave this blank for generation! The LLM will generate whether it is True or False
    )

print("Input Prompt:\n", input_prompt)
inputs = tokenizer(
[
    input_prompt
], return_tensors = "pt").to("cuda")

input_shape = inputs['input_ids'].shape
input_token_len = input_shape[1] # index 1 is the sequence length; index 0 is the batch dimension
outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
# you can get the whole generated text by uncommenting the below line
# text_generated = tokenizer.batch_decode(outputs, skip_special_tokens=True)

response = tokenizer.batch_decode([outputs[0][input_token_len:]], skip_special_tokens=True)
response

Input Prompt:
 You are a great mathematician and you are tasked with finding if an answer to a given maths question is correct or not.
Based on the question, the provided answer, and the explanation, your response should be 'True' if correct, otherwise 'False'. Below is Question and Answer.


### Question:
The Parker family needs to leave the house by 5 pm for a dinner party. Mrs. Parker was waiting to get into the bathroom at 2:30 pm. Her oldest daughter used the bathroom for 45 minutes and her youngest daughter used the bathroom for another 30 minutes. Then her husband used it for 20 minutes. How much time will Mrs. Parker have to use the bathroom to leave on time?

### Answer:
205

### Explanation:
Let's solve this problem using Python code.
<llm-code>
minutes_per_hour = 60
minutes_left_before_5 = 5 * minutes_per_hour
total_time_spent_by_family = 45 + 30 + 20
minutes_before_5_after_family = minutes_left_before_5 - total_time_spent_by_family
minutes_before_5_after_family
</llm-code>
<

['True']

## Saving model

In [13]:
model.save_pretrained("lora_model") # Local saving
tokenizer.save_pretrained("lora_model")

('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/tokenizer.json')

In [14]:
if True:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference


==((====))==  Unsloth 2024.11.6: Fast Llama patching. Transformers = 4.46.2.
   \\   /|    GPU: NVIDIA L4. Max memory: 22.168 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1+cu124. CUDA = 8.9. CUDA Toolkit = 12.4.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


## Batch inference


In [15]:
prompt = """You are a great mathematician and you are tasked with finding if an answer to a given maths question is correct or not.
Based on the question, the provided answer, and the explanation, your response should be 'True' if correct, otherwise 'False'. Below is Question and Answer.


### Question:
{}

### Answer:
{}

### Explanation:
{}

### Output:
{}"""

def formatting_test_prompts_func(examples):
    questions = examples["question"]
    answers   = examples["answer"]
    solutions = examples["solution"]
    texts = []
    for question, answer, solution in zip(questions, answers, solutions):
        # Leave the output slot empty so the model generates True or False
        text = prompt.format(question, answer, solution, "")
        texts.append(text)
    return { "text" : texts, }


test_text_dataset = dataset['test'].map(formatting_test_prompts_func, batched = True,)

Map:   0%|          | 0/10000 [00:00<?, ? examples/s]

In [16]:
# Running inference on the test set in batches
import gc

FastLanguageModel.for_inference(model)

test_dataset = test_text_dataset['text']
batch_size = 8
output_label = []

for i in range(0, len(test_dataset), batch_size):
    torch.cuda.empty_cache()
    gc.collect()

    batch = test_dataset[i:i + batch_size]

    # Tokenize batch (prompts longer than 512 tokens are truncated)
    inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True, max_length=512).to("cuda")

    # Padded prompt length; generated tokens start at this offset
    input_length = inputs['input_ids'].shape[1]

    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True, num_beams=1, early_stopping=True)

    # Extract only the generated part and decode
    text_generated = tokenizer.batch_decode(outputs[:, input_length:], skip_special_tokens=True)

    # Keep the last non-empty line (True/False) of each generation
    for text in text_generated:
        lines = text.strip().splitlines()
        output_label.append(lines[-1].strip() if lines else "")

    print(f"batch-num: {min(i + batch_size, len(test_dataset))}/{len(test_dataset)}")


batch-num: 8/10000
batch-num: 16/10000
batch-num: 24/10000
batch-num: 32/10000
batch-num: 40/10000
batch-num: 48/10000
batch-num: 56/10000
batch-num: 64/10000
batch-num: 72/10000
batch-num: 80/10000
batch-num: 88/10000
batch-num: 96/10000
batch-num: 104/10000
batch-num: 112/10000
batch-num: 120/10000
batch-num: 128/10000
batch-num: 136/10000
batch-num: 144/10000
batch-num: 152/10000
batch-num: 160/10000
batch-num: 168/10000
batch-num: 176/10000
batch-num: 184/10000
batch-num: 192/10000
batch-num: 200/10000
batch-num: 208/10000
batch-num: 216/10000
batch-num: 224/10000
batch-num: 232/10000
batch-num: 240/10000
batch-num: 248/10000
batch-num: 256/10000
batch-num: 264/10000
batch-num: 272/10000
batch-num: 280/10000
batch-num: 288/10000
batch-num: 296/10000
batch-num: 304/10000
batch-num: 312/10000
batch-num: 320/10000
batch-num: 328/10000
batch-num: 336/10000
batch-num: 344/10000
batch-num: 352/10000
batch-num: 360/10000
batch-num: 368/10000
batch-num: 376/10000
batch-num: 384/10000
batch
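The batch cell tokenizes with `truncation=True, max_length=512`. If the tokenizer truncates from the right (the usual default, but an assumption here), a long explanation can push the trailing `### Output:` header out of the prompt, and the model then continues the explanation instead of emitting a verdict. One possible workaround, sketched below, is to shorten only the explanation before formatting. `build_prompt_within_budget` is a hypothetical helper, and it counts whitespace-separated words as a crude stand-in for real token counts.

```python
def build_prompt_within_budget(question, answer, solution, max_tokens):
    """Trim the explanation so the whole prompt, header included, fits the budget."""
    head = f"### Question:\n{question}\n\n### Answer:\n{answer}\n\n### Explanation:\n"
    tail = "\n\n### Output:\n"

    # Crude token estimate: whitespace-separated words (a real tokenizer would differ).
    budget = max_tokens - len(head.split()) - len(tail.split())
    words = solution.split()
    if len(words) > budget:
        # Only the explanation is shortened; question, answer and header survive.
        solution = " ".join(words[:max(budget, 0)])
    return head + solution + tail


long_solution = " ".join(["step"] * 1000)
prompt_text = build_prompt_within_budget("What is 2 + 2?", "4", long_solution, max_tokens=50)
print(prompt_text[-40:])
```

In the notebook the same idea would use the tokenizer's own token counts, with some headroom left for the instruction text at the top of the template.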

In [17]:
# Running inference on batch test

# FastLanguageModel.for_inference(model)

# test_dataset = test_text_dataset['text']
# batch_size = 16
# batch_number = int(len(test_dataset)/batch_size)
# output_label = []

# for i in range(batch_number):
#   print("batch-num:", i)
#   inputs_prompt = []
#   for j in range(batch_size):
#       index = batch_size * i + j
#       inputs_prompt.append(test_dataset[index])

#   inputs = tokenizer(inputs_prompt, return_tensors = "pt", padding=True, truncation=True).to("cuda")
#   outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True, num_beams=2)
#   text_generated = tokenizer.batch_decode(outputs, skip_special_tokens=True)
#   for text in text_generated:
#     output_label.append(text.splitlines()[-1])

In [18]:
print(len(output_label))

import csv

cnt = 0
csv_data = [["ID", "is_correct"]]
for i in range(len(output_label)):
  if output_label[i] == 'True':
    cnt = cnt + 1
    csv_data.append([i, True])
  else:
    csv_data.append([i, False])

# Write to CSV file
with open('data.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(csv_data)

print("successfully write to data.csv")

10000
successfully write to data.csv
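Before uploading, it can be worth reading the file back and checking its shape and label balance, since any generation that is not exactly `True` is written as `False` by the cell above. The sketch below does this on an in-memory sample; `summarize_submission` is a hypothetical helper, and the `ID` and `is_correct` column names follow the CSV header written above.

```python
import csv
import io
from collections import Counter


def summarize_submission(csv_text, expected_rows):
    """Validate IDs and row count, then return the label distribution."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, found {len(rows)}")
    ids = [int(row["ID"]) for row in rows]
    if ids != list(range(expected_rows)):
        raise ValueError("IDs must run from 0 to expected_rows - 1 in order")
    return Counter(row["is_correct"] for row in rows)


# Same layout that csv.writer produces for the submission file.
sample = "ID,is_correct\r\n0,True\r\n1,False\r\n2,True\r\n"
counts = summarize_submission(sample, expected_rows=3)
print(counts)
```

For the real file, the text can be read with `open('data.csv', newline='').read()` and checked against 10,000 rows. A distribution that is almost entirely `False` would suggest that many generations did not end in a clean label.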


In [19]:
from google.colab import files
files.download('data.csv')