# TinySQL: Test the accuracy of model-generated outputs

**Background:** A "TinySQL" model takes two inputs: 1) an Instruction, which is an English-language data request, and 2) a Context, which is a SQL CREATE TABLE statement. The model outputs a Response, which is a SQL SELECT statement.
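The notebook builds each model input with `example.get_alpaca_prompt()`. The minimal sketch below shows how an Instruction and a Context could be combined into one prompt. It assumes TinySQL follows the standard Stanford Alpaca template for instructions that have an input field, and TinySQL's exact wording may differ. The `build_alpaca_prompt` helper and the example table are hypothetical:

```python
# Hypothetical helper: assumes the standard Alpaca "instruction + input" template.
# TinySQL's get_alpaca_prompt() may use different wording.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{context}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str, context: str) -> str:
    """Combine an English Instruction and a CREATE TABLE Context into one prompt."""
    return ALPACA_TEMPLATE.format(instruction=instruction, context=context)


prompt = build_alpaca_prompt(
    "Show me the name and price from the products table",
    "CREATE TABLE products (name TEXT, price INTEGER, stock INTEGER)",
)
print(prompt)
# The model is expected to continue the prompt with a Response such as:
# SELECT name, price FROM products
```

The prompt ends with the Response header. Everything the model generates after that header is treated as its SQL answer.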

**Notebook purpose:** Test whether a model generates accurate SQL outputs.

**Notebook details:** This notebook:
- runs with models M1 to M3 and command sets CS1 to CS3
- uses the generate_csN functions to generate prompts
- uses the nnsight call "model.generate(inputs['input_ids'], max_new_tokens=max_new_tokens, ...)" to generate a full answer
- scores prediction accuracy with evaluate_csN_prediction and also by exact string match
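The two scores can be sketched in plain Python. In this hypothetical `score_batch` helper, `semantic_scorer` stands in for the TinySQL `evaluate_csN_prediction` functions and is assumed to return a value between 0 and 1, where 1 means fully correct. The real functions take the TinySQL example object and the generated text; this sketch simplifies that to two strings. Exact match is a strict string comparison, as in the scoring loop:

```python
from typing import Callable, Iterable, Tuple


def score_batch(
    pairs: Iterable[Tuple[str, str]],
    semantic_scorer: Callable[[str, str], float],
) -> Tuple[float, float]:
    """Return (mean semantic score, exact-match rate) over (expected, predicted) pairs."""
    semantic_total = 0.0
    exact_total = 0
    count = 0
    for expected, predicted in pairs:
        semantic_total += semantic_scorer(expected, predicted)
        exact_total += 1 if predicted == expected else 0  # strict string equality
        count += 1
    if count == 0:
        raise ValueError("no predictions to score")
    return semantic_total / count, exact_total / count


# Toy stand-in scorer: ignores case and surrounding whitespace (NOT TinySQL's logic).
def toy_scorer(expected: str, predicted: str) -> float:
    return 1.0 if expected.strip().lower() == predicted.strip().lower() else 0.0


pairs = [
    ("SELECT name FROM products", "SELECT name FROM products"),
    ("SELECT name FROM products", "select name from products "),
]
print(score_batch(pairs, toy_scorer))  # (1.0, 0.5)
```

The second pair shows why the two metrics can diverge: a lenient scorer accepts it, but exact match rejects it because of case and a trailing space.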

This notebook:
- Was developed on Google Colab using an A100 GPU with 40GB of GPU memory and 80GB of system RAM.
- Runs with TinyStories/Qwen/Llama with base/CS1/CS2/CS3.
- Requires a GITHUB_TOKEN secret to access the Martian TinySQL code repository.
- Requires an HF_TOKEN secret to access the Martian HuggingFace repository.
- Is based on the https://nnsight.net/notebooks/tutorials/activation_patching tutorial
- Was developed under a grant provided by withmartian.com ( https://withmartian.com )
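Outside Colab, `google.colab.userdata` is unavailable. The minimal sketch below assumes you instead export the same secrets as environment variables. It defines a hypothetical `get_secret` helper that tries Colab first and falls back to `os.environ` when the Colab module cannot be imported:

```python
import os


def get_secret(name: str) -> str:
    """Read a secret from Colab's userdata if available, else from an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Google Colab
        value = userdata.get(name)
    except ImportError:
        # Not running in Colab: fall back to an environment variable of the same name
        value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set")
    return value


# Example (outside Colab): export GITHUB_TOKEN and HF_TOKEN before starting Jupyter, then
# github_token = get_secret("GITHUB_TOKEN")
```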


# Import libraries
Imports standard libraries. Do not read.

In [None]:
# https://nnsight.net/
!pip install -U nnsight -q
# !pip install nnsight==0.3.7 -q

In [None]:
from IPython.display import clear_output
import einops
import torch
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "colab"

import nnsight
from nnsight import LanguageModel, util

In [None]:
from getpass import getpass
from google.colab import userdata
import gc
import weakref
import json

In [None]:
!pip install datasets -q

In [None]:
github_token = userdata.get("GITHUB_TOKEN")

# Install the private repository using the token
!pip install --upgrade git+https://{github_token}@github.com/withmartian/TinySQL.git

import TinySQL as qts

# Select model, command set and generation settings
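The settings cell encodes the model and command-set choices as integers. The sketch below makes that mapping explicit, based on the cell's comments and on how later cells use `cs_num`. The base model (`cs_num = 0`) is prompted with CS1 examples (`csn=max(1, cs_num)`) and scored with the CS1 evaluator. The `describe_run` helper is hypothetical and not part of TinySQL:

```python
MODEL_NAMES = {1: "TinyStories", 2: "Qwen", 3: "Llama"}
COMMAND_SETS = {0: "BaseModel", 1: "CS1", 2: "CS2", 3: "CS3"}


def describe_run(model_num: int, cs_num: int) -> dict:
    """Summarize which model, prompt set and evaluator a (model_num, cs_num) pair uses."""
    if model_num not in MODEL_NAMES:
        raise ValueError(f"model_num must be one of {sorted(MODEL_NAMES)}")
    if cs_num not in COMMAND_SETS:
        raise ValueError(f"cs_num must be one of {sorted(COMMAND_SETS)}")
    data_cs = max(1, cs_num)  # the base model is prompted with CS1 examples
    return {
        "model": MODEL_NAMES[model_num],
        "command_set": COMMAND_SETS[cs_num],
        "prompts": f"CS{data_cs}",
        "evaluator": f"evaluate_cs{data_cs}_prediction",
    }


print(describe_run(1, 0))
# {'model': 'TinyStories', 'command_set': 'BaseModel', 'prompts': 'CS1', 'evaluator': 'evaluate_cs1_prediction'}
```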


In [None]:
model_num = 1                 # 1=TinyStories, 2=Qwen, 3=Llama
cs_num = 2                    # 0=BaseModel, 1=CS1, 2=CS2 or 3=CS3
use_synonyms_table = True     # Use synonyms for table names in the english Instructions
use_synonyms_field = True     # Use synonyms for field names in the english Instructions
max_new_tokens = 100          # Max number of tokens to generate
batch_size = 100              # Number of test examples to generate and score

# Load model

In [None]:
model = qts.load_tinysql_model(model_num, cs_num, auth_token=userdata.get("HF_TOKEN"))
clear_output()
print(model)

# Generate test data

In [None]:
# Generate a batch of prompts
examples = qts.generate_csn(batch_size=batch_size, csn=max(1,cs_num), min_cols=3, max_cols=5, use_synonyms_table=use_synonyms_table, use_synonyms_field=use_synonyms_field)

# Score generated outputs
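The scoring loop recovers the model's Response by decoding the whole generated sequence and removing the prompt text from the front. The sketch below uses hypothetical helper names. It shows that string-level approach alongside a token-level alternative, which slices off the first `len(input_ids)` token ids before decoding. The token-level version does not depend on the decoded text reproducing the prompt exactly, which a tokenizer round trip does not always guarantee. Plain Python lists and a toy one-word-per-id tokenizer stand in for tensors and a real tokenizer:

```python
from typing import List


def strip_prompt_text(decoded_output: str, prompt: str) -> str:
    """String-level: drop the prompt prefix from the decoded full output."""
    if not decoded_output.startswith(prompt):
        raise ValueError("decoded output does not start with the prompt")
    return decoded_output[len(prompt):]


def new_token_ids(output_ids: List[int], input_ids: List[int]) -> List[int]:
    """Token-level: keep only the ids generated after the prompt tokens."""
    if output_ids[: len(input_ids)] != input_ids:
        raise ValueError("output ids do not start with the input ids")
    return output_ids[len(input_ids):]


# Toy tokenizer for illustration only: one id per word.
vocab = {0: "SELECT", 1: "name", 2: "FROM", 3: "products", 4: "###", 5: "Response:"}


def toy_decode(ids: List[int]) -> str:
    return " ".join(vocab[i] for i in ids)


prompt_ids = [4, 5]
output_ids = [4, 5, 0, 1, 2, 3]
print(toy_decode(new_token_ids(output_ids, prompt_ids)))  # SELECT name FROM products
print(strip_prompt_text("### Response: SELECT name", "### Response: "))  # SELECT name
```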

In [None]:
score1_sum = 0   # Sum of evaluate_csN_prediction scores
score2_sum = 0   # Count of exact matches
for idx in range(batch_size):
    example = examples[idx]

    the_prompt = example.get_alpaca_prompt()
    #print("Run:", idx, "Model input:", the_prompt)

    # Tokenize the prompt and generate a full answer with nnsight
    inputs = model.tokenizer(the_prompt, return_tensors="pt", padding=True)
    with model.generate(inputs['input_ids'], max_new_tokens=max_new_tokens, pad_token_id=model.tokenizer.eos_token_id) as tracer:
        final_output = model.generator.output.save()

    # Decode prompt + generated tokens back to text
    final_output = final_output.detach().cpu().numpy()
    decoded_output = model.tokenizer.decode(final_output[0], skip_special_tokens=True)

    # The model's Response is whatever follows the prompt
    assert decoded_output.startswith(the_prompt)
    model_added = decoded_output[len(the_prompt):]
    #print("Run:", idx, "Model added:", model_added)

    # Score1: command-set-specific evaluator (the base model, cs_num=0, uses the CS1 evaluator)
    score1 = 0
    if cs_num == 0 or cs_num == 1:
        score1 = qts.evaluate_cs1_prediction(example, model_added)
    elif cs_num == 2:
        score1 = qts.evaluate_cs2_prediction(example, model_added)
    elif cs_num == 3:
        score1 = qts.evaluate_cs3_prediction(example, model_added)
    # Score2: exact string match with the expected SQL statement
    score2 = 1 if model_added == example.sql_statement else 0

    # Print any example that fails either score (score2 is 0 or 1, so "< 1" means no exact match)
    if score1 < 1 or score2 < 1:
        print("Expected:", example.sql_statement)
        print("Actual:", model_added)
        print("Run:", idx, "Score1:", score1, "Score2:", score2)
        print()

    score1_sum += score1
    score2_sum += score2

print()
print("Averages: Score1:", score1_sum/batch_size, "Score2:", score2_sum/batch_size)