In [None]:
from transformers import StoppingCriteria, StoppingCriteriaList
import torch

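class StopOnTokens(StoppingCriteria):
    """Stop generation when the output ends with any of the given token-ID sequences."""

    def __init__(self, stop_token_ids: list):
        super().__init__()
        # A list of token-ID lists, e.g. [[id1, id2, ...]] for a single stop string
        self.stop_token_ids = stop_token_ids

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Check whether the last len(stop) tokens equal any stop sequence.
        # Only the first sequence is inspected, so this assumes batch size 1.
        return any(input_ids[0, -len(stop):].tolist() == stop for stop in self.stop_token_ids)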
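The criterion above is a suffix comparison on token IDs. The pure-Python sketch below mirrors that test without torch; the helper names and token IDs are hypothetical, not real Qwen IDs. One hedged caveat: BPE tokenizers can encode `</response>` differently when it follows other text than when it is encoded on its own, so an ID-level match may occasionally miss. A string-level check on the decoded tail is a common fallback, and recent `transformers` releases also accept `stop_strings=[...]` together with `tokenizer=` in `generate`, if your installed version supports it.

```python
def ends_with_any(token_ids: list[int], stop_sequences: list[list[int]]) -> bool:
    """Return True if token_ids ends with any stop sequence (same test as StopOnTokens)."""
    for stop in stop_sequences:
        # Skip empty stop sequences and ones longer than the generated IDs
        if stop and len(stop) <= len(token_ids) and token_ids[-len(stop):] == stop:
            return True
    return False


def ends_with_stop_string(decoded_tail: str, stop_string: str) -> bool:
    """String-level fallback: check the decoded text instead of raw token IDs."""
    return decoded_tail.rstrip().endswith(stop_string)


# Hypothetical token IDs, for illustration only
stops = [[27, 522, 2322, 29]]
print(ends_with_any([10, 11, 27, 522, 2322, 29], stops))  # True
print(ends_with_any([10, 11, 27, 522], stops))            # False
print(ends_with_stop_string("72 clips.\n</response>", "</response>"))  # True
```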

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("eagle0504/qwen-distilled-scout-1.5b-instruct-gen1")
tokenizer = AutoTokenizer.from_pretrained("eagle0504/qwen-distilled-scout-1.5b-instruct-gen1")

# Example stop sequence
stop_sequence = "</response>"
stop_ids = tokenizer.encode(stop_sequence, add_special_tokens=False)
stopping_criteria = StoppingCriteriaList([StopOnTokens([stop_ids])])

Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.

For math problems, use instruction: `<instruction>This is a math problem.</instruction>`
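Every request in this notebook uses the same two-tag layout, so a small helper keeps the tags consistent. `build_prompt` is a hypothetical name; it produces the same string as the hand-written prompts below and as `structured_prompt` inside `run_inference`.

```python
def build_prompt(kind: str, question: str) -> str:
    """Wrap a question in the <instruction>/<question> tags used in this notebook.

    kind is "math", "SQL" or "python", matching the instruction strings shown here.
    """
    return (
        f"<instruction>This is a {kind} problem.</instruction>"
        f"<question>{question}</question>"
    )


example_prompt = build_prompt("math", "What is 2 + 3?")
print(example_prompt)
# <instruction>This is a math problem.</instruction><question>What is 2 + 3?</question>
```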

In [None]:
%%time

prompt = (
    "<instruction>This is a math problem.</instruction>"
    "<question>Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?</question>"
)

# Run generation with stop sequence
inputs = tokenizer(
    prompt,
    return_tensors="pt"
)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    stopping_criteria=stopping_criteria
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


<instruction>This is a math problem.</instruction><question>Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?</question><think>
To solve this problem, we need to calculate the total number of clips Natalia sold in April and May. Here's the step-by-step reasoning:

1. **Clips sold in April**: The problem states that Natalia sold clips to 48 of her friends in April. This means the number of clips sold in April is 48.

2. **Clips sold in May**: The problem mentions that Natalia sold half as many clips in May as she did in April. Since she sold 48 clips in April, half of that number is 24. So, she sold 24 clips in May.

3. **Total clips sold**: To find the total number of clips sold in April and May, we add the number of clips sold in each month: 48 (April) + 24 (May) = 72.

Thus, Natalia sold a total of 72 clips in April and May.
</think>

<response>
Natalia sold a total of 72 clips i
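The decoded text echoes the prompt, then wraps the model's reasoning in `<think>` and its answer in `<response>`. Because generation can end before a closing tag is written, a parser that tolerates a missing closing tag is handy. This is a sketch with a hypothetical helper name, using only the standard `re` module.

```python
import re
from typing import Optional


def extract_section(text: str, tag: str) -> Optional[str]:
    """Return the text inside <tag>...</tag>, or everything after <tag> if it is never closed."""
    t = re.escape(tag)
    # Lazy match up to the closing tag, or to the end of the string if it is missing
    match = re.search(rf"<{t}>(.*?)(?:</{t}>|$)", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


sample = "<think>\nadd 48 and 24\n</think>\n\n<response>\nNatalia sold a total of 72 clips."
print(extract_section(sample, "think"))     # add 48 and 24
print(extract_section(sample, "response"))  # Natalia sold a total of 72 clips.
```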

For SQL problems, use instruction: `<instruction>This is a SQL problem.</instruction>`

In [None]:
%%time

prompt = (
    "<instruction>This is a SQL problem.</instruction>"
    "<question>What is the total trade value and average price for each trader and stock in the trade_history table?</question>"
)

# Run generation with stop sequence
inputs = tokenizer(
    prompt,
    return_tensors="pt"
)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    stopping_criteria=stopping_criteria
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


<instruction>This is a SQL problem.</instruction><question>What is the total trade value and average price for each trader and stock in the trade_history table?</question><think>
To answer the question, we need to calculate the total trade value and average price for each trader and stock using the `trade_history` table. Here's the reasoning:

1. **Total Trade Value**: The total trade value for each trader and stock can be calculated by summing the product of the quantity and price for each trade. This is done using the `SUM` function on the `quantity * price` column, grouped by `trader` and `stock`.

2. **Average Price**: The average price for each trader and stock can be calculated by averaging the `price` column, grouped by `trader` and `stock`.

3. **Preventive Coding**: To avoid SQL injection, we use `execEditor` to prevent SQL injection attacks. This is a best practice in database interactions.

4. **Code Structure**: The code is structured to define the `trader`, `stock`, and th
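Steps 1 and 2 of the reasoning above describe a plain `GROUP BY` aggregation. The sketch below runs that query against an in-memory SQLite table; the notebook never defines `trade_history`, so its columns (`trader`, `stock`, `quantity`, `price`) and the sample rows are assumptions. With the standard `sqlite3` module, protection against SQL injection for user-supplied values comes from parameter binding (the `?` placeholders).

```python
import sqlite3

# Hypothetical schema and data, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trade_history (trader TEXT, stock TEXT, quantity INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO trade_history VALUES (?, ?, ?, ?)",
    [
        ("alice", "AAPL", 10, 100.0),
        ("alice", "AAPL", 5, 110.0),
        ("bob", "MSFT", 2, 300.0),
    ],
)

query = """
SELECT trader,
       stock,
       SUM(quantity * price) AS total_trade_value,
       AVG(price)            AS average_price
FROM trade_history
GROUP BY trader, stock
ORDER BY trader, stock
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('alice', 'AAPL', 1550.0, 105.0)
# ('bob', 'MSFT', 600.0, 300.0)
```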

For Python problems, use instruction: `<instruction>This is a python problem.</instruction>`

In [None]:
%%time

prompt = (
    "<instruction>This is a python problem.</instruction>"
    "<question>Write a replace method for a string class which replaces the given string with a given set of characters. \nstring = 'Hello World!' \nreplace_with = 'Greetings!'</question>"
)

# Run generation with stop sequence
inputs = tokenizer(
    prompt,
    return_tensors="pt"
)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    stopping_criteria=stopping_criteria
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


<instruction>This is a python problem.</instruction><question>Write a replace method for a string class which replaces the given string with a given set of characters. 
string = 'Hello World!' 
replace_with = 'Greetings!'</question><think>
To solve this problem, we need to create a `replace` method for a string class that replaces spaces in the string with a given set of characters. Here's the reasoning:

1. **Understand the Problem**: We need to replace all spaces in the string with the string `replace_with`. For example, if the input string is "Hello World!" and `replace_with` is "Greetings!", the output should be "HelloGreetings!World!".

2. **Method Signature**: The method should take `self` (since it's a method of a string class) and `replace_with` as parameters.

3. **Iterate Through the String**: We need to loop through each character in the string (`self`).

4. **Check for Spaces**: For each character, check if it is a space (`" "`). If it is, append the `replace_with` string t
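The question is ambiguous; the reasoning above reads it as "replace every space with `replace_with`". A minimal sketch of that interpretation follows. `MyString` and `replace_spaces` are hypothetical names, chosen so the built-in `str.replace` is not shadowed.

```python
class MyString(str):
    """Hypothetical str subclass: replaces every space with the given text."""

    def replace_spaces(self, replace_with: str) -> "MyString":
        pieces = []
        for ch in self:  # iterate through each character of the string
            # Use the replacement for spaces, otherwise keep the character
            pieces.append(replace_with if ch == " " else ch)
        return MyString("".join(pieces))


s = MyString("Hello World!")
print(s.replace_spaces("Greetings!"))  # HelloGreetings!World!
```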

If you are interested in subscribing to this API, please use this [payment plan](https://buy.stripe.com/cNieVcePOcTt9rFdLa43S0a). 💳💡

For questions, please email: `yiqiao.yin@wyn-associates.com`. 📧❓


In [15]:
from google.colab import userdata
RUNPOD_API_KEY = userdata.get('RUNPOD_API_KEY')

In [20]:
import requests
import time

def run_inference(api_key: str, endpoint_id: str, instruction: str, prompt: str) -> None:
    """
    Submits a job to RunPod and polls until the result is ready.

    Args:
        api_key (str): Your RunPod API key.
        endpoint_id (str): The ID of the RunPod endpoint.
        instruction (str): The instruction for the problem, e.g. "math", "SQL", or "python".
        prompt (str): The prompt/question to send for inference.

    Returns:
        None. Prints result to console.
    """
    # Submit the job
    structured_prompt = (
        f"<instruction>This is a {instruction} problem.</instruction>"
        f"<question>{prompt}</question>"
    )
    url = f'https://api.runpod.ai/v2/{endpoint_id}/run'
    payload = {
        "input": {
            "prompt": structured_prompt,
            "sampling_params": {
                "temperature": 0.8,
                "max_tokens": 1024,
                "stop": "</response>"
            }
        }
    }
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {api_key}'
    }

    submit_response = requests.post(url, headers=headers, json=payload, timeout=30)
    if submit_response.status_code != 200:
        print("Failed to submit job:", submit_response.text)
        return

    job_id = submit_response.json().get('id')
    if not job_id:
        print("No job ID returned.")
        return

    print(f"Job submitted successfully. Job ID: {job_id}")

    # Poll for status
    status_url = f'https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}'
    while True:
        res = requests.get(status_url, headers=headers, timeout=30)
        result = res.json()
        print("Status:", result['status'])

        if result['status'] == 'COMPLETED':
            print("Output:", result['output'])
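            try:
                # The stop string is typically excluded from the returned text, so append it back
                output_text = result['output'][0]["choices"][0]["tokens"][0] + "</response>"
                print("Answer:", output_text)
            except (KeyError, IndexError, TypeError) as e:
                print("Unexpected output format:", e)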
            break
        elif result['status'] in ['FAILED', 'CANCELLED']:
            print("Job failed or was cancelled:", result)
            break

        time.sleep(2)  # wait before polling again

# === Usage: Ask Yiqiao Yin ===
API_KEY = "XXX"       # your RunPod API key (e.g. RUNPOD_API_KEY loaded above)
ENDPOINT_ID = "XXX"   # your RunPod endpoint ID
PROMPT = "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"

# run_inference(API_KEY, ENDPOINT_ID, "math", PROMPT)
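The `while True` loop in `run_inference` never gives up if a job stays in progress. Below is a sketch of the same polling protocol with a deadline. `poll_until_done` is a hypothetical helper, and the terminal statuses are the ones `run_inference` already checks. Taking a callable instead of a URL keeps it testable without network access.

```python
import time
from typing import Callable


def poll_until_done(
    get_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
) -> dict:
    """Call get_status() until the job reaches a terminal state or the timeout expires.

    get_status is any zero-argument callable returning the parsed status JSON, e.g.
    lambda: requests.get(status_url, headers=headers, timeout=30).json()
    """
    deadline = time.monotonic() + timeout
    while True:
        result = get_status()
        status = result.get("status")
        if status in ("COMPLETED", "FAILED", "CANCELLED"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still {status!r} after {timeout} s")
        time.sleep(interval)  # wait before polling again


# Usage with a fake status source (no network needed)
responses = iter([{"status": "IN_PROGRESS"}, {"status": "IN_PROGRESS"}, {"status": "COMPLETED", "output": []}])
final = poll_until_done(lambda: next(responses), interval=0.01, timeout=5)
print(final["status"])  # COMPLETED
```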

In [23]:
%%time

# Once the endpoint is warm, a request should take only a few seconds; running the same model on a Colab T4 GPU would probably take about 2 minutes per inference.

PROMPT = "Give me a py function of fibonacci sequence. "

run_inference(API_KEY, ENDPOINT_ID, "python", PROMPT)

Job submitted successfully. Job ID: 3a0a1349-fdcd-405b-ada1-0b9e7bfc2ebb-e2
Status: IN_PROGRESS
Status: IN_PROGRESS
Status: COMPLETED
Output: [{'choices': [{'tokens': ["<think>To get a Python function that generates the Fibonacci sequence, let's think about how the sequence works. The Fibonacci sequence starts with 0 and 1, and each subsequent number is the sum of the two preceding ones. So, we can use this logic to create a function.\n\nFirst, I need to handle the base cases. If the user's argument `n` is 0 or 1, I should return `n` directly since the 0th element is 0 and the 1st element is 1.\n\nNow, for `n > 1`, I need to start building the sequence. I'll initialize a list with the first two elements, `[0, 1]`, called `fib_list`. Then, I'll use a `for` loop to generate the rest of the sequence. The loop will run from 2 to `n-1` because the list already includes the first two elements.\n\nInside the loop, I'll append the sum of the two elements in the list to it. This will continue u
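The output printed above nests the generated text under `output[0]["choices"][0]["tokens"]`. A defensive extractor that returns `None` for any other shape is sketched below; the helper name is hypothetical, and joining the `tokens` list (in case the text arrives in several pieces) is an assumption about the worker's output.

```python
from typing import Any, Optional


def extract_generated_text(output: Any) -> Optional[str]:
    """Return the generated text from [{'choices': [{'tokens': [...]}]}], else None."""
    try:
        return "".join(output[0]["choices"][0]["tokens"])
    except (IndexError, KeyError, TypeError):
        return None


sample = [{"choices": [{"tokens": ["<think>...</think><response>fib</response>"]}]}]
print(extract_generated_text(sample))        # <think>...</think><response>fib</response>
print(extract_generated_text({"error": 1}))  # None
```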
