<a href="https://colab.research.google.com/github/wandb/aihackercup/blob/main/one_shot_solver.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# A simple one-shot solver for the AI Hacker Cup 2024 Qualification Round

## Setup 

**Note: You only need to run this cell once.**
It clones the starter-kits repo, changes into the cloned repo folder, and installs the project's dependencies.

**You can comment out the cell after you have run it once.**

In [None]:
# # Clone the starter-kits repo
# !git clone https://github.com/wandb/aihackercup
# # Change into the cloned repo folder. Running the next line twice in the same session will fail, because you are already inside it.
# %cd aihackercup
# # Install dependencies
# !pip install -r requirements.txt -qq

To run this colab, create a [free Weights & Biases (W&B) account here](https://wandb.ai/site?utm_source=colab&utm_medium=code&utm_campaign=lightning-ai-hacker-cup) and then copy your API key from https://wandb.ai/authorize into the input box below when requested.

In [1]:
import os
import weave

WEAVE_PROJECT = "ai-hacker-cup-benchmark"
weave_client = weave.init(WEAVE_PROJECT)

Logged in as Weights & Biases user: morgan.
View Weave data at https://wandb.ai/morgan/ai-hacker-cup-benchmark/weave


In [2]:
os.environ["FAST_LLM"] = "gpt-4o-2024-08-06"
os.environ["STRONG_LLM"] = "gpt-4o-2024-08-06"


# URL for the MistralAI API (uncomment to use it)
# os.environ["BASE_URL"] = "http://195.242.25.198:8000/v1"
# os.environ["API_KEY"] = "dummy_key"

# Set the max tokens for the models and how many parallel requests to make in Weave Evaluations
# os.environ["MAX_TOKENS"] = "4096"
os.environ["WEAVE_PARALLELISM"] = "5"
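`WEAVE_PARALLELISM` controls how many dataset rows Weave evaluates in parallel, which in turn bounds how many LLM requests are in flight at once. The sketch below illustrates the general idea of such a concurrency cap with `asyncio.Semaphore`; it is not Weave's internal implementation, and `fake_llm_call` and `run_all` are hypothetical stand-ins for real API calls.

```python
import asyncio
import os

# Read the cap the same way the notebook sets it; fall back to 5 if unset
PARALLELISM = int(os.environ.get("WEAVE_PARALLELISM", "5"))

in_flight = 0       # requests currently running
peak_in_flight = 0  # highest number of simultaneous requests observed


async def fake_llm_call(i: int, limit: asyncio.Semaphore) -> int:
    """Hypothetical stand-in for one LLM request."""
    global in_flight, peak_in_flight
    async with limit:  # at most PARALLELISM coroutines get past this line at once
        in_flight += 1
        peak_in_flight = max(peak_in_flight, in_flight)
        await asyncio.sleep(0.01)  # simulate network latency
        in_flight -= 1
        return i


async def run_all(n: int) -> list:
    limit = asyncio.Semaphore(PARALLELISM)
    # gather starts all n coroutines, but the semaphore lets only PARALLELISM run at a time
    return await asyncio.gather(*(fake_llm_call(i, limit) for i in range(n)))


results = asyncio.run(run_all(20))
print(f"{len(results)} calls, peak concurrency {peak_in_flight}")
```

Run it as a standalone script; inside Jupyter an event loop is already running, so use `await run_all(20)` instead of `asyncio.run(...)`. A higher cap finishes faster but is more likely to hit API rate limits.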

In [3]:
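import asyncio
import logging

from dotenv import load_dotenv

# Load API keys from a local .env file before importing the helpers,
# in case `utils` creates its API clients at import time
load_dotenv()

# Helper classes and functions from the starter kit's utils module
from utils import Problem, async_client, oai_client, format_response, check_correctness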

In [4]:
# Load the practice problems from a published Weave dataset
practice_dataset_uri = "weave:///parambharat/hackercup/object/practice_dataset:R35fXf9N3FE2IOesg7bRPaPAxiE9YbpirhXO9HcHs8w"
problems_dataset = weave.ref(practice_dataset_uri).get().rows[:]
problems = [Problem(**row) for row in problems_dataset]

In [5]:
from pydantic import BaseModel, Field

class Solution(BaseModel):
    core_question: str = Field(..., description="Core question of the problem")
    problem_solving_info: str = Field(..., description="Problem-solving information related to the core question")
    plan: str = Field(..., description="Step-by-step plan to solve the problem")
    pseudocode: str = Field(..., description="Pseudocode to solve the problem")
    source_code: str = Field(..., description="Valid Python 3 source code to solve the problem.")
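The `Solution` model doubles as a schema for structured output: `format_response` from the starter kit's `utils` returns an instance of it, which is why the solver can later read `solution.source_code`. The sketch below, assuming Pydantic v2 (the notebook already calls `model_dump()`), shows how such a schema validates a JSON payload and rejects an incomplete one; the sample payload is invented for illustration.

```python
import json

from pydantic import BaseModel, Field, ValidationError


class Solution(BaseModel):
    core_question: str = Field(..., description="Core question of the problem")
    problem_solving_info: str = Field(..., description="Problem-solving information related to the core question")
    plan: str = Field(..., description="Step-by-step plan to solve the problem")
    pseudocode: str = Field(..., description="Pseudocode to solve the problem")
    source_code: str = Field(..., description="Valid Python 3 source code to solve the problem.")


# An invented payload, shaped like what a structured-output call might return
payload = json.dumps({
    "core_question": "Sum two integers.",
    "problem_solving_info": "Read two ints from stdin.",
    "plan": "1. Read input. 2. Print the sum.",
    "pseudocode": "read a, b; print a + b",
    "source_code": "a, b = map(int, input().split())\nprint(a + b)",
})

solution = Solution.model_validate_json(payload)
print(solution.source_code)

# Every field uses `...` (required), so the JSON schema lists all five as required
schema = Solution.model_json_schema()
print(sorted(schema["required"]))

# A partial answer fails validation: one error per missing field
missing = None
try:
    Solution.model_validate({"core_question": "incomplete"})
except ValidationError as err:
    missing = err.error_count()
print(f"{missing} fields missing")
```

Because all fields are required, a response that skips the plan or pseudocode fails validation instead of silently producing an empty `source_code`.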

In [6]:
system_prompt = """
You are a world-class competitive programmer tasked with solving a programming problem. 
You will be provided with a problem statement, and you need to create a Python3 solution for it. 
Your task is to develop a winning solution to the problem in the Python3 programming language.
You will do this in a step-by-step manner.

Step 1: Extract the core question and the problem-solving information from the problem statement.
Step 2: Generate a step by step plan to solve the problem.
Step 3: Generate the pseudocode to solve the problem.
Step 4: Write the final solution in Python3 programming language to solve the problem.

Competition Guidelines:
    a. Do not use any external libraries; stick to the Python 3 standard library.
    b. Handle input and output using standard input/output (stdin/stdout).
    c. Use helper functions to improve the readability of the code.
    d. Use the `input()` function to read input from stdin and print the output to stdout.
    e. Do not add extra print statements, otherwise the code will fail the test cases.
    f. Make sure your code passes all potential test cases, including edge cases.
    g. Follow the input/output format specified in the problem statement and the sample test cases."""

prompt_template = """
Let's think step by step to solve the problem:

Problem: 
{problem_description}

Input: 
{sample_input}

Output: 
{sample_output}
"""
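To see exactly what the model receives, you can render the user prompt with a toy problem. The problem text below is invented for illustration; in the notebook the three fields come from a `Problem` object.

```python
# Copy of the notebook's user-prompt template
prompt_template = """
Let's think step by step to solve the problem:

Problem:
{problem_description}

Input:
{sample_input}

Output:
{sample_output}
"""

# Hypothetical toy problem standing in for a real Problem object
toy = {
    "problem_description": "Given T test cases, each with two integers A and B, print their sum.",
    "sample_input": "2\n1 2\n3 4",
    "sample_output": "Case #1: 3\nCase #2: 7",
}

user_prompt = prompt_template.format(**toy)
print(user_prompt)
```

`str.format` only interprets braces in the template itself, so problem statements that contain `{` or `}` are inserted verbatim; any literal braces added to the template would need to be doubled (`{{`, `}}`).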

In [7]:
import openai

In [18]:

@weave.op
async def one_shot_solver(
    problem: Problem,
    llm_model: str,
    system_prompt: str,
    prompt_template: str,
    temperature: float = 0.7,
    timeout: int = 10,
) -> dict:
    logging.info(f"Solving problem: {problem.problem_name}")

    # Call the model once to draft a complete solution
    logging.info("Calling model to solve the problem")
    # o1_client = openai.AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    user_prompt = prompt_template.format(
        problem_description=problem.problem_description,
        sample_input=problem.sample_input,
        sample_output=problem.sample_output,
    )
    model_output = await oai_client.chat.completions.create(
        model=llm_model,  # e.g. "gpt-4o-2024-08-06" or "o1-preview"
        messages=[
            {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
            {"role": "user", "content": [{"type": "text", "text": user_prompt}]},
        ],
        # temperature is left unset because o1-preview only accepts the default value
        # temperature=temperature,
    )

    out = model_output.choices[0].message.content

    # Parse the free-form answer into a structured Solution
    logging.info("Formatting the response")
    solution = await format_response(out, Solution)

    # Check the generated code against the sample input and output
    logging.info("Checking if the code is correct")
    test_report = await check_correctness(
        solution.source_code,
        problem.sample_input,
        problem.sample_output,
        timeout=timeout,
    )

    return {"solution": solution, "test_report": test_report}
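`check_correctness` comes from the starter kit's `utils` module, and its source is not shown here. As a rough mental model, a sample-test check runs the generated program with the sample input on stdin and compares its stdout with the sample output. The sketch below is a hypothetical stand-in, not the kit's implementation: `run_sample_test` and its `"passed"` / `"failed"` / `"timeout"` statuses are assumptions, chosen to mirror how the scorer compares `test_report.status` with `"passed"`.

```python
import subprocess
import sys


def run_sample_test(source_code: str, sample_input: str, sample_output: str, timeout: float = 10) -> dict:
    """Hypothetical sample-test check: run the code in a fresh Python process and compare stdout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", source_code],
            input=sample_input,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "stderr": ""}

    # A crash (non-zero exit code) counts as a failure
    if result.returncode != 0:
        return {"status": "failed", "stderr": result.stderr}

    # Compare line by line, ignoring trailing whitespace and trailing blank lines
    actual = [line.rstrip() for line in result.stdout.strip().splitlines()]
    expected = [line.rstrip() for line in sample_output.strip().splitlines()]
    status = "passed" if actual == expected else "failed"
    return {"status": status, "stderr": result.stderr}


# A toy generated solution in the Hacker Cup "Case #i:" output style
code = """
t = int(input())
for i in range(1, t + 1):
    a, b = map(int, input().split())
    print(f"Case #{i}: {a + b}")
"""

print(run_sample_test(code, "2\n1 2\n3 4\n", "Case #1: 3\nCase #2: 7")["status"])
```

Passing the sample test is a weak signal: the official inputs are much larger, so a solution can pass here and still be wrong or too slow. Also note that this sketch runs model-generated code without any sandboxing; untrusted code is safer to execute in an isolated container.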

## Evaluation

In [19]:
# STRONG_LLM = "o1-preview"
STRONG_LLM = "gpt-4o-2024-08-06"

In [20]:
class OneShotSolver(weave.Model):
    code_execution_timeout: int = 30
    llm_model: str = STRONG_LLM
    system_prompt: str = system_prompt
    prompt_template: str = prompt_template
    # temperature: float = None

    @weave.op
    async def predict(self, problem: dict):
        return await one_shot_solver(
            problem=Problem(**problem), 
            llm_model=self.llm_model,
            system_prompt=self.system_prompt, 
            prompt_template=self.prompt_template, 
            timeout=self.code_execution_timeout,
            # temperature=self.temperature
        )

In [21]:
evals_dataset = [{"problem": problem.model_dump(), "expected_result": "passed"} for problem in problems]

In [22]:
@weave.op
def scorer(expected_result: str, model_output: dict) -> dict:
    if model_output is None or model_output["test_report"].status is None:
        return {"solution_passed": False}
    return {"solution_passed": expected_result == model_output["test_report"].status} # check if the test_report status == passed

In [24]:
model = OneShotSolver()

evaluator = weave.Evaluation(dataset=evals_dataset, scorers=[scorer], trials=1)

results = await evaluator.evaluate(model)
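Each row's scorer output is a dict such as `{"solution_passed": True}`, and the headline number is the share of problems whose generated code passed its sample test. Weave computes its own summary for the evaluation; the helper below is a hypothetical local sketch of the same pass-rate arithmetic, for example to sanity-check a handful of results offline. The example scores are invented.

```python
def pass_rate(scores: list) -> float:
    """Fraction of scorer outputs with solution_passed == True (hypothetical helper)."""
    if not scores:
        return 0.0
    passed = sum(1 for s in scores if s.get("solution_passed") is True)
    return passed / len(scores)


# Invented example: 3 of 5 problems passed their sample tests
example_scores = [
    {"solution_passed": True},
    {"solution_passed": False},
    {"solution_passed": True},
    {"solution_passed": True},
    {"solution_passed": False},
]
print(f"pass rate: {pass_rate(example_scores):.0%}")  # pass rate: 60%
```

With `trials=1` each problem is attempted once, so a single run's pass rate can swing noticeably between runs at non-zero temperature; raising `trials` averages over repeated attempts.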