# LLM-Reasoners Demo

This notebook accompanies our tutorial at SIGIR VF:
[[slides](https://www.llm-reasoners.net/2024-02-Reasoners-SIGIR.pdf)]
[[video](https://www.youtube.com/watch?v=d_x2pzEHGQY&pp=ygUJc2hpYm8gaGFv) (starting at 37:20)]

## Setup

The following code assumes you have cloned our library with `git clone https://github.com/maitrix-org/llm-reasoners.git --recursive`.

This notebook has been tested with several models.
- By default, the output cells show results from [`TheBloke/Llama-2-70B-GPTQ`](https://huggingface.co/TheBloke/Llama-2-70B-GPTQ), a GPTQ-quantized model.
- You can also use [`meta-llama/Llama-3.1-8B`](https://huggingface.co/meta-llama/Llama-3.1-8B) through either SGLang or Hugging Face (the loading code for both is shown below), but the results will differ from the output cells.

In [5]:
# This block would load Llama-2-70B-GPTQ with Exllama

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'
from reasoners.lm import ExLlamaModel
import torch

# https://huggingface.co/TheBloke/Llama-2-70B-GPTQ
# It may take a few minutes to download the model

model = ExLlamaModel(model_dir='TheBloke/Llama-2-70B-GPTQ',
                     lora_dir=None,
                     device = torch.device("cuda:0"),
                     max_batch_size=1,
                     max_new_tokens=200,
                     mem_map=[16,22], # For 2 * 24GB GPUs. If you have > 40GB you can set it to None
                     max_seq_length=2048)

KeyboardInterrupt: 

In [None]:
# Huggingface

import torch
from reasoners.lm import HFModel
model_name = "meta-llama/Llama-3.1-8B"
model = HFModel(model_name, model_name, device=torch.device("cuda:0"), max_batch_size=1, max_new_tokens=512)

Loading checkpoint shards: 100%|██████████| 4/4 [00:12<00:00,  3.23s/it]


In [None]:
# SGLang model
# You'll need to first set up a server with the [SGLang](https://github.com/sgl-project/sglang) API.

import os
from reasoners.lm import SGLangModel
os.environ["SGLANG_API_URL"] = "http://127.0.0.1:30001"
os.environ["OPENAI_API_KEY"] = "None"
model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model = SGLangModel(model_name, is_instruct_model = True)



We load an example from the Blocksworld dataset, along with the prompt that holds the in-context learning demonstrations.
We will talk more about Evaluators later.
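
The prompt construction used in the cells below boils down to string substitution on three placeholders: `<init_state>`, `<goals>`, and `<action>`. The following is a minimal sketch of that step. `TEMPLATE`, `fill_prompt`, and `toy_example` are hypothetical stand-ins for illustration; the real template comes from `pool_prompt_v1.json` and is much longer.

```python
# Hypothetical, shortened template; the real one is loaded from pool_prompt_v1.json.
TEMPLATE = (
    "[STATEMENT]\n"
    "As initial conditions I have that, <init_state>.\n"
    "My goal is to have that <goals>.\n\n"
    "My plan is as follows:\n\n"
    "[PLAN]<action>"
)


def fill_prompt(template: str, init_state: str, goals: str, action: str = "") -> str:
    """Substitute the three placeholders used throughout this notebook."""
    return (template.replace("<init_state>", init_state)
                    .replace("<goals>", goals)
                    .replace("<action>", action))


# Toy example with the same keys ("init", "goal") as the dataset entries.
toy_example = {
    "init": "the orange block is on top of the red block",
    "goal": "the red block is on top of the blue block",
}

filled = fill_prompt(TEMPLATE, toy_example["init"], toy_example["goal"])
print(filled)
```

Leaving `<action>` empty, as above, asks the model to write the whole plan; the search-based methods later in the notebook fill it with the actions chosen so far.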

In [None]:
len(evaluator.full_dataset[:1000])

84

In [None]:
from reasoners.benchmark import BWEvaluator
import json

with open('examples/CoT/blocksworld/prompts/pool_prompt_v1.json') as f:
    prompt = json.load(f)
evaluator = BWEvaluator(config_file='examples/CoT/blocksworld/data/bw_config.yaml',
                        domain_file='examples/CoT/blocksworld/data/generated_domain.pddl',
                        data_path='examples/CoT/blocksworld/data/split_v1/split_v1_step_4_data.json',
                        init_prompt=prompt)
prompt = evaluator.sample_prompt(shuffle_prompt=False, num_shot=4)
example = evaluator.full_dataset[:10]
# cot_inputs = (prompt['icl'].replace('<init_state>', example["init"])
#                            .replace('<goals>', example["goal"])
#                            .replace('<action>', ''))

In [None]:
cot_inputs = []
for e in example:
    temp= (prompt['icl'].replace('<init_state>', e["init"])
                        .replace('<goals>', e["goal"])
                        .replace('<action>', ''))
    cot_inputs.append(temp)

NameError: name 'example' is not defined

In [None]:
output = model.generate([cot_inputs],
                        hide_input=True,
                        eos_token_id='\n[').text[0][:-1].strip()

In [None]:
import os
import shutil
from multiprocessing import Pool
import reasoners.benchmark.bw_utils as bw_utils

def clear_directory(directory):
    if os.path.exists(directory):
        shutil.rmtree(directory) 
    os.makedirs(directory) 

def process_input(args):
    index, input_data = args
    
    output = model.generate([input_data],
                             hide_input=True,
                             eos_token_id=['\n[']).text[0][:-1].strip()
    
    output_dir = "CoT_results"
    
    output_file = os.path.join(output_dir, f"process_{index}.txt")
    
    with open(output_file, 'w') as file:
        file.write(output)
    
    return output_file

if __name__ == "__main__":
    num_processes = 10
    output_dir = "CoT_results"

    clear_directory(output_dir)

    inputs_with_indices = list(enumerate(cot_inputs))

    with Pool(num_processes) as pool:
        results = pool.map(process_input, inputs_with_indices)
    
    print("Files saved:", results)


Files saved: ['CoT_results/process_0.txt', 'CoT_results/process_1.txt', 'CoT_results/process_2.txt', 'CoT_results/process_3.txt', 'CoT_results/process_4.txt', 'CoT_results/process_5.txt', 'CoT_results/process_6.txt', 'CoT_results/process_7.txt', 'CoT_results/process_8.txt', 'CoT_results/process_9.txt']


In [None]:
import reasoners.benchmark.bw_utils as bw_utils

bw_utils.text_to_plan_blocksworld(output, example["instance_file"], 'examples/CoT/blocksworld/data/bw_config.yaml', 'examples/CoT/blocksworld/data/generated_domain.pddl', 'tmp_plan2.txt')

[+]: Saving plan in tmp_plan2.txt


('(pick-up b)\n(stack b c)\n(pick-up d)\n(stack d c)\n',
 '(pick-up blue)\n(stack blue orange)\n(pick-up yellow)\n(stack yellow orange)\n')

In [None]:
example

{'init': 'the blue block is clear, the orange block is clear, the hand is empty, the orange block is on top of the red block, the red block is on the table and the blue block is on the table',
 'goal': 'the red block is on top of the blue block',
 'plan': '\nunstack the orange block from on top of the red block\nput down the orange block\npick up the red block\nstack the red block on top of the blue block\n[PLAN END]\n',
 'question': '\n[STATEMENT]\nAs initial conditions I have that, the blue block is clear, the orange block is clear, the hand is empty, the orange block is on top of the red block, the red block is on the table and the blue block is on the table.\nMy goal is to have that the red block is on top of the blue block.\n\nMy plan is as follows:\n\n[PLAN]\n',
 'instance_file': 'LLMs-Planning/llm_planning_analysis/instances/blocksworld/generated_basic_3/instance-52.pddl'}

In [None]:
len(example)

76

In [None]:
cot_inputs = []
for e in example:
    temp= (prompt['icl'].replace('<init_state>', e["init"])
                        .replace('<goals>', e["goal"])
                        .replace('<action>', ''))
    cot_inputs.append(temp)

TypeError: string indices must be integers

In [None]:
len(cot_inputs)

3719

In [None]:
for i in cot_inputs:
    temp = model.generate([i],
                            hide_input=True,
                            eos_token_id=['\n[']).text[0][:-1].strip()

KeyboardInterrupt: 

In [None]:
from multiprocessing import Pool

def process_input(i):
    temp = model.generate([i],
                          hide_input=True,
                          eos_token_id=['\n[']).text[0][:-1].strip()
    return temp


if __name__ == "__main__":
    num_processes = 100

    with Pool(num_processes) as pool:
        results = pool.map(process_input, cot_inputs)
    print(results)


['pick up the blue block\nstack the blue block on top of the orange block', 'pick up the red block\nstack the red block on top of the blue block', 'pick up the blue block\nstack the blue block on top of the orange block\npick up the yellow block\nstack the yellow block on top of the orange block', 'unstack the blue block from on top of the yellow block\nput down the blue block\npick up the red block\nstack the red block on top of the yellow block\npick up the blue block\nstack the blue block on top of the orange block', 'pick up the blue block\nstack the blue block on top of the red block', 'pick up the blue block\nstack the blue block on top of the red block', 'pick up the blue block\nstack the blue block on top of the orange block\npick up the yellow block\nstack the yellow block on top of the red block', 'unstack the orange block from on top of the yellow block\nput down the orange block\npick up the blue block\nstack the blue block on top of the orange block', 'pick up the yellow b

In [None]:
results[83]

'pick up the white block\nstack the white block on top of the red block\npick up the yellow block\nstack the yellow block on top of the blue block\npick up the orange block\nstack the orange block on top of the blue block'

Here is the example.

In [None]:
print(example['init'])

the blue block is clear, the orange block is clear, the hand is empty, the orange block is on top of the red block, the red block is on the table and the blue block is on the table


In [None]:
print(example['goal'])

the red block is on top of the blue block


## Chain-of-Thought
We first experiment with the Chain-of-Thought method.
Since this is the simplest generation algorithm, we directly ask the model to generate all the steps at once.
Let's look at the 4-shot prompt and the generated answer.

In [None]:
print(cot_inputs)

I am playing with a set of blocks where I need to arrange the blocks into stacks. Here are the actions I can do

Pick up a block
Unstack a block from on top of another block
Put down a block
Stack a block on top of another block

I have the following restrictions on my actions:
I can only pick up or unstack one block at a time.
I can only pick up or unstack a block if my hand is empty.
I can only pick up a block if the block is on the table and the block is clear. A block is clear if the block has no other blocks on top of it and if the block is not picked up.
I can only unstack a block from on top of another block if the block I am unstacking was really on top of the other block.
I can only unstack a block from on top of another block if the block I am unstacking is clear.
Once I pick up or unstack a block, I am holding the block.
I can only put down a block that I am holding.
I can only stack a block on top of another block if I am holding the block being stacked.
I can only stack a 

In [None]:
output = model.generate([cot_inputs],
                        hide_input=True,
                        eos_token_id='\n[').text[0][:-1].strip()

In [None]:
import reasoners.benchmark.bw_utils as bw_utils

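# Convert the generated text into a PDDL plan, using the same files as the evaluator above.
bw_utils.text_to_plan_blocksworld(output, example["instance_file"], 'examples/CoT/blocksworld/data/bw_config.yaml', 'examples/CoT/blocksworld/data/generated_domain.pddl', 'tmp_plan2.txt')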

In [None]:
output

'pick up the red block\nstack the red block on top of the blue block'

In [None]:
print(output)

pick up the red block
stack the red block on top of the blue block


Clearly that's not a valid solution :( 
The orange block is on the red block, so we cannot pick up the red block as the first step.
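
To make the failure concrete, here is a minimal sketch of the pick-up precondition from the prompt's rules, applied to this example's initial state. The `on` dictionary and the helper names are hypothetical and are not part of the Reasoners library, which validates plans with VAL instead.

```python
from typing import Optional


def is_clear(on: dict, block: str) -> bool:
    """A block is clear if no other block rests on top of it."""
    return block not in on.values()


def can_pick_up(on: dict, holding: Optional[str], block: str) -> bool:
    """Pick-up needs an empty hand and a clear block that sits on the table."""
    return holding is None and on.get(block) == "table" and is_clear(on, block)


# Initial state of the example: orange on red, red and blue on the table.
on = {"orange": "red", "red": "table", "blue": "table"}

print(can_pick_up(on, None, "red"))   # False: orange is on top of red
print(can_pick_up(on, None, "blue"))  # True: blue is clear and on the table
```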

## Tree-of-Thought
Then let's turn to a tree search algorithm, [Tree-of-Thought](https://arxiv.org/abs/2305.10601).
We will need to define a simple world model, and a search algorithm, for the Blocksworld task.
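
Before wiring things into the library, it may help to see the search idea in isolation. The following is a generic beam search sketch over action histories, assuming toy `toy_expand` and `toy_score` functions in place of the language model. It is not the library's `BeamSearch` implementation, and it omits details such as keeping already-terminated plans in the beam.

```python
from typing import Callable


def beam_search(init: tuple,
                expand: Callable[[tuple], list],
                score: Callable[[tuple, str], float],
                beam_size: int,
                max_depth: int) -> tuple:
    """Keep the `beam_size` best partial plans per depth; return (plan, total score)."""
    beam = [(init, 0.0)]
    for _ in range(max_depth):
        candidates = []
        for history, total in beam:
            for action in expand(history):
                candidates.append((history + (action,), total + score(history, action)))
        if not candidates:  # nothing left to expand
            break
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return max(beam, key=lambda c: c[1])


# Toy stand-ins for the language model: two candidate actions, plans end after two steps.
def toy_expand(history: tuple) -> list:
    return ["a", "b"] if len(history) < 2 else []


def toy_score(history: tuple, action: str) -> float:
    return -0.1 if action == "a" else -0.5  # pretend log-likelihoods


best_plan, best_score = beam_search((), toy_expand, toy_score, beam_size=2, max_depth=3)
print(best_plan, best_score)
```

In the cells below, `get_actions` plays the role of `toy_expand` and `fast_reward` plays the role of `toy_score`.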

In [None]:
from reasoners import WorldModel, LanguageModel, SearchConfig, State, Reasoner
from reasoners.algorithm import BeamSearch, MCTS
import reasoners.benchmark.bw_utils as utils
from typing import NamedTuple
import copy
import numpy as np


# We use NamedTuple for clearer presentation; you may use a plain tuple if you want a quick experiment.
class BWStateToT(NamedTuple):
    step_idx: int
    action_history: list[str]
    end: bool


# We use the description str as the action, with a type alias for clearer presentation.
# You may use str directly if you want a quick experiment.
BWAction = str


class BlocksWorldModelToT(WorldModel):
    def __init__(self,
                 base_model: LanguageModel,
                 prompt: dict,
                 max_steps: int = 4,
                 batch_size: int = 1) -> None:
        super().__init__()
        self.max_steps = max_steps
        self.base_model = base_model
        self.prompt = prompt
        self.batch_size = batch_size

    def init_state(self) -> BWStateToT:
        return BWStateToT(step_idx=0, action_history=[], end=False)
    
    def step(self, state: BWStateToT, action: BWAction) -> tuple[BWStateToT, dict]:
        state = copy.deepcopy(state)
        if action != "[PLAN END]":
            state = BWStateToT(step_idx=state.step_idx + 1, action_history=state.action_history + [action], end=False)
        else:
            state = BWStateToT(step_idx=state.step_idx + 1, action_history=state.action_history, end=True)
        return state, {}  # the dict is auxiliary information for SearchConfig, we don't need it here.
    
    def is_terminal(self, state: State) -> bool:
        return state.end or state.step_idx >= self.max_steps


class BWConfigToT(SearchConfig):
    def __init__(self,
                 base_model: LanguageModel,
                 prompt: dict,
                 temperature: float = 0.8,
                 n_candidate: int = 4) -> None:
        super().__init__()
        self.base_model = base_model
        self.example = None
        self.prompt = prompt
        self.n_candidate = n_candidate
        self.temperature = temperature

    def get_actions(self, state: BWStateToT) -> list[BWAction]:
        prompts = (self.prompt["icl"]
                       .replace("<action>", "\n".join(state.action_history + [""]))
                       .replace("<init_state>", utils.extract_init_state(self.example))
                       .replace("<goals>", utils.extract_goals(self.example, return_raw=True)))
        outputs = self.base_model.generate([prompts],
                                           num_return_sequences=self.n_candidate,
                                           max_length=20,
                                           eos_token_id="\n",
                                           temperature=self.temperature,
                                           do_sample=True,
                                           hide_input=True).text
        outputs = [output.split("\n")[0] for output in outputs]
        outputs = list(dict.fromkeys(outputs))  # deduplicate
        return outputs

    # Some reward functions are fast to calculate.
    # We calculate the reward before executing the action, which can be used to better guide the search.
    def fast_reward(self, state: BWStateToT, action: BWAction) -> tuple[float, dict]:
        # We use two rewards here:
        # 1. Intuition: The loglikelihood of the action given the prompt.
        # 2. Self-eval: Ask the language model whether this step is "Good".
        inputs = self.prompt["icl"].replace("<action>", "\n".join(state.action_history + [""])) \
            .replace("<init_state>", utils.extract_init_state(self.example)) \
            .replace("<goals>", utils.extract_goals(self.example, return_raw=True))[:-1]
        
        intuition = self.base_model.get_loglikelihood(inputs + "\n", [inputs + "\n" + action])[0]

        self_eval_prompt = (self.prompt["self-eval"].replace("<init_state>", utils.extract_init_state(self.example))
                                                    .replace("<goals>", utils.extract_goals(self.example, return_raw=True))
                                                    .replace("<action>", action))
        self_eval = self.base_model.get_loglikelihood(self_eval_prompt, [self_eval_prompt + "good"])[0]

        return intuition + self_eval, {'intuition': intuition, "self_eval": self_eval}
    
    # kwargs is the auxiliary information returned by SearchConfig.fast_reward and WorldModel.step,
    # so that we do not need duplicated calculations.
    # In this case, we just use the fast_reward result as the reward.
    # Generally, if a reward function depends on the new state, or is slow to calculate,
    # we will calculate it here.
    def reward(self, state, action, **kwargs) -> tuple[float, dict]:
        return kwargs['intuition'] + kwargs['self_eval'], kwargs

Note: The following command may take up to 2 minutes to run.

In [None]:

world_model = BlocksWorldModelToT(base_model=model, prompt=prompt)
config = BWConfigToT(base_model=model, prompt=prompt)
algorithm = BeamSearch(beam_size=4, max_depth=7)
reasoner_tot = Reasoner(world_model=world_model, search_config=config, search_algo=algorithm)
#result_tot = reasoner_tot(example)


# import time

# start_time_cell1 = time.time()

# result_tot = [reasoner_tot(e) for e in example]
# print(result_tot)

# time.sleep(5)  # Simulating a long-running task
# end_time_cell1 = time.time()

# # Save timing info
# with open("ToT_log_SgLang.txt", "a") as log_file:
#     log_file.write(f"Elapsed time: {end_time_cell1 - start_time_cell1} seconds\n")



In [None]:
import os
import shutil
from multiprocessing import Pool

def clear_directory(directory):
    if os.path.exists(directory):
        shutil.rmtree(directory)
    os.makedirs(directory)

def process_example_ToT(args):
    index, example_input = args

    # Reasoning process for each example
    result = reasoner_tot(example_input)

    # Save the result to an output file
    output_dir = "ToT_results"
    output_file = os.path.join(output_dir, f"process_{index}.txt")

    with open(output_file, 'w') as file:
        file.write(str(result))

    return output_file

if __name__ == "__main__":
    num_processes = 10
    output_dir = "ToT_results"

    clear_directory(output_dir)

    inputs_with_indices = list(enumerate(example))

    with Pool(num_processes) as pool:
        results = pool.map(process_example_ToT, inputs_with_indices)

    print("Files saved:", results)




Files saved: ['ToT_results/process_0.txt', 'ToT_results/process_1.txt', 'ToT_results/process_2.txt', 'ToT_results/process_3.txt', 'ToT_results/process_4.txt', 'ToT_results/process_5.txt', 'ToT_results/process_6.txt', 'ToT_results/process_7.txt', 'ToT_results/process_8.txt', 'ToT_results/process_9.txt']


In [None]:
example

{'init': 'the blue block is clear, the orange block is clear, the hand is empty, the orange block is on top of the red block, the red block is on the table and the blue block is on the table',
 'goal': 'the blue block is on top of the red block',
 'plan': '\nunstack the orange block from on top of the red block\nput down the orange block\npick up the blue block\nstack the blue block on top of the red block\n[PLAN END]\n',
 'question': '\n[STATEMENT]\nAs initial conditions I have that, the blue block is clear, the orange block is clear, the hand is empty, the orange block is on top of the red block, the red block is on the table and the blue block is on the table.\nMy goal is to have that the blue block is on top of the red block.\n\nMy plan is as follows:\n\n[PLAN]\n',
 'instance_file': 'LLMs-Planning/llm_planning_analysis/instances/blocksworld/generated_basic_3/instance-71.pddl'}

In [None]:
print('Action, Reward')
for action, _, reward in result_tot.trace:
    print(action, reward)

Action, Reward
None 0.0
unstack the orange block from on top of the red block -0.7903961927319566
put down the orange block -0.7176974050467834
pick up the red block -0.50134990173392
stack the red block on top of the blue block -0.5919118742342107


Still the same error :(

## RAP
With [RAP](https://arxiv.org/abs/2305.14992), the state is the latest block configuration itself, rather than a history of actions.
Thus, we define a new world model to transition between states, which is only slightly more complex than the previous one.
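
As a rough illustration of what "state as block configuration" means, the sketch below applies actions to an explicit configuration and checks the goal against the resulting state. Note the swap: the world model in this notebook asks the language model to predict each state update, whereas this sketch uses a hand-written symbolic transition with no validity checks. `ToyState`, `step`, and `goal_reached` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ToyState(NamedTuple):
    on: tuple                # sorted (block, support) pairs; support may be "table"
    holding: Optional[str]   # block currently in the hand, if any


def step(state: ToyState, action: tuple) -> ToyState:
    """Apply one action symbolically (preconditions are not checked, for brevity)."""
    on = dict(state.on)
    kind = action[0]
    if kind in ("pick up", "unstack"):
        block = action[1]
        del on[block]                      # the block leaves its support
        return ToyState(tuple(sorted(on.items())), block)
    if kind == "put down":
        on[action[1]] = "table"
    elif kind == "stack":
        on[action[1]] = action[2]
    else:
        raise ValueError(f"Invalid action: {action}")
    return ToyState(tuple(sorted(on.items())), None)


def goal_reached(state: ToyState, goal: list) -> bool:
    """The goal is a list of (block, support) pairs that must all hold."""
    on = dict(state.on)
    return all(on.get(block) == support for block, support in goal)


# Same initial state and reference plan as the example shown earlier.
state = ToyState((("blue", "table"), ("orange", "red"), ("red", "table")), None)
plan = [("unstack", "orange", "red"), ("put down", "orange"),
        ("pick up", "red"), ("stack", "red", "blue")]
for action in plan:
    state = step(state, action)

print(state)
print(goal_reached(state, [("red", "blue")]))
```

Because the state carries the configuration, the goal check needs only the current state, not a replay of the action history.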

In [None]:
BWAction = str


class BWStateRAP(NamedTuple):
    step_idx: int
    last_blocks_state: str
    blocks_state: str
    buffered_action: BWAction


class BlocksWorldModelRAP(WorldModel):
    def __init__(self,
                 base_model: LanguageModel,
                 prompt: dict,
                 max_steps: int = 4,
                 batch_size: int = 1) -> None:
        super().__init__()
        self.max_steps = max_steps
        self.base_model = base_model
        self.prompt = prompt
        self.batch_size = batch_size

    def init_state(self) -> BWStateRAP:
        return BWStateRAP(step_idx=0, last_blocks_state="", blocks_state=utils.
                       extract_init_state(self.example), buffered_action="")

    def step(self, state: BWStateRAP, action: BWAction) -> tuple[BWStateRAP, dict]:
        state = copy.deepcopy(state)
        blocks_state = state.blocks_state
        step_idx = state.step_idx
        blocks_state = self.update_blocks(blocks_state, action)
        new_buffered_action = action if state.buffered_action == "" else ""

        state = BWStateRAP(step_idx=step_idx + 1,
                        last_blocks_state=state.blocks_state,
                        blocks_state=blocks_state,
                        buffered_action=new_buffered_action)
        return state, {"goal_reached": utils.goal_check(utils.extract_goals(self.example), blocks_state)}

    def update_blocks(self, block_states: str, action: BWAction) -> str:
        if "pick" in action:
            key = "world_update_pickup"
        elif "unstack" in action:
            key = "world_update_unstack"
        elif "put" in action:
            key = "world_update_putdown"
        elif "stack" in action:
            key = "world_update_stack"
        else:
            raise ValueError("Invalid action")
        world_update_prompt = self.prompt[key].format(block_states, action.capitalize() + ".")
        world_output = self.base_model.generate([world_update_prompt],
                                                eos_token_id=["\n", ".\n", ".\n\n"],
                                                hide_input=True,
                                                temperature=0).text[0].strip()
        new_state = utils.apply_change(world_output, block_states)
        return new_state

    def is_terminal(self, state: BWStateRAP) -> bool:
        if utils.goal_check(utils.extract_goals(self.example), state.blocks_state)[0]:
            return True
        elif state.step_idx == self.max_steps:
            return True
        return False

In [None]:
class BWConfigRAP(SearchConfig):
    def __init__(self,
                 base_model: LanguageModel,
                 prompt: dict,
                 batch_size: int = 1,
                 reward_alpha: float = 0.5,
                 goal_reward_default: float = 0.,
                 goal_reached_reward: float = 100.) -> None:
        super().__init__()
        self.base_model = base_model
        self.example = None
        self.prompt = prompt
        self.batch_size = batch_size
        self.reward_alpha = reward_alpha
        self.goal_reward_default = goal_reward_default
        self.goal_reached_reward = goal_reached_reward

    def get_actions(self, state: BWStateRAP) -> list[BWAction]:
        blocks_state = state.blocks_state
        return utils.generate_all_actions(blocks_state)

    def fast_reward(self, state: BWStateRAP, action: BWAction) -> tuple[float, dict]:
        if state.buffered_action == "":
            current_blocks_state = state.blocks_state
        else:
            current_blocks_state = state.last_blocks_state
        previous_action = state.buffered_action + "\n" if state.buffered_action != "" else ""
        
        # every two steps, we will also reduce the icl examples by 2 steps
        # so that the distribution of step length in examples is more reasonable
        icl_template = self.prompt["icl_list"][state.step_idx // 2]
        
        inputs = (icl_template.replace("<init_state>", current_blocks_state)
                              .replace("<goals>", utils.extract_goals(self.example, return_raw=True))
                              .replace("<action>", previous_action))
        intuition = self.base_model.get_loglikelihood(inputs, [inputs + action])[0]

        self_eval_prompt = (self.prompt["self-eval"]
                                .replace("<init_state>", current_blocks_state)
                                .replace("<goals>", utils.extract_goals(self.example, return_raw=True))
                                .replace("<action>", action))
        self_eval = self.base_model.get_loglikelihood(self_eval_prompt, [self_eval_prompt + "good"])[0]

        return (self.calculate_reward(intuition, self_eval),
                {'intuition': intuition, "self_eval": self_eval})

    def calculate_reward(self, intuition, self_eval, goal_reached=None) -> float:
        # to provide a unified interface for reward and fast_reward
        if goal_reached is None:
            goal_reward = self.goal_reward_default
        elif goal_reached[0]:
            goal_reward = self.goal_reached_reward
        else:
            goal_reward = goal_reached[1]
        return (intuition + self_eval) * self.reward_alpha + goal_reward * (1 - self.reward_alpha)

    def reward(self, state: BWStateRAP, action: BWAction,
               intuition: float = None,
               self_eval: float = None,
               goal_reached: tuple[bool, float] = None) -> tuple[float, dict]:
        return (self.calculate_reward(intuition, self_eval, goal_reached),
                {'intuition': intuition, 'goal_reached': goal_reached})
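
The blend computed by `calculate_reward` above can be worked through numerically. The sketch below copies the formula into a standalone function; the input values are made up for illustration and are not taken from a model run.

```python
def blended_reward(intuition: float, self_eval: float,
                   goal_reward: float, alpha: float = 0.5) -> float:
    """(intuition + self_eval) weighted by alpha, goal reward weighted by 1 - alpha."""
    return (intuition + self_eval) * alpha + goal_reward * (1 - alpha)


# Made-up log-likelihoods for one action.
before_step = blended_reward(-0.5, -0.3, goal_reward=0.0)    # fast_reward: goal not known yet
goal_met = blended_reward(-0.5, -0.3, goal_reward=100.0)     # reward once the goal is reached

print(before_step, goal_met)
```

With the default `goal_reached_reward` of 100, reaching the goal dominates the log-likelihood terms, which are typically small negative numbers.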

We use the MCTS algorithm built into Reasoners and build up the pipeline again.
Note: The following command may take 2 minutes to run.
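
For intuition about how MCTS chooses which child to explore, here is a sketch of the standard UCT selection rule. Whether the library's `MCTS` class uses exactly this formula and exploration weight is an assumption; the Q values and visit counts below are made up.

```python
import math


def uct(q: float, n_child: int, n_parent: int, w_exp: float = 1.0) -> float:
    """Mean reward plus an exploration bonus that shrinks as the child is visited."""
    if n_child == 0:
        return math.inf  # unvisited children are tried first
    return q + w_exp * math.sqrt(math.log(n_parent) / n_child)


# action -> (Q estimate, visit count); illustrative numbers only.
children = {"unstack orange": (0.4, 5), "pick up blue": (0.3, 1)}
n_parent = sum(n for _, n in children.values())

chosen = max(children, key=lambda a: uct(*children[a], n_parent))
print(chosen)
```

Here the rarely visited action wins despite its lower Q estimate, because its exploration bonus is larger.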

In [None]:
world_model = BlocksWorldModelRAP(base_model=model, prompt=prompt, max_steps=4)
config = BWConfigRAP(base_model=model, prompt=prompt)
algorithm = MCTS(depth_limit=4, disable_tqdm=False, output_trace_in_each_iter=True, n_iters=10)
reasoner_rap = Reasoner(world_model=world_model, search_config=config, search_algo=algorithm)
#result_rap = reasoner_rap(example)

# result_rap = [reasoner_rap(e) for e in example]
# print(result_rap)





In [None]:
import os
import shutil
from multiprocessing import Pool

def clear_directory(directory):
    if os.path.exists(directory):
        shutil.rmtree(directory)
    os.makedirs(directory)

def process_example_RAP(args):
    index, example_input = args

    # Reasoning process for each example
    result = reasoner_rap(example_input)

    # Save the result to an output file
    output_dir = "RAP_results"
    output_file = os.path.join(output_dir, f"process_{index}.txt")

    with open(output_file, 'w') as file:
        file.write(str(result))

    return output_file

if __name__ == "__main__":
    num_processes = 10
    output_dir = "RAP_results"

    clear_directory(output_dir)

    inputs_with_indices = list(enumerate(example))

    with Pool(num_processes) as pool:
        results = pool.map(process_example_RAP, inputs_with_indices)

    print("Files saved:", results)

                                                              

Files saved: ['RAP_results/process_0.txt', 'RAP_results/process_1.txt', 'RAP_results/process_2.txt', 'RAP_results/process_3.txt', 'RAP_results/process_4.txt', 'RAP_results/process_5.txt', 'RAP_results/process_6.txt', 'RAP_results/process_7.txt', 'RAP_results/process_8.txt', 'RAP_results/process_9.txt']


In [None]:
result_rap.trace

([BWStateRAP(step_idx=0, last_blocks_state='', blocks_state='the blue block is clear, the orange block is clear, the hand is empty, the orange block is on top of the red block, the red block is on the table and the blue block is on the table.', buffered_action=''),
  BWStateRAP(step_idx=1, last_blocks_state='the blue block is clear, the orange block is clear, the hand is empty, the orange block is on top of the red block, the red block is on the table and the blue block is on the table.', blocks_state='the blue block is clear, the orange block is in the hand, the red block is clear, the hand is holding the orange block, the blue block is on the table, and the red block is on the table.', buffered_action='unstack the orange block from on top of the red block'),
  BWStateRAP(step_idx=2, last_blocks_state='the blue block is clear, the orange block is in the hand, the red block is clear, the hand is holding the orange block, the blue block is on the table, and the red block is on the table

Finally, we get a valid solution!

## Visualization

Visualization is as simple as calling `visualize(log)`.

In [None]:
from reasoners.visualization import visualize
from reasoners.visualization.tree_snapshot import NodeData, EdgeData
from reasoners.algorithm.mcts import MCTSNode


# (Optional) You can write node_data_factory and edge_data_factory to show customized information.
def blocksworld_node_data_factory(n: MCTSNode) -> NodeData:
    return NodeData({"block state": n.state.blocks_state if n.state else "Not expanded",
                     "# goals satisfied": n.reward_details["goal_reached"][1] if hasattr(n, "reward_details") else "N/A",
                     "# visited": len(n.cum_rewards)})

def blocksworld_edge_data_factory(n: MCTSNode) -> EdgeData:
    return EdgeData({"Q": n.Q,
                     "intuition": n.fast_reward_details["intuition"],
                     "self_eval": n.fast_reward_details["self_eval"],
                     "action": n.action})

visualize(result_rap,
          node_data_factory=blocksworld_node_data_factory,
          edge_data_factory=blocksworld_edge_data_factory)

Getting log upload link...
Tree log size: 50506 bytes
Tree log compressed size: 1515 bytes
Uploading log...
Visualizer URL: https://main.d1puk3wdon4rk8.amplifyapp.com/visualizer/31b74c5f-37d1-4fb2-ae4b-ca399f837d9a?accessKey=6f66ce74


This evaluator module provides standard APIs and easy implementation of multiple popular reasoning datasets.

In [None]:
# a helper function to extract the action history from the output of the algorithm

def bfs_bw_extractor(algo_output):
    if torch.distributed.is_initialized():
        torch.distributed.barrier()
    # to make sure the plan is saved before evaluation in multi-process setting
    try:
        return "\n".join(algo_output.terminal_node.state.action_history)
    except Exception as e:
        print("Error in output extraction,", e)
        return ""

In [None]:
import os

# Replace this with the actual path to your VAL folder
val_path = "data/irving/VAL/bin"  
os.environ["VAL"] = val_path

In [None]:
import torch
from reasoners.benchmark import BWEvaluator
import json

torch.cuda.set_device(0)

with open('examples/CoT/blocksworld/prompts/pool_prompt_v1.json') as f:
    prompt = json.load(f)

evaluator = BWEvaluator(config_file='examples/CoT/blocksworld/data/bw_config.yaml',
                        domain_file='examples/CoT/blocksworld/data/generated_domain.pddl',
                        data_path='examples/CoT/blocksworld/data/split_v1/split_v1_step_4_data.json',
                        init_prompt=prompt,
                        output_extractor=bfs_bw_extractor)

world_model = BlocksWorldModelToT(base_model=model, prompt=prompt)
config = BWConfigToT(base_model=model, prompt=prompt)
algorithm = BeamSearch(beam_size=4, max_depth=7)
reasoner_tot = Reasoner(world_model=world_model, search_config=config, search_algo=algorithm)

evaluator.evaluate(reasoner_tot, shuffle_prompt=True, num_shot=4, resume=0)



KeyboardInterrupt: 

In [4]:
import os
from reasoners.lm.deepseek_model import DeepseekModel
# Supply your own keys (or export them before launching the notebook); never commit real keys.
os.environ.setdefault("DEEPSEEK_API_KEY", "<your-deepseek-api-key>")
os.environ.setdefault("OPENROUTER_API_KEY", "<your-openrouter-api-key>")
model_name = 'deepseek/deepseek-r1'
model = DeepseekModel(model_name, backend = 'openrouter',max_tokens = None)

In [None]:
response = model.generate(["Every gerpus is a terpus. Every terpus is a zhorpus. Every lempus is a yerpus. Every boompus is a zhorpus. Every brimpus is a rempus. Every lempus is a jelpus. Every lorpus is a rorpus. Bob is a yerpus. Every worpus is a rempus. Every lempus is a impus. Every rempus is a sterpus. Every yimpus is a zumpus. Every lempus is a yumpus. Every shumpus is a jelpus. Every brimpus is a zhorpus. Every scrompus is a rempus. Every lempus is a wumpus. Sally is a boompus. Sally is a gerpus. Every gerpus is a scrompus. Bob is a wumpus. Every wumpus is a lorpus. Every yerpus is a rorpus. Sally is a terpus. Every gerpus is a zhorpus. Every yimpus is a zhorpus. Every boompus is a terpus. Every gerpus is a worpus. Bob is a lorpus. Every gerpus is a yimpus. Every scrompus is a brimpus. Every lempus is a rorpus. Every lempus is a shumpus. Bob is a jelpus. Sally is a scrompus. Every gerpus is a brimpus. Every lempus is a lorpus. Every boompus is a yumpus. Every scrompus is a zumpus. Every zhorpus is a tumpus. Sally is a zumpus. Every lempus is a storpus. Every yerpus is a lorpus. Every scrompus is a zhorpus. Every yimpus is a rempus. Every impus is a jelpus. Jack is a yimpus. Every yerpus is a wumpus. Every rorpus is a hilpus. Every yimpus is a sterpus. Bob is a lempus. Every worpus is a storpus. Every rorpus is a impus. Every boompus is a gerpus. Is Sally a hilpus or sterpus?\n\nPlease output your conclusion like 'Answer: XXX is YYY'"], temperature=0.8, hide_input=True)

ChatCompletion(id='gen-1739500484-oPIrzmP84niDRt97smXo', choices=[Choice(finish_reason='length', index=0, logprobs=ChoiceLogprobs(content=[ChatCompletionTokenLogprob(token='token_id:128798', bytes=[60, 116, 104, 105, 110, 107, 62], logprob=0.0, top_logprobs=[]), ChatCompletionTokenLogprob(token='token_id:201', bytes=[10], logprob=-1.1920928244535389e-07, top_logprobs=[]), ChatCompletionTokenLogprob(token='token_id:33001', bytes=[79, 107, 97, 121], logprob=-0.12740746140480042, top_logprobs=[]), ChatCompletionTokenLogprob(token='token_id:14', bytes=[44], logprob=-0.00011324241495458409, top_logprobs=[]), ChatCompletionTokenLogprob(token='token_id:2513', bytes=[32, 108, 101, 116], logprob=-0.1609610617160797, top_logprobs=[]), ChatCompletionTokenLogprob(token='token_id:734', bytes=[39, 115], logprob=-0.20143087208271027, top_logprobs=[]), ChatCompletionTokenLogprob(token='token_id:3047', bytes=[32, 116, 114, 121], logprob=-0.40484288334846497, top_logprobs=[]), ChatCompletionTokenLogprob

In [2]:
from openai import OpenAI
import os
# Export OPENROUTER_API_KEY in your shell before starting the notebook;
# do not hard-code API keys in a shared notebook.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY"),
)
messages = [{"role": "user", "content": "Every gerpus is a terpus. Every terpus is a zhorpus. Every lempus is a yerpus. Every boompus is a zhorpus. Every brimpus is a rempus. Every lempus is a jelpus. Every lorpus is a rorpus. Bob is a yerpus. Every worpus is a rempus. Every lempus is a impus. Every rempus is a sterpus. Every yimpus is a zumpus. Every lempus is a yumpus. Every shumpus is a jelpus. Every brimpus is a zhorpus. Every scrompus is a rempus. Every lempus is a wumpus. Sally is a boompus. Sally is a gerpus. Every gerpus is a scrompus. Bob is a wumpus. Every wumpus is a lorpus. Every yerpus is a rorpus. Sally is a terpus. Every gerpus is a zhorpus. Every yimpus is a zhorpus. Every boompus is a terpus. Every gerpus is a worpus. Bob is a lorpus. Every gerpus is a yimpus. Every scrompus is a brimpus. Every lempus is a rorpus. Every lempus is a shumpus. Bob is a jelpus. Sally is a scrompus. Every gerpus is a brimpus. Every lempus is a lorpus. Every boompus is a yumpus. Every scrompus is a zumpus. Every zhorpus is a tumpus. Sally is a zumpus. Every lempus is a storpus. Every yerpus is a lorpus. Every scrompus is a zhorpus. Every yimpus is a rempus. Every impus is a jelpus. Jack is a yimpus. Every yerpus is a wumpus. Every rorpus is a hilpus. Every yimpus is a sterpus. Bob is a lempus. Every worpus is a storpus. Every rorpus is a impus. Every boompus is a gerpus. Is Sally a hilpus or sterpus?\n\nPlease output your conclusion like 'Answer: XXX is YYY'"}]
response = client.chat.completions.create(
    model='deepseek/deepseek-r1',
    messages=messages,
    max_tokens=None,
    temperature=0,
    top_p=1,
    n=1,
    stop=None,
)

In [3]:
response

ChatCompletion(id='gen-1739509687-ESd3h5elLWiIZ0EW0RUG', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Answer: Sally is sterpus.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None), native_finish_reason='stop')], created=1739509687, model='deepseek/deepseek-r1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=2693, prompt_tokens=482, total_tokens=3175, completion_tokens_details=None, prompt_tokens_details=None), provider='DeepInfra')

In [4]:
[choice.message.content for choice in response.choices]

['Answer: Sally is sterpus.']
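
The prompt asks the model to finish with `Answer: XXX is YYY`, so the conclusion can be pulled out of the response text with a small parser. The sketch below is only an illustration: `extract_answer` is a hypothetical helper and is not the repository's `prosqa_extractor`, whose behavior may differ.

```python
import re

def extract_answer(text: str) -> str:
    """Return the text after the last 'Answer:' marker, or '' if none is found."""
    # Allow markdown bold markers right after the colon, e.g. '**Answer:** ...'.
    matches = re.findall(r"Answer:\**\s*(.+)", text)
    if not matches:
        return ""
    # Keep the last occurrence and drop bold markers and surrounding whitespace.
    return matches[-1].replace("*", "").strip()
```

Applied to the response above, `extract_answer(response.choices[0].message.content)` would return the sentence that follows `Answer:`.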

In [5]:
import sys
sys.path.append('examples/LongCoT_Search/ProsQA')

from utils import prosqa_extractor 
from reasoner import ProsQAReasoner
from reasoners.benchmark.prosqa import ProsQAEvaluator

In [6]:
evaluator = ProsQAEvaluator(output_extractor=prosqa_extractor, answer_extractor = lambda x: x["answer"])
reasoner = ProsQAReasoner(model)
evaluator.evaluate(reasoner, shuffle_prompt=True, num_shot=4, resume=0)

prosQA:   0%|          | 0/500 [00:00<?, ?it/s]

prosQA:   0%|          | 1/500 [04:00<33:16:16, 240.03s/it]

ChatCompletion(id='gen-1739511160-ANBJ3IzueRbeFRkGgLao', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Answer: Sally is a sterpus. \n\n**Step-by-Step Explanation:**\n\n1. **Sally is a gerpus** (Statement 19).\n2. **Every gerpus is a yimpus** (30) → Sally is a yimpus.\n3. **Every yimpus is a sterpus** (50) → Therefore, Sally is a sterpus.\n\nAdditionally, other paths confirm sterpus:\n- **Gerpus → scrompus** (20) → **scrompus → brimpus** (31) → **brimpus → rempus** (5) → **rempus → sterpus** (11).\n- **Gerpus → worpus** (28) → **worpus → rempus** (9) → **rempus → sterpus** (11).\n\nNo valid path leads to Sally being a hilpus (requires rorpus, which she cannot reach). \n\n**Answer: Sally is a sterpus.**', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None), native_finish_reason='stop')], created=1739511160, model='deepseek/deepseek-r1', object='chat.completion', service_tier=None, system_fingerprint=None,

prosQA:   0%|          | 2/500 [06:33<26:10:44, 189.25s/it]

ChatCompletion(id='gen-1739511400-f8wRJTYLTmviHa1eSvMO', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Answer: Bob is a shumpus.\n\nThe reasoning begins by identifying Bob\'s direct categories: storpus, rempus, impus, jompus, and yerpus. Key inferences are drawn from hierarchical relationships:\n\n1. **Bob is a storpus** (given).  \n   - From "Every storpus is a grimpus" (33), Bob becomes a **grimpus**.  \n   - From "Every grimpus is a jelpus" (14), Bob becomes a **jelpus**.  \n   - From "Every jelpus is a shumpus" (26), Bob is a **shumpus**.  \n\n2. **Alternative path via rempus**:  \n   - Bob is a rempus (23).  \n   - From "Every rempus is a grimpus" (3), Bob is a **grimpus**, leading to the same chain (grimpus → jelpus → shumpus).  \n\nNo valid path connects Bob to **gerpus**. Gerpus requires being a bompus (Tom, not Bob) or rompus (via kerpus, which Bob isn’t). Thus, Bob is conclusively a **shumpus**.  \n\nAnswer: Bob is a shum

prosQA:   1%|          | 3/500 [09:13<24:16:18, 175.81s/it]

ChatCompletion(id='gen-1739511553-CQXFy6SZPxxkKNePvsgs', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Answer: Max is a boompus. \n\nThe reasoning follows these steps:  \n1. Max is a storpus (given).  \n2. Every storpus is a sterpus (premise 1), so Max is a sterpus.  \n3. Every sterpus is a worpus (premise 24), so Max is a worpus.  \n4. Every worpus is a boompus (premise 25), thus Max is a boompus.  \n\nNo valid path connects Max to gwompus, as gwompus requires being a rempus or jelpus, which Max’s lineage (storpus → sterpus → worpus) does not intersect.  \n\n**Answer: Max is a boompus.**', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None), native_finish_reason='stop')], created=1739511553, model='deepseek/deepseek-r1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=2263, prompt_tokens=284, total_tokens=2547, completion_tokens_details=None

prosQA:   1%|          | 4/500 [10:36<19:10:35, 139.18s/it]

ChatCompletion(id='gen-1739511713-84pM6W8fhYncquCKKU20', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Answer: Davis is a worpus. \n\nThe reasoning follows these steps:  \n1. **Davis is a lorpus** (given).  \n2. Every **lorpus** is a **bompus** (given).  \n3. Every **bompus** is a **rempus** (given).  \n4. Every **rempus** is a **timpus** (given).  \n5. Every **timpus** is a **yumpus** (given).  \n6. Every **yumpus** is a **tumpus** (given).  \n7. Every **tumpus** is a **worpus** (given).  \n\nThis chain establishes Davis as a **worpus**. Additionally, Davis is explicitly stated to be a **vumpus**, but this path connects to **sterpus** (via **terpus**), which does not conflict with the **worpus** conclusion. Since **chorpus** is only linked to **quimpus** (Fae’s category), Davis cannot be a **chorpus**.  \n\n**Answer: Davis is a worpus.**', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None), native_fin

prosQA:   1%|          | 5/500 [15:51<27:49:35, 202.37s/it]

ChatCompletion(id='gen-1739511796-tQT11kicTtpD6JAaI5Br', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Answer: Rex is a rompus. \n\n**Step-by-Step Explanation:**\n\n1. **Rex is a jelpus** (given).\n2. From "Every jelpus is a tumpus" → Rex is a tumpus.\n3. From "Every tumpus is a zumpus" → Rex is a zumpus.\n4. From "Every zumpus is a rompus" → Rex is a rompus.\n\n**Alternative Path:**\n1. From "Every jelpus is a shumpus" → Rex is a shumpus.\n2. From "Every shumpus is a kerpus" → Rex is a kerpus.\n3. From "Every kerpus is a rompus" → Rex is a rompus.\n\nBoth paths confirm Rex is a **rompus**. No valid paths lead to lempus for Rex. \n\nAnswer: Rex is a rompus.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None), native_finish_reason='stop')], created=1739511796, model='deepseek/deepseek-r1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=4387

prosQA:   1%|          | 6/500 [17:48<23:47:24, 173.37s/it]

ChatCompletion(id='gen-1739512111-TTSod63PS4OoRYe8A9UT', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Answer: Tom is a vumpus.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None), native_finish_reason='stop')], created=1739512111, model='deepseek/deepseek-r1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=1873, prompt_tokens=284, total_tokens=2157, completion_tokens_details=None, prompt_tokens_details=None), provider='Fireworks')
Answer: Tom is a vumpus.
Case #6: correct=True, output='Tom is a vumpus.', answer='Tom is a vumpus.';accuracy=1.000 (6/6)


prosQA:   1%|▏         | 7/500 [21:38<26:18:50, 192.15s/it]

ChatCompletion(id='gen-1739512228-hnlCf8hVOxQ9VNxDcJpo', choices=[Choice(finish_reason=None, index=0, logprobs=None, message=ChatCompletionMessage(content='', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None))], created=1739512228, model='deepseek/deepseek-r1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=502, prompt_tokens=297, total_tokens=799, completion_tokens_details=None, prompt_tokens_details=None), provider='Nebius')

Case #7: correct=False, output='', answer='Stella is a hilpus.';accuracy=0.857 (6/7)


prosQA:   2%|▏         | 8/500 [24:01<24:07:04, 176.47s/it]

ChatCompletion(id='gen-1739512459-IbIwKNDcWr0tSNxTOxN9', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Answer: Fae is a dumpus. \n\n**Step-by-Step Explanation:**\n\n1. **Fae is a brimpus** (Statement 17).\n2. **Every brimpus is a storpus** (Statement 23).\n3. **Every storpus is a dumpus** (Statement 29).\n\nThus, Fae inherits the dumpus category through the chain: brimpus → storpus → dumpus. No valid path connects Fae to yerpus, as yerpus requires being a jompus (Statement 11), and Fae’s categories (brimpus, storpus, etc.) do not link to jompus. \n\n**Answer:** Fae is a dumpus.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None), native_finish_reason='stop')], created=1739512459, model='deepseek/deepseek-r1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=2991, prompt_tokens=336, total_tokens=3327, completion_tokens_details=None, prompt_to

prosQA:   2%|▏         | 9/500 [25:11<19:31:38, 143.17s/it]

ChatCompletion(id='gen-1739512602-VjdQeMjwwTrTRAlX4Lfw', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Answer: Davis is a quimpus. \n\n**Step-by-Step Explanation:**\n\n1. **Davis is a sterpus** (Statement 23).\n2. **Every sterpus is a boompus** (Statement 13) → Davis is a boompus.\n3. **Every boompus is a lorpus** (Statement 33) → Davis is a lorpus.\n4. **Every lorpus is a quimpus** (Statement 26) → Davis is a quimpus.\n\nNo valid path connects Davis to bompus, as he is not a jompus (the only route to bompus via Statement 9). Thus, Davis is definitively a quimpus.\n\nAnswer: Davis is a quimpus.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None), native_finish_reason='stop')], created=1739512602, model='deepseek/deepseek-r1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=1453, prompt_tokens=342, total_tokens=1795, completion_tokens_detail

prosQA:   2%|▏         | 10/500 [28:08<20:54:20, 153.59s/it]

ChatCompletion(id='gen-1739512671-Tzswxl05r29qRKaojJJN', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Answer: Alex is a rorpus.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None), native_finish_reason='stop')], created=1739512671, model='deepseek/deepseek-r1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=1921, prompt_tokens=310, total_tokens=2231, completion_tokens_details=None, prompt_tokens_details=None), provider='Fireworks')
Answer: Alex is a rorpus.
Case #10: correct=True, output='Alex is a rorpus.', answer='Alex is a rorpus.';accuracy=0.900 (9/10)


prosQA:   2%|▏         | 10/500 [30:19<24:45:55, 181.95s/it]


KeyboardInterrupt: 
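
Case #7 in the log above came back with an empty `content` and `finish_reason=None`, and was therefore scored as incorrect. One possible mitigation is to retry when the provider returns no text. The following is a minimal sketch under that assumption; `generate_with_retry` is a hypothetical helper and is not part of `reasoners`, and whether retrying is appropriate depends on the provider and on cost.

```python
import time
from typing import Callable, Optional

def generate_with_retry(call: Callable[[], Optional[str]],
                        max_attempts: int = 3,
                        delay_s: float = 0.0) -> str:
    """Call `call()` until it returns non-empty text or attempts run out."""
    text = ""
    for attempt in range(max_attempts):
        # Treat a missing (None) message body the same as an empty string.
        text = call() or ""
        if text.strip():
            return text
        if attempt < max_attempts - 1:
            time.sleep(delay_s)
    return text
```

With the OpenAI client shown earlier, `call` could be a lambda that wraps `client.chat.completions.create(...)` and returns `choices[0].message.content`.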

In [None]:
question = "Every gerpus is a terpus. Every terpus is a zhorpus. Every lempus is a yerpus. Every boompus is a zhorpus. Every brimpus is a rempus. Every lempus is a jelpus. Every lorpus is a rorpus. Bob is a yerpus. Every worpus is a rempus. Every lempus is a impus. Every rempus is a sterpus. Every yimpus is a zumpus. Every lempus is a yumpus. Every shumpus is a jelpus. Every brimpus is a zhorpus. Every scrompus is a rempus. Every lempus is a wumpus. Sally is a boompus. Sally is a gerpus. Every gerpus is a scrompus. Bob is a wumpus. Every wumpus is a lorpus. Every yerpus is a rorpus. Sally is a terpus. Every gerpus is a zhorpus. Every yimpus is a zhorpus. Every boompus is a terpus. Every gerpus is a worpus. Bob is a lorpus. Every gerpus is a yimpus. Every scrompus is a brimpus. Every lempus is a rorpus. Every lempus is a shumpus. Bob is a jelpus. Sally is a scrompus. Every gerpus is a brimpus. Every lempus is a lorpus. Every boompus is a yumpus. Every scrompus is a zumpus. Every zhorpus is a tumpus. Sally is a zumpus. Every lempus is a storpus. Every yerpus is a lorpus. Every scrompus is a zhorpus. Every yimpus is a rempus. Every impus is a jelpus. Jack is a yimpus. Every yerpus is a wumpus. Every rorpus is a hilpus. Every yimpus is a sterpus. Bob is a lempus. Every worpus is a storpus. Every rorpus is a impus. Every boompus is a gerpus. Is Sally a hilpus or sterpus?\n\nPlease output your conclusion like 'Answer: XXX is YYY'"

In [None]:
# Reuse the `question` string defined in the previous cell instead of repeating the prompt.
output = model.generate([question],
                        hide_input=True,
                        eos_token_id='\n[').text[0]

In [None]:
output

'To determine if Sally is a hilpus or sterpus, we need to analyze the given information.\n\nFrom the given statements, we know that Sally is a boompus (Sally is a boompus) and Sally is a gerpus (Sally is a gerpus). \n\nSince every gerpus is a scrompus (Every gerpus is a scrompus), Sally is a scrompus.\n\nWe also know that Sally is a scrompus (Sally is a scrompus) and every scrompus is a rempus (Every scrompus is a rempus), so Sally is a rempus.\n\nFurthermore, every rempus is a sterpus (Every rempus is a sterpus), so Sally is a sterpus.\n\nTherefore, Sally is a sterpus.\n\nAnswer: Sally is a sterpus.'
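
Each question of this kind is a reachability query over "is a" statements, so a model's answer can be checked independently of the model. The sketch below assumes every fact has the form `Every X is a Y.` or `Name is a X.`; `reachable_categories` is a hypothetical helper written for illustration and is not part of the benchmark code.

```python
import re
from collections import defaultdict, deque

def reachable_categories(context: str, entity: str) -> set:
    """Collect every category `entity` belongs to by following 'is a' edges."""
    edges = defaultdict(set)
    # "Every X is a Y." gives edge X -> Y; "Name is a X." gives edge Name -> X.
    for subj, obj in re.findall(r"(?:Every )?(\w+) is an? (\w+)\.", context):
        edges[subj].add(obj)
    # Breadth-first search from the entity over the collected edges.
    seen, queue = set(), deque([entity])
    while queue:
        node = queue.popleft()
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Calling `reachable_categories(question, "Sally")` on the prompt above and testing membership of `"sterpus"` and `"hilpus"` gives a quick cross-check of the model's conclusion.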