<a href="https://colab.research.google.com/github/tttequila/Kaggle_20Q/blob/main/7B_MultiCoT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Configuring your kaggle token, see more details in the [**Configure your API key**](https://ai.google.dev/gemma/docs/setup) section

### set up env

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
%%bash
mkdir ~/.kaggle
# change the first path to your path of kaggle.json
cp /content/drive/MyDrive/kaggle.json ~/.kaggle/
chmod 600 ~/.kaggle/kaggle.json

mkdir: cannot create directory ‘/root/.kaggle’: File exists


In [3]:
%%bash
pip install -q -U torch immutabledict sentencepiece

In [4]:
!git clone https://github.com/google/gemma_pytorch.git

fatal: destination path 'gemma_pytorch' already exists and is not an empty directory.


### set up Gemma lib

In [5]:
# login the kaggle (need to store you kaggle.json to your google dirve)
import kagglehub
kagglehub.login()

import sys
sys.path.append("gemma_pytorch/gemma")
sys.path.append("gemma_pytorch")
import contextlib, torch


VBox(children=(HTML(value='<center> <img\nsrc=https://www.kaggle.com/static/images/site-logo.png\nalt=\'Kaggle…

In [6]:
from gemma.config import get_config_for_7b, get_config_for_2b
from gemma.model import GemmaForCausalLM, GemmaModel

In [7]:
import torch
import gemma
import itertools
from typing import Iterable
from typing import Any, List, Optional, Sequence, Tuple, Union
import os
import re

In [8]:
@contextlib.contextmanager
def _set_default_tensor_type(dtype: torch.dtype):
  """Sets the default torch dtype to the given dtype."""
  torch.set_default_dtype(dtype)
  yield
  torch.set_default_dtype(torch.float)


def interleave_unequal(x, y):
    '''
        Interleave two lists of unequal length.
    '''
    return [
        item for pair in itertools.zip_longest(x, y) for item in pair if item is not None
    ]

### Building Agents

<details>
  <summary> model </summary>
  
  - `self.forward()`: getting next token and corresponding logits
  - `self.generate()`: see summary below. May need to be rewriten if we wanna get the cumulative logits for the whole sentence  


</details>



<details>
  <summary> model.generate() </summary>
  
  - **prompts** | `Union[str, Sequence[str]]`: Your prompts
  - **device** | `Any`: Devices
  - **output_len** | `int`: max output length
  - **temperature** | `Union[float, None]`: temperature degree, controlling how variant its response could be  
  - **top_p** | `float`:
  - **top_k** | `int`:

  regarding temperature, top_p and top_k, check this [link](https://blog.csdn.net/REfusing/article/details/137866583)

</details>

#### Define Formatter


In [9]:
from typing import Iterable
import itertools

class PromptFormatter:

    '''
        formatter class to format the prompt text for the model.
        A general idea is
    '''

    _start_token = '<start_of_turn>'
    _end_token = '<end_of_turn>'

    def __init__(self, system_prompt: str = None, few_shot_examples: Iterable = None, sample_num = None):
        self._system_prompt = system_prompt
        self._few_shot_examples = few_shot_examples
        self._template_user = f"{self._start_token}user\n{{}}{self._end_token}\n"
        self._template_model = f"{self._start_token}model\n{{}}{self._end_token}\n"
        self._all_prompt = ''
        if sample_num:
          self.sample_num = sample_num
        else:
          self.sample_num = len(few_shot_examples)

        self.reset()

    def __repr__(self):
        return self._all_prompt

    def reset(self):
        self._all_prompt = ''
        # if system prompt is provided, add it to the prompt
        if self._system_prompt:
            self._all_prompt += self._template_user.format(self._system_prompt)
        # same for few shot examples
        if self._few_shot_examples:
            # self.add_rounds(self._few_shot_examples, start_agent='user')
            self.add_new_round('user', 'Here are some examples of analyzing with causal reasoning:', True)
            for example in random.sample(self._few_shot_examples, self.sample_num):
              self.add_new_round('model', example.strip(), True)


    def add_user_round(self, user_prompt: str):
        # add user round to the prompt
        self._all_prompt += self._template_user.format(user_prompt)

    def add_agent_round(self, model_response: str):
        self._all_prompt += self._template_model.format(model_response)

    def add_rounds(self, rounds: Iterable, start_agent: str):
        '''
            Apply a sequence of rounds to the formatter, starting with the specified agent.
        '''
        formatters = [self.add_agent_round, self.add_user_round] if start_agent == 'model' else [self.add_user_round, self.add_agent_round] # here, self.model and self.user are functions definded above
        formatters = itertools.cycle(formatters)
        for fmt, round in zip(formatters, rounds):
            fmt(round)
        return self

    # def add_end_token(self):
    #     self._all_prompt += f"{self._end_token}\n"

    def add_new_round(self, player: str, prompt:str = None, end_token: bool = False):
        self._all_prompt += f"{self._start_token}{player}\n"
        if prompt:
            self._all_prompt += f'{prompt}'
        if end_token:
            # self.add_end_token()
            self._all_prompt += f"{self._end_token}\n"

    def formate_MCQA(self):
        raise NotImplementedError

# print(str(PromptFormatter(sys_prompt, few_shot_examples_ask)))

#### Define Agent

In [10]:
from ast import parse
import random

class GemmaAgent_Guesser:

    def __init__(self, model_variant, device='cuda:0', env="kaggle", output_len=200, system_prompt=None, few_shot_examples=None, example_sample_num=None):
        # model initialization
        self.device = device
        self.model_variant = model_variant
        self.few_shot_exmaples_ask = few_shot_examples[0]
        self.few_shot_exmaples_guess = few_shot_examples[1]
        self.example_sample_num = example_sample_num

        WEIGHTS_PATH = self._set_up_env(env)

        # Ensure that the tokenizer is present
        tokenizer_path = os.path.join(WEIGHTS_PATH, 'tokenizer.model')
        assert os.path.isfile(tokenizer_path), 'Tokenizer not found!'

        # Ensure that the checkpoint is present
        ckpt_path = os.path.join(WEIGHTS_PATH , f'gemma-{model_variant}.ckpt')
        assert os.path.isfile(ckpt_path), 'PyTorch checkpoint not found!'

        # loading model configuration
        model_config = get_config_for_2b() if "2b" in model_variant else get_config_for_7b()
        model_config.quant = "quant" in model_variant
        model_config.tokenizer = os.path.join(WEIGHTS_PATH, "tokenizer.model")

        with _set_default_tensor_type(model_config.get_dtype()):
            self.model = GemmaForCausalLM(model_config)
            self.model.load_weights(ckpt_path)
            self.model = self.model.to(self.device).eval()

        # agent args
        self.formatter = PromptFormatter(system_prompt=system_prompt,
                                         few_shot_examples=few_shot_examples[0],
                                         sample_num=self.example_sample_num)
        self.round_num = 0
        self.known_info = []
        self.last_key_attribute = None
        self.last_guess = None
        self.output_len = output_len

    def _set_up_env(self, env):

        if env == 'kaggle':
            print("Loading model in Kaggle, model weights will be searched within local directories.")

            # kaggle configuration
            KAGGLE_AGENT_PATH = "/kaggle_simulations/agent/"
            if os.path.exists(KAGGLE_AGENT_PATH):
                WEIGHTS_PATH = os.path.join(KAGGLE_AGENT_PATH, f"gemma/pytorch/{self.model_variant}/2")
            else:
                WEIGHTS_PATH = f"/kaggle/input/gemma/pytorch/{self.model_variant}/2"

        elif env == 'colab':
            print("Loading model in Colab, starting from downloading the model weights.")

            WEIGHTS_PATH = kagglehub.model_download(f'google/gemma/pyTorch/{self.model_variant}')
        else:
            raise ValueError("Argument 'env' should be in ['kaggle', 'colab']")

        return WEIGHTS_PATH

    def _parse_response(self, response : str):
        '''
            parse the response into a dictionary.
            may contain three keys: key_attribute, question, guess and their value respectively.
            e.g.: {"key_attribute": 'a country', "question": 'Is it a country?'} for a parsed asker response.
        '''
        pattern = re.compile(r'\*\*([^*]+)\*\*:?\s*(.*?)(?=\*\*|$)', re.DOTALL)
        matches = pattern.findall(response.lower())
        parse_dict = {'key attribute':None, 'question':None, 'guess':None}
        for k, v in matches:
          if ('attr' in k) and (parse_dict['key attribute'] == None):
            parse_dict['key attribute'] = v
          elif ('ques' in k) and (parse_dict['question'] == None):
            parse_dict['question'] = v
          elif ('guess' in k) and (parse_dict['guess'] == None):
            parse_dict['guess'] = v

        return parse_dict

    def _format_prompt(self, obs):
        if obs["turnType"] == 'ask':
          self.formatter._few_shot_examples = self.few_shot_exmaples_ask
        if obs["turnType"] == 'guess':
          self.formatter._few_shot_examples = self.few_shot_exmaples_guess
        # reset formatter
        self.formatter.reset()
        self.formatter.add_new_round('user', 'Remember how to use CoT as above and the format of answering, now let us play a new game from the begining.', True)
        # add played rounds
        rounds = interleave_unequal(obs["questions"], obs["answers"])
        self.formatter.add_rounds(rounds, start_agent='user')
        if obs["turnType"] == 'ask':
            self.formatter.add_new_round('user',
                                         'Now it is your turn to ask a question. Please construct a new question based on what you have known and your common sense with your reasoning. Do not forget to indicate key attribute, question with double asterisk (**?**) and their content with double quotation markers ("?"), such as **Key Attribute** "your attribute" and **Question** "your question"',
                                         True)
        elif obs["turnType"] == 'guess':
            self.formatter.add_new_round('user',
                                         'Now guess the keyword based on your analysis with reasoning and known information. And surround a pointer of guess with double asterisks (**?**) and your guessed keyword with doublr quotation markers ("?") in the end, such as **Guess** "your guess"',
                                         True)
        # start of model response
        self.formatter.add_new_round('model', f'Given information: {str(self.known_info)}', False)

    def __call__(self, obs, cfg=None):
        '''
            Main function to interact with the model.
                1. update known information based on the observation and round number
                2. format the prompt with the observation
                3. generate response from the model
                4. parse the response and update information if needed
        '''
        print(f"=========== round: {self.round_num} ===========")
        self.round_num += 1
        # Update known information
        if self.round_num - 1 == 0:
            # for the first round, randomly choose an attribute as initialization
            initial_categories = ['person', 'thing', 'place']
            self.last_key_attribute = random.choice(initial_categories)
            self.else_categories = list(set(initial_categories) - set([self.last_key_attribute]))
            response = f'Is it a {self.last_key_attribute}?'
            round
            return response
        elif self.round_num - 1 == 1:
            # if it is the second round, grab the answer and update known information
            last_answer = obs["answers"][-1]
            if last_answer == 'yes':
                self.known_info.append(f'a {self.last_key_attribute}')
            else:
                self.known_info.append(f'not a {self.last_key_attribute}')
                self.known_info.append(f'either a {self.else_categories[0]} or a {self.else_categories[1]}')
        else:
            # other rounds, update known information
            last_answer = obs['answers'][-1]
            if last_answer == 'yes':
                self.known_info.append(f'is {self.last_key_attribute}')
            else:
                self.known_info.append(f'is not {self.last_key_attribute}')

        # # Update guessed keywords
        # self.known_info.append(f'keyword is not {self.last_guess}')
        # self.known_info = list(set(self.known_info))

        print(f"\nkey attribute: {self.last_key_attribute}, guess: {self.last_guess}")

        # Formatting prompt with observations
        self._format_prompt(obs)
        prompt = str(self.formatter)
        print(f"\nPrompt: {prompt}")
        # Getting response from LLM
        response = self.model.generate(prompt, device=self.device, output_len=self.output_len)
        print(f'\nTurn Type: {obs["turnType"]}\nResponse: {response}')

        # parse response and update information if needed
        parse_dict = self._parse_response(response)
        print(f'\nParse_dict: {parse_dict}')

        # if in an ask turn, try to grab the key attribute from the response and update last key attribute for the next information updating
        if obs["turnType"] == 'ask':
            # successfully grab the key attribute from the response
            if parse_dict['key attribute']:
                self.last_key_attribute = parse_dict['key attribute']
            # no key attribute parsed from the response, directly use question as key attribute
            elif parse_dict['question']:
                self.last_key_attribute  = parse_dict['question']
                ret = parse_dict['question']
                return ret
            # worst case, there is neither key attribute nor question parsed from the response, randomly choose an attribute
            else:
                random_alphabet = random.choice('abcdefghijklmnopqrstuvwxyz')
                ret = f'is it started with alphabet {random_alphabet}?'
                self.last_key_attribute = f'started with alphabet {random_alphabet}'
                return ret

            # parsed response has question, use question as response
            if parse_dict['question']:
                ret = parse_dict['question']
            # parsed response has no question but have key attribute, use key attribute as question
            else:

                ret = f'is it {self.last_key_attribute}?'
        # if in a guess turn, try to grab the guess from the response and update last guess for the next information updating
        elif obs["turnType"] == 'guess':
            if parse_dict['guess']:
                self.last_guess = parse_dict['guess']
                ret = parse_dict['guess']
            # if there is no guess parsed from the response, return empty string
            else:
                ret = ''
        else:
            raise ValueError('Invalid turnType.')

        return ret


### Debug

In [11]:
!pip -q install kaggle_environments
import kaggle_environments

In [12]:
# dummy agent
def simple_agent1(obs, cfg):
    # if agent is guesser and turnType is "ask"
    if obs['turnType'] == "ask": response = "Is it a duck?"
    elif obs.turnType == "guess": response = "duck"
    elif obs.turnType == "answer": response = "no"
    return response

def simple_agent2(obs, cfg):
    # if agent is guesser and turnType is "ask"
    if obs.turnType == "ask": response = "Is it a bird?"
    elif obs.turnType == "guess": response = "bird"
    elif obs.turnType == "answer": response = "no"
    return response

def simple_agent3(obs, cfg):
    # if agent is guesser and turnType is "ask"
    if obs.turnType == "ask": response = "Is it a pig?"
    elif obs.turnType == "guess": response = "pig"
    elif obs.turnType == "answer": response = "no"
    return response

def simple_agent4(obs, cfg):
    # if agent is guesser and turnType is "ask"
    if obs.turnType == "ask": response = "Is it a cow?"
    elif obs.turnType == "guess": response = "cow"
    elif obs.turnType == "answer": response = "no"
    return response

  and should_run_async(code)


In [13]:
# **IMPORTANT:** Define agent as a global so you only have to load
# the agent you need. Loading both will likely lead to OOM.
# del agent
agent = None
print(agent)

# def get_agent(name: str):
#     global agent

#     if agent is None and name == 'questioner':
#         agent = GemmaAgent_Guesser(VARIANT, MACHINE_TYPE, 'colab',
#                                    sys_prompt, few_shot_examples)
#     elif agent is None and name == 'answerer':
#         raise NotImplementedError
#     assert agent is not None, "Agent not initialized."

#     return agent


# def agent_fn(obs, cfg):
#     if obs.turnType == "ask":
#         response = get_agent('questioner')(obs)
#     elif obs.turnType == "guess":
#         response = get_agent('questioner')(obs)
#     elif obs.turnType == "answer":
#         response = get_agent('answerer')(obs)
#     if response is None or len(response) <= 1:
#         return "yes"
#     else:
#         return response


None


In [14]:
debug_config = {'episodeSteps': 10,     # initial step plus 3 steps per round (ask/answer/guess)
                'actTimeout': 5,       # agent time per round in seconds; default is 60
                'runTimeout': 60,      # max time for the episode in seconds; default is 1200
                'agentTimeout': 3600}  # obsolete field; default is 3600
env = kaggle_environments.make("llm_20_questions", debug_config, debug=True)
print(f'the key word is: {kaggle_environments.envs.llm_20_questions.llm_20_questions.keyword}\nwith alts as {kaggle_environments.envs.llm_20_questions.llm_20_questions.alts}')

the key word is: Utility Box
with alts as []


In [15]:
import time
# Choose variant and machine type
VARIANT = '7b-it-quant'
MACHINE_TYPE = 'cuda'
agent = None


### Prompt Define

In [21]:
sys_prompt = f'You are a highly knowledgeable naturalist with extensive knowledge about objects, places, and people around the world, almost like a search engine. You also possess strong reasoning abilities, allowing you to deduce the answer to your query using existing knowledge and information. \
You need to utilize your knowledge and reasoning skills to play a game of 20 questions. \
In each round of the game, choose the binary attribute most likely to aid your deduction, such as "there is a cross on the flag." \
Using this attribute, ask a question in English. Generally, the most helpful attribute provides the maximum information gain, meaning it significantly reduces the entropy of your guess, regardless of whether the answer is **yes** or **no**. For example, if you believe asking about the color is most helpful for your deduction, you should ask, "Is it a thing with a red color?" \
And after your thorough thinking, you need to wrap some key point with double stars and their corresponding content with double quotation marks (such as **key point**: "corresponding content"). \
Baiscally it should follows the format similar to within 100 words: **Key attribute**:"your chosen key attribute"\n**Question**:"your formatted question regarding to the key attribute"\n**Reasoning**:"your reasoning abput why this key attribute can provide you the most information gain" \
 Here are few examples of how to use Chain of thought:'

few_shot_examples_ask = [

# Round 1 | Keyword: Yangtze River
'''
Known information: ["is not a country", "is either a city or a landmark", "is not a city"]
**Key attribute**: "man-made structure"
**Question**: "Is it a man-made structure?"
****Reasoning****:
Man-made vs. Natural: This question helps to differentiate between natural landmarks (e.g., Grand Canyon) and man-made structures (e.g., Eiffel Tower).
Broad Categories: Man-made structures include a wide range of possibilities (e.g., buildings, monuments), whereas cities are all man-made but represent a different category.
Even Split: The distinction between natural and man-made provides an even split, maximizing information gain by effectively narrowing down the possibilities based on the answer.
''',

'''
Known information: ["is not a country", "is either a city or a landmark", "is not a city", "not a man-made structure", "is not in North America"]
**Key attribute**: "in Asia"
**Question**: "Is it located in Asia?"
****Reasoning****:
Geographic Focus: Identifying the continent will significantly narrow down the possible natural landmarks.
Large Landmarks: Asia has several major natural landmarks (e.g., Mount Everest, Great Wall).
Even Split: Given the global distribution of landmarks, this question will effectively split the remaining possibilities.
''',

'''
Known information: ["is not a country", "is either a city or a landmark", "is not a city", "not a man-made structure", "is not in North America", "is in Aisa", "is in China", "is a river"]
**Key attribute**: "longest river in China"
**Question**: "Is it the longest river in China?"
****Reasoning****:
Length-Specific Focus: Knowing whether it is the longest river will help confirm or eliminate the Yangtze River.
Major Rivers: The Yangtze River is the longest river in China, followed by the Yellow River.
Effective Split: This question will help confirm or deny one of the most prominent landmarks in China, providing a clear yes/no split.',
''',


# Round 2 | Keyword: Congo
'''
Known information: ["is a country"]
**Key attribute**: "in Europe"
**Question**: "Is this country in Europe?"
**Reasoning**:
Continental Focus: Knowing the continent will help significantly narrow down the list of possible countries.
Effective Reduction: Europe has many countries, so confirming or eliminating Europe will reduce the search space.
Balanced Distribution: Countries are evenly distributed across continents, providing an effective yes/no split.
''',

'''
Known information: ["is a country", "not in Europe", "not in Asia", "in Africa", "not in Northern Africa", "not in Eastern Africa", "not in Western Africa", "in Southern Africa"]
**Key attribute**: "landlocked country"
**Question**: "Is this country landlocked?"
**Reasoning**:
To further narrow down the possibilities within Southern Africa, I will focus on another well-known country in that region.
Geographical Focus: Knowing whether the country is landlocked will significantly narrow down the possibilities within Southern Africa.
Effective Reduction: There are several landlocked countries in Southern Africa, so confirming or eliminating this will help focus the search.
Balanced Distribution: This provides a clear yes/no split, effectively narrowing down the search.
''',

'''
Known information: ["is a country", "not in Europe", "not in Asia", "in Africa", "not in Northern Africa", "not in Eastern Africa", "not in Western Africa", "in Southern Africa", "not landlocked", "Portuguese is not an official language"]
**Key attribute**: "borders the Indian Ocean"
**Question**: "Does this country border the Indian Ocean?"
**Reasoning**:
Geographic Focus: Knowing whether the country borders the Indian Ocean can help narrow down the options.
Effective Reduction: There are only a few countries in Southern Africa that border the Indian Ocean.
Balanced Distribution: This question provides a clear yes/no split, effectively narrowing down the search.
 ''',

# round 3 | Keyword: Ryan
'''
Known information: ["not is a place", "is a person"]
**Key attribute**: "a historical figure"
**Question**: "Is this person a historical figure?"
**Reasoning**:
Time Frame: Identifying whether the person is historical can significantly narrow down the possibilities.
Effective Reduction: This helps focus on either historical or contemporary figures.
Balanced Distribution: This question provides a clear yes/no split, guiding the search effectively.
''',

'''
Known information: ["not is a place", "is a person", "is a historical figure", "is involved in the entertainment industry.", "is not primarily known for their work in music.", "is primarily known for their work as an actor.", "has not won an Academy Award (Oscar)."]
**Key attribute**: "primarily known for television work"
**Question**: "Is this person primarily known for their work in television?"
**Reasoning**:
Medium Focus: Identifying whether the person is known for television can help narrow down the possibilities.
Effective Reduction: This helps distinguish between actors primarily known for television versus those known for film.
Balanced Distribution: This question provides a clear yes/no split, effectively guiding the search.
'''

'''
Known information: ["not is a place", "is a person", "is a historical figure", "is involved in the entertainment industry.", "is not primarily known for their work in music.", "is primarily known for their work as an actor.", "has not won an Academy Award (Oscar).", "is not primarily known for their work in television.", "is primarily known for their work in action movies.", "is associated with a major action movie franchise."]
**Key attribute**: "known for science fiction action movies"
**Question**: "Is this person known for their work in science fiction action movies?"
**Reasoning**:
Genre Specificity: Identifying whether the person is known for science fiction action movies can help narrow down the possibilities.
Effective Reduction: This helps distinguish between different types of action movie genres, such as sci-fi, fantasy, or military.
Balanced Distribution: This question provides a clear yes/no split, guiding the search effectively.
''',

# round 4 | Keyword: Noosa
'''
Known information: ["is not a person", "is a place", "is not a country", "is a city"]
**Key attribute**: "a capital city"
**Question**: "Is it a capital city?"
**Reasoning**:
Geographic Focus: This question helps determine if the city is a capital, significantly narrowing the possibilities.
Effective Reduction: If the answer is yes, it focuses on capital cities, eliminating all non-capital cities. If no, it eliminates all capital cities from consideration.
Context Relevance: Knowing whether the city is a capital helps tailor subsequent questions to specific types of cities.
Balanced Distribution: Capital cities are a smaller subset of cities, providing an effective yes/no split.
''',

# '''
# Known information: ["is not a person", "is a place", "is not a country", "is a city", "is not a capital city", "is in the Southern Hemisphere."]
# **Key attribute**: "in the Southern Hemisphere"
# **Question**: "Is this city in the Southern Hemisphere?"
# **Reasoning**:
# Hemispheric Focus: This question splits the world into two large, nearly equal parts, effectively narrowing down possibilities.
# Effective Reduction: If the answer is yes, it narrows the focus to cities in the Southern Hemisphere, eliminating those in the Northern Hemisphere. If no, it focuses on Northern Hemisphere cities.
# Balanced Distribution: The Earth is divided evenly by the equator, providing a clear yes/no split that maximizes information gain.
# ''',


'''
Known information: ["is not a person", "is a place", "is not a country", "is a city", "is not a capital city", "is in the Southern Hemisphere", "not in Africa.", "not in South America.", "is in Australia or Oceania", "in Australia", "is a coastal city."]
**Key attribute**: "a well-known tourist destination"
**Question**: "Is this city a well-known tourist destination?"
**Reasoning**:
Tourism Focus: This question helps determine if the city is popular with tourists, which can narrow down the list of coastal cities.
Effective Reduction: If the answer is yes, it focuses on well-known tourist cities, eliminating less-known cities. If no, it narrows down to less-touristic coastal cities.
Context Relevance: Many of Australia's coastal cities are also popular tourist destinations, making this an important distinction.
Balanced Distribution: Tourist versus non-tourist cities provide a clear yes/no split.
''',

# '''
# Known information: ["is not a person", "is a place", "is not a country", "is a city", "is not a capital city", "is in the Southern Hemisphere", "not in Africa.", "not in South America.", "is in Australia or Oceania", "in Australia", "is a coastal city", "is a well-known tourist destination", "is not Gold Coast", "is not in New South Wales", "is in Queensland", "is not associated with the Great Barrier Reef"]
# **Key attribute**: "popular for its beaches"
# **Question**: "Is this city popular for its beaches?"
# **Reasoning**:
# Tourism Focus: Identifying if the city is known for its beaches can help narrow down the possibilities.
# Effective Reduction: If the answer is yes, it focuses on coastal cities popular for their beaches, eliminating inland cities or those known for other attractions. If no, it narrows down to other types of tourist destinations.
# Context Relevance: Many coastal cities in Queensland are known for their beaches, making this a significant distinction.
# Balanced Distribution: Coastal versus inland tourist destinations provide a clear yes/no split.
# ''',

'''
Known information: ["is not a person", "is a place", "is not a country", "is a city", "is not a capital city", "is in the Southern Hemisphere", "not in Africa.", "not in South America.", "is in Australia or Oceania", "in Australia", "is a coastal city", "is a well-known tourist destination", "is not Gold Coast", "is not in New South Wales", "is in Queensland", "is not associated with the Great Barrier Reef", "popular for its beaches"]
**Key attribute**: "located on the Sunshine Coast"
**Question**: "Is this city located on the Sunshine Coast?"
**Reasoning**:
Regional Focus: Identifying if the city is on the Sunshine Coast can significantly narrow down the possibilities.
Effective Reduction: If the answer is yes, it focuses on cities on the Sunshine Coast, eliminating cities in other coastal regions of Queensland. If no, it narrows down to other coastal areas.
Context Relevance: The Sunshine Coast is known for its beach destinations, making this a significant distinction.
Balanced Distribution: This question provides a clear yes/no split, effectively guiding the search.
'''
]

# few_shot_examples_ask = [

# 'Now give me an example of thinking with the Chain of Thought',

# 'Known information: ["is not a country", "is either a city or a landmark", "is not a city"]\n\
# **Key attribute**: "man-made structure" \
# **Question**: "Is it a man-made structure?" \
# **Reasoning**: Man-made vs. Natural: This question helps to differentiate between natural landmarks (e.g., Grand Canyon) and man-made structures (e.g., Eiffel Tower). \
# Broad Categories: Man-made structures include a wide range of possibilities (e.g., buildings, monuments), whereas cities are all man-made but represent a different category. \
# Even Split: The distinction between natural and man-made provides an even split, maximizing information gain by effectively narrowing down the possibilities based on the answer.',

# 'no',

# 'Known information: ["is not a country", "is either a city or a landmark", "is not a city", "not a man-made structure"]\n\
# **Key attribute**: "in North America" \
# **Question**: "Is it located in North America?" \
# **Reasoning**: Continental Location: Knowing the continent helps narrow down the options to specific regions.\
# Significant Reduction: This can significantly reduce the number of potential landmarks.\
# Balanced Distribution: The world major landmarks are fairly evenly distributed across continents, providing a balanced yes/no split.',

# 'no',

# 'Known information: ["is not a country", "is either a city or a landmark", "is not a city", "not a man-made structure", "is not in North America"]\n\
# **Key attribute**: "in Asia" \
# **Question**: "Is it located in Asia?" \
# **Reasoning**: Geographic Focus: Identifying the continent will significantly narrow down the possible natural landmarks. \
# Large Landmarks: Asia has several major natural landmarks (e.g., Mount Everest, Great Wall). \
# Even Split: Given the global distribution of landmarks, this question will effectively split the remaining possibilities.',

# 'yes',

# 'Known information: ["is not a country", "is either a city or a landmark", "is not a city", "not a man-made structure", "is not in North America", "is in Aisa"]\n\
# **Key attribute**: "in China" \
# **Question**: "Is it located in China?" \
# **Reasoning**: Country-Specific Focus: Knowing the specific country will help narrow down the landmark significantly. \
# Major Landmarks: China has several well-known natural landmarks (e.g., Yangtze River, Mount Everest on the border). \
# Even Split: Given the number of countries in Asia with significant natural landmarks, this question provides a balanced yes/no split.',

# 'yes',

# 'Known information: ["is not a country", "is either a city or a landmark", "is not a city", "not a man-made structure", "is not in North America", "is in Aisa", "is in China"]\n\
#  **Key attribute**: "a river" \
# **Question**: "Is it a river?" \
# **Reasoning**: Type-Specific Focus: Knowing whether the natural landmark is a river will help narrow down the possibilities significantly. \
# Major Landmarks: China has several famous rivers (e.g., Yangtze River, Yellow River). \
# Even Split: This question provides a balanced yes/no split, effectively narrowing down the possibilities.',

# 'yes',

# 'Known information: ["is not a country", "is either a city or a landmark", "is not a city", "not a man-made structure", "is not in North America", "is in Aisa", "is in China", "is a river"]\n\
# **Key attribute**: "longest river in China" \
# **Question**: "Is it the longest river in China?" \
# **Reasoning**: Length-Specific Focus: Knowing whether it is the longest river will help confirm or eliminate the Yangtze River. \
# Major Rivers: The Yangtze River is the longest river in China, followed by the Yellow River. \
# Effective Split: This question will help confirm or deny one of the most prominent landmarks in China, providing a clear yes/no split.',

# 'yes',

# 'It is the Yangtze River'

# 'Correct!']

few_shot_examples_guess = [
'Now give me an example of guessing with the reasoning',

'Known information: ["is not a country", "is either a city or a landmark", "is not a city"]\n\
Given these clues, it is likely a natural landmark rather than a city or a man-made structure. An example of a well-known natural landmark is the Grand Canyon.\
Guess: "Grand Canyon"',

'no',

'Known information: ["is not a country", "is either a city or a landmark", "is not a city", "not a man-made structure"]\n\
Given these clues, it is likely a natural landmark outside North America. An example of a famous natural landmark outside North America is Mount Everest.\
Guess: "Mount Everest"',

'no',

'Known information: ["is not a country", "is either a city or a landmark", "is not a city", "not a man-made structure", "is not in North America"]\n\
Given these clues, it is likely a natural landmark in Asia. A famous natural landmark in Asia is the Himalayas.\
Guess: "the Himalayas"',

'no',

'Known information: ["is not a country", "is either a city or a landmark", "is not a city", "not a man-made structure", "is not in North America", "is in Aisa"]\n\
Given these clues, it is likely a famous natural landmark in China. An example of such a landmark is the Mekong River.\
Guess: "the Mekong River"',

'no',

'Known information: ["is not a country", "is either a city or a landmark", "is not a city", "not a man-made structure", "is not in North America", "is in Aisa", "is in China"]\n\
Given these clues, it is likely a natural landmark in China that is not a mountain. An example of such a landmark could be the Tianchi Lake.\
Guess: "the Tianchi Lake"',

'no',

'Known information: ["is not a country", "is either a city or a landmark", "is not a city", "not a man-made structure", "is not in North America", "is in Aisa", "is in China", "is a river"]\n\
Given these clues, a very famous river in China is the Yellow River.\
Guess: "the Yellow River"',

'Known information: ["is not a country", "is either a city or a landmark", "is not a city", "not a man-made structure", "is not in North America", "is in Aisa", "is in China", "is a river", "is the longest river in China"]\n\
Given these clues, it should be the longest river in China which is the Yangtze River.\
Guess: "the Yangtze River"',

'Correct!']

### Runing Test

In [22]:
# agent = GemmaAgent_Guesser(VARIANT, MACHINE_TYPE, 'colab', 100, sys_prompt, [few_shot_examples_ask, few_shot_examples_guess], None)
agent.output_len = 100
# agent.example_sample_num = None
agent.formatter._sys_prompt = sys_prompt
agent.formatter._few_shot_examples = few_shot_examples_ask
agent.round_num = 4
agent.last_key_attribute = 'a person'
agent.last_guess = 'pig'
agent.known_info = ['not a place', 'either a person or a thing', 'not a thing']
# print(str(agent.formatter))
obs = {'remainingOverageTime': 300, 'questions': ['Is it a place?', 'Is it a thing?', 'Is it a person?'], 'guesses': ['pig', 'pig', 'pig'], 'answers': ['no', 'no', 'yes'], 'role': 'guesser', 'turnType': 'ask', 'keyword': '', 'category': ''}
start = time.time()
agent(obs)
print(f"TIME: {(time.time()-start):.4f}s")


key attribute: a person, guess: pig

Prompt: <start_of_turn>user
You are a highly knowledgeable naturalist with extensive knowledge about objects, places, and people around the world, almost like a search engine. You also possess strong reasoning abilities, allowing you to deduce the answer to your query using existing knowledge and information. You need to utilize your knowledge and reasoning skills to play a game of 20 questions. In each round of the game, choose the binary attribute most likely to aid your deduction, such as "there is a cross on the flag." Using this attribute, ask a question in English. Generally, the most helpful attribute provides the maximum information gain, meaning it significantly reduces the entropy of your guess, regardless of whether the answer is **yes** or **no**. For example, if you believe asking about the color is most helpful for your deduction, you should ask, "Is it a thing with a red color?" And after your thorough thinking, you need to wrap some

In [18]:
game_output = env.run(agents=[simple_agent1,simple_agent2, simple_agent3, simple_agent4])
print(game_output[9])

[{'action': 'duck', 'reward': 0, 'info': {}, 'observation': {'remainingOverageTime': 300, 'step': 9, 'questions': ['Is it a duck?', 'Is it a duck?', 'Is it a duck?'], 'guesses': ['duck', 'duck', 'duck'], 'answers': ['no', 'no', 'no'], 'role': 'guesser', 'turnType': 'ask', 'keyword': '', 'category': ''}, 'status': 'DONE'}, {'action': '', 'reward': 0, 'info': {}, 'observation': {'remainingOverageTime': 300, 'questions': ['Is it a duck?', 'Is it a duck?', 'Is it a duck?'], 'guesses': ['duck', 'duck', 'duck'], 'answers': ['no', 'no', 'no'], 'role': 'answerer', 'turnType': 'answer', 'keyword': 'Utility Box', 'category': 'things'}, 'status': 'DONE'}, {'action': 'pig', 'reward': 0, 'info': {}, 'observation': {'remainingOverageTime': 300, 'questions': ['Is it a pig?', 'Is it a pig?', 'Is it a pig?'], 'guesses': ['pig', 'pig', 'pig'], 'answers': ['no', 'no', 'no'], 'role': 'guesser', 'turnType': 'ask', 'keyword': '', 'category': ''}, 'status': 'DONE'}, {'action': '', 'reward': 0, 'info': {}, 'o

In [19]:
# env.render(mode="ipython", width=600, height=500)

### Experimental Recordings

- 1 question deduction,**150** output length
  - 72s❌, no indication of answer❌
- 3 question deduction,**150** output length
  - 72s❌, no indication of answer❌
- **6** question deduction,**150** output length
  - 83s❌, completed answer and indicator✅
- 3 question deduction,**100** output length
  - 52s✅, incompleted response❌
  (maybe we can limit the length of reasoning within 100 character)
- **6** question deduction,**100** output length
  - 59s✅, incompleted response❌

---

**👇 Try to downsize the length of CoT example. (better few shot cases and system prompt)**

---

- **6** question deduction, **100** output length
  - 51s✅, completed and fair reasoning✅
- **6** question deduction, **150** output length
  - 68s❌, similar result to above✅

---

**👇 Apparent path dependence, maybe we need multiple deduction of different cases instead of all steps within one game**

---
