<a href="https://colab.research.google.com/github/gitHubAndyLee2020/CrafterGPT/blob/main/CrafterGPT_Step_By_Step_Prompt_Engineering_With_Fine_Tuned_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### CrafterGPT Step-By-Step Prompt Engineering with Fine-Tuned Model

### Install condacolab

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()

In [None]:
import condacolab
condacolab.check()

### Install SmartPlay

In [None]:
!git clone https://github.com/microsoft/SmartPlay.git

In [None]:
!conda env update -n base -f ./SmartPlay/environment.yml

In [None]:
!pip install minedojo

In [None]:
!cd SmartPlay && pip install -e .

### Install Dependencies

In [None]:
!pip install gym crafter accelerate bitsandbytes

In [None]:
!pip install cffi==1.16.0 # Pin cffi to resolve a package version conflict
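
Before loading the model, it can help to confirm that the packages installed above are importable in the current kernel. The following is a minimal sketch, assuming the import names match the pip package names; `missing_packages` is a hypothetical helper and not part of SmartPlay or condacolab.

```python
import importlib.util

def missing_packages(names):
    """Return the subset of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names assumed to match the pip packages installed above.
required = ["gym", "crafter", "accelerate", "bitsandbytes"]
print("Missing packages:", missing_packages(required))
```

An empty list means every package was found. Anything listed may need to be reinstalled after the condacolab kernel restart.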

### LLM API

In [None]:
%%writefile llm_api.py
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda" # the device to load the model onto

if not ('model' in globals()):
  model = AutoModelForCausalLM.from_pretrained("techandy42/llama-2-7b-craftergpt-v1.1",
                                            torch_dtype="auto", load_in_4bit=True)
  model = model.bfloat16()

if not ('tokenizer' in globals()):
  # torch_dtype is a model argument, so it is not passed to the tokenizer
  tokenizer = AutoTokenizer.from_pretrained("techandy42/llama-2-7b-craftergpt-v1.1")
  tokenizer.use_default_system_prompt = False
  tokenizer.pad_token = tokenizer.bos_token
  tokenizer.padding_side = "left"

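def query_model(messages):
  """Generate a reply for a list of chat messages; returns the full decoded text, prompt included."""
  # Render the messages with the tokenizer's chat template and tokenize them
  encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

  model_inputs = encodeds.to(device)

  # Sample up to 1000 new tokens with nucleus sampling
  generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True, top_p=0.6, temperature=0.9)
  decoded = tokenizer.batch_decode(generated_ids)

  # Release cached GPU memory between calls
  torch.cuda.empty_cache()

  return decoded[0]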
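
`query_model` returns the whole decoded sequence, so the prompt is still attached to the reply. The main loop later isolates the reply by splitting on `[/INST]` and `</s>`. The sketch below wraps that post-processing in a helper. `extract_reply` is a hypothetical name, the sample string is made up for illustration, and the extra `.strip()` is an addition that the notebook itself does not apply.

```python
def extract_reply(decoded, inst_end="[/INST]", eos="</s>"):
    """Return the text after the last instruction marker and before the EOS token."""
    reply = decoded.split(inst_end)[-1]
    return reply.split(eos)[0].strip()

# Made-up decoded string in the Llama-2 chat layout, for illustration only.
sample = '<s>[INST] Choose an action. [/INST] I will collect wood. "Do"</s>'
print(extract_reply(sample))
```

If the output contains no `[/INST]` marker, the helper returns the whole string up to the first `</s>`, the same result the split in the main loop gives.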

### Import Libraries

In [None]:
%cd ./SmartPlay/src
import smartplay
%cd ../..

In [3]:
import argparse

import crafter
import tqdm
import numpy as np

In [None]:
from llm_api import query_model

### Main Program

In [5]:
import gym

In [6]:
action_list = [
  'Noop',
  'Move West',
  'Move East',
  'Move North',
  'Move South',
  'Do',
  'Sleep',
  'Place Stone',
  'Place Table',
  'Place Furnace',
  'Place Plant',
  'Make Wood Pickaxe',
  'Make Stone Pickaxe',
  'Make Iron Pickaxe',
  'Make Wood Sword',
  'Make Stone Sword',
  'Make Iron Sword',
]

In [7]:
import re

def match_act(string):
    # Use regular expression to find a phrase within quotes
    match = re.search(r'"([^"]*)"', string)
    if match:
        quoted_action = match.group(1)
        # Check if the quoted action is in the action_list
        for i, act in enumerate(action_list):
            if act.lower() == quoted_action.lower():
                return i
        print("Action \"{}\" not found in action list.".format(quoted_action))

    # Otherwise, fall back to the first action name that appears anywhere in the output
    for i, act in enumerate(action_list):
        if act.lower() in string.lower():
            return i
    print("LLM failed with output \"{}\", taking action Do...".format(string))
    return action_list.index("Do")
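
`match_act` checks only the first quoted phrase, so a reply that quotes something else before the final choice falls through to the substring search. One possible alternative is to scan every quoted phrase and keep the last one that names a valid action. The sketch below assumes that the final choice appears last in the reply. `last_quoted_action` and the shortened `ACTIONS` list are hypothetical and used only for illustration.

```python
import re

# Shortened action list for illustration; the notebook's full list works the same way.
ACTIONS = ["Noop", "Do", "Place Table", "Make Wood Pickaxe"]

def last_quoted_action(text, actions=ACTIONS):
    """Return the index of the last quoted phrase that names a valid action, or None."""
    lookup = {name.lower(): i for i, name in enumerate(actions)}
    for phrase in reversed(re.findall(r'"([^"]*)"', text)):
        index = lookup.get(phrase.strip().lower())
        if index is not None:
            return index
    return None

reply = 'The prompt asks for "Action Name" in quotes. My final choice is "Place Table".'
print(last_quoted_action(reply))
```

Returning `None` instead of a default action leaves the fallback policy to the caller.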

In [8]:
def compose_prompt(current_observation, past_actions, action_list):
    newline = "\n"

    prompt = f"""
Environment Description:

1. You are playing a 2D survival game.
2. Your goal is to maximize rewards and stay alive.
3. You receive 1 point of reward for the first time you achieve the following achievements:
	- Collect Wood: No requirements.
	- Place Table: Requires Collect Wood.
	- Eat Cow: No requirements.
	- Collect Sapling: No requirements.
	- Collect Drink: No requirements.
	- Make Wood Pickaxe: Requires Place Table.
	- Make Wood Sword: Requires Place Table.
	- Place Plant: Requires Collect Sapling.
	- Defeat Zombie: No requirements.
	- Collect Stone: Requires Make Wood Pickaxe.
	- Place Stone: Requires Collect Stone.
	- Eat Plant: Requires Place Plant.
	- Defeat Skeleton: No requirements.
	- Make Stone Pickaxe: Requires Collect Stone.
	- Make Stone Sword: Requires Collect Stone.
	- Wake Up: No requirements.
	- Place Furnace: Requires Collect Stone.
	- Collect Coal: Requires Make Wood Pickaxe.
	- Collect Iron: Requires Make Stone Pickaxe.
	- Make Iron Pickaxe: Requires Place Furnace, Collect Coal, and Collect Iron.
	- Make Iron Sword: Requires Place Furnace, Collect Coal, and Collect Iron.
	- Collect Diamond: Requires Make Iron Pickaxe.
4. You will die if the health status reaches 0.
5. You will start to lose health if one of the following conditions is true:
	- Zombie or skeleton is attacking you.
	- Food status is 0.
	- Drink status is 0.
	- You fall into lava.
6. To resolve each situation, you can do the following:
	- If zombie or skeleton is attacking you, either run away or attack it. You must be one step away from zombie or skeleton to attack it.
	- If food status is 0, attack cow. You must be one step away from cow to attack it.
	- If drink status is 0, drink water. You must be one step away from water to drink it.
7. Within the environment, you can take one of the following actions:
	- Noop: Take no action, always applicable.
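	- Move West: Requires flat ground (such as grass) west of you, with no object in the way.
	- Move East: Requires flat ground (such as grass) east of you, with no object in the way.
	- Move North: Requires flat ground (such as grass) north of you, with no object in the way.
	- Move South: Requires flat ground (such as grass) south of you, with no object in the way.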
	- Do: Requires facing a creature or object; (1) collects the resource if you have the necessary tools, or (2) attacks the creature.
	- Sleep: Requires below maximum energy status; automatically taken if energy status reaches 0.
	- Place Stone: Requires stone in inventory.
	- Place Table: Requires 2 wood in inventory.
	- Place Furnace: Requires 4 stones in inventory.
	- Place Plant: Requires sapling in inventory.
	- Make Wood Pickaxe: Requires you to be 1 step away from table; requires wood in inventory.
	- Make Stone Pickaxe: Requires you to be 1 step away from table; requires wood, stone in inventory.
	- Make Iron Pickaxe: Requires you to be 1 step away from table and furnace; requires wood, coal, iron in inventory.
	- Make Wood Sword: Requires you to be 1 step away from table; requires wood in inventory.
	- Make Stone Sword: Requires you to be 1 step away from table; requires wood, stone in inventory.
	- Make Iron Sword: Requires you to be 1 step away from table and furnace; requires wood, coal, iron in inventory.
8. To choose which action to take, consider the following:
	- Take action that will resolve immediate threats.
	- Take action that will collect useful resources.
	- Take action that will achieve new achievements.

Current Observation:

{current_observation}

Action History:

{'- ' + (newline + '- ').join(past_actions)}

Instruction:

- Given the Environment Description, Current Observation, and Action History above, please describe your current status, the resources that you currently have, and potential threats that you currently face.
- Then, please describe the No.1 goal that you must work towards.
- Then, please list every action that you can take from the Action List below. Please exclude actions that you lack the resources or tools to take.
- Finally, please choose one action that aligns with your goal the most. Please output your final choice surrounded by quotation marks, such as \"Action Name\".

Action List:

{'- ' + (newline + '- ').join(action_list)}
"""

    messages = [
       {"role": "user", "content" : prompt}
    ]

    return messages
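
The achievement list in the prompt is a dependency graph: each achievement names the achievements that must come first. The sketch below encodes a few of those entries as a dictionary and lists what becomes available next. It is an illustration of the prompt's structure and not part of the notebook's agent; `PREREQUISITES` and `unlocked_next` are hypothetical names.

```python
# Prerequisites copied from a few entries of the prompt's achievement list.
PREREQUISITES = {
    "Collect Wood": [],
    "Place Table": ["Collect Wood"],
    "Make Wood Pickaxe": ["Place Table"],
    "Collect Stone": ["Make Wood Pickaxe"],
    "Make Stone Pickaxe": ["Collect Stone"],
    "Place Furnace": ["Collect Stone"],
}

def unlocked_next(done, prerequisites=PREREQUISITES):
    """List achievements not yet earned whose prerequisites are all earned."""
    done = set(done)
    return [
        name for name, needs in prerequisites.items()
        if name not in done and all(n in done for n in needs)
    ]

print(unlocked_next(["Collect Wood", "Place Table"]))
```

A check like this could be used to compare the goal stated by the model against what the achievement requirements allow.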

In [9]:
%%time

import random
import pandas as pd
import os

seeds = []

for i in range(1):

  class Object(object):
      pass

  args = Object()
  args.steps = 1e6
  args.seed = random.randint(1, 1024)

  seeds.append(args.seed)
  print(f"Random Seed: {args.seed}")

  env = gym.make("smartplay:Crafter-v0", seed=args.seed)

  step = 0
  # Reset once and keep the returned info, so the first observation matches the environment state
  _, info = env.reset()
  done = False

  trajectories = []
  R = 0

  while step < args.steps:

      print("=="*15, "Step: {}, Reward: {}".format(step, R), "=="*15)
      current_observation = info['obs']
      print(current_observation)

      print("--"*10 + " QA " + "--"*10)

      messages = compose_prompt(current_observation, [trajectory[2] for trajectory in trajectories][-10:], action_list)

      answer_act_raw = query_model(messages)
      answer_act = answer_act_raw.split("[/INST]")[-1].split("</s>")[0]
      print("LLM output: {}\n".format(answer_act))

      a = match_act(answer_act)
      print("Action: {}\n".format(action_list[a]))

      # Take random action
      # a = np.random.randint(0, len(action_list))
      # print("Action: {}".format(action_list[a]))

      _, reward, done, info = env.step(a)
      R += reward

      trajectories.append((step, current_observation, action_list[a], reward, done, R))

      step += 1

      print()

      if done:
          break

  # Convert trajectories to pandas DataFrame
  df = pd.DataFrame(trajectories, columns=["step", "observation", "action", "reward", "done", "total_reward"])

  # Create experiments directory if it doesn't exist
  os.makedirs("experiments", exist_ok=True)

  # Save the DataFrame to a CSV file
  filename = f"experiments/trajectory_seed_{args.seed}.csv"
  df.to_csv(filename, index=False)

Random Seed: 162



No chat template is defined for this tokenizer - using the default template for the LlamaTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.

Streaming output truncated to the last 5000 lines.

I am currently facing the grass at my front. I have nothing in my inventory. My energy status is 8/9, and I need to drink water to reach maximum energy. My food status is 8/9, and I need to eat a cow to reach maximum food. My drink status is 8/9, and I need to drink water to reach maximum drink. I have collected 4 stones and 2 wood.

Resources:

* 4 stones
* 2 wood

Potential Threats:

* Zombie 5 steps to my east
* Skeleton 2 steps to my west
* Cow 5 steps to my north-east

Goal:
My number one goal is to collect more resources and resolve immediate threats.

Action Choice:
"Make Iron Pickaxe"

Action: Make Iron Pickaxe


You took action make_iron_pickaxe.

You see:
- grass 1 steps to your west
- tree 5 steps to your south-west
- cow 4 steps to your north-east

You face grass at your front.

Your status:
- health: 9/9
- food: 8/9
- drink: 8/9
- energy: 8/9

You have nothing in your inventory.
-------------------- QA -----
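
The main loop rebuilds the action history from the full `trajectories` list on every step and keeps the last 10 entries. A bounded `collections.deque` is one way to hold the same window without rebuilding the list. This is a sketch of an alternative and not what the notebook does; the action names are placeholders.

```python
from collections import deque

HISTORY_LENGTH = 10  # same window size as the [-10:] slice in the main loop

history = deque(maxlen=HISTORY_LENGTH)
for step in range(25):
    # Placeholder action names; in the notebook this would be action_list[a].
    history.append(f"action_{step}")

print(list(history))
```

Once the deque is full, each `append` drops the oldest entry, so `list(history)` could be passed to `compose_prompt` directly.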

In [None]:
import pandas as pd

for seed in seeds:
  df = pd.read_csv(f"experiments/trajectory_seed_{seed}.csv")
  print(df.head())
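
Beyond `df.head()`, a per-run summary makes it easier to compare seeds. The sketch below counts steps, sums rewards, and tallies actions using only the standard library, so it also runs outside the notebook environment. The sample rows are made up in the same column layout and are not real results; `summarize` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Made-up rows in the same column layout the notebook writes; not real results.
sample_csv = """step,observation,action,reward,done,total_reward
0,obs,Move West,0,False,0
1,obs,Do,1,False,1
2,obs,Do,0,True,1
"""

def summarize(csv_file):
    """Return (number of steps, total reward, action counts) for one trajectory file."""
    rows = list(csv.DictReader(csv_file))
    total_reward = sum(float(row["reward"]) for row in rows)
    action_counts = Counter(row["action"] for row in rows)
    return len(rows), total_reward, action_counts

steps, total, counts = summarize(io.StringIO(sample_csv))
print(steps, total, counts.most_common(1))
```

For a saved file, open it with `open(path, newline="")` so that the multi-line observation column is parsed correctly.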

In [11]:
from google.colab import files

for seed in seeds:
  files.download(f'experiments/trajectory_seed_{seed}.csv')
