### Intro
Quick & dirty experiment to evaluate `natural language -> CLI command` conversion using an LLM with a simple API doc as context instead of zero/few-shot examples. <br/>
Nerfstudio provides a simple API that allows for a simplified end-to-end process of creating, training, and testing NeRFs. This is an attempt to take a text description of a task and convert it to CLI commands for handling the sequential steps of data processing, training and rendering. <br/>

We evaluate some simple prompting techniques (zero-shot CoT, self-criticism) and culminate in a mock interactive session where a user can reject a response at any step and/or provide feedback to request a correction.

Other notes:
- Uses gpt-3.5-turbo with a limited, high-level subset of the Nerfstudio CLI docs, without detailed instructions.
- Doesn't leverage capabilities like GPT message roles or more sophisticated prompting.
- Performance is evaluated only against the provided context. <br/>
  - This gives some guidance on reducing hallucinations. On a larger scale, we could imagine working with multiple versions of a constantly changing API spec; staying context-focused helps there, for example when using retrieval-augmented generation over larger or more specialized docs.
  - For example, instruct-nerf2nerf requires first training a regular nerfacto scene and reusing its checkpoints. That information isn't in the context, so we don't penalize the model for missing it.<br/>

In [1]:
import openai
import os
from IPython.display import Markdown, display, clear_output

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())

openai.api_key = os.getenv('OPENAI_API_KEY')

In [2]:
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]
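
`openai.ChatCompletion.create` belongs to the pre-1.0 `openai` Python package and was removed in 1.0. A minimal sketch of an equivalent helper for `openai>=1.0` follows. The name `get_completion_v1` and the optional `client` parameter are assumptions made here for illustration (the parameter lets a stub stand in for the real client); they are not part of the original notebook.

```python
def get_completion_v1(prompt, model="gpt-3.5-turbo", client=None):
    """Single-turn chat completion using the openai>=1.0 client interface."""
    if client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        from openai import OpenAI
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep the output as repeatable as possible
    )
    # In the 1.x client the message is an object, so use attribute access.
    return response.choices[0].message.content
```

With this helper, the later cells would call `get_completion_v1(...)` wherever they call `get_completion(...)`.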

In [3]:
with open('context.txt', 'r') as file:
    context = file.read()

### A basic NeRF request

In [4]:
scenario = "Given a folder of images of a fire hydrant in './fireHydrant', make a NERF model of it and render it."

In [5]:
prompt = """ You are a programmer specialized in producing NeRFs. You're given an API specification for the NerfStudio CLI, which can be used to make NERFs for several scenarios. 
API specification: ```{context}```
Based on this API specification, write a CLI command that will produce a NERF for the following scenario: {scenario}"""

The full prompt with context:

In [6]:
display(Markdown(prompt.format(context=context, scenario=scenario)))

 You are a programmer specialized in producing NeRFs. You're given an API specification for the NerfStudio CLI, which can be used to make NERFs for several scenarios. 
API specification: ```The following CLI modules is available for use:
- ns-process-data: Process images into a nerfstudio dataset. Scales images to a specified size and calculates the camera poses for each image using COLMAP.
  usage: ns-process-data [-h]
                       {images,video,polycam,metashape,realitycapture,record3d}
  ns-process-data images [-h] --data PATH --output-dir PATH
                       [--verbose | --no-verbose]
                       [--camera-type {perspective,fisheye,equirectangular}]
                       [--matching-method {exhaustive,sequential,vocab_tree}]
                       [--sfm-tool {any,colmap,hloc}]
                       [--refine-pixsfm | --no-refine-pixsfm]
                       [--feature-type 
{any,sift,superpoint,superpoint_aachen,superpoint_max,superpoint_inloc,r2d2,d2
net-ss,sosnet,disk}]
                       [--matcher-type 
{any,NN,superglue,superglue-fast,NN-superpoint,NN-ratio,NN-mutual,adalam}]
                       [--num-downscales INT]
                       [--skip-colmap | --no-skip-colmap]
                       [--skip-image-processing | --no-skip-image-processing]
                       [--colmap-model-path PATH] [--colmap-cmd STR]
                       [--images-per-equirect {8,14}]
                       [--crop-factor FLOAT FLOAT FLOAT FLOAT]
                       [--crop-bottom FLOAT] [--gpu | --no-gpu]
                       [--use-sfm-depth | --no-use-sfm-depth]
                       [--include-depth-debug | --no-include-depth-debug]

- ns-train: Primary interface for training a NeRF model
  usage: ns-train {method} [method args] {dataparser} [dataparser args]
    
    To learn about the available methods:
    ns-train --help
    To learn about a methods parameters:

    ns-train {method} --help
    By default the nerfstudio dataparser is used. If you would like to use a different dataparser it can be specified after all of the method arguments. For a list of dataparser options:
    ns-train {method} {dataparser} --help

    methods:
        - vanilla-nerf: The original NERF implementation. Generally not recommended for most applications.
        - nerfacto: We created the nerfacto model to act as our default for real data captures of static scenes. The model is not existing published work, but rather a combination of many published methods that we have found work well for real data.
        - instant-ngp: Many of the main contributions of Instant-NGP are built into our Nerfacto method, so for real-world scenes, we recommend using the Nerfacto model.
        - in2n: instruct-NeRF2NeRF is a method for editing NeRF scenes with text-instructions. Given a NeRF of a scene and the collection of images used to reconstruct it, the method uses an image-conditioned diffusion model (InstructPix2Pix) to iteratively edit the input images while optimizing the underlying scene, resulting in an optimized 3D scene that respects the edit instruction. The paper demonstrates that their method is able to edit large-scale, real-world scenes, and is able to accomplish more realistic, targeted edits than prior work.
        - lerf: LERF enables pixel-aligned queries of the distilled 3D CLIP embeddings without relying on region proposals, masks, or fine-tuning, supporting long-tail open-vocabulary queries hierarchically across the volume.
        - tetra-nerf: The input to Tetra-NeRF is a point cloud which is triangulated to get a set of tetrahedra used to represent the radiance field. Rays are sampled, and the field is queried. The barycentric interpolation is used to interpolate tetrahedra vertices, and the resulting features are passed to a shallow MLP to get the density and colours for volumetric rendering.

- ns-viewer: Load a checkpoint and start the viewer.
  usage: ns-viewer [-h] --load-config PATH [--viewer.relative-log-filename STR]
    [--viewer.websocket-port {None}|INT]
    [--viewer.websocket-port-default INT]
    [--viewer.websocket-host STR]
    [--viewer.max-num-display-images INT]
    [--viewer.quit-on-train-completion | 
    --viewer.no-quit-on-train-completion]
                [--viewer.image-format {jpeg,png}]
                [--viewer.jpeg-quality INT] [--vis {viewer,viewer_beta}]

The general workflow for training a model is as follows:
- Process your images into a nerfstudio dataset using ns-process-data
- Train a model using ns-train. Choose the NERF model to use carefully based on the needs and requirements.
- View the results using ns-viewer```
Based on this API specification, write a CLI command that will produce a NERF for the following scenario: Given a folder of images of a fire hydrant in './fireHydrant', make a NERF model of it and render it.

In [7]:
response = get_completion(prompt.format(context=context, scenario=scenario))
display(Markdown(response))

To produce a NERF model for the given scenario, you can use the following CLI command:

```
ns-process-data images --data ./fireHydrant --output-dir ./fireHydrantDataset
ns-train nerfacto --data ./fireHydrantDataset
ns-viewer --load-config ./fireHydrantDataset/config.txt
```

Explanation:
1. `ns-process-data images --data ./fireHydrant --output-dir ./fireHydrantDataset`: This command processes the images in the `./fireHydrant` folder and generates a nerfstudio dataset in the `./fireHydrantDataset` directory. It automatically scales the images and calculates camera poses using COLMAP.
2. `ns-train nerfacto --data ./fireHydrantDataset`: This command trains a NERF model using the `nerfacto` method on the generated dataset in the `./fireHydrantDataset` directory.
3. `ns-viewer --load-config ./fireHydrantDataset/config.txt`: This command loads the trained model checkpoint from the `./fireHydrantDataset` directory and starts the viewer to render the NERF model.

Make sure to replace `./fireHydrant` with the actual path to the folder containing the fire hydrant images.

Works fine. Let's try a more complex scenario.

In [11]:
scenario = "Given a folder of images of a fire hydrant in './fireHydrant', make a NERF model of it and render it. I want to be able to edit this NERF based on text-instructions."
response = get_completion(prompt.format(context=context, scenario=scenario))
display(Markdown(response))

To produce a NERF model for the given scenario and enable editing based on text-instructions, you can use the following CLI command:

```
ns-process-data images --data ./fireHydrant --output-dir ./fireHydrantDataset --camera-type perspective
```

This command will process the images in the './fireHydrant' folder and create a nerfstudio dataset in the './fireHydrantDataset' directory. The camera type is set to 'perspective' as it is the most common camera type for general images.

After processing the data, you can train a NERF model using the 'ns-train' command. Since you want to enable editing based on text-instructions, the 'in2n' method (instruct-NeRF2NeRF) is recommended. The command to train the NERF model with the 'in2n' method would be:

```
ns-train in2n --data-parser nerfstudio ./fireHydrantDataset
```

This command will train the NERF model using the 'in2n' method and the nerfstudio data parser. The './fireHydrantDataset' directory is provided as the input data for training.

Once the model is trained, you can view the results using the 'ns-viewer' command. The command to load the checkpoint and start the viewer would be:

```
ns-viewer --load-config ./fireHydrantDataset/checkpoints/last.ckpt
```

This command will load the checkpoint from the trained model and start the viewer, allowing you to render and interact with the NERF model of the fire hydrant.

Please note that the above commands assume that you have the NerfStudio CLI installed and available in your command-line environment.

Works fine! However, we are cheating a bit, as the key phrase of the scenario appears almost verbatim in the context. <br/>
Scenario: `I want to be able to edit this NERF based on text-instructions` <br/>
Context: `instruct-NeRF2NeRF is a method for editing NeRF scenes with text-instructions` <br/>
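
The overlap between the two lines above can be made concrete with a rough word-level comparison. This is only a sketch: the helper names and the tiny stopword list are assumptions chosen for this example, and there is no stemming, so `edit` and `editing` do not count as a match.

```python
import re

# Small, hand-picked stopword list; an assumption for this sketch only.
STOPWORDS = {"a", "an", "the", "is", "to", "be", "of", "for", "on", "with",
             "i", "this", "and", "it", "want", "able", "based"}

def content_words(text):
    """Lowercase words (hyphenated words kept whole) with stopwords removed."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return set(words) - STOPWORDS

def shared_words(scenario, context):
    """Content words that appear in both the scenario and the context."""
    return content_words(scenario) & content_words(context)

scenario_line = "I want to be able to edit this NERF based on text-instructions"
context_line = ("instruct-NeRF2NeRF is a method for editing NeRF scenes "
                "with text-instructions")
print(sorted(shared_words(scenario_line, context_line)))
# prints ['nerf', 'text-instructions']
```

The distinctive shared term is `text-instructions`, which is exactly the phrase the model can latch onto. A scenario worded without it shares only `nerf` with this context line.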

Let's word the scenario a bit differently, like a human would.

In [10]:
scenario = "Given a folder of images of a fire hydrant in './fireHydrant', make a NERF model of it and render it. I'd like to edit this NERF"
response = get_completion(prompt.format(context=context, scenario=scenario))
display(Markdown(response))

Based on the API specification, the CLI command to produce a NERF for the given scenario would be as follows:

```
ns-process-data images --data ./fireHydrant --output-dir ./fireHydrantDataset
ns-train nerfacto --data ./fireHydrantDataset
ns-viewer --load-config ./fireHydrantDataset/config.txt
```

Explanation:
1. `ns-process-data images --data ./fireHydrant --output-dir ./fireHydrantDataset`: This command processes the images in the `./fireHydrant` folder and generates a nerfstudio dataset in the `./fireHydrantDataset` directory. It automatically scales the images and calculates camera poses using COLMAP.
2. `ns-train nerfacto --data ./fireHydrantDataset`: This command trains a NERF model using the `nerfacto` method on the generated dataset in the `./fireHydrantDataset` directory.
3. `ns-viewer --load-config ./fireHydrantDataset/config.txt`: This command loads the trained model from the `./fireHydrantDataset` directory and starts the viewer to render and visualize the NERF model.

To edit the NERF model, you would need to use additional commands or methods specific to the desired editing functionality. The provided API specification does not include specific details about editing NERF models.

Nope

In [12]:
scenario = "Given a folder of images of a fire hydrant in './fireHydrant', make a NERF model of it and render it. I'd like to edit this NERF with text."
response = get_completion(prompt.format(context=context, scenario=scenario))
display(Markdown(response))

To produce a NERF model for the given scenario and edit it with text, you can use the following CLI command:

```
ns-process-data images --data ./fireHydrant --output-dir ./fireHydrantDataset
```

This command will process the images in the './fireHydrant' folder and generate a nerfstudio dataset in the './fireHydrantDataset' directory.

After processing the data, you can train a NERF model using the 'ns-train' command. Since you want to edit the NERF with text, you can use the 'in2n' method:

```
ns-train in2n --data-parser nerfstudio ./fireHydrantDataset
```

This command will train an 'in2n' NERF model using the 'nerfstudio' data parser and the dataset generated from the previous step.

Finally, to view the results and render the edited NERF, you can use the 'ns-viewer' command:

```
ns-viewer --load-config ./fireHydrantDataset/checkpoints/last.ckpt
```

This command will load the trained model checkpoint and start the viewer, allowing you to visualize and edit the NERF model.

Please note that the above commands assume that you have the NerfStudio CLI installed and available in your system's PATH.

Nope again. Let's try some basic prompt engineering. First up, the zero-shot Chain of Thought trick.

In [13]:
scenario = "Given a folder of images of a fire hydrant in './fireHydrant', make a NERF model of it and render it. I'd like to edit this NERF with text. Let's think step by step."
response = get_completion(prompt.format(context=context, scenario=scenario))
display(Markdown(response))

To produce a NERF model for the given scenario and enable text editing, the following CLI command can be used:

```
ns-process-data images --data ./fireHydrant --output-dir ./fireHydrantDataset
```
This command will process the images in the './fireHydrant' folder and generate a nerfstudio dataset in the './fireHydrantDataset' directory.

```
ns-train nerfacto --data ./fireHydrantDataset
```
This command will train a NERF model using the 'nerfacto' method and the generated dataset from the previous step.

```
ns-viewer --load-config ./fireHydrantDataset/checkpoints/nerfacto
```
This command will load the trained NERF model and start the viewer, allowing you to render and visualize the results.

To edit the NERF model with text, you can use the 'in2n' method. Assuming you have a text instruction file named 'edit_instructions.txt', the following command can be used:

```
ns-train in2n --data ./fireHydrantDataset --edit-instructions ./edit_instructions.txt
```
This command will use the 'in2n' method to edit the NERF model based on the instructions provided in the 'edit_instructions.txt' file.

Please note that the above commands are just an example and may require additional arguments or modifications based on the specific implementation and requirements of the NerfStudio CLI.

Gets the model right, but hallucinates `--edit-instructions`. Next up: self-evaluation.
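
A hallucinated flag like this can also be caught mechanically, before asking the model to evaluate itself. The sketch below collects every `--flag` token from the spec text and reports flags in a generated command that never appear there. The function name `unknown_flags` and the shortened `spec_excerpt` are assumptions for illustration; in the notebook, the full `context` string would be passed instead. The check is spec-wide rather than per subcommand, and since the context lists no method-specific `ns-train` arguments, genuine method flags would be reported too.

```python
import re

# A flag starts with "--", is not glued to a preceding word, and ends on a word character.
FLAG_RE = re.compile(r"(?<![\w-])--[A-Za-z](?:[\w.-]*\w)?")

def unknown_flags(command_text, spec_text):
    """Return flags used in command_text that never appear in spec_text."""
    known = set(FLAG_RE.findall(spec_text))
    used = FLAG_RE.findall(command_text)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return [flag for flag in dict.fromkeys(used) if flag not in known]

# Shortened excerpt of the API spec shown earlier (assumption: the real check uses `context`).
spec_excerpt = """
ns-process-data images [-h] --data PATH --output-dir PATH
                       [--verbose | --no-verbose]
ns-viewer [-h] --load-config PATH [--viewer.websocket-port {None}|INT]
"""
generated = ("ns-train in2n --data ./fireHydrantDataset "
             "--edit-instructions ./edit_instructions.txt")
print(unknown_flags(generated, spec_excerpt))
# prints ['--edit-instructions']
```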

In [15]:
prompt2 = f"""
You are a programmer specialized in producing NeRFs. You're given an API specification for the NerfStudio CLI, which can be used to make NERFs for several scenarios. 
API specification: ```{context}```
Based on this API specification, write a CLI command that will produce a NERF for the following scenario: 

Your task is to determine if the student's solution \
is correct or not.
To solve the problem do the following:
- First, work out your own solution to the problem. 
- Then compare your solution to the student's solution \
and evaluate if the student's solution is correct or not. 
Don't decide if the student's solution is correct until 
you have done the problem yourself.

Use the following format:
Question:
```
question here
```
Student's solution:
```
student's solution here
```
Actual solution:
```
steps to work out the solution and your solution here
```
Is the student's solution the same as actual solution \
just calculated:
```
yes or no
```
Student grade:
```
correct or incorrect
```

Question:
```
Given a folder of images of a fire hydrant in './fireHydrant', make a NERF model of it and render it. I'd like to edit this NERF with text.
``` 
Student's solution:
```
'To produce a NERF model for the given scenario and enable text editing, you can use the following CLI command:\n\n```\nns-process-data images --data ./fireHydrant --output-dir ./fireHydrantDataset\nns-train nerfacto --data ./fireHydrantDataset\nns-viewer --load-config ./fireHydrantDataset/config.txt\n```\n\nExplanation:\n1. `ns-process-data images --data ./fireHydrant --output-dir ./fireHydrantDataset`: This command processes the images in the `./fireHydrant` folder and generates a nerfstudio dataset in the `./fireHydrantDataset` directory. It automatically scales the images and calculates camera poses using COLMAP.\n2. `ns-train nerfacto --data ./fireHydrantDataset`: This command trains a NERF model using the `nerfacto` method on the generated dataset in the `./fireHydrantDataset` directory.\n3. `ns-viewer --load-config ./fireHydrantDataset/config.txt`: This command loads the trained model checkpoint from the `./fireHydrantDataset` directory and starts the viewer, allowing you to render and edit the NERF model.\n\nNote: Make sure you have the NerfStudio CLI installed and accessible in your environment before running these commands.'
```
Actual solution:
"""


In [16]:
response = get_completion(prompt2)
display(Markdown(response))

To produce a NERF model for the given scenario and enable text editing, you can use the following CLI command:

```
ns-process-data images --data ./fireHydrant --output-dir ./fireHydrantDataset
ns-train in2n --data ./fireHydrantDataset
ns-viewer --load-config ./fireHydrantDataset/config.txt
```

Explanation:
1. `ns-process-data images --data ./fireHydrant --output-dir ./fireHydrantDataset`: This command processes the images in the `./fireHydrant` folder and generates a nerfstudio dataset in the `./fireHydrantDataset` directory. It automatically scales the images and calculates camera poses using COLMAP.
2. `ns-train in2n --data ./fireHydrantDataset`: This command trains a NERF model using the `in2n` method on the generated dataset in the `./fireHydrantDataset` directory. The `in2n` method allows for editing the NERF model with text instructions.
3. `ns-viewer --load-config ./fireHydrantDataset/config.txt`: This command loads the trained model checkpoint from the `./fireHydrantDataset` directory and starts the viewer, allowing you to render and edit the NERF model.

Note: Make sure you have the NerfStudio CLI installed and accessible in your environment before running these commands.

Is the student's solution the same as the actual solution just calculated:
No

Student grade:
Incorrect

Works!
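
To use the grader's verdict programmatically rather than reading it by eye, the final `Student grade:` field can be pulled out of the response. This is a minimal sketch that assumes the model follows the requested format (with or without the backtick fences); `parse_grade` is a hypothetical helper name, not something from the original notebook.

```python
import re

# Matches "Student grade:" followed by optional whitespace/backticks and the verdict.
GRADE_RE = re.compile(r"Student grade:\s*`*\s*(correct|incorrect)\b", re.IGNORECASE)

def parse_grade(response_text):
    """Return True for 'correct', False for 'incorrect', None if no grade is found."""
    match = GRADE_RE.search(response_text)
    if match is None:
        return None
    return match.group(1).lower() == "correct"

sample = (
    "Is the student's solution the same as the actual solution just calculated:\n"
    "No\n\n"
    "Student grade:\n"
    "Incorrect"
)
print(parse_grade(sample))
# prints False
```

Returning `None` when no grade is found keeps a malformed response distinguishable from an explicit "incorrect".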

### Stepwise, Sequential Prompting with Manual Intervention

We have a sequence of steps that build on each other. I'd like to be able to intervene or ask clarifying questions when I feel there is an obvious mistake in a step. Since the model is capable of self-criticism, we can try something else. <br/>

Let's leverage this self-criticism ability to mock an interactive session where a user can reject a response at a step and/or provide feedback to request a correction.
- A command is generated at each step. I can either accept it by pressing 'enter' or reject it and let the model know what's wrong.
- For example, in the run below, we ask for an editable model but the model suggests nerfacto, which is not directly editable. <br/>
So I reject the suggestion and say that 'nerfacto may not be editable'. The model then self-critiques and updates the step to use in2n instead.<br/>
- Such a flow can be quite useful when dealing with chains of commands or semi-technical users trying to reduce some manual effort.

In [27]:
# Function to get user input for the next prompt
def get_user_input(prompt_num):
    user_input = input(f"Step {prompt_num} - Enter to accept or type 'r' to reject: ")
    return user_input.strip()

def main():
    initial_scenario = "Given a folder of images of a fire hydrant in './fireHydrant', make a NERF model of it and render it. I'd like to edit this NERF with text.\n"
    step_prompts = [
        "Step 1: Process your images into a nerfstudio dataset using ns-process-data. Give the actual command\n",
        "Step 2: Train the model using ns-train. Give the actual command\n",
        "Step 3: Visualise the results with ns-viewer. Give the actual command\n",
    ]

    with open('context_noWorflow.txt', 'r') as file:
        context2 = file.read()
    context2 += '\n\n'+initial_scenario + '\n'
    responses = []
    

    for i, prompt in enumerate(step_prompts, start=1):
        context2 += '\n'+ prompt+ '\n'
        prompt_text =  context2
        display(Markdown(prompt_text))
        response = get_completion(prompt_text)
        display(Markdown(response))
        prompt_text += response + "\n"
        mistake = get_user_input(i)

        if mistake.lower() == "r":
            mistake = input("What was wrong?:")
            prompt_text += mistake + "\n"
            prompt_text += "Correct the last step based on the above feedback. Think step by step and keep it concise.\n"
            response = get_completion(prompt_text)
            display(Markdown(response))
        context2 += response + "\n"
        responses.append(response)
        clear_output(wait=True)

    print("Interaction Flow:")
    print('+'*20)
    print(context2)

    print("\nFinal resulting responses:")
    print('+'*20)
    for i, response in enumerate(responses, start=1):
        print(f"Step {i}:", response)

if __name__ == "__main__":
    main()

Interaction Flow:
++++++++++++++++++++
The following CLI modules is available for use:
- ns-process-data: Process images into a nerfstudio dataset. Scales images to a specified size and calculates the camera poses for each image using COLMAP.
  usage: ns-process-data [-h]
                       {images,video,polycam,metashape,realitycapture,record3d}
  ns-process-data images [-h] --data PATH --output-dir PATH
                       [--verbose | --no-verbose]
                       [--camera-type {perspective,fisheye,equirectangular}]
                       [--matching-method {exhaustive,sequential,vocab_tree}]
                       [--sfm-tool {any,colmap,hloc}]
                       [--refine-pixsfm | --no-refine-pixsfm]
                       [--feature-type 
{any,sift,superpoint,superpoint_aachen,superpoint_max,superpoint_inloc,r2d2,d2
net-ss,sosnet,disk}]
                       [--matcher-type 
{any,NN,superglue,superglue-fast,NN-superpoint,NN-ratio,NN-mutual,adalam}]
           