<a href="https://colab.research.google.com/github/zzyunzhi/scene-language/blob/dev/colab/text_to_scene.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Text-to-3D generation using the Scene Language.
[[Paper](https://arxiv.org/abs/2410.16770)]
[[Page](https://ai.stanford.edu/~yzzhang/projects/scene-language/)]
[[Code](https://github.com/zzyunzhi/scene-language)]

# Installation

In [None]:
# transforms3d requires NumPy < 2.0 on Python 3.11.
# Restart the session once after this install finishes.
!pip install "numpy<2.0.0"

In [None]:
!cd /content/ && git clone https://github.com/zzyunzhi/scene-language.git
!cd /content/scene-language && pip install -e .

In [None]:
%cd /content/scene-language

In [None]:
from pathlib import Path
import anthropic
import json

from engine.utils.argparse_utils import setup_save_dir, modify_string_for_file
from engine.constants import ENGINE_MODE, PROJ_DIR
import engine.utils.claude_client

from scripts.run_utils import SYSTEM_HEADER, run, SYSTEM_RULES, read_example, save_prompts

# Set LLM prompts

Running this notebook for one task prompt consumes roughly `2.1k * NUM_COMPLETIONS` input tokens (assuming a task description of ordinary length) and at most `MAX_OUTPUT_TOKENS * NUM_COMPLETIONS` output tokens.
The exact input token count is displayed below, before the actual LLM query.
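
As a rough worked example of the arithmetic above, the sketch below computes the budget for the default settings. The helper name `estimate_token_budget` is hypothetical and not part of the repository, and the 2100-token figure is only the approximate per-query estimate quoted above, not a measured value.

```python
def estimate_token_budget(
    input_tokens_per_query: int,
    num_completions: int,
    max_output_tokens: int,
) -> tuple[int, int]:
    """Return (total input tokens, upper bound on total output tokens).

    Hypothetical helper for illustration only; it is not part of the
    scene-language repository.
    """
    total_input = input_tokens_per_query * num_completions
    max_output = max_output_tokens * num_completions
    return total_input, max_output


# ~2.1k input tokens per query is the approximate figure quoted above.
total_input, max_output = estimate_token_budget(2100, 2, 8192)
print(f"~{total_input} input tokens, <= {max_output} output tokens")
```

Lowering `MAX_OUTPUT_TOKENS` or `NUM_COMPLETIONS` reduces the output-token upper bound proportionally.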

In [None]:
# See https://docs.anthropic.com/en/api/overview#accessing-the-api
# if you are unsure how to obtain an API key.
ANTHROPIC_API_KEY = ""

In [None]:
MAX_OUTPUT_TOKENS = 8192  # Decrease for lower cost budget
NUM_COMPLETIONS = 2

### Your scene description: ###
task = "A busy city."

In [None]:
engine.utils.claude_client.ANTHROPIC_API_KEY = ANTHROPIC_API_KEY
engine.utils.claude_client.MAX_TOKENS = MAX_OUTPUT_TOKENS

In [None]:
system_prompt = """\
You are a code completion model and can only write python functions wrapped within ```python```.

You are provided with the following `helper.py` which defines the given functions and definitions:
```python
{header}
```

{rules}

You should be precise and creative.
""".format(
    header=SYSTEM_HEADER, rules=SYSTEM_RULES
)


user_prompt = '''Here are some examples of how to use `helper.py`:
```python
{example}
```
IMPORTANT: THE FUNCTIONS ABOVE ARE JUST EXAMPLES, YOU CANNOT USE THEM IN YOUR PROGRAM!

Now, write a similar program for the given task:
```python
from helper import *

"""
{task}
"""
```
'''.format(
    task=task,
    example=read_example(animate=False),
)

client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)

response = client.messages.count_tokens(
    model=engine.utils.claude_client.CLAUDE_MODEL_NAME,
    system=system_prompt,
    messages=[{
        "role": "user",
        "content": user_prompt,
    }],
)
input_tokens = response.input_tokens  # token count for a single query
print(f'The following cell will query {engine.utils.claude_client.CLAUDE_MODEL_NAME} {NUM_COMPLETIONS} times.')
print(f'In total, it will cost {input_tokens * NUM_COMPLETIONS} input tokens '
      f'and <= {MAX_OUTPUT_TOKENS * NUM_COMPLETIONS} output tokens.')


In [None]:
save_dir = Path("/content/outputs")
save_dir.mkdir(exist_ok=True)

name = modify_string_for_file(task)
save_subdir = save_dir / name
save_subdir.mkdir(exist_ok=True)
print(f"Outputs will be saved under {save_subdir}.")

# save system prompt and user prompt as .txt files
save_prompts(save_subdir.as_posix(), system_prompt, user_prompt)

# Query LLM & render


Running the following cell multiple times with the same system prompt, user prompt, and temperature retrieves cached results, even if you have changed `MAX_OUTPUT_TOKENS`.
To issue new queries, increase `NUM_COMPLETIONS` or manually remove `/content/scene-language/cache.json`.
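
A minimal sketch of removing the cache file from Python rather than by hand. The helper name `clear_cache` is hypothetical and not part of the repository; the default path is the one mentioned above. Note that this deletes the whole file, so cached results for other prompts are lost as well.

```python
from pathlib import Path


def clear_cache(cache_path: str = "/content/scene-language/cache.json") -> bool:
    """Delete the cache file if it exists.

    Hypothetical helper for illustration only. Returns True if a file
    was removed and False if there was nothing to remove.
    """
    path = Path(cache_path)
    if path.is_file():
        path.unlink()
        return True
    return False
```

Calling `clear_cache()` before the next cell should force fresh queries on the following run.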

In [None]:
_ = run(
    save_dir=save_subdir.as_posix(),
    user_prompt=user_prompt,
    system_prompt=system_prompt,
    extra_info={"task": task},
    lm_config={
        "num_completions": NUM_COMPLETIONS,
        "temperature": 0.2,
    },
)

In [None]:
from IPython.display import Image, display

# Show every rendered GIF found under the output directory.
for p in save_dir.glob("*/*/renderings/*.gif"):
    print(f"Displaying {p}")
    display(Image(filename=str(p)))