# CADEvolve-style Generator Bootstrapping (Seminar)

Pipeline:
1) Write a **new client part family** spec (name + abstract + detailed constraints).
2) Ask **gpt-5-mini** to pick **5 similar** part families from `code_db.json`.
3) Feed the **5 generators** as few-shot context and synthesize a **new generator**.
4) Run it, visualize, validate, iterate.

## 0) Setup assumptions
- `code_db.json` exists (list or `{ "parts": [...] }`)
- Each record has `name` and `generator_code` (or similar).
- Python packages: `openai`, `numpy`, optional `cadquery`, optional `trimesh`.
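
The package assumptions above can be checked before running the rest of the notebook. This is a minimal sketch using only the standard library; the helper name `check_packages` and the split into `REQUIRED` and `OPTIONAL` lists are illustrative choices, not part of the original notebook.

```python
import importlib.util

REQUIRED = ["openai", "numpy"]
OPTIONAL = ["cadquery", "trimesh"]

def check_packages(required, optional):
    """Map each package name to True/False depending on whether it is importable."""
    status = {
        name: importlib.util.find_spec(name) is not None
        for name in list(required) + list(optional)
    }
    missing = [name for name in required if not status[name]]
    return status, missing

status, missing_required = check_packages(REQUIRED, OPTIONAL)
print("Available:", status)
if missing_required:
    print("Install before continuing:", ", ".join(missing_required))
```

A missing optional package only disables the steps that need it (CadQuery for running generators, trimesh for preview and mesh checks).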

In [None]:
import os, json, textwrap
from pathlib import Path
from typing import List, Tuple
import numpy as np

try:
    import trimesh  # optional: used for preview and mesh checks
except ImportError:
    trimesh = None

CODE_DB_PATH = Path("code_db.json")  

## 1) Load `code_db.json`

In [None]:
code_db = json.loads(CODE_DB_PATH.read_text(encoding="utf-8"))

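# Accept both layouts: a bare list or {"parts": [...]}
parts = code_db["parts"] if isinstance(code_db, dict) else code_db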

def get_part_name(p: dict) -> str:
    for k in ["name", "part_name", "title", "id"]:
        if k in p and isinstance(p[k], str) and p[k].strip():
            return p[k].strip()
    raise KeyError("No name field found.")

part_names = [get_part_name(p) for p in parts]
print("Loaded parts:", len(parts))
print("Example names:", part_names[:10])

## 2) Write your new client family spec

In [None]:
NEW_PART_NAME = "YOUR_PART_FAMILY_NAME"

NEW_PART_ABSTRACT = (
    "2–4 lines: what is the family, what is invariant, what varies."
)

NEW_PART_DETAILED = "\n".join([
    "Detailed constraints (bullets encouraged):",
    "- Context",
    "- Narrow requirements / invariants",
    "- What to reconstruct from STL",
    "- Generator constraints: parameter ranges, allowed CAD ops",
    "- Validation criteria",
])

assert NEW_PART_NAME != "YOUR_PART_FAMILY_NAME"
print("OK:", NEW_PART_NAME)

## 3) Configure OpenAI client (set OPENAI_API_KEY)

In [None]:
# pip install openai
from openai import OpenAI

assert os.getenv("OPENAI_API_KEY"), "Set OPENAI_API_KEY env var"
client = OpenAI()

## 4) Similarity search over part names (pick 5)

### Option A: via the GPT API (automatic, in the notebook)



In [None]:
import json, textwrap
from typing import List
from openai import OpenAI

client = OpenAI()

def llm_pick_similar_parts_api(
    new_name: str,
    abstract: str,
    detailed: str,
    candidate_names: List[str],
    k: int = 5,
    model: str = "gpt-5-mini",
) -> List[str]:
    name_blob = "\n".join(f"- {n}" for n in candidate_names)

    instructions = textwrap.dedent(f"""
    Pick exactly {k} MOST SIMILAR part families from the candidate list.
    Output MUST be valid JSON: {{"similar": ["name1", ...]}} with exactly {k} items.
    Rules:
    - Each name must be copied EXACTLY from the candidate list.
    - No extra keys, no commentary.
    """).strip()

    user_input = "\n".join([
        "NEW PART FAMILY",
        f"Name: {new_name}",
        "",
        "Abstract:",
        abstract,
        "",
        "Detailed description:",
        detailed,
        "",
        "CANDIDATE PART FAMILIES (names only):",
        name_blob,
    ])

    resp = client.responses.create(
        model=model,
        input=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": user_input},
        ],
    )
    data = json.loads(resp.output_text)
    sims = data["similar"]

    if not (isinstance(sims, list) and len(sims) == k and all(isinstance(x, str) for x in sims)):
        raise ValueError(f"Bad format: {data}")
    missing = [x for x in sims if x not in candidate_names]
    if missing:
        raise ValueError(f"Names not in DB: {missing}")
    return sims

similar_names = llm_pick_similar_parts_api(
    NEW_PART_NAME, NEW_PART_ABSTRACT, NEW_PART_DETAILED, part_names, k=5
)
similar_names
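
The call to `json.loads(resp.output_text)` above assumes the reply is bare JSON. Models sometimes wrap the object in a markdown fence or add a stray sentence despite the instructions, so a more tolerant parser can help. The sketch below is one possible approach; `extract_json_object` is a hypothetical helper, not part of the original notebook or of the OpenAI SDK.

```python
import json
import re

def extract_json_object(text: str) -> dict:
    """Parse a JSON object from a model reply, tolerating fences or prose around it."""
    text = text.strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the span from the first '{' to the last '}'.
        match = re.search(r"\{.*\}", text, flags=re.DOTALL)
        if match is None:
            raise ValueError("No JSON object found in model reply")
        data = json.loads(match.group(0))
    if not isinstance(data, dict):
        raise ValueError(f"Expected a JSON object, got {type(data).__name__}")
    return data

# Illustrative reply wrapped in a markdown fence.
reply = '```json\n{"similar": ["a", "b"]}\n```'
print(extract_json_object(reply)["similar"])
```

The validation that follows in `llm_pick_similar_parts_api` (list length, names present in the DB) still applies to the parsed result.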




### Option B: manually in chat (ChatGPT / Gemini / Claude)

Markdown instructions for students

### Finding 5 similar families manually (in chat)

1) Open any LLM (ChatGPT / Gemini / Claude).
2) Copy the prompt below.
3) The model must return **strictly JSON**:
`{"similar": ["name1", "name2", "name3", "name4", "name5"]}`
4) Copy the JSON back into the notebook and parse it.

#### Prompt
You are helping to bootstrap a CAD generator for a NEW part family.

NEW PART FAMILY
Name: <NEW_PART_NAME>

Abstract:
<NEW_PART_ABSTRACT>

Detailed description:
<NEW_PART_DETAILED>

CANDIDATE PART FAMILIES (names only):
- name_a
- name_b
- ...

Task:
Pick exactly 5 MOST SIMILAR part families from the candidate list.
Output MUST be valid JSON: {"similar": ["name1", "name2", "name3", "name4", "name5"]}
Rules:
- Each name must be copied EXACTLY from the list.
- No extra keys, no commentary.

Python (paste the JSON from the chat)



In [None]:
import json

# paste the model's reply here (strictly JSON)
manual_json = r'''{"similar": ["...", "...", "...", "...", "..."]}'''

data = json.loads(manual_json)
similar_names = data["similar"]

# validation
assert isinstance(similar_names, list) and len(similar_names) == 5
missing = [x for x in similar_names if x not in part_names]
assert not missing, f"Names not in DB: {missing}"

similar_names




### Option C: precomputed similar names (no LLM)

File format (.txt): a plain Python list of strings, for example:

['bracket_plate_slot',
 'bracketed_plate_slot',
 'corner_block_bracket',
 'medial_gusset_plate',
 'conformal_bracket_box']

Python: load the list from the .txt file (robustly)





In [None]:
from pathlib import Path
import ast

SIMILAR_DIR = Path("CADEvolve seminar/similar_names")  # adjust the path
path = SIMILAR_DIR / f"{NEW_PART_NAME}.txt"

txt = path.read_text(encoding="utf-8")
similar_names = ast.literal_eval(txt)   # safely parses Python literals

# checks
assert isinstance(similar_names, list), type(similar_names)
assert len(similar_names) == 5, f"Expected 5 names, got {len(similar_names)}"
missing = [x for x in similar_names if x not in part_names]
assert not missing, f"Names not in DB: {missing}"

similar_names
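
The notebook does not say how the precomputed lists were produced. As an illustrative stand-in only, the sketch below ranks candidate names by plain character-level similarity with `difflib` and writes the result in the list-literal format shown above. The helper `rank_by_name_similarity`, the query string, and the extra candidate `hex_nut` are hypothetical.

```python
import ast
import difflib

def rank_by_name_similarity(query, candidates, k=5):
    """Return the k candidate names most similar to `query` by SequenceMatcher ratio."""
    ranked = sorted(
        candidates,
        key=lambda name: difflib.SequenceMatcher(None, query, name).ratio(),
        reverse=True,
    )
    return ranked[:k]

# Candidate names: five from the example above plus one hypothetical outlier.
candidates = [
    "bracket_plate_slot",
    "bracketed_plate_slot",
    "corner_block_bracket",
    "medial_gusset_plate",
    "conformal_bracket_box",
    "hex_nut",
]

top5 = rank_by_name_similarity("bracket_plate", candidates, k=5)

# repr() of a list of strings is a valid Python literal, so it round-trips
# through ast.literal_eval exactly as the loading cell expects.
file_text = repr(top5)
print(file_text)
```

Name similarity ignores geometry entirely, so a list produced this way is at best a rough substitute for the LLM-based selection in Option A.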


In [None]:
def get_similar_names(method: str = "precomputed") -> list[str]:
    method = method.lower()
    if method == "api":
        return llm_pick_similar_parts_api(
            NEW_PART_NAME, NEW_PART_ABSTRACT, NEW_PART_DETAILED, part_names, k=5
        )
    if method == "manual":
        raise RuntimeError("Use manual_json cell (paste JSON from chat).")
    if method == "precomputed":
        # Option C stores a Python list literal in <NEW_PART_NAME>.txt
        import ast
        txt = (SIMILAR_DIR / f"{NEW_PART_NAME}.txt").read_text(encoding="utf-8")
        return ast.literal_eval(txt)
    raise ValueError("method must be one of: api | manual | precomputed")

# example:
# similar_names = get_similar_names("api")

## 5) Pull the 5 generators from DB

In [None]:
def find_part_record_by_name(parts: List[dict], name: str) -> dict:
    for p in parts:
        if get_part_name(p) == name:
            return p
    raise KeyError(name)

def get_generator_code(p: dict) -> str:
    for k in ["generator_code", "code", "py", "source"]:
        if k in p and isinstance(p[k], str) and p[k].strip():
            return p[k]
    raise KeyError(f"No generator code in record: {get_part_name(p)}")

similar_records = [find_part_record_by_name(parts, n) for n in similar_names]
similar_generators: List[Tuple[str, str]] = [(get_part_name(r), get_generator_code(r)) for r in similar_records]

print("Pulled:", [n for n,_ in similar_generators])

## 6) Synthesize a new generator (few-shot)

### Option A: via the GPT API



In [None]:
from pathlib import Path
from typing import List, Tuple

GENERATOR_OUTPUT_PATH = Path("../data/generators/client_part.py")

def llm_synthesize_generator_api(
    new_name: str,
    abstract: str,
    detailed: str,
    fewshot_generators: List[Tuple[str, str]],
    model: str = "gpt-5-mini",
) -> str:
    fewshot_blob = "\n\n".join(
        f"### Example: {name}\n<CODE>\n{code}\n</CODE>"
        for name, code in fewshot_generators
    )

    instructions = "\n".join([
        "You are generating a *parametric CAD generator* in CadQuery style.",
        "",
        "GOAL:",
        "- Maximize reliability: default parameters MUST always produce a valid, non-empty solid.",
        "- Prefer simpler geometry over matching every minor detail.",
        "",
        "OPERATION VOCABULARY (very important):",
        "- Use ONLY CAD operations/patterns that appear in the FEW-SHOT EXAMPLES.",
        "- Prioritize the most common, robust primitives: sketch + extrude/revolve/loft, union, cut, translate/rotate.",
        "- If a feature would require an operation NOT present in the examples, DO NOT implement that feature; simplify it away.",
        "",
        "CODE REQUIREMENTS (mandatory):",
        "- The code MUST define exactly ONE top-level generator function named <generator_name>.",
        "- The function signature MUST be explicit keyword parameters with type hints and defaults, like:",
        "  def my_part(a: float = 1.0, b: int = 3, ... ) -> cq.Workplane:",
        "- The function MUST return a cadquery.Workplane (cq.Workplane).",
        "- Import cadquery as cq inside the code (either at top or inside the function).",
        "- No classes. No DEFAULT_PARAMS dict. No make_part().",
        "",
        "ROBUSTNESS (mandatory):",
        "- Include basic parameter validation (raise ValueError) for invalid ranges (<=0, thickness>=radius, counts<1, etc.).",
        "- Avoid fragile features: extensive fillets, complex edge selectors, thin walls with tight tolerances, self-intersections.",
        "- Do NOT use try/except to mask failures. Instead, design the geometry so it does not need try/except.",
        "- If fillet is necessary, keep it minimal and on safe edges; otherwise omit fillet.",
        "- Use small eps/fudge overlaps where needed so boolean ops are stable.",
        "",
        "NAMING:",
        "- <generator_name> MUST be snake_case and should reflect the part family name.",
        "",
        "OUTPUT FORMAT (mandatory):",
        "Return ONLY python code (no markdown fences, no wrapper block).",
        "",
        "DO NOT:",
        "- Do NOT output multiple functions.",
        "- Do NOT output explanations.",
    ])

    user_input = "\n".join([
        "NEW PART FAMILY",
        f"Name: {new_name}",
        "",
        "Abstract:",
        abstract,
        "",
        "Detailed description:",
        detailed,
        "",
        "FEW-SHOT EXAMPLES:",
        fewshot_blob,
        "",
        "Now write the generator code.",
    ])

    resp = client.responses.create(
        model=model,
        input=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": user_input},
        ],
    )
    return resp.output_text

new_code = llm_synthesize_generator_api(
    NEW_PART_NAME, NEW_PART_ABSTRACT, NEW_PART_DETAILED, similar_generators
)

GENERATOR_OUTPUT_PATH.parent.mkdir(parents=True, exist_ok=True)
GENERATOR_OUTPUT_PATH.write_text(new_code, encoding="utf-8")
print("Wrote:", GENERATOR_OUTPUT_PATH.resolve())
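
The reply is written to disk as-is, so a stray markdown fence or a syntax error only surfaces later when the file is imported. A lightweight pre-check can catch both without importing CadQuery. The sketch below is one possible version; `strip_code_fences` and `top_level_function_names` are hypothetical helpers, and the sample reply is made up for illustration.

```python
import ast
import re

def strip_code_fences(text: str) -> str:
    """Remove a single wrapping ```...``` fence if the model added one anyway."""
    text = text.strip()
    match = re.match(r"^```[a-zA-Z0-9_+-]*\n(.*?)\n?```$", text, flags=re.DOTALL)
    return match.group(1).strip() if match else text

def top_level_function_names(source: str) -> list:
    """Parse the source (raises SyntaxError if invalid) and list top-level def names."""
    tree = ast.parse(source)
    return [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]

# Illustrative reply that ignores the "no fences" instruction.
sample_reply = (
    "```python\n"
    "import cadquery as cq\n"
    "\n"
    "def demo_part(w: float = 1.0) -> cq.Workplane:\n"
    "    return cq.Workplane('XY').box(w, w, w)\n"
    "```"
)

cleaned = strip_code_fences(sample_reply)
names = top_level_function_names(cleaned)
if len(names) != 1:
    raise ValueError(f"Expected exactly one top-level function, found: {names}")
print("Generator function:", names[0])
```

`ast.parse` only checks syntax and does not execute the code, so this check works even when CadQuery is not installed.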




### Option B: manually in chat (any LLM), then paste the code into the notebook

#### 1) Markdown instructions (for students)

Manual generator synthesis (no API)

1) Open any LLM (ChatGPT / Gemini / Claude).
2) Copy the entire prompt below.
3) The model must return **ONLY Python code** (no explanations and no ``` fences).
4) Paste the code into the `manual_code` variable in the notebook and save it to the file `client_part.py`.

#### 2) Chat prompt (template)

You are generating a *parametric CAD generator* in CadQuery style.

GOAL:
- Maximize reliability: default parameters MUST always produce a valid, non-empty solid.
- Prefer simpler geometry over matching every minor detail.

OPERATION VOCABULARY (very important):
- Use ONLY CAD operations/patterns that appear in the FEW-SHOT EXAMPLES.
- Prioritize: sketch + extrude/revolve/loft, union, cut, translate/rotate.
- If a feature would require an operation NOT present in the examples, DO NOT implement that feature.

CODE REQUIREMENTS (mandatory):
- The code MUST define exactly ONE top-level generator function named <generator_name>.
- Signature must be explicit keyword params with type hints and defaults:
  def my_part(a: float = 1.0, b: int = 3, ... ) -> cq.Workplane:
- Must return cq.Workplane.
- Import cadquery as cq (top-level or inside function).
- No classes. No DEFAULT_PARAMS. No make_part().

ROBUSTNESS:
- Validate parameters (raise ValueError).
- Avoid fragile features (heavy fillets, complex selectors, self-intersections).
- Do NOT use try/except.
- Use small eps/fudge overlaps where booleans need stability.

NAMING:
- <generator_name> must be snake_case reflecting the part family name.

OUTPUT:
Return ONLY python code. No markdown fences. No explanations.

NEW PART FAMILY
Name: <NEW_PART_NAME>

Abstract:
<NEW_PART_ABSTRACT>

Detailed description:
<NEW_PART_DETAILED>

FEW-SHOT EXAMPLES:
### Example: ...
<CODE>
...
</CODE>

### Example: ...
<CODE>
...
</CODE>

Now write the generator code.



#### 3) Python cell: paste the code and save



In [None]:
from pathlib import Path

GENERATOR_OUTPUT_PATH = Path("../data/generators/client_part.py")

manual_code = r'''
# paste LLM output here (ONLY python code)
'''

# minimal sanity check (optional)
assert "def " in manual_code and "cadquery" in manual_code.lower()

GENERATOR_OUTPUT_PATH.parent.mkdir(parents=True, exist_ok=True)
GENERATOR_OUTPUT_PATH.write_text(manual_code.strip() + "\n", encoding="utf-8")
print("Wrote:", GENERATOR_OUTPUT_PATH.resolve())




## 7) Run generator (defaults) + export STL + preview

In [None]:
import importlib.util
import inspect
import re
from pathlib import Path

GENERATOR_OUTPUT_PATH = Path("../data/generators/client_part.py")
PREVIEW_STL_PATH = Path("preview.stl")

def import_module_from_path(path: Path, module_name: str = "client_part_mod"):
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    # Check the spec before using it: module_from_spec(None) would fail first otherwise.
    assert spec and spec.loader
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

def find_single_generator_function(mod):
    """
    Find exactly one "public" generator function:
    - callable
    - name does not start with '_'
    - defined in this module
    """
    funcs = []
    for name, obj in vars(mod).items():
        if not callable(obj):
            continue
        if name.startswith("_"):
            continue
        # only functions defined in this file (not imports)
        if getattr(obj, "__module__", None) != mod.__name__:
            continue
        # drop obvious utilities, in case any appear
        if name in {"main", "run", "test"}:
            continue
        funcs.append((name, obj))
    if len(funcs) != 1:
        raise ValueError(f"Expected exactly 1 generator function, found: {[n for n,_ in funcs]}")
    return funcs[0]  # (name, fn)

mod = import_module_from_path(GENERATOR_OUTPUT_PATH)
gen_name, gen_fn = find_single_generator_function(mod)

print("Loaded generator:", gen_name)
print("Signature:", inspect.signature(gen_fn))

# 1) build the part with default parameters (pass nothing)
part = gen_fn()

# 2) export STL
from cadquery import exporters
shape = part.val() if hasattr(part, "val") else part
exporters.export(shape, str(PREVIEW_STL_PATH))
print("Exported:", PREVIEW_STL_PATH.resolve())

# 3) preview via trimesh (if installed)
try:
    import trimesh
    m = trimesh.load(PREVIEW_STL_PATH, force="mesh")
    print("Mesh:", m.vertices.shape, m.faces.shape, "watertight:", m.is_watertight)
    m.show()
except Exception as e:
    print("trimesh preview skipped:", e)

Loaded generator: ergoshell_chair
Signature: (seat_width: float = 420.0, seat_depth: float = 420.0, seat_rise: float = 60.0, wall_thickness: float = 4.0, top_shrink: float = 30.0, backrest_height: float = 260.0, backrest_thickness: float = 20.0, backrest_width_ratio: float = 0.85, leg_height: float = 420.0, leg_radius: float = 16.0, leg_inset: float = 55.0, mount_hole_radius: float = 4.0, mount_hole_dx: float = 90.0, mount_hole_dy: float = 70.0, mount_hole_z: float = 20.0, fillet_radius: float = 1.5, leg_fillet_radius: float = 0.0) -> 'cq.Workplane'
Exported: /Users/zhemchuzhnikov/CADEvolve seminar/llm-based generation/preview.stl
Mesh: (2296, 3) (4584, 3) watertight: True
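
The run above exercises only the default parameters. One way to probe robustness a little further is to scale each numeric default in turn and record which calls fail. The sketch below assumes a generator with keyword defaults, as the prompt requires. `sweep_numeric_defaults` is a hypothetical helper, and `demo_generator` is a stand-in that returns a dict instead of a `cq.Workplane` so the example runs without CadQuery.

```python
import inspect

def sweep_numeric_defaults(fn, factors=(0.5, 2.0)):
    """Call fn once per (parameter, factor), scaling one numeric default at a time."""
    defaults = {
        name: p.default
        for name, p in inspect.signature(fn).parameters.items()
        if isinstance(p.default, (int, float)) and not isinstance(p.default, bool)
    }
    results = []
    for name, value in defaults.items():
        for factor in factors:
            trial = type(value)(value * factor)  # keep int parameters as int
            try:
                fn(**{name: trial})
                results.append((name, trial, "ok"))
            except Exception as exc:
                # The harness may catch errors; the generator itself should not.
                results.append((name, trial, f"{type(exc).__name__}: {exc}"))
    return results

# Hypothetical stand-in generator with parameter validation.
def demo_generator(radius: float = 10.0, wall: float = 6.0):
    if wall >= radius:
        raise ValueError("wall must be smaller than radius")
    return {"radius": radius, "wall": wall}

for name, trial, outcome in sweep_numeric_defaults(demo_generator):
    print(f"{name}={trial}: {outcome}")
```

With the real generator, `gen_fn` from the cell above would take the place of `demo_generator`. A `ValueError` from parameter validation is an acceptable outcome; a kernel crash or an empty solid is what the sweep is meant to surface.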


## 8) Basic validation (optional)

In [None]:
def basic_mesh_checks(m) -> List[str]:
    problems = []
    if m is None:
        return ["mesh is None"]
    if getattr(m, "faces", None) is None or len(m.faces) == 0:
        problems.append("no faces")
    if getattr(m, "vertices", None) is None or len(m.vertices) == 0:
        problems.append("no vertices")
    if hasattr(m, "bounds"):
        b = np.array(m.bounds)
        if not np.isfinite(b).all():
            problems.append("non-finite bounds")
        span = b[1] - b[0]
        if (span <= 1e-6).any():
            problems.append(f"degenerate span: {span}")
    if hasattr(m, "is_watertight") and not m.is_watertight:
        problems.append("not watertight (may be ok)")
    return problems

assert trimesh is not None, "Install trimesh for validation"
m = trimesh.load("preview.stl", force="mesh")
basic_mesh_checks(m)

[]

## 9) Repair loop (iterate)

### 9A) Repair via API 

In [None]:
REPAIR_NOTES = "\n".join([
    "Observed issues:",
    "- ...",
    "Spec mismatches:",
    "- ...",
    "Requested fixes:",
    "- ...",
])

def llm_repair_generator_api(old_code: str, repair_notes: str) -> str:
    instructions = "\n".join([
        "You are fixing a CadQuery parametric generator.",
        "",
        "HARD RULES:",
        "- Output ONLY python code (no markdown fences, no explanations).",
        "- The file must define EXACTLY ONE top-level generator function (snake_case).",
        "- The generator must return cq.Workplane.",
        "- No classes. No DEFAULT_PARAMS. No make_part().",
        "- Do NOT use try/except to hide failures; ensure geometry is robust instead.",
        "- Prefer reliability over complexity; simplify if needed.",
        "",
        "ROBUSTNESS:",
        "- Keep boolean ops stable (use small eps/fudge overlaps when needed).",
        "- Avoid fragile fillets/edge selectors; omit fillets unless clearly safe.",
        "- Include basic parameter validation (raise ValueError).",
    ])

    user_input = "\n".join([
        "PART SPEC",
        f"Name: {NEW_PART_NAME}",
        "",
        "Abstract:",
        NEW_PART_ABSTRACT,
        "",
        "Detailed description:",
        NEW_PART_DETAILED,
        "",
        "REPAIR NOTES:",
        repair_notes,
        "",
        "CURRENT GENERATOR CODE:",
        "<CODE>",
        old_code,
        "</CODE>",
        "",
        "Now output the fixed python code.",
    ])

    resp = client.responses.create(
        model="gpt-5-mini",
        input=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": user_input},
        ],
    )
    return resp.output_text

old = GENERATOR_OUTPUT_PATH.read_text(encoding="utf-8")
fixed = llm_repair_generator_api(old, REPAIR_NOTES)

GENERATOR_OUTPUT_PATH.write_text(fixed.strip() + "\n", encoding="utf-8")
print("Patched:", GENERATOR_OUTPUT_PATH.resolve())
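
The "Observed issues" part of `REPAIR_NOTES` can be filled in partly automatically by running the generator and capturing any exception. The sketch below is one possible way to do that under the same three-heading layout; `build_repair_notes` is a hypothetical helper and `broken_generator` is a made-up stand-in that needs no CadQuery.

```python
import traceback

def build_repair_notes(fn, spec_mismatches=(), requested_fixes=()):
    """Call fn() with defaults and format the outcome as REPAIR_NOTES-style text."""
    lines = ["Observed issues:"]
    try:
        fn()
        lines.append("- no exception with default parameters")
    except Exception:
        # The last traceback line is the "ExceptionType: message" summary.
        summary = traceback.format_exc().strip().splitlines()[-1]
        lines.append(f"- generator raised {summary}")
    lines.append("Spec mismatches:")
    lines.extend(f"- {item}" for item in spec_mismatches)
    lines.append("Requested fixes:")
    lines.extend(f"- {item}" for item in requested_fixes)
    return "\n".join(lines)

# Hypothetical stand-in for a failing generator.
def broken_generator(radius: float = 0.0):
    if radius <= 0:
        raise ValueError("radius must be > 0")

notes = build_repair_notes(
    broken_generator,
    requested_fixes=["choose a positive default radius"],
)
print(notes)
```

Visual problems found in the preview still have to be described by hand under "Spec mismatches", since no exception is raised for them.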

### 9B) Manual repair in chat (ChatGPT / Gemini / Claude)

### Manual generator repair (no API)

1) Open any LLM chat (ChatGPT / Gemini / Claude).
2) Paste the prompt below (spec + repair notes + current code).
3) The model must return **ONLY Python code** (no explanations, no markdown fences).
4) Replace the content of `../data/generators/client_part.py` with the returned code.
5) Re-run cell **7** (run + export + preview).

Hard rules:
- The file must contain **exactly one** top-level generator function `def <name>(...) -> cq.Workplane`.
- No classes, no `DEFAULT_PARAMS`, no `make_part`.
- Do **not** use try/except to mask failures — simplify geometry instead.

Prompt template (EN):

You are fixing a CadQuery parametric generator.

HARD RULES:
- Output ONLY python code (no markdown fences, no explanations).
- The file must define EXACTLY ONE top-level generator function (snake_case).
- The generator must return cq.Workplane.
- No classes. No DEFAULT_PARAMS. No make_part().
- Do NOT use try/except to hide failures; ensure geometry is robust instead.
- Prefer reliability over complexity; simplify if needed.

PART SPEC
Name: <NEW_PART_NAME>

Abstract:
<NEW_PART_ABSTRACT>

Detailed description:
<NEW_PART_DETAILED>

REPAIR NOTES:
<REPAIR_NOTES>

CURRENT GENERATOR CODE:
<CODE>
... paste current client_part.py ...
</CODE>

Now output the fixed python code.