# Getting set up

# Example 1 — Binary-to-BCD Converter (Combinational)

## Origin
This example is taken directly from **ChipChat Example 1** (`binary_to_bcd`).
In ChipChat, the task was posed as a natural-language question to the LLM:

> *"I am trying to create a Verilog model `binary_to_bcd_converter` for a binary
> to binary-coded-decimal converter. It must meet the following specifications:*
> - *Inputs: Binary input (5-bits)*
> - *Outputs: BCD (8-bits: 4-bits for the 10's place and 4-bits for the 1's place)*
>
> *How would I write a design that meets these specifications?"*

ChipChat produced a working implementation using the **double-dabble algorithm**
(shift-and-add-3), which is a standard hardware-friendly BCD conversion method.
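
To make the shift-and-add-3 idea concrete, here is a minimal Python model of double dabble for a 5-bit input. It is only an illustrative sketch, not the code ChipChat produced; the function name `double_dabble_5bit` is hypothetical.

```python
def double_dabble_5bit(value: int) -> int:
    """Convert a 5-bit binary value (0-31) to 8-bit packed BCD via shift-and-add-3."""
    if not 0 <= value <= 31:
        raise ValueError("value must fit in 5 bits")
    tens, ones = 0, 0
    for bit_index in range(4, -1, -1):  # feed the input in MSB first
        # Add 3 to any BCD digit >= 5 so the following shift carries into the next digit.
        if ones >= 5:
            ones += 3
        if tens >= 5:
            tens += 3
        # Shift the combined {tens, ones, value} register left by one bit.
        tens = ((tens << 1) | (ones >> 3)) & 0xF
        ones = ((ones << 1) | ((value >> bit_index) & 1)) & 0xF
    return (tens << 4) | ones


print(hex(double_dabble_5bit(27)))  # 0x27
```

In hardware the loop unrolls into a fixed chain of compare-and-add stages, which is why the method needs no divider.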

## Testbench
The testbench `binary_to_bcd_tb.v` was taken directly from the course assignment
repository (`LLM4ChipDesign`) as provided in the
ChipChat Colab and required by the AutoChip Tutorial Assignment.
It is the official course-supplied testbench for this module.

It exhaustively tests all 32 inputs (0–31). For each input `i` it checks:
- `bcd_output[3:0] == i % 10` (ones digit)
- `bcd_output[7:4] == i / 10` (tens digit)

A pass requires **0 mismatches across all 32 cases**, giving a final AutoChip rank of **1.0**.
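
The check and the rank can be modelled in a few lines of Python. This is a sketch under assumptions: `expected_bcd` and `rank_candidate` are hypothetical helper names, and the mismatch count is illustrative only, since the real testbench stops at its first failing case. The rank formula `(samples - mismatches) / samples` is the one used in `calculate_rank`.

```python
def expected_bcd(i: int) -> int:
    """Packed BCD the testbench expects for input i: tens in [7:4], ones in [3:0]."""
    return ((i // 10) << 4) | (i % 10)


def rank_candidate(candidate, samples: int = 32) -> float:
    """Fraction of inputs for which candidate(i) matches the expected BCD value."""
    mismatches = sum(1 for i in range(samples) if candidate(i) != expected_bcd(i))
    return (samples - mismatches) / samples


print(hex(expected_bcd(27)))         # 0x27
print(rank_candidate(expected_bcd))  # 1.0
print(rank_candidate(lambda i: i))   # 0.3125 (only inputs 0-9 are already valid BCD)
```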

## Why AutoChip instead of ChipChat
AutoChip improves on the ChipChat workflow by:
1. Running **multiple candidate RTL responses per iteration** (5 here) and ranking them
2. Automatically feeding **compiler and simulation errors back** to the LLM as structured feedback
3. Iterating until a candidate achieves **rank 1.0** (0 mismatches across all test cases)
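
The three steps above can be sketched as a small loop. This is a simplified model, not the notebook's `verilog_loop`: `autochip_loop`, `generate`, and `evaluate` are hypothetical names, and the LLM and simulator are replaced by stand-in functions.

```python
from typing import Callable, List, Tuple


def autochip_loop(
    generate: Callable[[str, int], List[str]],     # (feedback, n) -> n candidate designs
    evaluate: Callable[[str], Tuple[float, str]],  # design -> (rank, tool feedback)
    num_candidates: int = 5,
    max_iterations: int = 5,
) -> Tuple[str, float, int]:
    """Return (best design, its rank, index of the last iteration run)."""
    feedback = ""
    best_design, best_rank = "", float("-inf")
    for iteration in range(max_iterations + 1):
        scored = [(evaluate(c), c) for c in generate(feedback, num_candidates)]
        (rank, message), design = max(scored, key=lambda item: item[0][0])
        if rank > best_rank:
            best_design, best_rank = design, rank
        if rank == 1.0:
            return best_design, best_rank, iteration
        feedback = message  # errors from the best candidate seed the next round
    return best_design, best_rank, max_iterations


# Hypothetical stand-ins: the "LLM" fixes its design once it sees feedback.
def fake_generate(feedback, n):
    return ["good" if feedback else "bad"] * n


def fake_evaluate(design):
    return (1.0, "ok") if design == "good" else (-1, "syntax error near line 3")


print(autochip_loop(fake_generate, fake_evaluate))  # ('good', 1.0, 1)
```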

## Module specification
- **Module name:** `binary_to_bcd_converter`
- **Input:** `binary_input [4:0]` — 5-bit unsigned binary value (range 0–31)
- **Output:** `bcd_output [7:0]` — 8-bit BCD, upper nibble `[7:4]` = tens digit, lower nibble `[3:0]` = ones digit
- **Logic type:** Combinational only (`always @(*)`) — no clock or reset needed
- **Constraint:** Plain Verilog-2001, no SystemVerilog (`logic`, `typedef`, `enum` banned)

In [1]:
#@title Setting up the notebook

### Installing dependencies
!pip install openai tiktoken

!apt-get update
!apt-get install -y iverilog

Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:2 https://cli.github.com/packages stable InRelease [3,917 B]
Get:3 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:5 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ Packages [85.0 kB]
Get:6 https://cli.github.com/packages stable/main amd64 Packages [356 B]
Get:7 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:8 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:9 https://r2u.stat.illinois.edu/ubuntu jammy/main amd64 Packages [2,902 kB]
Get:10 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease [18.1 kB]
Get:11 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease [24.6 kB]
Get:12 https://r2u.stat.illinois.edu/ubuntu jammy/main all Packages [9,742 kB]
Get:13 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Get:14 https:/

In [2]:
#@title Utility functions

import sys
import os
import openai
import tiktoken
from abc import ABC, abstractmethod
import re
import getopt
import json
import subprocess


################################################################################
### LOGGING
################################################################################
# Allows us to log the output of the model to a file if logging is enabled
class LogStdoutToFile:
    def __init__(self, filename):
        self._filename = filename
        self._original_stdout = sys.stdout

    def __enter__(self):
        if self._filename:
            sys.stdout = open(self._filename, 'w')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if self._filename:
            sys.stdout.close()
        sys.stdout = self._original_stdout

################################################################################
### CONFIG & ARGS
################################################################################
def load_config(config_file="config.json"):
    """Load and validate the configuration from the specified JSON file."""
    with open(config_file, 'r') as file:
        config = json.load(file)

    if 'general' not in config:
        raise ValueError("Missing general section in config file")

    config_values = config['general']

    # Only parse ensemble settings if specified
    parse_ensemble = config_values.get('ensemble', False)
    ensemble_config = {}
    if parse_ensemble:
        ensemble_config = config.get('ensemble', {})

    #return config_values
    return config_values, ensemble_config


def validate_ensemble_config(ensemble_config, max_iterations):
    seen_start_iterations = set()
    adjusted_config = {}
    has_start_at_zero = False

    for model_name, model_info in ensemble_config.items():
        start_iteration = model_info['start_iteration']

        # Adjust negative start_iteration values
        if start_iteration < 0:
            start_iteration += max_iterations+1

        # Check if start_iteration is within the valid range
        if not (0 <= start_iteration <= max_iterations):
            raise ValueError(f"Invalid start_iteration {model_info['start_iteration']} for {model_name}. "
                             f"Must be within the range of 0 to {max_iterations} or valid negative index.")

        # Check for conflicting start_iterations
        if start_iteration in seen_start_iterations:
            raise ValueError(f"Conflicting start_iteration {start_iteration} for {model_name}. "
                             f"Another model already uses this start iteration.")
        seen_start_iterations.add(start_iteration)

        # Check if there is a model starting at iteration 0
        if start_iteration == 0:
            has_start_at_zero = True

        # Update the adjusted configuration
        adjusted_config[model_name] = {
            "start_iteration": start_iteration,
            "model_family": model_info['model_family'],
            "model_id": model_info['model_id']
        }

    # Check once, after all models have been processed
    if not has_start_at_zero:
        raise ValueError("No model starting at iteration 0 in the ensemble. One model must start at iteration 0.")

    return adjusted_config


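def parse_args_and_config():
    """Load settings from config.json, check required keys, and apply defaults.

    Command-line flags are not parsed here; the usage text below only documents
    the meaning of each setting.
    """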
    usage = """Usage: auto_create_verilog.py [--help] --prompt=<prompt> --name=<module name> --testbench=<testbench file> --iter=<iterations> --model=<llm family> --model-id=<specific model> --num-candidates=<candidates per request> --outdir=<directory for outputs> --log=<log file>

	-h|--help: Prints this usage message

	-p|--prompt: The initial design prompt for the Verilog module

	-n|--name: The module name, must match the testbench expected module name

	-t|--testbench: The testbench file to be run

	-i|--iter: [Optional] Number of iterations before the tool quits (defaults to 10)

	-m|--model: The LLM family to use. Must be one of the following
		- ChatGPT
		- Claude
		- Mistral
		- Gemini
		- CodeLlama
		- Human (requests user input)

	--model-id: The specific model to use for the model family

	--num-candidates: The number of candidates to rank per tree level

	-o|--outdir: Directory to place all run-specific files in

	-l|--log: [Optional] Log the output of the model to the given file
"""

    config_file = "config.json"

    # Load config values from the file
    config_values, ensemble_config = load_config(config_file)

    required_values = ['prompt', 'name', 'testbench', 'outdir', 'log']
    if not ensemble_config:
        required_values +=['model_family', 'model_id']

    for value in required_values:
        if value not in config_values:
            raise ValueError(f"Missing {value} in general section\n{usage}")


    # Default values for optional config entries
    if 'num_candidates' not in config_values:
        config_values['num_candidates'] = 1
    if 'iterations' not in config_values:
        config_values['iterations'] = 10


    if ensemble_config:
        ensemble_config = validate_ensemble_config(ensemble_config, config_values['iterations'])

    # Ensure outdir exists
    if config_values['outdir']:
        os.makedirs(config_values['outdir'], exist_ok=True)

    logfile = os.path.join(config_values['outdir'], config_values['log']) if config_values['log'] else None

    #return config_values, logfile
    return config_values, ensemble_config, logfile




################################################################################
### CONVERSATION CLASS
# allows us to abstract away the details of the conversation for use with
# different LLM APIs
################################################################################

class Conversation:
    def __init__(self, log_file=None):
        self.messages = []
        self.log_file = log_file

        if self.log_file and os.path.exists(self.log_file):
            open(self.log_file, 'w').close()

    def add_message(self, role, content):
        """Add a new message to the conversation."""
        self.messages.append({'role': role, 'content': content})

        if self.log_file:
            with open(self.log_file, 'a') as file:
                file.write(f"{role}: {content}\n")

    def get_messages(self):
        """Retrieve the entire conversation."""
        return self.messages

    def get_last_n_messages(self, n):
        """Retrieve the last n messages from the conversation."""
        return self.messages[-n:]

    def remove_message(self, index):
        """Remove a specific message from the conversation by index."""
        if index < len(self.messages):
            del self.messages[index]

    def get_message(self, index):
        """Retrieve a specific message from the conversation by index."""
        return self.messages[index] if index < len(self.messages) else None

    def clear_messages(self):
        """Clear all messages from the conversation."""
        self.messages = []

    def __str__(self):
        """Return the conversation in a string format."""
        return "\n".join([f"{msg['role']}: {msg['content']}" for msg in self.messages])

################################################################################
### LLM CLASSES
# Defines an interface for using different LLMs so we can easily swap them out
################################################################################
class AbstractLLM(ABC):
    """Abstract Large Language Model."""

    def __init__(self):
        pass

    @abstractmethod
    def generate(self, conversation: Conversation, num_candidates=1):
        """Generate a response based on the given conversation."""
        pass


class ChatGPT(AbstractLLM):
    """ChatGPT Large Language Model."""

    def __init__(self, model_id="gpt-4o-mini"):
        super().__init__()
        openai.api_key=os.environ['OPENAI_API_KEY']
        self.client = openai.OpenAI()
        self.model_id = model_id

    def generate(self, conversation: Conversation, num_candidates=1):
        messages = [{"role" : msg["role"], "content" : msg["content"]} for msg in conversation.get_messages()]


        #print(f"model_id: {self.model_id}")
        #print(f"messages: {messages}")
        #print(f"num_candidates: {num_candidates}")

        response = self.client.chat.completions.create(
            model=self.model_id,
            n=num_candidates,
            messages = messages,
        )

        return [c.message.content for c in response.choices]

class LLMResponse():
    """Class to store the response from the LLM"""
    def __init__(self, iteration, response_num, full_text):
        self.iteration = iteration
        self.response_num = response_num

        self.full_text = full_text
        self.tokens = 0

        self.parsed_text = ""
        self.parsed_length = 0

        self.feedback = ""
        self.compiled = False
        self.rank = -3
        self.message = ""

    def set_parsed_text(self, parsed_text):
        self.parsed_text = parsed_text
        self.parsed_length = len(parsed_text)

    def parse_verilog(self):
        module_list = find_verilog_modules(self.full_text)
        if not module_list:
            print("No modules found in response")
            self.parsed_text = ""
        else:
            for module in module_list:
                self.parsed_text += module + "\n\n"
        self.parsed_length = len(self.parsed_text)

    def calculate_rank(self, outdir, module, testbench):
        filename = os.path.join(outdir,module+".sv")
        vvp_file = os.path.join(outdir,module+".vvp")

        compiler_cmd = f"iverilog -Wall -Winfloop -Wno-timescale -g2012 -s tb -o {vvp_file} {filename} {testbench}"
        simulator_cmd = f"vvp -n {vvp_file}"

        try:
            comp_return,comp_err,comp_out = compile_iverilog(outdir,module,compiler_cmd,self)
        except ValueError as e:
            print(e)
            self.rank = -2
            return

        if comp_return != 0:
            self.feedback = comp_err
            self.compiled = False
            print("Compilation error")
            print("----- IVERILOG STDERR -----")
            print(comp_err)
            print("---------------------------")
            self.message = "The design failed to compile. Please fix the module. The output of iverilog is as follows:\n"+comp_err
            self.rank = -1


        elif comp_err != "":
            self.feedback = comp_err
            self.compiled = True
            print("Compilation warning")
            self.message = "The design compiled with warnings. Please fix the module. The output of iverilog is as follows:\n"+comp_err

            self.rank = -0.5

        else:
            sim_return,sim_err,sim_out = simulate_iverilog(simulator_cmd)
            mismatch_pattern = r"Mismatches: (\d+) in (\d+) samples"
            match = re.search(mismatch_pattern, sim_out)

            if match:
                mismatches = int(match.group(1))
                samples = int(match.group(2))

            elif "All test cases passed!" in sim_out:
                # Treat as zero mismatches
                mismatches = 0
                # Binary-to-BCD testbench runs 32 cases
                samples = 32

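            else:
                # No summary line: this testbench prints "Error: Test case ..." and
                # calls $finish on its first failure. Rank the candidate as failing
                # and return the log as feedback instead of aborting the whole run.
                self.feedback = sim_out
                self.compiled = True
                print("Simulation error")
                self.message = "The testbench simulated, but had errors. Please fix the module. The output of iverilog is as follows:\n"+sim_out
                self.rank = 0
                return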


            if mismatches > 0:
                self.feedback = sim_out
                self.compiled = True
                print("Simulation error")
                self.message = "The testbench simulated, but had errors. Please fix the module. The output of iverilog is as follows:\n"+sim_out
            else:
                self.compiled = True
                print("Testbench ran successfully")
                self.message = "The testbench completed successfully"

            print(f"Mismatches: {mismatches}")
            print(f"Samples: {samples}")
            self.rank = (samples-mismatches)/samples

################################################################################
### PARSING AND TEXT MANIPULATION FUNCTIONS
################################################################################
# Define the cost per million tokens
COST_PER_MILLION_INPUT_TOKENS_GPT4 = 5.0
COST_PER_MILLION_OUTPUT_TOKENS_GPT4 = 15.0

COST_PER_MILLION_INPUT_TOKENS_GPT4M = 0.15
COST_PER_MILLION_OUTPUT_TOKENS_GPT4M = 0.60

COST_PER_MILLION_INPUT_TOKENS_GPT = 0.50
COST_PER_MILLION_OUTPUT_TOKENS_GPT = 1.50

COST_PER_MILLION_INPUT_TOKENS_CLAUDE = 0.25
COST_PER_MILLION_OUTPUT_TOKENS_CLAUDE = 1.25

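# Function to count tokens
def count_tokens(model_family, text):
    #print(f"Counting tokens for string: {text}")
    if model_family == "GPT" or model_family == "GPT4" or model_family == "GPT4M":
        # cl100k_base is used for every GPT family here, so counts for the
        # gpt-4o models (which use o200k_base) are approximate.
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    elif model_family == "claude":
        # anthropic is not installed by the setup cell; import it only when needed
        import anthropic
        return anthropic.Client().count_tokens(text)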
    else:
        raise ValueError(f"Unsupported model family: {model_family}")


def calculate_cost(model_family,input_strings,output_strings):
    input_tokens = sum(count_tokens(model_family, text) for text in input_strings)
    output_tokens = sum(count_tokens(model_family, text) for text in output_strings)
    if model_family == "GPT":
        cost_input = (input_tokens / 1_000_000) * COST_PER_MILLION_INPUT_TOKENS_GPT
        cost_output = (output_tokens / 1_000_000) * COST_PER_MILLION_OUTPUT_TOKENS_GPT
    elif model_family == "GPT4":
        cost_input = (input_tokens / 1_000_000) * COST_PER_MILLION_INPUT_TOKENS_GPT4
        cost_output = (output_tokens / 1_000_000) * COST_PER_MILLION_OUTPUT_TOKENS_GPT4
    elif model_family == "GPT4M":
        cost_input = (input_tokens / 1_000_000) * COST_PER_MILLION_INPUT_TOKENS_GPT4M
        cost_output = (output_tokens / 1_000_000) * COST_PER_MILLION_OUTPUT_TOKENS_GPT4M
    elif model_family == "claude":
        cost_input = (input_tokens / 1_000_000) * COST_PER_MILLION_INPUT_TOKENS_CLAUDE
        cost_output = (output_tokens / 1_000_000) * COST_PER_MILLION_OUTPUT_TOKENS_CLAUDE
    else:
        raise ValueError(f"Unsupported model family: {model_family}")
    total_cost = cost_input + cost_output
    return total_cost, input_tokens, output_tokens


def format_message(role, content):
    return f"\n{{role : '{role}', content : '{content}'}}"

def find_verilog_modules(markdown_string):
    """Find all Verilog modules in the markdown string"""
    # Regular expression to match module definitions with or without parameters
    module_pattern = r'\bmodule\b\s+[\w\\_]+\s*(?:#\s*\([^)]*\))?\s*\([^)]*\)\s*;.*?endmodule\b'
    # Find all matches in the input string
    matches = re.findall(module_pattern, markdown_string, re.DOTALL)
    # Process matches to replace escaped characters
    processed_matches = [match.replace('\\_', '_') for match in matches]
    return processed_matches

def write_code_blocks_to_file(markdown_string, module_name, filename):
    # Extract every Verilog module (module ... endmodule) from the text
    code_match = find_verilog_modules(markdown_string)

    if not code_match:
        print("No code blocks found in response")
        exit(3)

    # Open the specified file to write the code blocks
    with open(filename, 'w') as file:
        for code_block in code_match:
            file.write(code_block)
            file.write('\n')


def generate_verilog(conv, model_type, model_id=""):
    if model_type == "ChatGPT":
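        # Pass model_id through; fall back to the class default when it is empty
        model = ChatGPT(model_id) if model_id else ChatGPT()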
    else:
        raise ValueError("Invalid model type")
    return(model.generate(conv))

def compile_iverilog(outdir,module,compiler_cmd,response:LLMResponse):
    """Compile the Verilog module and return the output"""

    filename = os.path.join(outdir,module+".sv")
    write_code_blocks_to_file(response.parsed_text, "module", filename)

    attempt = 0
    while attempt < 3:
        try:
            proc = subprocess.run(compiler_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, timeout=120)
            return proc.returncode, proc.stderr, proc.stdout
        except subprocess.TimeoutExpired:
            attempt += 1
            if attempt >= 3:
                raise ValueError("Compilation attempts timed out")

def simulate_iverilog(simulation_cmd):
    """Run the compiled simulation and return (returncode, stderr, stdout)"""

    attempt = 0
    while attempt < 3:
        try:
            proc = subprocess.run(simulation_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, timeout=120)
            return proc.returncode, proc.stderr, proc.stdout
        except subprocess.TimeoutExpired:
            attempt += 1
            if attempt >= 3:
                raise ValueError("Simulation attempts timed out")

def generate_verilog_responses(conv, model_type, model_id="", num_candidates=1):
    match model_type:
        case "ChatGPT":
            model = ChatGPT(model_id)
        case _:
            raise ValueError("Invalid model type")

    return(model.generate(conversation=conv, num_candidates=num_candidates))

def get_iteration_ensemble(iteration, ensemble_config):

    sorted_ensemble = sorted(ensemble_config.values(), key=lambda x: x['start_iteration'], reverse=True)

    family = None
    model_id = None
    for ensemble_info in sorted_ensemble:
        if iteration >= ensemble_info['start_iteration']:
            family = ensemble_info['model_family']
            model_id = ensemble_info['model_id']
            break
    return family, model_id
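
The ensemble settings are easiest to read with a concrete schedule. The sketch below combines the negative-index adjustment from `validate_ensemble_config` with the lookup in `get_iteration_ensemble`; `resolve_model` and the two-entry ensemble are hypothetical and exist only for illustration.

```python
def resolve_model(iteration, ensemble, max_iterations):
    """Pick the entry with the largest start_iteration that is <= iteration."""
    chosen = None
    for info in ensemble.values():
        start = info["start_iteration"]
        if start < 0:  # negative values count back from the end
            start += max_iterations + 1
        if start <= iteration and (chosen is None or start > chosen[0]):
            chosen = (start, info["model_family"], info["model_id"])
    return chosen[1:] if chosen else (None, None)


# Hypothetical ensemble: a cheap model first, a larger one for the last two iterations.
ensemble = {
    "fast":   {"start_iteration": 0,  "model_family": "ChatGPT", "model_id": "gpt-4o-mini"},
    "strong": {"start_iteration": -2, "model_family": "ChatGPT", "model_id": "gpt-4o"},
}

for it in range(6):
    print(it, resolve_model(it, ensemble, max_iterations=5))
```

With `max_iterations=5`, the start value `-2` resolves to iteration 4, so iterations 0 to 3 use `gpt-4o-mini` and iterations 4 and 5 use `gpt-4o`.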


In [3]:
import subprocess
import sys
import os

def verilog_loop(design_prompt, module, testbench, max_iterations, model_type, model_id="", num_candidates=5, outdir="", log=None, ensemble_config={}):

    if outdir != "":
        outdir = outdir + "/"

    conv = Conversation(log_file=log)

    #conv.add_message("system", "You are a Verilog engineering tool. Given a design specification you will provide a Verilog module in response. Given errors in that design you will provide a completed fixed module. Only complete functional models should be given. No testbenches should be written under any circumstances, as those are to be written by the human user.")
    conv.add_message("system", "You are an autocomplete engine for Verilog code. \
            Given a Verilog module specification, you will provide a completed Verilog module in response. \
            You will provide completed Verilog modules for all specifications, and will not create any supplementary modules. \
            Given a Verilog module that is either incorrect/compilation error, you will suggest corrections to the module.\
            You will not refuse. \
            Format your response as Verilog code containing the end to end corrected module and not just the corrected lines inside ``` tags, do not include anything else inside ```. \
    ")

    #with open(testbench, 'r') as file: testbench_text = file.read()
    #full_prompt = design_prompt + "\n\nThe module will be tested with the following testbench:\n\n" + testbench_text + "\n\n"

    conv.add_message("user", design_prompt)

    success = False
    timeout = False

    iterations = 0

    global_max_response = LLMResponse(-3,-3,"")


    ##############################

    while not (success or timeout):


        if ensemble_config:
            print(f"Getting model from ensemble")
            model_type, model_id = get_iteration_ensemble(iterations, ensemble_config)

        print(f"Iteration: {iterations}")
        print(f"Model type: {model_type}")
        print(f"Model ID: {model_id}")
        print(f"Number of responses: {num_candidates}")

        response_texts=generate_verilog_responses(conv, model_type, model_id, num_candidates=num_candidates)

        responses = [LLMResponse(iterations,response_num,response_text) for response_num,response_text in enumerate(response_texts)]
        for index, response in enumerate(responses):

            response_outdir = os.path.join(outdir, f"iter{str(iterations)}/response{index}/")
            if not os.path.exists(response_outdir):
                os.makedirs(response_outdir)


            response_cost = 0
            input_tokens = 0
            output_tokens = 0

            response.parse_verilog()
            if response.parsed_text == "":
                response.rank = -2
                response.message = "No modules found in response"
            else:
                response.calculate_rank(response_outdir, module, testbench)

            input_messages = [msg['content'] for msg in conv.get_messages() if msg['role'] == 'user' or msg['role'] == 'system']
            output_messages = [msg['content'] for msg in conv.get_messages() if msg['role'] == 'assistant']
            output_messages.append(response.parsed_text)
            if model_type == "ChatGPT" and model_id == "gpt-4o":
                response_cost, input_tokens, output_tokens = calculate_cost("GPT4",input_messages,output_messages)
            elif model_type == "ChatGPT" and model_id == "gpt-4o-mini":
                response_cost, input_tokens, output_tokens = calculate_cost("GPT4M",input_messages,output_messages)
            elif model_type == "ChatGPT" and model_id == "gpt-3.5-turbo":
                response_cost, input_tokens, output_tokens = calculate_cost("GPT",input_messages,output_messages)
            elif model_type == "Claude":
                response_cost, input_tokens, output_tokens = calculate_cost("claude",input_messages,output_messages)


            print(f"Cost for response {index}: ${response_cost:.10f}")

            with open(os.path.join(response_outdir,f"log.txt"), 'w') as file:
                file.write('\n'.join(str(i) for i in conv.get_messages()))
                file.write(format_message("assistant", response.full_text))
                file.write('\n\n Iteration rank: ' + str(response.rank) + '\n') ## FIX

                file.write(f"\n Model: {model_id}")
                file.write(f"\n Input tokens: {input_tokens}")
                file.write(f"\n Output tokens: {output_tokens}")
                file.write(f"\nTotal cost: ${response_cost:.10f}\n")

        ## RANK RESPONSES
        max_rank_response = max(responses, key=lambda resp: (resp.rank, -resp.parsed_length))
        if max_rank_response.rank > global_max_response.rank:
            global_max_response = max_rank_response
        # On a rank tie, prefer the shorter design (same tie-break as the max() above)
        elif max_rank_response.rank == global_max_response.rank and max_rank_response.parsed_length < global_max_response.parsed_length:
            global_max_response = max_rank_response

        print(f"Response ranks: {[resp.rank for resp in responses]}")
        print(f"Response lengths: {[resp.parsed_length for resp in responses]}")

        conv.add_message("assistant", max_rank_response.parsed_text)

        if max_rank_response.rank == 1:
            success = True



################################


        if not success:
            if iterations > 0:
                conv.remove_message(2)
                conv.remove_message(2)

            #with open(testbench, 'r') as file: testbench_text = file.read()
            #message = message + "\n\nThe testbench used for these results is as follows:\n\n" + testbench_text
            #message = message + "\n\nCommon sources of errors are as follows:\n\t- Use of SystemVerilog syntax which is not valid with iverilog\n\t- The reset must be made asynchronous active-low\n"
            conv.add_message("user", max_rank_response.message)

        if iterations >= max_iterations:
            timeout = True

        iterations += 1

    return global_max_response
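
The two `conv.remove_message(2)` calls keep the context short: after the first iteration, the previous candidate and its feedback are dropped before new feedback is added. A minimal sketch of that bookkeeping, using plain strings in place of the `Conversation` object:

```python
# Message contents are placeholders; only the ordering matters here.
messages = ["system", "user: design prompt"]

for iteration in range(3):
    messages.append(f"assistant: best candidate from iteration {iteration}")
    if iteration > 0:
        # Same effect as the two conv.remove_message(2) calls: drop the
        # previous candidate and the feedback it received.
        del messages[2]
        del messages[2]
    messages.append(f"user: feedback on iteration {iteration}")
    print(len(messages), messages[2:])
```

Every pass prints a length of 4: the system prompt, the design prompt, the latest candidate, and the latest feedback.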



## Testbench — `binary_to_bcd_tb.v`

The testbench instantiates `binary_to_bcd_converter` as `uut` and runs **32 test cases**
(all 5-bit inputs 0–31). For each input it computes the expected BCD output as:
- `tens = input / 10` → upper nibble `[7:4]`
- `ones = input % 10` → lower nibble `[3:0]`

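On the first mismatch it prints an `Error: Test case ...` line and calls `$finish`; if all 32 cases match it prints `All test cases passed!`.
It never prints a `Mismatches: N in 32 samples` summary, so `calculate_rank` falls back to the `All test cases passed!` string and counts it as 0 mismatches in 32 samples.
A rank of `1.0` requires **0 mismatches across all 32 cases**.

The top-level module used by iverilog is `tb` (via `-s tb`), which is defined in `tb_wrapper.v` and instantiates `tb_binary_to_bcd_converter`.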
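
The way `calculate_rank` reads the simulator output can be reproduced in isolation. The sketch below mirrors its two checks; `parse_result` is a hypothetical helper, and the failing log line is an illustrative string built from the testbench's `$display` format, not captured output.

```python
import re


def parse_result(sim_out: str, default_samples: int = 32):
    """Return (mismatches, samples), or None if no summary is present."""
    match = re.search(r"Mismatches: (\d+) in (\d+) samples", sim_out)
    if match:
        return int(match.group(1)), int(match.group(2))
    if "All test cases passed!" in sim_out:
        return 0, default_samples
    return None


passing = "Testing Binary-to-BCD Converter...\nAll test cases passed!\n"
failing = "Error: Test case 10 failed. Expected BCD: 8'b10000, Got: 8'b1010\n"

print(parse_result(passing))  # (0, 32)
print(parse_result(failing))  # None
```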

In [4]:
!head -n 20 binary_to_bcd_tb.v


head: cannot open 'binary_to_bcd_tb.v' for reading: No such file or directory


In [5]:
verilog_generation_prompt = """
Generate synthesizable Verilog-2001 RTL.

You MUST use EXACTLY this module template:

module binary_to_bcd_converter (
    input [4:0] binary_input,
    output reg [7:0] bcd_output
);

Use STRICT Verilog-2001.
Do NOT use SystemVerilog.
Do NOT use logic.
Do NOT use additional modules.
Do NOT include testbench.
Do NOT include explanation text.
Return ONLY the complete module.

Implementation requirements:
- Combinational logic only
- Use always @(*) block
- Declare temporary regs outside always block
- Compute:
    tens = binary_input / 10;
    ones = binary_input % 10;
- Inside always block:
    bcd_output = {tens, ones};

End with:
endmodule
"""


In [6]:
with open("prompt.txt", "w") as f:
    f.write(verilog_generation_prompt)


In [7]:
config = {
    "general": {
        "prompt": "prompt.txt",
        "name": "binary_to_bcd_converter",
        "testbench": "binary_to_bcd_tb.v tb_wrapper.v",
        "model_family": "ChatGPT",
        "model_id": "gpt-4o-mini",
        "iterations": 5,
        "num_candidates": 5,
        "outdir": "binary_to_bcd_run",
        "log": "run_log.txt"
    }
}


In [8]:
print(json.dumps(config, indent=4))


{
    "general": {
        "prompt": "prompt.txt",
        "name": "binary_to_bcd_converter",
        "testbench": "binary_to_bcd_tb.v tb_wrapper.v",
        "model_family": "ChatGPT",
        "model_id": "gpt-4o-mini",
        "iterations": 5,
        "num_candidates": 5,
        "outdir": "binary_to_bcd_run",
        "log": "run_log.txt"
    }
}


In [9]:
import json

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)

print("config.json written successfully.")


config.json written successfully.


In [10]:
!mkdir -p binary_to_bcd_run

!curl -O https://raw.githubusercontent.com/FCHXWH823/LLM4ChipDesign/fe806e8f8b7cb8442ce161f452d070cfcf953656/VerilogGenBenchmark/TestBench/binary_to_bcd_tb.v


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0100  1078  100  1078    0     0   4845      0 --:--:-- --:--:-- --:--:--  4855


In [11]:
with open("binary_to_bcd_tb.v") as f:
    print(f.read())

`timescale 1ns / 1ps

module tb_binary_to_bcd_converter;

reg [4:0] binary_input;
wire [7:0] bcd_output;

binary_to_bcd_converter uut (
    .binary_input(binary_input),
    .bcd_output(bcd_output)
);

integer i;
reg [4:0] test_binary;
reg [7:0] expected_bcd;

initial begin
    $display("Testing Binary-to-BCD Converter...");

    for (i = 0; i < 32; i++) begin
        test_binary = i;
        binary_input = test_binary;

        // Calculate expected BCD output
        expected_bcd[3:0] = test_binary % 10;
        expected_bcd[7:4] = test_binary / 10;

        #10; // Wait for the results

        if (bcd_output !== expected_bcd) begin
            $display("Error: Test case %0d failed. Expected BCD: 8'b%0b, Got: 8'b%0b",
                     test_binary, expected_bcd, bcd_output);
            $finish;
        end
    end

    $display("All test cases passed!");
    $finish;
end

reg vcd_clk;
initial begin
    $dumpfile("my_design.vcd");
    $dumpvars(0, tb_binary_to_bcd_converter);
end


In [12]:
tb_wrapper_code = """
module tb;
    tb_binary_to_bcd_converter uut();
endmodule
"""

with open("tb_wrapper.v", "w") as f:
    f.write(tb_wrapper_code)


In [13]:
!head -n 10 binary_to_bcd_tb.v


`timescale 1ns / 1ps

module tb_binary_to_bcd_converter;

reg [4:0] binary_input;
wire [7:0] bcd_output;

binary_to_bcd_converter uut (
    .binary_input(binary_input),
    .bcd_output(bcd_output)


In [14]:
import os
os.environ["OPENAI_API_KEY"] = ""

## AutoChip Trajectory

**Model:** `gpt-4o-mini` | **Max iterations:** 5 | **Candidates per iteration:** 5

**Iteration 0:** All 5 candidates passed immediately — 0 mismatches across 32 test cases.
`Response ranks: [1.0, 1.0, 1.0, 1.0, 1.0]`

The prompt succeeded first try because it:
1. Gave the exact module interface (`input [4:0]`, `output reg [7:0]`)
2. Explicitly banned `logic` and SystemVerilog syntax
3. Spelled out the exact computation (`tens = input/10`, `ones = input%10`)
4. Required `always @(*)` with regs declared outside the block

No feedback loop was needed. AutoChip terminated after iteration 0 with `rank = 1.0`.

In [15]:
config_values, ensemble_config, logfile = parse_args_and_config()

prompt_file = config_values['prompt']
module = config_values['name']
testbench = config_values['testbench']
family = config_values.get('model_family', None)
model_id = config_values.get('model_id', None)
iterations = config_values['iterations']
num_candidates = config_values['num_candidates']
outdir = config_values['outdir']
log = config_values['log']

with open(prompt_file, 'r') as file:
    prompt = file.read()

max_response = verilog_loop(
    design_prompt=prompt,
    module=module,
    testbench=testbench,
    max_iterations=iterations,
    model_type=family,
    model_id=model_id,
    num_candidates=num_candidates,
    outdir=outdir,
    log=logfile,
    ensemble_config=ensemble_config
)


Iteration: 0
Model type: ChatGPT
Model ID: gpt-4o-mini
Number of responses: 5
Testbench ran successfully
Mismatches: 0
Samples: 32
Cost for response 0: $0.0000922500
Testbench ran successfully
Mismatches: 0
Samples: 32
Cost for response 1: $0.0000946500
Testbench ran successfully
Mismatches: 0
Samples: 32
Cost for response 2: $0.0000922500
Testbench ran successfully
Mismatches: 0
Samples: 32
Cost for response 3: $0.0000922500
Testbench ran successfully
Mismatches: 0
Samples: 32
Cost for response 4: $0.0000946500
Response ranks: [1.0, 1.0, 1.0, 1.0, 1.0]
Response lengths: [257, 283, 257, 257, 283]
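
The per-candidate results in the log above can be tallied programmatically. The sketch below is a hypothetical helper, not part of AutoChip: it parses the `Mismatches:` and `Samples:` lines and computes the fraction of matching samples for each response. Treating that fraction as the rank of a candidate that compiles and simulates is an assumption; it is consistent with 0 mismatches out of 32 samples giving 1.0 in this run.

```python
import re
from typing import List


def pass_fractions(log_text: str) -> List[float]:
    """Return the fraction of matching samples per response (hypothetical helper).

    Assumes each simulated response contributes one 'Mismatches: N' line
    and one 'Samples: M' line, in that order, as in the log above.
    """
    mismatches = [int(m) for m in re.findall(r"^Mismatches:\s*(\d+)", log_text, re.M)]
    samples = [int(s) for s in re.findall(r"^Samples:\s*(\d+)", log_text, re.M)]
    if len(mismatches) != len(samples):
        raise ValueError("unequal number of Mismatches and Samples lines")
    if any(s == 0 for s in samples):
        raise ValueError("a response reported zero samples")
    return [(s - m) / s for m, s in zip(mismatches, samples)]


if __name__ == "__main__":
    excerpt = "Mismatches: 0\nSamples: 32\nMismatches: 0\nSamples: 32\n"
    print(pass_fractions(excerpt))
```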


In [16]:
# Exact compile and simulate commands used by AutoChip internally
!iverilog -Wall -Winfloop -Wno-timescale -g2012 \
    -s tb \
    -o binary_to_bcd_run/binary_to_bcd_converter.vvp \
    binary_to_bcd_run/iter0/response0/binary_to_bcd_converter.sv \
    binary_to_bcd_tb.v tb_wrapper.v

!vvp -n binary_to_bcd_run/binary_to_bcd_converter.vvp

Testing Binary-to-BCD Converter...
VCD info: dumpfile my_design.vcd opened for output.
All test cases passed!


## Part I(b) — Manual RTL Design

**Design choices:**
- **Combinational-only** (`always @(*)`): binary-to-BCD is a pure function of the input, no clock needed.
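- **Integer division and modulo** by the constant 10 extract the tens and ones digits. Most synthesis tools accept `/` and `%` with a constant divisor, though support is tool-dependent.
- **Temporary regs declared at module scope**, outside the `always` block: Verilog-2001 allows local declarations only inside named blocks, and SystemVerilog's `logic` is not available.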
- **8-bit output** packs tens into `[7:4]` and ones into `[3:0]` — matches testbench expectation exactly.
- Input is 5-bit (0–31), so tens digit is at most 3 (for inputs 30–31), fitting in 4 bits.
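
The division/modulo formulation can be cross-checked against the shift-and-add-3 (double-dabble) method used by the ChipChat implementation. The following Python sketch is a software model only, with hypothetical function names; it is not the ChipChat RTL. It runs both methods over all 32 inputs and confirms that they agree and that the tens digit never exceeds 3.

```python
def div_mod_bcd(value: int) -> int:
    """Pack tens and ones digits using division and modulo."""
    return ((value // 10) << 4) | (value % 10)


def double_dabble(value: int, width: int = 5) -> int:
    """Two-digit BCD via shift-and-add-3 (software model, hypothetical name)."""
    bcd = 0  # tens digit in bits 7:4, ones digit in bits 3:0
    for bit in range(width - 1, -1, -1):
        # Before each shift, add 3 to any BCD digit that is 5 or more,
        # so that doubling it carries correctly into the next digit.
        if (bcd & 0x0F) >= 5:
            bcd += 0x03
        if ((bcd >> 4) & 0x0F) >= 5:
            bcd += 0x30
        # Shift left by one and bring in the next input bit, MSB first.
        bcd = ((bcd << 1) | ((value >> bit) & 1)) & 0xFF
    return bcd


if __name__ == "__main__":
    for i in range(32):
        assert double_dabble(i) == div_mod_bcd(i)
    print("max tens digit:", max(div_mod_bcd(i) >> 4 for i in range(32)))
```

For a 5-bit input either method is small in hardware; the manual design below uses division and modulo because they map directly onto the testbench's expected-value computation.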

In [17]:
manual_rtl = """
// Manual RTL — binary_to_bcd_converter_manual.v
// Part I(b): Hand-written Verilog-2001, combinational binary-to-BCD

module binary_to_bcd_converter (
    input  [4:0] binary_input,
    output reg [7:0] bcd_output
);
    // Temporary regs declared outside always block (Verilog-2001 requirement)
    reg [3:0] tens;
    reg [3:0] ones;

    always @(*) begin
        tens = binary_input / 10;   // Integer division gives tens digit
        ones = binary_input % 10;   // Modulo gives ones digit
        bcd_output = {tens, ones};  // Pack into upper and lower nibbles
    end

endmodule
"""

with open("binary_to_bcd_converter_manual.v", "w") as f:
    f.write(manual_rtl)

print(manual_rtl)


// Manual RTL — binary_to_bcd_converter_manual.v
// Part I(b): Hand-written Verilog-2001, combinational binary-to-BCD

module binary_to_bcd_converter (
    input  [4:0] binary_input,
    output reg [7:0] bcd_output
);
    // Temporary regs declared outside always block (Verilog-2001 requirement)
    reg [3:0] tens;
    reg [3:0] ones;

    always @(*) begin
        tens = binary_input / 10;   // Integer division gives tens digit
        ones = binary_input % 10;   // Modulo gives ones digit
        bcd_output = {tens, ones};  // Pack into upper and lower nibbles
    end

endmodule



In [18]:
# Verify manual RTL with the same testbench
!iverilog -Wall -Winfloop -Wno-timescale -g2012 \
    -s tb \
    -o binary_to_bcd_run/manual.vvp \
    binary_to_bcd_converter_manual.v \
    binary_to_bcd_tb.v tb_wrapper.v

!vvp -n binary_to_bcd_run/manual.vvp

Testing Binary-to-BCD Converter...
VCD info: dumpfile my_design.vcd opened for output.
All test cases passed!
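
Because the testbench prints `All test cases passed!` only when no comparison fails, and prints an `Error:` line before stopping otherwise, the pass/fail decision can be scripted. The sketch below is a hypothetical convenience wrapper, not how AutoChip scores runs. It reuses the `iverilog` and `vvp` commands from the cell above and assumes both tools are on `PATH` and that the output directory already exists.

```python
import subprocess

PASS_MARKER = "All test cases passed!"


def simulation_passed(output: str) -> bool:
    """Return True if the simulator output shows a clean testbench run."""
    return PASS_MARKER in output and "Error:" not in output


def compile_and_simulate(rtl_file: str,
                         vvp_file: str = "binary_to_bcd_run/manual.vvp") -> bool:
    """Compile rtl_file with the course testbench, simulate, and report pass/fail."""
    subprocess.run(
        ["iverilog", "-Wall", "-Winfloop", "-Wno-timescale", "-g2012",
         "-s", "tb", "-o", vvp_file,
         rtl_file, "binary_to_bcd_tb.v", "tb_wrapper.v"],
        check=True,  # raise if compilation fails
    )
    result = subprocess.run(
        ["vvp", "-n", vvp_file],
        check=True, capture_output=True, text=True,
    )
    return simulation_passed(result.stdout)
```

In the notebook this could be called as `compile_and_simulate("binary_to_bcd_converter_manual.v")`.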
