## Install libraries and dependencies

In [None]:
!pip install langchain boto3 markdownify python-terraform beautifulsoup4 requests

In [None]:
%%sh
./install-terraform.sh

## AWS Cloud Control (AWSCC) - Schema

AWSCC schemas can be found on the [Terraform Registry](https://registry.terraform.io/providers/hashicorp/awscc/latest/docs/resources). Each schema lists all required and optional attributes, including basic information such as the attribute type (string, boolean, list) and sometimes additional detail about how to use the attribute. Here is an [example](https://registry.terraform.io/providers/hashicorp/awscc/latest/docs/resources/cassandra_table#schema).

AWSCC resources follow a specific naming convention: `awscc_<service-name>_<resource-name>`

For example: `awscc_iam_role` or `awscc_apigateway_request_validator`. Later on, we'll compare this with the underlying CloudFormation resource name.

### Retrieving AWSCC Resource

Let's assume we want to retrieve a specific AWSCC resource and generate an example for it from the same resource in the CloudFormation documentation. First, we'll specify our target as `AWSCC_RESOURCE_NAME`. For this demonstration, we'll use `awscc_iam_role`.

In [None]:
AWSCC_RESOURCE_NAME = "awscc_iam_role" 

print ("Input AWSCC Resource name : {}".format(AWSCC_RESOURCE_NAME))

The schema information on the Terraform Registry comes directly from the [AWSCC Provider repo](https://github.com/hashicorp/terraform-provider-awscc/blob/main/docs/resources/amplify_app.md). Since it's in Markdown format, we can download this file and parse it.

In [None]:
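# str.strip("awscc") removes a *set of characters* from both ends and can eat trailing letters
# (e.g. "awscc_ec2_vpc" would lose its final "c"), so remove the exact prefix instead (Python 3.9+).
AWSCC_REGISTRY_RESOURCE_NAME = AWSCC_RESOURCE_NAME.removeprefix("awscc_")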
print ("AWSCC Registry Resource Name: {}".format(AWSCC_REGISTRY_RESOURCE_NAME))
AWSCC_REGISTRY_RESOURCE_PATH = "https://raw.githubusercontent.com/hashicorp/terraform-provider-awscc/main/docs/resources/{}.md".format(AWSCC_REGISTRY_RESOURCE_NAME)
print ("Path in AWSCC repo: {}".format(AWSCC_REGISTRY_RESOURCE_PATH))
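
The same URL construction can be wrapped in a small function, which makes it easier to reuse for other resources. This is only a sketch: `registry_doc_url` and `RAW_DOCS_BASE` are hypothetical names introduced here, and the function assumes every resource page lives under `docs/resources/` on the `main` branch, as in the URL above.

```python
RAW_DOCS_BASE = (
    "https://raw.githubusercontent.com/hashicorp/"
    "terraform-provider-awscc/main/docs/resources"
)


def registry_doc_url(awscc_resource_name: str) -> str:
    """Return the raw Markdown URL for an AWSCC resource (hypothetical helper)."""
    prefix = "awscc_"
    if not awscc_resource_name.startswith(prefix):
        raise ValueError(f"not an AWSCC resource name: {awscc_resource_name!r}")
    # Slice off only the leading prefix, leaving the rest of the name untouched.
    registry_name = awscc_resource_name[len(prefix):]
    return f"{RAW_DOCS_BASE}/{registry_name}.md"


print(registry_doc_url("awscc_iam_role"))
```

Slicing off the exact prefix keeps names that end in `a`, `w`, `s` or `c` (such as `awscc_ec2_vpc`) intact.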

In [None]:
import urllib.request
response=urllib.request.urlopen(AWSCC_REGISTRY_RESOURCE_PATH)
AWSCC_RESOURCE_DOC=response.read().decode('utf-8')
# print (AWSCC_RESOURCE_DOC)

### Parsing the Markdown for AWSCC Resource

This Markdown document contains a lot of information; however, our point of interest is just the **Schema** section. We'll need to parse the Markdown document and load only the **Schema** section, as shown in the printout below.

```
# awscc_iam_role (Resource)

Resource Type definition for AWS::IAM::Role
...

## Schema

### Required

- `assume_role_policy_document` (String) The trust policy that is associated with this role.

### Optional

- `description` (String) A description of the role that you provide.
- `managed_policy_arns` (Set of String) A list of Amazon Resource Names (ARNs) of the IAM managed policies that you want to attach to the role.
- `max_session_duration` (Number) The maximum session duration (in seconds) that you want to set for the specified role. If you do not specify a value for this setting, the default maximum of one hour is applied. This setting can have a value from 1 hour to 12 hours.
- `path` (String) The path to the role.
...

```

We can use the [LangChain MarkdownHeaderTextSplitter](https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/markdown_header_metadata) as a helper. We'll parse this Markdown document, split it on its headers, and load only the schema section.

In [None]:
from langchain.text_splitter import MarkdownHeaderTextSplitter

awscc_headers_to_split_on = [
    ("#", "Header 1"), 
    ("##", "Header 2"), # we know that Schema is on header 2
]
awscc_markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=awscc_headers_to_split_on)
awscc_md_header_splits = awscc_markdown_splitter.split_text(AWSCC_RESOURCE_DOC)

AWSCC_RESOURCE_SCHEMA = ""  # stays empty if no Schema section is found
for section in awscc_md_header_splits:
    if "Header 2" in section.metadata and "Schema" in section.metadata["Header 2"]:
        AWSCC_RESOURCE_SCHEMA = section.page_content

if AWSCC_RESOURCE_SCHEMA:
    print("Schema for {} is found in the registry documentation".format(AWSCC_RESOURCE_NAME))
else:
    print("Schema for {} is not found, check the resource name".format(AWSCC_RESOURCE_NAME))
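
To sanity-check what was extracted, the schema text can be turned into structured records. The sketch below is not part of the original workflow: `parse_schema_attributes` is a hypothetical helper, and it assumes the schema text keeps its `### Required` / `### Optional` sub-headings and the ``- `name` (Type) description`` line format shown in the printout above.

```python
import re

# Matches lines such as: - `path` (String) The path to the role.
ATTRIBUTE_LINE = re.compile(
    r"^- `(?P<name>[^`]+)` \((?P<type>[^)]+)\)\s*(?P<description>.*)$"
)


def parse_schema_attributes(schema_markdown: str) -> list:
    """Return one dict per attribute line, tagged with its enclosing sub-heading."""
    attributes = []
    group = None
    for line in schema_markdown.splitlines():
        line = line.strip()
        if line.startswith("### "):
            group = line[4:].strip()  # e.g. "Required" or "Optional"
            continue
        match = ATTRIBUTE_LINE.match(line)
        if match:
            attributes.append({"group": group, **match.groupdict()})
    return attributes


# Sample input copied from the printout above.
sample = """### Required

- `assume_role_policy_document` (String) The trust policy that is associated with this role.

### Optional

- `description` (String) A description of the role that you provide.
- `path` (String) The path to the role.
"""

for attribute in parse_schema_attributes(sample):
    print(attribute["group"], attribute["name"], attribute["type"])
```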

## CloudFormation Resource - Example

For the same resource name, the CloudFormation example can be found in the [AWS CloudFormation documentation page](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-iam-role.html). Since this is an HTML document, we can also parse the **Examples** section and provide some inspiration for the LLM to use.

Note that we don't care much about the other sections of this page, such as Properties or Return Values, because we already have this context from the AWSCC resource schema and the LLM can use it as-is.


### Retrieving CloudFormation Resource Example

In an ideal world, we could just look up the same resource name, such as `awscc_iam_role` or `awscc_apigateway_request_validator`, in the CloudFormation documentation. However, each set of documentation has its own naming format, so we need to translate between them.

CloudFormation documentation pages follow a specific naming convention: `aws-resource-<service-name>-<short-resource-name>`

For example: 

* `awscc_iam_role` => `aws-resource-iam-role`
* `awscc_apigateway_request_validator` => `aws-resource-apigateway-requestvalidator`

Notice how `request_validator` must be transformed into the short name `requestvalidator`.

In [None]:
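# Split into the "awscc" prefix, the service name, and the remaining resource-name parts
_, SERVICE_NAME, *RESOURCE_NAME_PARTS = AWSCC_RESOURCE_NAME.split("_")
# Join the remaining parts without separators, e.g. request_validator => requestvalidator
SHORT_RESOURCE_NAME = "".join(RESOURCE_NAME_PARTS)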

CFN_DOC_RESOURCE_NAME = "aws-resource-{}-{}".format(SERVICE_NAME, SHORT_RESOURCE_NAME)
print ("CloudFormation Doc File Name : {}".format(CFN_DOC_RESOURCE_NAME))

CFN_DOC_RESOURCE_PATH = "https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/{}.html".format(CFN_DOC_RESOURCE_NAME)
print ("Path in CloudFormation Doc: {}".format(CFN_DOC_RESOURCE_PATH))
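
The translation rule can also be written as a function and checked against the two examples listed above. This is a sketch under the stated naming convention only: `cfn_doc_page_name` is a hypothetical helper, and it assumes the convention holds for the resource in question, which may not be true for every service.

```python
def cfn_doc_page_name(awscc_resource_name: str) -> str:
    """Map an AWSCC resource name to its assumed CloudFormation doc page name."""
    parts = awscc_resource_name.split("_")
    if len(parts) < 3 or parts[0] != "awscc":
        raise ValueError(f"unexpected AWSCC resource name: {awscc_resource_name!r}")
    service_name, resource_parts = parts[1], parts[2:]
    # The short resource name is the remaining parts joined without separators.
    return "aws-resource-{}-{}".format(service_name, "".join(resource_parts))


for name in ("awscc_iam_role", "awscc_apigateway_request_validator"):
    print(name, "=>", cfn_doc_page_name(name))
```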

### Parsing the HTML for CloudFormation Resource Example

The HTML page for a CloudFormation resource is quite simple: it contains a header called **Examples**, with one or more examples under `h3` headers.

While most users prefer looking at examples in `yaml` format for readability (arguably), the `json` format is likely a better fit for the LLM, since its structure is explicit and does not depend on indentation.


In [None]:
from bs4 import BeautifulSoup
import requests

# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
page = BeautifulSoup(requests.get(CFN_DOC_RESOURCE_PATH).text, "html.parser")
print ("Parsing : {}".format(CFN_DOC_RESOURCE_PATH))

CFN_DOC_EXAMPLES = []

target = page.find('h2',string='Examples')

if target:
    for sib in target.find_next_siblings():
        if sib.name == "h3": 
            # not all Examples section has sub-header 3 title
            TARGET_H3_HEADER = ""
            TARGET_H3_ID = ""
            if sib.text.strip() != "":            
                print ("Example found for: {}".format(sib.text.strip()))
                TARGET_H3_HEADER = sib.text
            else:
                print ("Example found for: {}".format(sib.attrs['id']))
                TARGET_H3_ID = sib.attrs['id']
            
            CFN_EXAMPLE = {}
            EXAMPLE_JSON = ""
            EXAMPLE_INTRO = ""
            EXAMPLE_TITLE = sib.text.strip()
            
            # inconsistency in CloudFormation document for header 
            if TARGET_H3_HEADER != "": 
                #print ("Using Header: {}".format(TARGET_H3_HEADER))
                example_target = page.find('h3',string=TARGET_H3_HEADER)
            elif TARGET_H3_ID != "": 
                #print ("Using ID: {}".format(TARGET_H3_ID))
                example_target = page.find('h3',attrs={'id':TARGET_H3_ID})
            
            # use a separate loop variable so the outer `sib` is not shadowed
            for example_sib in example_target.find_next_siblings():

                if example_sib.name == "div" and 'id' in example_sib.attrs:
                    # keep the JSON variant; skip the YAML one
                    if example_sib.attrs['id'] != "YAML":
                        EXAMPLE_JSON = example_sib.text.strip()
                if example_sib.name == "p" and len(example_sib.get_text(strip=True)) > 0:
                    EXAMPLE_INTRO = example_sib.text.strip()
                    EXAMPLE_INTRO = " ".join(EXAMPLE_INTRO.split()) # strip extra whitespace
                if example_sib.name == "h3":
                    # the next h3 starts a new example
                    break
                    
            CFN_EXAMPLE = {
                "title" : EXAMPLE_TITLE,
                "intro" : EXAMPLE_INTRO,
                "json" : EXAMPLE_JSON
            }
            CFN_DOC_EXAMPLES.append(CFN_EXAMPLE)
            
print ("Number of CloudFormation resource example found : {}".format(len(CFN_DOC_EXAMPLES)))

We have successfully located the CloudFormation resource examples. Next, we will create a prompt for the LLM that uses these examples as inspiration.

## Using LLM 

First, let's initialize the LLM using Bedrock.

We use [LangChain Bedrock Chat](https://python.langchain.com/docs/integrations/chat/bedrock) as a helper to interface with the Bedrock API through a chat-like function.

In [None]:
import boto3
boto3_bedrock = boto3.client('bedrock')
boto3_bedrock_runtime = boto3.client('bedrock-runtime')

In [None]:
import langchain
from langchain.llms import Bedrock
from langchain.chat_models import BedrockChat
from langchain.schema import HumanMessage
from langchain.prompts import PromptTemplate

chat = BedrockChat(
    model_id="anthropic.claude-v2:1", 
    model_kwargs={"temperature":0.1, "max_tokens_to_sample":4096},
    client=boto3_bedrock_runtime
)

### Defining the prompt 

At a high level, we want to prompt the LLM with the following:

* The full schema of the AWSCC resource
* Example taken from CloudFormation documentation
* Some rules to follow
* Generate example in AWSCC Terraform provider for the same AWSCC resource

We use a [LangChain prompt template](https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/) to assemble this.

In [None]:
prompt_template = PromptTemplate.from_template("""
You are a Terraform expert.

Only answer using the provided context. If you don't know the answer, simply say you don't know.

Here is the schema for {awscc_resource_name} in the AWS Cloud Control Terraform provider :

{awscc_resource_schema}

Here is the example CloudFormation for your inspiration:

{cfn_resource_example}

A couple of rules to follow:

- The attribute `policy_document` is a map of string, please use json encode.
- Change any reference to AWS account by using data source aws_caller_identity.
- Use data source for any policy document.
- Please don't use dynamic block.

Your task is to convert the CloudFormation example to {awscc_resource_name}.

You will provide the answer starting with text: 
### Start Example ### 

and ends it with text: 
### End Example ###. 

""")

### Implementing the prompt

Let's fill in the template with the actual values and run the inference on Bedrock. We'll store the results in a list.

In [None]:
AWSCC_LLM_EXAMPLES = []
for example in CFN_DOC_EXAMPLES:
    print ("Generating AWSCC example from Cfn example: {}".format(example["intro"]))
    instruction = prompt_template.format(
        awscc_resource_name=AWSCC_RESOURCE_NAME, 
        awscc_resource_schema=AWSCC_RESOURCE_SCHEMA,
        cfn_resource_example= example["json"],
    )

    messages = [
        HumanMessage(
            content=instruction
        )
    ]

    AWSCC_EXAMPLE = {
        "title" : example["title"],
        "intro" : example["intro"],
        "result_latest" : chat(messages).content.split("### Start Example ###")[1].split("### End Example ###")[0]
    }
    
    AWSCC_LLM_EXAMPLES.append(AWSCC_EXAMPLE)
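
Chaining `split(...)[1]` raises an `IndexError` whenever the model omits the start marker. A more defensive extraction could look like the sketch below. `extract_example` is a hypothetical helper, and treating a missing end marker as truncated output is an assumption, not something the workflow above prescribes.

```python
from typing import Optional

START_MARKER = "### Start Example ###"
END_MARKER = "### End Example ###"


def extract_example(llm_output: str) -> Optional[str]:
    """Return the text between the markers, or None if the start marker is missing."""
    start = llm_output.find(START_MARKER)
    if start == -1:
        return None
    start += len(START_MARKER)
    end = llm_output.find(END_MARKER, start)
    if end == -1:
        # Assumption: a missing end marker means the output was cut off,
        # so keep everything after the start marker.
        end = len(llm_output)
    return llm_output[start:end].strip()


reply = "Sure.\n### Start Example ###\nresource {}\n### End Example ###.\nDone."
print(extract_example(reply))
```

A caller can then skip or retry an example when the function returns `None` instead of aborting the whole loop.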

### Validating Terraform

It's important for us to generate valid Terraform configuration, which helps accelerate the creation of the documentation. We can write the LLM output to a file and run `terraform validate` to check it.

In [None]:
from python_terraform import Terraform
from pathlib import Path

for result in AWSCC_LLM_EXAMPLES:
    print ("Validating terraform file for : {}".format(result["title"]))
    awscc_example_file_path = "./terraform_output/{}/{}/main.tf".format(AWSCC_RESOURCE_NAME, result["title"])
    awscc_example_info_file_path = "./terraform_output/{}/{}/info.txt".format(AWSCC_RESOURCE_NAME, result["title"])
    awscc_example_working_dir = "./terraform_output/{}/{}".format(AWSCC_RESOURCE_NAME, result["title"])
    
    file = Path(awscc_example_file_path)
    file.parent.mkdir(parents=True, exist_ok=True)

    file1 = open(awscc_example_info_file_path, 'w')
    file1.write(result["title"])
    file1.close()
    
    file1 = open(awscc_example_file_path, 'w')
    file1.write(result["result_latest"])
    file1.close()
      
    tf = Terraform(working_dir=awscc_example_working_dir)
    tf.fmt(diff=True)
    tf.init()
    return_code, stdout, stderr = tf.validate()
    
    if return_code == 0:
        print (stdout)
        result["validate"] = True
        result["validate-result"] = stdout
    else:
        print (stderr)
        result["validate"] = False
        result["validate-result"] = stderr
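
The example title is used directly as a directory name above, which can fail if a title contains characters such as `/`. One option is to normalize the title first. The sketch below is an assumption about how that could be done: `safe_dir_name` is a hypothetical helper and the sample title is made up for illustration.

```python
import re


def safe_dir_name(title: str, fallback: str = "example") -> str:
    """Turn an example title into a lowercase, hyphen-separated directory name."""
    # Replace every run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-").lower()
    # Fall back to a fixed name when the title is empty or has no usable characters.
    return slug or fallback


print(safe_dir_name("Role with embedded policy / instance profile"))
```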

### Correcting the result

The LLM can sometimes make errors, for example by referencing an exported attribute that doesn't exist.

```
  monitor_arn_list = [
    awscc_ce_anomaly_monitor.custom.arn, <-- "arn" is not available
    awscc_ce_anomaly_monitor.service.arn,
  ]
```

It can also use the wrong attribute type. For example, `policies` should be declared as a list.


```
resource "awscc_iam_role" "root" {
  assume_role_policy_document = data.aws_iam_policy_document.assume_role_policy.json
  path                        = "/"

  policies {
    policy_name = "root"
    policy_document = data.aws_iam_policy_document.root_policy.json
  }
}
```

Sometimes it gets the Terraform syntax wrong, for example when dealing with a nested attribute:

```
  instance_configuration {  <-- missing "=" sign
    cpu    = "1 vCPU"
    memory = "3 GB" 
  }
```

In other cases, it might use a resource from the standard AWS provider even though the same resource already exists in AWSCC.

Or, in a more extreme case, it might use a resource such as `awscc_iam_instance_profile` that is not yet supported in AWSCC.

```
resource "awscc_iam_instance_profile" "root" {
  path  = "/"
  roles = [awscc_iam_role.root.id]
}
```

We can run another prompt, asking the LLM to reflect on the previous output and make any corrections as needed.

Note that certain validations are only possible with live data, e.g. checking whether `awscc_iam_instance_profile` exists.

### Using terraform validate results

We can tell the LLM to look at the `terraform validate` results and reason about how to fix the problem.

In [None]:
optimize_prompt_template = PromptTemplate.from_template("""
You are a Terraform expert. Forget about the previous review, only examine the following information.

Here is the schema for {awscc_resource_name} in the AWS Cloud Control Terraform provider, examine it carefully.

{awscc_resource_schema}

Here is the Terraform configuration that you need to analyze:

{awscc_resource_example}

Here is the error that you need to analyze:

{awscc_resource_error}

Please add fix for each error in the resources accordingly.

You will provide the answer starting with text: 
### Start Example ### 

and ends it with text: 
### End Example ###. 

""")


In [None]:
for result in AWSCC_LLM_EXAMPLES:
    if result["validate"] == False:
        print ("Re-running inference for : {}".format(result["title"]))
        
        instruction = optimize_prompt_template.format(
            awscc_resource_name=AWSCC_RESOURCE_NAME, 
            awscc_resource_schema=AWSCC_RESOURCE_SCHEMA,
            awscc_resource_example=result["result_latest"],
            awscc_resource_error=result["validate-result"],
        )
      
        messages = [
            HumanMessage(
                content=instruction
            )
        ]
        result["result_previous"] = result["result_latest"]
        result["result_latest"] = chat(messages).content.split("### Start Example ###")[1].split("### End Example ###")[0]


In [None]:
for result in AWSCC_LLM_EXAMPLES:
    if result["validate"] == False:
        print ("Re-validating terraform file for : {}".format(result["title"]))
        awscc_example_file_path = "./terraform_output/{}/{}/main.tf".format(AWSCC_RESOURCE_NAME, result["title"])
        awscc_example_old_file_path = "./terraform_output/{}/{}/main.v1".format(AWSCC_RESOURCE_NAME, result["title"])
        awscc_example_working_dir = "./terraform_output/{}/{}".format(AWSCC_RESOURCE_NAME, result["title"])

        file = Path(awscc_example_file_path)
        file.parent.mkdir(parents=True, exist_ok=True)

        file1 = open(awscc_example_file_path, 'w')
        file1.write(result["result_latest"])
        file1.close()

        file1 = open(awscc_example_old_file_path, 'w')
        file1.write(result["result_previous"])
        file1.close()

        tf = Terraform(working_dir=awscc_example_working_dir)
        tf.fmt(diff=True)
        tf.init()
        return_code, stdout, stderr = tf.validate()

        if return_code == 0:
            print (stdout)
            result["validate"] = True
            result["validate-result"] = stdout
        else:
            print (stderr)
            result["validate"] = False
            result["validate-result"] = stderr

### Re-running validation

If we find another error from `terraform validate`, we can use the same process to fix it again.

In [None]:
for result in AWSCC_LLM_EXAMPLES:
    if result["validate"] == False:
        print ("Re-running inference for : {}".format(result["title"]))
        
        instruction = optimize_prompt_template.format(
            awscc_resource_name=AWSCC_RESOURCE_NAME, 
            awscc_resource_schema=AWSCC_RESOURCE_SCHEMA,
            awscc_resource_example=result["result_latest"],
            awscc_resource_error=result["validate-result"],
        )
      
        messages = [
            HumanMessage(
                content=instruction
            )
        ]
        result["result_original"] = result["result_previous"] # we keep the original results for comparison
        result["result_previous"] = result["result_latest"]
        result["result_latest"] = chat(messages).content.split("### Start Example ###")[1].split("### End Example ###")[0]


In [None]:
for result in AWSCC_LLM_EXAMPLES:
    if result["validate"] == False:
        print ("Re-validating terraform file for : {}".format(result["title"]))
        awscc_example_file_path = "./terraform_output/{}/{}/main.tf".format(AWSCC_RESOURCE_NAME, result["title"])
        awscc_example_old_file_path = "./terraform_output/{}/{}/main.v2".format(AWSCC_RESOURCE_NAME, result["title"])
        awscc_example_original_file_path = "./terraform_output/{}/{}/main.v1".format(AWSCC_RESOURCE_NAME, result["title"])

        awscc_example_working_dir = "./terraform_output/{}/{}".format(AWSCC_RESOURCE_NAME, result["title"])

        file = Path(awscc_example_file_path)
        file.parent.mkdir(parents=True, exist_ok=True)

        file1 = open(awscc_example_file_path, 'w')
        file1.write(result["result_latest"])
        file1.close()

        file1 = open(awscc_example_old_file_path, 'w')
        file1.write(result["result_previous"])
        file1.close()

        file1 = open(awscc_example_original_file_path, 'w')
        file1.write(result["result_original"])
        file1.close()
        
        tf = Terraform(working_dir=awscc_example_working_dir)
        tf.fmt(diff=True)
        tf.init()
        return_code, stdout, stderr = tf.validate()

        if return_code == 0:
            print (stdout)
            result["validate"] = True
            result["validate-result"] = stdout
        else:
            print (stderr)
            result["validate"] = False
            result["validate-result"] = stderr
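
The two manual fix-and-validate passes above follow the same pattern, so they could be generalized into a bounded loop. The sketch below is one possible shape, not the notebook's own method. `fix_until_valid` is a hypothetical helper, and the validation and LLM steps are passed in as callables, so the stand-in functions here run without Terraform or Bedrock.

```python
from typing import Callable, Tuple


def fix_until_valid(
    config: str,
    validate: Callable[[str], Tuple[bool, str]],
    ask_llm_to_fix: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[bool, list]:
    """Return (is_valid, history); history holds every configuration tried, oldest first."""
    history = [config]
    for _ in range(max_attempts):
        is_valid, message = validate(history[-1])
        if is_valid:
            return True, history
        # Ask for a corrected configuration based on the validation message.
        history.append(ask_llm_to_fix(history[-1], message))
    # Check the result of the last fix request.
    is_valid, _ = validate(history[-1])
    return is_valid, history


# Stand-in callables (assumptions for illustration only).
def fake_validate(config: str) -> Tuple[bool, str]:
    ok = "policies = [" in config
    return ok, "" if ok else "Unsupported block type: policies"


def fake_fix(config: str, error: str) -> str:
    return config.replace("policies {", "policies = [{")


valid, history = fix_until_valid("policies {", fake_validate, fake_fix)
print(valid, len(history))
```

In the notebook, `validate` would wrap the file write plus `tf.validate()` call, and `ask_llm_to_fix` would wrap the `optimize_prompt_template` inference. Keeping the full history replaces the separate `result_previous` and `result_original` keys.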