# Tutorial 0: The Core of TaskGen - StrictJSON

- JSON is the default output format for Functions, and is generated by `strict_json()`
- Handles LLM outputs containing multiple `'`, `"`, `{`, `}` or `\` characters, or unmatched braces/brackets, any of which may break `json.loads()`
- Reference Repo: https://github.com/tanchongmin/strictjson
- Note: `strictjson` is already natively included in `taskgen-ai`
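
To make the fragility of plain `json.loads()` concrete, here is a minimal sketch using only the standard library. The sample strings are invented for illustration (they are not real StrictJSON or LLM output), and `try_parse` is a hypothetical helper name.

```python
import json

# Hypothetical LLM responses, invented for illustration
samples = [
    "{'Sentiment': 'Positive'}",     # single-quoted keys are not valid JSON
    '{"Code": "print("hello")"}',    # unescaped double quotes inside a value
    '{"Sentiment": "Positive"',      # unmatched opening brace
]

def try_parse(text):
    """Return the parsed object, or None if json.loads() rejects the text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

for text in samples:
    print(text, '->', try_parse(text))
```

Each of these strings is rejected by `json.loads()`, which is the kind of failure `strict_json()` is meant to work around.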

## FAQ
- Q: Why not use a type-defined structured framework like Pydantic?
- A: Pydantic requires a verbose definition of the description and type for each field in the JSON, which lengthens the prompt and can affect LLM performance over longer contexts. Moreover, StrictJSON offers very flexible checks that can be applied while the JSON is being generated.

# Setup Guide

## Step 1: Install TaskGen

In [7]:
# !pip install taskgen-ai
# %pip install python-dotenv
# %pip install numpy
# %pip install dill

Note: you may need to restart the kernel to use updated packages.


## Step 2: Set up OpenAI API Key

In [1]:
#Python way to set up OpenAI API Keys
import os
# os.environ['OPENAI_API_KEY'] = '<YOUR API KEY HERE>'

from dotenv import load_dotenv
load_dotenv(dotenv_path=".env")

True

## Step 3: Import required functions

In [2]:
from taskgen import *

# 1. Basic Generation

- **system_prompt**: Write in whatever you want GPT to become. "You are a \<purpose in life\>"
- **user_prompt**: The user input. Later, when we use it as a function, this is the function input
- **output_format**: JSON of output variables in a dictionary, with the key as the output key, and the value as the output description
    - The output keys will be preserved exactly, while GPT will generate content to match the description of the value as best as possible

#### Example Usage
```python
res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': 'Type of Sentiment',
                                    'Adjectives': 'Array of adjectives',
                                    'Words': 'Number of words'})
                                    
print(res)
```

#### Example Output
```{'Sentiment': 'Positive', 'Adjectives': ['beautiful', 'sunny'], 'Words': 7}```
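
Since the output keys are preserved exactly, a caller can verify the shape of a result before using it. The sketch below is an illustration only: `keys_preserved` is a hypothetical helper, not part of StrictJSON, and the `result` dictionary is copied from the example output rather than produced by a live call.

```python
def keys_preserved(output_format: dict, result: dict) -> bool:
    """True if result has exactly the keys requested in output_format."""
    return set(result) == set(output_format)

output_format = {'Sentiment': 'Type of Sentiment',
                 'Adjectives': 'Array of adjectives',
                 'Words': 'Number of words'}

# A result of the shape shown in the example output (not a live LLM call)
result = {'Sentiment': 'Positive', 'Adjectives': ['beautiful', 'sunny'], 'Words': 7}

print(keys_preserved(output_format, result))
```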

In [3]:
res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': 'Type of Sentiment',
                                    'Adjectives': 'Array of adjectives',
                                    'Words': 'Number of words'})
print(res)

{'Sentiment': 'Positive', 'Adjectives': ['beautiful', 'sunny'], 'Words': 6}


In [9]:
subQnPrompt = """You are an AI language model assistant. Your task is to generate Five
    different versions of the given user question to retrieve relevant documents from a vector
    database. By generating multiple perspectives on the user question, your goal is to help
    the user overcome some of the limitations of the distance-based similarity search.
    Provide these alternative questions separated by newlines only.

    For example:
    User question: "What is the conclusion of the paper?"

    Generated questions:
    What is the main takeaway from the paper?
    What are the key findings of the paper?
    What is the summary of the paper?
    What is the final thought of the paper?
    What is the ending of the paper?
    
    Output format should be in JSON as follows:

    {'Questions': ['What is the process of LLM red teaming in LLAMA 3?', 'What are the steps involved in LLM red teaming in LLAMA 3?', 'How does LLM red teaming function in LLAMA 3?', 'What is the operation of LLM red teaming in LLAMA 3?', 'What is the mechanism of LLM red teaming in LLAMA 3?'], 'Questions Generated': 5}

    And not this format:

    '1. What is Bill Gates known for?'
    "2. Can you provide information about Bill Gates' background?"
    """

In [10]:
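# stored under its own name so that `res` from the classifier example stays available below
res_questions = strict_json(system_prompt = subQnPrompt,
                    user_prompt = 'How does LLM red teaming work in LLAMA 3?',
                    output_format = {'Questions': 'Alternative versions of the user question, type: Array[str]',
                                     'Questions Generated': 'Number of questions generated, type: int'
                                     }
                  )
print(res_questions)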

{'Questions': ['What is the process of LLM red teaming in LLAMA 3?', 'What are the steps involved in LLM red teaming in LLAMA 3?', 'How does LLM red teaming function in LLAMA 3?', 'What is the operation of LLM red teaming in LLAMA 3?', 'What is the mechanism of LLM red teaming in LLAMA 3?'], 'Questions Generated': 5}


## Easy to split into corresponding elements

In [11]:
res['Sentiment']

'Positive'

In [12]:
res['Adjectives']

['beautiful', 'sunny']

In [13]:
res['Words']

6

# 2. Advanced Generation
- More advanced demonstration involving code that would typically break ```json.loads()```

#### Example Usage
```python
res = strict_json(system_prompt = 'You are a code generator, generating code to fulfil a task',
                    user_prompt = 'Given array p, output a function named func_sum to return its sum',
                    output_format = {'Elaboration': 'How you would do it',
                                     'C': 'Code',
                                    'Python': 'Code'})
                                    
print(res)
```

#### Example Output
```{'Elaboration': 'Use a loop to iterate through each element in the array and add it to a running total.', ```

```'C': 'int func_sum(int p[], int size) {\n    int sum = 0;\n    for (int i = 0; i < size; i++) {\n        sum += p[i];\n    }\n    return sum;\n}', ```

```'Python': 'def func_sum(p):\n    sum = 0\n    for num in p:\n        sum += num\n    return sum'}```


In [6]:
res = strict_json(system_prompt = 'You are a code generator, generating code to fulfil a task',
                    user_prompt = 'Given array p, output a function named func_sum to return its sum',
                    output_format = {'Elaboration': 'How you would do it',
                                     'C': 'Code',
                                    'Python': 'Code'})
                                    
print(res)

{'Elaboration': 'Define a function that takes an array as input and returns the sum of its elements', 'C': 'int func_sum(int p[], int size) { int sum = 0; for (int i = 0; i < size; i++) { sum += p[i]; } return sum; }', 'Python': 'def func_sum(p): return sum(p)'}


## Easy to split into corresponding elements

In [5]:
res['Elaboration']

'Define a function that takes an array as input and returns the sum of its elements'

In [16]:
print(res['C'])

int func_sum(int p[], int size) { int sum = 0; for (int i = 0; i < size; i++) { sum += p[i]; } return sum; }


In [17]:
print(res['Python'])

def func_sum(p): return sum(p)


In [11]:
code = '''
def greet(name):
    print(f"Hello, {name}!")
'''
exec(code)  # only defines greet(); nothing is printed until the function is called
# use eval() to evaluate an expression, such as a call to the function
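
The difference between the two built-ins can be sketched as follows. This is a minimal illustration that keeps the executed code in its own dictionary (the name `namespace` is just a choice made for this example) instead of the notebook's global scope.

```python
namespace = {}

# exec() runs statements; here it only defines greet() inside namespace
exec('def greet(name):\n    return f"Hello, {name}!"', namespace)

# eval() evaluates a single expression and returns its value
message = eval('greet("Alice")', namespace)
print(message)
```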

In [12]:
def execute_function(function_code, function_name, *args):
    # Execute the function code in its own namespace, so the defined function
    # can be retrieved reliably (implicit locals are not shared between
    # exec() and eval() calls inside a function on newer Python versions)
    namespace = {}
    exec(function_code, namespace)

    # Look up the function by name and call it with the provided arguments
    return namespace[function_name](*args)

# Example usage
function_code = """
def greet(name):
    return f"Hello, {name}!"
"""

function_code2 = """
def multiply(a, b):
    return a * b
"""

# Call the 'greet' function with a single argument
result1 = execute_function(function_code, "greet", "Alice")
print(result1)  # Output: Hello, Alice!

# Call the 'multiply' function with two arguments
result2 = execute_function(function_code2, "multiply", 5, 3)
print(result2)  # Output: 15

Hello, Alice!
15


In [10]:
## we can even run the Python code (potentially risky due to prompt injection attacks when running unverified code)
code = res['Python'] + '''
p = [1, 2, 3, 4, 5]
'''
try:
    exec(code)  # defines func_sum and p in the notebook's namespace
    print('The output sum is', func_sum(p))
except Exception as e:
    print('An exception occurred:', e)

# 3. Type forcing output variables
- Generally, ```strict_json``` will infer the data type automatically for you for the output fields
- However, if you would like very specific data types, you can do data forcing using ```type: <data_type>``` at the last part of the output field description
- ```<data_type>``` must be of the form `int`, `float`, `str`, `dict`, `list`, `array`, `Dict[]`, `List[]`, `Array[]`, `Enum[]`, `bool` for type checking to work
- `Enum` and `List` are not case sensitive, so `enum` and `list` work just as well
- For `Enum[list_of_category_names]`, it is best to give an "Other" category in case the LLM fails to classify correctly with the other options.
- If `list` or `List[]` is not formatted correctly in LLM's output, we will correct it by asking the LLM to list out the elements line by line
- For `dict`,  we can further check whether keys are present using `Dict[list_of_key_names]`
- Other types will first be forced by rule-based conversion; any remaining errors will be fed into the LLM's error feedback mechanism
- If `<data_type>` is not the specified data types, it can still be useful to shape the output for the LLM. However, no type checking will be done.
- Note: GPT understands the word `Array` better than `List` since `Array` is the official JSON type, so in the backend any type containing the word `List` will be converted to `Array`. It is also recommended that you write `Array` instead of `List` in the free-text descriptions of your `output_format`
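
The idea of a trailing `type: <data_type>` followed by rule-based conversion can be sketched as below. This is not StrictJSON's implementation: `split_type` and `coerce` are hypothetical helpers, and only a few simple types are covered.

```python
def split_type(description: str):
    """Split 'free text, type: X' into (free text, X); X is None if absent."""
    text, sep, type_spec = description.rpartition('type:')
    if not sep:
        return description.strip(), None
    return text.rstrip(', ').strip(), type_spec.strip()

def coerce(value, type_spec):
    """Rule-based conversion for a few simple types; raises ValueError on failure."""
    if type_spec == 'int':
        return int(value)
    if type_spec == 'float':
        return float(value)
    if type_spec == 'str':
        return str(value)
    if type_spec == 'bool':
        if isinstance(value, bool):
            return value
        lowered = str(value).strip().lower()
        if lowered in ('true', 'false'):
            return lowered == 'true'
        raise ValueError(f'{value!r} is not a bool')
    return value  # unrecognised type: leave unchanged, no checking

print(split_type('Number of words, type: int'))
print(coerce('7', 'int'), coerce('True', 'bool'))
```

When a conversion raises, a real pipeline would hand the error message back to the LLM instead of stopping; that part is left out of this sketch.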

### LLM-based checks
- If you would like the LLM to ensure that the type is being met, use `type: ensure <requirement>`
- This will run an LLM to check whether the requirement is met. If the requirement is not met, the LLM will generate what needs to be done to meet it, and this feedback will be fed into the error-correcting loop of `strict_json`
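
A check-and-retry loop of this kind might look roughly like the sketch below. Everything here is an assumption made for illustration: `generate_with_check`, `fake_generate` and `fake_check` are hypothetical stand-ins for LLM calls, and the actual loop inside `strict_json` may differ.

```python
def generate_with_check(generate, check, max_tries=3):
    """Call generate(feedback) until check(value) passes or the tries run out."""
    feedback = ''
    for _ in range(max_tries):
        value = generate(feedback)
        requirement_met, feedback = check(value)
        if requirement_met:
            return value
    raise ValueError(f'Requirement not met after {max_tries} tries: {feedback}')

# Stand-ins for LLM calls (hypothetical, no network access needed)
def fake_generate(feedback):
    # The first attempt ignores the requirement; later attempts use the feedback
    return 'Age is just a number.' if feedback else 'Stay young at heart.'

def fake_check(value):
    met = 'age' in value.lower().split()
    return met, '' if met else 'Include the word age in the quote'

quote = generate_with_check(fake_generate, fake_check)
print(quote)
```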

#### Example Usage 1
```python
res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': 'Type of Sentiment, type: Enum["Pos", "Neg", "Other"]',
                                    'Adjectives': 'Array of adjectives, type: List[str]',
                                    'Words': 'Number of words, type: int',
                                    'In English': 'Whether sentence is in English, type: bool'})
                                    
print(res)
```

#### Example Output 1
```{'Sentiment': 'Pos', 'Adjectives': ['beautiful', 'sunny'], 'Words': 7, 'In English': True}```

#### Example Usage 2
```python
res = strict_json(system_prompt = 'You are an expert at organising birthday parties',
                    user_prompt = 'Give me some information on how to organise a birthday',
                    output_format = {'Famous Quote about Age': 'quote with name, type: ensure quote contains the word age',
                                    'Lucky draw numbers': '3 numbers from 1-50, type: List[int]',
                                    'Sample venues': 'Describe two venues, type: List[Dict["Venue", "Description"]]'})

print(res)
```

#### Example Output 2
`Using LLM to check "The secret of staying young is to live honestly, eat slowly, and lie about your age. - Lucille Ball" to see if it adheres to "quote contains the word age" Requirement Met: True`


```{'Famous Quote about Age': 'The secret of staying young is to live honestly, eat slowly, and lie about your age. - Lucille Ball',```
```'Lucky draw numbers': [7, 21, 35],```

```'Sample venues': [{'Venue': 'Beachside Resort', 'Description': 'A beautiful resort with stunning views of the beach. Perfect for a summer birthday party.'}, {'Venue': 'Indoor Trampoline Park', 'Description': 'An exciting venue with trampolines and fun activities. Ideal for an active and energetic birthday celebration.'}]}```

In [19]:
res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': 'Type of Sentiment, type: Enum["Pos", "Neg", "Other"]',
                                    'Adjectives': 'Array of Adjectives, type: List[str]',
                                    'Words': 'Number of words, type: int',
                                    'In English': 'Whether sentence is in English, type: bool'})

print(res)

{'Sentiment': 'Pos', 'Adjectives': ['beautiful', 'sunny'], 'Words': 6, 'In English': True}


In [13]:
# Example of multiple nested inputs with multiple types of checks
# Quotes may fail json parsing, so let us use LLMs to fix it
res = strict_json(system_prompt = 'You are an expert at organising birthday parties',
                    user_prompt = 'Give me some information on how to organise a birthday',
                    output_format = {'Famous Quote about Age': 'quote with name, type: ensure quote includes word age',
                                    'Lucky draw numbers': '3 numbers from 1-50, type: List[int]',
                                    'Sample venues': 'Describe two venues, type: List[Dict["Venue", "Description"]]'})

print(res)

Using LLM to check "Age is a case of mind over matter. If you don't mind, it doesn't matter. - Mark Twain" to see if it adheres to "quote includes word age"
Requirement Met: True


{'Famous Quote about Age': "Age is a case of mind over matter. If you don't mind, it doesn't matter. - Mark Twain", 'Lucky draw numbers': [7, 21, 35], 'Sample venues': [{'Venue': 'Outdoor Park', 'Description': 'A spacious park with picnic areas and playgrounds, perfect for a fun and relaxed outdoor birthday party.'}, {'Venue': 'Indoor Activity Center', 'Description': 'An indoor venue with various activities like trampolines, arcade games, and laser tag, ideal for an action-packed birthday celebration.'}]}


# 4. Functions
- Enhances ```strict_json()``` with a function-like interface for repeated use of modular LLM-based functions (or wraps external functions)
- Use angle brackets <> to enclose input variable names. First input variable name to appear in `fn_description` will be first input variable and second to appear will be second input variable. For example, `fn_description = 'Adds up two numbers, <var1> and <var2>'` will result in a function with first input variable `var1` and second input variable `var2`
- (Optional) If you would like greater specificity in your function's input, you can describe the variable after the : in the input variable name, e.g. `<var1: an integer from 10 to 30>`. Here, `var1` is the input variable and `an integer from 10 to 30` is the description.
- (Optional) If your description of the variable is one of `int`, `float`, `str`, `dict`, `list`, `array`, `Dict[]`, `List[]`, `Array[]`, `Enum[]`, `bool`, we will enforce type checking when generating the function inputs in `get_next_subtask` method of the `Agent` class. Example: `<var1: int>`. Refer to Section 3. Type Forcing Output Variables for details.
- Inputs (primary):
    - **fn_description**: String. Function description to describe process of transforming input variables to output variables. Variables must be enclosed in <> and listed in order of appearance in function input.
        - New feature: If `external_fn` is provided and no `fn_description` is provided, then we will automatically parse out the fn_description based on docstring of `external_fn`. Only requirement is that the docstring must contain the names of all compulsory input variables
    - **output_format**: Dict. Dictionary containing output variables names and description for each variable.
    
- Inputs (optional):
    - **examples** - Dict or List[Dict]. Examples in Dictionary form with the input and output variables (list if more than one)
    - **external_fn** - Python Function. If defined, instead of using LLM to process the function, we will run the external function. 
        If there are multiple outputs of this function, we will map it to the keys of `output_format` in a one-to-one fashion
    - **fn_name** - String. If provided, this will be the name of the function. Otherwise, if `external_fn` is provided, it will be the name of `external_fn`. Otherwise, we will use LLM to generate a function name from the `fn_description`
    - **kwargs** - Dict. Additional arguments you would like to pass on to the strict_json function
        
- Outputs:
    JSON of output variables in a dictionary (similar to ```strict_json```)
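
The angle-bracket convention for input variables can be illustrated with a small regular-expression sketch. `parse_input_variables` is a hypothetical helper written for this tutorial and is not how `Function` is necessarily implemented.

```python
import re

def parse_input_variables(fn_description: str):
    """Return [(name, description_or_None), ...] in order of first appearance."""
    variables = []
    seen = set()
    for match in re.finditer(r'<([^<>:]+)(?::([^<>]*))?>', fn_description):
        name = match.group(1).strip()
        if name in seen:
            continue  # a repeated variable keeps its first position
        seen.add(name)
        description = match.group(2)
        variables.append((name, description.strip() if description else None))
    return variables

parsed = parse_input_variables(
    'Adds up two numbers, <var1: an integer from 10 to 30> and <var2>')
print(parsed)
```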
    
#### Example Usage 1 (Description only)
```python
# basic configuration with variable names (in order of appearance in fn_description)
fn = Function(fn_description = 'Output a sentence with <obj> and <entity> in the style of <emotion>', 
                     output_format = {'output': 'sentence'})

# Use the function
fn('ball', 'dog', 'happy') #obj, entity, emotion
```

#### Example Output 1
```{'output': 'The happy dog chased the ball.'}```

#### Example Usage 2 (Examples only)
```python
# Construct the function: infer pattern from just examples without description (here it is multiplication)
fn = Function(fn_description = 'Map <var1> and <var2> to output based on examples', 
                     output_format = {'output': 'final answer'}, 
                     examples = [{'var1': 3, 'var2': 2, 'output': 6}, 
                                 {'var1': 5, 'var2': 3, 'output': 15}, 
                                 {'var1': 7, 'var2': 4, 'output': 28}])

# Use the function
fn(2, 10) #var1, var2
```

#### Example Output 2
```{'output': 20}```

#### Example Usage 3 (Description and Examples)
```python
# Construct the function: description and examples with variable names
# variable names will be referenced in order of appearance in fn_description
fn = Function(fn_description = 'Output the sum and difference of <num1> and <num2>', 
                 output_format = {'sum': 'sum of two numbers', 
                                  'difference': 'absolute difference of two numbers'},
                 examples = {'num1': 2, 'num2': 4, 'sum': 6, 'difference': 2})

# Use the function
fn(3, 4) #num1, num2
```

#### Example Output 3
```{'sum': 7, 'difference': 1}```

#### Example Usage 4 (External Function with Variable Description)
```python
def binary_to_decimal(x):
    return int(str(x), 2)

# an external function with a single output variable, with an expressive variable description
fn = Function(fn_description = 'Convert input <x: a binary number in base 2> to base 10', 
            output_format = {'output1': 'x in base 10'},
            external_fn = binary_to_decimal)

# Use the function
fn(10) #x
```

#### Example Output 4
```{'output1': 2}```

#### Example Usage 5 (fn_description inferred from type hints and docstring of External Function)
```python
# Docstring must provide all compulsory input variables
# We will ignore shared_variables, *args and **kwargs
def add_number_to_list(num1: int, num_list: list, other_var: bool = True, *args, **kwargs):
    '''Adds num1 to num_list'''
    num_list.append(num1)
    return num_list

fn = Function(external_fn = add_number_to_list, 
    output_format = {'num_list': 'Array of numbers'})

# Show the processed function docstring
print(str(fn))

# Use the function
fn(3, [2, 4, 5])
```

#### Example Output 5
`Description: Adds <num1: int> to <num_list: list>`

`Input: ['num1', 'num_list']`

`Output: {'num_list': 'Array of numbers'}`

`{'num_list': [2, 4, 5, 3]}`

In [21]:
# basic configuration with variable names (in order of appearance in fn_description)
fn = Function(fn_description = 'Output a sentence with <obj> and <entity> in the style of <emotion>', 
                     output_format = {'output': 'sentence'})
fn('ball', 'dog', 'happy') #obj, entity, emotion

{'output': 'The dog happily chased after the ball.'}

In [22]:
# infer pattern from just examples without description (here it is multiplication)
fn = Function(fn_description = 'Map <var1> and <var2> to output based on examples', 
                     output_format = {'output': 'final answer'}, 
                     examples = [{'var1': 3, 'var2': 2, 'output': 6}, 
                                 {'var1': 5, 'var2': 3, 'output': 15}, 
                                 {'var1': 7, 'var2': 4, 'output': 28}])
fn(2, 10) #var1, var2

{'output': 20}

In [23]:
# multiple outputs and examples with variable names (recommended)
fn = Function(fn_description = 'Output the sum and difference of <num1> and <num2>', 
                 output_format = {'sum': 'sum of two numbers', 
                                  'difference': 'absolute difference of two numbers'},
                 examples = {'num1': 2, 'num2': 4, 'sum': 6, 'difference': 2})
fn(3, 4) #num1, num2

{'sum': 7, 'difference': 1}

In [24]:
# multiple outputs with variable names
fn = Function(fn_description = '''Output the integer sum of <num1: int or str> and <num2: int or str>
generate a poem in style of <poem_style> and code in <prog_language>''', 
                 output_format = {'sum': 'sum of two numbers', 
                'poem': 'poem about two numbers',
                'code': 'code to do the sum of any two numbers num1 and num2'})
fn('three', 4, 'happy', 'Python') #num1, num2, poem_style, prog_language

{'sum': 7,
 'poem': 'Three and four, together they soar, bringing joy forevermore',
 'code': 'def sum_numbers(num1, num2):\n    return int(num1) + int(num2)'}

## External Function Examples

In [25]:
def consecutive_sum(x):
    return x, x+1, x+2

# an external function with multiple output variables
fn = Function(fn_description = 'Given input <x: int>, output x, x+1, x+2', 
            output_format = {'output1': 'x', 'output2': 'x+1', 'output3': 'x+2'},
            external_fn = consecutive_sum)

# Use the function
fn(4) #x

{'output1': 4, 'output2': 5, 'output3': 6}
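
The one-to-one mapping from an external function's return values to the keys of `output_format` can be sketched with `zip`. `map_outputs` is a hypothetical helper for illustration, not the library's own code.

```python
def map_outputs(output_format: dict, result):
    """Pair each returned value with the output_format key in the same position."""
    values = result if isinstance(result, tuple) else (result,)
    if len(values) != len(output_format):
        raise ValueError('Number of outputs does not match output_format')
    return dict(zip(output_format, values))

def consecutive_sum(x):
    return x, x + 1, x + 2

output_format = {'output1': 'x', 'output2': 'x+1', 'output3': 'x+2'}
mapped = map_outputs(output_format, consecutive_sum(4))
print(mapped)
```

Because the pairing is purely positional, the order of keys in `output_format` has to follow the order of the returned values.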

In [26]:
def binary_to_decimal(x):
    return int(str(x), 2)

# an external function with a single output variable, with an expressive variable description
fn = Function(fn_description = 'Convert input <x: a binary number in base 2> to base 10', 
            output_format = {'output1': 'x in base 10'},
            external_fn = binary_to_decimal)

# Use the function
fn(10) #x

{'output1': 2}

## Example inferring of fn_description from docstring and type hints

In [27]:
# Docstring must provide all compulsory input variables
# We will ignore shared_variables, *args and **kwargs
def add_number_to_list(num1: int, num_list: list, other_var: bool = True, *args, **kwargs):
    '''Adds num1 to num_list'''
    num_list.append(num1)
    return num_list

fn = Function(external_fn = add_number_to_list, 
    output_format = {'num_list': 'Array of numbers'})

# Show the processed function docstring
print(str(fn))

# Use the function
fn(3, [2, 4, 5])

Description: Adds <num1: int> to <num_list: list>
Input: ['num1', 'num_list']
Output: {'num_list': 'Array of numbers'}



{'num_list': [2, 4, 5, 3]}

# 5. Integrating with your own LLM
- StrictJSON has native support for OpenAI LLMs (you can put the LLM API parameters inside `strict_json` or `Function` directly)
- If your LLM is not from OpenAI, it is really easy to integrate with your own Custom LLM
- Simply pass your custom LLM function inside the `llm` parameter of `strict_json` or `Function`
    - Inputs:
        - system_prompt: String. Write in whatever you want the LLM to become. e.g. "You are a \<purpose in life\>"
        - user_prompt: String. The user input. Later, when we use it as a function, this is the function input
    - Output:
        - res: String. The response of the LLM call

#### Example Custom LLM
```python
def llm(system_prompt: str, user_prompt: str):
    ''' Here, we use OpenAI for illustration, you can change it to your own LLM '''
    # ensure your LLM imports are all within this function
    from openai import OpenAI
    
    # define your own LLM here
    client = OpenAI()
    response = client.chat.completions.create(
        model='gpt-3.5-turbo',
        temperature = 0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
    return response.choices[0].message.content
```
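
For offline experiments, a stand-in with the same two-string-in, one-string-out signature can replace the real call. `mock_llm` below is a hypothetical name, and whether `strict_json` accepts this particular fixed reply depends on its parsing, which this sketch does not verify.

```python
def mock_llm(system_prompt: str, user_prompt: str) -> str:
    """Offline stand-in that ignores both prompts and returns a fixed string."""
    return "{'Sentiment': 'hello', 'Adjectives': ['hello'], 'Words': 7}"

reply = mock_llm('You are a classifier', 'It is a beautiful and sunny day')
print(type(reply).__name__, reply)
```

Such a function could then be passed as `llm = mock_llm` to check that the parameter is wired through without spending API credits.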

#### Example Usage with `strict_json`
```python
res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': 'Type of Sentiment',
                                    'Adjectives': 'Array of adjectives',
                                    'Words': 'Number of words'},
                                     llm = llm) # set this to your own LLM

print(res)
```

#### Example Output
```{'Sentiment': 'Positive', 'Adjectives': ['beautiful', 'sunny'], 'Words': 7}```

In [28]:
def llm(system_prompt: str, user_prompt: str):
    ''' Here, we use OpenAI for illustration, you can change it to your own LLM '''
    # ensure your LLM imports are all within this function
    from openai import OpenAI
    
    # define your own LLM here
    client = OpenAI()
    response = client.chat.completions.create(
        model='gpt-3.5-turbo',
        temperature = 0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
    return response.choices[0].message.content

    # To test whether the llm input variable is working for strict_json, Function and Agent
    # without calling OpenAI, replace the return statement above with the fixed reply below
    # return "{'Sentiment': 'hello', 'Adjectives': ['hello'], 'Words': 7}"

In [29]:
res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': 'Type of Sentiment',
                                    'Adjectives': 'Array of adjectives',
                                    'Words': 'Number of words'},
                                     llm = llm) # set this to your own LLM
print(res)

OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

# 6. Integrating with OpenAI JSON Mode
- If you want to use the OpenAI JSON Mode (which is pretty good btw), you can simply add in ```openai_json_mode = True``` in ```strict_json``` or ```Function```
- Note that the model must be one of ```gpt-4-1106-preview``` or ```gpt-3.5-turbo-1106```. We will set it to ```gpt-3.5-turbo-1106``` by default if you provide an invalid model
- Note that type checking does not work with OpenAI JSON Mode

#### Example Usage
```python
res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': 'Type of Sentiment',
                                    'Adjectives': 'Array of adjectives',
                                    'Words': 'Number of words'},
                    openai_json_mode = True) # Toggle this to True
                                    
print(res)
```

#### Example Output
```{'Sentiment': 'positive', 'Adjectives': ['beautiful', 'sunny'], 'Words': 6}```

In [None]:
res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': 'Type of Sentiment',
                                    'Adjectives': 'Array of adjectives',
                                    'Words': 'Number of words'},
                   openai_json_mode = True) # Toggle this to True
print(res)

{'Sentiment': 'Positive', 'Adjectives': ['beautiful', 'sunny'], 'Words': 6}


In [None]:
fn = Function(fn_description = 'Output a sentence with words <var1> and <var2> in the style of <var3>', 
                     output_format = {'output': 'sentence'},
                    openai_json_mode = True) # Toggle this to True
fn('ball', 'dog', 'happy')

{'output': 'The ball made the dog happy.'}

# 7. Nested Outputs
- StrictJSON supports nested outputs like nested lists and dictionaries

#### Example Input
```python
res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': ['Type of Sentiment', 
                                                   'Strength of Sentiment, type: Enum[1, 2, 3, 4, 5]'],
                                    'Adjectives': "Name and Description, type: List[Dict['Name', 'Description']]",
                                    'Words': {
                                        'Number of words': 'Word count', 
                                        'Language': {
                                              'English': 'Whether it is English, type: bool',
                                              'Chinese': 'Whether it is Chinese, type: bool'
                                                  },
                                        'Proper Words': 'Whether the words are proper in the native language, type: bool'
                                        }
                                    })

print(res)
```

#### Example Output
`{'Sentiment': ['Positive', 3],`

`'Adjectives': [{'Name': 'beautiful', 'Description': 'pleasing to the senses'}, {'Name': 'sunny', 'Description': 'filled with sunshine'}],`

`'Words':`

`     {'Number of words': 6,`
    
`     'Language': {'English': True, 'Chinese': False},`

`     'Proper Words': True}`
    
`}`

In [None]:
res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': ['Type of Sentiment', 
                                                   'Strength of Sentiment, type: Enum[1, 2, 3, 4, 5]'],
                                    'Adjectives': "Name and Description, type: List[Dict['Name', 'Description']]",
                                    'Words': {
                                        'Number of words': 'Word count', 
                                        'Language': {
                                              'English': 'Whether it is English, type: bool',
                                              'Chinese': 'Whether it is Chinese, type: bool'
                                                  },
                                        'Proper Words': 'Whether the words are proper in the native language, type: bool'
                                        }
                                    })
print(res)

{'Sentiment': ['Positive', 3], 'Adjectives': [{'Name': 'beautiful', 'Description': 'describes the day'}, {'Name': 'sunny', 'Description': 'describes the day'}], 'Words': {'Number of words': 6, 'Language': {'English': True, 'Chinese': False}, 'Proper Words': True}}
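
Nested results are ordinary Python lists and dictionaries, so they can be unpacked with normal indexing. The sketch below copies the output shown in this section into a variable named `nested_res` (a name chosen for this example) instead of calling the LLM again.

```python
# Result copied from the cell output in this section (not a live LLM call)
nested_res = {'Sentiment': ['Positive', 3],
              'Adjectives': [{'Name': 'beautiful', 'Description': 'describes the day'},
                             {'Name': 'sunny', 'Description': 'describes the day'}],
              'Words': {'Number of words': 6,
                        'Language': {'English': True, 'Chinese': False},
                        'Proper Words': True}}

sentiment, strength = nested_res['Sentiment']                       # two-element list
adjective_names = [adj['Name'] for adj in nested_res['Adjectives']]  # list of dicts
is_english = nested_res['Words']['Language']['English']              # dict inside a dict

print(sentiment, strength, adjective_names, is_english)
```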


# 8. Additional Output Field Checks (Advanced)

- You can also specify your own custom check function that will be used to check the output field (which will be in `str`, `int`, `float`, `list` or `dict` format inferred by LLM or specified in `type: <data type>`)
- Ensure that what you are checking for is implied in the output field's description in `output_format` of `strict_json` or `Function`
- Your custom check function must take in: `output_field` and, as in the examples below, `check_data`
- Your custom check function must output: 
    - `requirement` (str): The requirement you are checking for
    - `requirement_met` (bool): Whether condition is met, True or False
    - `action_needed` (str): What needs to be done to meet requirement if requirement_met is False
- If `requirement_met` is False, the `requirement` and `action_needed` message will be used for the `strict_json` error correcting mechanism. Otherwise, the error correcting mechanism will not be triggered
- `action_needed` is used to tell the LLM what it needs to do to meet your requirements (LLM is not able to self-correct without guidance for most cases). Try to be as specific as possible to improve error correction success rate.
- Pass in your custom check function inside `custom_checks` variable of `strict_json` or `Function` under the same key as that in `output_format`
- You can add multiple check functions for one variable by putting it inside the same list
- Example custom check function named `hello_world_check` which checks for the presence of hello world
- You can also use information in the variable `check_data` for checks (input via `strict_json` or `Function`)

#### Example Custom Check Functions
```python
def hello_world_check(output_field, check_data) -> (str, bool, str):
    ''' Example function 1: Checks whether hello world is present in output_field. '''
    requirement = 'Check whether hello world is present in output field'
    requirement_met = True
    action_needed = ''
    # do a check for requirement of having 'hello'
    if 'hello' not in str(output_field):
        requirement_met = False
        action_needed += 'Add in the word hello into output field, '
    if 'world' not in str(output_field):
        requirement_met = False
        action_needed += 'Add in the word world into output field, '
    return (requirement, requirement_met, action_needed)
```

```python
def function_name_check(output_field, check_data) -> (str, bool, str):
    ''' Example function 2: Checks whether function name is present in output_field
    Uses additional information from the check_data variable of strict_json'''
    function_name = check_data['Function name']
    requirement = f'Check whether {function_name} is present in output field'
    requirement_met = True
    action_needed = ''
    
    # check that the function name from check_data (e.g. 'myprint') appears in the output
    if function_name not in str(output_field):
        requirement_met = False
        action_needed += f'Ensure that function name "{function_name}" is used, '
    return (requirement, requirement_met, action_needed)
```
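
Because a check function is plain Python, it can be tried out directly before it is passed to `strict_json`. The sketch below repeats `hello_world_check` so that it runs on its own, and adds a hypothetical helper `run_checks` that is not part of `strictjson` or TaskGen. It only illustrates, in a simplified way, how failed checks could be turned into correction messages; the library's actual error correcting mechanism may work differently.

```python
def hello_world_check(output_field, check_data=None):
    '''Same logic as Example function 1, repeated so this block is self-contained.'''
    requirement = 'Check whether hello world is present in output field'
    requirement_met = True
    action_needed = ''
    if 'hello' not in str(output_field):
        requirement_met = False
        action_needed += 'Add in the word hello into output field, '
    if 'world' not in str(output_field):
        requirement_met = False
        action_needed += 'Add in the word world into output field, '
    return (requirement, requirement_met, action_needed)


def run_checks(output, custom_checks, check_data=None):
    '''Hypothetical helper (an assumption, not the library's code):
    runs every check on its field and returns a message for each failed check.'''
    failures = []
    for key, checks in custom_checks.items():
        for check in checks:
            requirement, requirement_met, action_needed = check(output[key], check_data)
            if not requirement_met:
                failures.append(
                    f'Field "{key}" failed: {requirement}. Action needed: {action_needed}')
    return failures


# made-up sample outputs, standing in for what an LLM might return
good = {'Python Code': 'def myprint() -> str:\n    return "hello world"'}
bad = {'Python Code': 'def myprint() -> str:\n    return "hi"'}
checks = {'Python Code': [hello_world_check]}

print(run_checks(good, checks))  # no failure messages
print(run_checks(bad, checks))   # one message asking for both missing words
```

Testing checks this way makes it easier to confirm that `action_needed` is specific enough before spending LLM calls on it.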

#### Example Usage 1 (in strict_json)
```python
# we can input our custom_checks as a list of check functions, and check_data is the additional information for these check functions
res = strict_json(system_prompt = 'You are a code generator',
                    user_prompt = 'Print out hello world',
                    output_format = {'Thoughts': 'How to do it',
                                    'Python Code': 'Function beginning with def myprint() -> str:'},
                    custom_checks = {'Python Code': [hello_world_check, function_name_check]},
                    check_data = {'Function name': 'myprint'})
                                    
print(res)
```
#### Example Output 1
`Running check for "Check whether hello world is present in output field" on output field of "Python Code"
Requirement met`


`Running check for "Check whether myprint is present in output field" on output field of "Python Code"
Requirement met`


`{'Thoughts': 'To print out "hello world", use the print() function in Python.',`
`'Python Code': 'def myprint() -> str:\n    return "hello world"'}`

#### Example Usage 2 (in Function)

```python
fn = Function(fn_description = 'Output code to print hello world in a function named <var1>', 
                     output_format = {'Python code': 'Python function named <var1> to print hello world'},
                     custom_checks = {'Python code': [function_name_check]})

# at call time, we can pass the data to check against via check_data if it is not known beforehand
fn('hello world', 'myprint', check_data = {'Function name': 'myprint'})
```

#### Example Output 2

`Running check for "Check whether myprint is present in output field" on output field of "Python code"
Requirement met`

`{'Python code': 'def myprint():\n    print("hello world")'}`

In [None]:
def hello_world_check(output_field, check_data) -> (str, bool, str):
    ''' Example function 1: Checks whether hello world is present in output_field. '''
    requirement = 'Check whether hello world is present in output field'
    requirement_met = True
    action_needed = ''
    # do a check for requirement of having 'hello'
    if 'hello' not in str(output_field):
        requirement_met = False
        action_needed += 'Add in the word hello into output field, '
    if 'world' not in str(output_field):
        requirement_met = False
        action_needed += 'Add in the word world into output field, '
    return (requirement, requirement_met, action_needed)

def function_name_check(output_field, check_data) -> (str, bool, str):
    ''' Example function 2: Checks whether function name is present in output_field
    Uses additional information from the check_data variable of strict_json'''
    function_name = check_data['Function name']
    requirement = f'Check whether {function_name} is present in output field'
    requirement_met = True
    action_needed = ''
    
    # check that the function name from check_data (e.g. 'myprint') appears in the output
    if function_name not in str(output_field):
        requirement_met = False
        action_needed += f'Ensure that function name "{function_name}" is used, '
    return (requirement, requirement_met, action_needed)

In [None]:
# we can input our custom_checks as a list of check functions, and check_data is the additional information for these check functions
res = strict_json(system_prompt = 'You are a code generator',
                    user_prompt = 'Print out hello world',
                    output_format = {'Thoughts': 'How to do it',
                                    'Python Code': 'Function beginning with def myprint() -> str:'},
                    custom_checks = {'Python Code': [hello_world_check, function_name_check]},
                    check_data = {'Function name': 'myprint'})
                                    
print(res)

Running check for "Check whether hello world is present in output field" on output field of "Python Code"
Requirement met


Running check for "Check whether myprint is present in output field" on output field of "Python Code"
Requirement met


{'Thoughts': 'Use the print() function in Python', 'Python Code': 'def myprint() -> str: \n    print("hello world")'}


In [None]:
fn = Function(fn_description = 'Output code to print hello world in a function named <var1>', 
                     output_format = {'Python code': 'Python function named <var1> to print hello world'},
                     custom_checks = {'Python code': [function_name_check]})

# at call time, we can pass the data to check against via check_data if it is not known beforehand
fn('hello world', 'myprint', check_data = {'Function name': 'myprint'})

Running check for "Check whether myprint is present in output field" on output field of "Python code"
Requirement met




{'Python code': 'def myprint():\n    print("hello world")'}

# Optional: Under the hood (Explanation of how strict_json works)
- When given the output JSON format, `strict_json` encloses each key of the JSON in a delimiter (default: `###`).
- Example Output JSON provided: ```{'Sentiment': 'Type of Sentiment'}```
- Example Output JSON interpreted by Strict JSON: ```{'###Sentiment###': 'Type of Sentiment'}```
- We then process the LLM's output by using regex to search for the delimiter and extract the keys and values
- This works for nested data structures as well, by extracting recursively
- Note: Change the delimiter to a string that does not appear in your dataset
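
The key-wrapping step described above can be sketched as follows. `wrap_keys` is a hypothetical helper written for illustration, not a function from `strictjson`, and it assumes the same delimiter is used at every nesting level, which the library itself may not do.

```python
def wrap_keys(output_format, delimiter='###'):
    '''Hypothetical helper: recursively enclose every dictionary key in the delimiter.'''
    if isinstance(output_format, dict):
        return {f'{delimiter}{key}{delimiter}': wrap_keys(value, delimiter)
                for key, value in output_format.items()}
    if isinstance(output_format, list):
        # lists may contain nested dictionaries, so process each item
        return [wrap_keys(item, delimiter) for item in output_format]
    # descriptions (plain strings) are left untouched
    return output_format


print(wrap_keys({'Sentiment': 'Type of Sentiment'}))
# {'###Sentiment###': 'Type of Sentiment'}
```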

In [None]:
# a very difficult chunk of text for json.loads() to parse (it will fail)
res = '''{
'###Question of the day###': 'What is the 'x' in dx/dy?', 
'###Code Block 1###': '#include <stdio.h>\nint main(){\nint x = 'a'; return 0;\n}',
'###Another Code###': 'import numpy as np
### Oh what is this doing here
print("It can handle so many quotations ' \\" and backslashes and unexpected curly braces { } You don't even need to match }!")',
'###Some characters###': '~!@#$%^&*()_+-'"{}[];?><,.'
}'''

In [None]:
# change this to whatever is not common in your dataset
delimiter = '###'

In [None]:
import re
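# Use a regular expression to split on keys of the form '###key###': (either quote style)
# Note: [^#]* assumes the key itself contains no '#'; adjust it if you change the delimiter
pattern = fr",*\s*['\"]{delimiter}([^#]*){delimiter}['\"]: "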

matches = re.split(pattern, str(res[1:-1]).strip())

# remove null matches
my_matches = [match for match in matches if match !='']

print(my_matches)

['Question of the day', "'What is the 'x' in dx/dy?'", 'Code Block 1', "'#include <stdio.h>\nint main(){\nint x = 'a'; return 0;\n}'", 'Another Code', '\'import numpy as np\n### Oh what is this doing here\nprint("It can handle so many quotations \' \\" and backslashes and unexpected curly braces { } You don\'t even need to match }!")\'', 'Some characters', '\'~!@#$%^&*()_+-\'"{}[];?><,.\'']


In [None]:
# strip the enclosing quotes from the value matches (keys do not start with a quote, so they are unchanged)
curated_matches = [match[1:-1] if match[0] in '\'"' else match for match in my_matches]

print(curated_matches)

['Question of the day', "What is the 'x' in dx/dy?", 'Code Block 1', "#include <stdio.h>\nint main(){\nint x = 'a'; return 0;\n}", 'Another Code', 'import numpy as np\n### Oh what is this doing here\nprint("It can handle so many quotations \' \\" and backslashes and unexpected curly braces { } You don\'t even need to match }!")', 'Some characters', '~!@#$%^&*()_+-\'"{}[];?><,.']


In [None]:
len(curated_matches)

8

In [None]:
# create a dictionary
end_dict = {}
for i in range(0, len(curated_matches), 2):
    end_dict[curated_matches[i]] = curated_matches[i+1]
    
print(end_dict)

{'Question of the day': "What is the 'x' in dx/dy?", 'Code Block 1': "#include <stdio.h>\nint main(){\nint x = 'a'; return 0;\n}", 'Another Code': 'import numpy as np\n### Oh what is this doing here\nprint("It can handle so many quotations \' \\" and backslashes and unexpected curly braces { } You don\'t even need to match }!")', 'Some characters': '~!@#$%^&*()_+-\'"{}[];?><,.'}


In [None]:
for key, value in end_dict.items():
    print('Key:', key)
    print('Value:', value)
    print('#####')

Key: Question of the day
Value: What is the 'x' in dx/dy?
#####
Key: Code Block 1
Value: #include <stdio.h>
int main(){
int x = 'a'; return 0;
}
#####
Key: Another Code
Value: import numpy as np
### Oh what is this doing here
print("It can handle so many quotations ' \" and backslashes and unexpected curly braces { } You don't even need to match }!")
#####
Key: Some characters
Value: ~!@#$%^&*()_+-'"{}[];?><,.
#####
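
The steps above can be collected into a single function. This is a minimal sketch, not the library's implementation: `parse_delimited` is a hypothetical name, it handles only one level of nesting, and it pairs keys with values by position in the `re.split` result instead of filtering out empty matches.

```python
import re

def parse_delimited(text, delimiter='###'):
    '''Hypothetical helper: extract {key: value} from delimiter-wrapped, JSON-like text.'''
    d = re.escape(delimiter)  # so delimiters with regex metacharacters also work
    # a key looks like '###key###': in either quote style, optionally preceded by a comma
    pattern = rf",*\s*['\"]{d}(.*?){d}['\"]: "
    body = text.strip()[1:-1].strip()  # drop the outer braces
    parts = re.split(pattern, body)
    # parts[0] is the text before the first key; after that, keys and values alternate
    keys, values = parts[1::2], parts[2::2]
    result = {}
    for key, value in zip(keys, values):
        value = value.strip()
        # remove one pair of enclosing quotes, if present
        if len(value) >= 2 and value[0] in '\'"' and value[-1] == value[0]:
            value = value[1:-1]
        result[key] = value
    return result


# made-up sample with unescaped quotes and an unmatched brace
sample = """{'###Question###': 'What is the 'x' in dx/dy?', '###Code###': 'int x = 'a'; }'}"""
parsed = parse_delimited(sample)
print(parsed['Question'])  # What is the 'x' in dx/dy?
print(parsed['Code'])      # int x = 'a'; }
```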
