# Automation Evaluation

This tutorial shows how to use the [automation framework](https://rpc.cfainstitute.org/research/the-automation-ahead-content-series/introduction) to quickly assess the automation potential of an investment task. The notebook runs this process programmatically with the following components:

1. **Create an Interactive Form**  
   Use `ipywidgets` in Jupyter Notebook to collect user inputs, such as task descriptions and ratings for various attributes related to automation suitability.

2. **Format Inputs into a Prompt**  
   Leverage LangChain's `PromptTemplate` to structure the collected inputs into a well-organized format optimized for GPT processing.

3. **Integrate OpenAI Chat Completion**  
   Use the OpenAI API to send the formatted prompt to a GPT model and retrieve a detailed evaluation or recommendation.

4. **Save GPT Outputs**  
   Save the model's response to a markdown file for documentation, sharing, or further analysis.

In [1]:
%pip install openai python-dotenv ipywidgets langchain

Note: you may need to restart the kernel to use updated packages.


## OpenAI Chat Completion
This section uses the OpenAI API to send the formatted prompt to a GPT model and retrieve its response.

In [None]:
from openai import OpenAI
from dotenv import load_dotenv


# Create a .env file containing OPENAI_API_KEY=<your key>; load_dotenv() reads it
load_dotenv()
client = OpenAI()

# Or set your OpenAI API key with the below
# openai_api_key = "your_openai_api_key"
# client = OpenAI(api_key=openai_api_key)


def run_gpt_chat_completion(input_text, model="gpt-4o"):
    """
    Send the input text to an OpenAI GPT model and get a chat completion.

    Args:
        input_text (str): The input prompt to send to the GPT model.
        model (str): Name of the chat model to use.

    Returns:
        The response message object (its text is in the `.content`
        attribute), or None if the API call fails.
    """
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are an automation framework evaluator."},
                {"role": "user", "content": input_text}
            ],
            max_tokens=1500,
            temperature=0  # Temperature 0 makes outputs as repeatable as possible
        )
        # Return the message object; callers read the text from .content
        return response.choices[0].message
    except Exception as e:
        print(f"Error with OpenAI API: {e}")
        return None
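
When `OpenAI()` is created without arguments, it reads the key from the `OPENAI_API_KEY` environment variable, which `load_dotenv()` can populate from the `.env` file. The following is a minimal sketch of a check that fails early with a clear message when the key is missing. The helper name `require_api_key` is hypothetical and is not part of the OpenAI SDK.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of the OpenAI SDK.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it "
            "before creating the client."
        )
    return key
```

One way to use it is to call `require_api_key()` right after `load_dotenv()` and before `OpenAI()`, so a missing key surfaces immediately rather than on the first API call.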


## Form for Input
Here, we create an interactive form using `ipywidgets` to collect user inputs, such as task description and ratings for automation suitability.

In [None]:
import ipywidgets as widgets
from IPython.display import display, HTML

# Styling for the table
table_style = """
<style>
    table {
        border-collapse: collapse;
        width: 100%;
    }
    th, td {
        border: 1px solid #ddd;
        padding: 8px;
        text-align: left;
    }
    th {
        background-color: #f2f2f2;
    }
</style>
"""

# Input for task description
task_description = widgets.Textarea(
    value="",
    placeholder="Describe the task for automation",
    description="Task Description:",
    layout=widgets.Layout(width="600px", height="100px"),
    style={'description_width': '150px'}
)

# Define the attributes and their descriptions
attributes = [
    ("Task Complexity", "Is the task repetitive (1) or highly variable (5)?"),
    ("Output Objectivity", "Are the outputs objective (1) or subjective (5)?"),
    ("Data Structure", "Is the data structured (1) or unstructured (5)?"),
    ("Risk Level", "What is the potential risk of automation failure? Low (1), High (5)"),
    ("Human Oversight Requirement", "Does the task need human validation or sign-off? No (1), Yes (5)"),
    ("Impact on Efficiency", "How much time or effort can automation save? Little (1), a lot (5)")
]

# Create widgets for each row
rows = []
for attribute, description in attributes:
    slider = widgets.IntSlider(
        min=1, max=5, value=3, 
        description=description, 
        style={'description_width': '400px'}, 
        layout=widgets.Layout(width="600px")
    )
    rows.append((attribute, slider))

# Global variable to store inputs
collected_inputs = {}

# Function to display the form
def display_table(rows):
    # Render table headers
    header_html = f"""
    {table_style}
    <table>
        <tr>
            <th>Attribute</th>
            <th>Rating (1-5)</th>
        </tr>
    </table>
    """
    display(HTML(header_html))

    # Display each row
    for attribute, slider in rows:
        row_box = widgets.HBox([
            widgets.Label(attribute, layout=widgets.Layout(width="200px")),
            slider
        ])
        display(row_box)

# Output area to capture form submission
output = widgets.Output()

# Function to collect inputs and update the global variable
def handle_submit(b):
    global collected_inputs
    with output:
        output.clear_output()  # Clear previous output
        # Collect task description
        task_desc = task_description.value

        # Collect scorecard values
        scorecard_values = {}
        for attribute, slider in rows:
            scorecard_values[attribute] = slider.value

        # Store the collected inputs in the global dictionary
        collected_inputs = {
            "task_description": task_desc,
            "scorecard": scorecard_values
        }

        # Display the collected inputs for debugging or confirmation
        print("=== Collected Inputs ===")
        print(collected_inputs)

# Attach the function to the Submit button
submit_button = widgets.Button(
    description="Submit",
    button_style="success"
)
submit_button.on_click(handle_submit)

# Display the form
print("Fill out the following form to assess automation suitability:")
display(task_description)
display_table(rows)
display(submit_button, output)

# Access the collected inputs after submission
def get_collected_inputs():
    return collected_inputs

Fill out the following form to assess automation suitability:


[Form output: a task description text area, a header row (Attribute, Rating (1-5)), six rating sliders, a Submit button, and an output area]

## Prompt Template
We define a structured prompt template using LangChain's `PromptTemplate` to format the user inputs into a well-organized structure for GPT.

In [41]:
from langchain.prompts import PromptTemplate

# Define the prompt template
template = """
Using this framework developed for GenAI task automation:
| Attribute | GenAI Automation | 
|-----------|------------------| 
| Data Type | Unstructured data (e.g., earnings transcripts, market news) | 
| Task Variability | Repetitive and Variable tasks (e.g., customizing strategies, client communications) | 
| Input Objectivity | Handles subjective, ambiguous inputs (e.g., free-form text) | 
| Output Objectivity | Stochastic, probabilistic outputs (e.g., personalized reports) | 
| Scalability | Scalable but with potential high computational cost | 

I would like to assess the following task: 
{task_description}

I have rated the task across various attributes using the following scorecard:
Scorecard:
- Task Complexity: {task_complexity} (1: repetitive, 5: highly variable)
- Output Objectivity: {output_objectivity} (1: objective, 5: subjective)
- Data Structure: {data_structure} (1: structured, 5: unstructured)
- Risk Level: {risk_level} (1: low risk, 5: high risk)
- Human Oversight Requirement: {human_oversight} (1: no, 5: yes)
- Impact on Efficiency: {impact_efficiency} (1: little impact, 5: significant impact)

Based on this scorecard, please provide a score for the overall GenAI automation suitability, 
a hybrid approach fit score, and the recommended hybrid approach, if appropriate, for automating 
this task, specifying where traditional automation, GenAI, and human intervention would be most effective.
Consider the task's variability, output objectivity, and potential risks in your recommendation.
"""

# Create a PromptTemplate instance
prompt_template = PromptTemplate(
    input_variables=[
        "task_description",
        "task_complexity",
        "output_objectivity",
        "data_structure",
        "risk_level",
        "human_oversight",
        "impact_efficiency"
    ],
    template=template
)
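
The template expects a task description and six ratings, so formatting fails with a `KeyError` if the form has not been submitted yet. The sketch below, under the assumption that the inputs have the shape produced by `handle_submit`, checks them before formatting. The helper `validate_inputs` and the sample task text are illustrative and not part of the original notebook.

```python
EXPECTED_ATTRIBUTES = [
    "Task Complexity",
    "Output Objectivity",
    "Data Structure",
    "Risk Level",
    "Human Oversight Requirement",
    "Impact on Efficiency",
]


def validate_inputs(inputs):
    """Raise ValueError if the form data is missing or out of range."""
    if not inputs or not inputs.get("task_description", "").strip():
        raise ValueError("Submit the form with a task description first.")
    scorecard = inputs.get("scorecard", {})
    for name in EXPECTED_ATTRIBUTES:
        rating = scorecard.get(name)
        if not isinstance(rating, int) or not 1 <= rating <= 5:
            raise ValueError(f"{name!r} needs an integer rating from 1 to 5.")
    return inputs


# Hand-built example in the same shape that handle_submit produces
# (illustrative values only).
sample_inputs = {
    "task_description": "Summarize quarterly fund commentary.",
    "scorecard": {name: 3 for name in EXPECTED_ATTRIBUTES},
}
validate_inputs(sample_inputs)
```

A hand-built dictionary like `sample_inputs` can also stand in for the form when the notebook is run without widget support.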

In [42]:
# Get the collected inputs from the form
inputs = get_collected_inputs()

# Format the prompt using the collected inputs
formatted_prompt = prompt_template.format(
    task_description=inputs["task_description"],
    task_complexity=inputs["scorecard"]["Task Complexity"],
    output_objectivity=inputs["scorecard"]["Output Objectivity"],
    data_structure=inputs["scorecard"]["Data Structure"],
    risk_level=inputs["scorecard"]["Risk Level"],
    human_oversight=inputs["scorecard"]["Human Oversight Requirement"],
    impact_efficiency=inputs["scorecard"]["Impact on Efficiency"]
)

print("Formatted GPT Prompt:\n")
print(formatted_prompt)

Formatted GPT Prompt:


Using this framework developed for GenAI task automation:
| Attribute | GenAI Automation | 
|-----------|------------------| 
| Data Type | Unstructured data (e.g., earnings transcripts, market news) | 
| Task Variability | Repetitive and Variable tasks (e.g., customizing strategies, client communications) | 
| Input Objectivity | Handles subjective, ambiguous inputs (e.g., free-form text) | 
| Output Objectivity | Stochastic, probabilistic outputs (e.g., personalized reports) | 
| Scalability | Scalable but with potential high computational cost | 

I would like to assess the following task: 
The task is to conduct performance attribution for a client’s overall portfolio, which includes multiple mutual funds. This involves breaking down the portfolio's performance to identify the contributions from stock selection and asset allocation. Additionally, it provides context by analyzing macroeconomic and sector events that may have influenced performance over the qu
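
The formatting cell maps each scorecard label to a template variable by hand. As a sketch of an alternative, a lookup table can do the same mapping in one place, which keeps the form labels and template variables in sync. The names `VARIABLE_TO_LABEL` and `build_prompt_kwargs` are hypothetical additions, not part of the original notebook.

```python
# Template variable name -> scorecard label used by the form.
VARIABLE_TO_LABEL = {
    "task_complexity": "Task Complexity",
    "output_objectivity": "Output Objectivity",
    "data_structure": "Data Structure",
    "risk_level": "Risk Level",
    "human_oversight": "Human Oversight Requirement",
    "impact_efficiency": "Impact on Efficiency",
}


def build_prompt_kwargs(inputs):
    """Flatten the form data into keyword arguments for the template."""
    kwargs = {"task_description": inputs["task_description"]}
    for variable, label in VARIABLE_TO_LABEL.items():
        kwargs[variable] = inputs["scorecard"][label]
    return kwargs
```

With this helper, the formatting step could be written as `prompt_template.format(**build_prompt_kwargs(inputs))`.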

## Display Output
Finally, we send the formatted prompt to the model and display its response as Markdown.

In [45]:
from IPython.display import Markdown, display

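response = run_gpt_chat_completion(formatted_prompt)

# run_gpt_chat_completion returns None when the API call fails
if response is not None:
    markdown_response = f'# GPT Response\n\n{response.content}'
    display(Markdown(markdown_response))
else:
    print("No response received; see the error message above.")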


# GPT Response

To assess the suitability of GenAI automation for the task of conducting performance attribution for a client's portfolio, we need to consider the task's complexity, output objectivity, data structure, risk level, human oversight requirement, and impact on efficiency. Here's a breakdown of the scores and recommendations:

### Overall GenAI Automation Suitability Score

1. **Task Complexity (4/5):** The task is highly variable, involving analysis and interpretation of financial data, which suits GenAI's ability to handle complex, variable tasks.
   
2. **Output Objectivity (4/5):** The task requires subjective analysis and interpretation, aligning well with GenAI's strength in generating probabilistic outputs.

3. **Data Structure (2/5):** The task involves semi-structured data (financial data) and unstructured data (macroeconomic analysis), which GenAI can handle but may require additional preprocessing.

4. **Risk Level (2/5):** The task has a moderate risk level, as financial analysis impacts decision-making. GenAI can assist but should be used cautiously.

5. **Human Oversight Requirement (1/5):** Minimal human oversight is required, suggesting that GenAI can automate significant portions of the task.

6. **Impact on Efficiency (4/5):** Automating this task with GenAI can significantly improve efficiency by quickly analyzing large datasets and generating insights.

**Overall GenAI Suitability Score:** 3.5/5

### Hybrid Approach Fit Score

Given the complexity and variability of the task, a hybrid approach is likely more suitable. This approach combines traditional automation, GenAI, and human intervention to balance efficiency and accuracy.

**Hybrid Approach Fit Score:** 4/5

### Recommended Hybrid Approach

1. **Traditional Automation:**
   - **Data Collection and Preprocessing:** Use traditional automation to gather and preprocess structured data from financial databases and reports. This includes extracting performance data of mutual funds and portfolio allocations.

2. **GenAI:**
   - **Performance Analysis and Attribution:** Utilize GenAI to analyze the performance data, identify contributions from stock selection and asset allocation, and generate insights on macroeconomic and sector events. GenAI can handle the subjective interpretation of how these factors influenced performance.

3. **Human Intervention:**
   - **Validation and Contextual Analysis:** Financial analysts should review GenAI-generated reports to ensure accuracy and provide additional context. Human expertise is crucial for interpreting nuanced market conditions and making final recommendations.

By leveraging traditional automation for data handling, GenAI for complex analysis, and human oversight for validation, this hybrid approach maximizes efficiency while minimizing risks associated with subjective financial analysis.
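
The overview lists saving the model's response to a markdown file as the last component, but the cells in this notebook only display it. A minimal sketch of that step follows. The function name `save_markdown` and the default file name `gpt_response.md` are assumptions chosen for illustration.

```python
from pathlib import Path


def save_markdown(text, path="gpt_response.md", title="GPT Response"):
    """Write the response text to a markdown file and return its path."""
    out_path = Path(path)
    # Add a top-level heading so the file reads like the displayed output.
    out_path.write_text(f"# {title}\n\n{text}\n", encoding="utf-8")
    return out_path
```

After a successful call to `run_gpt_chat_completion`, this could be used as `save_markdown(response.content)`, guarded by a check that `response` is not `None`.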