# Deploying AI
## Assignment 1: Evaluating Summaries

A key application of LLMs is to summarize documents. In this assignment, we will not only summarize documents, but also evaluate the quality of the summary and return the results using structured outputs.

**Instructions:** please complete the sections below stating any relevant decisions that you have made and showing the code substantiating your solution.

## Select a Document

Please select one out of the following articles:

+ [Managing Oneself, by Peter Drucker](https://www.thecompleteleader.org/sites/default/files/imce/Managing%20Oneself_Drucker_HBR.pdf)  (PDF)
+ [The GenAI Divide: State of AI in Business 2025](https://www.artificialintelligence-news.com/wp-content/uploads/2025/08/ai_report_2025.pdf) (PDF)
+ [What is Noise?, by Alex Ross](https://www.newyorker.com/magazine/2024/04/22/what-is-noise) (Web)

# Load Secrets

In [2]:
%load_ext dotenv
%dotenv ../05_src/.secrets

%reload_ext dotenv
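`%dotenv` loads the variables silently, so a quick presence check can surface a missing key before the first API call fails with an authentication error. This is a minimal sketch that assumes the secrets file defines `OPENAI_API_KEY`, which is the variable `OpenAI()` reads by default. `require_env` is a hypothetical helper name.

```python
import os


def require_env(name: str = "OPENAI_API_KEY") -> None:
    """Raise a clear error if an expected environment variable is missing or empty.

    Only presence is checked; the value itself is never printed.
    """
    if not os.environ.get(name):
        raise RuntimeError(f"{name} is not set; check that the secrets file was loaded.")
```

Calling `require_env()` right after the `%dotenv` cell keeps the failure close to its cause.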

## Load Document

Depending on your choice, you can consult the appropriate set of functions below. Make sure that you understand the content that is extracted and if you need to perform any additional operations (like joining page content).

### PDF

You can load a PDF by following the instructions in [LangChain's documentation](https://docs.langchain.com/oss/python/langchain/knowledge-base#loading-documents). Notice that the output of the loading procedure is a collection of pages. You can join the pages by using the code below.

```python
document_text = ""
for page in docs:
    document_text += page.page_content + "\n"
```
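The loop above works as is. PDF extraction also tends to leave trailing spaces and runs of blank lines, as the output further down shows. The variant below joins the pages and squeezes that whitespace. It is a sketch: `join_pages` is a hypothetical helper, and it only assumes that each page has a `page_content` attribute, as LangChain `Document` objects do.

```python
import re
from typing import Iterable, Protocol


class HasPageContent(Protocol):
    page_content: str


def join_pages(pages: Iterable[HasPageContent]) -> str:
    """Join page texts with newlines and collapse runs of blank lines."""
    text = "\n".join(page.page_content for page in pages)
    # Drop trailing spaces so whitespace-only lines become truly empty
    text = "\n".join(line.rstrip() for line in text.splitlines())
    # Squeeze three or more consecutive newlines down to a single blank line
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

With the loader output, `document_text = join_pages(docs)` replaces the loop. This does not rejoin words that the PDF layout split across lines.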

### Web

LangChain also provides a set of web loaders, including the [WebBaseLoader](https://docs.langchain.com/oss/python/integrations/document_loaders/web_base). You can use this function to load web pages.
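For the web option, the sketch below assumes that `langchain_community` and its BeautifulSoup dependency are installed. `load_web_text` is a hypothetical helper. The import sits inside the function only so that the helper can be defined on the PDF path without the web dependencies.

```python
def load_web_text(url: str) -> str:
    """Load a web page with LangChain's WebBaseLoader and join the returned documents."""
    # Lazy import: only needed when the web path is actually used
    from langchain_community.document_loaders import WebBaseLoader

    docs = WebBaseLoader(url).load()  # a list of Document objects
    return "\n".join(doc.page_content for doc in docs)
```

For example, `document_text = load_web_text("https://www.newyorker.com/magazine/2024/04/22/what-is-noise")`. Inspect the result before summarizing, because page chrome such as navigation text may be included.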

In [3]:
from langchain_community.document_loaders import PyPDFLoader

file_path = "../05_src/documents/Managing Oneself_Drucker_HBR.pdf"
loader = PyPDFLoader(file_path)

docs = loader.load()

print(len(docs))

document_text = ""
for page in docs:
    document_text += page.page_content + "\n"


print(document_text[:500])  # Print the first 500 characters of the document

13
www.hbr.org
B
 
EST  
 
OF  HBR 1999
 
Managing Oneself
 
by Peter F . Drucker
 
•
 
Included with this full-text 
 
Harvard Business Review
 
 article:
The Idea in Brief—the core idea
The Idea in Practice—putting the idea to work
 
1
 
Article Summary
 
2
 
Managing Oneself
A list of related materials, with annotations to guide further
exploration of the article’s ideas and applications
 
12
 
Further Reading
Success in the knowledge 
economy comes to those who 
know themselves—their 
strengths


## Generation Task

Using the OpenAI SDK, please create a **structured output** with the following specifications:

+ Use a model that is NOT in the GPT-5 family.
+ Output should be a Pydantic BaseModel object. The fields of the object should be:

    - Author
    - Title
    - Relevance: a statement, no longer than one paragraph, that explains why this article is relevant to an AI professional's professional development.
    - Summary: a concise and succinct summary no longer than 1000 tokens.
    - Tone: the tone used to produce the summary (see below).
    - InputTokens: number of input tokens (obtain this from the response object).
    - OutputTokens: number of tokens in output (obtain this from the response object).
       
+ The summary should be written using a specific and distinguishable tone, for example, "Victorian English", "African-American Vernacular English", "Formal Academic Writing", "Bureaucratese" ([the obscure language of bureaucrats](https://tumblr.austinkleon.com/post/4836251885)), "Legalese" (legal language), or any other distinguishable style of your preference. Make sure that the style is something you can identify. 
+ In your implementation please make sure to use the following:

    - Instructions and context should be stored separately and the context should be added dynamically. Do not hard-code your prompt, instead use formatted strings or an equivalent technique.
    - Use the developer (instructions) prompt and the user prompt.
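The cells below parse JSON out of free text. The specification instead asks for a Pydantic `BaseModel`, and the OpenAI SDK can enforce that schema directly as a structured output. This is a minimal sketch that assumes a recent `openai` package providing `client.responses.parse(..., text_format=...)`, with `response.output_parsed` and `response.usage.input_tokens` / `output_tokens` on the result. The names `SummaryDraft`, `SummaryReport`, `build_messages`, and `summarize` are hypothetical. The model generates only the four content fields. `Tone` and the token counts are filled in from code and from the response object, because the model cannot know its own usage.

```python
from typing import Any

from pydantic import BaseModel, Field

# Instructions (developer prompt) and context (user prompt) are separate templates,
# filled in at call time; the tone is a parameter rather than hard-coded text.
DEVELOPER_TEMPLATE = (
    "You are a document analysis assistant. Extract the author and title, "
    "write one paragraph on why the document is relevant to an AI professional's "
    "professional development, and write a summary of at most {max_tokens} tokens "
    "in the following tone: {tone}."
)
USER_TEMPLATE = "Document:\n\n{context}"


class SummaryDraft(BaseModel):
    """Fields the model generates; this is the schema sent as the structured output."""
    Author: str
    Title: str
    Relevance: str = Field(description="One paragraph on relevance for an AI professional")
    Summary: str = Field(description="Summary written in the requested tone")


class SummaryReport(SummaryDraft):
    """Final object: generated fields plus values taken from code and the response."""
    Tone: str
    InputTokens: int
    OutputTokens: int


def build_messages(context: str, tone: str, max_tokens: int = 1000) -> list[dict[str, str]]:
    """Format the developer and user prompts with the dynamic values."""
    return [
        {"role": "developer", "content": DEVELOPER_TEMPLATE.format(tone=tone, max_tokens=max_tokens)},
        {"role": "user", "content": USER_TEMPLATE.format(context=context)},
    ]


def summarize(client: Any, context: str, tone: str, model: str = "gpt-4o-mini") -> SummaryReport:
    """Request a structured summary and attach the tone and token usage.

    `client` is expected to be an `openai.OpenAI` instance.
    """
    response = client.responses.parse(
        model=model,
        input=build_messages(context, tone),
        text_format=SummaryDraft,
    )
    draft = response.output_parsed  # already validated as a SummaryDraft
    return SummaryReport(
        **draft.model_dump(),
        Tone=tone,
        InputTokens=response.usage.input_tokens,
        OutputTokens=response.usage.output_tokens,
    )
```

A call might look like `report = summarize(OpenAI(), document_text[:7000], tone="Victorian English")` followed by `print(report.model_dump_json(indent=2))`. Splitting the schema in two keeps the token fields out of what the model is asked to produce.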


In [4]:
from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model = 'gpt-4o-mini',
    input = document_text[:2000]  # Use the first 2000 characters as input
)
print(response.to_json())

{
  "id": "resp_0ac310385a7a086c0068ffeaeb52c48194865a4bb6e41d1780",
  "created_at": 1761602283.0,
  "error": null,
  "incomplete_details": null,
  "instructions": null,
  "metadata": {},
  "model": "gpt-4o-mini-2024-07-18",
  "object": "response",
  "output": [
    {
      "id": "msg_0ac310385a7a086c0068ffeaecd53c8194a38b1d87dce127fb",
      "content": [
        {
          "annotations": [],
          "text": "**Managing Oneself by Peter F. Drucker - Summary**\n\n### The Idea in Brief\nIn today’s knowledge economy, individual success hinges on a profound understanding of oneself—acknowledging personal strengths, values, and preferred working styles. Unlike earlier times when companies managed careers, individuals must now take charge of their professional trajectories.\n\n### The Key Points\n- **Self-Leadership:** Emphasizes the importance of being your own chief executive officer in managing your career.\n- **Self-Awareness:** Recognizing your strengths, weaknesses, learning habits,

In [10]:
from typing import Dict, Any
import json
from openai import OpenAI

TONE = "Victorian English"

# Keep instructions (developer prompt) and context (user prompt) separate.
# Both are templates; literal JSON braces are doubled because they go through str.format.
DEVELOPER_PROMPT_TEMPLATE = """You are a document analysis assistant skilled in {tone}.
Please analyze the document and return ONLY a JSON object with these exact fields:
{{
    "author": "Name of the document's author",
    "title": "Title of the document",
    "relevance": "A single paragraph explaining why this article matters for AI professionals",
    "summary": "A summary written in proper {tone} style, maximum 1000 tokens"
}}
Do not include any text outside the JSON structure."""

USER_PROMPT_TEMPLATE = """Please analyze this document and provide the structured summary:

{text}

Remember to:
1. Format output as pure JSON
2. Write the summary in {tone}
3. Keep the relevance to one paragraph
4. Focus on professional development aspects"""

client = OpenAI()

def get_structured_summary(document_text: str, max_chars: int = 7000, tone: str = TONE) -> Dict[str, Any]:
    """Get a structured summary using the Responses API."""
    # Fill both templates dynamically: the tone and the (truncated) document text
    developer_prompt = DEVELOPER_PROMPT_TEMPLATE.format(tone=tone)
    user_prompt = USER_PROMPT_TEMPLATE.format(text=document_text[:max_chars], tone=tone)

    # Instructions go in the developer message, context in the user message
    response = client.responses.create(
        model='gpt-4o-mini',
        input=[
            {"role": "developer", "content": developer_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )

    # Debug: print raw response
    print("Raw Response:")
    print(response.output_text)

    # The model may wrap the JSON in markdown code fences, so extract the outermost {...}
    text = response.output_text
    start = text.find('{')
    end = text.rfind('}') + 1
    try:
        if start < 0 or end <= start:
            raise ValueError("No JSON found in response")
        result = json.loads(text[start:end])
    except ValueError as e:  # json.JSONDecodeError is a subclass of ValueError
        print(f"Failed to parse response: {e}")
        return {}

    # The Responses API reports usage as input_tokens/output_tokens;
    # prompt_tokens/completion_tokens are Chat Completions names and always yielded 0 here
    usage = response.usage
    in_tokens = usage.input_tokens if usage else 0
    out_tokens = usage.output_tokens if usage else 0

    return {
        "Author": result.get("author", ""),
        "Title": result.get("title", ""),
        "Relevance": result.get("relevance", ""),
        "Summary": result.get("summary", ""),
        "Tone": tone,
        "InputTokens": in_tokens,
        "OutputTokens": out_tokens
    }

# Test the function (get_structured_summary truncates to max_chars itself)
try:
    result = get_structured_summary(document_text)
    print("\nStructured Output:")
    print(json.dumps(result, indent=2))
except Exception as e:
    print(f"Error: {e}")

Raw Response:
```json
{
    "author": "Peter F. Drucker",
    "title": "Managing Oneself",
    "relevance": "This esteemed article serves as a pivotal compass for AI professionals who navigate the complexities of self-management and personal development amidst the rapidly evolving knowledge economy. Drucker astutely underscores the significance of self-awareness, urging professionals to cultivate a comprehensive understanding of their strengths, values, and work styles, thereby enabling them to thrive in a landscape where they must chart their own destinies.",
    "summary": "In this most enlightening treatise, Mr. Drucker elucidates the paramount importance of self-management in an era teeming with unprecedented opportunities for those possessed of both ambition and discernment. He posits that in the contemporary milieu, wherein corporations do little to shepherd the careers of their knowledge workers, it falls upon the individual to assume the mantle of chief executive officer of the

# Evaluate the Summary

Use the DeepEval library to evaluate the **summary** as follows:

+ Summarization Metric:

    - Use the [Summarization metric](https://deepeval.com/docs/metrics-summarization) with a **bespoke** set of assessment questions.
    - Please use, at least, five assessment questions.

+ G-Eval metrics:

    - In addition to the standard summarization metric above, please implement three evaluation metrics: 
    
        - [Coherence or clarity](https://deepeval.com/docs/metrics-llm-evals#coherence)
        - [Tonality](https://deepeval.com/docs/metrics-llm-evals#tonality)
        - [Safety](https://deepeval.com/docs/metrics-llm-evals#safety)

    - For each one of the metrics above, implement five assessment questions.

+ The output should be structured and contain one key-value pair to report the score and another pair to report the explanation:

    - SummarizationScore
    - SummarizationReason
    - CoherenceScore
    - CoherenceReason
    - ...
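For the G-Eval part, DeepEval's `GEval` metric accepts custom `evaluation_steps`, which is one way to supply five assessment questions per metric. The following is a minimal sketch. The step wording, `geval_steps`, `make_geval_metrics`, `EvaluationReport`, and `build_evaluation_report` are all hypothetical. The DeepEval imports sit inside the factory so that the step lists can be reviewed without the library installed. The report builder assumes only that each measured metric exposes `.score` and `.reason`, as DeepEval metrics do after `.measure(test_case)`.

```python
from typing import Any, Mapping

from pydantic import BaseModel


def geval_steps(tone: str) -> dict[str, list[str]]:
    """Five hypothetical evaluation steps for each G-Eval metric."""
    return {
        "Coherence": [
            "Check that the summary follows a logical order a reader can follow.",
            "Check that each sentence connects clearly to the sentences around it.",
            "Check that key terms are used consistently throughout the summary.",
            "Check that the summary contains no contradictory statements.",
            "Check that the main argument of the input is identifiable from the summary alone.",
        ],
        "Tonality": [
            f"Check that the summary is written consistently in the tone: {tone}.",
            "Check that vocabulary and sentence structure fit that tone.",
            "Check that the tone does not drift into a different register partway through.",
            "Check that the tone does not obscure the meaning of the input's ideas.",
            "Check that the tone remains appropriate for a professional audience.",
        ],
        "Safety": [
            "Check that the summary contains no harmful, hateful, or discriminatory content.",
            "Check that the summary reveals no personal or sensitive information absent from the input.",
            "Check that the summary gives no advice that could cause harm.",
            "Check that the summary attributes no claims to the author that the input does not support.",
            "Check that the stylized tone introduces no offensive stereotypes.",
        ],
    }


def make_geval_metrics(tone: str, model: str = "gpt-4o", threshold: float = 0.5) -> dict[str, Any]:
    """Build one GEval metric per entry in geval_steps()."""
    # Imported lazily so the step lists above can be inspected without DeepEval installed
    from deepeval.metrics import GEval
    from deepeval.test_case import LLMTestCaseParams

    return {
        name: GEval(
            name=name,
            evaluation_steps=steps,
            evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
            model=model,
            threshold=threshold,
        )
        for name, steps in geval_steps(tone).items()
    }


class EvaluationReport(BaseModel):
    SummarizationScore: float
    SummarizationReason: str
    CoherenceScore: float
    CoherenceReason: str
    TonalityScore: float
    TonalityReason: str
    SafetyScore: float
    SafetyReason: str


def build_evaluation_report(metrics: Mapping[str, Any]) -> EvaluationReport:
    """Map measured metrics keyed by 'Summarization', 'Coherence', ... to Score/Reason pairs."""
    fields: dict[str, Any] = {}
    for name, metric in metrics.items():
        fields[f"{name}Score"] = metric.score
        fields[f"{name}Reason"] = metric.reason or ""  # reason can be None
    return EvaluationReport(**fields)


# Example usage (needs DeepEval, an OpenAI key, and `test_case` / `metric` from the cell below):
# metrics = {"Summarization": metric, **make_geval_metrics("Victorian English")}
# for m in metrics.values():
#     m.measure(test_case)
# print(build_evaluation_report(metrics).model_dump_json(indent=2))
```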

In [None]:
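from deepeval import evaluate
from deepeval.test_case import LLMTestCase
from deepeval.metrics import SummarizationMetric
...

# Evaluate against the same slice of the document that the summarizer saw;
# `result` is the dictionary returned by get_structured_summary() above.
source_text = document_text[:7000]
summary_text = result.get("Summary", "")

# `input` must be the source document and `actual_output` the generated summary
test_case = LLMTestCase(input=source_text, actual_output=summary_text)
metric = SummarizationMetric(
    threshold=0.5,
    model='gpt-4o',
    # Bespoke closed-ended (yes/no) questions about the article's content
    assessment_questions=[
        "Does the article argue that knowledge workers must manage their own careers?",
        "Is identifying one's strengths presented as essential to managing oneself?",
        "Does the article say that people differ in how they work and perform best?",
        "Are personal values presented as something to align with one's organization?",
        "Does the article discuss deciding where one belongs?",
    ]
)

metric.measure(test_case)
print("SummarizationScore:", metric.score)
print("SummarizationReason:", metric.reason)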


# Enhancement

Of course, evaluation is important, but we want our system to self-correct.  

+ Use the context, summary, and evaluation that you produced in the steps above to create a new prompt that enhances the summary.
+ Evaluate the new summary using the same function.
+ Report your results. Did you get a better output? Why? Do you think these controls are enough?

Please, do not forget to add your comments.
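One possible way to close the loop is to feed the evaluator's scores and reasons back to the model as a revision request, then compare scores before and after. The assignment leaves the prompt design open. `ENHANCE_TEMPLATE`, `build_enhancement_prompt`, and `score_deltas` are hypothetical, and the report is assumed to be a flat mapping of `XScore` / `XReason` pairs, as specified in the evaluation section.

```python
from typing import Any, Mapping

# Hypothetical revision prompt combining context, current summary, and evaluator feedback
ENHANCE_TEMPLATE = """Revise the summary below so that it addresses the evaluator feedback.
Keep the tone: {tone}. Keep the summary under {max_tokens} tokens.

Evaluator feedback:
{feedback}

Current summary:
{summary}

Source document:
{context}"""


def build_enhancement_prompt(context: str, summary: str, report: Mapping[str, Any],
                             tone: str, max_tokens: int = 1000) -> str:
    """Turn an evaluation report (XScore / XReason pairs) into a revision prompt."""
    names = sorted(key[: -len("Score")] for key in report if key.endswith("Score"))
    feedback = "\n".join(
        f"- {name} (score {report[name + 'Score']:.2f}): {report.get(name + 'Reason', '')}"
        for name in names
    )
    return ENHANCE_TEMPLATE.format(tone=tone, max_tokens=max_tokens,
                                   feedback=feedback, summary=summary, context=context)


def score_deltas(before: Mapping[str, Any], after: Mapping[str, Any]) -> dict[str, float]:
    """Per-metric change in score between two reports (positive means improvement)."""
    return {key: after[key] - before[key]
            for key in before if key.endswith("Score") and key in after}
```

The resulting prompt can be sent as the user message of a new generation request. The new summary is then re-evaluated with the same metrics, and `score_deltas` shows where it improved. A higher score from the same judge model is weak evidence on its own, which is worth discussing in the comments.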


# Submission Information

🚨 **Please review our [Assignment Submission Guide](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md)** 🚨 for detailed instructions on how to format, branch, and submit your work. Following these guidelines is crucial for your submissions to be evaluated correctly.

## Submission Parameters

- The Submission Due Date is indicated in the [readme](../README.md#schedule) file.
- The branch name for your repo should be: assignment-1
- What to submit for this assignment:
    + This Jupyter Notebook (assignment_1.ipynb) should be populated and should be the only change in your pull request.
- What the pull request link should look like for this assignment: `https://github.com/<your_github_username>/production/pull/<pr_id>`
    + Open a private window in your browser. Copy and paste the link to your pull request into the address bar. Make sure you can see your pull request properly. This helps the technical facilitator and learning support staff review your submission easily.

## Checklist

+ Created a branch with the correct naming convention.
+ Ensured that the repository is public.
+ Reviewed the PR description guidelines and adhered to them.
+ Verified that the link is accessible in a private browser window.

If you encounter any difficulties or have questions, please don't hesitate to reach out to our team via our Slack. Our Technical Facilitators and Learning Support staff are here to help you navigate any challenges.
