# Summarization of financial data using a Large Language Model (LLM)

This notebook introduces how to document an LLM using the ValidMind Developer Framework. The use case is the summarization of financial news; the examples use the general-news CNN / DailyMail dataset (https://huggingface.co/datasets/cnn_dailymail). The notebook covers:

- Initializing the ValidMind Developer Framework
- Running various tests to quickly generate documentation about the data and model

## Before you begin

To use the ValidMind Developer Framework with a Jupyter notebook, you first need to get your Python environment ready, then install and initialize the client library.

If you don't already have one, you should also [create a documentation project](https://docs.validmind.ai/guide/create-your-first-documentation-project.html) on the ValidMind platform. You will use this project to upload your documentation and test results.

## Install the client library

In [1]:
%pip install -q validmind

Note: you may need to restart the kernel to use updated packages.


## Initialize the client library

In a browser, go to the **Client Integration** page of your documentation project and click **Copy to clipboard** next to the code snippet. This code snippet gives you the API key, API secret, and project identifier to link your notebook to your documentation project.

::: {.column-margin}
::: {.callout-tip}
This step requires a documentation project. [Learn how you can create one](https://docs.validmind.ai/guide/create-your-first-documentation-project.html).
:::
:::

Next, replace this placeholder with your own code snippet:

In [2]:
import validmind as vm

vm.init(
  api_host = "http://localhost:3000/api/v1/tracking",
  api_key = "2494c3838f48efe590d531bfe225d90b",
  api_secret = "4f692f8161f128414fef542cab2a4e74834c75d01b3a8e088a1834f2afcfe838",
  project = "clmotr3oa000aesy6c54wqb6g"
)

2023-10-05 21:29:45,065 - INFO(validmind.api_client): Connected to ValidMind. Project: [12] Credit Risk Scorecard - Initial Validation (clmotr3oa000aesy6c54wqb6g)


## Helper functions

Let's define the following functions to help visualize datasets with long text fields.

In [3]:
import textwrap

from IPython.display import display, HTML
from tabulate import tabulate

def _format_cell_text(text, width=50):  
    """Private function to format a cell's text."""
    return '\n'.join([textwrap.fill(line, width=width) for line in text.split('\n')])

def _format_dataframe_for_tabulate(df):
    """Private function to format the entire DataFrame for tabulation."""
    df_out = df.copy()
    
    # Format all string columns
    for column in df_out.columns:
        if df_out[column].dtype == object:  # Check if column is of type object (likely strings)
            df_out[column] = df_out[column].apply(_format_cell_text)
    return df_out

def _dataframe_to_html_table(df):
    """Private function to convert a DataFrame to an HTML table."""
    headers = df.columns.tolist()
    table_data = df.values.tolist()
    return tabulate(table_data, headers=headers, tablefmt="html")

def display_formatted_dataframe(df, num_rows=None):
    """Primary function to format and display a DataFrame."""
    if num_rows is not None:
        df = df.head(num_rows)
    formatted_df = _format_dataframe_for_tabulate(df)
    html_table = _dataframe_to_html_table(formatted_df)
    display(HTML(html_table))
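
To make the wrapping step concrete, here is a minimal standalone sketch of what `_format_cell_text` does to a long cell. The `wrap_cell` name and the sample string are hypothetical and used only for illustration; the logic mirrors the helper above using `textwrap.fill`.

```python
import textwrap

def wrap_cell(text, width=50):
    """Wrap each existing line of `text` separately, as _format_cell_text does."""
    return "\n".join(textwrap.fill(line, width=width) for line in text.split("\n"))

# Hypothetical long cell value: 30 repetitions of a short word.
sample = ("word " * 30).strip()

wrapped = wrap_cell(sample, width=50)
lines = wrapped.split("\n")
longest = max(len(line) for line in lines)

print(wrapped)
print("number of lines:", len(lines))
print("longest line:", longest)
```

Wrapping line by line, rather than calling `textwrap.fill` once on the whole cell, keeps any line breaks already present in the text.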

## Load data

### CNN dataset

The CNN / DailyMail Dataset is an English-language dataset containing just over 300k unique news articles as written by journalists at CNN and the Daily Mail. The current version supports both extractive and abstractive summarization, though the original version was created for machine reading and comprehension and abstractive question answering.

In [4]:
from datasets import load_dataset

cnn_dataset = load_dataset('cnn_dailymail', '3.0.0')

# Convert each split to a pandas DataFrame
train_df = cnn_dataset['train'].to_pandas()
test_df = cnn_dataset['test'].to_pandas()
val_df = cnn_dataset['validation'].to_pandas()

train_df = train_df[['article','highlights']]
test_df = test_df[['article','highlights']]

train_df = train_df.head(5)
test_df = test_df.head(5)

In [5]:
display_formatted_dataframe(train_df, num_rows=2)

article,highlights
"LONDON, England (Reuters) -- Harry Potter star Daniel Radcliffe gains access to a reported £20 million ($41.1 million) fortune as he turns 18 on Monday, but he insists the money won't cast a spell on him. Daniel Radcliffe as Harry Potter in ""Harry Potter and the Order of the Phoenix"" To the disappointment of gossip columnists around the world, the young actor says he has no plans to fritter his cash away on fast cars, drink and celebrity parties. ""I don't plan to be one of those people who, as soon as they turn 18, suddenly buy themselves a massive sports car collection or something similar,"" he told an Australian interviewer earlier this month. ""I don't think I'll be particularly extravagant. ""The things I like buying are things that cost about 10 pounds -- books and CDs and DVDs."" At 18, Radcliffe will be able to gamble in a casino, buy a drink in a pub or see the horror film ""Hostel: Part II,"" currently six places below his number one movie on the UK box office chart. Details of how he'll mark his landmark birthday are under wraps. His agent and publicist had no comment on his plans. ""I'll definitely have some sort of party,"" he said in an interview. ""Hopefully none of you will be reading about it."" Radcliffe's earnings from the first five Potter films have been held in a trust fund which he has not been able to touch. Despite his growing fame and riches, the actor says he is keeping his feet firmly on the ground. ""People are always looking to say 'kid star goes off the rails,'"" he told reporters last month. ""But I try very hard not to go that way because it would be too easy for them."" His latest outing as the boy wizard in ""Harry Potter and the Order of the Phoenix"" is breaking records on both sides of the Atlantic and he will reprise the role in the last two films. Watch I-Reporter give her review of Potter's latest » . There is life beyond Potter, however. 
The Londoner has filmed a TV movie called ""My Boy Jack,"" about author Rudyard Kipling and his son, due for release later this year. He will also appear in ""December Boys,"" an Australian film about four boys who escape an orphanage. Earlier this year, he made his stage debut playing a tortured teenager in Peter Shaffer's ""Equus."" Meanwhile, he is braced for even closer media scrutiny now that he's legally an adult: ""I just think I'm going to be more sort of fair game,"" he told Reuters. E-mail to a friend . Copyright 2007 Reuters. All rights reserved.This material may not be published, broadcast, rewritten, or redistributed.",Harry Potter star Daniel Radcliffe gets £20M fortune as he turns 18 Monday . Young actor says he has no plans to fritter his cash away . Radcliffe's earnings from first five Potter films have been held in trust fund .
"Editor's note: In our Behind the Scenes series, CNN correspondents share their experiences in covering news and analyze the stories behind the events. Here, Soledad O'Brien takes users inside a jail where many of the inmates are mentally ill. An inmate housed on the ""forgotten floor,"" where many mentally ill inmates are housed in Miami before trial. MIAMI, Florida (CNN) -- The ninth floor of the Miami-Dade pretrial detention facility is dubbed the ""forgotten floor."" Here, inmates with the most severe mental illnesses are incarcerated until they're ready to appear in court. Most often, they face drug charges or charges of assaulting an officer --charges that Judge Steven Leifman says are usually ""avoidable felonies."" He says the arrests often result from confrontations with police. Mentally ill people often won't do what they're told when police arrive on the scene -- confrontation seems to exacerbate their illness and they become more paranoid, delusional, and less likely to follow directions, according to Leifman. So, they end up on the ninth floor severely mentally disturbed, but not getting any real help because they're in jail. We toured the jail with Leifman. He is well known in Miami as an advocate for justice and the mentally ill. Even though we were not exactly welcomed with open arms by the guards, we were given permission to shoot videotape and tour the floor. Go inside the 'forgotten floor' » . At first, it's hard to determine where the people are. The prisoners are wearing sleeveless robes. Imagine cutting holes for arms and feet in a heavy wool sleeping bag -- that's kind of what they look like. They're designed to keep the mentally ill patients from injuring themselves. That's also why they have no shoes, laces or mattresses. Leifman says about one-third of all people in Miami-Dade county jails are mentally ill. So, he says, the sheer volume is overwhelming the system, and the result is what we see on the ninth floor. 
Of course, it is a jail, so it's not supposed to be warm and comforting, but the lights glare, the cells are tiny and it's loud. We see two, sometimes three men -- sometimes in the robes, sometimes naked, lying or sitting in their cells. ""I am the son of the president. You need to get me out of here!"" one man shouts at me. He is absolutely serious, convinced that help is on the way -- if only he could reach the White House. Leifman tells me that these prisoner-patients will often circulate through the system, occasionally stabilizing in a mental hospital, only to return to jail to face their charges. It's brutally unjust, in his mind, and he has become a strong advocate for changing things in Miami. Over a meal later, we talk about how things got this way for mental patients. Leifman says 200 years ago people were considered ""lunatics"" and they were locked up in jails even if they had no charges against them. They were just considered unfit to be in society. Over the years, he says, there was some public outcry, and the mentally ill were moved out of jails and into hospitals. But Leifman says many of these mental hospitals were so horrible they were shut down. Where did the patients go? Nowhere. The streets. They became, in many cases, the homeless, he says. They never got treatment. Leifman says in 1955 there were more than half a million people in state mental hospitals, and today that number has been reduced 90 percent, and 40,000 to 50,000 people are in mental hospitals. The judge says he's working to change this. Starting in 2008, many inmates who would otherwise have been brought to the ""forgotten floor"" will instead be sent to a new mental health facility -- the first step on a journey toward long-term treatment, not just punishment. Leifman says it's not the complete answer, but it's a start. Leifman says the best part is that it's a win-win solution. 
The patients win, the families are relieved, and the state saves money by simply not cycling these prisoners through again and again. And, for Leifman, justice is served. E-mail to a friend .","Mentally ill inmates in Miami are housed on the ""forgotten floor"" Judge Steven Leifman says most are there as a result of ""avoidable felonies"" While CNN tours facility, patient shouts: ""I am the son of the president"" Leifman says the system is unjust and he's fighting for change ."


In [6]:
display_formatted_dataframe(test_df, num_rows=2)

article,highlights
"(CNN)The Palestinian Authority officially became the 123rd member of the International Criminal Court on Wednesday, a step that gives the court jurisdiction over alleged crimes in Palestinian territories. The formal accession was marked with a ceremony at The Hague, in the Netherlands, where the court is based. The Palestinians signed the ICC's founding Rome Statute in January, when they also accepted its jurisdiction over alleged crimes committed ""in the occupied Palestinian territory, including East Jerusalem, since June 13, 2014."" Later that month, the ICC opened a preliminary examination into the situation in Palestinian territories, paving the way for possible war crimes investigations against Israelis. As members of the court, Palestinians may be subject to counter-charges as well. Israel and the United States, neither of which is an ICC member, opposed the Palestinians' efforts to join the body. But Palestinian Foreign Minister Riad al-Malki, speaking at Wednesday's ceremony, said it was a move toward greater justice. ""As Palestine formally becomes a State Party to the Rome Statute today, the world is also a step closer to ending a long era of impunity and injustice,"" he said, according to an ICC news release. ""Indeed, today brings us closer to our shared goals of justice and peace."" Judge Kuniko Ozaki, a vice president of the ICC, said acceding to the treaty was just the first step for the Palestinians. ""As the Rome Statute today enters into force for the State of Palestine, Palestine acquires all the rights as well as responsibilities that come with being a State Party to the Statute. These are substantive commitments, which cannot be taken lightly,"" she said. Rights group Human Rights Watch welcomed the development. 
""Governments seeking to penalize Palestine for joining the ICC should immediately end their pressure, and countries that support universal acceptance of the court's treaty should speak out to welcome its membership,"" said Balkees Jarrah, international justice counsel for the group. ""What's objectionable is the attempts to undermine international justice, not Palestine's decision to join a treaty to which over 100 countries around the world are members."" In January, when the preliminary ICC examination was opened, Israeli Prime Minister Benjamin Netanyahu described it as an outrage, saying the court was overstepping its boundaries. The United States also said it ""strongly"" disagreed with the court's decision. ""As we have said repeatedly, we do not believe that Palestine is a state and therefore we do not believe that it is eligible to join the ICC,"" the State Department said in a statement. It urged the warring sides to resolve their differences through direct negotiations. ""We will continue to oppose actions against Israel at the ICC as counterproductive to the cause of peace,"" it said. But the ICC begs to differ with the definition of a state for its purposes and refers to the territories as ""Palestine."" While a preliminary examination is not a formal investigation, it allows the court to review evidence and determine whether to investigate suspects on both sides. Prosecutor Fatou Bensouda said her office would ""conduct its analysis in full independence and impartiality."" The war between Israel and Hamas militants in Gaza last summer left more than 2,000 people dead. The inquiry will include alleged war crimes committed since June. The International Criminal Court was set up in 2002 to prosecute genocide, crimes against humanity and war crimes. CNN's Vasco Cotovio, Kareem Khadder and Faith Karimi contributed to this report.","Membership gives the ICC jurisdiction over alleged crimes committed in Palestinian territories since last June . 
Israel and the United States opposed the move, which could open the door to war crimes investigations against Israelis ."
"(CNN)Never mind cats having nine lives. A stray pooch in Washington State has used up at least three of her own after being hit by a car, apparently whacked on the head with a hammer in a misguided mercy killing and then buried in a field -- only to survive. That's according to Washington State University, where the dog -- a friendly white-and-black bully breed mix now named Theia -- has been receiving care at the Veterinary Teaching Hospital. Four days after her apparent death, the dog managed to stagger to a nearby farm, dirt- covered and emaciated, where she was found by a worker who took her to a vet for help. She was taken in by Moses Lake, Washington, resident Sara Mellado. ""Considering everything that she's been through, she's incredibly gentle and loving,"" Mellado said, according to WSU News. ""She's a true miracle dog and she deserves a good life."" Theia is only one year old but the dog's brush with death did not leave her unscathed. She suffered a dislocated jaw, leg injuries and a caved-in sinus cavity -- and still requires surgery to help her breathe. The veterinary hospital's Good Samaritan Fund committee awarded some money to help pay for the dog's treatment, but Mellado has set up a fundraising page to help meet the remaining cost of the dog's care. She's also created a Facebook page to keep supporters updated. Donors have already surpassed the $10,000 target, inspired by Theia's tale of survival against the odds. On the fundraising page, Mellado writes, ""She is in desperate need of extensive medical procedures to fix her nasal damage and reset her jaw. I agreed to foster her until she finally found a loving home."" She is dedicated to making sure Theia gets the medical attention she needs, Mellado adds, and wants to ""make sure she gets placed in a family where this will never happen to her again!"" Any additional funds raised will be ""paid forward"" to help other animals. 
Theia is not the only animal to apparently rise from the grave in recent weeks. A cat in Tampa, Florida, found seemingly dead after he was hit by a car in January, showed up alive in a neighbor's yard five days after he was buried by his owner. The cat was in bad shape, with maggots covering open wounds on his body and a ruined left eye, but remarkably survived with the help of treatment from the Humane Society.","Theia, a bully breed mix, was apparently hit by a car, whacked with a hammer and buried in a field . ""She's a true miracle dog and she deserves a good life,"" says Sara Mellado, who is looking for a home for Theia ."


## Model selection

In [7]:
from validmind.models import FoundationModel, Prompt

In [8]:
import os

import dotenv
dotenv.load_dotenv()

if os.getenv("OPENAI_API_KEY") is None:
    raise Exception("OPENAI_API_KEY not found")

In [9]:
import openai

def call_model(prompt):
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": prompt},
        ]
    ).choices[0].message["content"]

## Prompt engineering

In [10]:
prompt_template = """
You are an AI with expertise in summarizing financial news. 
Your task is to provide a concise summary of the specific news article provided below.
Before proceeding, take a moment to understand the context and nuances of the financial terminology used in the article.

Article to Summarize:

```
{article}
```

Please respond with a concise summary of the article's main points.
Ensure that your summary is based on the content of the article and not on external information or assumptions.
""".strip()

prompt_variables = ["article"]
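
As a rough illustration of how a template and its variable list fit together, the sketch below fills a template from one row of data. It assumes substitution behaves like Python's `str.format`; the notebook does not show how `FoundationModel` fills the template internally, so treat this as an approximation. The shortened `demo_template`, the `render_prompt` helper, and the sample row are all hypothetical.

```python
# Hypothetical, shortened template used only for illustration.
demo_template = "Summarize the article below.\n\nArticle:\n{article}\n\nSummary:"
demo_variables = ["article"]

def render_prompt(template, variables, row):
    """Fill each named variable in the template from a dict-like row.

    Assumption: substitution works like str.format, which may differ from
    what the ValidMind library does internally.
    """
    values = {name: row[name] for name in variables}
    return template.format(**values)

# Hypothetical row with the same column names as the notebook's DataFrames.
row = {
    "article": "Shares rose 2% after the earnings call.",
    "highlights": "Shares up 2%.",
}

rendered = render_prompt(demo_template, demo_variables, row)
print(rendered)
```

Printing a rendered prompt for one or two rows is a cheap way to check the final text before sending anything to the model.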

## Initialize ValidMind datasets and models

In [11]:
# In this context 'vm_test_ds' is the model evaluation dataset. It is used for predictions
vm_test_ds = vm.init_dataset(
    dataset=test_df,
    text_column="article",
    target_column="highlights",
)

vm_model = FoundationModel(
    predict_fn=call_model,
    prompt=Prompt(
        template=prompt_template,
        variables=prompt_variables,
    ),
    test_ds=vm_test_ds,
)

2023-10-05 21:29:47,970 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
2023-10-05 21:29:47,980 - INFO(validmind.models.foundation): Running predict() for `test_ds`... This may take a while


In [12]:
vm.test_plans.describe_plan("summarization_metrics")

ID,Name,Description,Tests
summarization_metrics,SummarizationMetrics,Test plan for Summarization metrics,RougeMetrics (Metric) TokenDisparity (Metric) BleuScore (Metric) BertScore (Metric) ContextualRecall (Metric) SummarizationPredictions (Metric)
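
For intuition about what an overlap metric such as `RougeMetrics` measures, the following sketch computes ROUGE-1 (unigram overlap) by hand. It is a simplified illustration with whitespace tokenization, not the ValidMind implementation; the `rouge1` function and the example sentences are hypothetical.

```python
from collections import Counter

def rouge1(reference, candidate):
    """Compute ROUGE-1 precision, recall, and F1 from unigram counts.

    Simplification: tokens are lowercased and split on whitespace only.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())

    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical reference summary and model output.
scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

Here five of the six unigrams match, so precision, recall, and F1 all equal 5/6. Recall is measured against the reference summary (the `highlights` column in this notebook) and precision against the generated summary.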


In [13]:
params = {
    "display_limit": 1,
    "text_columns": None
}

metric = vm.tests.run_test("validmind.model_validation.DisplayTextDataset", 
                  dataset=vm_test_ds, 
                  params=params)


In [14]:
params = {
    "display_limit": 1
}

metric = vm.tests.run_test("validmind.model_validation.SummarizationPredictions", 
                  model=vm_model, 
                  params=params)


In [None]:
#config={
#    "rouge_metric": {
#        "rouge_metrics": ["rouge-1","rouge-2", "rouge-l"],
#    },
#}
#summarization_metrics = vm.run_test_plan("summarization_metrics", 
#                                             model=vm_model,
#                                             config=config)