<a href="https://colab.research.google.com/github/volynsal/Earnings-Call-Summarization/blob/main/Earnings_Call_Summarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
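# The openai.Completion interface used in this notebook exists only in openai versions before 1.0.
!pip install "openai<1" -q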


Once the openai package is installed, we need to import it.

In [None]:
import openai

Next, we need to give the OpenAI Python package an API key. To avoid exposing the key in the notebook, I'll import getpass and enter the key at a hidden prompt:

In [None]:
from getpass import getpass
openai.api_key = getpass()

··········
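As an alternative to typing the key on every run, the key could be read from an environment variable, falling back to the hidden prompt when the variable is missing. The following is a minimal sketch, assuming a variable named `OPENAI_API_KEY` (a common convention, not something this notebook sets up); `load_api_key` is a hypothetical helper, not part of the openai package.

```python
import os
from getpass import getpass


def load_api_key(env=os.environ, prompt_fn=getpass):
    """Return the API key from the environment, else ask for it interactively.

    OPENAI_API_KEY is an assumed variable name; load_api_key is a
    hypothetical helper, not part of the openai package.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Hidden prompt, same mechanism as the getpass() call above.
        key = prompt_fn("OpenAI API key: ").strip()
    return key
```

With this in place, the assignment would become `openai.api_key = load_api_key()`.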


First, let's read the earnings call text into a variable. We can use the requests library to fetch the raw text of Apple's Q4 2022 earnings call:

In [None]:
import requests

url = "https://pastebin.com/raw/PbTrYEPg"
response = requests.get(url)
transcript = response.text

transcript

"Apple Inc. (NASDAQ:AAPL) Q4 2022 Earnings Conference Call October 27, 2022 5:00 PM ET\r\n\r\nCompany Participants\r\n\r\nTejas Gala - Director-Investor Relations & Corporate Finance\r\n\r\nTim Cook - Chief Executive Officer\r\n\r\nLuca Maestri - Chief Financial Officer\r\n\r\nConference Call Participants\r\n\r\nShannon Cross - Credit Suisse\r\n\r\nErik Woodring - Morgan Stanley\r\n\r\nBen Bollin - Cleveland Research\r\n\r\nKyle McNealy - Jefferies\r\n\r\nJim Suva - Citigroup\r\n\r\nAmit Daryanani - Evercore\r\n\r\nHarsh Kumar - Piper Sandler\r\n\r\nKrish Sankar - Cowen & Company\r\n\r\nOperator\r\n\r\nGood day, and welcome to the Apple Q4 Fiscal Year 2022 Earnings Conference Call. For your information, today's call is being recorded.\r\n\r\nAt this time, for opening remarks and introductions, I would like to turn the call over to Tejas Gala, Director of Investor Relations and Corporate Finance. Please go ahead.\r\n\r\nTejas Gala\r\n\r\nSpeaking first today is Apple's CEO, Tim Cook; an

Now that we have the earnings text in the transcript variable, let's turn it into a summarization prompt. To do this, we can simply append two newlines and "tl;dr:" to the text:

In [None]:
prompt = f"{transcript}\n\ntl;dr:"

prompt

"Apple Inc. (NASDAQ:AAPL) Q4 2022 Earnings Conference Call October 27, 2022 5:00 PM ET\r\n\r\nCompany Participants\r\n\r\nTejas Gala - Director-Investor Relations & Corporate Finance\r\n\r\nTim Cook - Chief Executive Officer\r\n\r\nLuca Maestri - Chief Financial Officer\r\n\r\nConference Call Participants\r\n\r\nShannon Cross - Credit Suisse\r\n\r\nErik Woodring - Morgan Stanley\r\n\r\nBen Bollin - Cleveland Research\r\n\r\nKyle McNealy - Jefferies\r\n\r\nJim Suva - Citigroup\r\n\r\nAmit Daryanani - Evercore\r\n\r\nHarsh Kumar - Piper Sandler\r\n\r\nKrish Sankar - Cowen & Company\r\n\r\nOperator\r\n\r\nGood day, and welcome to the Apple Q4 Fiscal Year 2022 Earnings Conference Call. For your information, today's call is being recorded.\r\n\r\nAt this time, for opening remarks and introductions, I would like to turn the call over to Tejas Gala, Director of Investor Relations and Corporate Finance. Please go ahead.\r\n\r\nTejas Gala\r\n\r\nSpeaking first today is Apple's CEO, Tim Cook; an

Now that we have a prompt, let's call OpenAI using this prompt:

In [None]:
response = openai.Completion.create(
    engine="text-davinci-003",
    prompt=prompt,
    temperature=0.3,  # Sampling temperature: lower values make the output more focused and deterministic, higher values more varied.
    max_tokens=140,  # Upper bound on the length of the generated summary, in tokens.
    top_p=1,  # Nucleus sampling: only tokens within the top_p share of probability mass are considered; 1 applies no cutoff.
    frequency_penalty=0,
    presence_penalty=1  # Positive values penalize tokens that have already appeared, nudging the model toward new topics.
)

Notice that we receive an error. The API cannot accept the entire document at once: the prompt and the requested completion together must fit within the model's token limit. To get around this, we'll split the document into smaller chunks, generate a prompt for each chunk, and ask OpenAI to summarize each one. At the end we'll concatenate the responses into a single summary.
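To get a feel for how many chunks a transcript needs, the token count can be estimated from the character count. The sketch below is only a rough heuristic, not an exact tokenizer: it assumes about four characters per token for English text and a context window of 4,000 tokens, and both constants as well as the helper names `estimate_tokens` and `chunks_needed` are assumptions made for illustration. Check the model's documentation for the real limit.

```python
import math

CHARS_PER_TOKEN = 4    # rough rule of thumb for English text, not an exact tokenizer
CONTEXT_TOKENS = 4000  # assumed context window; check the model's documentation


def estimate_tokens(text):
    """Approximate the token count from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def chunks_needed(text, max_completion_tokens, context_tokens=CONTEXT_TOKENS):
    """Estimate how many pieces the text must be split into so that each
    piece plus its completion fits in the context window."""
    budget = context_tokens - max_completion_tokens
    if budget <= 0:
        raise ValueError("max_completion_tokens leaves no room for the prompt")
    return max(1, math.ceil(estimate_tokens(text) / budget))
```

Calling `chunks_needed(transcript, 150)` would give a lower bound on the number of chunks; leaving some headroom above that estimate is sensible because the heuristic can undercount.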

There are many ways to slice and dice text. Let's first split the text into a list of words, then divide that list into six roughly equal chunks using numpy's array_split.

In [None]:
import numpy as np

words = transcript.split(" ")
chunks = np.array_split(words, 6)

chunks

[array(['Apple', 'Inc.', '(NASDAQ:AAPL)', ..., 'Yoga', 'for', 'Every'],
       dtype='<U32'),
 array(['Runner', 'featuring', 'and', ..., 'year-over-year,', 'driven',
        'by'], dtype='<U32'),
 array(['the', 'launch', 'of', ..., 'will', 'be', 'until'], dtype='<U32'),
 array(['we', 'can', 'satisfy', ..., 'been', 'fairly', 'stable,'],
       dtype='<U32'),
 array(['and', 'I', 'think', ..., 'please.\r\n\r\nOperator\r\n\r\nYes,',
        'sir.', "We'll"], dtype='<U32'),
 array(['now', 'move', 'on', ..., 'appreciate', 'your', 'participation.'],
       dtype='<U32')]
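Two details in the output above are worth noting: splitting on a single space leaves line breaks fused into some "words" (for example `'please.\r\n\r\nOperator\r\n\r\nYes,'`), and a fixed count of six chunks does not adapt to longer or shorter transcripts. One possible variation, sketched here in plain Python, splits on any whitespace and caps each chunk at a word budget. `chunk_words` is a hypothetical helper and the budget is an arbitrary choice, not a value taken from this notebook.

```python
def chunk_words(text, max_words):
    """Split text into lists of at most max_words words each."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()  # no argument: splits on any run of whitespace
    return [words[i:i + max_words] for i in range(0, len(words), max_words)]


sample = "Yes,\r\n\r\nsir. We'll now move on"
print(chunk_words(sample, 3))
# [['Yes,', 'sir.', "We'll"], ['now', 'move', 'on']]
```

The trade-off is that paragraph breaks and speaker boundaries are discarded, so a chunk can still start or end mid-sentence, just as with `np.array_split`.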

Let's now loop through all of the chunks and save the responses in a list called summary_responses. At the end we'll join all of the responses into a single summary and check out the output.

In [None]:
summary_responses = []

for chunk in chunks:
    
    sentences = ' '.join(list(chunk))

    prompt = f"{sentences}\n\ntl;dr:"

    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=prompt,
        temperature=0.3,  # Sampling temperature: lower values make the output more focused and deterministic, higher values more varied.
        max_tokens=150,  # Upper bound on the length of each chunk summary, in tokens.
        top_p=1,  # Nucleus sampling: only tokens within the top_p share of probability mass are considered; 1 applies no cutoff.
        frequency_penalty=0,
        presence_penalty=1  # Positive values penalize tokens that have already appeared, nudging the model toward new topics.
    )

    response_text = response["choices"][0]["text"]
    summary_responses.append(response_text)

full_summary = "".join(summary_responses)

print("full summary\n")
print(full_summary)

full summary



Apple Inc. reported record revenue of $90.1 billion for the September quarter, despite foreign currency headwinds. The company set records for iPhone, Mac, Wearables, Home and Accessories and Services while growing double digits in emerging markets and setting records in the vast majority of markets it tracks. Apple achieved revenue of $394 billion for fiscal 2022, representing 8% annual growth. The company released its iPhone 14 lineup, tenth generation iPad, M2-powered MacBook Air and MacBook Pro, and second generation AirPods Pro. Apple also released iOS 16, watchOS 9, iPadOS 16 and macOS Ventura. Finally, Apple TV+ hit stores next week and earned 9 Emmys at the 74th Primetime Emmy Awards. Apple reported record financial results for the September quarter, with new September quarter records in the Americas, Europe, Greater China and rest of Asia Pacific. Products revenue was $71 billion, up 9% over last year despite FX headwinds and a record for the September quarter.
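The loop above joins the responses with an empty string, so the spacing between summaries depends on whatever whitespace each completion happens to start with. A small refactoring, sketched below, strips each response and joins them with a single space. It also takes the API call as a parameter so the collation logic can be tried without an API key. `summarize_chunks`, `complete_fn`, and `fake_complete` are hypothetical names introduced for illustration only.

```python
def summarize_chunks(chunks, complete_fn, suffix="\n\ntl;dr:"):
    """Summarize each chunk with complete_fn and join the results.

    complete_fn is a hypothetical callable that takes a prompt string and
    returns the completion text, e.g. a thin wrapper around the
    openai.Completion.create call above.
    """
    summaries = []
    for chunk in chunks:
        prompt = " ".join(chunk) + suffix
        text = complete_fn(prompt).strip()
        if text:  # skip empty completions
            summaries.append(text)
    return " ".join(summaries)


# Stand-in for the API so the sketch runs offline: echoes the first word.
def fake_complete(prompt):
    return " " + prompt.split()[0] + ". "


print(summarize_chunks([["Revenue", "grew"], ["Margins", "held"]], fake_complete))
# Revenue. Margins.
```

In the notebook, `complete_fn` would wrap the `openai.Completion.create` call and return `response["choices"][0]["text"]`.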