# Earnings Call Transcript Retrieval & Processing

This document describes a workflow for retrieving earnings call transcripts from the API Ninjas earnings transcript endpoint, consolidating them into a structured format, and downloading the result.

---

## 1. Fetching Earnings Call Transcripts
### Description
- Retrieves earnings call transcripts from the API Ninjas `earningstranscript` endpoint.
- Requests a transcript for every combination of company ticker and quarter in the configured year.
- Pauses two seconds after each request to reduce the risk of API throttling.
- Sends the API key in the `X-Api-Key` request header; replace the placeholder with your own key and avoid sharing a notebook that contains it.
- Shows a `tqdm` progress bar for real-time feedback on data retrieval.
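
One way to keep the key out of the notebook source is to read it at runtime. The sketch below is not part of the original notebook: the helper name `load_api_key` and the environment variable `API_NINJAS_KEY` are assumptions made for illustration. Note that `getpass` only hides what is typed; it does not encrypt anything.

```python
import os
from getpass import getpass


def load_api_key(env_var="API_NINJAS_KEY", prompt=getpass):
    """Return the API key from the environment, else ask for it without echoing.

    `env_var` and this helper are hypothetical names, not part of the notebook.
    """
    key = (os.environ.get(env_var) or "").strip()
    if not key:
        # getpass hides the typed characters so the key never shows in cell output
        key = prompt("API Ninjas key: ").strip()
    if not key:
        raise ValueError("No API key provided")
    return key
```

In the notebook, `API_KEY = load_api_key()` could then replace the hard-coded placeholder string.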

---



In [3]:
import requests
import pandas as pd
import time
import os
from google.colab import files
from tqdm import tqdm  # Import tqdm for progress bar

# Replace the placeholder with your own API key; do not commit or share a real key
API_KEY = "your api key here"

# Function to fetch transcript
def fetch_transcript(ticker, year, quarter, api_key):
    """
    Fetches earnings call transcript for a given company ticker, year, and quarter.

    Parameters:
        ticker (str): Stock ticker symbol.
        year (int): Year of the earnings call.
        quarter (int): Quarter number (1-4).
        api_key (str): Your API key for the earnings transcript service.

    Returns:
        dict: Transcript data if successful, None otherwise.
    """
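    api_url = "https://api.api-ninjas.com/v1/earningstranscript"
    params = {"ticker": ticker, "year": year, "quarter": quarter}
    headers = {"X-Api-Key": api_key}

    try:
        # Let requests build the query string; the timeout stops a stalled call from hanging the loop
        response = requests.get(api_url, params=params, headers=headers, timeout=30)
    except requests.RequestException:
        return None  # Treat network errors like a missing transcript
    finally:
        time.sleep(2)  # Fixed pause after every request to avoid API rate limits

    if response.status_code == 200:
        return response.json()
    return None  # No print, to keep the output clean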

# List of top 10 AI-focused tech companies
ai_company_tickers = ["AAPL", "MSFT", "NVDA", "AMZN", "GOOGL", "META", "IBM", "INTC", "ORCL", "TSM"]

# Additional companies from various industries
additional_companies = {
    "Financials": ["JPM", "BAC", "WFC", "C", "GS"],
    "Health Care": ["JNJ", "PFE", "MRK", "ABBV", "TMO"],
    "Consumer Discretionary": ["HD", "NKE", "MCD", "SBUX", "LOW"],
    "Industrials": ["BA", "CAT", "MMM", "GE", "UNP"],
    "Energy": ["XOM", "CVX", "COP", "SLB", "PSX"],
    "Utilities": ["NEE", "DUK", "SO", "D", "AEP"],
    "Communication Services": ["VZ", "T", "CMCSA", "NFLX", "DIS"],
    "Materials": ["LIN", "DOW", "DD", "NEM", "FCX"],
    "Real Estate": ["AMT", "PLD", "CCI", "EQIX", "PSA"]
}

# Combine AI companies with additional companies
company_tickers = {"AI-focused Tech": ai_company_tickers}
company_tickers.update(additional_companies)

# Define the year and quarters
year = 2024
quarters = [1, 2, 3, 4]  # Four quarters for 2024

# Store structured transcript data
transcript_list = []

# Count total requests for progress bar
total_requests = sum(len(tickers) * len(quarters) for tickers in company_tickers.values())

# Progress bar setup
with tqdm(total=total_requests, desc="Fetching Transcripts", unit="call") as pbar:
    for industry, tickers in company_tickers.items():
        for ticker in tickers:
            for quarter in quarters:
                transcript_data = fetch_transcript(ticker, year, quarter, API_KEY)

                if transcript_data and "transcript_split" in transcript_data:
                    for segment in transcript_data["transcript_split"]:
                        transcript_list.append({
                            "Ticker": ticker,
                            "Industry": industry,
                            "Year": year,
                            "Quarter": quarter,
                            "Date": transcript_data.get("date", "Unknown"),
                            "Speaker": segment["speaker"],
                            "Text": segment["text"]
                        })

                pbar.update(1)  # Update progress bar after each API call

# Convert to DataFrame
df_transcripts = pd.DataFrame(transcript_list)




Fetching Transcripts: 100%|██████████| 220/220 [10:10<00:00,  2.77s/call]



 Transcripts have been saved and downloaded: Earnings_Call_Transcripts_2024.csv
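
The fixed two-second pause in `fetch_transcript` spaces requests out but does not react when a request is actually throttled. A minimal sketch of retrying with exponential backoff follows. It assumes the service signals throttling with HTTP 429, which this notebook does not confirm, and `fetch_with_retry` is a hypothetical helper rather than part of the original code.

```python
import time


def fetch_with_retry(get, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call get() until it returns HTTP 200 or the attempts run out.

    `get` is any zero-argument callable returning an object with
    `status_code` and `json()`, such as a wrapped `requests.get` call.
    """
    for attempt in range(max_attempts):
        response = get()
        if response.status_code == 200:
            return response.json()
        # Assumption: only HTTP 429 (throttled) is worth retrying
        if response.status_code != 429 or attempt == max_attempts - 1:
            return None
        sleep(base_delay * 2 ** attempt)  # wait 2 s, then 4 s, then 8 s, ...
    return None
```

Inside `fetch_transcript`, the request could be passed in as `fetch_with_retry(lambda: requests.get(api_url, headers=headers, timeout=30))`.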


---

## 2. Structuring & Consolidating Transcripts
### Description
- Groups earnings call data by:
  - Ticker
  - Industry
  - Year & Quarter
  - Call Date
- Keeps every segment attributed to its speaker, so each row of the preview below pairs a `Speaker` with the `Text` they spoke.

---

In [4]:
df_transcripts.head(100)

| | Ticker | Industry | Year | Quarter | Date | Speaker | Text |
|---|---|---|---|---|---|---|---|
| 0 | AAPL | AI-focused Tech | 2024 | 1 | 2024-02-01 | Operator | Good day, and welcome to the Apple Q1 Fiscal Y... |
| 1 | AAPL | AI-focused Tech | 2024 | 1 | 2024-02-01 | Suhasini Chandramouli | Thank you for joining us. Speaking first today... |
| 2 | AAPL | AI-focused Tech | 2024 | 1 | 2024-02-01 | Tim Cook | Thank you. Suhasini. Good afternoon, everyone,... |
| 3 | AAPL | AI-focused Tech | 2024 | 1 | 2024-02-01 | Luca Maestri | Thank you, Tim, and good afternoon, everyone. ... |
| 4 | AAPL | AI-focused Tech | 2024 | 1 | 2024-02-01 | Suhasini Chandramouli | Thank you, Luca. We ask that you limit yoursel... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | AAPL | AI-focused Tech | 2024 | 2 | 2024-05-02 | Suhasini Chandramouli | Thanks, Ben. Can we have the next question, pl... |
| 96 | AAPL | AI-focused Tech | 2024 | 2 | 2024-05-02 | Operator | Thank you. Our next question is from Krish San... |
| 97 | AAPL | AI-focused Tech | 2024 | 2 | 2024-05-02 | Krish Sankar | Yes, hi. Thanks for taking my question. Again,... |
| 98 | AAPL | AI-focused Tech | 2024 | 2 | 2024-05-02 | Tim Cook | Our focus on enterprise has been and you know ... |


- Consolidation joins all segments of a call into a single transcript text block per earnings call, with each segment prefixed by its speaker in square brackets.
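
To make the consolidation step concrete, here is a small sketch on a toy DataFrame. The ticker `XYZ` and its three segments are made-up illustration data, not real transcript content, and only two of the grouping columns are used to keep the example short.

```python
import pandas as pd

# Hypothetical toy segments standing in for df_transcripts
segments = pd.DataFrame({
    "Ticker": ["XYZ", "XYZ", "XYZ"],
    "Quarter": [1, 1, 2],
    "Speaker": ["Operator", "CEO", "Operator"],
    "Text": ["Welcome.", "Thanks.", "Hello."],
})

# Prefix each segment with its speaker, then join the segments of each call in order
labelled = segments.assign(Segment="[" + segments["Speaker"] + "] " + segments["Text"])
consolidated = (
    labelled.groupby(["Ticker", "Quarter"])["Segment"]
    .agg(" ".join)
    .reset_index(name="Transcript")
)

print(consolidated["Transcript"].tolist())
# → ['[Operator] Welcome. [CEO] Thanks.', '[Operator] Hello.']
```

Three segment rows collapse into two rows, one per (ticker, quarter) pair, and the original speaking order is preserved within each call.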

---

## 3. Saving & Downloading Consolidated Data
### Description
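- Saves the consolidated transcripts to a CSV file with `DataFrame.to_csv`, omitting the index column.
- Triggers a browser download of the file through `files.download` from `google.colab`, which is only available inside Google Colab.
- Names the output file `Consolidated_Earnings_Call_Transcripts_2024.csv`.
- Prints a confirmation message once the save and download calls have run.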
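
Because `google.colab` cannot be imported outside Colab, the download step fails in a local Jupyter session. The following sketch guards that import. The helper `download_if_colab` is a hypothetical name introduced here, not something the notebook defines.

```python
def download_if_colab(path):
    """Trigger a browser download in Colab; do nothing elsewhere.

    Returns True if the Colab download was requested, False otherwise.
    """
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        return False  # Outside Colab the saved file simply stays on disk
    files.download(path)
    return True
```

Calling `download_if_colab(consolidated_output_file)` after `to_csv` would keep the save step portable while preserving the automatic download in Colab.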

---

## Usage Flow
1. Run transcript retrieval for selected companies and quarters.
2. Consolidate and structure transcripts into a clean format.
3. Download the final dataset for further analysis.

The result is one row per earnings call, holding the ticker, industry, year, quarter, call date, and the full speaker-tagged transcript, ready for further analysis.


In [6]:
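# Consolidate text by quarter in "[Speaker] Text" format
# Build the tagged segments first, then join them per call; this avoids row-wise apply/iterrows
segments = "[" + df_transcripts["Speaker"].astype(str) + "] " + df_transcripts["Text"].astype(str)
df_consolidated = df_transcripts.assign(Segment=segments) \
    .groupby(["Ticker", "Industry", "Year", "Quarter", "Date"])["Segment"] \
    .agg(" ".join) \
    .reset_index(name="Transcript")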

# Display the new structure
df_consolidated.head()



| | Ticker | Industry | Year | Quarter | Date | Transcript |
|---|---|---|---|---|---|---|
| 0 | AAPL | AI-focused Tech | 2024 | 1 | 2024-02-01 | [Operator] Good day, and welcome to the Apple ... |
| 1 | AAPL | AI-focused Tech | 2024 | 2 | 2024-05-02 | [Suhasini Chandramouli] Good Afternoon, and we... |
| 2 | AAPL | AI-focused Tech | 2024 | 3 | 2024-08-01 | [Suhasini Chandramouli] Good afternoon and wel... |
| 3 | AAPL | AI-focused Tech | 2024 | 4 | 2024-10-31 | [Suhasini Chandramouli] Good afternoon, and we... |
| 4 | ABBV | Health Care | 2024 | 1 | 2024-04-26 | [Operator] Good morning and thank you for stan... |


In [7]:
# Define the output file path for the consolidated transcripts
consolidated_output_file = "Consolidated_Earnings_Call_Transcripts_2024.csv"

# Save the consolidated DataFrame to CSV
df_consolidated.to_csv(consolidated_output_file, index=False)

# Download the file to the local machine
files.download(consolidated_output_file)

print(f"Consolidated transcripts have been saved and downloaded: {consolidated_output_file}")



Consolidated transcripts have been saved and downloaded: Consolidated_Earnings_Call_Transcripts_2024.csv
