# [Stocks] Earnings Surprise Analysis for Amazon (AMZN)

## Calculate the median 2-day percentage change in stock prices following positive earnings surprises.

Steps:

1. Load earnings data from CSV (ha1_Amazon.csv) containing earnings dates, EPS estimates, and actual EPS
2. Download complete historical price data using yfinance
3. Calculate 2-day percentage changes for all historical dates: for each sequence of 3 consecutive trading days (Day 1, Day 2, Day 3), compute the return as Close_Day3 / Close_Day1 - 1. (Assume Day 2 may correspond to the earnings announcement.)
4. Identify positive earnings surprises (where "actual EPS > estimated EPS" OR "Surprise (%)>0")
5. Calculate 2-day percentage changes following positive earnings surprises
6. Compare the median 2-day percentage change for positive surprises vs. all historical dates
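As a minimal sketch of the formula in step 3, the rolling 2-day return can be written in plain Python. The helper name `two_day_returns` is an assumption for illustration; the three prices are the first AMZN closes printed in the step 3 output of this notebook.

```python
def two_day_returns(closes):
    """Return Close_Day3 / Close_Day1 - 1 for every run of 3 consecutive closes."""
    return [closes[i + 2] / closes[i] - 1 for i in range(len(closes) - 2)]


# First three AMZN closes shown in the step 3 output of this notebook.
closes = [0.09791699796915054, 0.0864579975605011, 0.0854170024394989]
returns = two_day_returns(closes)
print(returns)  # a single value, roughly -0.1277
```

A series of n closes yields n - 2 returns, which is why the Spark version filters out windows with fewer than 3 prices.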

Context: Earnings announcements, especially when they exceed analyst expectations, can significantly impact stock prices in the short term.

Reference: Yahoo Finance earnings calendar - https://finance.yahoo.com/calendar/earnings?symbol=AMZN

Additional: Is there a correlation between the magnitude of the earnings surprise and the stock price reaction? Does the market react differently to earnings surprises during bull vs. bear markets?
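For the first additional question, one hedged starting point is the Pearson correlation between the surprise size and the 2-day return. The sketch below implements the coefficient in plain Python. The function name and the sample numbers are made up for illustration and are not AMZN results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (surprise, two_day_return) pairs, NOT real AMZN data.
surprises = [0.10, 0.20, 0.30, 0.40]
reactions = [0.01, 0.00, 0.03, 0.02]
r = pearson(surprises, reactions)
print(round(r, 4))
```

On the Spark side, `DataFrame.stat.corr("surprise", "two_day_return")` should give the same statistic for two numeric columns, assuming the surprise and the return have been joined into one DataFrame.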

In [42]:
import yfinance as yf
import pandas_datareader as pdr
import pandas as pd
import time
import datetime
from datetime import date
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

In [43]:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local[1]") \
    .appName("SparkByExamples.com") \
    .getOrCreate()

In [54]:
spark.sparkContext.setLogLevel("ERROR")

## 1. Getting AMZN expected and actual EPS

The EPS history is fetched from the Alpha Vantage EARNINGS endpoint, which requires an API key.

In [None]:
pip install alpha-vantage

In [40]:
import requests
import pandas as pd

api_key = 'LVFCF3SBA9N86Z9P'
symbol = "AMZN"
url = f"https://www.alphavantage.co/query?function=EARNINGS&symbol={symbol}&apikey={api_key}"

response = requests.get(url)
data = response.json()

# Extract quarterly earnings
quarterly = data.get("quarterlyEarnings", [])

df = pd.DataFrame(quarterly)

# Convert pandas DataFrame to Spark DataFrame
amzn_eps_spark = spark.createDataFrame(df)

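# Keep the three EPS columns and add the relative surprise (reportedEPS / estimatedEPS - 1).
# Caveat: this ratio has the wrong sign when estimatedEPS is negative.
amzn_eps_selected = amzn_eps_spark.select("reportedDate", "reportedEPS", "estimatedEPS") \
    .withColumn("surprise", F.round((F.col("reportedEPS") / F.col("estimatedEPS")) - 1, 5))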
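The call `data.get("quarterlyEarnings", [])` silently returns an empty list when the API responds with something other than earnings data (for example an error or rate-limit message), and the failure then only surfaces later in `spark.createDataFrame`. A defensive variant might look like the sketch below. The helper name `extract_quarterly` is hypothetical, and the only payload key assumed is `quarterlyEarnings`, which the cell above already uses.

```python
def extract_quarterly(payload):
    """Return the quarterlyEarnings records, or raise listing the keys that were present."""
    records = payload.get("quarterlyEarnings")
    if not records:
        raise ValueError(
            f"no quarterlyEarnings in response; keys present: {sorted(payload)}"
        )
    return records


# Tiny hand-written payload shaped like the first row of the EPS table.
sample = {"quarterlyEarnings": [
    {"reportedDate": "2025-05-01", "reportedEPS": "1.59", "estimatedEPS": "1.36"},
]}
rows = extract_quarterly(sample)
print(rows[0]["reportedDate"])
```

In the notebook this would replace the `data.get(...)` line, so that a bad response fails immediately with a readable message.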

In [55]:
amzn_eps_selected.show()

+------------+-----------+------------+--------+
|reportedDate|reportedEPS|estimatedEPS|surprise|
+------------+-----------+------------+--------+
|  2025-05-01|       1.59|        1.36| 0.16912|
|  2025-02-06|       1.86|      1.4837| 0.25362|
|  2024-10-31|       1.43|        1.14| 0.25439|
|  2024-08-01|       1.26|        1.03|  0.2233|
|  2024-04-30|       0.98|        0.82| 0.19512|
|  2024-02-01|          1|         0.8|    0.25|
|  2023-10-26|       0.94|        0.58| 0.62069|
|  2023-08-03|       0.65|        0.35| 0.85714|
|  2023-04-27|       0.31|        0.21| 0.47619|
|  2023-02-02|       0.25|        0.18| 0.38889|
|  2022-10-27|       0.17|        0.22|-0.22727|
|  2022-07-28|       0.18|        0.14| 0.28571|
|  2022-04-28|       0.37|        0.42|-0.11905|
|  2022-02-03|       0.29|        0.18| 0.61111|
|  2021-10-28|       0.31|        0.45|-0.31111|
|  2021-07-29|       0.76|        0.62| 0.22581|
|  2021-04-29|       0.79|        0.48| 0.64583|
|  2021-02-02|      

## 2. Download historical price using yfinance

In [56]:
amzn = yf.download("AMZN", start="1997-05-15")[['Close']].reset_index()
amzn.columns = ['Date', 'Close']
amzn = spark.createDataFrame(amzn)


[*********************100%***********************]  1 of 1 completed


## 3. Calculate 2-day percentage changes for all historical dates: for each sequence of 3 consecutive trading days (Day 1, Day 2, Day 3), compute the return as Close_Day3 / Close_Day1 - 1. (Assume Day 2 may correspond to the earnings announcement.)

In [65]:
w = Window.orderBy("Date").rowsBetween(0, 2)

# Collect Close prices over the 3-day window into an array
amzn_with_window = amzn.withColumn("close_window", F.collect_list("Close").over(w))
print(amzn_with_window.select("close_window").collect()[0])
# Filter only rows where the window has exactly 3 closing prices
amzn_filtered = amzn_with_window.filter(F.size("close_window") == 3)

# Calculate 2-day return: Close_Day3 / Close_Day1 - 1
amzn_result = amzn_filtered.withColumn(
    "two_day_return",
    (F.col("close_window")[2] / F.col("close_window")[0]) - 1
).select("Date", "two_day_return")

amzn_result.show(truncate=False)

Row(close_window=[0.09791699796915054, 0.0864579975605011, 0.0854170024394989])
+-------------------+---------------------+
|Date               |two_day_return       |
+-------------------+---------------------+
|1997-05-15 00:00:00|-0.12765909687702903 |
|1997-05-16 00:00:00|-0.054211252550370626|
|1997-05-19 00:00:00|-0.16463936075229257 |
|1997-05-20 00:00:00|-0.14649446138073408 |
|1997-05-21 00:00:00|0.05109736146247479  |
|1997-05-22 00:00:00|0.13432769059765626  |
|1997-05-23 00:00:00|0.020839968489012373 |
|1997-05-27 00:00:00|-0.0493514022482463  |
|1997-05-28 00:00:00|-0.020414530320416957|
|1997-05-29 00:00:00|0.0034679948750775402|
|1997-05-30 00:00:00|-0.013893345439698446|
|1997-06-02 00:00:00|-0.062075478157272435|
|1997-06-03 00:00:00|0.04225367010948     |
|1997-06-04 00:00:00|0.16913027855977103  |
|1997-06-05 00:00:00|0.09459936080990494  |
|1997-06-06 00:00:00|-0.044026917564262136|
|1997-06-09 00:00:00|-0.08642373108998525 |
|1997-06-10 00:00:00|0.01314945064356276

## 4. Identify positive earnings surprises (where "actual EPS > estimated EPS" OR "Surprise (%)>0")
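The ratio reportedEPS / estimatedEPS - 1 computed earlier agrees with "actual EPS > estimated EPS" only when the estimate is positive. For the 2002-10-24 row in this notebook (reported -0.0924, estimated -0.04) it yields +1.31 even though the actual EPS is below the estimate. A sign-safe variant divides the difference by the absolute estimate. The sketch below is an assumption about how one might define it, not the formula the notebook uses.

```python
def surprise_ratio(reported, estimated):
    """Ratio as computed in the notebook: reported / estimated - 1."""
    return reported / estimated - 1


def surprise_signed(reported, estimated):
    """(reported - estimated) / |estimated|; positive only when reported > estimated."""
    return (reported - estimated) / abs(estimated)


# Values from the 2002-10-24 row of the notebook output.
print(round(surprise_ratio(-0.0924, -0.04), 5))   # 1.31
print(round(surprise_signed(-0.0924, -0.04), 5))  # -1.31
```

In PySpark, filtering directly on `F.col("reportedEPS").cast("double") > F.col("estimatedEPS").cast("double")` would implement the "actual EPS > estimated EPS" condition without relying on the ratio at all.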

In [66]:
positive_surprises = amzn_eps_selected.filter(
     (F.col("surprise") > 0)
).select("reportedDate", "reportedEPS", "estimatedEPS", "surprise")
positive_surprises.show()

+------------+-----------+------------+--------+
|reportedDate|reportedEPS|estimatedEPS|surprise|
+------------+-----------+------------+--------+
|  2025-05-01|       1.59|        1.36| 0.16912|
|  2025-02-06|       1.86|      1.4837| 0.25362|
|  2024-10-31|       1.43|        1.14| 0.25439|
|  2024-08-01|       1.26|        1.03|  0.2233|
|  2024-04-30|       0.98|        0.82| 0.19512|
|  2024-02-01|          1|         0.8|    0.25|
|  2023-10-26|       0.94|        0.58| 0.62069|
|  2023-08-03|       0.65|        0.35| 0.85714|
|  2023-04-27|       0.31|        0.21| 0.47619|
|  2023-02-02|       0.25|        0.18| 0.38889|
|  2022-07-28|       0.18|        0.14| 0.28571|
|  2022-02-03|       0.29|        0.18| 0.61111|
|  2021-07-29|       0.76|        0.62| 0.22581|
|  2021-04-29|       0.79|        0.48| 0.64583|
|  2021-02-02|        0.7|        0.36| 0.94444|
|  2020-10-29|       0.62|        0.37| 0.67568|
|  2020-07-30|       0.52|        0.07| 6.42857|
|  2020-01-30|      

## 5. Calculate 2-day percentage changes following positive earnings surprises

In [74]:
# Rename columns for clarity before join
price_df = amzn.select(
    F.col("Date").alias("tradeDate"),
    F.col("Close").alias("tradeClose")
)

# Join positive surprises with price_df on announcement date = tradeDate (Day 1)
pos_eps_with_close = positive_surprises.join(
    price_df,
    positive_surprises.reportedDate == price_df.tradeDate,
    how='inner'
).select(
    positive_surprises["*"],
    price_df["tradeClose"].alias("Close_Day1")
)
pos_eps_with_close.show(10)

# Number the trading days in date order so that Day 3 can be located by row offset
w = Window.orderBy("tradeDate")

# Add row numbers to the stock prices dataframe to easily find Day3 (2 days after Day1)
price_with_row = price_df.withColumn("row_num", F.row_number().over(w))

# Join again to get row_num for Day1
pos_eps_with_rownum = pos_eps_with_close.join(
    price_with_row.withColumnRenamed("tradeDate", "tradeDate_Day1").withColumnRenamed("row_num", "row_num_Day1"),
    pos_eps_with_close.reportedDate == F.col("tradeDate_Day1"),
    how="inner"
)
pos_eps_with_rownum.show(10)

# Get Close for Day3 by joining price_with_row on row_num = row_num_Day1 + 2
day3_close = price_with_row.withColumnRenamed("tradeDate", "tradeDate_Day3").withColumnRenamed("tradeClose", "Close_Day3")

pos_eps_with_day3 = pos_eps_with_rownum.join(
    day3_close,
    day3_close.row_num == pos_eps_with_rownum.row_num_Day1 + 2,
    how="inner"
)
pos_eps_with_day3.show(10)

# Calculate 2-day return following positive surprise announcement
pos_eps_with_return = pos_eps_with_day3.withColumn(
    "two_day_return",
    (F.col("Close_Day3") / F.col("Close_Day1")) - 1
).select(
    "reportedDate", "reportedEPS", "estimatedEPS", "surprise", "two_day_return"
)
pos_eps_with_return.show(10)

+------------+-----------+------------+--------+------------------+
|reportedDate|reportedEPS|estimatedEPS|surprise|        Close_Day1|
+------------+-----------+------------+--------+------------------+
|  2019-04-25|       0.35|        0.24| 0.45833| 95.11250305175781|
|  2006-10-24|       0.05|        0.03| 0.66667|  1.68149995803833|
|  2020-01-30|       0.32|         0.2|     0.6| 93.53399658203125|
|  2002-10-24|    -0.0924|       -0.04|    1.31|0.9929999709129333|
|  2016-04-28|       0.05|        0.03| 0.66667|30.100000381469727|
|  2014-10-23|      -0.05|       -0.04|    0.25|15.659000396728516|
|  2024-02-01|          1|         0.8|    0.25|159.27999877929688|
|  2024-08-01|       1.26|        1.03|  0.2233|184.07000732421875|
|  2023-02-02|       0.25|        0.18| 0.38889|112.91000366210938|
|  2022-02-03|       0.29|        0.18| 0.61111| 138.8455047607422|
+------------+-----------+------------+--------+------------------+
only showing top 10 rows

+------------+--------
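The cell above treats the announcement date as Day 1, so its return spans the two sessions after the announcement, whereas step 3 suggests the announcement may fall on Day 2. The sketch below makes that choice explicit with an offset parameter. The function name, the parameter `days_before`, and the toy prices are illustrative assumptions, not data from this notebook.

```python
def event_two_day_return(dates, closes, event_date, days_before=0):
    """2-day return for a window starting `days_before` sessions before the event.

    `dates` must be sorted ascending and aligned with `closes`.
    Returns None if the event date is not a trading day or the window
    runs off either end of the price history.
    """
    try:
        event_idx = dates.index(event_date)
    except ValueError:
        return None
    start = event_idx - days_before
    end = start + 2
    if start < 0 or end >= len(closes):
        return None
    return closes[end] / closes[start] - 1


# Toy price series (hypothetical values).
dates = ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]
closes = [100.0, 102.0, 110.0, 121.0]

# Announcement as Day 1: 121 / 102 - 1
as_day1 = event_two_day_return(dates, closes, "2024-01-03", days_before=0)
# Announcement as Day 2: 110 / 100 - 1
as_day2 = event_two_day_return(dates, closes, "2024-01-03", days_before=1)
print(as_day1, as_day2)
```

In Spark, `F.lag` and `F.lead` over `Window.orderBy("tradeDate")` could express the same offsets without the row-number joins, though that is a design alternative rather than what the notebook does.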

## 6. Compare median 2-day returns for positive surprises vs. all historical dates

In [52]:
# Create window of 3 consecutive days ordered by date
w3 = Window.orderBy("Date").rowsBetween(0, 2)

# Collect Close prices over the 3-day window
all_with_window = amzn.withColumn("close_window", F.collect_list("Close").over(w3))

# Filter full windows of length 3
all_full = all_with_window.filter(F.size("close_window") == 3)

# Calculate 2-day return
all_returns = all_full.withColumn(
    "two_day_return",
    (F.col("close_window")[2] / F.col("close_window")[0]) - 1
).select("Date", "two_day_return")

In [76]:
median_pos_surprise = pos_eps_with_return.approxQuantile("two_day_return", [0.5], 0.01)[0]
median_all = all_returns.approxQuantile("two_day_return", [0.5], 0.01)[0]

print(f"Median 2-day return following positive surprise: {median_pos_surprise:.4f}")
print(f"Median 2-day return for all historical dates: {median_all:.4f}")

Median 2-day return following positive surprise: 0.0190
Median 2-day return for all historical dates: 0.0015
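`approxQuantile` with a relative error of 0.01 returns an approximate quantile, which can matter for a small sample such as the set of positive-surprise events. According to the PySpark documentation, passing 0.0 as the relative error computes exact quantiles. For a small result set, collecting the values and using the standard library is another option, sketched here with made-up numbers.

```python
import statistics

# Hypothetical 2-day returns, for illustration only.
sample_returns = [0.031, -0.012, 0.019, 0.004]

# Conventional median: averages the two middle values for an even count.
exact_median = statistics.median(sample_returns)
# Lower median: always one of the observed values.
lower_median = statistics.median_low(sample_returns)

print(f"median: {exact_median:.4f}, lower median: {lower_median:.4f}")
```

In the notebook, the input list could come from `[row["two_day_return"] for row in pos_eps_with_return.collect()]`, which is reasonable only because the positive-surprise set is small.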
