# Accuracy Analysis

In [95]:
import eikon as ek
import pandas as pd
import numpy as np
import datetime
import plotly
import plotly.express as px
import plotly.graph_objs as go
ek.set_app_key("f47c330480d74c598b7e8ebc2539424e91764dd8")

https://community.developers.refinitiv.com/questions/73493/get-eps-historical-data-for-stocks.html

### Accuracy Variables  

**TR.EPSActValue** - The company's actual value normalized to reflect the I/B/E/S default currency and corporate actions (e.g. stock splits). Earnings Per Share is defined as the EPS that the contributing analyst considers to be that with which to value a security. This figure may include or exclude certain items depending on the contributing analyst's specific model.  

**TR.EPSMean** - The statistical average of all broker estimates determined to be on the majority accounting basis. Earnings Per Share is defined as the EPS that the contributing analyst considers to be that with which to value a security. This figure may include or exclude certain items depending on the contributing analyst's specific model.  

--> this is an analyst forecast variable

**TR.EPSActSurprise** - The difference between the actual and the last mean of the period, expressed as a percentage. Earnings Per Share is defined as the EPS that the contributing analyst considers to be that with which to value a security. This figure may include or exclude certain items depending on the contributing analyst's specific model.  

--> forecast error between actual EPS and TR.EPSMean  
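The sample rows in the `df_accuracy` output further down are consistent with the surprise being the gap between actual and mean EPS, scaled by the mean estimate. A minimal sketch, assuming the denominator is the absolute value of the mean (the helper name `eps_surprise_pct` is illustrative, not an Eikon function):

```python
def eps_surprise_pct(actual: float, mean_estimate: float) -> float:
    """Percentage gap between actual EPS and the consensus mean.

    Assumption: the gap is scaled by the absolute mean estimate, so the sign
    follows (actual - mean) even when the estimate is negative.
    """
    return (actual - mean_estimate) / abs(mean_estimate) * 100


# Two POOL.OQ rows taken from the df_accuracy output in this notebook
print(round(eps_surprise_pct(1.82, 1.987), 3))   # reported surprise: -8.405
print(round(eps_surprise_pct(4.78, 4.5875), 3))  # reported surprise: 4.196
```

A zero mean estimate would raise a `ZeroDivisionError` in this sketch. How the data provider handles that case is not covered here.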


### DataFrames

**df_accuracy** - base dataframe containing quarterly data for all S&P 500 companies: EPS Actual, EPS Mean (the analyst forecast), and EPS Surprise (the forecast error in %)

**df_accuracy_new** - df_accuracy with the outliers removed, i.e. without extremely high or low values in the EPS Surprise column

**df_averages** - dataframe containing the mean of EPS Actual, EPS Mean, and EPS Surprise per Instrument over the entire period; built from df_accuracy_new, so outliers are excluded

**df_accuracy_yearly** - dataframe containing yearly (fiscal-year) data points instead of quarterly ones

In [105]:
accuracy_variables = ['TR.EPSactValue.date', 'TR.EPSActValue', "TR.EPSMean", "TR.EPSActSurprise"]
df_accuracy, e = ek.get_data('0#.SPX',accuracy_variables, parameters = {'SDate':'0','EDate':'-40','Period':'FQ0','Frq':'FQ'})
df_accuracy["Date"] = pd.to_datetime(df_accuracy["Date"])
df_accuracy = df_accuracy.dropna()
df_accuracy

Unnamed: 0,Instrument,Date,Earnings Per Share - Actual,Earnings Per Share - Mean,Earnings Per Share - Actual Surprise
0,POOL.OQ,2023-02-16 07:00:00+00:00,1.82,1.987,-8.405
1,POOL.OQ,2022-10-20 07:00:00+00:00,4.78,4.5875,4.196
2,POOL.OQ,2022-07-21 07:00:00+00:00,7.63,7.517,1.503
3,POOL.OQ,2022-04-21 07:00:00+00:00,4.23,3.14867,34.342
4,POOL.OQ,2022-02-17 07:00:00+00:00,2.63,1.875,40.267
...,...,...,...,...,...
20578,AVY.N,2014-01-31 08:30:00+00:00,0.69,0.68,1.471
20579,AVY.N,2013-10-25 08:30:00+00:00,0.69,0.63833,8.095
20580,AVY.N,2013-07-23 08:30:00+00:00,0.71,0.7025,1.068
20581,AVY.N,2013-04-24 08:30:00+00:00,0.59,0.57571,2.482


In [106]:
df_accuracy.dtypes

Instrument                                           string
Date                                    datetime64[ns, UTC]
Earnings Per Share - Actual                         Float64
Earnings Per Share - Mean                           Float64
Earnings Per Share - Actual Surprise                Float64
dtype: object

### Exploratory Data Analysis of Analyst Forecast Accuracy

### 1) Summary statistics for EPS Actual, EPS Mean (forecast), and EPS Surprise

In [107]:
df_accuracy.describe()

Unnamed: 0,Earnings Per Share - Actual,Earnings Per Share - Mean,Earnings Per Share - Actual Surprise
count,19927.0,19927.0,19927.0
mean,1.381198,1.287166,191.020225
std,3.106833,2.898835,23393.513652
min,-16.43,-15.985,-8858.503
25%,0.5,0.4625,0.468
50%,0.92,0.8665,4.548
75%,1.6,1.50094,12.259
max,133.441,126.76571,3297926.087


The minimum and maximum of EPS Surprise are extreme (about -8,859% and 3,297,926%), and they inflate both the mean and the standard deviation. The next step is to identify the outliers and remove them:  

**Removing Outliers:**

In [108]:
summary_stats = df_accuracy["Earnings Per Share - Actual Surprise"].describe()
Q1 = summary_stats.loc['25%']
Q3 = summary_stats.loc['75%']
IQR = Q3 - Q1
threshold = 7  # IQR multiplier; 1.5 is the conventional value
surprise = df_accuracy["Earnings Per Share - Actual Surprise"]
# Keep only values inside [Q1 - threshold*IQR, Q3 + threshold*IQR]
surprise_outliers_removed = surprise.loc[surprise.between(Q1 - threshold * IQR, Q3 + threshold * IQR)]
df_accuracy_new = df_accuracy.copy()
df_accuracy_new["Earnings Per Share - Actual Surprise"] = surprise_outliers_removed
df_accuracy_new

Unnamed: 0,Instrument,Date,Earnings Per Share - Actual,Earnings Per Share - Mean,Earnings Per Share - Actual Surprise
0,POOL.OQ,2023-02-16 07:00:00+00:00,1.82,1.987,-8.405
1,POOL.OQ,2022-10-20 07:00:00+00:00,4.78,4.5875,4.196
2,POOL.OQ,2022-07-21 07:00:00+00:00,7.63,7.517,1.503
3,POOL.OQ,2022-04-21 07:00:00+00:00,4.23,3.14867,34.342
4,POOL.OQ,2022-02-17 07:00:00+00:00,2.63,1.875,40.267
...,...,...,...,...,...
20578,AVY.N,2014-01-31 08:30:00+00:00,0.69,0.68,1.471
20579,AVY.N,2013-10-25 08:30:00+00:00,0.69,0.63833,8.095
20580,AVY.N,2013-07-23 08:30:00+00:00,0.71,0.7025,1.068
20581,AVY.N,2013-04-24 08:30:00+00:00,0.59,0.57571,2.482


In [109]:
na_count = df_accuracy_new["Earnings Per Share - Actual Surprise"].isna().sum()
na_count

798

--> number of outliers detected at the given threshold
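The bounds implied by this rule can be reproduced from the quartiles that `df_accuracy.describe()` reported for the surprise column. A small sketch, where `iqr_fences` is an illustrative helper rather than a pandas function:

```python
from typing import Tuple


def iqr_fences(q1: float, q3: float, k: float) -> Tuple[float, float]:
    """Return the (lower, upper) bounds Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles of the EPS surprise column as reported by df_accuracy.describe()
lower, upper = iqr_fences(q1=0.468, q3=12.259, k=7)
print(round(lower, 3), round(upper, 3))

# For comparison, the conventional multiplier of 1.5 gives a much narrower band
print(iqr_fences(0.468, 12.259, 1.5))
```

With a multiplier of 7 the bounds come out at roughly -82.07 and 94.80, which brackets the minimum (-81.69) and maximum (94.748) that `df_accuracy_new.describe()` reports for the cleaned surprise column.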

In [110]:
# drop the rows whose surprise value was flagged as an outlier (now NA)
df_accuracy_new = df_accuracy_new.dropna()
df_accuracy_new

Unnamed: 0,Instrument,Date,Earnings Per Share - Actual,Earnings Per Share - Mean,Earnings Per Share - Actual Surprise
0,POOL.OQ,2023-02-16 07:00:00+00:00,1.82,1.987,-8.405
1,POOL.OQ,2022-10-20 07:00:00+00:00,4.78,4.5875,4.196
2,POOL.OQ,2022-07-21 07:00:00+00:00,7.63,7.517,1.503
3,POOL.OQ,2022-04-21 07:00:00+00:00,4.23,3.14867,34.342
4,POOL.OQ,2022-02-17 07:00:00+00:00,2.63,1.875,40.267
...,...,...,...,...,...
20578,AVY.N,2014-01-31 08:30:00+00:00,0.69,0.68,1.471
20579,AVY.N,2013-10-25 08:30:00+00:00,0.69,0.63833,8.095
20580,AVY.N,2013-07-23 08:30:00+00:00,0.71,0.7025,1.068
20581,AVY.N,2013-04-24 08:30:00+00:00,0.59,0.57571,2.482


In [111]:
df_accuracy_new.describe()

Unnamed: 0,Earnings Per Share - Actual,Earnings Per Share - Mean,Earnings Per Share - Actual Surprise
count,19129.0,19129.0,19129.0
mean,1.423885,1.333085,6.929393
std,3.151161,2.948303,17.084754
min,-16.43,-15.985,-81.69
25%,0.53,0.50207,0.506
50%,0.95,0.89964,4.376
75%,1.63,1.53832,11.482
max,133.441,126.76571,94.748


### 2) Forecast error distribution

**Surprise Distribution - with outliers**

In [119]:
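# histnorm="probability" makes the bar heights fractions, so the percent tick format on the y-axis is meaningful
fig = px.histogram(df_accuracy, x="Earnings Per Share - Actual Surprise", nbins=1000, histnorm="probability", title="EPS Surprise (%) Distribution (with outliers)")
fig.update_layout(yaxis=dict(tickformat=".2%"))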
fig.show()

**Surprise Distribution - without outliers**

In [120]:
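# histnorm="probability" makes the bar heights fractions, so the percent tick format on the y-axis is meaningful
fig = px.histogram(df_accuracy_new, x="Earnings Per Share - Actual Surprise", nbins=1000, histnorm="probability", title="EPS Surprise (%) Distribution (outliers removed)")
fig.update_layout(yaxis=dict(tickformat=".2%"))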
fig.show()
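A percent-formatted y-axis reads naturally only when the bar heights are shares of all observations rather than raw counts. A rough sketch of that normalization with NumPy, using made-up values (not the notebook's data):

```python
import numpy as np

# Made-up surprise values in percent, for illustration only
surprise = np.array([-8.4, 4.2, 1.5, 34.3, 40.3, 1.5, 8.1, 1.1, 2.5, 3.0])

# Raw counts per bin, then each bin's share of all observations
counts, edges = np.histogram(surprise, bins=[-50, 0, 10, 50])
shares = counts / counts.sum()

for left, right, share in zip(edges[:-1], edges[1:], shares):
    print(f"{left} to {right}: {share:.2%}")
```

In Plotly Express the same normalization is available through the `histnorm` argument of `px.histogram`, for example `histnorm="probability"`.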

### 3) Mean EPS Actual, EPS Mean and EPS Surprise over all time periods per Instrument

In [122]:
df_averages = df_accuracy_new.groupby("Instrument").mean(numeric_only=True)
df_averages



Unnamed: 0_level_0,Earnings Per Share - Actual,Earnings Per Share - Mean,Earnings Per Share - Actual Surprise
Instrument,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
A.N,0.77122,0.719961,7.255585
AAL.OQ,0.257,0.187633,6.460925
AAP.N,1.98878,1.966271,1.769024
AAPL.OQ,0.792457,0.736867,7.065512
ABBV.N,1.87625,1.824545,3.27955
...,...,...,...
YUM.N,0.87561,0.84134,5.429
ZBH.N,1.7895,1.727077,4.5081
ZBRA.OQ,2.389756,2.252027,5.404537
ZION.OQ,0.85825,0.774196,9.387575


Summary statistics of the per-instrument averages:

In [123]:
df_averages.describe()

Unnamed: 0,Earnings Per Share - Actual,Earnings Per Share - Mean,Earnings Per Share - Actual Surprise
count,502.0,502.0,502.0
mean,1.380964,1.292809,6.989945
std,2.420182,2.275438,6.364702
min,-0.291842,-0.307703,-54.155
25%,0.602805,0.577975,3.882537
50%,1.012051,0.954061,6.01172
75%,1.589085,1.492998,9.504716
max,46.97222,44.416074,48.632652
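Because outlier removal drops individual rows, instruments may contribute different numbers of quarters to their averages. A hedged sketch of carrying the observation count alongside the mean, using made-up tickers and a shortened column name (`Surprise`) for illustration:

```python
import pandas as pd

# Made-up quarterly rows, for illustration only
df = pd.DataFrame({
    "Instrument": ["AAA", "AAA", "AAA", "BBB"],
    "Surprise": [2.0, 4.0, 6.0, 10.0],
})

per_instrument = df.groupby("Instrument")["Surprise"].agg(
    quarters="count",      # number of quarterly observations kept
    mean_surprise="mean",  # average surprise over those quarters
)
print(per_instrument)
```

An average built from a single quarter (like `BBB` here) deserves less weight than one built from many, which the count column makes visible.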


### 4) Number of Instruments per surprise bin (in %) for each year

In [125]:
# fetch yearly (fiscal-year) data and keep only the year of each date
df_accuracy_yearly, e = ek.get_data('0#.SPX', accuracy_variables, parameters = {'SDate':'0','EDate':'-40','Period':'FY0','Frq':'FY'})
df_accuracy_yearly["Date"] = pd.to_datetime(df_accuracy_yearly["Date"]).dt.year
df_accuracy_yearly = df_accuracy_yearly.dropna()

In [126]:
bins = [-100, -50, -20, -10, -5, 0, 5, 10, 20, 30, 40, 50, 60, 80, 100]

# Group the data by year and count the instruments in each surprise bin for each year
df_percentiles = pd.DataFrame(index=range(df_accuracy_yearly["Date"].min(), df_accuracy_yearly["Date"].max()+1),
                              columns=[f"{bins[i]}-{bins[i+1]}" for i in range(len(bins)-1)])
for year in df_percentiles.index:
    df_year = df_accuracy_yearly[df_accuracy_yearly["Date"] == year]
    percentile_counts = pd.cut(df_year["Earnings Per Share - Actual Surprise"], bins=bins, labels=df_percentiles.columns).value_counts().sort_index()
    df_percentiles.loc[year] = percentile_counts.values

df_percentiles

Unnamed: 0,-100--50,-50--20,-20--10,-10--5,-5-0,0-5,5-10,10-20,20-30,30-40,40-50,50-60,60-80,80-100
1983,0,4,3,3,9,4,2,0,0,0,0,0,0,0
1984,5,10,16,16,51,45,21,9,4,3,2,1,1,3
1985,3,18,8,13,51,56,16,9,1,4,0,0,2,0
1986,7,13,14,21,61,51,8,5,0,2,1,1,0,5
1987,8,19,22,24,26,37,15,12,4,1,0,3,0,3
1988,6,17,12,15,46,59,9,11,2,1,1,0,0,0
1989,5,8,7,14,44,77,15,9,1,3,1,2,0,1
1990,8,10,10,19,55,60,19,8,1,1,0,1,0,0
1991,13,25,25,28,93,87,28,19,2,2,0,2,3,8
1992,8,16,17,20,72,82,28,6,5,1,0,2,0,3


### 5) Average forecast error (surprise) over the last 10 years

In [127]:
df_accuracy_yearly

Unnamed: 0,Instrument,Date,Earnings Per Share - Actual,Earnings Per Share - Mean,Earnings Per Share - Actual Surprise
0,POOL.OQ,2023,18.7,18.78,-0.426
1,POOL.OQ,2022,15.92,15.19,4.806
2,POOL.OQ,2021,9.13,8.45,8.047
3,POOL.OQ,2020,6.4,6.34111,0.929
4,POOL.OQ,2019,5.62,5.65,-0.531
...,...,...,...,...,...
20609,AVY.N,1996,1.34,1.32889,0.836
20610,AVY.N,1995,0.985,0.9575,2.872
20611,AVY.N,1994,0.72,0.72389,-0.537
20612,AVY.N,1993,0.665,0.69063,-3.711


In [129]:
# Group the data by year and calculate the mean error for each year
df_yearly_mean = df_accuracy_yearly.groupby("Date")["Earnings Per Share - Actual Surprise"].mean().reset_index()

# Create a line plot using Plotly
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_yearly_mean["Date"], y=df_yearly_mean["Earnings Per Share - Actual Surprise"], mode="lines", name="Average S&P500 Forecast Error"))

# Set the title and axis labels
fig.update_layout(title="Yearly Average Forecast Error of S&P 500 Companies",
                   xaxis_title="Year",
                   yaxis_title="Forecast Error (%)")

# Display the plot
fig.show()

In [104]:
# Group the data by calendar quarter of the report date and calculate the mean error for each quarter
quarter = df_accuracy["Date"].dt.tz_localize(None).dt.to_period("Q")
df_quarterly_mean = df_accuracy.groupby(quarter)["Earnings Per Share - Actual Surprise"].mean().reset_index()
df_quarterly_mean["Date"] = df_quarterly_mean["Date"].dt.to_timestamp()  # convert periods to timestamps for plotting

# Create a line plot of the quarterly means (not the raw rows) using Plotly
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_quarterly_mean["Date"], y=df_quarterly_mean["Earnings Per Share - Actual Surprise"], mode="lines", name="Average S&P500 Forecast Error"))

# Set the title and axis labels
fig.update_layout(title="Quarterly Average Forecast Error of S&P 500 Companies",
                   xaxis_title="Quarter",
                   yaxis_title="Forecast Error (%)")

# Display the plot
fig.show()
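Report dates are exact announcement timestamps, so averaging per quarter requires mapping each timestamp to its calendar quarter first. A small sketch of that bucketing on made-up rows, assuming UTC timestamps like those in `df_accuracy` and a shortened column name (`Surprise`):

```python
import pandas as pd

# Made-up announcement timestamps and surprises, for illustration only
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2022-01-20 07:00", "2022-02-17 07:00", "2022-04-21 07:00"], utc=True
    ),
    "Surprise": [2.0, 4.0, 9.0],
})

# Drop the timezone, then map each timestamp to its calendar quarter
quarter = df["Date"].dt.tz_localize(None).dt.to_period("Q")
quarterly_mean = df.groupby(quarter)["Surprise"].mean()
print(quarterly_mean)
```

The two first-quarter rows collapse into a single average, while the April row forms its own quarter. Note that this buckets by announcement date, which can differ from the fiscal quarter the figures refer to.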