# Evaluating Your Forecast

So far you have prepared your data and generated your first Forecast. Now it is time to retrieve the predictions and compare them to the actual observed values. This comparison shows how accurate the Forecast was.

You can extend the approaches here to compare multiple models or predictors and to determine the impact of improved accuracy on your use case.

Overview:

* Setup
* Obtaining a Prediction
* Actual Results
* Prediction
* Comparing the Prediction to Actual Results

## Setup

Import the standard Python libraries used in this lesson.

In [1]:
import boto3
from time import sleep
import subprocess
import pandas as pd
import json
import time
from datetime import datetime
from dateutil.parser import parse

The last part of the setup process is to create the Amazon Forecast clients for your account and region. The cell below does just that.

In [2]:
session = boto3.Session(region_name='us-east-1') 
forecast = session.client(service_name='forecast') 
forecastquery = session.client(service_name='forecastquery')
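
Creating the clients does not send a request on its own. One way to confirm that your account can actually reach the service is to make a small read-only call. The sketch below assumes your credentials allow listing forecasts; the helper name `can_reach_forecast` is hypothetical, and the broad `except` is a simplification.

```python
def can_reach_forecast(client):
    """Return True if a small read-only call succeeds, otherwise False.

    `client` is expected to be the boto3 `forecast` client created above.
    The helper name is illustrative and not part of boto3.
    """
    try:
        # list_forecasts is read-only; MaxResults keeps the reply small
        client.list_forecasts(MaxResults=1)
        return True
    except Exception as error:  # broad on purpose: credentials, network, permissions
        print("Could not reach Amazon Forecast:", error)
        return False
```

In the notebook you would call it as `can_reach_forecast(forecast)`.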

## Obtaining a Prediction

Now that your Forecast is active, we will query it to get a prediction that will be compared with the observed values later.

In [3]:
forecast_arn = "arn:aws:forecast:us-east-1:457927431838:forecast/cof_revenue_forecastdemo_autoML_forecast" # Obtain from your previous notebook.

In [4]:
forecastResponse = forecastquery.query_forecast(
    ForecastArn=forecast_arn,
    Filters={"metric_name":"Revenue"}
)
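
The response nests the predictions under `Forecast`, then `Predictions`, keyed by quantile, which is the layout the later cells in this notebook rely on. A minimal sketch for checking what came back before converting it: `summarize_predictions` is a hypothetical helper, and `sample_response` is a stand-in built from the values printed later in this notebook.

```python
def summarize_predictions(response):
    """Map each quantile key (for example 'p10') to its number of forecast points."""
    predictions = response["Forecast"]["Predictions"]
    return {quantile: len(points) for quantile, points in predictions.items()}

# Stand-in for forecastResponse, using the values printed later in this notebook
sample_response = {
    "Forecast": {
        "Predictions": {
            "p10": [{"Timestamp": "2018-01-01T00:00:00", "Value": 26715790000.0}],
            "p50": [{"Timestamp": "2018-01-01T00:00:00", "Value": 28476710000.0}],
            "p90": [{"Timestamp": "2018-01-01T00:00:00", "Value": 30237630000.0}],
        }
    }
}

summary = summarize_predictions(sample_response)
print(summary)
```

In the notebook, pass `forecastResponse` instead of the stand-in.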

## Actual Results

In the first notebook we created a file of observed values. We now load it to look at the actual revenue for the selected year.

In [6]:
actual_df = pd.read_csv("../data/cof-revenue-validation.csv", names=['metric_name','timestamp','metric_value'])
actual_df.head()

| | metric_name | timestamp | metric_value |
|---|---|---|---|
| 0 | metric_name | timestamp | metric_value |
| 1 | Revenue | 2018-12-31 | 28076000000 |
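
Notice that row 0 of the output repeats the column names: because `names` is passed without `header`, pandas treats the file's header line as data. A sketch of one alternative, assuming the CSV does start with a header line (as the output above suggests), is to pass `header=0` so the header is replaced rather than kept. The inline CSV text is a stand-in for the real file. If you adopt this, the later loop that skips index 0 would need adjusting.

```python
import io
import pandas as pd

# Stand-in for ../data/cof-revenue-validation.csv, copied from the output above
csv_text = "metric_name,timestamp,metric_value\nRevenue,2018-12-31,28076000000\n"

# header=0 tells pandas the first line is a header to be replaced by `names`
clean_actual_df = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    names=['metric_name', 'timestamp', 'metric_value'],
    parse_dates=['timestamp'],
)
print(clean_actual_df)
```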


## Prediction

Next we need to convert the JSON response from the Predictor to a dataframe to see the prediction.

In [7]:
# Generate DF 
prediction_df_p10 = pd.DataFrame.from_dict(forecastResponse['Forecast']['Predictions']['p10'])
prediction_df_p10.head()

| | Timestamp | Value |
|---|---|---|
| 0 | 2018-01-01T00:00:00 | 26715790000.0 |


The cell above covered only the p10 values. Now do the same for p50 and p90.

In [9]:
prediction_df_p50 = pd.DataFrame.from_dict(forecastResponse['Forecast']['Predictions']['p50'])
prediction_df_p90 = pd.DataFrame.from_dict(forecastResponse['Forecast']['Predictions']['p90'])

frames = [prediction_df_p10, prediction_df_p50, prediction_df_p90]

prediction_df = pd.concat(frames)

prediction_df.head(3)

| | Timestamp | Value |
|---|---|---|
| 0 | 2018-01-01T00:00:00 | 26715790000.0 |
| 0 | 2018-01-01T00:00:00 | 28476710000.0 |
| 0 | 2018-01-01T00:00:00 | 30237630000.0 |
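
Note that the concatenated frame no longer says which quantile each row belongs to. A minimal sketch of one way to keep that label uses the `keys` argument of `pd.concat`. The small frames here are stand-ins built from the values shown above, and the column name `quantile` is an arbitrary choice.

```python
import pandas as pd

# Stand-ins for prediction_df_p10, prediction_df_p50, and prediction_df_p90
p10 = pd.DataFrame([{"Timestamp": "2018-01-01T00:00:00", "Value": 26715790000.0}])
p50 = pd.DataFrame([{"Timestamp": "2018-01-01T00:00:00", "Value": 28476710000.0}])
p90 = pd.DataFrame([{"Timestamp": "2018-01-01T00:00:00", "Value": 30237630000.0}])

# keys= adds an outer index level naming the source frame of each row;
# moving that level into a column keeps the label after the frames are stacked
labeled_df = (
    pd.concat([p10, p50, p90], keys=["p10", "p50", "p90"], names=["quantile", "row"])
    .reset_index(level="quantile")
    .reset_index(drop=True)
)
print(labeled_df)
```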


## Comparing the Prediction to Actual Results

After obtaining the dataframes, the next task is to combine them so the predictions can be compared against the observed value.

In [27]:

# Create a dataframe to hold the combined results; 'source' records which dataframe each row came from
results_df = pd.DataFrame(columns=['timestamp', 'metric_value', 'source'])

Import the observed values into the dataframe:

In [28]:
for index, row in actual_df.iterrows():
    print(index, row['timestamp'], row['metric_value'])


0 timestamp metric_value
1 2018-12-31 28076000000


In [29]:
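# The observed row is stamped 2018-12-31 while the forecast is stamped 2018-01-01,
# so the timestamp is hard-coded here to put both on the same index for the pivot.
for index, row in actual_df.iterrows():
    if index != 0:  # Row 0 holds the CSV header text, not data
        clean_timestamp = parse("2018-01-01")
        new_row = pd.DataFrame([{'timestamp': clean_timestamp, 'metric_value': row['metric_value'], 'source': 'actual'}])
        # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
        results_df = pd.concat([results_df, new_row], ignore_index=True)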

In [30]:
# To show the new dataframe
results_df.head()

| | timestamp | metric_value | source |
|---|---|---|---|
| 0 | 2018-01-01 | 28076000000 | actual |


In [31]:
# Now add the P10, P50, and P90 Values
# DataFrame.append was removed in pandas 2.0, so rows are added with pd.concat
quantile_frames = {'p10': prediction_df_p10, 'p50': prediction_df_p50, 'p90': prediction_df_p90}
for source, quantile_df in quantile_frames.items():
    for index, row in quantile_df.iterrows():
        clean_timestamp = parse(row['Timestamp'])
        new_row = pd.DataFrame([{'timestamp': clean_timestamp, 'metric_value': row['Value'], 'source': source}])
        results_df = pd.concat([results_df, new_row], ignore_index=True)

In [39]:
pd.options.display.float_format = '{:20}'.format
results_df

| | timestamp | metric_value | source |
|---|---|---|---|
| 0 | 2018-01-01 | 28076000000.0 | actual |
| 1 | 2018-01-01 | 26715791360.0 | p10 |
| 2 | 2018-01-01 | 28476710912.0 | p50 |
| 3 | 2018-01-01 | 30237632512.0 | p90 |


In [40]:
pivot_df = results_df.pivot(columns='source', values='metric_value', index="timestamp")

pivot_df

| timestamp | actual | p10 | p50 | p90 |
|---|---|---|---|---|
| 2018-01-01 | 28076000000 | 26715791360.0 | 28476710912.0 | 30237632512.0 |
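
With a single forecast point, the comparison can be done by hand. A rough sketch using the values from the pivot table above: it checks whether the observed value falls inside the p10 to p90 band and computes the absolute percentage error of the p50 forecast. The function names are illustrative, and absolute percentage error is just one common choice of metric, not one prescribed by this notebook.

```python
def within_interval(actual, lower, upper):
    """Return True when the observed value lies inside [lower, upper]."""
    return lower <= actual <= upper

def absolute_percentage_error(actual, predicted):
    """Return |predicted - actual| as a fraction of |actual|."""
    return abs(predicted - actual) / abs(actual)

# Values copied from the pivot table above
actual = 28076000000
p10, p50, p90 = 26715791360.0, 28476710912.0, 30237632512.0

covered = within_interval(actual, p10, p90)
p50_ape = absolute_percentage_error(actual, p50)

print(f"Actual inside p10-p90 band: {covered}")
print(f"p50 absolute percentage error: {p50_ape:.2%}")
```

For these values the observed revenue lies inside the band, and the p50 forecast is off by roughly 1.4%.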
