# Amazon Forecast: predicting time-series at scale

Forecasting is used in a variety of applications and business use cases. For example, retailers need to forecast the sales of their products to decide how much stock they need at each location, manufacturers need to estimate the number of parts required at their factories to optimize their supply chain, businesses need to estimate their flexible workforce needs, utilities need to forecast electricity consumption to run an efficient energy network, and enterprises need to estimate their cloud infrastructure needs.

<img src="BlogImages/amazon_forecast.png">


# Table of Contents

* Step 0: [Setting up](#setup)
* Step 1: [Preparing the Datasets](#prepare)
* Step 2: [Importing the Data](#import)
  * Step 2a: [Creating a Dataset Group](#create)
  * Step 2b: [Creating a Target Dataset](#target)
  * Step 2c: [Creating a Related Dataset](#related)
  * Step 2d: [Update the Dataset Group](#update)
  * Step 2e: [Creating a Target Time Series Dataset Import Job](#targetImport)
  * Step 2f: [Creating a Related Time Series Dataset Import Job](#relatedImport)
* Step 3: [Choosing an Algorithm and Evaluating its Performance](#algo)
  * Step 3a: [Choosing DeepAR+](#DeepAR)
  * Step 3b: [Choosing Prophet](#prophet)
* Step 4: [Computing Error Metrics from Backtesting](#error)
* Step 5: [Creating a Forecast](#forecast)
* Step 6: [Querying the Forecasts](#query)
* Step 7: [Exporting the Forecasts](#export)
* Step 8: [Cleaning up your Resources](#cleanup)


# First, let us set up Amazon Forecast<a class="anchor" id="setup">

This section sets up the permissions and relevant endpoints.

In [None]:
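%load_ext autoreload
%autoreload 2

import json
import warnings

import boto3
import matplotlib.pyplot as plt
import pandas as pd

# helpers used below (get_or_create_role_arn, wait, wait_till_delete, plot_forecasts, load_exact_sol)
from util.fcst_utils import *
from util.hint import hint

plt.rcParams['figure.figsize'] = (15.0, 5.0)
warnings.filterwarnings('ignore')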

Now we will create a boto3 session. Boto3 allows us to interact with AWS services from Python.
The region for the session is the default region from your AWS configuration. You can choose any of the regions where Amazon Forecast is available.

Once we have the boto3 session, we can create the clients for Forecast and Forecast Query.

In [None]:
# Create a boto3 session
session = boto3.Session()  # you can specify a region here, e.g. boto3.Session(region_name='us-west-2')

# Create the forecast and forecast-query clients
forecast = session.client('forecast')
forecast_query = session.client('forecastquery')

We also need the ARN of an IAM role that Forecast uses when creating the various forecast entities, the S3 bucket where all our forecast data will be stored, and a project prefix for the names of all the forecast entities we create.

In [None]:
role_arn = get_or_create_role_arn()
account_id = session.client('sts').get_caller_identity().get('Account')
project = "forecastdemo"

# set the bucket name and the S3 prefix for the data
bucket_name = "<your-bucket-name>"
s3_data_path = f"s3://{bucket_name}/data"

Finally, we need the parameters that we will pass to the Forecast service to determine how to process the time series data. These include the data frequency, the forecast horizon, and the timestamp format:

In [None]:
timeseries_frequency = "H" # hourly frequency data
forecast_horizon = 24  # forecast 24 hours into future
timestamp_format = "yyyy-MM-dd HH:mm:ss" # timestamp format

# Overview

<img src="BlogImages/outline.png">

<img src="BlogImages/forecast_workflow.png">

The figures above summarize the key workflow for using Forecast.

# Step 1: Preparing the Datasets<a class="anchor" id="prepare">

In [None]:
bike_df = pd.read_csv("../data/train.csv", dtype = object)
bike_df.head()

In [None]:
bike_df['count'] = bike_df['count'].astype('float')
bike_df['workingday'] = bike_df['workingday'].astype('float')

For demonstration, we take the last two weeks and three days (408 hours) of hourly data, a range chosen because it contains no missing data.

In [None]:
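# two weeks plus three days of hourly data: 2*7*24 + 3*24 = 408 rows
# .copy() avoids pandas' SettingWithCopyWarning when adding a column to a slice
bike_df_small = bike_df[-(2*7*24 + 3*24):].copy()
bike_df_small['item_id'] = "bike_12"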

Let us plot the time series first.

In [None]:
bike_df_small.plot(x='datetime', y='count', figsize=(15, 8))

We can see that the target time series seems to drop on weekends. Next, let's plot both the target time series and the related time series `workingday`, which indicates whether a given day is a working day. More precisely, $r_t = 1$ if $t$ falls on a working day and $r_t = 0$ otherwise.

In [None]:
plt.figure(figsize=(15, 8))
ax = plt.gca()
bike_df_small.plot(x='datetime', y='count', ax=ax);
ax2 = ax.twinx()
bike_df_small.plot(x='datetime', y='workingday', color='red', ax=ax2);

Notice that to use the related time series, we need to ensure that the related time series covers the whole target time series, as well as the future values as specified by the forecast horizon. More precisely, we need to make sure:
```
len(related time series) >= len(target time series) + forecast horizon
```
In other words, for every item the related time series must start at or before the start of that item's target time series and must extend through the forecast horizon (that is, up to the latest end date across all items plus the forecast horizon). In addition, the related time series must not contain missing values. The following picture illustrates this logic.

<img src="BlogImages/rts_viz.png">
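These requirements can be checked before uploading anything. The following is a minimal pandas sketch, assuming the column names used in this notebook (`item_id`, `datetime`) and a pandas frequency alias (recent pandas versions prefer `"h"` to `"H"` for hourly data). `check_related_coverage` is a hypothetical helper, not part of Amazon Forecast or `util.fcst_utils`.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def check_related_coverage(target_df, related_df, horizon, freq="h",
                           item_col="item_id", time_col="datetime"):
    """Return a list of problems; an empty list means the related series looks usable.

    Hypothetical helper: checks start coverage, end coverage (latest target end
    plus the forecast horizon), and gaps, item by item.
    """
    problems = []
    step = to_offset(freq)
    tgt = target_df.assign(**{time_col: pd.to_datetime(target_df[time_col])})
    rel = related_df.assign(**{time_col: pd.to_datetime(related_df[time_col])})
    # Every related series must reach the latest target end plus the horizon.
    required_end = tgt[time_col].max() + horizon * step
    for item, t in tgt.groupby(item_col):
        r = rel.loc[rel[item_col] == item, time_col].sort_values()
        if r.empty:
            problems.append(f"{item}: no related data")
            continue
        if r.iloc[0] > t[time_col].min():
            problems.append(f"{item}: related series starts after the target series")
        if r.iloc[-1] < required_end:
            problems.append(f"{item}: related series ends before {required_end}")
        # Any timestamp missing from the regular grid is a "hole".
        expected = pd.date_range(r.iloc[0], r.iloc[-1], freq=freq)
        if len(r) != len(expected):
            problems.append(f"{item}: {len(expected) - len(r)} missing timestamps")
    return problems
```

Once `target_df` and `related_df` are created below, `check_related_coverage(target_df, related_df, horizon=forecast_horizon)` should return an empty list.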

For more details regarding how to prepare your Related Time Series dataset, please refer to the public documentation <a href="https://docs.aws.amazon.com/forecast/latest/dg/related-time-series-datasets.html">here</a>. 

In this example, we want to forecast the next 24 hours, so we drop the last 24 hours from the target time series while keeping them in the related time series.

In [None]:
target_df = bike_df_small[['item_id', 'datetime', 'count']][:-24]
related_df = bike_df_small[['item_id', 'datetime', 'workingday']]

In [None]:
target_df.head(5)

As we can see, the length of the related time series is equal to the length of the target time series plus the forecast horizon. 

In [None]:
print(len(target_df), len(related_df))
assert len(target_df) + 24 == len(related_df), "length doesn't match"

Next we check whether there are "holes" in the related time series.  

In [None]:
assert len(related_df) == len(pd.date_range(
    start=list(related_df['datetime'])[0],
    end=list(related_df['datetime'])[-1],
    freq=timeseries_frequency
)), "missing entries in the related time series"

Everything looks fine, so we plot both time series again. As you can see, the related time series (the indicator of whether the current day is a working day) is longer than the target time series. The binary working day indicator is a good example of a related time series, since its values are known for all future time points. Other examples of related time series include holiday and promotion features.

In [None]:
plt.figure(figsize=(15, 10))
ax = plt.gca()
target_df.plot(x='datetime', y='count', ax=ax);
ax2 = ax.twinx()
related_df.plot(x='datetime', y='workingday', color='red', ax=ax2);

In [None]:
target_df.to_csv("../data/bike_target.csv", index= False, header = False)
related_df.to_csv("../data/bike_related.csv", index= False, header = False)

If the S3 bucket named in `bucket_name` does not exist yet, create it first (for example, by uncommenting the `aws s3 mb` line in the next cell).

In [None]:
# sync data to s3
# '!' runs a shell command from the notebook; here we use the AWS CLI
# uncomment the next line to create the bucket if it does not exist yet
#!aws s3 mb s3://$bucket_name
!aws s3 sync ../data $s3_data_path

# Step 2. Importing the Data<a class="anchor" id="import">

To train a predictor, you create one or more datasets, add them to a dataset group, and provide the dataset group for training.

For each dataset that you create, you associate a dataset domain and a dataset type. A dataset domain defines a forecasting use case.

Amazon Forecast supports the following dataset domains:

* **RETAIL** – For retail demand forecasting

* **INVENTORY_PLANNING** – For supply chain and inventory planning

* **EC2_CAPACITY** – For forecasting Amazon Elastic Compute Cloud (Amazon EC2) capacity

* **WORK_FORCE** – For work force planning

* **WEB_TRAFFIC** – For estimating future web traffic

* **METRICS** – For forecasting metrics, such as revenue and cash flow

* **CUSTOM** – For all other types of time-series forecasting

Each domain can have one to three dataset types. The dataset types that you create for a domain are based on the type of data that you have and what you want to include in training.

Each domain requires a **TARGET_TIME_SERIES** dataset, and optionally supports the RELATED_TIME_SERIES and ITEM_METADATA dataset types.

The dataset types are:

**TARGET_TIME_SERIES** The only required dataset type. This type defines the target field that you want to generate forecasts for. For example, if you want to forecast the sales for a set of products, then you must create a dataset of historical time-series data for each of the products that you want to forecast. Similarly, you can create a TARGET_TIME_SERIES dataset for metrics, such as revenue, cash flow, and sales, that you might want to forecast.

**RELATED_TIME_SERIES** Time-series data that is related to the target time-series data. For example, price is related to product sales data, so you might provide it as a related_time_series.

**ITEM_METADATA** Metadata that is applicable to the target time-series data. For example, if you are forecasting sales for a particular product, attributes of the product—such as brand, color, and genre—will be part of item_metadata. When predicting EC2 capacity for EC2 instances, metadata might include the CPU and memory of the instance types.

For each dataset type, your input data must contain certain required fields. You can also include optional fields that Amazon Forecast suggests that you include.

For this workshop we will be working with a **RETAIL** use case for bike sales. A retailer's dataset typically records transactions like this:
<img src="BlogImages/data_format.png">

We will do the data import through a series of Python calls to the Forecast service, but you could also do this directly from the AWS console using this dialog.

<img src="BlogImages/timestamp.png">

## Step 2a. Creating a Dataset Group<a class="anchor" id="create">
First, let's create a dataset group; we will update it later to add our datasets. Since this is a **RETAIL** use case, we specify that as the domain.

In [None]:
dataset_group = f"{project}_dataset_group"


print(dataset_group)
create_dataset_group_response = forecast.create_dataset_group(Domain="RETAIL",
                                                          DatasetGroupName=dataset_group,
                                                          DatasetArns=[])

dataset_group_arn = create_dataset_group_response['DatasetGroupArn']

forecast.describe_dataset_group(DatasetGroupArn=dataset_group_arn)

## Step 2b. Creating a Target Dataset<a class="anchor" id="target">
Now we will define a target time series. This is the one dataset the service requires. For this example, the number of items sold, `demand`, is the target value we will forecast.

First, we specify the name and schema of our dataset. Make sure the order of the attributes (columns) matches the raw data in the files. We follow the same three-attribute format as the example above.

In [None]:
name = f"{project}_target_dataset"
schema = {
    "Attributes": [
        {
            "AttributeName": "item_id", 
            "AttributeType": "string"
        },
        {
            "AttributeName": "timestamp", 
            "AttributeType": "timestamp"
        },
        {
            "AttributeName": "demand", 
            "AttributeType": "float"
        }
    ]
}
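Because the attribute order must match the CSV column order, it can help to build the schema from an ordered list of `(name, type)` pairs. The following is a minimal sketch with a hypothetical `make_schema` helper; the set of allowed `AttributeType` values reflects our reading of the Amazon Forecast API reference and should be double-checked there.

```python
def make_schema(attributes):
    """Build a Forecast schema dict from ordered (name, type) pairs.

    Hypothetical helper: the order of `attributes` must match the column
    order of the CSV file that the import job will read.
    """
    # Assumed set of AttributeType values; verify against the API reference.
    allowed = {"string", "integer", "float", "timestamp", "geolocation"}
    for name, attr_type in attributes:
        if attr_type not in allowed:
            raise ValueError(f"unsupported AttributeType {attr_type!r} for {name!r}")
    return {
        "Attributes": [
            {"AttributeName": name, "AttributeType": attr_type}
            for name, attr_type in attributes
        ]
    }


# Equivalent to the target schema defined above:
target_schema = make_schema([("item_id", "string"),
                             ("timestamp", "timestamp"),
                             ("demand", "float")])
```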

Now that we have a schema and a name, we can create the target dataset. This only sets up the definition of the dataset; no data has been imported into Forecast yet. The data import happens later, when we create the import jobs.

In [None]:
response = forecast.create_dataset(Domain="RETAIL",
                               DatasetType='TARGET_TIME_SERIES',
                               DatasetName=name,
                               DataFrequency=timeseries_frequency,
                               Schema=schema
)

target_dataset_arn = response['DatasetArn']
forecast.describe_dataset(DatasetArn=target_dataset_arn)

## Step 2c. Creating a Related Dataset<a class="anchor" id="related">
Now we will create a related time series dataset from the `workingday` indicator. The method call is very similar to the one above, except that the dataset type is `RELATED_TIME_SERIES` and the third attribute is `workingday` instead of `demand`.

In [None]:
hint('2c')

In [None]:
name = f"{project}_related_dataset"
schema = # TODO

response = # TODO



related_dataset_arn = response['DatasetArn']
forecast.describe_dataset(DatasetArn=related_dataset_arn)

## Step 2d. Updating the dataset group with the datasets we created<a class="anchor" id="update">
You can have multiple datasets under the same dataset group. Update it with the datasets we created before.

In [None]:
dataset_arns = [target_dataset_arn, related_dataset_arn]
forecast.update_dataset_group(DatasetGroupArn=dataset_group_arn, DatasetArns=dataset_arns)
forecast.describe_dataset_group(DatasetGroupArn=dataset_group_arn)

## Step 2e. Creating a Target Time Series Dataset Import Job<a class="anchor" id="targetImport">
    
Now that we have defined the target time series dataset, we still need to create an import job to actually load the data from S3 into Amazon Forecast.

In [None]:
s3_path = f"{s3_data_path}/bike_target.csv"

response = forecast.create_dataset_import_job(
    DatasetImportJobName=dataset_group,
    DatasetArn=target_dataset_arn,
    DataSource= {
        "S3Config" : {
            "Path": s3_path,
            "RoleArn": role_arn
        } 
    },
    TimestampFormat= timestamp_format
)

target_dataset_import_job_arn = response['DatasetImportJobArn']

## Step 2f. Creating a Related Time Series Dataset Import Job<a class="anchor" id="relatedImport">

In [None]:
hint('2f')

In [None]:
s3_path = f"{s3_data_path}/bike_related.csv"
response = # TODO
related_dataset_import_job_arn = response['DatasetImportJobArn']

We now need to wait for all the import jobs to finish. We will use a simple blocking `wait` method that checks the import job status.

In [None]:
assert(wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=target_dataset_import_job_arn)))
assert(wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=related_dataset_import_job_arn)))
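The `wait` helper used above comes from `util.fcst_utils`, and its source is not shown in this notebook. A hypothetical helper of the same kind might poll a `describe_*` call until its `Status` field becomes `ACTIVE` or ends in a failed state. The sketch below assumes Forecast status strings such as `CREATE_PENDING`, `CREATE_IN_PROGRESS`, `ACTIVE`, and `CREATE_FAILED`; `poll_until_active` is an illustrative name.

```python
import time


def poll_until_active(describe, interval_s=30.0, timeout_s=3 * 3600, sleep=time.sleep):
    """Hypothetical stand-in for util.fcst_utils.wait.

    `describe` is a zero-argument callable returning a dict with a 'Status' key,
    e.g. lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=arn).
    Returns True once the status is ACTIVE and False if it ends in a *_FAILED state.
    """
    waited = 0.0
    while True:
        status = describe()["Status"]
        if status == "ACTIVE":
            return True
        if status.endswith("FAILED"):
            return False
        if waited >= timeout_s:
            raise TimeoutError(f"status still {status} after {waited:.0f}s")
        sleep(interval_s)  # `sleep` is injectable so the loop can be tested quickly
        waited += interval_s
```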

# Step 3. Choosing an Algorithm <a class="anchor" id="algo">

Once the datasets are specified with the corresponding schema, Amazon Forecast will automatically aggregate all the relevant pieces of information for each item, such as sales, price, promotions, as well as categorical attributes, and generate the desired dataset. Next, one can choose an algorithm (forecasting model) and evaluate how well this particular algorithm works on this dataset. The following graph gives a high-level overview of the forecasting models.
<img src="BlogImages/recipes.png">
<img src="BlogImages/pred_details.png">


Amazon Forecast provides several state-of-the-art forecasting algorithms, including classical forecasting methods such as ETS, ARIMA, and Prophet, and deep learning approaches such as DeepAR+. Classical forecasting methods, such as Autoregressive Integrated Moving Average (ARIMA) or Exponential Smoothing (ETS), fit a single model to each individual time series and then use that model to extrapolate the time series into the future. Amazon's Non-Parametric Time Series (NPTS) forecaster also fits a single model to each individual time series. Unlike the naive or seasonal naive forecasters, which use a fixed time index (the previous index $T-1$ or the past season $T - \tau$) as the prediction for time step $T$, NPTS randomly samples a time index $t \in \{0, \dots, T-1\}$ in the past to generate a sample for the current time step $T$.
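The difference between the naive forecasters and NPTS-style sampling can be illustrated with a small NumPy sketch. This is not Amazon's implementation: it draws past indices with weights proportional to $\exp(-\lambda (T - t))$, where $\lambda = 0$ gives the uniform sampling described above and a positive $\lambda$ favors recent observations (our understanding is that NPTS weights recent indices more heavily by default, but treat that as an assumption). The function names are hypothetical.

```python
import numpy as np


def naive_forecast(y):
    """Naive forecaster: predict y_T with the previous value y_{T-1}."""
    return y[-1]


def seasonal_naive_forecast(y, season):
    """Seasonal naive forecaster: predict y_T with the value one season back."""
    return y[-season]


def npts_samples(y, num_samples, lam=0.0, rng=None):
    """Sample values for step T by drawing past indices t in {0, ..., T-1}.

    lam=0 gives uniform sampling; lam>0 gives exponentially decaying weights
    that favor recent indices (an illustrative assumption, not Amazon's code).
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    T = len(y)
    t = np.arange(T)
    w = np.exp(-lam * (T - t))
    idx = rng.choice(T, size=num_samples, p=w / w.sum())
    return y[idx]


# The samples form an empirical distribution from which quantiles can be read.
hourly = np.sin(np.arange(72) * 2 * np.pi / 24)
samples = npts_samples(hourly, num_samples=500, rng=np.random.default_rng(0))
p10, p50, p90 = np.quantile(samples, [0.1, 0.5, 0.9])
```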

In many applications, you may encounter many similar time series across a set of cross-sectional units. Examples of such groupings are demand for different products, server loads, and requests for web pages. In this case, it can be beneficial to train a single model jointly over all of these time series. DeepAR+ takes this approach and can outperform the standard ARIMA and ETS methods when your dataset contains hundreds of related time series. The trained model can also generate forecasts for new time series that are similar to the ones it was trained on. However, deep learning approaches can outperform standard methods only when there is sufficient training data; this is not the case, for example, when a neural network is trained on a time series with only a few dozen observations. Amazon Forecast offers the best of both worlds, letting you either choose a specific algorithm or let Amazon Forecast perform model selection automatically.

In [None]:
base_algorithm_arn = 'arn:aws:forecast:::algorithm/'


## Step 3a. Choosing DeepAR+<a class="anchor" id="DeepAR">
    
Amazon Forecast DeepAR+ is a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNNs). As described above, rather than fitting a separate model to each time series as ARIMA or ETS do, DeepAR+ trains a single model jointly over all the time series in the dataset group. When your dataset contains hundreds of time series, DeepAR+ can outperform the standard ARIMA and ETS methods, and the trained model can also generate forecasts for new time series that are similar to the ones it was trained on.

For more on DeepAR+, see the [Amazon Forecast Documentation](https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-recipe-deeparplus.html).

In [None]:
algorithm_arn = f'{base_algorithm_arn}Deep_AR_Plus'
predictor_name = f'{project}_Deep_AR_Plus'

response = forecast.create_predictor(
    PredictorName = predictor_name,
    AlgorithmArn = algorithm_arn,
    ForecastHorizon = forecast_horizon,
    PerformAutoML = False,
    PerformHPO = False,
    InputDataConfig = {'DatasetGroupArn': dataset_group_arn},
    FeaturizationConfig = {'ForecastFrequency': timeseries_frequency}
)

predictor_arn_deep_ar = response['PredictorArn']

## Step 3b.  Choosing Prophet<a class="anchor" id="prophet">
    
Prophet is a popular local Bayesian structural time series model. The Amazon Forecast Prophet algorithm uses the Prophet class of the Python implementation of Prophet.

#### How Prophet Works
Prophet is especially useful for datasets that:

* Contain an extended time period (months or years) of detailed historical observations (hourly, daily, or weekly)
* Have multiple strong seasonalities
* Include previously known important, but irregular, events
* Have missing data points or large outliers
* Have non-linear growth trends that are approaching a limit

Prophet is an additive regression model with a piecewise linear or logistic growth curve trend. It includes a yearly seasonal component modeled using Fourier series and a weekly seasonal component modeled using dummy variables.
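A Fourier-series seasonal component, as we understand Prophet's formulation, has the form $s(t) = \sum_{n=1}^{N} \left(a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P}\right)$ with period $P$ (about 365.25 days for yearly seasonality), and the coefficients $a_n, b_n$ are fitted by regression. The sketch below builds those features with NumPy and recovers a synthetic yearly pattern by least squares; `fourier_features` is a hypothetical name, not part of Prophet's API.

```python
import numpy as np


def fourier_features(t, period, order):
    """Columns cos(2*pi*n*t/P) for n=1..order, then sin(2*pi*n*t/P) for n=1..order."""
    t = np.asarray(t, dtype=float)
    n = np.arange(1, order + 1)
    angles = 2 * np.pi * np.outer(t, n) / period  # shape (len(t), order)
    return np.hstack([np.cos(angles), np.sin(angles)])


# Example: recover a synthetic yearly pattern from three years of daily steps.
t = np.arange(0, 3 * 365)
pattern = 2.0 * np.cos(2 * np.pi * t / 365.25) + 0.5 * np.sin(4 * np.pi * t / 365.25)
X = fourier_features(t, period=365.25, order=2)
coef, *_ = np.linalg.lstsq(X, pattern, rcond=None)  # [a_1, a_2, b_1, b_2]
```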

For more information, see the [Amazon Forecast Documentation](https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-recipe-prophet.html).

#### Prophet Hyperparameters and Related Time Series
Amazon Forecast uses the default Prophet hyperparameters. Prophet also supports related time-series as features, provided to Amazon Forecast in the related time-series CSV file.

In [None]:
hint('3b')

In [None]:
algorithm_arn = f'{base_algorithm_arn}Prophet'
predictor_name = f'{project}_Prophet'

response = # TODO

predictor_arn_prophet = response['PredictorArn']

### Optional: Want to try another Predictor?
If you would like to try different predictors, check out the [Amazon Forecast Documentation](https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-choosing-recipes.html), and create another predictor.

In [None]:
algorithm_arn_other=f'{base_algorithm_arn}NPTS'
predictor_name=f'{project}_NPTS'

response = # TODO


predictor_arn_other= response['PredictorArn']
wait(lambda: forecast.describe_predictor(PredictorArn=predictor_arn_other))
forecast.describe_predictor(PredictorArn=predictor_arn_other)


We now need to wait for the predictors to finish training (two, or three if you created the optional predictor).

In [None]:
wait(lambda: forecast.describe_predictor(PredictorArn=predictor_arn_prophet))
forecast.describe_predictor(PredictorArn=predictor_arn_prophet)

if predictor_arn_other:
    wait(lambda: forecast.describe_predictor(PredictorArn=predictor_arn_other))
    forecast.describe_predictor(PredictorArn=predictor_arn_other)

wait(lambda: forecast.describe_predictor(PredictorArn=predictor_arn_deep_ar))
forecast.describe_predictor(PredictorArn=predictor_arn_deep_ar)

# Step 4. Computing Error Metrics from Backtesting<a class="anchor" id="error">
    
### How to evaluate a forecasting model?

To evaluate the accuracy of an algorithm for various forecasting scenarios and to tune the predictor, use predictor metrics. Amazon Forecast uses backtesting to produce metrics.

Forecast automatically splits your input data into two datasets, training and test, as shown in the following figure. Forecast decides how to split the input data by using the `BackTestWindowOffset` parameter that you specify in the `CreatePredictor` operation or, if you don't specify it, the value of the `ForecastHorizon` parameter. For more information, see `EvaluationParameters`.

<img src="BlogImages/backtest.png">

To evaluate the metrics in multiple backtest scenarios with different virtual forecast start dates, as shown in the following figure, use the `NumberOfBacktestWindows` parameter in the `CreatePredictor` operation. The default for the `NumberOfBacktestWindows` parameter is 1. If you use the default, Forecast uses the simple splitting method shown in the preceding figure.

<img src="BlogImages/evaluation-backtests.png">
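Both backtesting parameters are passed to `create_predictor` inside `EvaluationParameters`. The sketch below builds the keyword arguments for the DeepAR+ call used earlier with explicit backtesting; `predictor_request` is a hypothetical helper, and the range checks (1 to 5 windows; an offset at least the forecast horizon and less than half the target series length) reflect our reading of the `EvaluationParameters` documentation, so verify them there.

```python
def predictor_request(predictor_name, algorithm_arn, dataset_group_arn,
                      forecast_horizon, frequency, target_length,
                      num_backtest_windows=1, backtest_offset=None):
    """Build keyword arguments for forecast.create_predictor with explicit backtesting.

    Hypothetical helper. backtest_offset defaults to the forecast horizon,
    mirroring the service default described above.
    """
    offset = forecast_horizon if backtest_offset is None else backtest_offset
    # Assumed constraints, based on our reading of the EvaluationParameters docs.
    if not 1 <= num_backtest_windows <= 5:
        raise ValueError("NumberOfBacktestWindows must be between 1 and 5")
    if not forecast_horizon <= offset < target_length / 2:
        raise ValueError("BackTestWindowOffset must be >= the horizon and "
                         "< half the target series length")
    return {
        "PredictorName": predictor_name,
        "AlgorithmArn": algorithm_arn,
        "ForecastHorizon": forecast_horizon,
        "PerformAutoML": False,
        "PerformHPO": False,
        "EvaluationParameters": {
            "NumberOfBacktestWindows": num_backtest_windows,
            "BackTestWindowOffset": offset,
        },
        "InputDataConfig": {"DatasetGroupArn": dataset_group_arn},
        "FeaturizationConfig": {"ForecastFrequency": frequency},
    }
```

It would be used as `forecast.create_predictor(**predictor_request(...))`, with `target_length=len(target_df)`.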

After training, Amazon Forecast calculates the root mean square error (RMSE) and weighted quantile losses to determine how well the model predicted the test data in each backtest window and the average value over all the backtest windows. These metrics measure the difference between the values predicted by the model and the actual values in the test dataset. To retrieve the metrics, you use the `GetAccuracyMetrics` operation.

#### Prediction Quantiles and MAPE

Prediction quantiles (intervals) express the uncertainty in the forecasts. By calculating prediction quantiles, the model shows how much uncertainty is associated with each forecast. Without accompanying prediction quantiles, point forecasts have limited value.

Predicting forecasts at different quantiles is particularly useful when the costs of under-predicting and over-predicting differ. Amazon Forecast provides probabilistic predictions at three distinct quantiles (10%, 50%, and 90%) and calculates the associated loss (error) at each quantile. The weighted quantile loss (wQuantileLoss) measures how far off the forecast is from actual demand in either direction, expressed as a percentage of demand, on average, in each quantile. This metric helps capture the bias inherent in each quantile, which can't be captured by a calculation like MAPE (Mean Absolute Percentage Error), where the weights are equal. As with MAPE and RMSE, lower wQuantileLoss values indicate better overall forecast accuracy.

Amazon Forecast calculates the weighted P10, P50, and P90 quantile losses, where $\tau$ takes the values 0.1, 0.5, and 0.9, respectively. The interval between the P10 and P90 forecasts is an 80% prediction interval. For RMSE, Amazon Forecast uses the P50 forecast as the predicted value, that is, $\hat{y}_{i,t} = q_{i,t}(0.5)$.

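The weighted quantile loss for quantile $\tau$ is

$$\text{wQuantileLoss}[\tau] = \frac{2 \sum_{i,t} \left[ \tau \max\left(y_{i,t} - q_{i,t}(\tau), 0\right) + (1 - \tau) \max\left(q_{i,t}(\tau) - y_{i,t}, 0\right) \right]}{\sum_{i,t} \left| y_{i,t} \right|}$$

where $y_{i,t}$ is the actual value of item $i$ at time $t$ and $q_{i,t}(\tau)$ is the predicted $\tau$-quantile. When the sum of the actual target values over all items and all time points is approximately zero in a given backtest window, this expression is undefined. In this case, Amazon Forecast outputs the unweighted quantile loss, which is the numerator of the expression.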
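To make these metrics concrete, the following NumPy sketch computes the weighted quantile loss as Amazon Forecast documents it, $2\sum_{i,t}\left[\tau\max(y_{i,t}-q_{i,t}(\tau),0)+(1-\tau)\max(q_{i,t}(\tau)-y_{i,t},0)\right] / \sum_{i,t}|y_{i,t}|$, together with RMSE on the P50 forecast. It is an illustration for building intuition, not the service's implementation; the function names are hypothetical.

```python
import numpy as np


def weighted_quantile_loss(y, q, tau):
    """wQuantileLoss[tau] over all items and time steps (arrays of equal shape)."""
    y = np.asarray(y, dtype=float)
    q = np.asarray(q, dtype=float)
    under = np.maximum(y - q, 0.0)  # actual above the forecast, weighted by tau
    over = np.maximum(q - y, 0.0)   # forecast above the actual, weighted by 1 - tau
    numerator = 2.0 * np.sum(tau * under + (1.0 - tau) * over)
    denominator = np.sum(np.abs(y))
    # When the actuals sum to ~0, report the unweighted loss (the numerator).
    return numerator if np.isclose(denominator, 0.0) else numerator / denominator


def rmse(y, p50):
    """Root mean square error, using the P50 forecast as the point prediction."""
    y = np.asarray(y, dtype=float)
    p50 = np.asarray(p50, dtype=float)
    return float(np.sqrt(np.mean((y - p50) ** 2)))
```

With $\tau = 0.5$ the expression reduces to $\sum|y - q| / \sum|y|$, a weighted absolute percentage error. For $\tau = 0.1$, over-forecasting is penalized nine times more heavily than under-forecasting.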

**wQuantileLoss[0.1]**: For the P10 prediction, the true value is expected to be lower than the predicted value 10% of the time.

For example, suppose that you're a retailer and you want to forecast product demand for winter gloves that sell well only during the fall and winter. If you don't have a lot of storage space and the cost of invested capital is high, or if the price of being overstocked on winter gloves concerns you, you might use the P10 quantile to order a relatively low number of winter gloves. You know that the P10 forecast overestimates the demand for your winter gloves only 10% of the time, so 90% of the time you'll be sold out of your winter gloves.

**wQuantileLoss[0.5]**: For the P50 prediction, the true value is expected to be lower than the predicted value 50% of the time. In most cases, the point forecasts that you generate internally or with other forecasting tools should match the P50 forecasts.

**wQuantileLoss[0.9]**: For the P90 prediction, the true value is expected to be lower than the predicted value 90% of the time.

If you determine that being understocked on gloves will result in huge amounts of lost revenue—for example, the cost of not selling gloves is extremely high or the cost of invested capital is low—you might choose to use the P90 quantile to order gloves.
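The quantile definitions above can be checked empirically once you have the actual values for a forecast period: a calibrated P10 forecast should sit above the actual value about 10% of the time, and the P10 to P90 band should contain about 80% of the actual values. A minimal NumPy sketch with hypothetical function names:

```python
import numpy as np


def fraction_below(actuals, quantile_forecasts):
    """Share of time steps where the actual value is below the quantile forecast.

    For a well-calibrated P10 forecast this should be close to 0.1; for P90, close to 0.9.
    """
    y = np.asarray(actuals, dtype=float)
    q = np.asarray(quantile_forecasts, dtype=float)
    return float(np.mean(y < q))


def interval_coverage(actuals, p10, p90):
    """Share of actual values that fall inside the [P10, P90] band (ideally about 0.8)."""
    y = np.asarray(actuals, dtype=float)
    return float(np.mean((y >= np.asarray(p10)) & (y <= np.asarray(p90))))
```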

The following figure of a forecast that has a Gaussian distribution shows the quantiles that divide the forecast into four regions of equal probability. For information about the quantiles of a distribution, see Quantile on Wikipedia.

<img src="BlogImages/metrics-gaussian.png">

### Get the Error Metrics

Now that we have trained predictors, we can get the error metrics for them. 

In [None]:
error_metrics_deep_ar_plus = forecast.get_accuracy_metrics(PredictorArn=predictor_arn_deep_ar)
print(error_metrics_deep_ar_plus)

error_metrics_prophet = forecast.get_accuracy_metrics(PredictorArn=predictor_arn_prophet)
print(error_metrics_prophet)

if predictor_arn_other:
    error_metrics_other = forecast.get_accuracy_metrics(PredictorArn=predictor_arn_other)
    print(error_metrics_other)

In [None]:
def extract_summary_metrics(metric_response, predictor_name):
    df = pd.DataFrame(metric_response['PredictorEvaluationResults']
                 [0]['TestWindows'][0]['Metrics']['WeightedQuantileLosses'])
    df['Predictor'] = predictor_name
    return df

In [None]:
deep_ar_metrics = extract_summary_metrics(error_metrics_deep_ar_plus, "DeepAR")
prophet_metrics = extract_summary_metrics(error_metrics_prophet, "Prophet")

if predictor_arn_other:
    other_metrics = extract_summary_metrics(error_metrics_other, "Other")

In [None]:
metrics = [deep_ar_metrics, prophet_metrics]
if predictor_arn_other:
    metrics.append(other_metrics)
    
pd.concat(metrics) \
        .pivot(index='Quantile', columns='Predictor', values='LossValue').plot.bar();

As mentioned before, if you only have a handful of time series (in this case, only one) with a small number of observations, neural network models such as DeepAR+ are usually not the best choice. In this example, the chart shows DeepAR+ performing worse than Prophet on the single time series.

# Step 5. Creating a Forecast<a class="anchor" id="forecast">

The `create_forecast` method uses the predictor to create a forecast. In the response, you will get the Amazon Resource Name (ARN) of the forecast. You use this ARN to retrieve and export the forecast. 

In [None]:
forecast_name = f'{project}_deep_ar_plus'
response = forecast.create_forecast(
    ForecastName=forecast_name,
    PredictorArn=predictor_arn_deep_ar
)

forecast_arn_deep_ar = response['ForecastArn']

Now create a forecast using the Prophet predictor and, if you created one, the optional predictor.

In [None]:
hint('5')

In [None]:
forecast_name = f'{project}_prophet'

response = # TODO

forecast_arn_prophet = response['ForecastArn']

In [None]:
#forecast_name = # TODO
#response = # TODO
#forecast_arn_other = response['ForecastArn']

Now we need to wait for the forecasts to finish being created.

In [None]:
wait(lambda: forecast.describe_forecast(ForecastArn=forecast_arn_deep_ar))
forecast.describe_forecast(ForecastArn=forecast_arn_deep_ar)

wait(lambda: forecast.describe_forecast(ForecastArn=forecast_arn_prophet))
forecast.describe_forecast(ForecastArn=forecast_arn_prophet)

if predictor_arn_other:
    wait(lambda: forecast.describe_forecast(ForecastArn=forecast_arn_other))
    forecast.describe_forecast(ForecastArn=forecast_arn_other)

# Step 6. Querying the Forecasts<a class="anchor" id="query">

To query the forecasts that have been created, use the following parameters.

* **start-date** and **end-date** – Specify an optional date range to retrieve the forecast for. If you don't specify these parameters, the operation returns the entire forecast for bike_12.
* **filters** – Specifies the item_id filter to retrieve the bike demand forecast for bike_12.

Because this is an hourly forecast, the response shows hourly forecast values. In the response, note the following:

* **mean** – For the specific date and time, the mean is the predicted mean value.
* **p90, p50, and p10** – The quantile forecasts: at the specified date and time, the actual value is expected to be below the p10 value 10% of the time, below the p50 value 50% of the time, and below the p90 value 90% of the time.

For more information about this operation, see [QueryForecast](https://docs.aws.amazon.com/forecast/latest/dg/API_forecastquery_QueryForecast.html).
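In Python, these parameters are passed to `forecast_query.query_forecast` as `Filters`, `StartDate`, and `EndDate`. As we read the QueryForecast reference, the dates use the format `yyyy-MM-dd'T'HH:mm:ss`. The hypothetical helper `query_kwargs` below only builds the keyword arguments; it does not call AWS.

```python
from datetime import datetime


def query_kwargs(forecast_arn, item_id, start=None, end=None):
    """Keyword arguments for forecast_query.query_forecast; the date range is optional."""
    kwargs = {"ForecastArn": forecast_arn, "Filters": {"item_id": item_id}}
    fmt = "%Y-%m-%dT%H:%M:%S"  # yyyy-MM-dd'T'HH:mm:ss
    if start is not None:
        kwargs["StartDate"] = start.strftime(fmt)
    if end is not None:
        kwargs["EndDate"] = end.strftime(fmt)
    return kwargs


# Usage (requires the forecast created above):
# forecast_query.query_forecast(**query_kwargs(forecast_arn_deep_ar, "bike_12",
#                                              start=datetime(2012, 12, 19, 0)))
```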

In [None]:
item_id = 'bike_12'

response = forecast_query.query_forecast(
    ForecastArn=forecast_arn_deep_ar,
    Filters={"item_id": item_id}
)

The response is a json structure:

In [None]:
print(json.dumps(response))
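The forecast values sit under `response['Forecast']['Predictions']`, keyed by quantile (for example `p10`, `p50`, `p90`), each holding a list of `{'Timestamp': ..., 'Value': ...}` entries. A small pandas sketch that turns this into a table with one column per quantile, assuming that structure; `predictions_to_frame` is a hypothetical helper, separate from the provided `plot_forecasts` utility.

```python
import pandas as pd


def predictions_to_frame(response):
    """Pivot query_forecast output into a DataFrame indexed by timestamp."""
    predictions = response["Forecast"]["Predictions"]
    columns = []
    for quantile, points in predictions.items():
        # One Series per quantile, indexed by the timestamp string.
        columns.append(pd.Series({p["Timestamp"]: p["Value"] for p in points}, name=quantile))
    df = pd.concat(columns, axis=1)
    df.index = pd.to_datetime(df.index)
    return df.sort_index()
```

For example, `predictions_to_frame(response)` can be printed or plotted directly, and its columns compared with the actual values.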

We will use a utility function already created for you to plot the actual values against the predicted values.

In [None]:
plt.figure(figsize=(30,10))
fname = f'../data/bike_small.csv'
actual = load_exact_sol(fname, item_id)

plot_forecasts(response, actual)
plt.title("DeepAR Forecast");

Now query the forecast for Prophet and, if you created it, your optional predictor.

In [None]:
hint('6')

In [None]:
plt.figure(figsize=(30,10))
response = # TODO

plot_forecasts(response, actual)
plt.title("Prophet Forecast");

#response = # TODO OTHER
# plot_forecasts(response, actual)
# plt.title("Other Forecast");

# Step 7. Exporting your Forecasts<a class="anchor" id="export">

In [None]:
name = f'{project}_forecast_export_deep_ar_plus'
s3_path = f"{s3_data_path}/{name}"

response = forecast.create_forecast_export_job(
    ForecastExportJobName=name,
    ForecastArn=forecast_arn_deep_ar,
    Destination={
        "S3Config" : {
            "Path": s3_path,
            "RoleArn": role_arn
        }
    }
)

forecast_export_arn_deep_ar = response['ForecastExportJobArn']

Now create your own export job for the Prophet forecast.

In [None]:
hint('7')

In [None]:
name = f'{project}_forecast_export_prophet'
s3_path = f"{s3_data_path}/{name}"

response = # TODO

forecast_export_arn_prophet = response['ForecastExportJobArn']

# Step 8. Cleaning up your Resources<a class="anchor" id="cleanup">

Once we have completed the above steps, we can clean up the resources we created. All delete operations, except for `delete_dataset_group`, are asynchronous, so we have added the helpful `wait_till_delete` function.
Amazon Forecast resource limits are documented <a href="https://docs.aws.amazon.com/forecast/latest/dg/limits.html">here</a>.
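`wait_till_delete` comes from `util.fcst_utils`, and its source is not shown in this notebook. A hypothetical helper of that kind might start the asynchronous delete once and then poll the matching `describe_*` call until it raises a not-found error. With boto3, the not-found check could be `isinstance(e, forecast.exceptions.ResourceNotFoundException)`; `delete_and_wait` and its parameters are illustrative names, and this design differs from `wait_till_delete`, which takes a single lambda.

```python
import time


def delete_and_wait(delete, describe, is_not_found,
                    interval_s=10.0, timeout_s=1800.0, sleep=time.sleep):
    """Hypothetical helper: start an asynchronous delete, then poll until it is gone.

    delete       -- zero-argument callable that starts the deletion
    describe     -- zero-argument callable that describes the same resource
    is_not_found -- predicate telling whether an exception means "resource is gone"
    """
    delete()
    waited = 0.0
    while True:
        try:
            describe()
        except Exception as exc:
            if is_not_found(exc):
                return True
            raise  # any other error is surfaced to the caller
        if waited >= timeout_s:
            raise TimeoutError(f"resource still exists after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Usage (requires the resources created above):
# delete_and_wait(lambda: forecast.delete_forecast(ForecastArn=forecast_arn_deep_ar),
#                 lambda: forecast.describe_forecast(ForecastArn=forecast_arn_deep_ar),
#                 lambda e: isinstance(e, forecast.exceptions.ResourceNotFoundException))
```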

In [None]:
# Delete the forecast export jobs (the "other" export exists only if you created one)
wait_till_delete(lambda: forecast.delete_forecast_export_job(ForecastExportJobArn=forecast_export_arn_deep_ar))
wait_till_delete(lambda: forecast.delete_forecast_export_job(ForecastExportJobArn=forecast_export_arn_prophet))
if globals().get('forecast_export_arn_other'):
    wait_till_delete(lambda: forecast.delete_forecast_export_job(ForecastExportJobArn=forecast_export_arn_other))

In [None]:
# Delete the forecasts (the "other" forecast exists only if you created one)
wait_till_delete(lambda: forecast.delete_forecast(ForecastArn=forecast_arn_deep_ar))
wait_till_delete(lambda: forecast.delete_forecast(ForecastArn=forecast_arn_prophet))
if globals().get('forecast_arn_other'):
    wait_till_delete(lambda: forecast.delete_forecast(ForecastArn=forecast_arn_other))

In [None]:
# Delete the predictors (the "other" predictor exists only if you created one)
wait_till_delete(lambda: forecast.delete_predictor(PredictorArn=predictor_arn_deep_ar))
wait_till_delete(lambda: forecast.delete_predictor(PredictorArn=predictor_arn_prophet))
if globals().get('predictor_arn_other'):
    wait_till_delete(lambda: forecast.delete_predictor(PredictorArn=predictor_arn_other))

In [None]:
# Delete the target time series and related time series dataset import jobs
wait_till_delete(lambda: forecast.delete_dataset_import_job(DatasetImportJobArn=target_dataset_import_job_arn))
wait_till_delete(lambda: forecast.delete_dataset_import_job(DatasetImportJobArn=related_dataset_import_job_arn))

In [None]:
# Delete the target time series and related time series datasets
wait_till_delete(lambda: forecast.delete_dataset(DatasetArn=target_dataset_arn))
wait_till_delete(lambda: forecast.delete_dataset(DatasetArn=related_dataset_arn))

In [None]:
# Delete dataset group
forecast.delete_dataset_group(DatasetGroupArn=dataset_group_arn)