# DeepAR 

Contents: 

- Main paper: [Salinas et al. (2017). "DeepAR: Probabilistic forecasting with autoregressive recurrent networks"](https://arxiv.org/pdf/1704.04110)
- AWS Docs: [DeepAR Forecasting Algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html)

## Salinas et al. (2017)

### 1 - Introduction

- Why TS forecasting is important
- What's different? -> Before: Local methods; Now: Global methods
- Extensive empirical evaluation by the authors who developed the algorithm. DeepAR is the go-to time series forecasting algorithm in AWS. TM: Value judgement to the disadvantage of ARIMA/ETS?
- Local methods: Based on ARIMA, ES, state space or some combinations (exclude ML combinations)?
- Global method: New type of forecasting -> Availability of hundreds or thousands of series that are related or generated by a common process...; 
- DeepAR: An autoregressive recurrent network that learns a global model from the historical data of all time series in the data set.
 - LSTM-based recurrent neural network architecture
- Providing better forecast accuracy than previous methods?
- Additional key advantages: 
 - i) minimal feature engineering; 
 - ii) probabilistic forecasts (Monte Carlo samples) -> Can be used to compute consistent quantile estimates for all sub-ranges in the prediction horizon.
 - iii) cold-start forecasting - No history at all required if similar series is added. 
 - iv) Can incorporate wide range of likelihood functions
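
A minimal sketch of point ii), assuming the forecast is available as a list of Monte Carlo sample paths: the quantile of a total over any sub-range is obtained by summing each path over that sub-range first and taking the empirical quantile afterwards. The helper names and the Gaussian toy paths are illustrative assumptions, not part of the paper.

```python
import random

def quantile(values, q):
    """Empirical q-quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

def subrange_sum_quantile(sample_paths, start, stop, q):
    """q-quantile of the total over steps start..stop-1, computed path by path."""
    totals = [sum(path[start:stop]) for path in sample_paths]
    return quantile(totals, q)

# Toy sample paths (assumption): 200 paths over a horizon of 6 steps.
rng = random.Random(0)
paths = [[rng.gauss(10.0, 2.0) for _ in range(6)] for _ in range(200)]

p50_first_half = subrange_sum_quantile(paths, 0, 3, 0.5)
p90_first_half = subrange_sum_quantile(paths, 0, 3, 0.9)
```

Summing per path before taking the quantile is what keeps the estimates consistent across sub-ranges; adding up per-step quantiles would generally give a different (and wrong) answer.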


### 2 - Related Work

- Examples of forecasting individual time series include foremost ARIMA models and exponential smoothing methods.
- Sharing information across individual time series: Can improve forecast accuracy but is difficult in practice
- Using RNN and train these networks on all data simultaneously.
- Main difference here: 1) Probabilistic forecasting (i.e. interested in entire distribution) 2) To obtain accurate distribution for (unbounded) count data --> Negative Binomial likelihood. 
 - Negative binomial likelihood precludes us from directly applying standard data normalization techniques.


### 3 - Model

Denoting the value of time series i at time t by $z_{i,t}$, our goal is to model the conditional distribution
$$
P(z_{i, t_0 : T}|z_{i,1:t_0-1}, x_{i,1:T} )
$$

where 

$$
z_{i, t_0 : T} := [z_{i,t_0}, z_{i, t_0 +1}, ..., z_{i, T}]
$$

is the future of time series $i$ from $t_0$ until $T$. The prediction range $[t_0, T]$ contains $T - t_0 + 1$ time points, which is the forecast horizon. Since $z_{i,t_0}$ is the one-step-ahead value, $z_{i,t_0 + 3}$ corresponds to a forecast horizon of $h=4$. 

All previous realisations of the target time series until $t_0-1$ are denoted by

$$
z_{i,1:t_0-1} := [z_{i,1}, ..., z_{i,t_0 - 2}, z_{i,t_0 - 1}]
$$ 

$t_0$ denotes the time point from which we assume $z_{i,t}$ to be unknown at prediction time. $x_{i, 1:T}$ are covariates that are assumed to be known for all time points, i.e. also for the future (the prediction range).

According to this notation, the time range $[1, t_0-1]$ corresponds to the "past" that is used for training/conditioning, whereas the time range $[t_0, T]$ indicates the prediction range. 

- Time index t is relative, i.e. $t=1$ can correspond to a different actual time period for each i.

The model's distribution is assumed to be 

\begin{align}
Q_{\theta}(z_{i,t_0:T} | z_{i,1:t_0-1}, x_{i,1:T}) 
\end{align}

- depends on its own previous values until $t_0-1$ (the last known time point)
- also depends on the feature variables x
 - for previous time points
 - as well as for the time points $t_0:T$ (the prediction range, up to the end of the forecast horizon) <- estimation of feature variables? 
 
The model distribution $Q_{\theta}$ consists of a product of likelihood factors 

\begin{align}
Q_{\theta}(z_{i,t_0:T}|z_{i,1:t_0-1},x_{i,1:T}) = \prod_{t=t_0}^T Q_{\theta}(z_{i,t}|z_{i,1:t-1},x_{i,1:T}) = \prod_{t=t_0}^T \ell( z_{i,t}|\theta(h_{i,t}, \Theta))
\end{align}

where 
$$
h_{i,t} = h(h_{i, t-1}, z_{i,t-1}, x_{i,t} , \Theta)
$$

and h being a multi-layer recurrent neural network with LSTM cells. 

- The model is autoregressive (AR) in the sense that it takes the observation at the last time step $z_{i,t-1}$ as an input, and recurrent in the sense that the previous output of the network $h_{i,t-1}$ is fed back as an input at the next time step. 

More precisely (based on the docs), DeepAR adds frequency-specific time-lags which are used as additional feature variables.

Based on the frequency, the following feature variables are added:

<img src="../images/frequency_feature_table.png" width="80%">

- Likelihood $\ell(z_{i,t} | \theta(h_{i,t}, \Theta))$ is a fixed conditional distribution
- Parameters are given by a function $\theta(h_{i,t}, \Theta)$ 
- Network output: $h_{i,t}$

- ?Information about (...)?
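
The factorization above can be sketched in a few lines. This is a toy illustration under loud assumptions: `toy_step` is a hypothetical scalar stand-in for the multi-layer LSTM $h(\cdot)$, `toy_theta` is a hypothetical mapping to the parameters of a Gaussian likelihood, and all constants are made up. Only the control flow (condition on the past, then multiply likelihood factors or feed samples back) mirrors the equations.

```python
import math
import random

def toy_step(h_prev, z_prev, x_t):
    """Hypothetical stand-in for the LSTM update h(h_{t-1}, z_{t-1}, x_t, Theta)."""
    return math.tanh(0.5 * h_prev + 0.3 * z_prev + 0.2 * x_t)

def toy_theta(h_t):
    """Hypothetical map from network output to Gaussian (mu, sigma); softplus keeps sigma > 0."""
    return 2.0 * h_t, math.log1p(math.exp(h_t))

def gaussian_log_pdf(z, mu, sigma):
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((z - mu) / sigma) ** 2

def encode_context(z_context, x):
    """Unroll the recurrence over the conditioning range 1..t0-1."""
    h, z_prev = 0.0, 0.0  # assumed initial state and initial lagged value
    for t, z_t in enumerate(z_context):
        h = toy_step(h, z_prev, x[t])
        z_prev = z_t
    return h, z_prev

def log_likelihood(z_context, z_future, x):
    """Sum of log-likelihood factors over the prediction range t0..T."""
    h, z_prev = encode_context(z_context, x)
    total = 0.0
    for k, z_t in enumerate(z_future):
        h = toy_step(h, z_prev, x[len(z_context) + k])
        total += gaussian_log_pdf(z_t, *toy_theta(h))
        z_prev = z_t  # the observed value is fed back
    return total

def sample_path(z_context, x, horizon, rng):
    """Ancestral sampling: each drawn value becomes the next step's input."""
    h, z_prev = encode_context(z_context, x)
    path = []
    for k in range(horizon):
        h = toy_step(h, z_prev, x[len(z_context) + k])
        mu, sigma = toy_theta(h)
        z_prev = rng.gauss(mu, sigma)  # the sample is fed back
        path.append(z_prev)
    return path
```

Note that the covariates `x` must cover both the conditioning range and the prediction range, which is why they have to be known for future time points.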



## DeepAR: AWS Forecasting Algorithm

### What is DeepAR?

DeepAR is a supervised learning algorithm for time series forecasting that uses recurrent neural networks (RNN) to produce probabilistic forecasts.

### DeepAR highlights

- Claimed to be more accurate than classical forecasting techniques such as ARIMA or ES.
- Cold start forecasting: Can generate forecasts for a time series with little or no existing historical data (forecasts are based on the network trained with similar series)
- Can produce point forecasts as well as probabilistic forecasts (i.e. the forecast lies between X and Y with Z% probability)
- Information sharing across series
- DeepAR forecasting algorithm can be used with AWS 

For an overview of Amazon SageMaker, refer to the [SageMaker documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html). 

### DeepAR Forecasting Algorithm

The following notes are based on Amazon's documentation: [DeepAR Forecasting Algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html) as one of the built-in algorithms in the Amazon SageMaker.

> DeepAR forecasting algorithm is a supervised learning algorithm for forecasting **scalar (one-dimensional)** time series using **recurrent neural networks (RNN)**.

Traditional methods, such as ARIMA or ETS, fit a single model to each individual time series. Subsequently, this model is used to extrapolate the time series into the future. There is no information sharing between the individual time series. 

In case of similar or related time series (how similar do they have to be?) information sharing may be beneficial. Examples include time series grouping where individual series are related to each other like products, server loads, household electricity. 

For this type of application, one may benefit from training a single model jointly over all of the time series using a recurrent neural network (RNN). One example of this approach is the DeepAR algorithm. In case the dataset contains hundreds of related time series, DeepAR outperforms the standard ARIMA and ETS methods. 

The training of the DeepAR algorithm:

- Input is one or, preferably, more target time series that have been generated by the same process or similar processes
- Based on this input, the algorithm trains a model that learns an approximation of this process/processes and uses it to predict how the target time series evolves. 
- Target time series can be optionally associated with a vector of static (time-independent) categorical features provided by the `cat` field 
- Target time series can also be associated with a vector of dynamic (time-dependent) time series provided by the `dynamic_feat` field. 
- SageMaker trains the DeepAR model by randomly sampling training examples from each target time series in the training dataset.
- Each training example consists of a pair of adjacent context and prediction windows with fixed predefined lengths
 - To control how far in the past the network can see, use the `context_length` hyperparameter.
 - To control how far in the future predictions can be made, use the `prediction_length` hyperparameter.
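
A minimal sketch of the window sampling described above. It assumes windows are drawn uniformly at random and must lie fully inside the series; how SageMaker actually draws and pads windows is not covered by these notes, and the function name is hypothetical.

```python
import random

def sample_training_windows(target, context_length, prediction_length, num_samples, rng):
    """Draw pairs of adjacent (context, prediction) windows from one target series."""
    window = context_length + prediction_length
    if len(target) < window:
        raise ValueError("series is shorter than context_length + prediction_length")
    examples = []
    for _ in range(num_samples):
        start = rng.randint(0, len(target) - window)  # both bounds inclusive
        context = target[start:start + context_length]
        prediction = target[start + context_length:start + window]
        examples.append((context, prediction))
    return examples

# Toy hourly series (assumption): 48 points, 12-hour context, 6-hour prediction window.
series = list(range(48))
examples = sample_training_windows(series, 12, 6, num_samples=5, rng=random.Random(0))
```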

### Input/Output Interface for the DeepAR Algorithm

- Two supported data channels: `train` and `test` (optional)
 - `test`: data used to evaluate the model after training
- Training and test data can be provided in `JSON Lines` format.
- Files can be gzip-compressed or in Parquet format

JSON Lines:

- [JSON Lines](http://jsonlines.org/): text format, also called newline-delimited JSON. Convenient format for storing structured data. 
 - JSON Lines has three requirements: UTF-8 encoding, each line is a valid JSON value, and the line separator is `\n` 
 - File extension `.jsonl`
 - `gzip` or `bzip2` are recommended for saving space, resulting in `.jsonl.gz` or `.jsonl.bz2` files.

Parquet:

- [Apache Parquet](https://en.wikipedia.org/wiki/Apache_Parquet) is a column-oriented data storage format
- Provides efficient data compression and encoding schemes

Specifying paths: 

- Specify single file or a directory that contains multiple files (can be stored in subdirectories)
- If directory is specified: DeepAR uses all files in the directory as inputs for the corresponding channel, 
 - except those that start with a period (.) and those named `_SUCCESS`
 - ensures that you can directly use output folders produced by Spark jobs as input channels for DeepAR training jobs
 
Input files:

- DeepAR determines the input format from the file extensions (`.json`, `.json.gz`, or `.parquet`) in the specified input path. 

Records in your input files should contain: 

- `start` - Start timestamp as string with the format `YYYY-MM-DD HH:MM:SS`. 
- `target` - Represents the time series. An array of floating-point values or integers. 
 - Missing values can be encoded as `null` literals or `"NaN"` strings in JSON, or as `nan` floating-point values in Parquet. 

Optional inputs: 

- `dynamic_feat` (optional) - Vector of custom feature time series (dynamic features). 
 - In the form of an array of arrays of floating-point values or integers
 - All records must have the same number of inner arrays (same number of feature time series)
 - Each inner array must have the same length as the associated `target` array. 
 - Missing values are not supported in the features
 - Example: If the target time series represents the demand of different products, an associated `dynamic_feat` might be a boolean time series which indicates whether a promotion was applied (1) to the particular product or not (0). 
 
```
{"start": ..., "target": [1, 5, 10, 2], "dynamic_feat": [[0, 1, 1, 0]]}
```
Here, the record holds one target time series of length 4 (the start date is left unspecified) and one dynamic feature, i.e. an outer array that contains a single inner array of the same length.

- `cat` (optional) - An array of categorical features that can be used to encode the groups that the record belongs to. 
 - Must be encoded as a 0-based sequence of non-negative integers. For example, the categorical domain {R,G,B} can be encoded as {0, 1, 2}. All values from each categorical domain must be represented in the training dataset.
 - Each categorical feature is embedded in a low-dimensional space whose dimensionality is controlled by `embedding_dimension`
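
One way to produce such a 0-based encoding, as a small sketch with a hypothetical helper: labels are numbered in order of first appearance, and the number of distinct labels is the cardinality of that categorical feature.

```python
def build_cat_encoder(labels):
    """Map each distinct label to 0..K-1 in order of first appearance."""
    return {label: code for code, label in enumerate(dict.fromkeys(labels))}

colors = ["R", "G", "B", "G"]            # one raw label per time series (toy data)
encoder = build_cat_encoder(colors)      # {"R": 0, "G": 1, "B": 2}
codes = [encoder[c] for c in colors]     # [0, 1, 2, 1]
cardinality = len(encoder)               # 3
```

The same encoder has to be reused at inference time so that a label maps to the same integer it had during training.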
 
If a JSON file is used, it must be in JSON Lines format. For example: 

```
{"start": "2009-11-01 00:00:00", "target": [4.3, "NaN", 5.1, ...], "cat": [0, 1], "dynamic_feat": [[1.1, 1.2, 0.5, ...]]}
{"start": "2012-01-30 00:00:00", "target": [1.0, -5.0, ...], "cat": [2, 3], "dynamic_feat": [[1.1, 2.05, ...]]}
{"start": "1999-01-30 00:00:00", "target": [2.0, 1.0], "cat": [1, 4], "dynamic_feat": [[1.3, 0.4]]}
``` 

In this example, each time series has two associated categorical features and one feature time series. 
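
Records like these can be written from Python with the standard library. The sketch below is an assumption-laden illustration: the helper names and the toy records are made up, and the file is written to a temporary directory. It encodes missing target values as the `"NaN"` string because `json.dumps` would otherwise emit a bare `NaN` token, which is not valid JSON.

```python
import gzip
import json
import math
import os
import tempfile

def encode_target(values):
    """Replace None / float NaN by the "NaN" string accepted for missing target values."""
    return ["NaN" if v is None or (isinstance(v, float) and math.isnan(v)) else v
            for v in values]

def write_json_lines_gz(records, path):
    """Write one JSON object per line, gzip-compressed, UTF-8 encoded."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            row = dict(record, target=encode_target(record["target"]))
            f.write(json.dumps(row) + "\n")

# Toy records (assumption), mirroring the field names used above.
records = [
    {"start": "2009-11-01 00:00:00", "target": [4.3, float("nan"), 5.1],
     "cat": [0, 1], "dynamic_feat": [[1.1, 1.2, 0.5]]},
    {"start": "1999-01-30 00:00:00", "target": [2.0, 1.0],
     "cat": [1, 4], "dynamic_feat": [[1.3, 0.4]]},
]
path = os.path.join(tempfile.mkdtemp(), "train.json.gz")
write_json_lines_gz(records, path)

with gzip.open(path, "rt", encoding="utf-8") as f:
    lines = f.read().splitlines()
```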

For Parquet: 

- "start" can be datetime type.
- Parquet files can be compressed using gzip or snappy compression

What happens if the algorithm is trained without the optional `cat` and `dynamic_feat`? 

- learns a "global" model, 
 - that is, a model that is agnostic to the specific identity of the target time series at inference time and is conditioned only on its shape. 

What if the model is conditioned on the `cat` and `dynamic_feat` feature data? 

- Prediction will be influenced by the character of time series with the corresponding `cat` features. 
- For example: `target` time series represents the demand of clothing items 
 - Associate a two-dimensional cat vector that encodes the type of item (e.g. 0 = shoes, 1 = dress) in the first component and the color (e.g. 0 = red, 1 = blue) in the second component.
 
```
{"start": ..., "target": ..., "cat": [0, 0], ...} # red shoes

{"start": ..., "target": ..., "cat": [1, 1], ...} # blue dress
```
- At inference, you can request predictions for targets with cat values that are combinations of the cat values observed in the training data, for example: 

```
{"start": ..., "target": ..., "cat": [0, 1], ...} # blue shoes

{"start": ..., "target": ..., "cat": [1, 0], ...} # red dress
```

Guidelines for training data: 

- Start time and length can differ
- Series must have the same 
 - frequency,
 - number of categorical features
 - number of dynamic features 
- Shuffle the training file with respect to the position of the series in the file. 
- The `start` timestamp is used to derive the internal features. 
- If `cat` is used: All series must have the same number of categorical features. 
 - Algorithm uses `cat` and extracts the cardinality of the groups. Can be disabled (default `cardinality` is `"auto"`)
 - If model was trained using a cat feature, you must include it for inference
- If the training data contain `dynamic_feat`: Automatically used by the algorithm
 - All series must have the same number of feature series 
 - Time points must correspond one-to-one to the time points in `target`
 - same length of `dynamic_feat` as `target`.
 - can be disabled
 - If model was trained with `dynamic_feat`, then it must be provided for inference. In addition, each of the features has to have the length of the provided target plus `prediction_length`. You must provide the feature value in the future. 
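
The length and count rules above can be checked before submitting data. The following is a hypothetical pre-flight check, not part of any AWS SDK: passing `prediction_length=None` treats the record as a training record, passing an integer treats it as an inference record whose features must extend into the future.

```python
import math

def is_missing(value):
    return value is None or value == "NaN" or (isinstance(value, float) and math.isnan(value))

def check_record(record, num_cat, num_dynamic_feat, prediction_length=None):
    """Return a list of rule violations for one record (empty list means OK)."""
    problems = []
    expected_len = len(record["target"]) + (prediction_length or 0)
    if len(record.get("cat", [])) != num_cat:
        problems.append("wrong number of categorical features")
    feats = record.get("dynamic_feat", [])
    if len(feats) != num_dynamic_feat:
        problems.append("wrong number of dynamic features")
    for k, feat in enumerate(feats):
        if len(feat) != expected_len:
            problems.append(f"dynamic_feat[{k}] has length {len(feat)}, expected {expected_len}")
        if any(is_missing(v) for v in feat):
            problems.append(f"dynamic_feat[{k}] contains missing values")
    return problems

record = {"start": "2009-11-01 00:00:00", "target": [1, 5, 10, 2],
          "cat": [0], "dynamic_feat": [[0, 1, 1, 0]]}
training_problems = check_record(record, num_cat=1, num_dynamic_feat=1)
inference_problems = check_record(record, num_cat=1, num_dynamic_feat=1, prediction_length=2)
```

The same record passes as a training record but fails as an inference record, because the feature series does not cover the two future time points.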
 
Optional test channel data: 

- If specified, then DeepAR evaluates trained model with different accuracy metrics 
- Accuracy metrics: $RMSE$ and $wQuantileLoss$, which will be defined later in this doc.
- Specify the length of the forecast horizon by setting the `prediction_length` hyperparameter. 
- Specify which quantiles to calculate the loss for by setting the `test_quantiles` hyperparameter. In addition to these, the average of the prescribed losses is reported as part of the training logs. 

For inference: 

- DeepAR accepts JSON format with the following fields 
 - `"instances"`, which includes one or more time series in JSON Lines format
 - `"configuration"`, which includes parameters for generating the forecast.

## Best Practices for Using the DeepAR Algorithm

During tuning of a DeepAR model:

- Split data in training and test dataset
 - To create training and test data that fit the criteria (the algorithm does not see the test data during training): Use the entire dataset as the test set and remove the last `prediction_length` points of each time series for training
 
- Avoid using very large values for the `prediction_length` because this makes the model slow and less accurate
 - In those cases: Consider aggregating the data at a higher frequency (5min instead of 1min).

- Lags enable model to look further back in the time series than the value specified for `context_length`. 
 - Hence, `context_length` does not need to be large
 - Recommendation: Start with `context_length` equal to `prediction_length`. 

- In general, it is recommended to train DeepAR on as many time series as are available.
 - Standard forecasting algorithms, such as ARIMA or ETS, might provide more accurate results on a single time series or on a moderate number of series.

> DeepAR starts to outperform the standard methods when your dataset contains hundreds of related time series. Currently, DeepAR requires that the total number of observations available across all training time series is at least 300. 
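
The train/test split recommended above can be sketched as follows. The function name is hypothetical, and truncating `dynamic_feat` along with `target` is an assumption that follows from the rule that training features must have the same length as the target.

```python
def split_for_backtest(records, prediction_length):
    """Test set = full series; training set = series without the last prediction_length points."""
    train = []
    for record in records:
        cut = len(record["target"]) - prediction_length
        if cut <= 0:
            raise ValueError("series is not longer than prediction_length")
        shortened = dict(record, target=record["target"][:cut])
        if "dynamic_feat" in record:
            shortened["dynamic_feat"] = [feat[:cut] for feat in record["dynamic_feat"]]
        train.append(shortened)
    return train, list(records)

records = [{"start": "2009-11-01 00:00:00",
            "target": [1, 5, 10, 2, 7, 3],
            "dynamic_feat": [[0, 1, 1, 0, 0, 1]]}]
train, test = split_for_backtest(records, prediction_length=2)
```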

## EC2 Instance Recommendations for the DeepAR Algorithm 

Training: 

- DeepAR can be trained on GPU and CPU instances in both single and multi-machine settings
- Recommend starting with a single CPU instance, e.g. `ml.c4.2xlarge` or `ml.c4.4xlarge`
 - Switch to GPU instances and multiple machines only when necessary. 

Inference: 

- DeepAR supports only CPU instances. 

Job failures:

- large values for `context_length`, `prediction_length`, `num_cells`, `num_layers` or `mini_batch_size` can create models that are too large for small instances. 
- This may also occur when running hyperparameter tuning jobs. 
- In that case, use an instance type large enough for the model tuning job and consider limiting the upper values for the critical parameters to avoid job failures.

## How the DeepAR Algorithm Works

- AWS [How the DeepAR Algorithm Works](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar_how-it-works.html)

During training: 

- accepts training dataset and optional test dataset
- test data to evaluate the trained model.
- Both training and test datasets consist of one or, preferably, more target time series
- Each target time series can optionally be associated with a vector of feature time series and a vector of categorical features. 

### Example

(Notation different to paper/above?)

- Element i of the training set consists of 
 - a target time series $z_{i,t}$ 
 - two associated feature time series, $x_{i,1,t}$ and $x_{i,2,t}$:

<img src="../images/deepar_example.png">

DeepAR only supports feature time series that are known in the future. Why? This allows you to run "what if" scenarios, like "what happens if I change the price of a product in some way?"
 
### How Feature Time Series Work in the DeepAR Algorithm

For time-dependent patterns, such as spikes during weekends, 

- DeepAR automatically creates feature time series based on the frequency of the target time series.

Example (weekly data): 

- DeepAR creates two feature time series
 - day of the month
 - day of the year
 
<img src="../images/deepar_weekly_features.png">

The following table lists the derived features for the supported basic time frequencies: 

<img src="../images/frequency_feature_table.png">
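
For the weekly case mentioned above, the two derived calendar features can be reproduced with the standard library. This sketch returns raw calendar values; whether and how DeepAR scales them internally is not covered by these notes, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

def weekly_time_features(start, num_points):
    """Day of month and day of year for a weekly series beginning at `start`."""
    first = datetime.strptime(start, "%Y-%m-%d %H:%M:%S")
    stamps = [first + timedelta(weeks=k) for k in range(num_points)]
    day_of_month = [s.day for s in stamps]
    day_of_year = [s.timetuple().tm_yday for s in stamps]
    return day_of_month, day_of_year

day_of_month, day_of_year = weekly_time_features("2009-11-01 00:00:00", 3)
# day_of_month -> [1, 8, 15]; day_of_year -> [305, 312, 319]
```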

### How are the models trained in DeepAR?

- Random sampling: Several training examples from each of the time series in the training dataset are randomly sampled. 
 - Training examples: Each consists of a pair of adjacent context and prediction windows with fixed predefined lengths. 
  - `context_length`: Controls how far in the past the network can see
  - `prediction_length`: How far in the future predictions can be made. 

In the following figure, five samples with context lengths of 12 hours and prediction lengths of 6 hours are drawn from element i. The feature time series $x_{i,1,t}$ and $x_{i,2,t}$ are omitted for brevity.

<img src="../images/deepar_example2.png">

- DeepAR automatically feeds lagged values from the target time series (hence autoregressive). 

Here, in the example with hourly frequency: 

- The model exposes the $z_{i,t}$ values that occurred approximately one, two, and three days in the past. 

<img src="../images/deepar_lag.png">
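
A minimal sketch of lagged inputs for the hourly example. The lag set (24, 48, 72) is an assumption derived from "approximately one, two, and three days"; the exact lags DeepAR uses are not listed in these notes.

```python
def lagged_values(target, t, lags=(24, 48, 72)):
    """Values observed `lag` steps before position t; None where the history is too short."""
    return [target[t - lag] if t - lag >= 0 else None for lag in lags]

hourly = list(range(100))                 # toy hourly series: value equals its index
late_lags = lagged_values(hourly, 80)     # [56, 32, 8]
early_lags = lagged_values(hourly, 30)    # [6, None, None]
```

Because these lags reach up to three days back regardless of the window size, the context window itself can stay short.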

#### Inference 

- takes a target time series as input (independent of whether the model was trained on the series or not)
- forecasts a probability distribution for the next `prediction_length` values. 
- DeepAR is trained on the entire dataset, hence the forecast takes into account patterns learned from similar time series. 

### DeepAR Hyperparameters

[DeepAR hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar_hyperparameters.html): 

- `context_length`
- `epochs`
- `prediction_length` - Forecast horizon
- `time_freq` - Time frequency
 - required to select appropriate data features and lags
 - M: monthly; W: weekly, D: daily, H: hourly; min: every minute
- `cardinality` - Array specifying the number of categories (groups) if categorical features (`cat`) are used
- `dropout_rate` - The dropout rate to use during training. Zoneout regularization is used. 
- `early_stopping_patience` - Training stops when no progress is made within the specified number of epochs. The model with the lowest loss is returned as the final model. 
- `embedding_dimension` - Model can learn group-level time series patterns, where an embedding vector of this size for each group is used. 
- `learning_rate` - Learning rate used in training. 
- `likelihood` - Distribution used for generating a probabilistic forecast. 
 - gaussian, beta, negative-binomial, student-t, deterministic-L1
- `mini_batch_size` - The size of mini-batches used during training.
- `num_cells` - Cells to use in each hidden layer
- `num_dynamic_feat` - ...
- `num_eval_samples` - The number of samples that are used per time series when calculating test accuracy metrics. 
- `num_layers` - The number of hidden layers in the RNN. Default value: 2
- `test_quantiles` - Quantiles for which to calculate quantile loss on the test channel.

## Tune a DeepAR Model

- Automatic model tuning (hyperparameter tuning) finds best version of a model by running many jobs that test a range of values
- objective metric can be predetermined
- More information about [automatic model tuning](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html)

#### Metrics computed by the DeepAR algorithm

During training, three metrics are reported. Choose one of these as the objective:

RMSE: 
- `test:RMSE` - RMSE between forecast and actual target computed on the test set 

$$
RMSE = \sqrt{ \frac{1}{nT} \sum_{i,t} (\hat{y}_{i,t} - y_{i,t})^2  }
$$

where $y_{i,t}$ is the true value of time series i at the time t. $\hat{y}_{i,t}$ is the mean prediction. The sum is over all n time series in the test set and over the last T time points for each series. T corresponds to the forecast horizon (come on, why T?). 

- `test:mean_wQuantileLoss` - Average overall weighted quantile loss on the test set. Control which quantiles are used by setting the `test_quantiles` hyperparameter. 
- For a quantile $\tau$ in the range [0,1], the weighted quantile loss is defined as:
 
$$
wQuantileLoss_{\tau} = 2 \frac{\sum_{i,t}Q_{i,t}^{(\tau)} }{\sum_{i,t} |y_{i,t}| }
$$

where 

\begin{equation}
  Q_{i,t}^{(\tau)}=\begin{cases}
    (1-\tau)\: |q_{i,t}^{(\tau)} - y_{i,t}| , & \text{if $q_{i,t}^{(\tau)} > y_{i,t}$}.\\
    \tau \: |q_{i,t}^{(\tau)} - y_{i,t}|, & \text{otherwise}.
  \end{cases}
\end{equation}

$q_{i,t}^{(\tau)}$ is the $\tau$-quantile of the distribution that the model predicts. 

- `train:final_loss` - The training negative log-likelihood loss averaged over the last training epoch for the model. 
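
The two test metrics can be written out directly from the formulas above. This is a plain-Python sketch with hypothetical function names; inputs are nested lists indexed as `[series][time]`, and the numbers are toy values.

```python
import math

def rmse(actual, mean_forecast):
    """Root mean squared error over all n series and all T forecast steps."""
    n, horizon = len(actual), len(actual[0])
    total = sum((f - y) ** 2
                for ys, fs in zip(actual, mean_forecast)
                for y, f in zip(ys, fs))
    return math.sqrt(total / (n * horizon))

def weighted_quantile_loss(actual, quantile_forecast, tau):
    """2 * sum of quantile losses Q divided by the sum of |y| over all series and steps."""
    numerator, denominator = 0.0, 0.0
    for ys, qs in zip(actual, quantile_forecast):
        for y, q in zip(ys, qs):
            weight = (1.0 - tau) if q > y else tau
            numerator += weight * abs(q - y)
            denominator += abs(y)
    return 2.0 * numerator / denominator

y = [[1.0, 2.0], [3.0, 4.0]]        # actual values, n = 2 series, T = 2 steps
y_hat = [[1.0, 3.0], [3.0, 2.0]]    # mean forecasts
q90 = [[2.0, 2.0], [2.0, 4.0]]      # predicted 0.9-quantiles

test_rmse = rmse(y, y_hat)                        # sqrt(5 / 4), about 1.118
test_wql = weighted_quantile_loss(y, q90, 0.9)    # 2 * 1.0 / 10 = 0.2
```

In the toy data, under-predicting the 0.9-quantile by one unit costs 0.9 while over-predicting by one unit costs only 0.1, which is the asymmetry the case distinction encodes.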

#### Tunable Hyperparameters for the DeepAR Algorithm

After evaluating the test/training loss it may be necessary to tune some hyperparameters.

The hyperparameters that have the greatest impact, listed in order from the most to least impactful, on DeepAR objective metrics are: 

- `epochs`
- `context_length`
- `mini_batch_size`
- `learning_rate`
- `num_cells`

<img src="../images/deepar_hyperparameters.png">



## DeepAR Inference Formats

### DeepAR JSON Request Formats

Endpoint takes the following JSON request format:

- `instances` field corresponds to the time series that should be forecasted by the model.
- Provide `cat` for each instance if the model was trained with categories. It should be omitted otherwise.
- If trained with custom dynamic features, provide same number of `dynamic_feat` values for each instance
 - Each should have a length given by `length(target)` + `prediction_length`, where `prediction_length` corresponds to forecast horizon. 

Request: 

```
{
     "instances": [
         {
             "start": "2009-11-01 00:00:00",
             "target": [4.0, 10.0, "NaN", 100.0, 113.0],
             "cat": [0, 1],
             "dynamic_feat": [[1.0, 1.1, 2.1, 0.5, 3.1, 4.1, 1.2, 5.0, ...]]
         },
         {
             "start": "2012-01-30",
             "target": [1.0],
             "cat": [2, 1],
             "dynamic_feat": [[2.0, 3.1, 4.5, 1.5, 1.8, 3.2, 0.1, 3.0, ...]]
         },
         {
             "start": "1999-01-30",
             "target": [2.0, 1.0],
             "cat": [1, 3],
             "dynamic_feat": [[1.0, 0.1, -2.5, 0.3, 2.0, -1.2, -0.1, -3.0, ...]]
         }
     ],
     "configuration": {
         "num_samples": 50,
         "output_types": ["mean", "quantiles", "samples"],
         "quantiles": ["0.5", "0.9"]
     }
}

```

`configuration` is optional: 
 
- `configuration.num_samples` - sets number of sample paths that the model generates to estimate the mean and quantiles. 
- `configuration.output_types` - describes the information that will be returned in the response. 
 - valid values are: "mean", "quantiles", "samples"
 - "quantiles" returns each of the quantile values in `configuration.quantiles` 
 - Specifying `samples` returns the raw samples used to calculate the other outputs. 
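
A request body of this shape can be assembled with the standard `json` module. The sketch below only builds the payload; sending it to an endpoint is left out, and the function name and default values are illustrative assumptions.

```python
import json

def build_request(instances, num_samples=50,
                  output_types=("mean", "quantiles"), quantiles=("0.5", "0.9")):
    """Serialize instances plus the optional configuration block into one JSON string."""
    payload = {
        "instances": list(instances),
        "configuration": {
            "num_samples": num_samples,
            "output_types": list(output_types),
            "quantiles": list(quantiles),
        },
    }
    return json.dumps(payload)

instances = [
    {"start": "2009-11-01 00:00:00", "target": [4.0, 10.0, "NaN", 100.0, 113.0]},
    {"start": "1999-01-30 00:00:00", "target": [2.0, 1.0]},
]
body = build_request(instances, num_samples=100)
```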
 
### DeepAR JSON Response Formats

Response, where `[...]` are arrays of numbers: 

```
{
     "predictions": [
         {
             "quantiles": {
                 "0.9": [...],
                 "0.5": [...]
             },
             "samples": [...],
             "mean": [...]
         },
         {
             "quantiles": {
                 "0.9": [...],
                 "0.5": [...]
             },
             "samples": [...],
             "mean": [...]
         },
         {
             "quantiles": {
                 "0.9": [...],
                 "0.5": [...]
             },
             "samples": [...],
             "mean": [...]
         }
     ]
}
```

- Response timeout of 60 seconds
- When passing multiple time series in a single request, forecasts are generated sequentially
- Forecast for each series typically takes about 300 to 1000 milliseconds, depending on model size, etc.
- Passing too many series may cause timeouts. 
- Better: Send fewer series per request and send more requests (HOW?)
 - Because DeepAR uses multiple workers per instance, sending multiple requests in parallel achieves much higher throughput. 
- DeepAR uses one worker per CPU for inference (default)
- Number of workers for inference can be overwritten when calling the SageMaker [CreateModel](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateModel.html) API.
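
One possible answer to the "HOW?" above is to split the instances into small batches and send the batches concurrently. In this sketch, `send_request` is a hypothetical callable that posts one batch and returns the parsed response; `fake_send_request` is a stand-in so the example runs without an endpoint, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def predict_in_parallel(instances, send_request, batch_size=5, max_workers=4):
    """Send small batches concurrently and return predictions in the original order."""
    batches = chunked(instances, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        responses = list(pool.map(send_request, batches))  # map preserves batch order
    return [pred for response in responses for pred in response["predictions"]]

def fake_send_request(batch):
    """Stand-in for a real endpoint call: echoes each target length as the 'mean'."""
    return {"predictions": [{"mean": [float(len(inst["target"]))]} for inst in batch]}

instances = [{"start": "2009-11-01 00:00:00", "target": [0.0] * (k + 1)} for k in range(12)]
predictions = predict_in_parallel(instances, fake_send_request)
```

Threads are sufficient here because each call spends its time waiting on the network rather than computing.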

## Batch Transform with the DeepAR Algorithm

- Getting inferences by using batch transform from data using the JSON Lines format. 
- Each record is represented on a single line as a JSON object, and lines are separated by newline characters. 

For example: 

```
{"start": "2009-11-01 00:00:00", "target": [4.3, "NaN", 5.1, ...], "cat": [0, 1], "dynamic_feat": [[1.1, 1.2, 0.5, ...]]}
{"start": "2012-01-30 00:00:00", "target": [1.0, -5.0, ...], "cat": [2, 3], "dynamic_feat": [[1.1, 2.05, ...]]}
{"start": "1999-01-30 00:00:00", "target": [2.0, 1.0], "cat": [1, 4], "dynamic_feat": [[1.3, 0.4]]}
```

Note: 

- Set the `BatchStrategy` value to `SingleRecord` and 
- set the `SplitType` value in the `TransformInput` configuration to `Line`
- default values currently cause runtime failures. 

Configuration field: 

- Set once for the entire batch inference job using environment variable named `DEEPAR_INFERENCE_CONFIG`
 - Value can be passed when model is created by calling `CreateTransformJob` API. 
 
If `DEEPAR_INFERENCE_CONFIG` is missing in the container environment, the inference container uses the following default: 
 
```
{
 "num_samples": 100,
 "output_types": ["mean", "quantiles"],
 "quantiles": ["0.1", "0.2", "0.3", "0.4", "0.5", "0.6", "0.7", "0.8", "0.9"]
}

```

- Output is also in JSON Lines format. Predictions are encoded as objects identical to the ones returned by responses in online inference mode

For example: 

```
{ "quantiles": { "0.1": [...], "0.2": [...] }, "samples": [...], "mean": [...] }
``` 

In the `TransformOutput` configuration of the `CreateTransformJob` request: 

- Set `AssembleWith` value to `Line`, as the default `None` concatenates all JSON objects on the same line. 

For example, SageMaker `CreateTransformJob` request for a DeepAR job with a custom `DEEPAR_INFERENCE_CONFIG`: 

```
{
     "BatchStrategy": "SingleRecord",
     "Environment": {
         "DEEPAR_INFERENCE_CONFIG" : "{ \"num_samples\": 200, \"output_types\": [\"mean\"] }",
         ...
     },
     "TransformInput": {
         "SplitType": "Line",
         ...
     },
     "TransformOutput": {
         "AssembleWith": "Line",
         ...
     },
     ...
}
