From c7cc6a962e822f42764b9a730ea0e93cdbd5dd6c Mon Sep 17 00:00:00 2001
From: Ryan Russell
Date: Mon, 27 Jun 2022 15:17:07 -0500
Subject: [PATCH 1/2] docs: readability improvements r2

Signed-off-by: Ryan Russell
---
 docs/userguide/covariates.md               | 2 +-
 docs/userguide/forecasting_overview.md     | 8 ++++----
 docs/userguide/gpu_and_tpu_usage.md        | 2 +-
 docs/userguide/torch_forecasting_models.md | 6 +++---
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/userguide/covariates.md b/docs/userguide/covariates.md
index 6ffbf688e9..db770b89a2 100644
--- a/docs/userguide/covariates.md
+++ b/docs/userguide/covariates.md
@@ -184,7 +184,7 @@ all_past_covariates = [past_covariates1, past_covariates2, ...]

 model = NBEATSModel(input_chunk_length=1, output_chunk_length=1)
 model.fit(all_targets,
-          past_covariates=all_past_covarites)
+          past_covariates=all_past_covariates)

 pred = model.predict(n=1,
                      series=all_targets[0],
diff --git a/docs/userguide/forecasting_overview.md b/docs/userguide/forecasting_overview.md
index 5501d58ccc..21f79ecaa8 100644
--- a/docs/userguide/forecasting_overview.md
+++ b/docs/userguide/forecasting_overview.md
@@ -7,7 +7,7 @@ Below, we give an overview of what these features mean.

 ## Generalities

-All forecasting models work in the same way: first they are built (taking some hyper-paramers in argument), then they are fit on one or several series
+All forecasting models work in the same way: first they are built (taking some hyper-parameters in argument), then they are fit on one or several series
 by calling the `fit()` function, and finally they are used to obtain one or several forecasts by calling the `predict()` function.

 **Example:**
@@ -35,7 +35,7 @@ model.fit([series1, series2]) # fit on two series
 forecast = model.predict(series=[series3, series4], n=36) # predict potentially different series
 ```

-Furthermore, we define the following types of time series consummed by the models:
+Furthermore, we define the following types of time series consumed by the models:

 * **Target series:** the series that we are interested in forecasting.
 * **Covariate series:** some other series that we are not interested in forecasting, but that can provide valuable inputs to the forecasting model.
@@ -74,7 +74,7 @@ These models are shown with a "✅" under the `Multivariate` column on the [mode
 Some models support being fit on multiple time series. To do this, it is enough to simply provide a Python `Sequence` of `TimeSeries` (for instance a list of `TimeSeries`) to `fit()`. When a model is fit this way, the `predict()` function will expect the argument `series` to be set, containing one or several `TimeSeries` (i.e., a single or a `Sequence` of `TimeSeries`) that need to be forecasted.

-The advantage of training on multiple series is that a single model can be exposed to more patterns occuring across all series in the training dataset. That can often be beneficial, especially for more expre based models.
+The advantage of training on multiple series is that a single model can be exposed to more patterns occurring across all series in the training dataset. That can often be beneficial, especially for more expre based models.

 In turn, the advantage of having `predict()` providing forecasts for potentially several series at once is that the computation can often be batched and vectorized across the multiple series, which is computationally faster than calling `predict()` multiple times on isolated series.
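For readers skimming the patch, the multi-series API that these covariates.md and forecasting_overview.md passages describe can be sketched end to end as follows. This is an illustrative example, not part of the patch: the toy data, the choice of `NBEATSModel`, and all hyperparameter values are assumptions.

```python
import numpy as np

from darts import TimeSeries
from darts.models import NBEATSModel

# Toy data (illustration only): two target series with matching past covariates
all_targets = [
    TimeSeries.from_values(np.sin(np.linspace(0, 20, 100)).astype(np.float32)),
    TimeSeries.from_values(np.cos(np.linspace(0, 20, 100)).astype(np.float32)),
]
all_past_covariates = [
    TimeSeries.from_values(np.random.rand(100).astype(np.float32)),
    TimeSeries.from_values(np.random.rand(100).astype(np.float32)),
]

# One model fit jointly on several series, each paired with its own covariates
model = NBEATSModel(input_chunk_length=24, output_chunk_length=12, n_epochs=5)
model.fit(all_targets, past_covariates=all_past_covariates)

# Batched prediction: a single call returns one forecast per input series
forecasts = model.predict(n=12, series=all_targets, past_covariates=all_past_covariates)
```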
@@ -167,7 +167,7 @@ pred.plot(label='forecast')

 ![TCN Laplace regression](./images/probabilistic/example_tcn_laplace.png)

-It is also possible to perform quantile regression (using arbitrary quantiles) with neural networks, by using [darts.utils.likelihood_models.QuantileRegression](https://unit8co.github.io/darts/generated_api/darts.utils.likelihood_models.html#darts.utils.likelihood_models.QuantileRegression), in which case the network will be trained with the pinball loss. This produces an empirical non-parametric distrution, and it can often be a good option in practice, when one is not sure of the "real" distribution, or when fitting parametric likelihoods give poor results.
+It is also possible to perform quantile regression (using arbitrary quantiles) with neural networks, by using [darts.utils.likelihood_models.QuantileRegression](https://unit8co.github.io/darts/generated_api/darts.utils.likelihood_models.html#darts.utils.likelihood_models.QuantileRegression), in which case the network will be trained with the pinball loss. This produces an empirical non-parametric distribution, and it can often be a good option in practice, when one is not sure of the "real" distribution, or when fitting parametric likelihoods give poor results.

 For example, the code snippet below is almost exactly the same as the preceding snippet; the only difference is that it now uses a `QuantileRegression` likelihood, which means that the neural network will be trained with a pinball loss, and its number of outputs will be dynamically configured to match the number of quantiles.

 ```python
diff --git a/docs/userguide/gpu_and_tpu_usage.md b/docs/userguide/gpu_and_tpu_usage.md
index 3b345df4d1..1877863b28 100644
--- a/docs/userguide/gpu_and_tpu_usage.md
+++ b/docs/userguide/gpu_and_tpu_usage.md
@@ -82,7 +82,7 @@ Epoch 299: 100% 8/8 [00:00<00:00, 42.49it/s, loss=0.00285, v_num=logs]
 Now the model is ready to start predicting, which won't be shown here since it's included in the example linked in the start of this guide.

 ## Use a GPU
-GPUs can dramatically improve the performance of your model in terms of processing time. By using an Accelarator in the [Pytorch Lightning Trainer](https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html#accelerator), we can enjoy the benefits of a GPU. We only need to instruct our model to use our machine's GPU through PyTorch Lightning Trainer parameters, which are expressed as the `pl_trainer_kwargs` dictionary, like this:
+GPUs can dramatically improve the performance of your model in terms of processing time. By using an Accelerator in the [Pytorch Lightning Trainer](https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html#accelerator), we can enjoy the benefits of a GPU. We only need to instruct our model to use our machine's GPU through PyTorch Lightning Trainer parameters, which are expressed as the `pl_trainer_kwargs` dictionary, like this:
 ```python
 my_model = RNNModel(
     model="RNN",
diff --git a/docs/userguide/torch_forecasting_models.md b/docs/userguide/torch_forecasting_models.md
index 90b116788e..15a4c4306d 100644
--- a/docs/userguide/torch_forecasting_models.md
+++ b/docs/userguide/torch_forecasting_models.md
@@ -130,7 +130,7 @@ We also noticed that our ice-cream sales depend on the day of the week so we wan
 - past covariates: measured average daily temperatures in the past `temperature`
 - future covariates: day of the week for past and future `weekday`

-Checking Table 1, a model that would accomodate this kind of covariates would be a
+Checking Table 1, a model that would accommodate this kind of covariates would be a
 `SplitCovariatesModel` (if we don't use historic values of future covariates), or
 `MixedCovariatesModel` (if we do). We choose a `MixedCovariatesModel` - the `TFTModel`.
@@ -233,7 +233,7 @@ want to predict - the forecast horizon `n` - we distinguish between two cases:
 - If `n > output_chunk_length`: we must predict `n` by calling the internal model multiple times. Each call outputs `output_chunk_length` prediction points. We go through as many calls as needed until we get to the final `n` prediction points, in an auto-regressive fashion.
   - in our example: predict ice-cream sales for the next 3 days at once (`n = 3`)
-  To do this we have to supply additional `past_covariates` for the next `n - output_chunk_length = 2` time steps (days) after the end of our 365 days training data. Unfortunately, we do not have measured `temperture` for the future. But let's assume we have access to temperature forecasts for the next 2 days. We can just append them to `temperature` and the prediction will work!
+  To do this we have to supply additional `past_covariates` for the next `n - output_chunk_length = 2` time steps (days) after the end of our 365 days training data. Unfortunately, we do not have measured `temperature` for the future. But let's assume we have access to temperature forecasts for the next 2 days. We can just append them to `temperature` and the prediction will work!

 ```python
 temperature = temperature.concatenate(temperature_forecast, axis=0)

@@ -271,7 +271,7 @@ In many cases using a GPU will provide a drastic speedup compared to CPU.
 It can also incur some overheads (for transferring data to/from the GPU),
 so some testing and tuning is often necessary.
 We refer to our [GPU/TPU guide](https://unit8co.github.io/darts/userguide/gpu_and_tpu_usage.html)
-for more informations on how to setup a GPU (or a TPU) via PyTorch Lightning.
+for more information on how to setup a GPU (or a TPU) via PyTorch Lightning.
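The gpu_and_tpu_usage.md snippet quoted above is cut off by the hunk boundary; for orientation, a configuration along those lines might look as follows. This is a hedged sketch, not the file's actual content: apart from `model="RNN"` and `pl_trainer_kwargs`, every value here is an assumption.

```python
from darts.models import RNNModel

# Request a single GPU via the PyTorch Lightning Trainer arguments that
# darts forwards through `pl_trainer_kwargs`
my_model = RNNModel(
    model="RNN",
    hidden_dim=20,          # assumed value, for illustration
    n_rnn_layers=1,         # assumed value, for illustration
    input_chunk_length=30,  # assumed value, for illustration
    training_length=40,     # assumed value, for illustration
    pl_trainer_kwargs={
        "accelerator": "gpu",
        "devices": [0],     # train on the first visible GPU
    },
)
```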
 ### Tune the batch size
 A larger batch size tends to speed up the training because it reduces the number

From f0b4db3214035d0ac002341d24ac161d36789eee Mon Sep 17 00:00:00 2001
From: Ryan Russell
Date: Wed, 29 Jun 2022 08:18:36 -0500
Subject: [PATCH 2/2] Update docs/userguide/forecasting_overview.md

Co-authored-by: Julien Herzen
---
 docs/userguide/forecasting_overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/userguide/forecasting_overview.md b/docs/userguide/forecasting_overview.md
index 21f79ecaa8..8a364eb175 100644
--- a/docs/userguide/forecasting_overview.md
+++ b/docs/userguide/forecasting_overview.md
@@ -74,7 +74,7 @@ These models are shown with a "✅" under the `Multivariate` column on the [mode
 Some models support being fit on multiple time series. To do this, it is enough to simply provide a Python `Sequence` of `TimeSeries` (for instance a list of `TimeSeries`) to `fit()`. When a model is fit this way, the `predict()` function will expect the argument `series` to be set, containing one or several `TimeSeries` (i.e., a single or a `Sequence` of `TimeSeries`) that need to be forecasted.

-The advantage of training on multiple series is that a single model can be exposed to more patterns occurring across all series in the training dataset. That can often be beneficial, especially for more expre based models.
+The advantage of training on multiple series is that a single model can be exposed to more patterns occurring across all series in the training dataset. That can often be beneficial, especially for larger models with more capacity.

 In turn, the advantage of having `predict()` providing forecasts for potentially several series at once is that the computation can often be batched and vectorized across the multiple series, which is computationally faster than calling `predict()` multiple times on isolated series.
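To make the batching advantage in that last hunk concrete, here is a minimal sketch (toy data; the model choice and all sizes are assumptions): a single vectorized `predict()` call versus an equivalent, typically slower Python loop.

```python
import numpy as np

from darts import TimeSeries
from darts.models import NBEATSModel

# Toy dataset (illustration only): ten random-walk series
all_series = [
    TimeSeries.from_values(np.cumsum(np.random.randn(100)).astype(np.float32))
    for _ in range(10)
]

model = NBEATSModel(input_chunk_length=24, output_chunk_length=12, n_epochs=3)
model.fit(all_series)

# One batched call: computation is vectorized across all ten series
batched = model.predict(n=12, series=all_series)

# Same forecasts, but one isolated call per series - typically slower
looped = [model.predict(n=12, series=s) for s in all_series]
```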