---
<h1 align="center"><span style='font-family:Georgia'> Training the Linear and Transformer-Based Models on the Bitcoin Dataset </span></h1>
    
---
The aim of this section is to train five different models (DLinear, NLinear, Informer, FEDformer and PatchTST) on the Bitcoin cryptocurrency dataset, which was preprocessed and explored in the `Crypto-Forecasting-EDA-Testing-on-5models` notebook. Each model is trained with four different combinations of hyperparameters.


---
## Inputs and Outputs
---
##### **Input sequence:**
The input to the Informer model is a time series sequence of fixed length. The sequence contains a set of features, where each feature corresponds to a value measured at a specific time step in the sequence. 
- The length of the input sequence is defined by the `seq_len` hyperparameter.

##### **Output sequence:**

The output of the Informer model is a time series sequence of predicted values, where each value corresponds to a prediction made at a specific time step in the output sequence. 
- The length of the output sequence is defined by the `pred_len` hyperparameter.
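
As a minimal sketch of how `seq_len` and `pred_len` could cut a series into input/target pairs, the hypothetical helper `make_windows` below slides a window over a toy list with a stride of 1. The helper and the toy data are illustrative assumptions and are not part of the notebook's code base.

```python
def make_windows(series, seq_len, pred_len):
    """Return (input, target) pairs cut from a 1-D series with a stride of 1."""
    pairs = []
    last_start = len(series) - seq_len - pred_len
    for start in range(last_start + 1):
        split = start + seq_len
        # Input: seq_len past steps. Target: the pred_len steps that follow.
        pairs.append((series[start:split], series[split:split + pred_len]))
    return pairs


toy_series = list(range(10))          # stand-in for an hourly series
windows = make_windows(toy_series, seq_len=4, pred_len=2)
first_input, first_target = windows[0]
print(first_input, first_target)      # [0, 1, 2, 3] [4, 5]
print(len(windows))                   # 5
```

Each pair keeps the target strictly after the input, so no future value leaks into the model input.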

---
## Methodology
---

Apply the same pipeline as (Zeng, 2023) to train the linear and Transformer-based models on the `Bitcoin` cryptocurrency dataset. Each model is trained using four different combinations of hyperparameters.

**The hyperparameters that are set for each model include the following:**

| Hyperparameter | Description | Value |
|---|---|---|
| pred_len | The length of the prediction horizon. | [96, 192, 336, 720] |
| seq_len | The length of the input sequence that is used to make a prediction. | 336 |
| enc_in | The number of input features fed to the encoder. | 9 |
| dec_in | The number of input features fed to the decoder. | 9 |
| c_out | The number of output features. | 1 |


### **Results**

<!-- The results of the training are evaluated using the mean absolute error (MAE) metric. The MSE and MAE is calculated for each trail and the results are reported in the notebook. -->

### **Conclusion**
<!-- - The Informer model consistently outperforms the DLinear and PatchTST models.
- The best performing Informer model is the one with a pred_len of 336, seq_len of 1024, embed_dim of 128, d_model of 256, num_heads of 8, and num_layers of 6.
- The MAE for the best performing Informer model is 0.0031.
> Overall, the results of the training suggest that the Informer model is the best model for forecasting Bitcoin prices. The Informer model is able to achieve a low MAE, which indicates that it is able to make accurate predictions.
 -->


---
# Setup
---

In [4]:
!pip install einops 

Collecting einops
  Downloading einops-0.6.1-py3-none-any.whl (42 kB)
Installing collected packages: einops
Successfully installed einops-0.6.1


In [5]:
!pip install einsum

Collecting einsum
  Downloading einsum-0.3.0-py3-none-any.whl (5.1 kB)
Installing collected packages: einsum
Successfully installed einsum-0.3.0


**Clone the repository from github**

In [3]:
# !git clone https://github.com/debi2023-group3/Time-Series-Forcasting-Group3.git

**Add project_files to system path**

In [3]:
import sys
# if not 'Time-Series-Forcasting-Group3' in sys.path:
#     sys.path += ['Time-Series-Forcasting-Group3']
    
sys.path

['/home/jovyan/Transformer-based-solutions-for-the-long-term-time-series-forecasting',
 '/opt/conda/lib/python39.zip',
 '/opt/conda/lib/python3.9',
 '/opt/conda/lib/python3.9/lib-dynload',
 '',
 '/opt/conda/lib/python3.9/site-packages']

**Import libraries**

In [4]:
import torch
import random
import numpy as np

from exp.exp_DLinear import Exp_Main as Exp_DLinear
from exp.exp_NLinear import Exp_Main as Exp_NLinear
from exp.exp_Informer import Exp_Informer
from exp.exp_FEDformer import Exp_FEDFormer
from exp.exp_PatchTST import Exp_Main as exp_PatchTST

In [5]:
class dotdict(dict):
    """dot.notation access to dictionary attributes"""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
    
args = dotdict()
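
One property of `dotdict` is worth keeping in mind: because `__getattr__` is bound to `dict.get`, reading a key that was never set returns `None` instead of raising an error. The short sketch below repeats the class so that it runs on its own; the `demo` object and the key `not_set_yet` are hypothetical.

```python
class dotdict(dict):
    """dot.notation access to dictionary attributes"""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__


demo = dotdict()
demo.pred_len = 96                 # stored as demo['pred_len']
print(demo['pred_len'])            # 96
print(demo.not_set_yet)            # None: a missing key does not raise AttributeError
```

A misspelled hyperparameter name therefore fails silently, so printing `args` before each run (as done in the trials) is a useful sanity check.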

In [6]:
fix_seed = 2021
random.seed(fix_seed)
torch.manual_seed(fix_seed)
np.random.seed(fix_seed)

**GPU Device Hyperparameters**

In [11]:
######################### Device Hyperparameters  ##########################
args.use_multi_gpu = False
args.num_workers = 0
args.use_gpu = False #torch.cuda.is_available() 
# args.gpu = 0                           # The index of the GPU to use.

**Dataset Hyperparameters**

In [12]:
######################### Dataset Hyperparameters  ##########################

used_data='crypto_h1'
args.data = 'custom'           # dataset name
args.root_path = './Datasets/CustomData/' # root path of data file
args.data_path = 'crypto_h1.csv' # data file
args.features = 'MS'           # forecasting task, options:[M, S, MS]; M:multivariate predict multivariate, S:univariate predict univariate, MS:multivariate predict univariate
args.target = 'Target'         # target feature in S or MS task
args.freq = 'h'                # freq for time features encoding, options:[s:secondly, t:minutely, h:hourly, d:daily, b:business days, w:weekly, m:monthly], you can also use more detailed freq like 15min or 3h
args.timeenc = 1               # 1: encode timestamps as continuous time features (pairs with embed='timeF')
args.embed = 'timeF'           # time features (date) encoding, options:[timeF, fixed, learned]
args.padding = 0               # the amount of padding to add to the input sequence of the Informer model

**Modeling Hyperparameters**

In [13]:
#### [Encoder and Decoder] - ProbSparse Self-attention Hyperparameters  #####
args.attn = 'prob'              # attention used in encoder, options:[prob, full]
args.d_model = 512              # dimension of model (the dimensionality of the input feature vectors, as well as the query, key, and value vectors in each attention head. )
args.n_heads = 8                # num of heads
args.factor = 5                 # probsparse attn factor

args.dropout = 0.1              # dropout
args.d_ff = 2048                # dimension of fcn in model
args.activation = 'gelu'        # activation used in fcn (gelu => Gaussian Error Linear Unit.)
args.mix = True                 # use mix attention in the generative decoder

# concat[start token series(label_len), zero padding series(pred_len)]
args.enc_in = 24                # encoder input size
args.dec_in = 24                # decoder input size
args.c_out = 1                  # output size
args.e_layers = 2               # num of encoder layers
args.d_layers = 1               # num of decoder layers

args.seq_len = 336              # input sequence length (look-back window)
args.label_len = 48             # start token length of the decoder (used by the Transformer-based models)
args.pred_len = 96              # prediction sequence length


"""
Distilling (args.distil) shortens the sequence between the encoder layers of the
Informer model, which reduces memory use and computation. Each distilling step applies:
    (1) 1*3 conv1d with ELU activation
    (2) Max pooling with stride = 2, which roughly halves the sequence length
""" 
args.distil = True  
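
To make the effect of distilling concrete, the sketch below computes the sequence length after each distilling step using standard pooling arithmetic. It assumes that the convolution preserves the length and that max pooling uses kernel size 3, stride 2 and padding 1; these values, the helper names and the number of steps are assumptions for illustration, not read from the code base.

```python
def pooled_length(length, kernel_size=3, stride=2, padding=1):
    """Output length of a 1-D max-pooling layer (standard pooling arithmetic)."""
    return (length + 2 * padding - kernel_size) // stride + 1


def encoder_lengths(seq_len, n_distil_steps):
    """Sequence length after each distilling step, assuming the conv keeps the length."""
    lengths = [seq_len]
    for _ in range(n_distil_steps):
        lengths.append(pooled_length(lengths[-1]))
    return lengths


print(encoder_lengths(336, 1))   # [336, 168]
print(encoder_lengths(336, 2))   # [336, 168, 84]
```

With `seq_len = 336`, a single distilling step would leave 168 time steps for the next encoder layer.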

**Experiment Hyperparameters**



In [14]:
######################### Experiment Hyperparameters  ##########################

args.output_attention = False # whether to output attention in encoder
args.use_amp = False          # whether to use automatic mixed precision training
args.train_only = True
args.train_epochs = 6         # The number of epochs to train for.
args.batch_size = 32          # The batch size of training input data.
args.learning_rate = 0.01     # initial learning rate
args.lradj = 'type1'          # learning-rate schedule: the rate is halved every epoch
args.loss = 'mse'             # loss function
args.patience = 3             # The number of epochs to wait before early stopping.
args.des = 'Exp'              # The description of the experiment.
args.itr = 1                  # number of times the experiment is repeated

######################### Formers Hyperparameters  ##########################
args.modes=32
args.moving_avg=[5]
"""
0: Default embedding configuration of the model.
1: All three embeddings are used (value embedding + temporal embedding + positional embedding).
2: Value and temporal embeddings are used.
3: Value and positional embeddings are used.
4: Only value embedding is used.
"""
args.embed_type= 1


args.model = 'DLinear' 
args.checkpoints = './Checkpoints/DLinear_checkpoints' # location of model checkpoints
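
The `lradj = 'type1'` setting is described above as halving the learning rate every epoch. A minimal sketch of such a schedule, assuming the first epoch uses the initial rate unchanged (the function name `type1_learning_rate` is hypothetical):

```python
def type1_learning_rate(initial_lr, epoch):
    """Learning rate for a 1-based epoch when the rate is halved after every epoch."""
    return initial_lr * 0.5 ** (epoch - 1)


schedule = [type1_learning_rate(0.01, epoch) for epoch in range(1, 7)]
for epoch, lr in enumerate(schedule, start=1):
    print(f"epoch {epoch}: lr = {lr:.7f}")
```

Under this assumption, the rate in epoch 6 is 1/32 of the initial value, so most of the learning happens in the first few epochs.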


---
# Working on DLinear
---

## Trial 1: DLinear, Dataset: Bitcoin, Prediction length: 96
### Set hyperparameters
Set the parameters (`args`) for our experiment; `args` behaves like a dictionary.


In [11]:
args.model = 'DLinear' 
args.checkpoints = './Checkpoints/DLinear_checkpoints' # location of model checkpoints

Exp = Exp_DLinear

In [14]:
args.pred_len = 96
setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of DLinear_train_on_crypto_h1_96:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'crypto_h1.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'timeenc': 1, 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 24, 'dec_in': 24, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 96, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 6, 'batch_size': 32, 'learning_rate': 0.01, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'Exp', 'itr': 1, 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'model': 'DLinear', 'checkpoints': './Checkpoints/DLinear_checkpoints'}


### Training

In [11]:
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 87951
                                   Training                               
Epoch: 1, Iters: 300
--------------------------------------------------------------------------------
    Loss : 1.4127693 (MSE)
    Speed: 0.9177 sec/iter 
    Left time: 14855.9572 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 600
--------------------------------------------------------------------------------
    Loss : 1.1203606 (MSE)
    Speed: 0.9020 sec/iter 
    Left time: 14331.7390 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 900
--------------------------------------------------------------------------------
    Loss : 1.9532999 (MSE)
    Speed: 0.8753 sec/iter 
    Left time: 13645.6585 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 1200
-----------------------------------------------------------------------

Model(
  (decompsition): series_decomp(
    (moving_avg): moving_avg(
      (avg): AvgPool1d(kernel_size=(25,), stride=(1,), padding=(0,))
    )
  )
  (Linear_Seasonal): Linear(in_features=336, out_features=96, bias=True)
  (Linear_Trend): Linear(in_features=336, out_features=96, bias=True)
)
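
The printed model shows a `series_decomp` block (a moving average with kernel size 25) feeding two linear layers, one for the seasonal part and one for the trend. The sketch below illustrates moving-average decomposition on a toy series. Repeating the edge values to keep the output length is an assumption, and the function names are hypothetical rather than taken from the code base.

```python
def moving_average(series, kernel_size):
    """Centered moving average for an odd kernel_size; ends are padded with edge values."""
    pad = (kernel_size - 1) // 2
    padded = [series[0]] * pad + list(series) + [series[-1]] * pad
    return [
        sum(padded[i:i + kernel_size]) / kernel_size
        for i in range(len(series))
    ]


def decompose(series, kernel_size=25):
    """Split a series into (seasonal, trend), where trend is the moving average."""
    trend = moving_average(series, kernel_size)
    seasonal = [x - t for x, t in zip(series, trend)]
    return seasonal, trend


toy = [1.0, 2.0, 3.0, 4.0, 5.0]
seasonal, trend = decompose(toy, kernel_size=3)
print(trend)
print(seasonal)
```

In a DLinear-style model, each component would then pass through its own linear layer (336 inputs to `pred_len` outputs in the printout above) and the two forecasts would be summed.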

### Testing

In [14]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 17581
mae:0.564278244972229, mse:0.8129333257675171, rmse:0.9016281366348267, mape:4.07287073135376, mspe:23168.220703125
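
The five reported metrics can be sketched with their usual definitions, as below. Whether the code base computes them in exactly this way is an assumption, and the toy inputs are made up. The second call shows why MAPE and MSPE can become very large when some true values are close to zero, even if the absolute error is unchanged.

```python
import math


def forecast_metrics(pred, true):
    """Return MAE, MSE, RMSE, MAPE and MSPE for two equal-length sequences."""
    n = len(true)
    errors = [p - t for p, t in zip(pred, true)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # Percentage errors divide by the true value, so targets near zero inflate them.
    mape = sum(abs(e / t) for e, t in zip(errors, true)) / n
    mspe = sum((e / t) ** 2 for e, t in zip(errors, true)) / n
    return {"mae": mae, "mse": mse, "rmse": rmse, "mape": mape, "mspe": mspe}


well_scaled = forecast_metrics(pred=[1.5, 2.5], true=[1.0, 2.0])
near_zero = forecast_metrics(pred=[0.51, 2.5], true=[0.01, 2.0])
print(well_scaled)
print(near_zero)
```

Note that the reported RMSE is consistent with the square root of the reported MSE (0.8129 gives roughly 0.9016), while the MSPE in the thousands suggests that some targets in the test set are close to zero.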


---
## Trial 2: DLinear, Dataset: Bitcoin, Prediction length: 192
### Set hyperparameters
Set the parameters (`args`) for our experiment; `args` behaves like a dictionary.

In [13]:
args.pred_len = 192
args.train_epochs = 10
setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of DLinear_train_on_crypto_h1_192:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'crypto_h1.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'timeenc': 1, 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 24, 'dec_in': 24, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 192, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 10, 'batch_size': 32, 'learning_rate': 0.01, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'Exp', 'itr': 1, 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'model': 'DLinear', 'checkpoints': './Checkpoints/DLinear_checkpoints'}


### Training

In [13]:
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 87855
                                   Training                               
Epoch: 1, Iters: 500
--------------------------------------------------------------------------------
    Loss : 2.0627160 (MSE)
    Speed: 0.3744 sec/iter 
    Left time: 10089.3863 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 1000
--------------------------------------------------------------------------------
    Loss : 1.5394423 (MSE)
    Speed: 0.3586 sec/iter 
    Left time: 9485.2107 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 1500
--------------------------------------------------------------------------------
    Loss : 2.2001054 (MSE)
    Speed: 0.3760 sec/iter 
    Left time: 9757.8364 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 2000
-----------------------------------------------------------------------

Model(
  (decompsition): series_decomp(
    (moving_avg): moving_avg(
      (avg): AvgPool1d(kernel_size=(25,), stride=(1,), padding=(0,))
    )
  )
  (Linear_Seasonal): Linear(in_features=336, out_features=192, bias=True)
  (Linear_Trend): Linear(in_features=336, out_features=192, bias=True)
)
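
The `train` counts in the logs for `pred_len = 96` and `pred_len = 192` (87951 and 87855) differ by exactly 96, which is what stride-1 window counting would predict. The sketch below assumes the data loader counts samples as `n_rows - seq_len - pred_len + 1`; the row count `n_train_rows` is back-solved from the first log line and is not read from the dataset.

```python
def n_windows(n_rows, seq_len, pred_len):
    """Number of stride-1 (input, target) windows that fit in n_rows time steps."""
    return max(n_rows - seq_len - pred_len + 1, 0)


# Hypothetical row count, back-solved from the "train 87951" log line (pred_len=96).
n_train_rows = 87951 + 336 + 96 - 1

for pred_len in (96, 192):
    print(pred_len, n_windows(n_train_rows, seq_len=336, pred_len=pred_len))
# 96 87951
# 192 87855
```

A longer horizon therefore leaves slightly fewer training samples from the same split.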

### Testing

In [None]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 17485
mae:0.5375852584838867, mse:0.7717941403388977, rmse:0.8785181641578674, mape:1.9065319299697876, mspe:1871.793701171875


---
## Trial 3: DLinear, Dataset: Bitcoin, Prediction length: 336

### Set hyperparameters
Set the parameters (`args`) for our experiment; `args` behaves like a dictionary.


In [14]:
args.pred_len = 336
args.train_epochs = 10

setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of DLinear_train_on_crypto_h1_336:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'crypto_h1.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'timeenc': 1, 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 24, 'dec_in': 24, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 336, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 10, 'batch_size': 32, 'learning_rate': 0.01, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'Exp', 'itr': 1, 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'model': 'DLinear', 'checkpoints': './Checkpoints/DLinear_checkpoints'}


### Training

In [44]:
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 49330
                                   Training                               
Epoch: 1, Iters: 100
--------------------------------------------------------------------------------
    Loss : 1.2003411 (MSE)
    Speed: 0.0704 sec/iter 
    Left time: 2161.6700 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 200
--------------------------------------------------------------------------------
    Loss : 1.3864771 (MSE)
    Speed: 0.0752 sec/iter 
    Left time: 2301.8974 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 300
--------------------------------------------------------------------------------
    Loss : 1.2943012 (MSE)
    Speed: 0.0740 sec/iter 
    Left time: 2258.5152 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 400
---------------------------------------------------------------------------

Model(
  (decompsition): series_decomp(
    (moving_avg): moving_avg(
      (avg): AvgPool1d(kernel_size=(25,), stride=(1,), padding=(0,))
    )
  )
  (Linear_Seasonal): Linear(in_features=336, out_features=336, bias=True)
  (Linear_Trend): Linear(in_features=336, out_features=336, bias=True)
)

### Testing

In [45]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 9665
mae:0.4725356996059418, mse:0.4384952783584595, rmse:0.662189781665802, mape:1.7098760604858398, mspe:248.6994171142578


---
## Trial 4: DLinear, Dataset: Bitcoin, Prediction length: 720

### Set hyperparameters
Set the parameters (`args`) for our experiment; `args` behaves like a dictionary.


In [15]:
args.pred_len = 720
setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of DLinear_train_on_crypto_h1_720:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'crypto_h1.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'timeenc': 1, 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 24, 'dec_in': 24, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 720, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 10, 'batch_size': 32, 'learning_rate': 0.01, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'Exp', 'itr': 1, 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'model': 'DLinear', 'checkpoints': './Checkpoints/DLinear_checkpoints'}


### Training

In [47]:
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 48946
                                   Training                               
Epoch: 1, Iters: 100
--------------------------------------------------------------------------------
    Loss : 1.2753509 (MSE)
    Speed: 0.0767 sec/iter 
    Left time: 2337.7653 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 200
--------------------------------------------------------------------------------
    Loss : 1.0814701 (MSE)
    Speed: 0.0699 sec/iter 
    Left time: 2122.1606 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 300
--------------------------------------------------------------------------------
    Loss : 0.9717912 (MSE)
    Speed: 0.0762 sec/iter 
    Left time: 2306.6135 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 400
---------------------------------------------------------------------------

Model(
  (decompsition): series_decomp(
    (moving_avg): moving_avg(
      (avg): AvgPool1d(kernel_size=(25,), stride=(1,), padding=(0,))
    )
  )
  (Linear_Seasonal): Linear(in_features=336, out_features=720, bias=True)
  (Linear_Trend): Linear(in_features=336, out_features=720, bias=True)
)

### Testing

In [48]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 9281
mae:0.47571229934692383, mse:0.4463936984539032, rmse:0.6681270003318787, mape:1.7608473300933838, mspe:280.3857116699219


### Conclusion



---
# Working on NLinear
---

## Trial 1: NLinear, Dataset: Bitcoin, Prediction length: 96
### Set hyperparameters
Set the parameters (`args`) for our experiment; `args` behaves like a dictionary.


In [16]:
args.model = 'NLinear' 
args.checkpoints = './Checkpoints/NLinear_checkpoints' # location of model checkpoints
args.learning_rate = 0.05
args.train_epochs = 10

Exp = Exp_NLinear

In [17]:
args.pred_len = 96
setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of NLinear_train_on_crypto_h1_96:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'crypto_h1.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'timeenc': 1, 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 24, 'dec_in': 24, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 96, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 10, 'batch_size': 32, 'learning_rate': 0.05, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'Exp', 'itr': 1, 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'model': 'NLinear', 'checkpoints': './Checkpoints/NLinear_checkpoints'}


### Training

In [15]:
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 87951
                                   Training                               
Epoch: 1, Iters: 1000
--------------------------------------------------------------------------------
    Loss : 0.9616043 (MSE)
    Speed: 0.3906 sec/iter 
    Left time: 10343.4494 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 1500
--------------------------------------------------------------------------------
    Loss : 6.7267599 (MSE)
    Speed: 0.3874 sec/iter 
    Left time: 10065.3798 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 2000
--------------------------------------------------------------------------------
    Loss : 13.5494080 (MSE)
    Speed: 0.3834 sec/iter 
    Left time: 9769.0851 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 2500
--------------------------------------------------------------------

Model(
  (Linear): Linear(in_features=336, out_features=96, bias=True)
)
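
The printed NLinear model is a single linear layer. In the NLinear design described by Zeng et al., the last value of the input window is subtracted before the layer and added back to the output. The sketch below shows that idea for one feature with plain Python lists; the function name and the toy weights are hypothetical.

```python
def nlinear_forecast(window, weights, bias):
    """Subtract the last input value, apply one linear layer, then add the value back."""
    last = window[-1]
    centered = [x - last for x in window]
    outputs = []
    for row, b in zip(weights, bias):
        outputs.append(sum(w * x for w, x in zip(row, centered)) + b + last)
    return outputs


# Toy layer: 3 input steps -> 2 output steps, with all-zero weights and bias.
zero_weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
zero_bias = [0.0, 0.0]
print(nlinear_forecast([10.0, 11.0, 12.0], zero_weights, zero_bias))  # [12.0, 12.0]
```

With zero weights the forecast simply repeats the last observed value, and shifting the whole input window by a constant shifts the forecast by the same constant.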

### Testing

In [17]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 17581
mae:0.5410618185997009, mse:0.7784981727600098, rmse:0.8823254108428955, mape:2.5675642490386963, mspe:9695.2822265625


---
## Trial 2: NLinear, Dataset: Bitcoin, Prediction length: 192
### Set hyperparameters
Set the parameters (`args`) for our experiment; `args` behaves like a dictionary.

In [18]:
args.pred_len = 192
args.train_epochs = 10
setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of NLinear_train_on_crypto_h1_192:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'crypto_h1.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'timeenc': 1, 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 24, 'dec_in': 24, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 192, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 10, 'batch_size': 32, 'learning_rate': 0.05, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'Exp', 'itr': 1, 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'model': 'NLinear', 'checkpoints': './Checkpoints/NLinear_checkpoints'}


### Training

In [19]:
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 87855
                                   Training                               
Epoch: 1, Iters: 500
--------------------------------------------------------------------------------
    Loss : 4.8056917 (MSE)
    Speed: 0.2131 sec/iter 
    Left time: 5742.7148 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 1000
--------------------------------------------------------------------------------
    Loss : 1.9776108 (MSE)
    Speed: 0.2052 sec/iter 
    Left time: 5427.3968 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 1500
--------------------------------------------------------------------------------
    Loss : 16.6661854 (MSE)
    Speed: 0.2176 sec/iter 
    Left time: 5646.9240 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 2000
-----------------------------------------------------------------------

Model(
  (Linear): Linear(in_features=336, out_features=192, bias=True)
)

### Testing

In [20]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 17485
mae:0.5417881011962891, mse:0.7799254059791565, rmse:0.8831338286399841, mape:2.4777004718780518, mspe:6836.96630859375


---
## Trial 3: NLinear, Dataset: Bitcoin, Prediction length: 336

### Set hyperparameters
Set the parameters (`args`) for our experiment; `args` behaves like a dictionary.


In [19]:
args.pred_len = 336
args.train_epochs = 14

setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of NLinear_train_on_crypto_h1_336:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'crypto_h1.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'timeenc': 1, 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 24, 'dec_in': 24, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 336, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 14, 'batch_size': 32, 'learning_rate': 0.05, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'Exp', 'itr': 1, 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'model': 'NLinear', 'checkpoints': './Checkpoints/NLinear_checkpoints'}


### Training

In [None]:
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 87711
                                   Training                               
Epoch: 1, Iters: 500
--------------------------------------------------------------------------------
    Loss : 33.6948242 (MSE)
    Speed: 0.4575 sec/iter 
    Left time: 17321.9003 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 1000
--------------------------------------------------------------------------------
    Loss : 5.4512706 (MSE)
    Speed: 0.4376 sec/iter 
    Left time: 16348.7512 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 1500
--------------------------------------------------------------------------------
    Loss : 2.1838574 (MSE)
    Speed: 0.4184 sec/iter 
    Left time: 15422.8138 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 2000
--------------------------------------------------------------------

### Testing

In [None]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

---
## Trial 4: NLinear, Dataset: Bitcoin, Prediction length: 720

### Set hyperparameters
Set the parameters (`args`) for our experiment; `args` behaves like a dictionary.


In [20]:
args.pred_len = 720
setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of NLinear_train_on_crypto_h1_720:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'crypto_h1.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'timeenc': 1, 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 24, 'dec_in': 24, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 720, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 14, 'batch_size': 32, 'learning_rate': 0.05, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'Exp', 'itr': 1, 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'model': 'NLinear', 'checkpoints': './Checkpoints/NLinear_checkpoints'}


### Training

In [36]:
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 48946
                                   Training                               
Epoch: 1, Iters: 100
--------------------------------------------------------------------------------
    Loss : 0.8163344 (MSE)
    Speed: 0.0179 sec/iter 
    Left time: 544.9103 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 200
--------------------------------------------------------------------------------
    Loss : 1.4792706 (MSE)
    Speed: 0.0190 sec/iter 
    Left time: 575.7243 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 300
--------------------------------------------------------------------------------
    Loss : 0.8873149 (MSE)
    Speed: 0.0180 sec/iter 
    Left time: 545.5953 sec
--------------------------------------------------------------------------------
Epoch: 1, Iters: 400
------------------------------------------------------------------------------

Model(
  (Linear): Linear(in_features=336, out_features=720, bias=True)
)

### Testing

In [37]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 9281
mae:0.4768030047416687, mse:0.45033928751945496, rmse:0.6710732579231262, mape:1.8766618967056274, mspe:332.4039001464844


---
# Working on Informer
---

## Trial 1: Informer, Dataset: Bitcoin, Prediction length: 96
### Set hyperparameters
Set the parameters (`args`) for our experiment; `args` behaves like a dictionary.


In [15]:
args.model = 'informer' 
args.checkpoints = './Checkpoints/Informer_checkpoints' # location of model checkpoints
args.pred_len = 96
args.train_epochs = 32
Exp = Exp_Informer

setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of informer_train_on_crypto_h1_96:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': False, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'crypto_h1.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'timeenc': 1, 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 24, 'dec_in': 24, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 96, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 32, 'batch_size': 32, 'learning_rate': 0.01, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'Exp', 'itr': 1, 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'model': 'informer', 'checkpoints': './Checkpoints/Informer_checkpoints'}
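
With `attn = 'prob'` and `factor = 5`, ProbSparse attention scores only a subset of the queries. The sketch below assumes the rule from the Informer paper, where roughly `factor * ceil(ln(L))` queries are kept for a sequence of length `L`; the function name is hypothetical and the cap at `L` is added here for short sequences.

```python
import math


def n_active_queries(seq_len, factor):
    """Number of queries kept by ProbSparse attention: factor * ceil(ln(L)), capped at L."""
    return min(factor * math.ceil(math.log(seq_len)), seq_len)


print(n_active_queries(336, 5))   # 30
print(n_active_queries(168, 5))   # 30
```

Under this assumption, only 30 of the 336 encoder positions act as active queries, which is where the memory and time savings over full attention come from.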


### Training

In [None]:
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use CPU


### Testing

In [None]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

---
## Trial 2: Informer, Dataset: Bitcoin, Prediction length: 192
### Set hyperparameters
Set the parameters (`args`) for our experiment; `args` behaves like a dictionary.

In [53]:
args.train_epochs = 20
args.pred_len = 192
setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of informer_train_on_Bitcoin_192:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'Bitcoin.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 9, 'dec_in': 9, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 192, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 20, 'batch_size': 32, 'learning_rate': 0.001, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'test', 'itr': 1, 'model': 'informer', 'checkpoints': './Checkpoints/Informer_checkpoints', 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'gpu': 0}


### Training

In [54]:
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 34473
val 4810
test 9809
	iters: 100, epoch: 1 | loss: 0.7687694
	speed: 0.1146s/iter | left time: 2457.1733s
	iters: 200, epoch: 1 | loss: 0.8300142
	speed: 0.1140s/iter | left time: 2433.1966s
	iters: 300, epoch: 1 | loss: 0.7999618
	speed: 0.1125s/iter | left time: 2390.2526s
	iters: 400, epoch: 1 | loss: 0.7695041
	speed: 0.1018s/iter | left time: 2152.7527s
	iters: 500, epoch: 1 | loss: 1.3740997
	speed: 0.1317s/iter | left time: 2770.3412s
	iters: 600, epoch: 1 | loss: 1.0556828
	speed: 0.1189s/iter | left time: 2489.8620s
	iters: 700, epoch: 1 | loss: 0.7324647
	speed: 0.1209s/iter | left time: 2520.5164s
	iters: 800, epoch: 1 | loss: 0.7752126
	speed: 0.1181s/iter | left time: 2449.9250s
	iters: 900, epoch: 1 | loss: 0.9248596
	speed: 0.1215s/iter | left time: 2507.3645s
	iters: 1000, epoch: 1 | loss: 0.8448135
	speed: 0.1186s/iter | left time: 2436.0164s
Epoch: 1 cost time: 126.9002776145935
Epoch: 1, Steps: 1077 | Train Loss: 1.0197096 Vali Loss: 0.14667

Informer(
  (enc_embedding): DataEmbedding(
    (value_embedding): TokenEmbedding(
      (tokenConv): Conv1d(9, 512, kernel_size=(3,), stride=(1,), padding=(1,), padding_mode=circular)
    )
    (position_embedding): PositionalEmbedding()
    (temporal_embedding): TimeFeatureEmbedding(
      (embed): Linear(in_features=4, out_features=512, bias=True)
    )
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (dec_embedding): DataEmbedding(
    (value_embedding): TokenEmbedding(
      (tokenConv): Conv1d(9, 512, kernel_size=(3,), stride=(1,), padding=(1,), padding_mode=circular)
    )
    (position_embedding): PositionalEmbedding()
    (temporal_embedding): TimeFeatureEmbedding(
      (embed): Linear(in_features=4, out_features=512, bias=True)
    )
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): Encoder(
    (attn_layers): ModuleList(
      (0): EncoderLayer(
        (attention): AttentionLayer(
          (inner_attention): ProbAttention(
            (dropout): Dropout(p=0.

### Testing

In [55]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 9809
test shape: (306, 32, 192, 1) (306, 32, 192, 1)
test shape: (9792, 192, 1) (9792, 192, 1)
mae:0.4178374707698822, mse:0.3429669439792633, rmse:0.5856338143348694, mape:1.4582691192626953, mspe:412.1361999511719
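
The sample counts in the logs above (`train 34473`, `val 4810`, `test 9809`, `Steps: 1077`, and the drop from 9809 to 9792 test samples) follow from sliding-window arithmetic. The sketch below assumes a chronological 70/10/20 split in which the validation and test slices start `seq_len` rows early, and that the incomplete last batch is dropped. The row count of 50,001 is inferred from the printed numbers and is not stated in the notebook; `split_lengths` and `num_windows` are hypothetical helper names.

```python
def split_lengths(n_rows, seq_len):
    """Row counts of the train/val/test slices (assumed 70/10/20 split).

    The val and test slices start `seq_len` rows early so that their
    first window has a full input history."""
    num_train = int(n_rows * 0.7)
    num_test = int(n_rows * 0.2)
    num_val = n_rows - num_train - num_test
    return num_train, num_val + seq_len, num_test + seq_len


def num_windows(split_len, seq_len, pred_len):
    """Number of (input, target) windows that fit in a slice."""
    return split_len - seq_len - pred_len + 1


n_rows, seq_len, pred_len, batch_size = 50_001, 336, 192, 32
train_len, val_len, test_len = split_lengths(n_rows, seq_len)

n_train = num_windows(train_len, seq_len, pred_len)
n_val = num_windows(val_len, seq_len, pred_len)
n_test = num_windows(test_len, seq_len, pred_len)
print(n_train, n_val, n_test)              # 34473 4810 9809

steps_per_epoch = n_train // batch_size    # incomplete last batch dropped
test_batches = n_test // batch_size
print(steps_per_epoch, test_batches, test_batches * batch_size)  # 1077 306 9792
```

The same arithmetic explains why a longer `pred_len` leaves fewer training windows in the later trials.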


---
## Trial 3: Informer, Dataset: Bitcoin, Prediction Length: 336

### Set hyperparameters
Set the experiment parameters (`args`), which are stored in a dictionary-like object.


In [11]:
args.pred_len = 336
args.train_epochs = 20

setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of informer_train_on_Bitcoin_336:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'Bitcoin.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 9, 'dec_in': 9, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 336, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 20, 'batch_size': 32, 'learning_rate': 0.001, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'test', 'itr': 1, 'model': 'informer', 'checkpoints': './Checkpoints/Informer_checkpoints', 'modes': 32, 'moving_avg': [5], 'embed_type': 1}


### Training

In [12]:
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 34329
val 4666
test 9665
	iters: 100, epoch: 1 | loss: 1.0690626
	speed: 0.1383s/iter| left time: 2952.1104s
	iters: 200, epoch: 1 | loss: 1.0134298
	speed: 0.1280s/iter| left time: 2719.0130s
	iters: 300, epoch: 1 | loss: 0.9707757
	speed: 0.1182s/iter| left time: 2498.5085s
	iters: 400, epoch: 1 | loss: 1.0381221
	speed: 0.1298s/iter| left time: 2731.4086s
	iters: 500, epoch: 1 | loss: 1.0073758
	speed: 0.1273s/iter| left time: 2665.3104s
	iters: 600, epoch: 1 | loss: 0.6325625
	speed: 0.1110s/iter| left time: 2313.0695s
	iters: 700, epoch: 1 | loss: 0.8685836
	speed: 0.1114s/iter| left time: 2309.7441s
	iters: 800, epoch: 1 | loss: 0.6552352
	speed: 0.1062s/iter| left time: 2192.4018s
	iters: 900, epoch: 1 | loss: 0.9755502
	speed: 0.1089s/iter| left time: 2237.4928s
	iters: 1000, epoch: 1 | loss: 0.9299371
	speed: 0.1033s/iter| left time: 2111.3437s
Epoch: 1 cost time: 127.57613849639893
Epoch: 1, Steps: 1072 | Train Loss: 1.0318637 Vali Loss: 0.1469937 Test L

Informer(
  (enc_embedding): DataEmbedding(
    (value_embedding): TokenEmbedding(
      (tokenConv): Conv1d(9, 512, kernel_size=(3,), stride=(1,), padding=(1,), padding_mode=circular)
    )
    (position_embedding): PositionalEmbedding()
    (temporal_embedding): TimeFeatureEmbedding(
      (embed): Linear(in_features=4, out_features=512, bias=True)
    )
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (dec_embedding): DataEmbedding(
    (value_embedding): TokenEmbedding(
      (tokenConv): Conv1d(9, 512, kernel_size=(3,), stride=(1,), padding=(1,), padding_mode=circular)
    )
    (position_embedding): PositionalEmbedding()
    (temporal_embedding): TimeFeatureEmbedding(
      (embed): Linear(in_features=4, out_features=512, bias=True)
    )
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): Encoder(
    (attn_layers): ModuleList(
      (0): EncoderLayer(
        (attention): AttentionLayer(
          (inner_attention): ProbAttention(
            (dropout): Dropout(p=0.

### Testing

In [13]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 9665
test shape: (302, 32, 336, 1) (302, 32, 336, 1)
test shape: (9664, 336, 1) (9664, 336, 1)
mae:0.4178079068660736, mse:0.34347304701805115, rmse:0.5860657095909119, mape:1.4205414056777954, mspe:364.66943359375
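
The five numbers on the metric line are standard forecasting errors. The sketch below uses their usual definitions in plain Python; it is assumed, not confirmed by this notebook, that the pipeline computes them the same way. `forecast_metrics` is a hypothetical helper, and the inputs are toy values.

```python
import math


def forecast_metrics(pred, true):
    """Return MAE, MSE, RMSE, MAPE and MSPE for two equal-length sequences.

    MAPE and MSPE divide by the true value, so they grow very large
    when a target is close to zero."""
    errors = [p - t for p, t in zip(pred, true)]
    rel_errors = [(p - t) / t for p, t in zip(pred, true)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mape": sum(abs(r) for r in rel_errors) / n,
        "mspe": sum(r * r for r in rel_errors) / n,
    }


# Toy values, not taken from the notebook.
true = [1.0, 2.0, 4.0]
pred = [1.5, 2.0, 3.0]
metrics = forecast_metrics(pred, true)
print(metrics)
```

Under these definitions, a very large MSPE next to a small MSE, as in the output above, is consistent with some scaled target values lying close to zero.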


---
## Trial 4: Informer, Dataset: Bitcoin, Prediction Length: 720

### Set hyperparameters
Set the experiment parameters (`args`), which are stored in a dictionary-like object.


In [14]:
args.pred_len = 720
setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of informer_train_on_Bitcoin_720:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'Bitcoin.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 9, 'dec_in': 9, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 720, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 20, 'batch_size': 32, 'learning_rate': 0.001, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'test', 'itr': 1, 'model': 'informer', 'checkpoints': './Checkpoints/Informer_checkpoints', 'modes': 32, 'moving_avg': [5], 'embed_type': 1}


### Training

In [15]:
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 33945
val 4282
test 9281
	iters: 100, epoch: 1 | loss: 1.1358261
	speed: 0.1368s/iter| left time: 2886.2990s
	iters: 200, epoch: 1 | loss: 1.0207936
	speed: 0.1334s/iter| left time: 2801.5408s
	iters: 300, epoch: 1 | loss: 1.0138323
	speed: 0.1379s/iter| left time: 2882.7084s
	iters: 400, epoch: 1 | loss: 0.9752392
	speed: 0.1409s/iter| left time: 2931.2220s
	iters: 500, epoch: 1 | loss: 0.8692607
	speed: 0.1394s/iter| left time: 2885.1513s
	iters: 600, epoch: 1 | loss: 0.8221595
	speed: 0.1306s/iter| left time: 2690.9617s
	iters: 700, epoch: 1 | loss: 0.9817912
	speed: 0.1358s/iter| left time: 2784.5380s
	iters: 800, epoch: 1 | loss: 1.1575950
	speed: 0.1255s/iter| left time: 2560.3393s
	iters: 900, epoch: 1 | loss: 0.9607666
	speed: 0.1368s/iter| left time: 2776.7416s
	iters: 1000, epoch: 1 | loss: 0.8125159
	speed: 0.1271s/iter| left time: 2566.7962s
Epoch: 1 cost time: 143.3161551952362
Epoch: 1, Steps: 1060 | Train Loss: 1.0118202 Vali Loss: 0.1451748 Test Lo

Informer(
  (enc_embedding): DataEmbedding(
    (value_embedding): TokenEmbedding(
      (tokenConv): Conv1d(9, 512, kernel_size=(3,), stride=(1,), padding=(1,), padding_mode=circular)
    )
    (position_embedding): PositionalEmbedding()
    (temporal_embedding): TimeFeatureEmbedding(
      (embed): Linear(in_features=4, out_features=512, bias=True)
    )
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (dec_embedding): DataEmbedding(
    (value_embedding): TokenEmbedding(
      (tokenConv): Conv1d(9, 512, kernel_size=(3,), stride=(1,), padding=(1,), padding_mode=circular)
    )
    (position_embedding): PositionalEmbedding()
    (temporal_embedding): TimeFeatureEmbedding(
      (embed): Linear(in_features=4, out_features=512, bias=True)
    )
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): Encoder(
    (attn_layers): ModuleList(
      (0): EncoderLayer(
        (attention): AttentionLayer(
          (inner_attention): ProbAttention(
            (dropout): Dropout(p=0.

### Testing

In [16]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 9281
test shape: (290, 32, 720, 1) (290, 32, 720, 1)
test shape: (9280, 720, 1) (9280, 720, 1)
mae:0.4184359908103943, mse:0.3456279933452606, rmse:0.5879013538360596, mape:1.6623202562332153, mspe:1027.1038818359375
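
To compare horizons side by side, the metric lines printed by `test()` can be parsed into dictionaries. `parse_metrics` is a hypothetical helper, not part of the pipeline; the three strings are copied from the Informer test outputs above (no test output was recorded for the 96-step trial).

```python
import math


def parse_metrics(line):
    """Turn 'mae:0.41, mse:0.34, ...' into {'mae': 0.41, 'mse': 0.34, ...}."""
    result = {}
    for item in line.split(","):
        name, value = item.strip().split(":")
        result[name] = float(value)
    return result


informer_logs = {
    192: "mae:0.4178374707698822, mse:0.3429669439792633, rmse:0.5856338143348694, mape:1.4582691192626953, mspe:412.1361999511719",
    336: "mae:0.4178079068660736, mse:0.34347304701805115, rmse:0.5860657095909119, mape:1.4205414056777954, mspe:364.66943359375",
    720: "mae:0.4184359908103943, mse:0.3456279933452606, rmse:0.5879013538360596, mape:1.6623202562332153, mspe:1027.1038818359375",
}

informer_results = {h: parse_metrics(s) for h, s in informer_logs.items()}
for horizon, m in sorted(informer_results.items()):
    # Sanity check: RMSE should be the square root of MSE.
    assert math.isclose(m["rmse"], math.sqrt(m["mse"]), rel_tol=1e-5)
    print(f"pred_len={horizon:>3}  mae={m['mae']:.4f}  mse={m['mse']:.4f}")
```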


### Conclusion



---
# Working on FEDformer
---

## Trial 1: FEDformer, Dataset: Bitcoin, Prediction Length: 96
### Set hyperparameters
Set the experiment parameters (`args`), which are stored in a dictionary-like object.


In [15]:
args.modes=32
args.moving_avg=[5]
args.embed_type = 1

args.model = 'FEDformer' 
args.checkpoints = './Checkpoints/FEDFormer_checkpoints' # location of model checkpoints
args.pred_len = 96
args.train_epochs = 1
args.learning_rate = 0.05
Exp = Exp_FEDFormer

setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of FEDformer_train_on_crypto_h1_96:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'crypto_h1.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'timeenc': 1, 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 24, 'dec_in': 24, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 96, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 1, 'batch_size': 32, 'learning_rate': 0.05, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'Exp', 'itr': 1, 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'model': 'FEDformer', 'checkpoints': './Checkpoints/FEDFormer_checkpoints'}


### Training

In [16]:
Exp = Exp_FEDFormer
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0


RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
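
A CUDA out-of-memory error at this point usually means the batch does not fit on the device; Trial 2 below lowers `args.batch_size` to 8. One way to automate that choice is to halve the batch size until training starts. The sketch below is generic Python: `train_with_backoff` and `fake_train` are hypothetical names, and matching on the text "out of memory" is an assumption about how the error is reported.

```python
def train_with_backoff(train_fn, batch_size, min_batch_size=1):
    """Call train_fn(batch_size), halving batch_size after each
    out-of-memory RuntimeError until it succeeds or hits the floor."""
    while True:
        try:
            return batch_size, train_fn(batch_size)
        except RuntimeError as err:
            if "out of memory" not in str(err) or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
            print(f"Out of memory, retrying with batch_size={batch_size}")


def fake_train(batch_size):
    """Stand-in for training: pretends only batches of <= 8 fit in memory."""
    if batch_size > 8:
        raise RuntimeError("CUDA error: out of memory")
    return "trained"


used_batch_size, status = train_with_backoff(fake_train, batch_size=32)
print(used_batch_size, status)
```

In the notebook, the function passed as `train_fn` would set `args.batch_size`, rebuild `Exp(args)` and call `train(setting)`; calling `torch.cuda.empty_cache()` between attempts may help release cached memory.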

### Testing

In [11]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 9905
test shape: (9888, 96, 9) (9888, 96, 1)
mae:1.3929619789123535, mse:3.1631062030792236, rmse:1.7785123586654663, mape:31.83746910095215, mspe:1245519.5


---
## Trial 2: FEDformer, Dataset: Bitcoin, Prediction Length: 192
### Set hyperparameters
Set the experiment parameters (`args`), which are stored in a dictionary-like object.


In [1]:
args.pred_len = 192
args.train_epochs = 20
args.batch_size = 8

setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

NameError: name 'args' is not defined

### Training

In [10]:
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 34473
val 4810
test 9809
	iters:  100, epoch: 1 | loss: 1.5980604
	speed: 0.2130sec/iter | left time: 4567.0944sec
	iters:  200, epoch: 1 | loss: 3.0879576
	speed: 0.2001sec/iter | left time: 4269.9379sec
	iters:  300, epoch: 1 | loss: 1.5081090
	speed: 0.2000sec/iter | left time: 4249.0048sec
	iters:  400, epoch: 1 | loss: 1.6581467
	speed: 0.1999sec/iter | left time: 4225.5938sec
	iters:  500, epoch: 1 | loss: 1.7132769
	speed: 0.2001sec/iter | left time: 4210.8043sec
	iters:  600, epoch: 1 | loss: 1.7310362
	speed: 0.1989sec/iter | left time: 4164.4435sec
	iters:  700, epoch: 1 | loss: 1.3443038
	speed: 0.2001sec/iter | left time: 4170.4783sec
	iters:  800, epoch: 1 | loss: 1.1518496
	speed: 0.1999sec/iter | left time: 4146.9589sec
	iters:  900, epoch: 1 | loss: 1.6774831
	speed: 0.1989sec/iter | left time: 4106.2809sec
	iters: 1000, epoch: 1 | loss: 1.4872787
	speed: 0.2000sec/iter | left time: 4108.7284sec
Epoch: 1 cost time: 216.62976121902466
Epoch: 1, Step

Model(
  (decomp): series_decomp_multi(
    (layer): Linear(in_features=1, out_features=1, bias=True)
  )
  (enc_embedding): DataEmbedding(
    (value_embedding): TokenEmbedding(
      (tokenConv): Conv1d(9, 512, kernel_size=(3,), stride=(1,), padding=(1,), bias=False, padding_mode=circular)
    )
    (position_embedding): PositionalEmbedding()
    (temporal_embedding): TimeFeatureEmbedding(
      (embed): Linear(in_features=4, out_features=512, bias=False)
    )
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (dec_embedding): DataEmbedding(
    (value_embedding): TokenEmbedding(
      (tokenConv): Conv1d(9, 512, kernel_size=(3,), stride=(1,), padding=(1,), bias=False, padding_mode=circular)
    )
    (position_embedding): PositionalEmbedding()
    (temporal_embedding): TimeFeatureEmbedding(
      (embed): Linear(in_features=4, out_features=512, bias=False)
    )
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): Encoder(
    (attn_layers): ModuleList(
      (0): EncoderL

### Testing

In [11]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 9809
test shape: (9792, 192, 9) (9792, 192, 1)
mae:1.5525404214859009, mse:3.7896485328674316, rmse:1.946702003479004, mape:34.698646545410156, mspe:1438758.25
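
The two shapes printed above differ in their last axis: the predictions keep 9 channels while the targets keep 1. If the metrics are computed on arrays with these shapes, every predicted channel is scored against the single target column, which can inflate the error. Whether that happens in this run is not confirmed by the notebook. The toy sketch below only illustrates the effect, assuming the target is the last channel.

```python
def mae(pred, true):
    """Mean absolute error over two equal-length flat sequences."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


# Toy data, not from the notebook: 3 time steps, 3 channels per step,
# with the target assumed to be the LAST channel.
pred_all_channels = [
    [5.0, -3.0, 0.9],
    [4.0, -2.0, 2.1],
    [6.0, -1.0, 2.9],
]
true_target = [1.0, 2.0, 3.0]

# Matching comparison: keep only the target channel of each prediction.
pred_target = [step[-1] for step in pred_all_channels]
mae_target_only = mae(pred_target, true_target)

# Mismatched comparison: every channel is scored against the target,
# which is what array broadcasting does for shapes (..., C) vs (..., 1).
flat_pred = [value for step in pred_all_channels for value in step]
flat_true = [t for t in true_target for _ in pred_all_channels[0]]
mae_all_channels = mae(flat_pred, flat_true)

print(mae_target_only, mae_all_channels)
```

If this applies here, slicing the predictions to the target channel before computing the metrics would make the FEDformer numbers comparable with those of the other models.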


---
## Trial 3: FEDformer, Dataset: Bitcoin, Prediction Length: 336
### Set hyperparameters
Set the experiment parameters (`args`), which are stored in a dictionary-like object.


In [12]:
args.pred_len = 336
args.train_epochs = 20

setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of FEDformer_train_on_Bitcoin_336:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'Bitcoin.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 9, 'dec_in': 9, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 336, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 20, 'batch_size': 32, 'learning_rate': 0.001, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'test', 'itr': 1, 'model': 'FEDformer', 'checkpoints': './Checkpoints/FEDFormer_checkpoints', 'modes': 32, 'moving_avg': [5], 'embed_type': 1}


### Training

In [13]:
Exp = Exp_FEDFormer
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 34329
val 4666
test 9665
	iters:  100, epoch: 1 | loss: 1.6483227
	speed: 0.2107sec/iter | left time: 4497.3442sec
	iters:  200, epoch: 1 | loss: 1.3078232
	speed: 0.2103sec/iter | left time: 4466.5375sec
	iters:  300, epoch: 1 | loss: 1.3237031
	speed: 0.2100sec/iter | left time: 4439.0502sec
	iters:  400, epoch: 1 | loss: 1.8578951
	speed: 0.2009sec/iter | left time: 4228.1704sec
	iters:  500, epoch: 1 | loss: 1.3203173
	speed: 0.2009sec/iter | left time: 4207.4642sec
	iters:  600, epoch: 1 | loss: 1.7005321
	speed: 0.2021sec/iter | left time: 4212.2628sec
	iters:  700, epoch: 1 | loss: 1.5586745
	speed: 0.2050sec/iter | left time: 4251.7740sec
	iters:  800, epoch: 1 | loss: 1.4413584
	speed: 0.2010sec/iter | left time: 4148.5532sec
	iters:  900, epoch: 1 | loss: 1.7719959
	speed: 0.2000sec/iter | left time: 4108.0085sec
	iters: 1000, epoch: 1 | loss: 1.7345303
	speed: 0.1999sec/iter | left time: 4085.2978sec
Epoch: 1 cost time: 218.94065237045288
Epoch: 1, Step

Model(
  (decomp): series_decomp_multi(
    (layer): Linear(in_features=1, out_features=1, bias=True)
  )
  (enc_embedding): DataEmbedding(
    (value_embedding): TokenEmbedding(
      (tokenConv): Conv1d(9, 512, kernel_size=(3,), stride=(1,), padding=(1,), bias=False, padding_mode=circular)
    )
    (position_embedding): PositionalEmbedding()
    (temporal_embedding): TimeFeatureEmbedding(
      (embed): Linear(in_features=4, out_features=512, bias=False)
    )
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (dec_embedding): DataEmbedding(
    (value_embedding): TokenEmbedding(
      (tokenConv): Conv1d(9, 512, kernel_size=(3,), stride=(1,), padding=(1,), bias=False, padding_mode=circular)
    )
    (position_embedding): PositionalEmbedding()
    (temporal_embedding): TimeFeatureEmbedding(
      (embed): Linear(in_features=4, out_features=512, bias=False)
    )
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): Encoder(
    (attn_layers): ModuleList(
      (0): EncoderL

### Testing

In [14]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 9665
test shape: (9664, 336, 9) (9664, 336, 1)
mae:1.5651217699050903, mse:3.866671323776245, rmse:1.9663853645324707, mape:34.66867446899414, mspe:1476521.125


---
## Trial 4: FEDformer, Dataset: Bitcoin, Prediction Length: 720
### Set hyperparameters
Set the experiment parameters (`args`), which are stored in a dictionary-like object.


In [15]:
args.pred_len = 720
args.train_epochs = 20

setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of FEDformer_train_on_Bitcoin_720:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'Bitcoin.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 9, 'dec_in': 9, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 720, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 20, 'batch_size': 32, 'learning_rate': 0.001, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'test', 'itr': 1, 'model': 'FEDformer', 'checkpoints': './Checkpoints/FEDFormer_checkpoints', 'modes': 32, 'moving_avg': [5], 'embed_type': 1}


### Training

In [16]:
Exp = Exp_FEDFormer
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 33945
val 4282
test 9281
	iters:  100, epoch: 1 | loss: 2.2763164
	speed: 0.3021sec/iter | left time: 6375.4043sec
	iters:  200, epoch: 1 | loss: 2.0364537
	speed: 0.3010sec/iter | left time: 6320.4144sec
	iters:  300, epoch: 1 | loss: 1.6627011
	speed: 0.3001sec/iter | left time: 6272.6166sec
	iters:  400, epoch: 1 | loss: 1.3063546
	speed: 0.3028sec/iter | left time: 6299.4411sec
	iters:  500, epoch: 1 | loss: 1.6482147
	speed: 0.3019sec/iter | left time: 6250.0993sec
	iters:  600, epoch: 1 | loss: 1.5940429
	speed: 0.3001sec/iter | left time: 6182.4509sec
	iters:  700, epoch: 1 | loss: 1.9007378
	speed: 0.3001sec/iter | left time: 6151.5430sec
	iters:  800, epoch: 1 | loss: 1.8652020
	speed: 0.3020sec/iter | left time: 6160.8647sec
	iters:  900, epoch: 1 | loss: 1.8006806
	speed: 0.3000sec/iter | left time: 6090.5618sec
	iters: 1000, epoch: 1 | loss: 1.4501077
	speed: 0.3039sec/iter | left time: 6139.9072sec
Epoch: 1 cost time: 319.60659193992615
Epoch: 1, Step

Model(
  (decomp): series_decomp_multi(
    (layer): Linear(in_features=1, out_features=1, bias=True)
  )
  (enc_embedding): DataEmbedding(
    (value_embedding): TokenEmbedding(
      (tokenConv): Conv1d(9, 512, kernel_size=(3,), stride=(1,), padding=(1,), bias=False, padding_mode=circular)
    )
    (position_embedding): PositionalEmbedding()
    (temporal_embedding): TimeFeatureEmbedding(
      (embed): Linear(in_features=4, out_features=512, bias=False)
    )
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (dec_embedding): DataEmbedding(
    (value_embedding): TokenEmbedding(
      (tokenConv): Conv1d(9, 512, kernel_size=(3,), stride=(1,), padding=(1,), bias=False, padding_mode=circular)
    )
    (position_embedding): PositionalEmbedding()
    (temporal_embedding): TimeFeatureEmbedding(
      (embed): Linear(in_features=4, out_features=512, bias=False)
    )
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): Encoder(
    (attn_layers): ModuleList(
      (0): EncoderL

### Testing

In [17]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 9281
test shape: (9280, 720, 9) (9280, 720, 1)
mae:1.6407479047775269, mse:4.205647945404053, rmse:2.050767660140991, mape:38.43457794189453, mspe:1740615.25


---
# Working on PatchTST
---

## Trial 1: PatchTST, Dataset: Bitcoin, Prediction Length: 96
### Set hyperparameters
Set the experiment parameters (`args`), which are stored in a dictionary-like object.


In [11]:
args.model = 'PatchTST'
# args.model_id = f"{args.data}_{args.seq_len}_{args.pred_len}"
args.fc_dropout = 0.3
args.head_dropout = 0
args.stride = 8
args.batch_size = 128
args.train_epochs = 100
args.patch_len= 16
args.pred_len = 96
args.checkpoints = './Checkpoints/PatchTST_checkpoints' # location of model checkpoints

Exp = exp_PatchTST

setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of PatchTST_train_on_Bitcoin_96:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'Bitcoin.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 9, 'dec_in': 9, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 96, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 100, 'batch_size': 128, 'learning_rate': 0.001, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'test', 'itr': 1, 'model': 'PatchTST', 'checkpoints': './Checkpoints/PatchTST_checkpoints', 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'fc_dropout': 0.3, 'head_dropout': 0, 'patch_len': 16, 'stride': 8}
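
PatchTST splits each input series of length `seq_len` into overlapping patches of `patch_len` steps, taken every `stride` steps, and projects each patch to `d_model` dimensions. The sketch below counts and extracts the patches for the values set above (`seq_len=336`, `patch_len=16`, `stride=8`). Whether the series is also padded at the end, which adds one more patch, depends on a model setting that is not shown in this notebook, so it is exposed as a flag. `num_patches` and `make_patches` are hypothetical helper names.

```python
def num_patches(seq_len, patch_len, stride, pad_end=False):
    """Number of patches cut from a series of length seq_len.

    With pad_end=True the series is extended by `stride` steps,
    which yields one extra patch."""
    n = (seq_len - patch_len) // stride + 1
    return n + 1 if pad_end else n


def make_patches(series, patch_len, stride):
    """Slice a 1-D sequence into overlapping patches (no padding)."""
    last_start = len(series) - patch_len
    return [series[i:i + patch_len] for i in range(0, last_start + 1, stride)]


seq_len, patch_len, stride = 336, 16, 8
series = list(range(seq_len))            # dummy input window
patches = make_patches(series, patch_len, stride)

print(len(patches), num_patches(seq_len, patch_len, stride))      # 41 41
print(num_patches(seq_len, patch_len, stride, pad_end=True))      # 42
print(patches[1][:3])                                             # [8, 9, 10]
```

With a stride of half the patch length, consecutive patches overlap by 8 steps, so the attention layers operate on about 41 tokens per channel instead of 336 time steps.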


### Training

In [9]:
%%time

crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 34569
val 4906
test 9905
	iters: 100, epoch: 1 | loss: 0.7567123
	speed: 0.9609s/iter; left time: 25849.9150s
	iters: 200, epoch: 1 | loss: 1.1903340
	speed: 0.8620s/iter; left time: 23103.7629s
Epoch: 1 cost time: 241.19810676574707
Epoch: 1, Steps: 270 | Train Loss: 0.9842840 Vali Loss: 0.1459847 Test Loss: 0.3391561
>>> Validation loss decreased (inf --> 0.145985).  Saving model ...
Updating learning rate to 0.001
	iters: 100, epoch: 2 | loss: 0.8275940
	speed: 2.9759s/iter; left time: 79252.3304s
	iters: 200, epoch: 2 | loss: 1.1826084
	speed: 0.8650s/iter; left time: 22949.2938s
Epoch: 2 cost time: 233.50948476791382
Epoch: 2, Steps: 270 | Train Loss: 0.9946431 Vali Loss: 0.1458681 Test Loss: 0.3428465
>>> Validation loss decreased (0.145985 --> 0.145868).  Saving model ...
Updating learning rate to 0.0005
	iters: 100, epoch: 3 | loss: 1.0292318
	speed: 2.9831s/iter; left time: 78636.5626s
	iters: 200, epoch: 3 | loss: 0.8681535
	speed: 0.8490s/iter; left tim

Model(
  (model): PatchTST_backbone(
    (backbone): TSTiEncoder(
      (W_P): Linear(in_features=16, out_features=512, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (encoder): TSTEncoder(
        (layers): ModuleList(
          (0): TSTEncoderLayer(
            (self_attn): _MultiheadAttention(
              (W_Q): Linear(in_features=512, out_features=512, bias=True)
              (W_K): Linear(in_features=512, out_features=512, bias=True)
              (W_V): Linear(in_features=512, out_features=512, bias=True)
              (sdp_attn): _ScaledDotProductAttention(
                (attn_dropout): Dropout(p=0.0, inplace=False)
              )
              (to_out): Sequential(
                (0): Linear(in_features=512, out_features=512, bias=True)
                (1): Dropout(p=0.1, inplace=False)
              )
            )
            (dropout_attn): Dropout(p=0.1, inplace=False)
            (norm_attn): Sequential(
              (0): Transpose()
              

### Testing

In [10]:
%%time
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 9905
mae:0.41688159108161926, mse:0.3428463935852051, rmse:0.5855308771133423, mape:1.9714272022247314, mspe:1722.3175048828125
CPU times: user 7min 18s, sys: 5.27 s, total: 7min 23s
Wall time: 1min 7s
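
The training log above reports `Updating learning rate to 0.001` after epoch 1 and `0.0005` after epoch 2, which is consistent with the `lradj: 'type1'` setting halving the rate after every epoch. The formula below is an assumption that reproduces those two logged values; `type1_learning_rate` is a hypothetical name.

```python
def type1_learning_rate(base_lr, epoch):
    """Learning rate set after finishing `epoch` (1-based): halved each epoch."""
    return base_lr * 0.5 ** (epoch - 1)


base_lr = 0.001
schedule = [type1_learning_rate(base_lr, epoch) for epoch in range(1, 6)]
print(schedule)
```

Under this schedule the rate falls below 1e-6 by epoch 11, so most of the 100 requested epochs would run with a very small step size unless early stopping ends training first.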


---
## Trial 2: PatchTST, Dataset: Bitcoin, Prediction Length: 192
### Set hyperparameters
Set the experiment parameters (`args`), which are stored in a dictionary-like object.

In [11]:
args.pred_len = 192
setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of PatchTST_train_on_Bitcoin_192:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'Bitcoin.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 9, 'dec_in': 9, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 192, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 100, 'batch_size': 128, 'learning_rate': 0.001, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'test', 'itr': 1, 'model': 'PatchTST', 'checkpoints': './Checkpoints/PatchTST_checkpoints', 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'fc_dropout': 0.3, 'head_dropout': 0, 'patch_len': 16, 'stride': 8}


### Training

In [12]:
%%time
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 34473
val 4810
test 9809
	iters: 100, epoch: 1 | loss: 0.9969887
	speed: 0.6459s/iter; left time: 17311.6293s
	iters: 200, epoch: 1 | loss: 1.0769274
	speed: 0.8640s/iter; left time: 23070.5431s
Epoch: 1 cost time: 208.7990472316742
Epoch: 1, Steps: 269 | Train Loss: 0.9951795 Vali Loss: 0.1488167 Test Loss: 0.3441015
>>> Validation loss decreased (inf --> 0.148817).  Saving model ...
Updating learning rate to 0.001
	iters: 100, epoch: 2 | loss: 0.9507551
	speed: 2.8520s/iter; left time: 75668.1173s
	iters: 200, epoch: 2 | loss: 0.7726432
	speed: 0.8410s/iter; left time: 22229.8411s
Epoch: 2 cost time: 226.69701552391052
Epoch: 2, Steps: 269 | Train Loss: 1.0136118 Vali Loss: 0.1482446 Test Loss: 0.3447173
>>> Validation loss decreased (0.148817 --> 0.148245).  Saving model ...
Updating learning rate to 0.0005
	iters: 100, epoch: 3 | loss: 0.9364120
	speed: 2.9020s/iter; left time: 76214.8082s
	iters: 200, epoch: 3 | loss: 0.9742250
	speed: 0.8510s/iter; left time

Model(
  (model): PatchTST_backbone(
    (backbone): TSTiEncoder(
      (W_P): Linear(in_features=16, out_features=512, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (encoder): TSTEncoder(
        (layers): ModuleList(
          (0): TSTEncoderLayer(
            (self_attn): _MultiheadAttention(
              (W_Q): Linear(in_features=512, out_features=512, bias=True)
              (W_K): Linear(in_features=512, out_features=512, bias=True)
              (W_V): Linear(in_features=512, out_features=512, bias=True)
              (sdp_attn): _ScaledDotProductAttention(
                (attn_dropout): Dropout(p=0.0, inplace=False)
              )
              (to_out): Sequential(
                (0): Linear(in_features=512, out_features=512, bias=True)
                (1): Dropout(p=0.1, inplace=False)
              )
            )
            (dropout_attn): Dropout(p=0.1, inplace=False)
            (norm_attn): Sequential(
              (0): Transpose()
              

### Testing

In [13]:
%%time
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 9809
mae:0.4187060296535492, mse:0.3447172939777374, rmse:0.5871263146400452, mape:1.9410228729248047, mspe:1947.8111572265625
CPU times: user 7min 8s, sys: 4.89 s, total: 7min 13s
Wall time: 1min 4s
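
The `Validation loss decreased (...). Saving model ...` messages in the training log, together with `patience: 3` in the configuration, indicate patience-based early stopping. The class below is a minimal sketch of that idea under stated assumptions, not the pipeline's own implementation: `EarlyStoppingSketch` is a hypothetical name, and only the first two validation losses are taken from the log.

```python
import math


class EarlyStoppingSketch:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_loss = math.inf
        self.bad_epochs = 0
        self.should_stop = False

    def step(self, val_loss):
        """Record one epoch; return True if this epoch set a new best."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
            return True            # a real pipeline would save a checkpoint here
        self.bad_epochs += 1
        if self.bad_epochs >= self.patience:
            self.should_stop = True
        return False


# First two values are the logged validation losses; the rest are made up.
val_losses = [0.1488167, 0.1482446, 0.1490, 0.1495, 0.1491, 0.1480]
stopper = EarlyStoppingSketch(patience=3)
for epoch, loss in enumerate(val_losses, start=1):
    improved = stopper.step(loss)
    print(epoch, loss, "improved" if improved else "no improvement")
    if stopper.should_stop:
        print(f"Early stopping after epoch {epoch}")
        break
```

The made-up sixth value is never reached: three epochs without improvement exhaust the patience, and the best checkpoint stays the one from epoch 2.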


---
## Trial 3: PatchTST, Dataset: Bitcoin, Prediction Length: 336

### Set hyperparameters
Set the experiment parameters (`args`), which are stored in a dictionary-like object.


In [12]:
args.pred_len = 336
setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of PatchTST_train_on_Bitcoin_336:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'Bitcoin.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 9, 'dec_in': 9, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 336, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 100, 'batch_size': 128, 'learning_rate': 0.001, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'test', 'itr': 1, 'model': 'PatchTST', 'checkpoints': './Checkpoints/PatchTST_checkpoints', 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'fc_dropout': 0.3, 'head_dropout': 0, 'patch_len': 16, 'stride': 8}


### Training

In [13]:
%%time
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 34329
val 4666
test 9665
	iters: 100, epoch: 1 | loss: 0.9554790
	speed: 1.0150s/iter; left time: 27101.9716s
	iters: 200, epoch: 1 | loss: 0.9334428
	speed: 0.8709s/iter; left time: 23167.6501s
Epoch: 1 cost time: 246.69763350486755
Epoch: 1, Steps: 268 | Train Loss: 0.9995384 Vali Loss: 0.1503415 Test Loss: 0.3458355
>>> Validation loss decreased (inf --> 0.150341).  Saving model ...
Updating learning rate to 0.001
	iters: 100, epoch: 2 | loss: 0.9902376
	speed: 3.3930s/iter; left time: 89686.9430s
	iters: 200, epoch: 2 | loss: 0.9565613
	speed: 0.8410s/iter; left time: 22146.1216s
Epoch: 2 cost time: 224.69683027267456
Epoch: 2, Steps: 268 | Train Loss: 1.0128777 Vali Loss: 0.1490591 Test Loss: 0.3480502
>>> Validation loss decreased (0.150341 --> 0.149059).  Saving model ...
Updating learning rate to 0.0005
	iters: 100, epoch: 3 | loss: 0.9382138
	speed: 3.4000s/iter; left time: 88962.1409s
	iters: 200, epoch: 3 | loss: 0.9894266
	speed: 0.8500s/iter; left tim

Model(
  (model): PatchTST_backbone(
    (backbone): TSTiEncoder(
      (W_P): Linear(in_features=16, out_features=512, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (encoder): TSTEncoder(
        (layers): ModuleList(
          (0): TSTEncoderLayer(
            (self_attn): _MultiheadAttention(
              (W_Q): Linear(in_features=512, out_features=512, bias=True)
              (W_K): Linear(in_features=512, out_features=512, bias=True)
              (W_V): Linear(in_features=512, out_features=512, bias=True)
              (sdp_attn): _ScaledDotProductAttention(
                (attn_dropout): Dropout(p=0.0, inplace=False)
              )
              (to_out): Sequential(
                (0): Linear(in_features=512, out_features=512, bias=True)
                (1): Dropout(p=0.1, inplace=False)
              )
            )
            (dropout_attn): Dropout(p=0.1, inplace=False)
            (norm_attn): Sequential(
              (0): Transpose()
              

### Testing

In [14]:
%%time
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 9665
mae:0.4199315309524536, mse:0.3488718867301941, rmse:0.5906537771224976, mape:1.9905039072036743, mspe:1961.1881103515625
CPU times: user 2min 23s, sys: 1.01 s, total: 2min 24s
Wall time: 8.32 s
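
The five numbers printed by `test` can be read with the sketch below. It assumes the usual definitions from LTSF-style forecasting code (mean absolute error, mean squared error, its square root, and the two percentage errors, which divide by the true value); the notebook's own metric code is not shown here, so treat `forecast_metrics` as a hypothetical stand-in. The logged values are at least consistent with `rmse` being the square root of `mse` (0.5906... squared is about 0.3489).

```python
import math

def forecast_metrics(pred, true):
    """Return MAE, MSE, RMSE, MAPE and MSPE for two equal-length sequences.

    Hypothetical helper: assumes the standard definitions, not the notebook's code.
    """
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and the same length")
    n = len(pred)
    errors = [p - t for p, t in zip(pred, true)]
    # Relative errors divide by the true value, so true values near zero inflate them.
    rel_errors = [(p - t) / t for p, t in zip(pred, true)]
    mse = sum(e * e for e in errors) / n
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mape": sum(abs(r) for r in rel_errors) / n,
        "mspe": sum(r * r for r in rel_errors) / n,
    }

# Toy values for illustration only, not taken from the Bitcoin data.
m = forecast_metrics(pred=[1.0, 2.0, 4.0], true=[1.0, 4.0, 2.0])
print(m)
```

Because MAPE and MSPE divide by the true value, one plausible reason for the large `mape` and `mspe` figures above is that many target values lie close to zero; MAE and MSE are likely the more reliable numbers to compare across trials.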


---
## Trial 4: PatchTST, Dataset: Bitcoin, Prediction Length: 720

### Set hyperparameters
Set the experiment parameters (`args`), which can be accessed like a dictionary.
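
How `args` is created is not shown in this section. A minimal sketch of one common pattern, a dictionary that also allows attribute access, is given below; the class name `DotDict` and the three keys are assumptions chosen for illustration.

```python
class DotDict(dict):
    """Dictionary whose keys can also be read and written as attributes.

    Hypothetical helper: the notebook's real `args` object may be defined differently.
    """

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__


# Only a few of the keys printed in the notebook are reproduced here.
args = DotDict(model="PatchTST", seq_len=336, pred_len=336)
used_data = "Bitcoin"

# Changing one hyperparameter and rebuilding the run name, as each trial does.
args.pred_len = 720
setting = f"{args.model}_train_on_{used_data}_{args.pred_len}"
print(setting)
print(args)
```

With this pattern, a trial only needs to overwrite the fields that change (here `pred_len`) and every other setting carries over from the previous cell, which is why a stale value from an earlier trial can silently leak into a later one.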


In [15]:
args.pred_len = 720
setting=f'{args.model}_train_on_{used_data}_{args.pred_len}'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of PatchTST_train_on_Bitcoin_720:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'Bitcoin.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 9, 'dec_in': 9, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 720, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 100, 'batch_size': 128, 'learning_rate': 0.001, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'test', 'itr': 1, 'model': 'PatchTST', 'checkpoints': './Checkpoints/PatchTST_checkpoints', 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'fc_dropout': 0.3, 'head_dropout': 0, 'patch_len': 16, 'stride': 8}


### Training

In [16]:
%%time
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 33945
val 4282
test 9281
	iters: 100, epoch: 1 | loss: 0.9216278
	speed: 0.2816s/iter; left time: 7435.1876s
	iters: 200, epoch: 1 | loss: 0.8119629
	speed: 0.2869s/iter; left time: 7545.0952s
Epoch: 1 cost time: 75.71340847015381
Epoch: 1, Steps: 265 | Train Loss: 0.9875062 Vali Loss: 0.1421964 Test Loss: 0.3519789
>>> Validation loss decreased (inf --> 0.142196).  Saving model ...
Updating learning rate to 0.001
	iters: 100, epoch: 2 | loss: 1.0049590
	speed: 0.6027s/iter; left time: 15751.6418s
	iters: 200, epoch: 2 | loss: 0.8251871
	speed: 0.2903s/iter; left time: 7557.7829s
Epoch: 2 cost time: 76.73786735534668
Epoch: 2, Steps: 265 | Train Loss: 1.0122840 Vali Loss: 0.1411588 Test Loss: 0.3517562
>>> Validation loss decreased (0.142196 --> 0.141159).  Saving model ...
Updating learning rate to 0.0005
	iters: 100, epoch: 3 | loss: 0.9370153
	speed: 0.6224s/iter; left time: 16101.2852s
	iters: 200, epoch: 3 | loss: 0.8807594
	speed: 0.2899s/iter; left time: 74

Model(
  (model): PatchTST_backbone(
    (backbone): TSTiEncoder(
      (W_P): Linear(in_features=16, out_features=512, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (encoder): TSTEncoder(
        (layers): ModuleList(
          (0): TSTEncoderLayer(
            (self_attn): _MultiheadAttention(
              (W_Q): Linear(in_features=512, out_features=512, bias=True)
              (W_K): Linear(in_features=512, out_features=512, bias=True)
              (W_V): Linear(in_features=512, out_features=512, bias=True)
              (sdp_attn): _ScaledDotProductAttention(
                (attn_dropout): Dropout(p=0.0, inplace=False)
              )
              (to_out): Sequential(
                (0): Linear(in_features=512, out_features=512, bias=True)
                (1): Dropout(p=0.1, inplace=False)
              )
            )
            (dropout_attn): Dropout(p=0.1, inplace=False)
            (norm_attn): Sequential(
              (0): Transpose()
              

### Testing

In [17]:
%%time
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 9281
mae:0.4204939603805542, mse:0.35175618529319763, rmse:0.5930903553962708, mape:1.695189118385315, mspe:994.0048828125
CPU times: user 2min 23s, sys: 983 ms, total: 2min 24s
Wall time: 8.04 s
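
In both runs above (`lradj` set to `type1`), the log reports a learning rate of 0.001 after the first epoch and 0.0005 after the second. That pattern matches a rule that halves the rate once per epoch. The sketch below assumes this rule; the function name is hypothetical and the notebook's scheduler code is not shown in this section.

```python
def logged_lr_after_epoch(base_lr, epoch):
    """Rate reported in the 'Updating learning rate' line after `epoch` (1-based).

    Assumed rule: the base rate is halved once for every epoch after the first.
    """
    if epoch < 1:
        raise ValueError("epoch is 1-based")
    return base_lr * 0.5 ** (epoch - 1)


for epoch in range(1, 5):
    print(epoch, logged_lr_after_epoch(0.001, epoch))
```

Under this assumption the rate drops below one tenth of its starting value after five epochs, so with `patience` set to 3 most of the useful learning has to happen in the first few epochs.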


### Conclusion



-------------------

In [9]:
args.model = 'PatchTST'
args.fc_dropout = 0.2
args.head_dropout = 0
args.dropout = 0.2
args.patch_len = 16
args.stride = 8
args.batch_size = 128
args.train_epochs = 100
args.pred_len = 96
args.e_layers = 3
args.d_ff=256
args.patience = 20
args.lradj = 'TST'
args.pct_start=0.4
args.learning_rate=0.0001
args.d_model=128
args.seq_len = 336
args.checkpoints = './Checkpoints/PatchTST_checkpoints' # location of model checkpoints

Exp = exp_PatchTST

setting=f'{args.model}_train_on_{used_data}_{args.pred_len}_test'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of PatchTST_train_on_Bitcoin_96_test:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'Bitcoin.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 128, 'n_heads': 8, 'factor': 5, 'dropout': 0.2, 'd_ff': 256, 'activation': 'gelu', 'mix': True, 'enc_in': 9, 'dec_in': 9, 'c_out': 1, 'e_layers': 3, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 96, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 100, 'batch_size': 128, 'learning_rate': 0.0001, 'lradj': 'TST', 'loss': 'mse', 'patience': 20, 'des': 'test', 'itr': 1, 'model': 'PatchTST', 'checkpoints': './Checkpoints/PatchTST_checkpoints', 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'fc_dropout': 0.2, 'head_dropout': 0, 'patch_len': 16, 'stride': 8, 'pct_start': 0.4}


### Training

In [10]:
%%time

crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 34569
val 4906
test 9905
	iters: 100, epoch: 1 | loss: 0.9775555
	speed: 0.1217s/iter; left time: 3273.2790s
	iters: 200, epoch: 1 | loss: 1.2696576
	speed: 0.1130s/iter; left time: 3028.4225s
Epoch: 1 cost time: 31.478700399398804
Epoch: 1, Steps: 270 | Train Loss: 1.2981272 Vali Loss: 0.1659675 Test Loss: 0.3814417
>>> Validation loss decreased (inf --> 0.165967).  Saving model ...
Updating learning rate to 4.263013900663684e-06
	iters: 100, epoch: 2 | loss: 1.0677847
	speed: 0.2790s/iter; left time: 7429.5923s
	iters: 200, epoch: 2 | loss: 1.0509040
	speed: 0.1170s/iter; left time: 3104.8670s
Epoch: 2 cost time: 31.31606888771057
Epoch: 2, Steps: 270 | Train Loss: 1.1267041 Vali Loss: 0.1551446 Test Loss: 0.3583809
>>> Validation loss decreased (0.165967 --> 0.155145).  Saving model ...
Updating learning rate to 5.049173256323796e-06
	iters: 100, epoch: 3 | loss: 1.1935017
	speed: 0.2679s/iter; left time: 7062.4581s
	iters: 200, epoch: 3 | loss: 1.0124507
	spee

Model(
  (model): PatchTST_backbone(
    (backbone): TSTiEncoder(
      (W_P): Linear(in_features=16, out_features=128, bias=True)
      (dropout): Dropout(p=0.2, inplace=False)
      (encoder): TSTEncoder(
        (layers): ModuleList(
          (0-2): 3 x TSTEncoderLayer(
            (self_attn): _MultiheadAttention(
              (W_Q): Linear(in_features=128, out_features=128, bias=True)
              (W_K): Linear(in_features=128, out_features=128, bias=True)
              (W_V): Linear(in_features=128, out_features=128, bias=True)
              (sdp_attn): _ScaledDotProductAttention(
                (attn_dropout): Dropout(p=0.0, inplace=False)
              )
              (to_out): Sequential(
                (0): Linear(in_features=128, out_features=128, bias=True)
                (1): Dropout(p=0.2, inplace=False)
              )
            )
            (dropout_attn): Dropout(p=0.2, inplace=False)
            (norm_attn): Sequential(
              (0): Transpose()
        

### Testing

In [11]:
%%time
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 9905
mae:0.4194701015949249, mse:0.3539362847805023, rmse:0.5949254631996155, mape:2.8835556507110596, mspe:5556.4580078125
CPU times: user 2min 16s, sys: 877 ms, total: 2min 17s
Wall time: 6.18 s
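
With `lradj = 'TST'` the logged learning rate rises slowly from a small value instead of halving. The sketch below assumes the schedule is a one-cycle policy with PyTorch's default `div_factor` of 25 and a cosine warm-up, as in the upstream PatchTST code; only the warm-up phase is modelled, and the function name is hypothetical. It is written in plain Python so it can be checked against the log without a GPU.

```python
import math

def one_cycle_warmup_lr(step, max_lr, total_steps, pct_start, div_factor=25.0):
    """Learning rate after `step` scheduler steps (warm-up phase only).

    Assumption: cosine warm-up from max_lr / div_factor up to max_lr.
    """
    initial_lr = max_lr / div_factor
    warmup_end = pct_start * total_steps - 1  # index of the last warm-up step
    if not 0 <= step <= warmup_end:
        raise ValueError("this sketch only covers the warm-up phase")
    pct = step / warmup_end
    # Cosine interpolation: initial_lr at pct = 0, max_lr at pct = 1.
    return max_lr + (initial_lr - max_lr) / 2.0 * (math.cos(math.pi * pct) + 1.0)


steps_per_epoch, epochs, max_lr = 270, 100, 1e-4  # values taken from the log and args
total = steps_per_epoch * epochs
for pct_start in (0.4, 0.3):
    lrs = [one_cycle_warmup_lr(e * steps_per_epoch, max_lr, total, pct_start) for e in (1, 2)]
    print(pct_start, lrs)
```

Under these assumptions, the two logged rates (about 4.263e-06 and 5.049e-06) are reproduced with a warm-up fraction of 0.3 rather than the 0.4 set in `args.pct_start`. This is only an inference from the numbers, but it may be worth checking whether `pct_start` actually reaches the scheduler.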


In [10]:
args.modes=32
args.moving_avg=[5]
args.embed_type = 1
args.enc_in = 5                 # encoder input size
args.dec_in = 5                 # decoder input size

args.model = 'FEDformer' 
args.checkpoints = './Checkpoints/FEDFormer_checkpoints' # location of model checkpoints
args.pred_len = 96
args.train_epochs = 20
args.learning_rate = 0.001
args.cols = ['Count','Open','High','VWAP','Target']  #['date','Count','Open','High','Low','Close','Volume','VWAP','Target']
Exp = Exp_FEDFormer

setting=f'{args.model}_train_on_{used_data}_{args.pred_len}_cols'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of FEDformer_train_on_Bitcoin_96_cols:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'Bitcoin.csv', 'features': 'MS', 'target': 'Target', 'freq': 'h', 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 5, 'dec_in': 5, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 96, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 20, 'batch_size': 32, 'learning_rate': 0.001, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'test', 'itr': 1, 'model': 'FEDformer', 'checkpoints': './Checkpoints/FEDFormer_checkpoints', 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'cols': ['Count', 'Open', 'High', 'VWAP', 'Target']}


### Training

In [11]:
Exp = Exp_FEDFormer
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 34569
val 4906
test 9905
	iters:  100, epoch: 1 | loss: 2.1128492
	speed: 0.2122sec/iter | left time: 4562.7615sec
	iters:  200, epoch: 1 | loss: 1.1595670
	speed: 0.2000sec/iter | left time: 4281.0912sec
	iters:  300, epoch: 1 | loss: 1.5373930
	speed: 0.1980sec/iter | left time: 4216.9920sec
	iters:  400, epoch: 1 | loss: 1.4198899
	speed: 0.1990sec/iter | left time: 4219.2588sec
	iters:  500, epoch: 1 | loss: 1.5996754
	speed: 0.1981sec/iter | left time: 4179.6526sec
	iters:  600, epoch: 1 | loss: 1.8892645
	speed: 0.1998sec/iter | left time: 4196.1762sec
	iters:  700, epoch: 1 | loss: 1.9419611
	speed: 0.2002sec/iter | left time: 4184.0741sec
	iters:  800, epoch: 1 | loss: 1.2503175
	speed: 0.2000sec/iter | left time: 4160.2466sec
	iters:  900, epoch: 1 | loss: 1.2112006
	speed: 0.1999sec/iter | left time: 4138.6109sec
	iters: 1000, epoch: 1 | loss: 1.3699380
	speed: 0.2001sec/iter | left time: 4122.0162sec
Epoch: 1 cost time: 216.83240962028503
Epoch: 1, Step

Model(
  (decomp): series_decomp_multi(
    (layer): Linear(in_features=1, out_features=1, bias=True)
  )
  (enc_embedding): DataEmbedding(
    (value_embedding): TokenEmbedding(
      (tokenConv): Conv1d(5, 512, kernel_size=(3,), stride=(1,), padding=(1,), bias=False, padding_mode=circular)
    )
    (position_embedding): PositionalEmbedding()
    (temporal_embedding): TimeFeatureEmbedding(
      (embed): Linear(in_features=4, out_features=512, bias=False)
    )
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (dec_embedding): DataEmbedding(
    (value_embedding): TokenEmbedding(
      (tokenConv): Conv1d(5, 512, kernel_size=(3,), stride=(1,), padding=(1,), bias=False, padding_mode=circular)
    )
    (position_embedding): PositionalEmbedding()
    (temporal_embedding): TimeFeatureEmbedding(
      (embed): Linear(in_features=4, out_features=512, bias=False)
    )
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): Encoder(
    (attn_layers): ModuleList(
      (0): EncoderL

### Testing

In [12]:
crypto_exp.test(setting)
torch.cuda.empty_cache()

test 9905
test shape: (9888, 96, 5) (9888, 96, 1)
mae:1.4921517372131348, mse:3.8599140644073486, rmse:1.9646663665771484, mape:31.863056182861328, mspe:1194825.125
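
The `train`, `val` and `test` counts in the logs change with `pred_len` (for example 34569, 34329 and 33945 training samples for horizons of 96, 336 and 720). A sliding-window count explains this: each sample needs `seq_len` input steps followed by `pred_len` target steps. The sketch below assumes that formula; the training split size of 35000 rows is not stated in the notebook and is inferred here because it reproduces the logged counts.

```python
def num_windows(n_rows, seq_len, pred_len):
    """Number of (input, target) windows that fit in a split of n_rows time steps."""
    return max(n_rows - seq_len - pred_len + 1, 0)


def full_batches(n_samples, batch_size):
    """Number of complete batches when the last partial batch is dropped."""
    return n_samples // batch_size


seq_len = 336
train_rows = 35000  # assumption: inferred from the logs, not stated in the notebook
for pred_len in (96, 336, 720):
    n = num_windows(train_rows, seq_len, pred_len)
    print(pred_len, n, full_batches(n, 128))

# Dropping the last partial batch would also explain the FEDformer test shape:
# 9905 windows with batch size 32 leave 309 full batches, i.e. 9888 samples.
print(full_batches(9905, 32) * 32)
```

The same arithmetic accounts for the `Steps` values in the PatchTST logs (270, 268 and 265 batches of 128), so longer horizons train on slightly fewer samples from the same data.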


In [20]:
args.modes=32
args.moving_avg=[5]
args.embed_type = 1

args.model = 'FEDformer' 
args.checkpoints = './Checkpoints/FEDFormer_checkpoints' # location of model checkpoints
args.pred_len = 96
args.train_epochs = 20
args.learning_rate = 0.001
args.cols = None  #['date','Count','Open','High','Low','Close','Volume','VWAP','Target']
args.target = 'VWAP'
Exp = Exp_FEDFormer

setting=f'{args.model}_train_on_{used_data}_{args.pred_len}_allcols_VWAP'
print(f"Hyperparameter Combination of {setting}:\n") 
print(args)

Hyperparameter Combination of FEDformer_train_on_Bitcoin_96_allcols_VWAP:

{'use_multi_gpu': False, 'num_workers': 0, 'use_gpu': True, 'gpu': 0, 'data': 'custom', 'root_path': './Datasets/CustomData/', 'data_path': 'Bitcoin.csv', 'features': 'MS', 'target': 'VWAP', 'freq': 'h', 'embed': 'timeF', 'padding': 0, 'attn': 'prob', 'd_model': 512, 'n_heads': 8, 'factor': 5, 'dropout': 0.1, 'd_ff': 2048, 'activation': 'gelu', 'mix': True, 'enc_in': 9, 'dec_in': 9, 'c_out': 1, 'e_layers': 2, 'd_layers': 1, 'seq_len': 336, 'label_len': 48, 'pred_len': 96, 'distil': True, 'output_attention': False, 'use_amp': False, 'train_only': True, 'train_epochs': 20, 'batch_size': 32, 'learning_rate': 0.001, 'lradj': 'type1', 'loss': 'mse', 'patience': 3, 'des': 'test', 'itr': 1, 'model': 'FEDformer', 'checkpoints': './Checkpoints/FEDFormer_checkpoints', 'modes': 32, 'moving_avg': [5], 'embed_type': 1, 'cols': None}


### Training

In [None]:
Exp = Exp_FEDFormer
crypto_exp = Exp(args)
crypto_exp.train(setting)

Use GPU: cuda:0
train 34569
val 4906
test 9905
	iters:  100, epoch: 1 | loss: 0.9434859
	speed: 0.2018sec/iter | left time: 4338.9487sec
	iters:  200, epoch: 1 | loss: 0.9202161
	speed: 0.2049sec/iter | left time: 4385.0008sec
	iters:  300, epoch: 1 | loss: 0.8417870
	speed: 0.2030sec/iter | left time: 4324.6449sec
	iters:  400, epoch: 1 | loss: 0.7561420
	speed: 0.2029sec/iter | left time: 4301.4459sec
	iters:  500, epoch: 1 | loss: 0.6936189
	speed: 0.1991sec/iter | left time: 4202.2426sec
	iters:  600, epoch: 1 | loss: 1.7948225
	speed: 0.2050sec/iter | left time: 4304.6287sec
	iters:  700, epoch: 1 | loss: 0.8175589
	speed: 0.2030sec/iter | left time: 4243.1977sec
	iters:  800, epoch: 1 | loss: 0.7804035
	speed: 0.2019sec/iter | left time: 4199.6747sec
	iters:  900, epoch: 1 | loss: 0.7349441
	speed: 0.2020sec/iter | left time: 4181.3125sec
	iters: 1000, epoch: 1 | loss: 0.5693433
	speed: 0.2000sec/iter | left time: 4119.2689sec
Epoch: 1 cost time: 218.77358293533325
Epoch: 1, Step

### Testing

In [None]:
crypto_exp.test(setting)
torch.cuda.empty_cache()