# allennlp-simple-lstm-tagger-tutorials

Based on the [official tutorials](https://allennlp.org/tutorials).

Using `allennlp` purely from code is a good way to understand what role each **module** plays at each **stage** of the processing pipeline.

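> I started out with the `allennlp-train` mode and got thoroughly confused by the configuration and how its parts relate to each other. I could tweak a few parameters,
> but I did not understand the underlying flow. So I started reading the source and experimenting bit by bit. Now that I understand it somewhat better, I am sharing my notes here.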


# Basic concepts first

In [12]:
# import libs
# Tip: add type annotations generously when writing code
from typing import Iterator, List, Dict

# AllenNLP is built on PyTorch, so almost every PyTorch component can be used inside AllenNLP
# e.g. modules, optimizers, operations ...
import torch
import torch.optim as optim
import numpy as np

# Instance

As the name suggests, each line of text is converted into an `Instance` object.

For example:

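```python
from allennlp.data import Instance
from allennlp.data.fields import TextField, LabelField, SequenceLabelField
from allennlp.data.token_indexers import SingleIdTokenIndexer
from allennlp.data.tokenizers import Token

text = TextField([Token(w) for w in ["I", "love", "you"]], {"tokens": SingleIdTokenIndexer()})
instance = Instance({
    "text": text,
    "label": LabelField("happy"),
    "tags": SequenceLabelField(["Person", "O", "Person"], text),
})
```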
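An `Instance` can store different types of data for different tasks.
For tasks such as text classification or sentiment classification, each `Instance` needs a single label, so the label is stored as a `LabelField`.
For tasks such as `POS` tagging, `NER`, and `SlotFilling`, every `Token` in the `Instance` needs its own label, so the labels are stored as a `SequenceLabelField`.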

In [13]:
from allennlp.data import Instance
from allennlp.data.fields import TextField, SequenceLabelField

# DatasetReader

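A `DatasetReader` reads a data file into an `Iterable[Instance]` collection.

All we need to do is override the `_read(file_path)` method.

Because the result is an `Iterable`, it can either be materialized as an in-memory list or act as a lazy generator that loads data on demand.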
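
The difference between an in-memory list and a lazy generator can be sketched in plain Python. This is only an illustration of the idea, not AllenNLP's implementation; `read_lines` and the sample sentences are made up for the example.

```python
from typing import Iterable, Iterator, List

def read_lines(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one tokenized example at a time (hypothetical stand-in for `_read`)."""
    for line in lines:
        yield line.split()

raw = ["The dog ate the apple", "Everybody read that book"]

# Lazy: nothing is tokenized until we start iterating.
lazy_examples = read_lines(raw)
first = next(lazy_examples)

# Eager: materialize every example in memory as a list.
eager_examples = list(read_lines(raw))
```

The lazy form keeps memory use flat for large files, while the eager form allows random access and repeated passes without re-reading.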


In [14]:
from allennlp.data.dataset_readers import DatasetReader


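# cached_path

If `file_path` is a URL, the data is automatically downloaded into the `cache_dir` folder, and the local path of the downloaded file is returned.

Otherwise, the local file is used directly.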
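
The branching described above can be sketched with the standard library. This is a simplified illustration under my own assumptions, not the code of `cached_path`; `is_remote` and `describe` are hypothetical helpers, and nothing is actually downloaded.

```python
from urllib.parse import urlparse

def is_remote(path: str) -> bool:
    """Return True when `path` looks like an http(s) URL (hypothetical helper)."""
    return urlparse(path).scheme in ("http", "https")

def describe(path: str) -> str:
    # A real implementation would download remote files into a cache folder
    # and return the local path; here we only report which branch applies.
    return "download to cache" if is_remote(path) else "use local file"

remote_choice = describe("https://example.com/data/train.txt")
local_choice = describe("./data/train.txt")
```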


In [15]:
from allennlp.common.file_utils import cached_path

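# TokenIndexer


As the name suggests, a `TokenIndexer` converts a `Token` into an index. Different models need tokens mapped at different granularities, for example `word-level` or `character-level`.

For example, at the `word-level` the word `cat` might have the index `34`, whereas at the `character-level` its index might be `[23,10,18]`.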
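
A toy version of the two granularities, in plain Python. The ids it produces are arbitrary and will not match the numbers above; `build_index` is a hypothetical helper, not an AllenNLP function.

```python
from typing import Dict, List

def build_index(items: List[str]) -> Dict[str, int]:
    """Assign consecutive ids in first-seen order (toy vocabulary)."""
    index: Dict[str, int] = {}
    for item in items:
        index.setdefault(item, len(index))
    return index

sentence = ["the", "cat", "sat"]

word_index = build_index(sentence)
char_index = build_index([ch for word in sentence for ch in word])

word_level = word_index["cat"]                   # one id for the whole word
char_level = [char_index[ch] for ch in "cat"]    # one id per character
```

The same token therefore becomes a single integer under one indexer and a list of integers under another, which is why each indexer needs its own matching embedder.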

In [16]:
from allennlp.data.token_indexers import TokenIndexer, SingleIdTokenIndexer
from allennlp.data.tokenizers import Token
from allennlp.data.vocabulary import Vocabulary

# Model

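This is the core of the whole `AllenNLP` framework, and it is where our final model and algorithm live.

`AllenNLP` ships many ready-to-use models, such as `CrfTagger`, `BertForClassification`, [`BiMpm`](https://arxiv.org/abs/1702.03814), `BidirectionalLanguageModel`, `MaskedLanguageModel`, and more.

There are honestly too many to list. I recommend reading the [source code](https://github.com/allenai/allennlp) to discover the latest models and components.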

In [17]:
from allennlp.models import Model

# TextFieldEmbedder

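It is mainly used to convert the `TextField` in an `Instance` into word vectors.

**First**, the `allennlp.data.fields.TextField` field of the `Instance` is converted into an `allennlp.data.DataArray` object.

**Second**, when we create a `TextField` we pass in a `Dict[str, allennlp.data.TokenIndexer]`, so that the tokens can be indexed in several different ways. For example: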


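```python
from allennlp.data import Instance
from allennlp.data.fields import TextField, LabelField
from allennlp.data.token_indexers import SingleIdTokenIndexer, TokenCharactersIndexer
from allennlp.data.tokenizers import Token

# Each key names one way of indexing the same tokens
token_indexers = {
    "words": SingleIdTokenIndexer(),
    "characters": TokenCharactersIndexer()
}

tokens = [Token(word) for word in ["I", "love", "you"]]
instance = Instance({
    "text": TextField(tokens, token_indexers),
    "label": LabelField("happy")
})
```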

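The tokens in `text` are now indexed in two different ways.

Finally, a `Dict[str, TokenEmbedder]` turns the indices into word vectors. Note ⚠️: the keys of the `TokenEmbedder`s must correspond to the keys of the `token_indexers`, so that the indices produced by a given `TokenIndexer` are mapped to embeddings by the matching `TokenEmbedder`.

Last of all, when there are several token_indexers/token_embedders, the resulting embeddings are automatically concatenated.
Take the following configuration as an example: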

```json
"dataset_reader": {
    "type": "ner_ontonotes",
    "label_namespace": "ontonotes_ner_labels",
    "coding_scheme": "BIOUL",
    "lazy": false,
    "token_indexers": {
        "tokens": {
            "type": "single_id",
            "lowercase_tokens": true
        },
        "token_characters": {
            "type": "characters"
        },
        "elmo": {
            "type": "elmo_characters"
        }
    }
},
"model": {
    "type": "ner",
    "text_field_embedder": {
        "token_embedders": {
            "tokens": {
                "type": "embedding",
                "pretrained_file": "./data/glove/glove.6B.100d.txt.gz",
                "embedding_dim": 100,
                "trainable": true
            },
            "elmo": {
                "type": "elmo_token_embedder",
                "options_file": "./data/elmo/2x4096_512_2048cnn_2xhighway_options.json",
                "weight_file": "./data/elmo/2x4096_512_2048cnn_2xhighway_weights.hdf5",
                "do_layer_norm": false,
                "dropout": 0,
                "requires_grad": false
            },
            "token_characters": {
                "type": "character_encoding",
                "embedding": {
                    "embedding_dim": 16
                },
                "encoder": {
                    "type": "cnn",
                    "embedding_dim": 16,
                    "num_filters": 64,
                    "ngram_filter_sizes": [
                        3
                    ]
                },
                "dropout": 0.1
            }
        }
    },
    "ner": {
        "encoder": {
            "type": "lstm",
            "bidirectional": true,
            "input_size": 1188,
            "hidden_size": 64,
            "num_layers": 2,
            "dropout": 0.2
        },
        "tagger": {
            "label_namespace": "ontonotes_ner_labels",
            "constraint_type": "BIOUL",
            "dropout": 0.2
        }
    }
},
```

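*Note: the keys under `token_indexers` must match the keys under `token_embedders`.*

There are three kinds of embedding here. They are all concatenated into a single input embedding, so the model obtains features from several views of each token.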
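
The concatenation explains the `"input_size": 1188` of the encoder in the configuration. Here is a small check of that arithmetic. The 100 and 64 come straight from the config; the 1024 for ELMo is my assumption about the output size of that particular pretrained model.

```python
indexer_keys = {"tokens", "token_characters", "elmo"}

# Output size of each token embedder, keyed like the config.
embedder_dims = {
    "tokens": 100,            # "embedding_dim": 100
    "elmo": 1024,             # assumption: output size of this ELMo model
    "token_characters": 64,   # num_filters (64) * number of ngram filter sizes (1)
}

# Keys must line up between indexers and embedders.
keys_match = indexer_keys == set(embedder_dims)

# Concatenation adds the sizes together.
total_dim = sum(embedder_dims.values())
```

If an embedder is added or removed, the encoder's `input_size` has to change accordingly.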

# Modules

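`modules` here does not mean PyTorch's `Module` class. It refers to AllenNLP's collection of ready-made modules that we can use inside our models.

For example: several kinds of attention, several `Seq2SeqEncoder`s, several `TokenEmbedder`s, `ConditionalRandomField`, and so on. They are very **useful** for reproducing models from papers and for researching algorithms.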

In [18]:
from allennlp.modules.text_field_embedders import TextFieldEmbedder, BasicTextFieldEmbedder
from allennlp.modules.token_embedders import Embedding
from allennlp.modules.seq2seq_encoders import Seq2SeqEncoder, PytorchSeq2SeqWrapper
from allennlp.nn.util import get_text_field_mask, sequence_cross_entropy_with_logits

# Iterators

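An iterator turns the `Instance`s read by the `DatasetReader` into batches and feeds them to the `Model`.

There are many kinds of iterators, and the way the data is batched can have a considerable effect on training.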
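
To make the idea of bucketing concrete, here is a simplified plain-Python sketch: sort sequences by length, cut them into batches, and pad each batch to its own longest sequence. It is not the `BucketIterator` implementation, and `bucket_batches` is a hypothetical helper.

```python
from typing import List

def bucket_batches(sequences: List[List[int]], batch_size: int) -> List[List[List[int]]]:
    """Sort by length, cut into batches, and pad each batch with 0."""
    ordered = sorted(sequences, key=len)
    batches = []
    for start in range(0, len(ordered), batch_size):
        chunk = ordered[start:start + batch_size]
        width = max(len(seq) for seq in chunk)
        batches.append([seq + [0] * (width - len(seq)) for seq in chunk])
    return batches

data = [[5, 6, 7, 8, 9], [1, 2], [3, 4, 5], [7]]
batches = bucket_batches(data, batch_size=2)
```

Grouping sequences of similar length keeps the amount of padding per batch small, which is the main reason bucketing can speed up training.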

In [19]:
from allennlp.data.iterators import BucketIterator

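# Training

AllenNLP wraps the training process very well: a single `train` method gives you functionality similar to Keras.

My favorite features are:

- Automatic logging, so you can inspect different parameters with TensorBoard
- Automatic saving of the best checkpoint
- Nicely formatted training output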
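
Tracking the best checkpoint usually goes hand in hand with early stopping, which is what the `patience` argument passed to `Trainer` later in this README controls. The sketch below shows the general idea in plain Python. It is a simplified illustration, not AllenNLP's exact rule, and the loss values are invented.

```python
from typing import List, Tuple

def run_with_patience(val_losses: List[float], patience: int) -> Tuple[int, int]:
    """Return (best_epoch, last_epoch_run) for a list of per-epoch validation losses."""
    best_epoch, best_loss = -1, float("inf")
    last_epoch = -1
    for epoch, loss in enumerate(val_losses):
        last_epoch = epoch
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss   # a trainer would save a checkpoint here
        elif epoch - best_epoch >= patience:
            break                                  # no improvement for `patience` epochs
    return best_epoch, last_epoch

# Hypothetical validation losses: improvement stops after epoch 2.
best_epoch, last_epoch = run_with_patience([1.0, 0.8, 0.7, 0.75, 0.72, 0.9, 0.6], patience=3)
```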

In [20]:
from allennlp.training.metrics import CategoricalAccuracy
from allennlp.training.trainer import Trainer
from allennlp.predictors import SentenceTaggerPredictor
torch.manual_seed(1)

<torch._C.Generator at 0x12735f350>

# Now for the code

In [23]:
class PosDatasetReader(DatasetReader):
    """
    Reads a data file in the following format:

    The###DET dog###NN ate###V the###DET apple###NN

    """
    def __init__(self, token_indexers: Dict[str, TokenIndexer] = None) -> None:
        super().__init__(lazy=False)
        # token_indexers map tokens to indices
        # If none are given, each word is mapped to a single unique id by default
        self.token_indexers = token_indexers or {"tokens": SingleIdTokenIndexer()}

    def text_to_instance(self, tokens: List[Token], tags: List[str] = None) -> Instance:

        # Only a TextField takes token_indexers;
        # LabelField and SequenceLabelField do not
        sentence_field = TextField(tokens, self.token_indexers)

        # The fields are eventually wrapped in an Instance.
        # The text data (TextField) is stored under the "sentence" key,
        # so the model's forward() must have a `sentence` parameter.
        fields = {"sentence": sentence_field}
        # Why can tags be None?
        # Answer: tags are passed in during training, but not at prediction time.
        if tags:
            label_field = SequenceLabelField(labels=tags, sequence_field=sentence_field)
            fields["labels"] = label_field

        return Instance(fields)

    def _read(self, file_path: str) -> Iterator[Instance]:
        with open(file_path) as f:
            for line in f:
                pairs = line.strip().split()
                if not pairs:
                    # Skip blank lines, which would otherwise break the unpacking below
                    continue
                sentence, tags = zip(*(pair.split("###") for pair in pairs))
                yield self.text_to_instance([Token(word) for word in sentence], tags)
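
The parsing step inside `_read` can be tried on its own, without AllenNLP. This is a minimal sketch of how one `word###TAG` line splits into parallel lists; `parse_line` is a hypothetical helper written for this illustration.

```python
from typing import List, Tuple

def parse_line(line: str) -> Tuple[List[str], List[str]]:
    """Split 'word###TAG' pairs into parallel word and tag lists."""
    pairs = [pair.split("###") for pair in line.strip().split()]
    words = [word for word, _ in pairs]
    tags = [tag for _, tag in pairs]
    return words, tags

words, tags = parse_line("The###DET dog###NN ate###V the###DET apple###NN\n")
```

The two lists have the same length, which is exactly what `SequenceLabelField` needs: one label per token of the `TextField` it is attached to.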

In [24]:
class LstmTagger(Model):
    """
    The top-level model, which combines several modules
    """
    def __init__(self,
                 word_embeddings: TextFieldEmbedder,
                 encoder: Seq2SeqEncoder,
                 vocab: Vocabulary) -> None:
        super().__init__(vocab)

        # Converts token indices into embeddings
        self.word_embeddings = word_embeddings

        # Derived from Seq2SeqEncoder; have a look at the various encoders under allennlp.modules.seq2seq_encoders.
        # The API docs on the website are terse; reading the source teaches you much more.
        self.encoder = encoder

        # A pleasant thing about allennlp: most modules under allennlp.modules expose a `get_output_dim()` method,
        # which reduces coupling between components and simplifies configuration
        self.hidden2tag = torch.nn.Linear(in_features=encoder.get_output_dim(),
                                          out_features=vocab.get_vocab_size('labels'))

        # This is neither a loss nor an optimizer; it does not affect the learning rate or the gradients.
        # It only reports scores during training, such as accuracy, F1 score, or recall
        self.accuracy = CategoricalAccuracy()

    def forward(self,
                sentence: Dict[str, torch.Tensor],
                labels: torch.Tensor = None) -> Dict[str, torch.Tensor]:

        # This is very important: the texts in a batch have different lengths and are padded,
        # so we need a mask marking the real tokens so that padding is ignored by the encoder, the metric, and the loss
        mask = get_text_field_mask(sentence)

        # Map the sentence data to word vectors.
        # Note that `sentence` has type Dict[str, torch.Tensor].
        # For example, if the token_indexers contain two indexers named "words" and "characters",
        # then `sentence` has those two keys as well, and it is handed to the text field embedder
        # (which also contains "words" and "characters" TokenEmbedders) to be mapped to word vectors.
        embeddings = self.word_embeddings(sentence)

        # seq2seq_encoder
        encoder_out = self.encoder(embeddings, mask)

        # shape: (batch_size, sequence_length, label_size)
        tag_logits = self.hidden2tag(encoder_out)

        output = {"tag_logits": tag_logits}
        if labels is not None:
            self.accuracy(tag_logits, labels, mask)
            output["loss"] = sequence_cross_entropy_with_logits(tag_logits, labels, mask)

        return output

    def get_metrics(self, reset: bool = False) -> Dict[str, float]:
        return {"accuracy": self.accuracy.get_metric(reset)}
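
To see what the mask does to the loss, here is a plain-Python sketch for a single padded sequence. It assumes padding uses id 0 and averages cross-entropy over the unmasked positions only. This is a simplification for illustration and not the exact averaging used by `sequence_cross_entropy_with_logits`; `masked_cross_entropy` is a hypothetical helper.

```python
import math
from typing import List

def masked_cross_entropy(logits: List[List[float]], labels: List[int], mask: List[int]) -> float:
    """Average cross-entropy over positions where mask == 1 (one sequence)."""
    total, count = 0.0, 0
    for scores, label, keep in zip(logits, labels, mask):
        if not keep:
            continue  # padding position: contributes nothing
        log_norm = math.log(sum(math.exp(s) for s in scores))
        total += log_norm - scores[label]
        count += 1
    return total / count if count else 0.0

token_ids = [4, 9, 0, 0]                     # 0 is the padding id in this sketch
mask = [int(t != 0) for t in token_ids]      # [1, 1, 0, 0]
logits = [[2.0, 0.0], [0.0, 2.0], [5.0, -5.0], [5.0, -5.0]]
labels = [0, 1, 0, 0]

loss = masked_cross_entropy(logits, labels, mask)
unmasked = masked_cross_entropy(logits, labels, [1, 1, 1, 1])
```

Without the mask, the two trivially easy padding positions pull the average down, so the reported loss would no longer reflect how well the real tokens are tagged.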

In [28]:
reader = PosDatasetReader()
train_dataset = reader.read(cached_path('./data/train.txt'))
validation_dataset = reader.read(cached_path('./data/validation.txt'))

# Build the Vocabulary manually
vocab = Vocabulary.from_instances(train_dataset + validation_dataset)

# Custom hyperparameters
EMBEDDING_DIM = 6
HIDDEN_DIM = 6
token_embedding = Embedding(num_embeddings=vocab.get_vocab_size('tokens'),
                            embedding_dim=EMBEDDING_DIM)

# "tokens" must match the key used in the DatasetReader's token_indexers
word_embeddings = BasicTextFieldEmbedder({"tokens": token_embedding})

lstm = PytorchSeq2SeqWrapper(torch.nn.LSTM(EMBEDDING_DIM, HIDDEN_DIM, batch_first=True))
model = LstmTagger(word_embeddings, lstm, vocab)
if torch.cuda.is_available():
    cuda_device = 0
    model = model.cuda(cuda_device)
else:
    cuda_device = -1

2it [00:00, 2217.45it/s]
2it [00:00, 7489.83it/s]
100%|██████████| 4/4 [00:00<00:00, 27060.03it/s]
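
The code above asks the vocabulary for the size of two namespaces, `'tokens'` and `'labels'`. As far as I understand, AllenNLP reserves a padding and an unknown entry in token namespaces but not in namespaces whose names end in `labels` or `tags`. The toy sketch below illustrates that difference. It is not the `Vocabulary` implementation, and `build_namespace` and its first-seen ordering are my own simplifications.

```python
from typing import Dict, List

PADDING, UNKNOWN = "@@PADDING@@", "@@UNKNOWN@@"

def build_namespace(items: List[str], padded: bool) -> Dict[str, int]:
    """Toy vocabulary namespace; padded namespaces reserve ids 0 and 1."""
    index: Dict[str, int] = {PADDING: 0, UNKNOWN: 1} if padded else {}
    for item in items:
        index.setdefault(item, len(index))
    return index

words = ["The", "dog", "ate", "the", "apple"]
tags = ["DET", "NN", "V", "DET", "NN"]

tokens_ns = build_namespace(words, padded=True)
labels_ns = build_namespace(tags, padded=False)
```

Under this assumption the embedding table needs two more rows than there are distinct words, while the output layer has exactly one unit per tag.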


In [29]:
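# Custom optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Custom batching strategy
iterator = BucketIterator(batch_size=2, sorting_keys=[("sentence", "num_tokens")])

# Think back: token_indexers map tokens to indices, but how can they do that without a vocabulary?
# This is where the iterator is given the vocab.
# When the iterator assembles a batch, it calls the token_indexers and passes the vocab along;
# only then do the token_indexers get to see the vocab.
iterator.index_with(vocab)

# This is quite similar to Keras's compile function
trainer = Trainer(model=model,
                  optimizer=optimizer,
                  iterator=iterator,
                  train_dataset=train_dataset,
                  validation_dataset=validation_dataset,
                  patience=10,
                  # NOTE: the recorded output below reports 'training_epochs': 999,
                  # so it appears to come from a run with num_epochs=1000
                  num_epochs=100,
                  cuda_device=cuda_device)
trainer.train()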

accuracy: 0.3333, loss: 1.0939 ||: 100%|██████████| 1/1 [00:00<00:00, 64.83it/s]
accuracy: 0.3333, loss: 1.0910 ||: 100%|██████████| 1/1 [00:00<00:00, 372.99it/s]
accuracy: 0.3333, loss: 1.0910 ||: 100%|██████████| 1/1 [00:00<00:00, 123.25it/s]
accuracy: 0.3333, loss: 1.0884 ||: 100%|██████████| 1/1 [00:00<00:00, 528.58it/s]
accuracy: 0.3333, loss: 1.0884 ||: 100%|██████████| 1/1 [00:00<00:00, 163.00it/s]
accuracy: 0.3333, loss: 1.0860 ||: 100%|██████████| 1/1 [00:00<00:00, 427.38it/s]
accuracy: 0.3333, loss: 1.0860 ||: 100%|██████████| 1/1 [00:00<00:00, 200.11it/s]
accuracy: 0.3333, loss: 1.0838 ||: 100%|██████████| 1/1 [00:00<00:00, 248.33it/s]
accuracy: 0.3333, loss: 1.0838 ||: 100%|██████████| 1/1 [00:00<00:00, 191.90it/s]
accuracy: 0.3333, loss: 1.0818 ||: 100%|██████████| 1/1 [00:00<00:00, 372.89it/s]
accuracy: 0.3333, loss: 1.0818 ||: 100%|██████████| 1/1 [00:00<00:00, 150.90it/s]
accuracy: 0.3333, loss: 1.0800 ||: 100%|██████████| 1/1 [00:00<00:00, 379.92it/s]
accuracy: 0.3333,

accuracy: 0.4444, loss: 1.0558 ||: 100%|██████████| 1/1 [00:00<00:00, 506.07it/s]
accuracy: 0.4444, loss: 1.0558 ||: 100%|██████████| 1/1 [00:00<00:00, 204.91it/s]
accuracy: 0.4444, loss: 1.0556 ||: 100%|██████████| 1/1 [00:00<00:00, 570.73it/s]
accuracy: 0.4444, loss: 1.0556 ||: 100%|██████████| 1/1 [00:00<00:00, 163.29it/s]
accuracy: 0.4444, loss: 1.0554 ||: 100%|██████████| 1/1 [00:00<00:00, 505.34it/s]
accuracy: 0.4444, loss: 1.0554 ||: 100%|██████████| 1/1 [00:00<00:00, 154.63it/s]
accuracy: 0.4444, loss: 1.0552 ||: 100%|██████████| 1/1 [00:00<00:00, 424.83it/s]
accuracy: 0.4444, loss: 1.0552 ||: 100%|██████████| 1/1 [00:00<00:00, 194.18it/s]
accuracy: 0.4444, loss: 1.0550 ||: 100%|██████████| 1/1 [00:00<00:00, 294.11it/s]
accuracy: 0.4444, loss: 1.0550 ||: 100%|██████████| 1/1 [00:00<00:00, 163.13it/s]
accuracy: 0.4444, loss: 1.0548 ||: 100%|██████████| 1/1 [00:00<00:00, 325.22it/s]
accuracy: 0.4444, loss: 1.0548 ||: 100%|██████████| 1/1 [00:00<00:00, 150.94it/s]
accuracy: 0.4444

accuracy: 0.4444, loss: 1.0457 ||: 100%|██████████| 1/1 [00:00<00:00, 164.42it/s]
accuracy: 0.4444, loss: 1.0455 ||: 100%|██████████| 1/1 [00:00<00:00, 517.56it/s]
accuracy: 0.4444, loss: 1.0455 ||: 100%|██████████| 1/1 [00:00<00:00, 184.81it/s]
accuracy: 0.4444, loss: 1.0453 ||: 100%|██████████| 1/1 [00:00<00:00, 509.95it/s]
accuracy: 0.4444, loss: 1.0453 ||: 100%|██████████| 1/1 [00:00<00:00, 224.21it/s]
accuracy: 0.4444, loss: 1.0450 ||: 100%|██████████| 1/1 [00:00<00:00, 441.55it/s]
accuracy: 0.4444, loss: 1.0450 ||: 100%|██████████| 1/1 [00:00<00:00, 210.26it/s]
accuracy: 0.4444, loss: 1.0448 ||: 100%|██████████| 1/1 [00:00<00:00, 504.49it/s]
accuracy: 0.4444, loss: 1.0448 ||: 100%|██████████| 1/1 [00:00<00:00, 217.45it/s]
accuracy: 0.4444, loss: 1.0445 ||: 100%|██████████| 1/1 [00:00<00:00, 521.68it/s]
accuracy: 0.4444, loss: 1.0445 ||: 100%|██████████| 1/1 [00:00<00:00, 220.01it/s]
accuracy: 0.4444, loss: 1.0443 ||: 100%|██████████| 1/1 [00:00<00:00, 504.55it/s]
accuracy: 0.4444

accuracy: 0.4444, loss: 1.0310 ||: 100%|██████████| 1/1 [00:00<00:00, 522.46it/s]
accuracy: 0.4444, loss: 1.0310 ||: 100%|██████████| 1/1 [00:00<00:00, 191.31it/s]
accuracy: 0.4444, loss: 1.0306 ||: 100%|██████████| 1/1 [00:00<00:00, 522.72it/s]
accuracy: 0.4444, loss: 1.0306 ||: 100%|██████████| 1/1 [00:00<00:00, 184.60it/s]
accuracy: 0.4444, loss: 1.0302 ||: 100%|██████████| 1/1 [00:00<00:00, 460.15it/s]
accuracy: 0.4444, loss: 1.0302 ||: 100%|██████████| 1/1 [00:00<00:00, 202.38it/s]
accuracy: 0.4444, loss: 1.0298 ||: 100%|██████████| 1/1 [00:00<00:00, 356.66it/s]
accuracy: 0.4444, loss: 1.0298 ||: 100%|██████████| 1/1 [00:00<00:00, 218.73it/s]
accuracy: 0.4444, loss: 1.0294 ||: 100%|██████████| 1/1 [00:00<00:00, 305.15it/s]
accuracy: 0.4444, loss: 1.0294 ||: 100%|██████████| 1/1 [00:00<00:00, 206.88it/s]
accuracy: 0.4444, loss: 1.0290 ||: 100%|██████████| 1/1 [00:00<00:00, 474.09it/s]
accuracy: 0.4444, loss: 1.0290 ||: 100%|██████████| 1/1 [00:00<00:00, 156.49it/s]
accuracy: 0.4444

accuracy: 0.4444, loss: 1.0052 ||: 100%|██████████| 1/1 [00:00<00:00, 178.32it/s]
accuracy: 0.4444, loss: 1.0045 ||: 100%|██████████| 1/1 [00:00<00:00, 447.63it/s]
accuracy: 0.4444, loss: 1.0045 ||: 100%|██████████| 1/1 [00:00<00:00, 164.31it/s]
accuracy: 0.4444, loss: 1.0037 ||: 100%|██████████| 1/1 [00:00<00:00, 373.56it/s]
accuracy: 0.4444, loss: 1.0037 ||: 100%|██████████| 1/1 [00:00<00:00, 177.84it/s]
accuracy: 0.4444, loss: 1.0030 ||: 100%|██████████| 1/1 [00:00<00:00, 552.10it/s]
accuracy: 0.4444, loss: 1.0030 ||: 100%|██████████| 1/1 [00:00<00:00, 179.70it/s]
accuracy: 0.4444, loss: 1.0022 ||: 100%|██████████| 1/1 [00:00<00:00, 471.01it/s]
accuracy: 0.4444, loss: 1.0022 ||: 100%|██████████| 1/1 [00:00<00:00, 207.13it/s]
accuracy: 0.4444, loss: 1.0015 ||: 100%|██████████| 1/1 [00:00<00:00, 373.42it/s]
accuracy: 0.4444, loss: 1.0015 ||: 100%|██████████| 1/1 [00:00<00:00, 188.92it/s]
accuracy: 0.4444, loss: 1.0007 ||: 100%|██████████| 1/1 [00:00<00:00, 524.62it/s]
accuracy: 0.4444

accuracy: 0.5556, loss: 0.9522 ||: 100%|██████████| 1/1 [00:00<00:00, 445.40it/s]
accuracy: 0.5556, loss: 0.9522 ||: 100%|██████████| 1/1 [00:00<00:00, 153.50it/s]
accuracy: 0.5556, loss: 0.9507 ||: 100%|██████████| 1/1 [00:00<00:00, 468.17it/s]
accuracy: 0.5556, loss: 0.9507 ||: 100%|██████████| 1/1 [00:00<00:00, 154.99it/s]
accuracy: 0.5556, loss: 0.9492 ||: 100%|██████████| 1/1 [00:00<00:00, 543.73it/s]
accuracy: 0.5556, loss: 0.9492 ||: 100%|██████████| 1/1 [00:00<00:00, 169.58it/s]
accuracy: 0.6667, loss: 0.9476 ||: 100%|██████████| 1/1 [00:00<00:00, 485.90it/s]
accuracy: 0.6667, loss: 0.9476 ||: 100%|██████████| 1/1 [00:00<00:00, 183.55it/s]
accuracy: 0.6667, loss: 0.9461 ||: 100%|██████████| 1/1 [00:00<00:00, 488.05it/s]
accuracy: 0.6667, loss: 0.9461 ||: 100%|██████████| 1/1 [00:00<00:00, 208.11it/s]
accuracy: 0.6667, loss: 0.9444 ||: 100%|██████████| 1/1 [00:00<00:00, 317.49it/s]
accuracy: 0.6667, loss: 0.9444 ||: 100%|██████████| 1/1 [00:00<00:00, 192.49it/s]
accuracy: 0.6667

accuracy: 0.6667, loss: 0.8472 ||: 100%|██████████| 1/1 [00:00<00:00, 200.36it/s]
accuracy: 0.6667, loss: 0.8443 ||: 100%|██████████| 1/1 [00:00<00:00, 479.84it/s]
accuracy: 0.6667, loss: 0.8443 ||: 100%|██████████| 1/1 [00:00<00:00, 183.14it/s]
accuracy: 0.6667, loss: 0.8414 ||: 100%|██████████| 1/1 [00:00<00:00, 349.15it/s]
accuracy: 0.6667, loss: 0.8414 ||: 100%|██████████| 1/1 [00:00<00:00, 128.11it/s]
accuracy: 0.6667, loss: 0.8385 ||: 100%|██████████| 1/1 [00:00<00:00, 528.92it/s]
accuracy: 0.6667, loss: 0.8385 ||: 100%|██████████| 1/1 [00:00<00:00, 195.89it/s]
accuracy: 0.6667, loss: 0.8355 ||: 100%|██████████| 1/1 [00:00<00:00, 457.10it/s]
accuracy: 0.6667, loss: 0.8355 ||: 100%|██████████| 1/1 [00:00<00:00, 208.82it/s]
accuracy: 0.6667, loss: 0.8325 ||: 100%|██████████| 1/1 [00:00<00:00, 526.92it/s]
accuracy: 0.6667, loss: 0.8325 ||: 100%|██████████| 1/1 [00:00<00:00, 224.63it/s]
accuracy: 0.6667, loss: 0.8295 ||: 100%|██████████| 1/1 [00:00<00:00, 437.82it/s]
accuracy: 0.6667

accuracy: 0.7778, loss: 0.6709 ||: 100%|██████████| 1/1 [00:00<00:00, 466.86it/s]
accuracy: 0.7778, loss: 0.6709 ||: 100%|██████████| 1/1 [00:00<00:00, 215.70it/s]
accuracy: 0.7778, loss: 0.6669 ||: 100%|██████████| 1/1 [00:00<00:00, 434.60it/s]
accuracy: 0.7778, loss: 0.6669 ||: 100%|██████████| 1/1 [00:00<00:00, 198.02it/s]
accuracy: 0.7778, loss: 0.6628 ||: 100%|██████████| 1/1 [00:00<00:00, 347.35it/s]
accuracy: 0.7778, loss: 0.6628 ||: 100%|██████████| 1/1 [00:00<00:00, 190.41it/s]
accuracy: 0.7778, loss: 0.6587 ||: 100%|██████████| 1/1 [00:00<00:00, 481.50it/s]
accuracy: 0.7778, loss: 0.6587 ||: 100%|██████████| 1/1 [00:00<00:00, 155.13it/s]
accuracy: 0.7778, loss: 0.6546 ||: 100%|██████████| 1/1 [00:00<00:00, 457.79it/s]
accuracy: 0.7778, loss: 0.6546 ||: 100%|██████████| 1/1 [00:00<00:00, 160.00it/s]
accuracy: 0.7778, loss: 0.6506 ||: 100%|██████████| 1/1 [00:00<00:00, 546.13it/s]
accuracy: 0.7778, loss: 0.6506 ||: 100%|██████████| 1/1 [00:00<00:00, 193.46it/s]
accuracy: 0.7778

accuracy: 0.8889, loss: 0.4703 ||: 100%|██████████| 1/1 [00:00<00:00, 205.01it/s]
accuracy: 0.8889, loss: 0.4663 ||: 100%|██████████| 1/1 [00:00<00:00, 405.68it/s]
accuracy: 0.8889, loss: 0.4663 ||: 100%|██████████| 1/1 [00:00<00:00, 196.58it/s]
accuracy: 0.8889, loss: 0.4623 ||: 100%|██████████| 1/1 [00:00<00:00, 419.51it/s]
accuracy: 0.8889, loss: 0.4623 ||: 100%|██████████| 1/1 [00:00<00:00, 179.80it/s]
accuracy: 0.8889, loss: 0.4584 ||: 100%|██████████| 1/1 [00:00<00:00, 498.25it/s]
accuracy: 0.8889, loss: 0.4584 ||: 100%|██████████| 1/1 [00:00<00:00, 141.51it/s]
accuracy: 0.8889, loss: 0.4544 ||: 100%|██████████| 1/1 [00:00<00:00, 612.40it/s]
accuracy: 0.8889, loss: 0.4544 ||: 100%|██████████| 1/1 [00:00<00:00, 193.49it/s]
accuracy: 0.8889, loss: 0.4504 ||: 100%|██████████| 1/1 [00:00<00:00, 483.33it/s]
accuracy: 0.8889, loss: 0.4504 ||: 100%|██████████| 1/1 [00:00<00:00, 141.54it/s]
accuracy: 0.8889, loss: 0.4464 ||: 100%|██████████| 1/1 [00:00<00:00, 297.98it/s]
accuracy: 0.8889

accuracy: 1.0000, loss: 0.2642 ||: 100%|██████████| 1/1 [00:00<00:00, 619.54it/s]
accuracy: 1.0000, loss: 0.2642 ||: 100%|██████████| 1/1 [00:00<00:00, 192.67it/s]
accuracy: 1.0000, loss: 0.2607 ||: 100%|██████████| 1/1 [00:00<00:00, 427.55it/s]
accuracy: 1.0000, loss: 0.2607 ||: 100%|██████████| 1/1 [00:00<00:00, 204.46it/s]
accuracy: 1.0000, loss: 0.2572 ||: 100%|██████████| 1/1 [00:00<00:00, 351.72it/s]
accuracy: 1.0000, loss: 0.2572 ||: 100%|██████████| 1/1 [00:00<00:00, 222.07it/s]
accuracy: 1.0000, loss: 0.2538 ||: 100%|██████████| 1/1 [00:00<00:00, 321.95it/s]
accuracy: 1.0000, loss: 0.2538 ||: 100%|██████████| 1/1 [00:00<00:00, 199.12it/s]
accuracy: 1.0000, loss: 0.2505 ||: 100%|██████████| 1/1 [00:00<00:00, 364.56it/s]
accuracy: 1.0000, loss: 0.2505 ||: 100%|██████████| 1/1 [00:00<00:00, 195.61it/s]
accuracy: 1.0000, loss: 0.2472 ||: 100%|██████████| 1/1 [00:00<00:00, 498.73it/s]
accuracy: 1.0000, loss: 0.2472 ||: 100%|██████████| 1/1 [00:00<00:00, 149.48it/s]
accuracy: 1.0000

accuracy: 1.0000, loss: 0.1509 ||: 100%|██████████| 1/1 [00:00<00:00, 172.17it/s]
accuracy: 1.0000, loss: 0.1495 ||: 100%|██████████| 1/1 [00:00<00:00, 496.84it/s]
accuracy: 1.0000, loss: 0.1495 ||: 100%|██████████| 1/1 [00:00<00:00, 173.61it/s]
accuracy: 1.0000, loss: 0.1480 ||: 100%|██████████| 1/1 [00:00<00:00, 310.07it/s]
accuracy: 1.0000, loss: 0.1480 ||: 100%|██████████| 1/1 [00:00<00:00, 166.47it/s]
accuracy: 1.0000, loss: 0.1466 ||: 100%|██████████| 1/1 [00:00<00:00, 428.08it/s]
accuracy: 1.0000, loss: 0.1466 ||: 100%|██████████| 1/1 [00:00<00:00, 169.09it/s]
accuracy: 1.0000, loss: 0.1452 ||: 100%|██████████| 1/1 [00:00<00:00, 448.40it/s]
accuracy: 1.0000, loss: 0.1452 ||: 100%|██████████| 1/1 [00:00<00:00, 139.55it/s]
accuracy: 1.0000, loss: 0.1439 ||: 100%|██████████| 1/1 [00:00<00:00, 449.84it/s]
accuracy: 1.0000, loss: 0.1439 ||: 100%|██████████| 1/1 [00:00<00:00, 211.37it/s]
accuracy: 1.0000, loss: 0.1425 ||: 100%|██████████| 1/1 [00:00<00:00, 549.14it/s]
accuracy: 1.0000

accuracy: 1.0000, loss: 0.0987 ||: 100%|██████████| 1/1 [00:00<00:00, 426.08it/s]
accuracy: 1.0000, loss: 0.0987 ||: 100%|██████████| 1/1 [00:00<00:00, 214.44it/s]
accuracy: 1.0000, loss: 0.0980 ||: 100%|██████████| 1/1 [00:00<00:00, 449.65it/s]
accuracy: 1.0000, loss: 0.0980 ||: 100%|██████████| 1/1 [00:00<00:00, 217.81it/s]
accuracy: 1.0000, loss: 0.0972 ||: 100%|██████████| 1/1 [00:00<00:00, 280.74it/s]
accuracy: 1.0000, loss: 0.0972 ||: 100%|██████████| 1/1 [00:00<00:00, 208.86it/s]
accuracy: 1.0000, loss: 0.0965 ||: 100%|██████████| 1/1 [00:00<00:00, 468.01it/s]
accuracy: 1.0000, loss: 0.0965 ||: 100%|██████████| 1/1 [00:00<00:00, 184.07it/s]
accuracy: 1.0000, loss: 0.0958 ||: 100%|██████████| 1/1 [00:00<00:00, 430.27it/s]
accuracy: 1.0000, loss: 0.0958 ||: 100%|██████████| 1/1 [00:00<00:00, 159.57it/s]
accuracy: 1.0000, loss: 0.0951 ||: 100%|██████████| 1/1 [00:00<00:00, 521.68it/s]
accuracy: 1.0000, loss: 0.0951 ||: 100%|██████████| 1/1 [00:00<00:00, 160.20it/s]
accuracy: 1.0000

accuracy: 1.0000, loss: 0.0714 ||: 100%|██████████| 1/1 [00:00<00:00, 145.41it/s]
accuracy: 1.0000, loss: 0.0710 ||: 100%|██████████| 1/1 [00:00<00:00, 470.11it/s]
accuracy: 1.0000, loss: 0.0710 ||: 100%|██████████| 1/1 [00:00<00:00, 187.72it/s]
accuracy: 1.0000, loss: 0.0706 ||: 100%|██████████| 1/1 [00:00<00:00, 607.87it/s]
accuracy: 1.0000, loss: 0.0706 ||: 100%|██████████| 1/1 [00:00<00:00, 209.25it/s]
accuracy: 1.0000, loss: 0.0702 ||: 100%|██████████| 1/1 [00:00<00:00, 278.10it/s]
accuracy: 1.0000, loss: 0.0702 ||: 100%|██████████| 1/1 [00:00<00:00, 172.83it/s]
accuracy: 1.0000, loss: 0.0698 ||: 100%|██████████| 1/1 [00:00<00:00, 420.82it/s]
accuracy: 1.0000, loss: 0.0698 ||: 100%|██████████| 1/1 [00:00<00:00, 146.41it/s]
accuracy: 1.0000, loss: 0.0694 ||: 100%|██████████| 1/1 [00:00<00:00, 529.92it/s]
accuracy: 1.0000, loss: 0.0694 ||: 100%|██████████| 1/1 [00:00<00:00, 175.69it/s]
accuracy: 1.0000, loss: 0.0690 ||: 100%|██████████| 1/1 [00:00<00:00, 368.08it/s]
accuracy: 1.0000

accuracy: 1.0000, loss: 0.0547 ||: 100%|██████████| 1/1 [00:00<00:00, 380.37it/s]
accuracy: 1.0000, loss: 0.0547 ||: 100%|██████████| 1/1 [00:00<00:00, 162.63it/s]
accuracy: 1.0000, loss: 0.0545 ||: 100%|██████████| 1/1 [00:00<00:00, 380.06it/s]
accuracy: 1.0000, loss: 0.0545 ||: 100%|██████████| 1/1 [00:00<00:00, 123.16it/s]
accuracy: 1.0000, loss: 0.0542 ||: 100%|██████████| 1/1 [00:00<00:00, 364.69it/s]
accuracy: 1.0000, loss: 0.0542 ||: 100%|██████████| 1/1 [00:00<00:00, 184.77it/s]
accuracy: 1.0000, loss: 0.0539 ||: 100%|██████████| 1/1 [00:00<00:00, 438.41it/s]
accuracy: 1.0000, loss: 0.0539 ||: 100%|██████████| 1/1 [00:00<00:00, 189.11it/s]
accuracy: 1.0000, loss: 0.0537 ||: 100%|██████████| 1/1 [00:00<00:00, 390.31it/s]
accuracy: 1.0000, loss: 0.0537 ||: 100%|██████████| 1/1 [00:00<00:00, 156.20it/s]
accuracy: 1.0000, loss: 0.0534 ||: 100%|██████████| 1/1 [00:00<00:00, 506.37it/s]
accuracy: 1.0000, loss: 0.0534 ||: 100%|██████████| 1/1 [00:00<00:00, 145.45it/s]
accuracy: 1.0000

accuracy: 1.0000, loss: 0.0440 ||: 100%|██████████| 1/1 [00:00<00:00, 179.69it/s]
accuracy: 1.0000, loss: 0.0439 ||: 100%|██████████| 1/1 [00:00<00:00, 504.79it/s]
accuracy: 1.0000, loss: 0.0439 ||: 100%|██████████| 1/1 [00:00<00:00, 190.66it/s]
accuracy: 1.0000, loss: 0.0437 ||: 100%|██████████| 1/1 [00:00<00:00, 568.49it/s]
accuracy: 1.0000, loss: 0.0437 ||: 100%|██████████| 1/1 [00:00<00:00, 198.85it/s]
accuracy: 1.0000, loss: 0.0435 ||: 100%|██████████| 1/1 [00:00<00:00, 474.36it/s]
accuracy: 1.0000, loss: 0.0435 ||: 100%|██████████| 1/1 [00:00<00:00, 215.47it/s]
accuracy: 1.0000, loss: 0.0433 ||: 100%|██████████| 1/1 [00:00<00:00, 455.61it/s]
accuracy: 1.0000, loss: 0.0433 ||: 100%|██████████| 1/1 [00:00<00:00, 208.59it/s]
accuracy: 1.0000, loss: 0.0432 ||: 100%|██████████| 1/1 [00:00<00:00, 297.95it/s]
accuracy: 1.0000, loss: 0.0432 ||: 100%|██████████| 1/1 [00:00<00:00, 206.19it/s]
accuracy: 1.0000, loss: 0.0430 ||: 100%|██████████| 1/1 [00:00<00:00, 347.58it/s]
accuracy: 1.0000

accuracy: 1.0000, loss: 0.0365 ||: 100%|██████████| 1/1 [00:00<00:00, 471.48it/s]
accuracy: 1.0000, loss: 0.0365 ||: 100%|██████████| 1/1 [00:00<00:00, 199.44it/s]
accuracy: 1.0000, loss: 0.0363 ||: 100%|██████████| 1/1 [00:00<00:00, 452.07it/s]
accuracy: 1.0000, loss: 0.0363 ||: 100%|██████████| 1/1 [00:00<00:00, 199.33it/s]
accuracy: 1.0000, loss: 0.0362 ||: 100%|██████████| 1/1 [00:00<00:00, 400.18it/s]
accuracy: 1.0000, loss: 0.0362 ||: 100%|██████████| 1/1 [00:00<00:00, 184.23it/s]
accuracy: 1.0000, loss: 0.0361 ||: 100%|██████████| 1/1 [00:00<00:00, 411.81it/s]
accuracy: 1.0000, loss: 0.0361 ||: 100%|██████████| 1/1 [00:00<00:00, 143.00it/s]
accuracy: 1.0000, loss: 0.0360 ||: 100%|██████████| 1/1 [00:00<00:00, 444.45it/s]
accuracy: 1.0000, loss: 0.0360 ||: 100%|██████████| 1/1 [00:00<00:00, 162.26it/s]
accuracy: 1.0000, loss: 0.0358 ||: 100%|██████████| 1/1 [00:00<00:00, 376.91it/s]
accuracy: 1.0000, loss: 0.0358 ||: 100%|██████████| 1/1 [00:00<00:00, 179.54it/s]
accuracy: 1.0000

accuracy: 1.0000, loss: 0.0310 ||: 100%|██████████| 1/1 [00:00<00:00, 195.92it/s]
accuracy: 1.0000, loss: 0.0309 ||: 100%|██████████| 1/1 [00:00<00:00, 378.75it/s]
accuracy: 1.0000, loss: 0.0309 ||: 100%|██████████| 1/1 [00:00<00:00, 216.08it/s]
accuracy: 1.0000, loss: 0.0308 ||: 100%|██████████| 1/1 [00:00<00:00, 492.64it/s]
accuracy: 1.0000, loss: 0.0308 ||: 100%|██████████| 1/1 [00:00<00:00, 240.39it/s]
accuracy: 1.0000, loss: 0.0308 ||: 100%|██████████| 1/1 [00:00<00:00, 516.54it/s]
accuracy: 1.0000, loss: 0.0308 ||: 100%|██████████| 1/1 [00:00<00:00, 214.14it/s]
accuracy: 1.0000, loss: 0.0307 ||: 100%|██████████| 1/1 [00:00<00:00, 449.69it/s]
accuracy: 1.0000, loss: 0.0307 ||: 100%|██████████| 1/1 [00:00<00:00, 219.64it/s]
accuracy: 1.0000, loss: 0.0306 ||: 100%|██████████| 1/1 [00:00<00:00, 300.73it/s]
accuracy: 1.0000, loss: 0.0306 ||: 100%|██████████| 1/1 [00:00<00:00, 200.52it/s]
accuracy: 1.0000, loss: 0.0305 ||: 100%|██████████| 1/1 [00:00<00:00, 467.33it/s]
accuracy: 1.0000

accuracy: 1.0000, loss: 0.0268 ||: 100%|██████████| 1/1 [00:00<00:00, 301.84it/s]
accuracy: 1.0000, loss: 0.0268 ||: 100%|██████████| 1/1 [00:00<00:00, 167.32it/s]
accuracy: 1.0000, loss: 0.0268 ||: 100%|██████████| 1/1 [00:00<00:00, 456.60it/s]
accuracy: 1.0000, loss: 0.0268 ||: 100%|██████████| 1/1 [00:00<00:00, 168.22it/s]
accuracy: 1.0000, loss: 0.0267 ||: 100%|██████████| 1/1 [00:00<00:00, 514.70it/s]
accuracy: 1.0000, loss: 0.0267 ||: 100%|██████████| 1/1 [00:00<00:00, 190.11it/s]
accuracy: 1.0000, loss: 0.0266 ||: 100%|██████████| 1/1 [00:00<00:00, 325.42it/s]
accuracy: 1.0000, loss: 0.0266 ||: 100%|██████████| 1/1 [00:00<00:00, 183.61it/s]
accuracy: 1.0000, loss: 0.0266 ||: 100%|██████████| 1/1 [00:00<00:00, 262.75it/s]
accuracy: 1.0000, loss: 0.0266 ||: 100%|██████████| 1/1 [00:00<00:00, 156.69it/s]
accuracy: 1.0000, loss: 0.0265 ||: 100%|██████████| 1/1 [00:00<00:00, 503.03it/s]
accuracy: 1.0000, loss: 0.0265 ||: 100%|██████████| 1/1 [00:00<00:00, 163.39it/s]
accuracy: 1.0000

accuracy: 1.0000, loss: 0.0236 ||: 100%|██████████| 1/1 [00:00<00:00, 184.85it/s]
accuracy: 1.0000, loss: 0.0236 ||: 100%|██████████| 1/1 [00:00<00:00, 624.99it/s]
accuracy: 1.0000, loss: 0.0236 ||: 100%|██████████| 1/1 [00:00<00:00, 188.71it/s]
accuracy: 1.0000, loss: 0.0235 ||: 100%|██████████| 1/1 [00:00<00:00, 459.35it/s]
accuracy: 1.0000, loss: 0.0235 ||: 100%|██████████| 1/1 [00:00<00:00, 136.36it/s]
accuracy: 1.0000, loss: 0.0235 ||: 100%|██████████| 1/1 [00:00<00:00, 377.08it/s]
accuracy: 1.0000, loss: 0.0235 ||: 100%|██████████| 1/1 [00:00<00:00, 179.27it/s]
accuracy: 1.0000, loss: 0.0234 ||: 100%|██████████| 1/1 [00:00<00:00, 267.51it/s]
accuracy: 1.0000, loss: 0.0234 ||: 100%|██████████| 1/1 [00:00<00:00, 187.72it/s]
accuracy: 1.0000, loss: 0.0233 ||: 100%|██████████| 1/1 [00:00<00:00, 266.59it/s]
accuracy: 1.0000, loss: 0.0233 ||: 100%|██████████| 1/1 [00:00<00:00, 132.80it/s]
accuracy: 1.0000, loss: 0.0233 ||: 100%|██████████| 1/1 [00:00<00:00, 443.47it/s]
accuracy: 1.0000

accuracy: 1.0000, loss: 0.0210 ||: 100%|██████████| 1/1 [00:00<00:00, 396.55it/s]
accuracy: 1.0000, loss: 0.0210 ||: 100%|██████████| 1/1 [00:00<00:00, 149.64it/s]
accuracy: 1.0000, loss: 0.0210 ||: 100%|██████████| 1/1 [00:00<00:00, 313.01it/s]
accuracy: 1.0000, loss: 0.0210 ||: 100%|██████████| 1/1 [00:00<00:00, 173.20it/s]
accuracy: 1.0000, loss: 0.0209 ||: 100%|██████████| 1/1 [00:00<00:00, 241.15it/s]
accuracy: 1.0000, loss: 0.0209 ||: 100%|██████████| 1/1 [00:00<00:00, 131.36it/s]
accuracy: 1.0000, loss: 0.0209 ||: 100%|██████████| 1/1 [00:00<00:00, 586.12it/s]
accuracy: 1.0000, loss: 0.0209 ||: 100%|██████████| 1/1 [00:00<00:00, 148.17it/s]
accuracy: 1.0000, loss: 0.0208 ||: 100%|██████████| 1/1 [00:00<00:00, 484.78it/s]
accuracy: 1.0000, loss: 0.0208 ||: 100%|██████████| 1/1 [00:00<00:00, 192.30it/s]
accuracy: 1.0000, loss: 0.0208 ||: 100%|██████████| 1/1 [00:00<00:00, 355.75it/s]
accuracy: 1.0000, loss: 0.0208 ||: 100%|██████████| 1/1 [00:00<00:00, 158.75it/s]
accuracy: 1.0000

accuracy: 1.0000, loss: 0.0189 ||: 100%|██████████| 1/1 [00:00<00:00, 176.28it/s]
accuracy: 1.0000, loss: 0.0189 ||: 100%|██████████| 1/1 [00:00<00:00, 537.59it/s]
accuracy: 1.0000, loss: 0.0189 ||: 100%|██████████| 1/1 [00:00<00:00, 187.13it/s]
accuracy: 1.0000, loss: 0.0188 ||: 100%|██████████| 1/1 [00:00<00:00, 325.82it/s]
accuracy: 1.0000, loss: 0.0188 ||: 100%|██████████| 1/1 [00:00<00:00, 152.88it/s]
accuracy: 1.0000, loss: 0.0188 ||: 100%|██████████| 1/1 [00:00<00:00, 288.94it/s]
accuracy: 1.0000, loss: 0.0188 ||: 100%|██████████| 1/1 [00:00<00:00, 149.52it/s]
accuracy: 1.0000, loss: 0.0188 ||: 100%|██████████| 1/1 [00:00<00:00, 307.21it/s]
accuracy: 1.0000, loss: 0.0188 ||: 100%|██████████| 1/1 [00:00<00:00, 162.76it/s]
accuracy: 1.0000, loss: 0.0187 ||: 100%|██████████| 1/1 [00:00<00:00, 541.13it/s]
accuracy: 1.0000, loss: 0.0187 ||: 100%|██████████| 1/1 [00:00<00:00, 167.84it/s]
accuracy: 1.0000, loss: 0.0187 ||: 100%|██████████| 1/1 [00:00<00:00, 365.52it/s]
accuracy: 1.0000

{'best_epoch': 999,
 'peak_cpu_memory_MB': 207.331328,
 'training_duration': '0:00:19.899799',
 'training_start_epoch': 0,
 'training_epochs': 999,
 'epoch': 999,
 'training_accuracy': 1.0,
 'training_loss': 0.01857762038707733,
 'training_cpu_memory_MB': 207.331328,
 'validation_accuracy': 1.0,
 'validation_loss': 0.018540192395448685,
 'best_validation_accuracy': 1.0,
 'best_validation_loss': 0.018540192395448685}