### Comparing the Number of Embedding Parameters in ALBERT and BERT

In [2]:
from transformers import BertModel, AlbertModel

In [3]:
bert = BertModel.from_pretrained("bert-base-uncased")
albert = AlbertModel.from_pretrained("albert-base-v2")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Some weights of the model checkpoint at albert-base-v2 were not used when initializing AlbertModel: ['predictions.LayerNorm.bias', 'predictions.bias', 'predictions.LayerNorm.weight', 'predictions.dense.weight', 'predictions.decoder.weight', 'predictions.decoder.bias', 'predictions.dense.bias']
- This IS expected if you are initializing AlbertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing AlbertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [4]:
def num_model_param(m):
    """Return the total number of parameters (scalar elements) in module m."""
    return sum(p.numel() for p in m.parameters())

In [6]:
bert_embedding = num_model_param(bert.embeddings)
print('number of BERT Embedding Parameters: {}'.format(bert_embedding))

number of BERT Embedding Parameters: 23837184


In [7]:
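# ALBERT factorizes the embedding: a small embedding table (size 128) followed by a
# linear projection up to the hidden size, so both parts are counted here.
albert_embedding = num_model_param(albert.embeddings) + num_model_param(albert.encoder.embedding_hidden_mapping_in)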
print('number of ALBERT Embedding Parameters: {}'.format(albert_embedding))

number of ALBERT Embedding Parameters: 4005120


In [9]:
100 * (albert_embedding / bert_embedding)

16.801984663960308
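
The two counts above can be reproduced by hand from the model configurations, which makes it clearer where the saving comes from. The following is a minimal sketch, assuming the published config values for `bert-base-uncased` (vocabulary 30522, hidden size 768) and `albert-base-v2` (vocabulary 30000, embedding size 128, hidden size 768), with 512 positions and 2 token types in both. The helper names `embedding_params` and `projection_params` are hypothetical and are not part of `transformers`.

```python
def embedding_params(vocab_size, embed_dim, max_positions=512, type_vocab=2):
    """Word, position and token-type tables, plus the LayerNorm weight and bias."""
    tables = (vocab_size + max_positions + type_vocab) * embed_dim
    layer_norm = 2 * embed_dim
    return tables + layer_norm


def projection_params(in_dim, out_dim):
    """Weight matrix and bias of a single linear layer."""
    return in_dim * out_dim + out_dim


# Assumed config values (see the note above); BERT embeds directly at the hidden size.
bert_embedding_expected = embedding_params(30522, 768)
# ALBERT embeds at size 128, then projects 128 -> 768.
albert_embedding_expected = embedding_params(30000, 128) + projection_params(128, 768)

print(bert_embedding_expected, albert_embedding_expected)
```

Under these assumptions the formulas give the same totals as the measured counts, and almost all of the reduction comes from shrinking the word embedding table from width 768 to width 128.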

### Comparing the Number of Encoder Parameters in ALBERT and BERT

In [10]:
bert_encoder = num_model_param(bert.encoder)
print('number of BERT Encoder Parameters: {}'.format(bert_encoder))

number of BERT Encoder Parameters: 85054464


In [11]:
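# Note: albert.encoder contains embedding_hidden_mapping_in, so that projection
# is counted here as well as in the embedding total above.
albert_encoder = num_model_param(albert.encoder)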
print('number of ALBERT Encoder Parameters: {}'.format(albert_encoder))

number of ALBERT Encoder Parameters: 7186944


In [12]:
100 * (albert_encoder / bert_encoder)

8.44981399212627
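
The encoder ratio follows from ALBERT sharing one Transformer layer across all 12 positions, while BERT keeps 12 separate layers. The sketch below recomputes the encoder counts from the layer shapes, assuming hidden size 768, intermediate size 3072 and 12 layers for both base models, and an embedding size of 128 for ALBERT. The helpers `linear` and `layer_params` are hypothetical names used only for illustration.

```python
def linear(in_dim, out_dim):
    """Weight matrix and bias of a single linear layer."""
    return in_dim * out_dim + out_dim


def layer_params(hidden=768, intermediate=3072):
    """Parameters in one Transformer encoder layer."""
    attention = 4 * linear(hidden, hidden)  # query, key, value, output projection
    feed_forward = linear(hidden, intermediate) + linear(intermediate, hidden)
    layer_norms = 2 * (2 * hidden)  # two LayerNorms, each with weight and bias
    return attention + feed_forward + layer_norms


one_layer = layer_params()
bert_encoder_expected = 12 * one_layer  # 12 independent layers
albert_encoder_expected = one_layer + linear(128, 768)  # 1 shared layer + projection

print(bert_encoder_expected, albert_encoder_expected)
print(100 * one_layer / bert_encoder_expected)
```

With these assumptions the shared layer alone is exactly one twelfth of the BERT encoder (about 8.33%). The measured ratio is slightly higher because `albert.encoder` also holds the 128-to-768 projection.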