## 11.3.5 Fine-Tuning Pretrained Networks

### 01 - Load the pretrained model with a task-specific head

In [1]:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Tokenizer plus BERT encoder with a newly initialized 2-class classification head
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
model = TFAutoModelForSequenceClassification.from_pretrained('bert-base-cased',
                                                             num_labels=2)

All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


#### Inspect the model

In [2]:
model.summary()

Model: "tf_bert_for_sequence_classification"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
bert (TFBertMainLayer)       multiple                  108310272 
_________________________________________________________________
dropout_37 (Dropout)         multiple                  0         
_________________________________________________________________
classifier (Dense)           multiple                  1538      
Total params: 108,311,810
Trainable params: 108,311,810
Non-trainable params: 0
_________________________________________________________________


### 02 - Run a forward pass

In [3]:
text = "It's impossible to watch this movie for longer than 5 minutes"
inputs = tokenizer(text, return_tensors='tf')
output = model(**inputs)
output

TFSequenceClassifierOutput(loss=None, logits=<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[-0.19170247, -0.01770693]], dtype=float32)>, hidden_states=None, attentions=None)

#### Apply softmax to the output

In [4]:
import tensorflow as tf

predictions = tf.math.softmax(output['logits'], axis=-1)
predictions

<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[0.45661053, 0.5433895 ]], dtype=float32)>
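The softmax turns the two logits into probabilities that sum to 1. The minimal pure-Python sketch below stands in for `tf.math.softmax` (it mirrors the formula, not TensorFlow's implementation) and reproduces the probabilities above from the logits printed for In [3]:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)                          # subtract the max to avoid overflow in exp()
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits of the untrained classification head (output of In [3])
logits = [-0.19170247, -0.01770693]
probs = softmax(logits)
print(probs)
```

Because the classification head is still untrained, both classes come out close to 50/50; the model has to be fine-tuned before its predictions mean anything.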

### 03 - Freeze parts of the network

#### Freeze the entire encoder

In [5]:
model.get_layer(index=0).trainable = False
model.summary()

Model: "tf_bert_for_sequence_classification"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
bert (TFBertMainLayer)       multiple                  108310272 
_________________________________________________________________
dropout_37 (Dropout)         multiple                  0         
_________________________________________________________________
classifier (Dense)           multiple                  1538      
Total params: 108,311,810
Trainable params: 1,538
Non-trainable params: 108,310,272
_________________________________________________________________


#### Check the encoder layers

In [6]:
main_model = model.get_layer(index=0)
for idx, layer in enumerate(main_model.encoder.layer):
    print(idx, layer)

0 <transformers.models.bert.modeling_tf_bert.TFBertLayer object at 0x00000155F4D5D588>
1 <transformers.models.bert.modeling_tf_bert.TFBertLayer object at 0x00000155F4D41EC8>
2 <transformers.models.bert.modeling_tf_bert.TFBertLayer object at 0x00000155F4DAC408>
3 <transformers.models.bert.modeling_tf_bert.TFBertLayer object at 0x00000155F4DBF988>
4 <transformers.models.bert.modeling_tf_bert.TFBertLayer object at 0x00000155F4DD7EC8>
5 <transformers.models.bert.modeling_tf_bert.TFBertLayer object at 0x00000155F4DF03C8>
6 <transformers.models.bert.modeling_tf_bert.TFBertLayer object at 0x00000155F4E06A08>
7 <transformers.models.bert.modeling_tf_bert.TFBertLayer object at 0x00000155F4E200C8>
8 <transformers.models.bert.modeling_tf_bert.TFBertLayer object at 0x00000155F4E36708>
9 <transformers.models.bert.modeling_tf_bert.TFBertLayer object at 0x00000155F4E4BC88>
10 <transformers.models.bert.modeling_tf_bert.TFBertLayer object at 0x00000155F4E653C8>
11 <transformers.models.bert.modeling_tf_b

#### Freeze the first nine encoder layers (only the last three stay trainable)

In [7]:
main_model.trainable = True                  # make the whole encoder trainable again
for idx, layer in enumerate(main_model.encoder.layer):
    if idx < 9:                              # freeze encoder layers 0-8
        layer.trainable = False
model.summary()

Model: "tf_bert_for_sequence_classification"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
bert (TFBertMainLayer)       multiple                  108310272 
_________________________________________________________________
dropout_37 (Dropout)         multiple                  0         
_________________________________________________________________
classifier (Dense)           multiple                  1538      
Total params: 108,311,810
Trainable params: 44,520,962
Non-trainable params: 63,790,848
_________________________________________________________________
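The parameter counts in the summaries can be reproduced by hand. The sketch below assumes the standard `bert-base-cased` configuration (vocabulary of 28,996 tokens, hidden size 768, intermediate size 3,072, 512 positions, 2 token types, 12 encoder layers); these values are not printed in the notebook, but with them the arithmetic matches the reported totals exactly. It shows that freezing encoder layers 0 to 8 leaves the embeddings, the last three encoder layers, the pooler and the classifier trainable.

```python
# Assumed bert-base-cased configuration (not printed in the notebook)
VOCAB, HIDDEN, INTERMEDIATE, MAX_POS, TYPE_VOCAB, N_LAYERS = 28996, 768, 3072, 512, 2, 12
NUM_LABELS = 2

def dense(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

layer_norm = 2 * HIDDEN  # gamma and beta

# Word, position and token-type embeddings plus one LayerNorm
embeddings = (VOCAB + MAX_POS + TYPE_VOCAB) * HIDDEN + layer_norm

# One encoder layer: Q, K, V and attention-output projections,
# the two feed-forward projections and two LayerNorms
encoder_layer = (4 * dense(HIDDEN, HIDDEN)
                 + dense(HIDDEN, INTERMEDIATE) + dense(INTERMEDIATE, HIDDEN)
                 + 2 * layer_norm)

pooler = dense(HIDDEN, HIDDEN)
bert_main = embeddings + N_LAYERS * encoder_layer + pooler  # "bert (TFBertMainLayer)"
classifier = dense(HIDDEN, NUM_LABELS)                      # "classifier (Dense)"
total = bert_main + classifier

# Freezing encoder layers 0-8 removes exactly nine encoder layers from training
frozen = 9 * encoder_layer
trainable = total - frozen

print(f"bert: {bert_main:,}  classifier: {classifier:,}  total: {total:,}")
print(f"trainable: {trainable:,}  non-trainable: {frozen:,}")
```

Freezing the whole encoder instead leaves only the classifier's 768 × 2 weights plus 2 biases trainable, which is the 1,538 reported after In [5].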


### 04 - Fine-tune the model

#### Load the IMDB review data

In [8]:
import pandas as pd
from os.path import join
from sklearn.model_selection import train_test_split

path = join('..', 'Data')          # relative data folder, works on Windows and Linux
file = 'IMDB dataset.csv'

df = pd.read_csv(join(path, file))

# Reviews as inputs; sentiment encoded as 1 (positive) / 0 (negative)
X = df['review'].values
y = (df['sentiment'] == 'positive').astype(int).values

# Hold out 10 % of the reviews for validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.1, random_state=11)
df.head(), X_train.shape, y_train.shape, X_test.shape, y_test.shape

(                                              review sentiment
 0  One of the other reviewers has mentioned that ...  positive
 1  A wonderful little production. <br /><br />The...  positive
 2  I thought this was a wonderful way to spend ti...  positive
 3  Basically there's a family where a little boy ...  negative
 4  Petter Mattei's "Love in the Time of Money" is...  positive,
 (45000,),
 (45000,),
 (5000,),
 (5000,))

#### Tokenize the input texts (X)

In [9]:
# Pad or truncate every review to 50 tokens and return TensorFlow tensors
X_train_tok = dict(tokenizer(X_train.tolist(), padding='max_length',
                             truncation=True, max_length=50, return_tensors='tf'))
X_test_tok = dict(tokenizer(X_test.tolist(), padding='max_length',
                            truncation=True, max_length=50, return_tensors='tf'))
X_train_tok['input_ids'].shape, X_test_tok['input_ids'].shape

(TensorShape([45000, 50]), TensorShape([5000, 50]))
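With `padding='max_length'`, `truncation=True` and `max_length=50`, every review ends up with exactly 50 token positions, which is why both tensors have 50 columns. The simplified sketch below illustrates the idea on plain lists of token IDs. The helper `pad_or_truncate` is hypothetical, the special-token IDs (101 for `[CLS]`, 102 for `[SEP]`, 0 for `[PAD]`) are assumed from the BERT vocabulary rather than read from the tokenizer, and sub-word splitting is not modelled at all.

```python
def pad_or_truncate(token_ids, max_length, cls_id=101, sep_id=102, pad_id=0):
    """Hypothetical helper: add [CLS]/[SEP], cut to max_length, then pad.

    Returns (input_ids, attention_mask), both of length max_length.
    """
    body = token_ids[:max_length - 2]           # leave room for [CLS] and [SEP]
    ids = [cls_id] + body + [sep_id]
    mask = [1] * len(ids)                       # 1 = real token
    n_pad = max_length - len(ids)
    # 0 in the mask marks padding, which attention ignores
    return ids + [pad_id] * n_pad, mask + [0] * n_pad

short_review = [1135, 1110, 1363]               # made-up token IDs
long_review = list(range(1000, 1200))           # 200 made-up token IDs

ids_s, mask_s = pad_or_truncate(short_review, max_length=8)
ids_l, mask_l = pad_or_truncate(long_review, max_length=8)
print(ids_s, mask_s)
print(ids_l, mask_l)
```

Short reviews are filled up with padding and masked out; long reviews simply lose everything after the cut-off, so with `max_length=50` the model only sees the beginning of most IMDB reviews.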

#### Compile the model

In [10]:
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy

model.compile(
    optimizer=Adam(learning_rate=5e-5),
    loss=SparseCategoricalCrossentropy(from_logits=True),  # the model outputs raw logits
    metrics=["accuracy"]
)
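`SparseCategoricalCrossentropy(from_logits=True)` takes integer class labels and raw logits, applies the softmax internally and returns the negative log-probability of the correct class. A pure-Python sketch of that formula for a single example (using log-sum-exp for numerical stability; this mirrors the math, not Keras' implementation), with the logits from In [3]:

```python
import math

def sparse_ce_from_logits(logits, label):
    """Cross-entropy for one example with an integer label and raw logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]           # = -log(softmax(logits)[label])

logits = [-0.19170247, -0.01770693]              # untrained head, output of In [3]
loss_neg = sparse_ce_from_logits(logits, 0)      # if the review is labelled negative (0)
loss_pos = sparse_ce_from_logits(logits, 1)      # if the review is labelled positive (1)
print(round(loss_neg, 4), round(loss_pos, 4))
```

A classifier that always answers 50/50 has a loss of ln 2 ≈ 0.693, a useful baseline when reading the `loss` column of the training log.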

#### Freeze the entire encoder again

In [11]:
# Keras recommends calling compile() again after changing `trainable`
model.get_layer(index=0).trainable = False
model.summary()

Model: "tf_bert_for_sequence_classification"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
bert (TFBertMainLayer)       multiple                  108310272 
_________________________________________________________________
dropout_37 (Dropout)         multiple                  0         
_________________________________________________________________
classifier (Dense)           multiple                  1538      
Total params: 108,311,810
Trainable params: 1,538
Non-trainable params: 108,310,272
_________________________________________________________________


#### Start training
**Note:** The training run is only sketched here; the log below shows just the first batches.

In [12]:
model.fit(X_train_tok,
          y_train,
          epochs=1,
          batch_size=16,
          validation_data=(X_test_tok, y_test))

Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
  96/2813 [>.............................] - ETA: 3:45:32 - loss: 0.6287 - accuracy: 0.6191
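The progress bar counts batches, not examples: with 45,000 training reviews and `batch_size=16`, one epoch takes 2,813 steps, and the last batch holds only the reviews that are left over. The arithmetic:

```python
import math

n_train, batch_size = 45_000, 16
steps_per_epoch = math.ceil(n_train / batch_size)   # a partial last batch still counts as a step
last_batch = n_train - (steps_per_epoch - 1) * batch_size
print(steps_per_epoch, last_batch)
```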

#### Save and load the fine-tuned model

In [13]:
# Save
model.save_pretrained('bert_sentiment_trained')

# Load
model = TFAutoModelForSequenceClassification.from_pretrained(
                                         'bert_sentiment_trained')

Some layers from the model checkpoint at bert_sentiment_trained were not used when initializing TFBertForSequenceClassification: ['dropout_37']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at bert_sentiment_trained.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


#### Make predictions

In [14]:
import tensorflow as tf

text = "It's impossible to watch this movie for longer than 5 minutes"
inputs = tokenizer(text, return_tensors='tf')
output = model(**inputs)
predictions = tf.math.softmax(output['logits'], axis=-1)
predictions

<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[0.72925985, 0.27074018]], dtype=float32)>

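Interpretation: class 0 (negative): 72.9%, class 1 (positive): 27.1%. After fine-tuning, the model rates this review as clearly negative, whereas the untrained head in step 02 was close to 50/50.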
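To turn the probabilities into a label, take the index of the largest value and map it back using the encoding chosen when the data was loaded (1 = positive, 0 = negative). A minimal sketch with the probabilities from In [14]; the `LABELS` dictionary is written out here for illustration:

```python
# Label encoding used when loading the data: 'positive' -> 1, everything else -> 0
LABELS = {0: 'negative', 1: 'positive'}

probs = [0.72925985, 0.27074018]                        # softmax output of In [14]
pred = max(range(len(probs)), key=probs.__getitem__)    # index of the largest probability
print(LABELS[pred], f"{probs[pred]:.1%}")
```

For a whole batch of TensorFlow tensors, `tf.argmax(predictions, axis=-1)` performs the same argmax row by row.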