Wrong transition in CRF when doing a sequence labeling task #32

Open
SefaZeng opened this issue Oct 25, 2018 · 4 comments

@SefaZeng commented Oct 25, 2018

I use ChainCRF.py as the CRF layer in my model for a sequence labeling task with OBIE tags, but I've run into a problem: there are unexpected transitions in the predictions, such as E to I, which never appear in the training data.
The Keras version is 2.2.2 and TensorFlow is 1.10.0.
The code:

from keras.preprocessing import text, sequence
from keras.layers import *
from keras.models import *
from keras.callbacks import EarlyStopping, ModelCheckpoint
from ChainCRF import ChainCRF
from keras import backend as K
import numpy as np

def Bilstm_CNN_Crf(maxlen, nb_words, class_label_count, embedding_weights=None, is_train=True):
    word_input = Input(shape=(maxlen,), dtype='int32', name='word_input')
    word_emb = Embedding(nb_words + 1, output_dim=100,
                         input_length=maxlen,
                         embeddings_initializer='uniform',
                         name='word_emb')(word_input)
    # bilstm
    bilstm = Bidirectional(LSTM(64, return_sequences=True))(word_emb)
    bilstm_d = Dropout(0.1)(bilstm)

    # cnn (Keras 2 API: filters/kernel_size instead of nb_filter/filter_length)
    half_window_size = 2
    padding_layer = ZeroPadding1D(padding=half_window_size)(word_emb)
    conv = Conv1D(filters=50, kernel_size=2 * half_window_size + 1,
                  padding='valid')(padding_layer)
    conv_d = Dropout(0.1)(conv)
    dense_conv = TimeDistributed(Dense(50))(conv_d)

    # merge
    rnn_cnn_merge = concatenate([bilstm_d, dense_conv])
    dense = TimeDistributed(Dense(class_label_count))(rnn_cnn_merge)

    # crf
    crf = ChainCRF(name='CRF_Layer')
    crf_output = crf(dense)

    # build model
    model = Model(inputs=[word_input], outputs=[crf_output])

    model.compile(loss=crf.loss, optimizer='adam', metrics=['accuracy'])

    # model.summary()

    return model

model = Bilstm_CNN_Crf(maxlen, nb_words, 5)
earlystop = EarlyStopping(monitor='val_acc', patience=2, verbose=1)
checkpoint = ModelCheckpoint('best_model.hdf5', monitor='val_acc', verbose=1,
                             save_best_only=True, period=1, save_weights_only=True)
# note: validation here reuses the training data
model.fit(x_train_1, y, epochs=epochs, batch_size=64, verbose=1,
          validation_data=(x_train_1, y), callbacks=[earlystop, checkpoint])
model.load_weights('best_model.hdf5')
pred_prob = model.predict(x_train_1)
pred = np.argmax(pred_prob, axis=2)

Is there something wrong with the model? Or is there some bad case in the data that I didn't find? Any help is appreciated! Thanks!

@nreimers (Member)

Hi @SefaZeng
This issue also happens with my code: invalid transitions (e.g. O → I-PER) are produced by the BiLSTM-CRF model.

The issue is sadly not trivial and I don't know how to fix it.

The CRF is initialized with random probabilities for the transitions, i.e. O → I-PER can initially be as likely as O → B-PER. Out of the box, the CRF knows nothing about the encoding scheme or about which transitions are allowed.

During training, these transition probabilities are updated, so the CRF learns that O → I-PER is unlikely. However, it converges rather slowly towards a probability of 0. This makes sense: how should the CRF distinguish between 'O → I-PER is not possible at all' and 'it is rare, but I haven't seen enough data'?

With more epochs, the number of invalid tags usually converges to a low number, or in my experiments even to zero.
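
One way to watch this convergence is to inspect the learned transition scores after training. A rough sketch, assuming the ChainCRF layer stores its transition matrix U as its first weight (verify the ordering against your copy of ChainCRF.py; the index-to-tag mapping below is illustrative):

import numpy as np

# illustrative index-to-tag mapping; replace with your own tagset
idx2tag = {0: 'O', 1: 'B', 2: 'I', 3: 'E'}

crf_layer = model.get_layer('CRF_Layer')
U = crf_layer.get_weights()[0]  # assumed: (n_tags, n_tags) transition scores

# Transitions that never occur in the data (e.g. E -> I) should get
# increasingly negative scores with more epochs, but rarely reach -inf.
for i, prev_tag in idx2tag.items():
    for j, next_tag in idx2tag.items():
        print('%s -> %s: %+.3f' % (prev_tag, next_tag, U[i, j]))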

As a solution, I use a post-processing step: the code checks whether the tags from the CRF form a valid BIO encoding. If it finds an invalid tag, it sets that tag to O.
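
A minimal sketch of such a repair for BIO tags (the helper name ensure_valid_bio is illustrative, not from this repository):

def ensure_valid_bio(tags):
    # Replace tags that violate the BIO scheme with 'O'.
    # An I-X tag is only valid if the previous (repaired) tag is
    # B-X or I-X of the same type X.
    fixed = []
    prev = 'O'
    for tag in tags:
        if tag.startswith('I-') and prev not in ('B-' + tag[2:], 'I-' + tag[2:]):
            fixed.append('O')  # invalid continuation, reset to O
        else:
            fixed.append(tag)
        prev = fixed[-1]
    return fixed

# the invalid O -> I-PER transition gets repaired:
print(ensure_valid_bio(['O', 'I-PER', 'B-PER', 'I-PER']))
# ['O', 'O', 'B-PER', 'I-PER']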

@SefaZeng (Author)

Can I set the initial states to zero to avoid this problem?

@nreimers (Member)

@SefaZeng I think that could work; however, you would need to make sure you get the mapping right. Especially when the number of tags changes (e.g. you add B-LOC and I-LOC to your tagset), you must ensure that you put the zeros in the right places; otherwise it can easily happen that B-LOC → I-LOC is initialized with zero probability. A way to avoid this index bookkeeping is sketched below.

Further, the CRF is bidirectional, i.e. not only the previous label matters; the next label also influences which label is produced. This can make it rather complicated to initialize the CRF correctly.
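
If you do try a constrained initialization, deriving the disallowed pairs from the tag names instead of hard-coding indices avoids the mapping pitfall. A sketch for BIO tags, assuming a log-space transition matrix where a large negative score effectively forbids a transition (the function names and the -10000.0 constant are illustrative):

import numpy as np

def allowed_transition(prev_tag, next_tag):
    # True if prev_tag -> next_tag is valid under the BIO scheme:
    # I-X may only follow B-X or I-X of the same type X.
    if next_tag.startswith('I-'):
        return prev_tag in ('B-' + next_tag[2:], 'I-' + next_tag[2:])
    return True

def transition_init(tags, forbidden_score=-10000.0):
    # Initial transition scores: 0 for allowed pairs, a large
    # negative score (a log-space "zero probability") for invalid ones.
    n = len(tags)
    U = np.zeros((n, n), dtype='float32')
    for i, prev_tag in enumerate(tags):
        for j, next_tag in enumerate(tags):
            if not allowed_transition(prev_tag, next_tag):
                U[i, j] = forbidden_score
    return U

tags = ['O', 'B-PER', 'I-PER', 'B-LOC', 'I-LOC']
print(transition_init(tags))  # adding B-LOC/I-LOC later still maps correctly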

@SefaZeng (Author)

@nreimers Hmm.. I set the initializers of U, b_start, b_end and the initial state in viterbi_decode to zeros, but it doesn't work. Maybe post-processing is the only way.
But I am still confused about why this happens. From a statistical point of view, if the invalid transitions never appear in the data, the corresponding probabilities (or the weights in the network) should end up very low or even zero.
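
An alternative to post-processing is to mask invalid transitions at decode time, so that Viterbi can never select them regardless of the learned weights. A small self-contained sketch (the tag set and the randomly generated scores are illustrative, not taken from ChainCRF):

import numpy as np

def viterbi_decode(emissions, transitions):
    # Standard Viterbi over per-token emission scores (T x n) and
    # pairwise transition scores (n x n), both in log-space.
    T, n = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        # score of reaching tag j at step t via tag i at step t-1
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

tags = ['O', 'B', 'I', 'E']
rng = np.random.RandomState(0)
emissions = rng.randn(6, len(tags))
transitions = rng.randn(len(tags), len(tags))
# forbid the transitions that should never occur, e.g. E -> I and O -> I
for prev, nxt in [('E', 'I'), ('O', 'I')]:
    transitions[tags.index(prev), tags.index(nxt)] = -1e4
print([tags[i] for i in viterbi_decode(emissions, transitions)])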
