GPU usage of the model: is it possible to increase the max length of the text to 256 or 512? #3
Comments
Yes, of course.
Thanks for your reply!
The F1 increases slowly after 30 epochs of training the joint model, using 30,000 training samples and 100 dev samples. Would it be better to change the learning rates of BERT and the relation-extraction network?
2. Would it be better to change the learning rates of BERT and the relation-extraction network?
When training the network, the authors of the original paper used more epochs, and of course also more data.
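In case it helps, here is a minimal sketch of how two different learning rates could be tried for the BERT encoder and the relation-extraction head. The sub-models, learning-rate values, and loss below are placeholders, not this repository's actual training code: the idea is simply to keep one optimizer per variable group and apply each one inside the same gradient-tape step.

```python
import tensorflow as tf

# Hypothetical stand-ins for the BERT-style encoder and the relation-extraction head.
bert_encoder = tf.keras.Sequential([tf.keras.layers.Dense(768, activation="relu")])
relation_head = tf.keras.Sequential([tf.keras.layers.Dense(50)])

# Smaller learning rate for the pretrained encoder, larger for the freshly
# initialised head; the exact values here are only placeholders.
bert_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5)
head_optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def train_step(inputs, labels):
    with tf.GradientTape() as tape:
        features = bert_encoder(inputs)
        logits = relation_head(features)
        loss = loss_fn(labels, logits)
    # Compute gradients over both variable groups at once, then let each
    # optimizer update only its own part of the model.
    variables = bert_encoder.trainable_variables + relation_head.trainable_variables
    grads = tape.gradient(loss, variables)
    n_bert = len(bert_encoder.trainable_variables)
    bert_optimizer.apply_gradients(zip(grads[:n_bert], bert_encoder.trainable_variables))
    head_optimizer.apply_gradients(zip(grads[n_bert:], relation_head.trainable_variables))
    return loss
```

A single optimizer with per-layer learning-rate multipliers would have the same effect, but two optimizers keeps the encoder/head split explicit.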
Hello, thanks for your reply.
· The model uses 3,000 training samples and is finally validated on the first 100 samples of the test set; the maximum F1 is 81.8%.
1. argmax([0.2, 0.3, 0.4, 0.5, 0.3]) = 3
`class Ner_model(tf.keras.Model): ...`
Hello, thanks for your reply. I feel there is still a small problem with what you said: the shape after tf.nn.softmax(x) here should be batch * max_len * num_classes_of_entity. With your approach, the earlier activation should actually be a sigmoid; if softmax is used, it is best to apply it directly over the num_classes_of_entity dimension.
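For what it's worth, a minimal sketch of the suggestion above, assuming a per-token NER head on top of an encoder output of shape batch * max_len * hidden_size (the Dense head, the hidden size, and num_classes_of_entity = 10 are illustrative placeholders, not the repository's actual Ner_model code): softmax is applied directly over the last axis, so each token gets a probability distribution over entity classes, and argmax over the same axis recovers the predicted class, as in the argmax example above.

```python
import tensorflow as tf

batch_size, max_len, hidden_size, num_classes_of_entity = 6, 128, 768, 10  # illustrative sizes

# Stand-in for the encoder output: batch * max_len * hidden_size.
sequence_output = tf.random.normal([batch_size, max_len, hidden_size])

# Per-token classification head; a plain Dense layer is only an assumption here.
ner_head = tf.keras.layers.Dense(num_classes_of_entity)
logits = ner_head(sequence_output)        # batch * max_len * num_classes_of_entity

# Softmax directly over the entity-class dimension (the last axis),
# so every token gets a distribution over entity classes.
probs = tf.nn.softmax(logits, axis=-1)

# argmax over the same axis picks the most likely class per token,
# e.g. argmax([0.2, 0.3, 0.4, 0.5, 0.3]) = 3.
pred_labels = tf.argmax(probs, axis=-1)   # batch * max_len

print(probs.shape, pred_labels.shape)     # (6, 128, 10) (6, 128)
```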
A span-based model suffers when the max_len of the text is too large.
If the batch size of your model = 6, the max_length of the text = 128, and the num_classes of relation = 50,
then you predict a score tensor shaped 6 * 128 * 128 * 50 (batch * max_len * max_len * num_classes), i.e. a 3D matrix per example.
Is it possible to increase the max length of the text to 256 or 512?
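For a rough sense of scale (assuming float32 scores, i.e. 4 bytes per value, and the 50 relation classes from the shape above; gradients and activations are not counted), this one score tensor grows quadratically with max_len, so doubling the text length roughly quadruples its memory:

```python
# Memory of the batch * max_len * max_len * num_classes score tensor,
# assuming float32 (4 bytes per value); sizes follow the example above.
batch_size, num_classes = 6, 50

for max_len in (128, 256, 512):
    n_values = batch_size * max_len * max_len * num_classes
    mib = n_values * 4 / 2**20
    print(f"max_len={max_len}: {n_values:,} values ≈ {mib:,.0f} MiB")

# max_len=128: 4,915,200 values ≈ 19 MiB
# max_len=256: 19,660,800 values ≈ 75 MiB
# max_len=512: 78,643,200 values ≈ 300 MiB
```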