
GPU usage of the model: is it possible to increase the max length of the text to 256 or 512? #3

Closed
LiuHao-THU opened this issue Sep 22, 2020 · 8 comments

Comments

@LiuHao-THU

Span-based models suffer when the max_len of the text is large.
The batch size of your model = 6
max_length of text = 128
num_classes of relation = 50
So you predict a score tensor shaped 6 * 128 * 128 * 50 (batch * max_len * max_len * num_relations).
Is it possible to increase the max length of the text to 256 or 512?
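
For a rough sense of why this matters for GPU memory, here is a back-of-the-envelope sketch (an editorial estimate, not code from this repository) of how the relation score tensor alone grows with max_len:

```python
# Size of the batch * max_len * max_len * num_relations score tensor alone,
# assuming float32 activations; doubling max_len quadruples it.
batch, num_relations = 6, 50
for max_len in (128, 256, 512):
    elements = batch * max_len * max_len * num_relations
    print(f"max_len={max_len}: {elements:,} elements ~ {elements * 4 / 2**20:.0f} MiB")
```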

@X-jun-0130
Owner

Yes, of course.

@LiuHao-THU
Author

> Yes, of course.

Thanks for your reply!

I tried to run the model with max_len = 256 on a 2080 Ti; the batch size can be set to 6 at most. But I don't quite understand the tf.round in the following code.

```python
def extra_sujects(self, ner_label):
    ner = ner_label[0]
    ner = tf.round(ner)
    ner = [tf.argmax(ner[k]) for k in range(ner.shape[0])]
    new_ner = list(np.array(ner))
    ner = list(np.array(ner))[1:-1]
    ner.append(0)  # guard against the last position not being 0
    text_list = [key for key in self.text]
    subject = []
    for i, k in enumerate(text_list):
        if int(ner[i]) == 0 or int(ner[i]) == 2:
            continue
        elif int(ner[i]) == 1:  # a subject starts at position i
            ner_back = [int(j) for j in ner[i + 1:]]
            if 1 in ner_back and 0 in ner_back:
                indics_1 = ner_back.index(1) + i
                indics_0 = ner_back.index(0) + i
                subject.append((''.join(text_list[i: min(indics_0, indics_1) + 1]), i + 1))
            elif 1 not in ner_back:
                indics = ner_back.index(0) + i
                subject.append((''.join(text_list[i:indics + 1]), i + 1))
    return subject, new_ner
```
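
For readers puzzling over the tagging logic rather than the TensorFlow details, here is a simplified, self-contained sketch of the same decoding idea (an illustrative reimplementation, not the repository's code), assuming tag 1 starts a subject, 2 continues it, and 0 is outside:

```python
def decode_subjects(tags, tokens):
    """Collect (subject_text, 1-based start position) pairs from a 0/1/2 tag sequence."""
    subjects = []
    i = 0
    while i < len(tags):
        if tags[i] == 1:                           # a subject starts here
            j = i + 1
            while j < len(tags) and tags[j] == 2:  # extend while the "inside" tag continues
                j += 1
            subjects.append((''.join(tokens[i:j]), i + 1))
            i = j
        else:
            i += 1
    return subjects

print(decode_subjects([1, 2, 2, 2, 0, 1, 2], list('北京大学在北京')))
# -> [('北京大学', 1), ('北京', 6)]
```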

@LiuHao-THU
Author

The F1 increases only slowly after 30 epochs of training the joint model, using 30,000 training samples and 100 dev samples. Would it be better to change the learning rates of BERT and the relation-extraction network?

Test-set metrics per epoch:

| Epoch | F1 | Precision | Recall | Checkpoint |
|------:|---------:|---------:|---------:|:-----------|
| 1 | 0.000000 | 1.000000 | 0.000000 | |
| 2 | 0.017467 | 0.181818 | 0.017467 | |
| 3 | 0.281525 | 0.390244 | 0.281525 | |
| 4 | 0.529002 | 0.535211 | 0.529002 | saving_model |
| 5 | 0.524731 | 0.493927 | 0.524731 | |
| 6 | 0.554745 | 0.590674 | 0.554745 | saving_model |
| 7 | 0.668103 | 0.630081 | 0.668103 | saving_model |
| 8 | 0.611973 | 0.592275 | 0.611973 | |
| 9 | 0.628821 | 0.600000 | 0.628821 | |
| 10 | 0.617169 | 0.624413 | 0.617169 | |
| 11 | 0.600462 | 0.604651 | 0.600462 | |
| 12 | 0.606593 | 0.582278 | 0.606593 | |
| 13 | 0.638695 | 0.649289 | 0.638695 | |
| 14 | 0.631090 | 0.638498 | 0.631090 | |
| 15 | 0.632258 | 0.595142 | 0.632258 | |
| 16 | 0.671024 | 0.639004 | 0.671024 | saving_model |
| 17 | 0.635347 | 0.620087 | 0.635347 | |
| 18 | 0.657596 | 0.650224 | 0.657596 | |
| 19 | 0.647826 | 0.615702 | 0.647826 | |
| 20 | 0.653422 | 0.629787 | 0.653422 | |
| 21 | 0.665188 | 0.643777 | 0.665188 | |
| 22 | 0.654867 | 0.632479 | 0.654867 | |
| 23 | 0.644231 | 0.676768 | 0.644231 | |
| 24 | 0.632794 | 0.637209 | 0.632794 | |
| 25 | 0.641026 | 0.600000 | 0.641026 | |
| 26 | 0.684932 | 0.681818 | 0.684932 | saving_model |
| 27 | 0.642032 | 0.646512 | 0.642032 | |
| 28 | 0.690265 | 0.666667 | 0.690265 | saving_model |
| 29 | 0.708972 | 0.677824 | 0.708972 | saving_model |
| 30 | 0.660422 | 0.674641 | 0.660422 | |
| 31 | 0.693157 | 0.668085 | 0.693157 | |
| 32 | 0.601695 | 0.559055 | 0.601695 | |
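
On the learning-rate question above: one common way to try different learning rates for the pretrained encoder and the freshly initialized extraction head is to keep two optimizers and split the gradient list between their variable sets. A hedged sketch with stand-in Dense layers (not the repository's model):

```python
import tensorflow as tf

encoder = tf.keras.layers.Dense(8, name="encoder_stub")  # stand-in for the BERT encoder
head = tf.keras.layers.Dense(2, name="head_stub")        # stand-in for the extraction head

enc_opt = tf.keras.optimizers.Adam(learning_rate=2e-5)   # small lr for pretrained weights
head_opt = tf.keras.optimizers.Adam(learning_rate=1e-3)  # larger lr for new layers

x = tf.random.normal((4, 16))
y = tf.zeros((4, 2))
with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(head(encoder(x)) - y))
enc_grads, head_grads = tape.gradient(
    loss, [encoder.trainable_variables, head.trainable_variables])
enc_opt.apply_gradients(zip(enc_grads, encoder.trainable_variables))
head_opt.apply_gradients(zip(head_grads, head.trainable_variables))
```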

@X-jun-0130
Owner

> 1. but I don't quite understand the tf.round in the following code.

tf.round(0.6) = 1; tf.round(0.4) = 0.
In the network, the softmax output gives the probability of each label rather than an integer, so the NER output values lie in (0, 1); they have to be converted to integers before subjects can be extracted.

> 2. if it's better to change the learning rate of bert and relation extraction network?

I tried different learning rates, but the results did not improve much. The network improves slowly because the second-stage network outputs a relatively large and very sparse matrix, which is hard to train. Without BERT, using only a conventional network such as an LSTM, training is even slower.

The authors of the original paper trained the network for more epochs, and of course with more data as well.
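
A small runnable illustration of the first point (an editorial example, not code from the repository): the scores are floats in (0, 1), so they are rounded to integers before subject extraction, whereas argmax would collapse them to a single index.

```python
import tensorflow as tf

scores = tf.constant([0.1, 0.7, 0.2, 0.9])
print(tf.round(scores).numpy())   # [0. 1. 0. 1.]  -> a per-position 0/1 decision
print(tf.argmax(scores).numpy())  # 3              -> a single index instead
```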

@LiuHao-THU
Author

> > 1. but I don't quite understand the tf.round in the following code.
>
> tf.round(0.6) = 1; tf.round(0.4) = 0.
> In the network, the softmax output gives the probability of each label rather than an integer, so the NER output values lie in (0, 1); they have to be converted to integers before subjects can be extracted.
>
> > 2. if it's better to change the learning rate of bert and relation extraction network?
>
> I tried different learning rates, but the results did not improve much. The network improves slowly because the second-stage network outputs a relatively large and very sparse matrix, which is hard to train. Without BERT, using only a conventional network such as an LSTM, training is even slower.
>
> The authors of the original paper trained the network for more epochs, and of course with more data as well.

Hello, and thanks for the reply.

1. With softmax you can just take argmax to get the label. With your approach, wouldn't a case like [0.4, 0.41, 0, ...] force an arbitrary choice between two labels?
2. When you tried different learning rates, did you tune them for the whole model or only in the second, relation-extraction model?
3. I am now using 64,000 training samples with a label-embedding network for extraction; at epoch = 18 the best score is around 75%, and it feels like it would keep rising with further training. Is there perhaps a mistake in your description below?

> The model is trained on 3,000 training samples and validated on the first 100 samples of the test set; the best F1 is 81.8%. The model is heavy, so not much data was used for training, and it should be possible to improve further.

@X-jun-0130
Owner

1. argmax([0.2, 0.3, 0.4, 0.6, 0.3]) = 3
tf.round([0.2, 0.3, 0.4, 0.6, 0.3]) = [0, 0, 0, 1, 0]
The latter form is what I need: an NER tag sequence.
2. I ran a few learning-rate tests on both the joint extraction model and the stand-alone relation extraction, but did not study it in much detail.
3. It is not 3,000 but 30,000; I wrote it wrong. Thanks for pointing it out.

@LiuHao-THU
Author

```python
class Ner_model(tf.keras.Model):
    def __init__(self, bert_model):
        super(Ner_model, self).__init__()
        self.bert = bert_model
        # self.dense_fuc = tf.keras.layers.Dense(100, use_bias=False)  # fully connected layer
        self.dense = tf.keras.layers.Dense(label_class)

    def call(self, inputs, mask, segment):
        output_encode, _ = self.bert([inputs, mask, segment])
        # x = self.dense_fuc(output_encode)
        x = self.dense(output_encode)
        x = tf.nn.softmax(x)  # <-- the line in question
        return x, output_encode
```

Hello, and thanks for the reply.

I still feel there is a slight problem with your explanation.

Here the shape after tf.nn.softmax(x) should be batch * max_len * num_classes_of_entity.

Following your approach, the preceding activation should really be a sigmoid.

If you use softmax, it is best to apply it directly over num_classes_of_entity.
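
To make the distinction concrete, a short example (editorial, not from the repository) contrasting a per-position softmax over the entity labels, decoded with argmax along the last axis, with sigmoid-style scores, where tf.round is the natural choice:

```python
import tensorflow as tf

logits = tf.constant([[[2.0, 0.1, 0.3],    # one sentence, three positions,
                       [0.2, 3.0, 0.1],    # three entity labels (0/1/2)
                       [0.1, 0.4, 2.5]]])
probs = tf.nn.softmax(logits, axis=-1)           # shape: batch x max_len x num_classes_of_entity
print(tf.argmax(probs, axis=-1).numpy())         # [[0 1 2]] -> the tag sequence directly

binary_scores = tf.constant([[0.1, 0.8, 0.6]])   # sigmoid-style "is this position a subject?" scores
print(tf.round(binary_scores).numpy())           # [[0. 1. 1.]]
```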

@X-jun-0130
Owner

```python
ner = ner_label[0]
ner = tf.round(ner)  # you are right, this step is redundant and can be removed
ner = [tf.argmax(ner[k]) for k in range(ner.shape[0])]
ner = list(np.array(ner))[1:-1]
```
