Conversation
tfx/examples/bert_cola/bert_utils.py
def _tokenize(feature):
  """Tokenize the two sentences and insert appropriate tokens"""
  asset_dir = os.path.join(os.environ['HOME'], 'bert_cola/assets')
  vocab_dir = os.path.join(asset_dir, 'vocab.txt')
Isn't the vocab a function of the BERT model we pick? If so, then how does it get populated dynamically?
kept for reference
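One way to resolve the vocab dynamically instead of hardcoding a path is to read it off the BERT layer itself. A minimal sketch, assuming the layer is a `hub.KerasLayer` wrapping one of the official TF2 BERT SavedModels on tfhub.dev, which expose the vocab as a tracked asset via `resolved_object.vocab_file`; the helper name is ours:

```python
def vocab_path_from_bert_layer(bert_layer):
    """Return the vocab.txt path bundled with the BERT layer's SavedModel.

    Assumes a hub.KerasLayer wrapping a TF2 BERT encoder that ships its
    vocab as a tracked asset (true for the official tfhub.dev models).
    """
    asset_path = bert_layer.resolved_object.vocab_file.asset_path
    # Real hub layers hand back a string tensor; plain strings (e.g. from
    # a custom layer or a test stub) are passed through unchanged.
    if hasattr(asset_path, "numpy"):
        return asset_path.numpy().decode("utf-8")
    return asset_path
```

This way the vocab always tracks whichever BERT model the pipeline picks, with no per-model asset directory to maintain.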
def tokenize_single_sentence(
    self,
    sequence,
    max_len=128,
The max_length may be unnecessarily large or too small for CoLA. Let's figure out what is appropriate.
Oh, and also: this is capped by BERT at 512.
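One data-driven way to pick it: cover a high percentile of the dataset's token lengths and clip to BERT's 512 cap. A sketch in plain Python (the function name and percentile default are ours, not from the PR):

```python
def choose_max_len(token_counts, percentile=99.0, hard_cap=512):
    """Pick a sequence length covering `percentile`% of examples.

    `token_counts` is the WordPiece token count per sentence; +2 accounts
    for the [CLS] and [SEP] tokens BERT adds. Clipped to BERT's 512 limit.
    """
    counts = sorted(token_counts)
    idx = min(len(counts) - 1, int(len(counts) * percentile / 100.0))
    return min(counts[idx] + 2, hard_cap)
```

Shorter sequences pad less and train faster, so for single short sentences like CoLA's, the computed value will likely land well under 128.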
    input_mask_layer,
    input_type_ids_layer])

hidden = pooled_output
Should we be using the sequence output corresponding to the first token (CLS) or pooled_output?
Seems like pooled_output.
Yes, let's write a comment saying that the pooled output for this model is a dense layer on top of the CLS token, so that others know the rationale.
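For reference, what the hub layer computes as pooled_output can be sketched in a few lines. A NumPy illustration (the function and weight names are ours; the real weights live inside the SavedModel):

```python
import numpy as np

def pool_cls(sequence_output, kernel, bias):
    """Mimic BERT's pooler: a dense layer + tanh applied to the hidden
    state of the first ([CLS]) token -- this is what pooled_output is,
    not the raw [CLS] embedding itself."""
    cls_token = sequence_output[:, 0, :]       # (batch, hidden)
    return np.tanh(cls_token @ kernel + bias)  # (batch, hidden)
```

So using pooled_output vs. the raw first token of sequence_output differs only by this learned projection, which BERT's original pretraining used for classification heads.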
    sequence_b,
    sentence_len,
    False,
    True
I'm not 100% certain, but I think [CLS] sentence_A [SEP] sentence_B [SEP] is correct.
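That layout can be sketched directly, which also makes the segment-id assignment explicit. A plain-Python illustration (the helper name is ours, not from the PR):

```python
def build_pair_inputs(tokens_a, tokens_b):
    """Lay out a sentence pair the way BERT expects:
    [CLS] sentence_A [SEP] sentence_B [SEP], with segment ids of 0
    through the first [SEP] and 1 for sentence_B and its [SEP]."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids
```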
)


def build_and_compile_bert_classifier(
    bert_layer,
How about we pass the link to the hub module instead? Feels more consistent.
So the thought here is that if users were to provide their own pretrained BERT layer, they can still use it here.
Yes, good suggestion!
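Both use cases can be served by accepting either form. A sketch of the idea (the function name and `trainable` default are ours):

```python
def resolve_bert_layer(bert_layer_or_url, trainable=True):
    """Accept either a tfhub.dev URL or an already-built Keras layer,
    so users with their own pretrained BERT can still pass it in."""
    if isinstance(bert_layer_or_url, str):
        import tensorflow_hub as hub  # deferred: only needed for URLs
        return hub.KerasLayer(bert_layer_or_url, trainable=trainable)
    return bert_layer_or_url
```

`build_and_compile_bert_classifier` could then call this on its first argument, keeping the URL-based path consistent without closing off the bring-your-own-layer path.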
PiperOrigin-RevId: 324077454
PiperOrigin-RevId: 325288648
This PR is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.
BERT example pipeline on the CoLA dataset