Skip to content

BERT example pipelines#2036

Closed
tommywei110 wants to merge 49 commits intotensorflow:masterfrom
tommywei110:bert
Closed

BERT example pipelines#2036
tommywei110 wants to merge 49 commits intotensorflow:masterfrom
tommywei110:bert

Conversation

@tommywei110
Copy link
Copy Markdown
Contributor

Bert example pipeline on Cola dataset

@davidzats-eng davidzats-eng self-requested a review June 26, 2020 03:15
def _tokenize(feature):
"""Tokenize the two sentences and insert appropriate tokens"""
asset_dir = os.path.join(os.environ['HOME'], 'bert_cola/assets')
vocab_dir = os.path.join(asset_dir, 'vocab.txt')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the vocab a function of the BERT model we pick? If so, then how does it get populated dynamically?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kept for reference

def tokenize_single_sentence(
self,
sequence,
max_len=128,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The max_length may be unnecessarily large or too small for Cola. Lets figure out what is appropriate.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh and also, this is capped by Bert at 512

input_mask_layer,
input_type_ids_layer])

hidden = pooled_output
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be using the sequence output corresponding to the first token (CLS) or pooled_output?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like pooled_output

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, lets write a comment saying that the pooled input for this model is a dense layer on top of the CLS token so that others know about the rationale.

@tommywei110 tommywei110 changed the title BERT Cola example pipeline BERT example pipelines Jun 30, 2020
sequence_b,
sentence_len,
False,
True
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% certain, but I think [CLS] sentence_A [SEP] sentence_B [SEP] is correct.

)

def build_and_compile_bert_classifier(
bert_layer,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about we pass the link to the hub module instead? feels more consistent

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the thought here is if users where to provide their own pretrained BERT layer, they can still use it here.

@tommywei110
Copy link
Copy Markdown
Contributor Author

can you do this?

self._pad_id = lines.index(_PAD)
self._cls_id = lines.index(_PAD)
self._pad_id = lines.index(_PAD)

Yes. good suggestion!

copybara-service bot pushed a commit that referenced this pull request Aug 6, 2020
PiperOrigin-RevId: 324077454
copybara-service bot pushed a commit that referenced this pull request Aug 6, 2020
PiperOrigin-RevId: 325288648
@github-actions
Copy link
Copy Markdown
Contributor

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

@github-actions github-actions bot added the stale label Aug 30, 2020
@github-actions github-actions bot closed this Sep 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants