Conversation
tfx/examples/bert_cola/bert_utils.py
def _tokenize(feature):
  """Tokenize the two sentences and insert appropriate tokens"""
  asset_dir = os.path.join(os.environ['HOME'], 'bert_cola/assets')
  vocab_dir = os.path.join(asset_dir, 'vocab.txt')
Isn't the vocab a function of the BERT model we pick? If so, then how does it get populated dynamically?
kept for reference
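One way to resolve the vocab dynamically instead of hardcoding a path is to read it off the BERT layer itself. A minimal sketch, assuming the layer is a `hub.KerasLayer` wrapping one of the official TF2 BERT SavedModels on tfhub.dev, which expose the vocab as a tracked asset via `resolved_object.vocab_file`; the helper name is ours:

```python
def vocab_path_from_bert_layer(bert_layer):
    """Return the vocab.txt path bundled with the BERT layer's SavedModel.

    Assumes a hub.KerasLayer wrapping a TF2 BERT encoder that ships its
    vocab as a tracked asset (true for the official tfhub.dev models).
    """
    asset_path = bert_layer.resolved_object.vocab_file.asset_path
    # Real hub layers hand back a string tensor; plain strings (e.g. from
    # a custom layer or a test stub) are passed through unchanged.
    if hasattr(asset_path, "numpy"):
        return asset_path.numpy().decode("utf-8")
    return asset_path
```

This way the vocab always tracks whichever BERT model the pipeline picks, with no per-model asset directory to maintain.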
def tokenize_single_sentence(
    self,
    sequence,
    max_len=128,
The max_length may be unnecessarily large or too small for CoLA. Let's figure out what is appropriate.
Oh, and also: this is capped by BERT at 512.
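One data-driven way to pick it: cover a high percentile of the dataset's token lengths and clip to BERT's 512 cap. A sketch in plain Python (the function name and percentile default are ours, not from the PR):

```python
def choose_max_len(token_counts, percentile=99.0, hard_cap=512):
    """Pick a sequence length covering `percentile`% of examples.

    `token_counts` is the WordPiece token count per sentence; +2 accounts
    for the [CLS] and [SEP] tokens BERT adds. Clipped to BERT's 512 limit.
    """
    counts = sorted(token_counts)
    idx = min(len(counts) - 1, int(len(counts) * percentile / 100.0))
    return min(counts[idx] + 2, hard_cap)
```

Shorter sequences pad less and train faster, so for single short sentences like CoLA's, the computed value will likely land well under 128.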
    input_mask_layer,
    input_type_ids_layer])

hidden = pooled_output
Should we be using the sequence output corresponding to the first token (CLS) or pooled_output?
Seems like pooled_output.
Yes, let's write a comment saying that the pooled output for this model is a dense layer on top of the CLS token, so that others know the rationale.
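For reference, what the hub layer computes as pooled_output can be sketched in a few lines. A NumPy illustration (the function and weight names are ours; the real weights live inside the SavedModel):

```python
import numpy as np

def pool_cls(sequence_output, kernel, bias):
    """Mimic BERT's pooler: a dense layer + tanh applied to the hidden
    state of the first ([CLS]) token -- this is what pooled_output is,
    not the raw [CLS] embedding itself."""
    cls_token = sequence_output[:, 0, :]       # (batch, hidden)
    return np.tanh(cls_token @ kernel + bias)  # (batch, hidden)
```

So using pooled_output vs. the raw first token of sequence_output differs only by this learned projection, which BERT's original pretraining used for classification heads.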
    sequence_b,
    sentence_len,
    False,
    True
I'm not 100% certain, but I think [CLS] sentence_A [SEP] sentence_B [SEP] is correct.
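That layout can be sketched directly, which also makes the segment-id assignment explicit. A plain-Python illustration (the helper name is ours, not from the PR):

```python
def build_pair_inputs(tokens_a, tokens_b):
    """Lay out a sentence pair the way BERT expects:
    [CLS] sentence_A [SEP] sentence_B [SEP], with segment ids of 0
    through the first [SEP] and 1 for sentence_B and its [SEP]."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids
```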
)


def build_and_compile_bert_classifier(
    bert_layer,
How about we pass the link to the hub module instead? Feels more consistent.
So the thought here is that if users were to provide their own pretrained BERT layer, they can still use it here.
Yes, good suggestion!
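Both use cases can be served by accepting either form. A sketch of the idea (the function name and `trainable` default are ours):

```python
def resolve_bert_layer(bert_layer_or_url, trainable=True):
    """Accept either a tfhub.dev URL or an already-built Keras layer,
    so users with their own pretrained BERT can still pass it in."""
    if isinstance(bert_layer_or_url, str):
        import tensorflow_hub as hub  # deferred: only needed for URLs
        return hub.KerasLayer(bert_layer_or_url, trainable=trainable)
    return bert_layer_or_url
```

`build_and_compile_bert_classifier` could then call this on its first argument, keeping the URL-based path consistent without closing off the bring-your-own-layer path.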
PiperOrigin-RevId: 324077454
PiperOrigin-RevId: 325288648
This PR is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.
BERT example pipeline on the CoLA dataset