
Commit 00b8b02 (parent 43ef8bd)
Committed Jan 10, 2021

[ENHANCEMENT] Fixes issue #19 partly

3 files changed: +59 −16 lines
Ch6/02_BERT_ATIS.ipynb (+1 −4)
@@ -345,8 +345,6 @@
 }
 ],
 "source": [
-"# from utils import fetch_data, read_method\n",
-"\n",
 "sents,labels,intents = fetch_data('atis.test.w-intent.iob')\n",
 "\n",
 "test_sentences = [\" \".join(i) for i in sents]\n",
@@ -1080,7 +1078,6 @@
 "optimizer = BertAdam(optimizer_grouped_parameters, lr=3e-5)\n",
 "\n",
 "\n",
-"\n",
 "# Function to calculate the accuracy of our predictions vs labels\n",
 "def flat_accuracy(preds, labels):\n",
 " pred_flat = np.argmax(preds, axis=1).flatten()\n",
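The `flat_accuracy` helper visible in the hunk's context can be sketched as a standalone function. This is a hedged reconstruction from the diff context (the full cell body is not shown here); the example logits and labels are illustrative.

```python
import numpy as np

# Function to calculate the accuracy of our predictions vs labels,
# as in the notebook's flat_accuracy helper: take the argmax over the
# class axis and compare against the flattened gold labels.
def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)

# Illustrative logits for 3 examples, 2 classes; 2 of 3 argmax
# predictions ([1, 0, 1]) match the labels [1, 0, 0].
logits = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
labels = np.array([1, 0, 0])
acc = flat_accuracy(logits, labels)
```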
@@ -1179,7 +1176,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.6.10"
+"version": "3.6.12"
 }
 },
 "nbformat": 4,

Ch6/03_BERT_ATIS_Binary.ipynb (+57 −11)
@@ -11,6 +11,13 @@
 "In this notebook we build a binary classifier for the ATIS dataset using [BERT](https://arxiv.org/abs/1810.04805), a pre-trained NLP model open-sourced by Google in late 2018 that can be used for [Transfer Learning](https://towardsdatascience.com/transfer-learning-in-nlp-fecc59f546e4) on text data. This notebook has been adapted from this [article](https://towardsdatascience.com/bert-for-dummies-step-by-step-tutorial-fb90890ffe03). The dataset can be found [here](https://www.kaggle.com/siddhadev/ms-cntk-atis/data#).<br> This notebook requires a GPU. We suggest running it on your local machine only if you have a GPU; otherwise, use Google Colab."
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Imports"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 0,
1421
{
1522
"cell_type": "code",
1623
"execution_count": 0,
@@ -115,8 +122,8 @@
115122
}
116123
],
117124
"source": [
118-
"#importing a few necessary packages and setting the DATA directory\n",
119125
"\n",
126+
"#if not using colab, comment below line\n",
120127
"%tensorflow_version 1.x\n",
121128
"\n",
122129
"from torch.nn import Adam\n",
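The added comment warns that the `%tensorflow_version` magic only works on Colab. One hedged way to make such a cell portable (a sketch, not part of the commit) is to detect Colab before invoking the magic:

```python
import importlib.util

# Detect Google Colab by checking whether the google.colab package is
# importable; on a local machine this is False and the Colab-only
# "%tensorflow_version 1.x" magic should be skipped.
IN_COLAB = importlib.util.find_spec("google.colab") is not None
```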
@@ -150,6 +157,13 @@
 "torch.cuda.get_device_name(0)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Data Loading"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 0,
@@ -345,6 +359,13 @@
 "query_data_test, intent_data_test, intent_data_label_test, slot_data_test = load_atis('atis.test.pkl')\n"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Let's look at a few training queries."
+]
+},
 {
 "cell_type": "code",
 "execution_count": 0,
@@ -381,6 +402,14 @@
 "query_data_train"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Data Pre-processing\n",
+"We need to convert the sentences to tensors."
+]
+},
 {
 "cell_type": "code",
 "execution_count": 0,
@@ -431,15 +460,11 @@
431460
]
432461
},
433462
{
434-
"cell_type": "code",
435-
"execution_count": 0,
436-
"metadata": {
437-
"colab": {},
438-
"colab_type": "code",
439-
"id": "S9SMEwslo-ve"
440-
},
441-
"outputs": [],
442-
"source": []
463+
"cell_type": "markdown",
464+
"metadata": {},
465+
"source": [
466+
"BERT expects data to be in a specific format, i.e, [CLS] token1,token2,....[SEP]"
467+
]
443468
},
444469
{
445470
"cell_type": "code",
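The new markdown cell describes BERT's expected input format. A minimal, hedged illustration of that wrapping step (real pipelines apply WordPiece tokenization to the wrapped string afterwards; the sample sentence is illustrative):

```python
# Wrap a raw sentence with the special [CLS] and [SEP] tokens that
# BERT expects at the start and end of every input sequence.
def to_bert_format(sentence):
    return "[CLS] " + sentence + " [SEP]"

wrapped = to_bert_format("i want to fly from boston to denver")
```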
@@ -508,6 +533,13 @@
 "input_ids = pad_sequences(input_ids, maxlen=MAX_LEN, dtype=\"long\", truncating=\"post\", padding=\"post\")"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Creating the BERT attention masks"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 0,
@@ -579,6 +611,13 @@
 "validation_dataloader = DataLoader(validation_data, sampler=validation_sampler, batch_size=batch_size)\n"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Training"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 0,
@@ -913,6 +952,13 @@
 "model.cuda()"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Fine-Tuning BERT"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 0,
@@ -1149,7 +1195,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.6.10"
+"version": "3.6.12"
 }
 },
 "nbformat": 4,

Ch6/04_CRF_SNIPS_slots.ipynb (+1 −1)
@@ -2681,7 +2681,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.3"
+"version": "3.6.12"
 },
 "toc": {
 "base_numbering": 1,

0 commit comments