Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new notebook: load text with tf.data #449

Merged
merged 30 commits into from Apr 9, 2019

Conversation

Projects
None yet
5 participants
@adammichaelwood
Copy link
Collaborator

commented Apr 3, 2019

No description provided.

adammichaelwood added some commits Mar 21, 2019

@review-notebook-app

This comment has been minimized.

Copy link

commented Apr 3, 2019

Check out this pull request on ReviewNB: https://app.reviewnb.com/tensorflow/docs/pull/449

Visit www.reviewnb.com to know how we simplify your Jupyter Notebook workflows.

@googlebot googlebot added the cla: yes label Apr 3, 2019

Show resolved Hide resolved site/en/r2/tutorials/load_data/text.ipynb Outdated
Show resolved Hide resolved site/en/r2/tutorials/load_data/text.ipynb
Show resolved Hide resolved site/en/r2/tutorials/load_data/text.ipynb
Show resolved Hide resolved site/en/r2/tutorials/load_data/text.ipynb Outdated
Show resolved Hide resolved site/en/r2/tutorials/load_data/text.ipynb Outdated
Show resolved Hide resolved site/en/r2/tutorials/load_data/text.ipynb Outdated
Show resolved Hide resolved site/en/r2/tutorials/load_data/text.ipynb
Show resolved Hide resolved site/en/r2/tutorials/load_data/text.ipynb Outdated

@adammichaelwood adammichaelwood requested a review from lamberta Apr 4, 2019

review edits done

@lamberta
Copy link
Member

left a comment

Nice, job. Thanks!

@lamberta lamberta added ready to pull and removed ready to pull labels Apr 8, 2019

@lamberta lamberta requested a review from yashk2810 Apr 8, 2019

@@ -0,0 +1,708 @@
{

This comment has been minimized.

Copy link
@yashk2810

yashk2810 Apr 8, 2019

Member

Maybe add a easy and advanced section. Easy section would show how to use the inbuilt tokenizer in tfds and the advanced section would show building tokenizers from scratch?

Reply via ReviewNB

@@ -0,0 +1,708 @@
{

This comment has been minimized.

Copy link
@yashk2810

yashk2810 Apr 8, 2019

Member

Maybe use the gpu version?


pip install tensorflow-gpu==2.0.0-alpha0

Reply via ReviewNB

@@ -0,0 +1,708 @@
{

This comment has been minimized.

Copy link
@yashk2810

yashk2810 Apr 8, 2019

Member

Maybe use tf.keras.utils.get_file?

Reply via ReviewNB

@@ -0,0 +1,708 @@
{

This comment has been minimized.

Copy link
@yashk2810

yashk2810 Apr 8, 2019

Member

I think this is much cleaner than creating functions within functions


```

def labeler(example, index):

 return example, tf.cast(index, tf.int64)  


labeled_data_sets = []

for i, file_name in enumerate(FILE_NAMES):

 lines_dataset = tf.data.TextLineDataset(file_name)

 labeled_dataset = lines_dataset.map(lambda ex: labeler(ex, i))

 labeled_data_sets.append(labeled_dataset)

```

Reply via ReviewNB

@@ -0,0 +1,708 @@
{

This comment has been minimized.

Copy link
@yashk2810

yashk2810 Apr 8, 2019

Member

Let's do this


```

all_labeled_data = labeled_data_sets[0].concatenate(labeled_data_sets[1])


for index, val in enumerate(labeled_data_sets[2:]):

 all_labeled_data = all_labeled_data.concatenate(val)

```


so that if you have more than 3 files, you won't have to manually concatenate it.


Also, shuffling here is a bad idea since you do .take() below. I would recommend doing the shuffle after .take() and .skip()

Reply via ReviewNB

@@ -0,0 +1,708 @@
{

This comment has been minimized.

Copy link
@yashk2810

yashk2810 Apr 8, 2019

Member

Maybe show how to use SubwordsTokenizer too, since that's easier to build and it is used more than a normal tokenizer.

Reply via ReviewNB

@@ -0,0 +1,708 @@
{

This comment has been minimized.

Copy link
@yashk2810

yashk2810 Apr 8, 2019

Member

print (encoded_example), so that it prints in one line.

Reply via ReviewNB

@@ -0,0 +1,708 @@
{

This comment has been minimized.

Copy link
@yashk2810

yashk2810 Apr 8, 2019

Member

use string activations ('relu') instead of tf.keras.backend.relu

Reply via ReviewNB

@@ -0,0 +1,708 @@
{

This comment has been minimized.

Copy link
@yashk2810

yashk2810 Apr 8, 2019

Member

Remove the first sentence (This model running on this data produces decent results (about 83%) after just two epochs and levels off quickly. A third epoch lowers accuracy by a tiny amount.)


Put this sentence in Next Steps (If you edit the model, try running several epochs to see when improvement levels off and overfitting sets in.)

Reply via ReviewNB

"cell_type": "code",
"source": [
"def encode(text_tensor, label):\n",
" encoded_text = encoder.encode(text_tensor.numpy())\n",

This comment has been minimized.

Copy link
@yashk2810

yashk2810 Apr 9, 2019

Member

use tokenizer instead of encoder

@TensorFlow-Docs-Copybara TensorFlow-Docs-Copybara merged commit 6eec78a into tensorflow:master Apr 9, 2019

2 checks passed

cla/google All necessary CLAs are signed
import/copybara Change imported to the internal review system
Details

TensorFlow-Docs-Copybara pushed a commit that referenced this pull request Apr 9, 2019

Copybara-Service
Merge pull request #449 from adammichaelwood:text-load
PiperOrigin-RevId: 242663703
@adammichaelwood

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 9, 2019

@yashk2810 I didn't see your comments here before copybara completed the merge cycle. I'll address these in a separate PR.

@yashk2810

This comment has been minimized.

Copy link
Member

commented Apr 9, 2019

@adammichaelwood can you send a CL instead with the changes? Its easier for me to review there.

@adammichaelwood

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 9, 2019

no problem

@adammichaelwood adammichaelwood deleted the adammichaelwood:text-load branch Apr 10, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.