# LSTM on Recipe Data

**This notebook is adapted from the one accompanying David Foster's *Generative Deep Learning*, 2nd Edition.**

- Book: [Amazon](https://www.amazon.com/Generative-Deep-Learning-Teaching-Machines/dp/1098134184/ref=sr_1_1?keywords=generative+deep+learning%2C+2nd+edition&qid=1684708209&sprefix=generative+de%2Caps%2C93&sr=8-1)
- Original notebook (TensorFlow and Keras): [GitHub](https://github.com/davidADSP/Generative_Deep_Learning_2nd_Edition/blob/main/notebooks/05_autoregressive/01_lstm/lstm.ipynb)
- Dataset: [Kaggle](https://www.kaggle.com/datasets/hugodarwood/epirecipes)

In [8]:
import re
import string
import json

import numpy as np
import jax
import jax.numpy as jnp
from tensorflow.data import Dataset
from tensorflow.keras.layers import TextVectorization
from tensorflow.keras import utils

import flax.linen as nn

## 0. Training parameters

In [11]:
DATA_DIR = '../../data/epirecipes/full_format_recipes.json'

EMBEDDING_DIM = 100
HIDDEN_DIM = 128
VALIDATION_SPLIT = 0.2
SEED = 1024
BATCH_SIZE = 32
EPOCHS = 30
VOCAB_SIZE = 10000

MAX_PAD_LEN = 200
MAX_VAL_TOKENS = 100  # Maximum number of tokens to generate per text

## 1. Load dataset

In [3]:
def pad_punctuation(sentence):
    """Surround every ASCII punctuation mark with spaces, then collapse repeated spaces."""
    sentence = re.sub(f'([{string.punctuation}])', r' \1 ', sentence)
    sentence = re.sub(' +', ' ', sentence)
    return sentence
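
To see what `pad_punctuation` does, here is a small standalone check. The helper is copied from the cell above and the example sentence is made up. Every punctuation mark becomes its own whitespace-separated token, so the whitespace split used later by `TextVectorization` (its default `split`) treats `,` and `.` as separate words instead of gluing them onto the preceding word.

```python
import re
import string


def pad_punctuation(sentence):
    # Surround every character in string.punctuation with spaces ...
    sentence = re.sub(f'([{string.punctuation}])', r' \1 ', sentence)
    # ... then collapse runs of spaces into a single space.
    sentence = re.sub(' +', ' ', sentence)
    return sentence


example = pad_punctuation('Preheat oven to 500°F, then roast.')
print(repr(example))
print(example.split())
```

Two details are visible here: the result keeps a trailing space (harmless, since a whitespace split ignores it), and `°` stays attached to `500°F` because `string.punctuation` only contains ASCII characters, which is also why the sample recipe shows `500°F` as a single token.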

In [4]:
# load dataset
with open(DATA_DIR, 'r', encoding='utf-8') as f:  # read-only; JSON is UTF-8 text
    recipe_data = json.load(f)

In [19]:
# preprocess dataset
filtered_data = [
    'Recipe for ' + x['title'] + ' | ' + ' '.join(x['directions'])
    for x in recipe_data
    if 'title' in x and x['title']
    and 'directions' in x and x['directions']
]

text_ds = [pad_punctuation(sentence) for sentence in filtered_data]
print(f'Total recipes loaded: {len(text_ds)}')

Total recipes loaded: 20098
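
The comprehension in the preprocessing cell keeps only records with a non-empty `title` and a non-empty `directions` list, and flattens each one into a single string of the form `Recipe for <title> | <directions>`. A minimal check on hand-made records follows; the records are invented for illustration and only mimic the two fields the filter reads.

```python
# Invented records that mimic only the 'title' and 'directions' fields.
toy_recipe_data = [
    {'title': 'Toast', 'directions': ['Toast bread.', 'Butter it.']},
    {'title': '', 'directions': ['Skipped: empty title.']},
    {'title': 'Water'},                   # no 'directions' key, skipped
    {'title': 'Ice', 'directions': []},   # empty directions, skipped
]

# Same filter and formatting as the notebook cell.
toy_filtered = [
    'Recipe for ' + x['title'] + ' | ' + ' '.join(x['directions'])
    for x in toy_recipe_data
    if 'title' in x and x['title']
    and 'directions' in x and x['directions']
]
print(toy_filtered)
```

For dictionaries, the condition `x.get('title') and x.get('directions')` is equivalent and a little shorter, since `.get` returns `None` (falsy) for a missing key.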


In [20]:
print('Sample data:')
sample_data = np.random.choice(text_ds)
print(sample_data)

Sample data:
Recipe for Clay - Pot Miso Chicken | Preheat oven to 500°F with rack in middle . Pat chicken dry , then roast , skin side up , in 1 layer in a 17 - by 12 - inch shallow baking pan until skin is golden brown , 35 to 40 minutes . While chicken roasts , soak wood ear mushrooms in 4 cups water until softened , about 15 minutes . Drain in a sieve , then rinse well and discard any hard pieces . Drain well , squeezing out excess water . Transfer roasted chicken to a bowl and pour pan juices through a fine - mesh sieve into a 1 - quart glass measure . Let stand until fat rises to top , 1 to 2 minutes , then skim off and discard fat . Add enough stock to bring total to 4 cups liquid . Reduce oven to 300°F and move rack to lower third . Peel burdock root , and , if more than 1 - inch - thick , halve lengthwise . Cut crosswise into 1 - inch pieces . Transfer burdock root to a bowl , then add vinegar and 2 cups water . Heat oil in a 7 - to 8 - quart heavy pot over medium - high heat u

## 2. Build vocabularies

In [21]:
# convert the list of texts to a tf.data Dataset
text_ds_tf = Dataset.from_tensor_slices(text_ds)

vectorize_layer = TextVectorization(
    standardize='lower',
    max_tokens=VOCAB_SIZE,
    output_mode='int',
    output_sequence_length=MAX_PAD_LEN+1
)
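
The `+1` in `output_sequence_length=MAX_PAD_LEN+1` fits an autoregressive setup: each padded sequence of `MAX_PAD_LEN + 1` ids can be split into an input of length `MAX_PAD_LEN` and a target shifted one position to the left, so the model learns to predict every token from the ones before it (the original book notebook splits its batches this way). A minimal NumPy sketch of that shift follows; the batch of ids is made up and stands in for real `vectorize_layer` output, and the tiny `MAX_PAD_LEN` is only for readability.

```python
import numpy as np

MAX_PAD_LEN = 5  # tiny stand-in for the notebook's 200, for illustration only

# Hypothetical batch of 2 tokenized recipes, each MAX_PAD_LEN + 1 ids long.
# 0 is the padding id, matching index 0 ('') in the vocabulary.
tokens = np.array([
    [12, 7, 45, 3, 9, 2],
    [12, 8, 2, 0, 0, 0],
])
assert tokens.shape == (2, MAX_PAD_LEN + 1)

# Inputs drop the last position; targets drop the first.
x = tokens[:, :-1]
y = tokens[:, 1:]
print(x.shape, y.shape)  # (2, 5) (2, 5)
print(x[0], y[0])
```

At every position `t`, `y[:, t]` is the token that follows `x[:, t]`, so a loss over all positions trains next-token prediction. Positions whose target is the padding id 0 carry no recipe text, so one may want to mask them out of the loss.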

In [25]:
vectorize_layer.adapt(text_ds_tf)
vocab = vectorize_layer.get_vocabulary()

# First 10 items in the vocabulary
for i, word in enumerate(vocab[:10]):
    print(f'{i}: {word}')

0: 
1: [UNK]
2: .
3: ,
4: and
5: to
6: in
7: the
8: with
9: a
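
The listing shows the layout `TextVectorization` produces: index 0 is the empty padding token, index 1 is `[UNK]` for out-of-vocabulary words, and the remaining entries are tokens ranked by frequency, capped so the whole vocabulary has at most `VOCAB_SIZE` entries (both reserved tokens count toward `max_tokens`). The sketch below is a simplified, pure-Python stand-in for that behaviour (lowercasing, whitespace split, frequency ranking, padding or truncation to a fixed length). It is not the Keras implementation; its tie-breaking (first occurrence wins) is an assumption, and the helper names `build_vocab`, `encode` and `decode` and the toy corpus are hypothetical.

```python
from collections import Counter

PAD, UNK = '', '[UNK]'


def build_vocab(texts, max_tokens):
    """Frequency-ranked vocabulary with reserved padding (0) and OOV (1) slots."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # most_common() orders equal counts by first occurrence.
    ranked = [word for word, _ in counts.most_common(max_tokens - 2)]
    return [PAD, UNK] + ranked


def encode(text, vocab, seq_len):
    """Map words to ids (unknown words -> 1), then pad with 0 or truncate to seq_len."""
    index = {word: i for i, word in enumerate(vocab)}
    ids = [index.get(word, 1) for word in text.lower().split()]
    return (ids + [0] * seq_len)[:seq_len]


def decode(ids, vocab):
    """Inverse mapping back to text; padding ids are dropped."""
    return ' '.join(vocab[i] for i in ids if i != 0)


# Hypothetical, already punctuation-padded corpus.
corpus = [
    'Recipe for Toast | Toast bread . Butter it .',
    'Recipe for Tea | Boil water . Steep tea .',
]
vocab = build_vocab(corpus, max_tokens=8)
print(vocab)  # ['', '[UNK]', '.', 'recipe', 'for', 'toast', '|', 'tea']

ids = encode('Recipe for Jam | Boil fruit .', vocab, seq_len=10)
print(ids)    # [3, 4, 1, 6, 1, 1, 2, 0, 0, 0]
print(decode(ids, vocab))  # recipe for [UNK] | [UNK] [UNK] .
```

Because `get_vocabulary()` returns a plain Python list, the same inverse mapping applied to `sample_data_tokenized.numpy().ravel()` with the notebook's `vocab` is a quick way to check that the tokenized sample reads back as the source recipe.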


In [29]:
sample_data_tokenized = vectorize_layer(sample_data)
print('Source text:')
print(sample_data)
print('\n')
print('Mapped sample:')
print(sample_data_tokenized.numpy())

Source text:
Recipe for Clay - Pot Miso Chicken | Preheat oven to 500°F with rack in middle . Pat chicken dry , then roast , skin side up , in 1 layer in a 17 - by 12 - inch shallow baking pan until skin is golden brown , 35 to 40 minutes . While chicken roasts , soak wood ear mushrooms in 4 cups water until softened , about 15 minutes . Drain in a sieve , then rinse well and discard any hard pieces . Drain well , squeezing out excess water . Transfer roasted chicken to a bowl and pour pan juices through a fine - mesh sieve into a 1 - quart glass measure . Let stand until fat rises to top , 1 to 2 minutes , then skim off and discard fat . Add enough stock to bring total to 4 cups liquid . Reduce oven to 300°F and move rack to lower third . Peel burdock root , and , if more than 1 - inch - thick , halve lengthwise . Cut crosswise into 1 - inch pieces . Transfer burdock root to a bowl , then add vinegar and 2 cups water . Heat oil in a 7 - to 8 - quart heavy pot over medium - high heat u