# LSTM on Recipe Data

**The notebook has been adapted from the notebook provided in David Foster's Generative Deep Learning, 2nd Edition.**

- Book: [Amazon](https://www.amazon.com/Generative-Deep-Learning-Teaching-Machines/dp/1098134184/ref=sr_1_1?keywords=generative+deep+learning%2C+2nd+edition&qid=1684708209&sprefix=generative+de%2Caps%2C93&sr=8-1)
- Original notebook (tensorflow and keras): [Github](https://github.com/davidADSP/Generative_Deep_Learning_2nd_Edition/blob/main/notebooks/05_autoregressive/01_lstm/lstm.ipynb)
- Dataset: [Kaggle](https://www.kaggle.com/datasets/hugodarwood/epirecipes)

In [1]:
import numpy as np
import json
import re
import string

import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

## 0. Train parameters

In [2]:
DATA_DIR = '../../data/epirecipes/full_format_recipes.json'

VOCAB_SIZE = 10_000
MAX_LEN = 200
EMBEDDING_DIM = 100
N_UNITS = 128
VALIDATION_SPLIT = 0.2
SEED = 1024
LOAD_MODEL = False
BATCH_SIZE = 32
EPOCHS = 25

## 1. Load dataset

In [6]:
def pad_punctuation(sentence):
    sentence = re.sub(f'([{string.punctuation}])', r' \1 ', sentence)
    sentence = re.sub(' +', ' ', sentence)
    return sentence

In [7]:
# Load dataset
with open(DATA_DIR, 'r+') as f:
    recipe_data = json.load(f)

In [8]:
# preprocess dataset
filtered_data = [
    'Recipe for ' + x['title'] + ' | ' + ' '.join(x['directions'])
    for x in recipe_data
    if 'title' in x and x['title']
    and 'directions' in x and x['directions']
]

text_ds = [pad_punctuation(sentence) for sentence in filtered_data]

print(f'Total recipe loaded: {len(text_ds)}')

Total recipe loaded: 20098


In [14]:
print('Sample data:')
print(np.random.choice(text_ds))

Sample data:
Recipe for Salmon Paillards with Lettuce and Pea Salad | Stir together 2 tablespoons oil and 1 / 2 teaspoon zest in a bowl and set aside . Combine lemon juice , garlic , 1 / 8 teaspoon salt , 1 / 8 teaspoon pepper , and remaining 1 / 4 teaspoon zest in a small bowl , then add remaining 2 tablespoons oil in a slow stream , whisking until combined . Whisk in cream and chives . Blanch sugar snaps in a 3 - to 4 - quart saucepan of boiling salted water , uncovered , until crisp - tender , about 2 minutes . Transfer sugar snaps with a slotted spoon to a bowl of ice and cold water to stop cooking . Keep cooking water at a boil . Once sugar snaps are cool , pat dry between paper towels . Cook baby peas in boiling water 5 minutes , then drain in a sieve and set sieve in ice water to stop cooking . Drain peas , then pat dry between paper towels . Cut sugar snaps diagonally into 1 / 4 - inch - thick slices and transfer to a bowl , then toss with peas and lettuce . Preheat broiler . A

## 2. Build vocabularies

In [12]:
# The iterator that yields tokenized data
def yield_tokens(data_iter, tokenizer):
    for sample in data:
        yield tokenizer(sample)

# Building vocabulary
def build_vocab(dataset, tokenizer):
    pass

In [30]:
# vocab.set_default_index('<pad>')
vocab.append_token('<pad>')

In [47]:
vocab.set_default_index(vocab['<pad>'])
vocab.set_default_index(vocab['<unk>'])

In [48]:
list(vocab.get_stoi().keys())

['<pad>',
 'ﬁrst',
 '‿-inch-thick',
 '‟-cup',
 '“toast”',
 '“lid',
 '“beefiness”',
 '—just',
 '—for',
 '—before',
 '¿13-inch',
 '¼-inch',
 '\xadrefrigerator',
 '\xad',
 '¬medium--low',
 '}',
 'zuni',
 'zucchini-parmesan',
 'zucchini-lamb',
 'zucchini-herb',
 'zones-one',
 'zipper-top',
 'zip-loc',
 'zest-and-juice',
 'zeppole',
 'zen-like',
 'zeke',
 'yum-yum',
 'yucca',
 'you…',
 'yolk/cream',
 'yolk-peach',
 'yogurts',
 'yogurt-peach',
 'yogurt-like',
 'yogurt-dill',
 'ynez',
 'yemenite',
 'yellow-potato',
 'yankee',
 'yangzhou',
 'yaki',
 'xiñómavro',
 'x9',
 'x2-inch',
 'wth',
 'wreaths',
 'worthy',
 'worktable–and',
 'workable',
 'woodwork',
 'wood-smoked',
 'wonderfully',
 'withthe',
 'withstrawberries',
 'witha',
 'wish—a',
 'wisconsin',
 'winkeler',
 'winglass',
 'wine–marinated',
 'wineries',
 'winemaking',
 'wine-unfriendly',
 'wine-saffron',
 'wine-peppercorn',
 'wine-macerated',
 'wine-glasses',
 'wine-glass',
 'wine-baked',
 'wiled',
 'wildflower-honey',
 'wildflower',
 'w