KeyError in validation_next_word_loop when running main.py #6

Closed
yxinli92 opened this issue May 23, 2020 · 4 comments

Comments

@yxinli92

Hi Vladimir! Hope you are doing well.

I was running your main.py script and got the following KeyError. Am I missing something? Thanks a lot!

Traceback (most recent call last):
  File "main.py", line 572, in <module>
    main(cfg)
  File "main.py", line 281, in main
    cfg.use_categories
  File "/home/tuf72841/MDVC/epoch_loop/run_epoch.py", line 336, in validation_next_word_loop
    for i, batch in enumerate(tqdm(loader, desc=f'{time} {phase} ({epoch})')):
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/tqdm/std.py", line 1127, in __iter__
    for obj in iterable:
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 346, in __next__
    data = self.dataset_fetcher.fetch(index) # may raise StopIteration
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/tuf72841/MDVC/dataset/dataset.py", line 443, in __getitem__
    caption_data = next(self.caption_loader_iter)
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/iterator.py", line 156, in __iter__
    yield Batch(minibatch, self.dataset, self.device)
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/batch.py", line 34, in __init__
    setattr(self, name, field.process(batch, device=device))
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/field.py", line 237, in process
    tensor = self.numericalize(padded, device=device)
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in numericalize
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in <listcomp>
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in <listcomp>
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
KeyError: 'stairclimber'

@v-iashin
Owner

Hi Xinli,

Thanks for reporting.

I installed the env on another machine with a 1080Ti and couldn't reproduce the problem after training for 6 epochs.

I also found that conda records the spacy model in the environment under the pip packages but fails to install it, along with every package that is expected to be installed after it (torchtext in our case). I fixed it in 7873bea.

Anyway, let's see why you have this problem. It seems to be related to the text-processing parts. Please share:

  1. When do you get this error? How many epochs had you run before it appeared?
  2. Which versions of torchtext and spacy are you using?
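
One quick way to collect the versions asked for above is a short script like the following. This is only a sketch: it assumes Python 3.8+ (for `importlib.metadata`), and the helper name `installed_version` is made up for illustration. On the Python 3.7 env shown in the traceback, `pip list` gives the same information.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or a marker if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


# Packages that appear in the traceback of this issue
for name in ("torch", "torchtext", "spacy"):
    print(f"{name}: {installed_version(name)}")
```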

@v-iashin
Owner

v-iashin commented Jun 6, 2020

Assuming the problem was local. Please reopen if you think otherwise and provide more details.

@v-iashin v-iashin closed this as completed Jun 6, 2020
@VP-0822

VP-0822 commented Jun 18, 2020

Hi @yxinli92,
cc: @v-iashin
I tried to train the model on my own and also stumbled across this problem. I noticed that in the latest version of the PyTorch/text module there is an issue with how the unknown token is handled. Please refer to PyTorch/Text Unknown token for more details. In short, if I specify the unknown token explicitly, I can no longer reproduce this issue:

self.ASR_SUBTITLES_FIELD = data.ReversibleField(
    tokenize='spacy',
    init_token=self.start_token,
    eos_token=self.end_token,
    pad_token=self.pad_token,
    lower=True,
    batch_first=True,
    unk_token='<unk>',
)

If you have already solved it, please ignore this. I just wanted to point out the root cause for others who stumble across this problem.
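
To illustrate why an explicit unknown token can make the KeyError go away, here is a minimal sketch in plain Python. It does not use torchtext and is not its actual implementation; `build_stoi` and `numericalize` are hypothetical helpers that only mimic the string-to-index lookup seen in the traceback (`self.vocab.stoi[x]`). The assumption is that the mapping falls back to the unknown-token index only when such a token is registered.

```python
from collections import defaultdict


def build_stoi(tokens, unk_token=None):
    """Hypothetical helper: map each token to an index, optionally with an unk fallback."""
    specials = [unk_token] if unk_token is not None else []
    itos = specials + sorted(set(tokens))
    if unk_token is not None:
        # Unknown words fall back to index 0, where the unk token lives
        stoi = defaultdict(lambda: 0)
    else:
        # No default factory: looking up an unseen word raises KeyError
        stoi = defaultdict()
    stoi.update({tok: i for i, tok in enumerate(itos)})
    return stoi


def numericalize(batch, stoi):
    """Same nested lookup pattern as the failing line in the traceback."""
    return [[stoi[tok] for tok in example] for example in batch]


train_tokens = ["a", "man", "climbs", "stairs"]
val_batch = [["a", "man", "climbs", "a", "stairclimber"]]  # last word is unseen

# With an unknown token, the unseen word maps to index 0
ids = numericalize(val_batch, build_stoi(train_tokens, unk_token="<unk>"))
print(ids)

# Without one, the same lookup fails on the unseen word
missing = None
try:
    numericalize(val_batch, build_stoi(train_tokens))
except KeyError as err:
    missing = err.args[0]
print("KeyError on:", missing)
```

A word that appears only in the validation captions (such as 'stairclimber' here) is exactly the case where the fallback matters.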

@v-iashin
Owner

@VP-0822 This is a valuable comment. Thanks for sharing.
