textsum:AssertionError: Empty filelist. #370

Closed
loretoparisi opened this issue Aug 30, 2016 · 5 comments
Labels: stat:awaiting model gardener (Waiting on input from TensorFlow model gardener)

@loretoparisi

Please let us know which model this issue is about (specify the top-level directory)

textsum

I get this error during training:

Exception in thread Thread-152:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Volumes/MacHDD2/Developmemt/ParisiLabs/ML/models/textsum/batch_reader.py", line 135, in _FillInputQueue
    (article, abstract) = input_gen.next()
  File "/Volumes/MacHDD2/Developmemt/ParisiLabs/ML/models/textsum/batch_reader.py", line 244, in _TextGenerator
    e = example_gen.next()
  File "/Volumes/MacHDD2/Developmemt/ParisiLabs/ML/models/textsum/data.py", line 90, in ExampleGen
    assert filelist, 'Empty filelist.'
AssertionError: Empty filelist.
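The assertion fires before any data is read. Below is a minimal sketch of the check, assuming (as the message and the `data.py` line suggest) that `ExampleGen` expands `--data_path` with `glob.glob` and asserts that the result is non-empty. `expand_data_path` is a hypothetical helper, not part of textsum, and the sketch targets Python 3:

```python
import glob
import os
import tempfile


def expand_data_path(data_path):
    """Hypothetical helper mirroring the assumed check in data.py's ExampleGen.

    glob.glob returns an empty list when nothing matches the pattern, and the
    assert then fails with the same 'Empty filelist.' message.
    """
    filelist = glob.glob(data_path)
    assert filelist, 'Empty filelist.'
    return sorted(filelist)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        # Recreate the reporter's layout: only 'data' and 'vocab' exist.
        for name in ('data', 'vocab'):
            open(os.path.join(root, name), 'w').close()
        try:
            expand_data_path(os.path.join(root, 'training-*'))
        except AssertionError as err:
            print('AssertionError:', err)  # AssertionError: Empty filelist.
        # A pattern that names an existing file expands to that file.
        print(expand_data_path(os.path.join(root, 'data')))
```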

I have the data and vocab files, as well as the WORKSPACE, in the same directory:

(tensorflow) admin@macbookproloreto:~/Developmemt/ParisiLabs/ML/models/data$ ls -l
total 320
-rw-r--r--  1 admin  staff   33582 30 Ago 17:45 data
-rw-r--r--  1 admin  staff  124934 30 Ago 17:45 vocab

The command I launched was:

$ bazel-bin/textsum/seq2seq_attention --mode=train --article_key=article --abstract_key=abstract --data_path=data/training-* --vocab_path=data/vocab --log_root=textsum/log_root --train_dir=textsum/log_root/train
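The `--data_path` pattern can be checked against the listing above without running the model. This sketch assumes the command is run from the `models` directory, one level above the listing, so the files appear as `data/data` and `data/vocab`. It uses the standard library's `fnmatch`, whose `*` also matches `/` (unlike `glob`), which makes no difference for these names. `matching` is a hypothetical helper:

```python
import fnmatch

# Paths relative to models/, taken from the `ls -l` listing above
# (assumes the command is run from the models/ directory).
available = ['data/data', 'data/vocab']


def matching(pattern, paths):
    """Return the paths selected by a shell-style glob pattern."""
    return [path for path in paths if fnmatch.fnmatch(path, pattern)]


print(matching('data/training-*', available))  # []
print(matching('data/data', available))        # ['data/data']
```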

@poxvoculi added the stat:awaiting model gardener label on Aug 30, 2016
@loretoparisi (Author) commented Aug 31, 2016

I have also tried the same thing on:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.4 LTS
Release:    14.04
Codename:   trusty

My directory structure is:

ubuntu@ip-10-169-182-86:~/tensorflow/lyrics$ ls -R
.:
bazel-bin  bazel-genfiles  bazel-lyrics  bazel-out  bazel-testlogs  data  textsum  WORKSPACE

./data:
data  vocab

./textsum:
batch_reader.py   beam_search.py   BUILD  data.py   README.md                    seq2seq_attention_decode.pyc  seq2seq_attention_model.pyc  seq2seq_lib.py
batch_reader.pyc  beam_search.pyc  data   data.pyc  seq2seq_attention_decode.py  seq2seq_attention_model.py    seq2seq_attention.py         seq2seq_lib.pyc

./textsum/data:
data  vocab

I do not have any training-* files matching the --data_path pattern data/training-*. The expected data directory looks like this instead:

test-0  training-0  training-1  validation-0 ...(omitted)

So I assume that was the source of this issue.
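Against shard names like those in the expected listing, a `training-*` pattern would select only the training shards, while validation or test shards would need their own pattern. A short sketch with the standard library's `fnmatch.filter`, using just the four names shown (the omitted entries are left out, and the `data/` prefix is dropped for brevity):

```python
import fnmatch

# Shard names from the expected listing above (omitted entries left out).
shards = ['test-0', 'training-0', 'training-1', 'validation-0']

print(fnmatch.filter(shards, 'training-*'))    # ['training-0', 'training-1']
print(fnmatch.filter(shards, 'validation-*'))  # ['validation-0']
```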

The command was:

$ bazel-bin/textsum/seq2seq_attention   --mode=train   --article_key=article   --abstract_key=abstract   --data_path=data/training-*   --vocab_path=data/vocab   --log_root=textsum/log_root   --train_dir=textsum/log_root/train

@jamcar23

@loretoparisi, you need to give it the correct path to your data. You're telling it to look for data/training-*, but you don't have anything that matches that. It looks like you need to use --data_path=data/data.
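The suggested path can be checked on a throwaway copy of the workspace. This sketch rebuilds only the files from the Ubuntu `ls -R` listing in a temporary directory and expands each candidate `--data_path` relative to it. It assumes the command is run from the workspace root (`~/tensorflow/lyrics`) and that relative patterns resolve against the current directory; `build_workspace` and `matches` are hypothetical helpers:

```python
import glob
import os
import tempfile


def build_workspace(root):
    """Recreate the relevant files from the Ubuntu `ls -R` listing."""
    open(os.path.join(root, 'WORKSPACE'), 'w').close()
    for subdir in ('data', os.path.join('textsum', 'data')):
        os.makedirs(os.path.join(root, subdir), exist_ok=True)
        for name in ('data', 'vocab'):
            open(os.path.join(root, subdir, name), 'w').close()


def matches(root, pattern):
    """Expand a relative glob pattern as if run from the workspace root."""
    hits = glob.glob(os.path.join(root, pattern))
    return sorted(os.path.relpath(hit, root) for hit in hits)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        build_workspace(root)
        for pattern in ('data/training-*', 'data/data', 'textsum/data/data'):
            print(pattern, '->', matches(root, pattern))
```

On Linux this prints an empty list for `data/training-*` and a single match for each of `data/data` and `textsum/data/data`, so either of the latter two would satisfy the `Empty filelist.` check.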

@peterjliu (Contributor)

Yes, thanks @jamcar23. The issue is that the toy data provided is data/data. We don't provide the full training data in the repo.

@loretoparisi (Author)

@peterjliu OK, thanks. Where can we get the training data, then?

@monajalal

@panyx0718 @peterjliu Given that this data is not free and there apparently is no other data in this format, would it make sense to train with the command below?

jalal@klein:~/computer_vision/tensorflow/models$ bazel-bin/textsum/seq2seq_attention --mode=train --article_key=article --abstract_key=abstract --data_path=textsum/data/data --vocab_path=textsum/data/vocab --log_root=textsum/log_root --train_dir=textsum/log_root/train

What other ways are there to get data into the data folder?

http://stackoverflow.com/questions/39182142/tensorflow-text-summarization-setup-what-is-a-workspace-file/39184573#39184573
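Getting other data into the folder means writing it in the binary layout textsum reads. This is a stdlib-only sketch, assuming each file is a sequence of records made of an 8-byte length (`struct` format `'q'`) followed by that many bytes of a serialized `tf.Example` holding the article and abstract. The blobs below are placeholders rather than real protos, and `write_records` and `read_records` are hypothetical helpers, not textsum functions:

```python
import io
import struct


def write_records(stream, blobs):
    """Write each blob as <8-byte length><blob> (assumed textsum layout)."""
    for blob in blobs:
        stream.write(struct.pack('q', len(blob)))
        stream.write(blob)


def read_records(stream):
    """Yield the blobs back from a stream written by write_records."""
    while True:
        len_bytes = stream.read(8)
        if not len_bytes:
            break
        (length,) = struct.unpack('q', len_bytes)
        yield stream.read(length)


buf = io.BytesIO()
# Placeholder blobs; real data would be serialized tf.Example protos.
write_records(buf, [b'example-1', b'example-22'])
buf.seek(0)
print(list(read_records(buf)))  # [b'example-1', b'example-22']
```

With TensorFlow installed, each blob would presumably come from building a `tf.train.Example` with the `article` and `abstract` features named by `--article_key` and `--abstract_key` and calling `SerializeToString()` on it.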
