Data Directory used when running test_phrase_grammar.py #24

YianZhang · 2020-04-19T05:39:32Z

Hi Yikang and other Contributors,

Thank you for making the source code public! I am trying to reproduce your results, but I am not sure what path to pass as the --data command-line argument of test_phrase_grammar.py. I downloaded the PTB data and am currently passing treebank_3/parsed/mrg as the data argument, but it does not work.

The listings under treebank_3/parsed/mrg:
atis brown readme.mrg swbd wsj

The listings under treebank_3/parsed/mrg/wsj:

00 06 12 18 24
01 07 13 19 MERGE.LOG
02 08 14 20
03 09 15 21
04 10 16 22
05 11 17 23

Thank you for your time!
Ian

yikangshen · 2020-04-20T13:54:28Z

Hi Ian,
You need to copy the wsj folder to ~/nltk_data/corpora/ptb/WSJ.
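
For anyone following along, the copy step above could be scripted roughly as follows. This is only a sketch: `install_wsj` and its parameters are hypothetical names, not part of the repository, and it assumes the source folder is the `wsj` directory from treebank_3/parsed/mrg and that the default NLTK data directory is `~/nltk_data`.

```python
import shutil
from pathlib import Path


def install_wsj(wsj_src, nltk_data_dir=None):
    """Copy a Treebank wsj folder to <nltk_data>/corpora/ptb/WSJ.

    Hypothetical helper, not part of Ordered-Neurons.
    """
    # Assumption: fall back to ~/nltk_data, the location named in the reply.
    root = Path(nltk_data_dir) if nltk_data_dir else Path.home() / "nltk_data"
    src = Path(wsj_src)
    if not src.is_dir():
        raise FileNotFoundError("not a directory: {}".format(src))
    dest = root / "corpora" / "ptb" / "WSJ"
    dest.parent.mkdir(parents=True, exist_ok=True)
    # dirs_exist_ok requires Python 3.8 or newer.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest


# Example call (path taken from the question above):
# install_wsj("treebank_3/parsed/mrg/wsj")
```

The function returns the destination path, so it is easy to confirm that the section folders (00 to 24) ended up under WSJ.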

YianZhang · 2020-04-20T21:04:42Z

Hi Yikang,

Thanks for the response! I figured that out. However, what is args.data in test_phrase_grammar used for?

Thanks,
Ian

yikangshen · 2020-04-21T17:12:11Z

It points to the dictionary that the model actually uses.

YianZhang · 2020-04-28T03:11:46Z

> It points to the dictionary that the model actually uses.

Thanks for the response! Do you mean "directory" or "dictionary"?

Best,
Ian

yikangshen · 2020-04-28T03:12:23Z

Dictionary

yikangshen · 2020-04-28T03:13:04Z

While testing parsing F1, the model still needs to load the dictionary from the training corpus.

YianZhang · 2020-04-28T03:23:12Z

Thanks for your prompt response!

After carefully checking your code, I believe the dictionary is loaded from a fixed path:

Ordered-Neurons/test_phrase_grammar.py

Lines 279 to 282 in 46d63cd

    
    fn = 'corpus.{}.data'.format(hashlib.md5('data/penn'.encode()).hexdigest())
    print('Loading cached dataset...')
    corpus = torch.load(fn)
    dictionary = corpus.dictionary

And args.data is used as the directory of the test data:

Ordered-Neurons/test_phrase_grammar.py

Line 293 in 46d63cd

corpus = data_ptb.Corpus(args.data)

Am I correct?

Thanks again for your help! I would appreciate it if you could also look at my other issue, #25. As far as I know, this problem also confuses other researchers.

Best,
Ian

shawntan · 2020-04-28T07:06:17Z

The code assumes you have the cached dataset in the directory, and it would be cached if the training script was run prior to test_phrase_grammar.py.

But yes, you are correct.
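
To check whether that cache is present before running the test script, something like the sketch below may help. It reuses the filename expression from the quoted lines of test_phrase_grammar.py; the helper names `cached_corpus_filename` and `cache_exists` are hypothetical, and the default of looking in the current working directory is an assumption based on the script loading a bare filename.

```python
import hashlib
from pathlib import Path


def cached_corpus_filename(data_path="data/penn"):
    """Build the cache filename the same way the quoted snippet does."""
    digest = hashlib.md5(data_path.encode()).hexdigest()
    return "corpus.{}.data".format(digest)


def cache_exists(directory=".", data_path="data/penn"):
    """Return True if the cached corpus file is present in `directory`."""
    # Assumption: the cache sits in the directory the script is run from.
    return (Path(directory) / cached_corpus_filename(data_path)).is_file()


if __name__ == "__main__":
    print(cached_corpus_filename())
    print("cache found" if cache_exists() else "cache missing: run training first")
```

Note that the hash is computed from the fixed string 'data/penn', not from args.data, so the filename is the same no matter what is passed as --data.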

YianZhang · 2020-04-29T00:21:18Z

Thanks a lot!

yikangshen closed this as completed Apr 23, 2020
