Running with Custom Dataset #3

aflah02 · 2022-08-16T18:12:24Z

Hey!
Great Paper!!
Can you share some instructions for formatting a custom dataset as well?

hzhwcmhf · 2022-08-17T01:52:13Z

You can download the preprocessed Yelp dataset and format your custom dataset following the instructions in the READMEs.

Please feel free to ask if you have any further questions.

aflah02 · 2022-08-18T08:57:02Z

@hzhwcmhf Thanks for the instructions!

aflah02 · 2022-08-18T17:50:28Z

@hzhwcmhf Can the test files be run without multiple human references as well? The paper mentions Luo et al. (2019) for the Yelp dataset, since they provided multiple references, but there is no such mention for GYAFC. I don't have multiple human references, so I'd like to know whether the code already handles single references automatically or whether I would need to make the changes manually.

hzhwcmhf · 2022-08-19T05:42:26Z

Hi, @aflah02

First, we use multiple human references for GYAFC as well. You can find the references here. Multiple references are recommended when evaluating style transfer models, since they cover more of the possible transferred phrasings and so give more reliable results.

Second, it should be OK if your test files contain only one reference per sample. For example, the test file can be:

ever since joes has changed hands it 's just gotten worse and worse .
ever since joes has changed hands it 's gotten better and better .

there is definitely not enough room in that part of the venue .
there is so much room in that part of the venue

......
(NOTE: THE BLANK LINE IS REQUIRED)

(If it does not work, please tell me. I will figure out the problem.)
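As an illustration of the layout above, here is a minimal Python sketch that writes a test file with one source sentence, its reference(s), and the required blank line for each sample. The function name `write_test_file` and the sample structure are my own assumptions for illustration; they are not part of the NAST code.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def write_test_file(samples: Iterable[Tuple[str, List[str]]], path: str) -> str:
    """Write samples as: source line, reference line(s), blank line.

    `samples` is an iterable of (source, references) pairs. A single
    reference is simply a list with one element. This helper is a
    hypothetical sketch, not a function from the repository.
    """
    lines: List[str] = []
    for source, references in samples:
        lines.append(source.strip())
        lines.extend(ref.strip() for ref in references)
        lines.append("")  # the blank line that ends this sample's references
    text = "\n".join(lines) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return text
```

Calling it with pairs such as `("there is definitely not enough room in that part of the venue .", ["there is so much room in that part of the venue"])` produces the two-lines-plus-blank-line pattern shown in the example; the output path is whatever your data directory expects.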

Moreover, you can change the format of the input files here:

NAST/styletransformer/main.py

Lines 35 to 42 in ef765d4

    data_arg.fields = {
        "train_0": OrderedDict([("sent", "SentenceDefault")]),
        "train_1": OrderedDict([("sent", "SentenceDefault")]),
        "dev_0": OrderedDict([("sent", "SentenceDefault")]),
        "dev_1": OrderedDict([("sent", "SentenceDefault")]),
        "test_0": OrderedDict([("sent", "SentenceDefault"), ("ref", "SessionDefault")]),
        "test_1": OrderedDict([("sent", "SentenceDefault"), ("ref", "SessionDefault")]),
    }

Here, SentenceDefault denotes a single line, and SessionDefault denotes multiple lines terminated by an empty line.
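To make the two field types concrete, the following is a rough sketch of how a file could be split into samples under the description above. It is not the reader the repository actually uses; `read_samples` is a hypothetical helper, and the strings "SentenceDefault" and "SessionDefault" are treated only as labels that select one of the two reading rules.

```python
from collections import OrderedDict
from typing import Dict, Iterable, List, Union

Sample = Dict[str, Union[str, List[str]]]


def read_samples(lines: Iterable[str], fields: "OrderedDict[str, str]") -> List[Sample]:
    """Split lines into samples according to an ordered field spec.

    "SentenceDefault" consumes exactly one line.
    "SessionDefault" consumes lines until an empty line (or end of input).
    Illustrative only; the real data loader may behave differently.
    """
    it = iter(line.rstrip("\n") for line in lines)
    samples: List[Sample] = []
    while True:
        sample: Sample = {}
        consumed = 0
        for name, kind in fields.items():
            if kind == "SentenceDefault":
                line = next(it, None)
                if line is None:
                    break  # input exhausted
                consumed += 1
                sample[name] = line
            elif kind == "SessionDefault":
                block: List[str] = []
                for line in it:
                    consumed += 1
                    if line == "":
                        break  # the empty line ends the block
                    block.append(line)
                sample[name] = block
            else:
                raise ValueError(f"unknown field type: {kind}")
        if consumed == 0:
            return samples
        samples.append(sample)
```

With `fields = OrderedDict([("sent", "SentenceDefault"), ("ref", "SessionDefault")])`, a sample with one reference yields a one-element `ref` list and a sample with several references yields a longer list, which is why the blank line is needed in both cases.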
