[BUG] The result of SUMBT model #56

zyds · 2020-07-17T01:51:17Z

Hi! When I evaluated the SUMBT model from translation training, I found that the model was never put into eval mode, which affects the results. In my local test on the MultiWOZ-zh human-val dataset, setting eval mode changes Joint Acc by almost two points (0.482 vs. 0.500). I think the SUMBT model needs to be re-evaluated after the code is fixed, because the current results do not reflect the model's real performance.

My local results on MultiWOZ-zh:

Without eval mode:
{'Joint Acc': 0.4821722435545804, 'Turn Acc': 0.9738983360760534, 'Joint F1': 0.8826705748001639}

With eval mode:
{'Joint Acc': 0.49972572682391664, 'Turn Acc': 0.9751935149631128, 'Joint F1': 0.8885012208542876}
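For readers unfamiliar with why eval mode matters, here is a minimal pure-Python sketch. It assumes the gap comes from dropout staying active during evaluation, which this thread does not confirm. `ToyDropout` is a hypothetical stand-in, not code from the project; in PyTorch the corresponding switch is calling `model.eval()` before running evaluation.

```python
import random


class ToyDropout:
    """Inverted dropout with a train/eval switch (hypothetical toy layer)."""

    def __init__(self, p=0.5, seed=0):
        self.p = p                      # probability of zeroing an element
        self.training = True            # training mode is the default state
        self.rng = random.Random(seed)  # private RNG so the sketch is repeatable

    def eval(self):
        # Switch to evaluation mode: the layer becomes the identity function.
        self.training = False
        return self

    def __call__(self, xs):
        if not self.training:
            return list(xs)
        keep = 1.0 - self.p
        # Zero each element with probability p and rescale the survivors.
        return [x / keep if self.rng.random() < keep else 0.0 for x in xs]


features = [1.0] * 8
layer = ToyDropout(p=0.5, seed=0)

train_out = layer(features)   # perturbed: every value is either 0.0 or 2.0

layer.eval()
eval_out_1 = layer(features)  # unchanged input
eval_out_2 = layer(features)  # identical on every call

print("train mode:", train_out)
print("eval mode: ", eval_out_1)
```

In training mode the features that reach the classifier are randomly perturbed, so metrics computed in that state can differ from the deterministic eval-mode numbers.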


zyds · 2020-07-17T09:46:23Z

def reformat_state(state):
    if 'belief_state' in state:
        state = state['belief_state']
    new_state = []
    for domain in state.keys():
        domain_data = state[domain]
        if 'semi' in domain_data:
            domain_data = domain_data['semi']
            for slot in domain_data.keys():
                val = domain_data[slot]
                if val is not None and val not in ['', 'not mentioned', '未提及', '未提到', '没有提到']:
                    new_state.append(domain + '-' + slot + '-' + val)
    # lower
    new_state = [item.lower() for item in new_state]
    return new_state

This code is from dst/evaluate.py. I have a question about the dialog state: is it correct to compute the metrics using only the 'semi' part?
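To make the question concrete, the sketch below runs the quoted function on a small made-up state. The slot names and values in `sample_state` are hypothetical, chosen only to show that entries under 'book' never reach the flattened output.

```python
def reformat_state(state):
    # Same logic as the function quoted above from dst/evaluate.py.
    if 'belief_state' in state:
        state = state['belief_state']
    new_state = []
    for domain in state.keys():
        domain_data = state[domain]
        if 'semi' in domain_data:
            domain_data = domain_data['semi']
            for slot in domain_data.keys():
                val = domain_data[slot]
                if val is not None and val not in ['', 'not mentioned', '未提及', '未提到', '没有提到']:
                    new_state.append(domain + '-' + slot + '-' + val)
    # lower
    new_state = [item.lower() for item in new_state]
    return new_state


# Hypothetical example state: 'semi' and 'book' both carry values.
sample_state = {
    'belief_state': {
        'hotel': {
            'semi': {'area': 'North', 'name': '未提及'},
            'book': {'people': '2', 'day': 'friday'},
        },
        'taxi': {
            'semi': {'leaveAt': ''},
            'book': {},
        },
    }
}

flat = reformat_state(sample_state)
print(flat)  # ['hotel-area-north']
```

Only the filled 'semi' slot survives; the 'book' values for people and day are ignored, so the metrics computed from this output do not score them.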

zqwerty · 2020-07-17T11:28:03Z

@function2-llx please look at this

zqwerty · 2020-07-18T02:29:15Z

How did you get the above results? By running dst/evaluate.py?

zyds · 2020-07-18T02:38:44Z

Yes, but the results I reported were not obtained with the pretrained model provided by the project. With the project's pretrained model, however, I got similar results.

zqwerty · 2020-08-06T10:45:03Z

update SUMBT & test results #69

zyds added the bug label Jul 17, 2020

zqwerty assigned zz-jacob Jul 18, 2020

zqwerty closed this as completed Aug 6, 2020
