[BUG] The result of SUMBT model #56

Closed
zyds opened this issue Jul 17, 2020 · 5 comments

@zyds

zyds commented Jul 17, 2020

Hi! When I evaluated the translation-training SUMBT model, I found that eval mode was never set, which affects the results. In my local test, this makes a difference of about two points in joint accuracy on the MultiWOZ-zh human-val dataset. I think the SUMBT model needs to be re-evaluated after the code is fixed, because the currently reported results do not reflect the model's real performance.

My local results on MultiWOZ-zh:
Without eval mode:
{'Joint Acc': 0.4821722435545804, 'Turn Acc': 0.9738983360760534, 'Joint F1': 0.8826705748001639}
With eval mode:
{'Joint Acc': 0.49972572682391664, 'Turn Acc': 0.9751935149631128, 'Joint F1': 0.8885012208542876}
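
For readers unfamiliar with the problem, here is a minimal sketch of what setting eval mode looks like, assuming the model is a PyTorch `nn.Module`. The `evaluate` helper and the toy model are hypothetical and are not the project's SUMBT code. They only illustrate that layers such as dropout stay random until `eval()` is called.

```python
import torch
from torch import nn


def evaluate(model, inputs):
    """Run one forward pass in eval mode (hypothetical helper)."""
    was_training = model.training
    model.eval()               # disable dropout and similar train-only behavior
    with torch.no_grad():      # no gradients are needed for evaluation
        outputs = model(inputs)
    model.train(was_training)  # restore the previous mode
    return outputs


# Toy stand-in for a real model: the dropout layer is random in train mode.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))
inputs = torch.ones(3, 4)

first = evaluate(model, inputs)
second = evaluate(model, inputs)
print(torch.equal(first, second))  # eval-mode outputs are deterministic
```

Without the `model.eval()` call, dropout keeps zeroing random activations at inference time, so the reported metrics are both noisy and lower than they should be.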

@zyds zyds added the bug Something isn't working label Jul 17, 2020
@zyds
Author

zyds commented Jul 17, 2020

def reformat_state(state):
    # Accept either a full turn dict or the belief state itself.
    if 'belief_state' in state:
        state = state['belief_state']
    new_state = []
    for domain in state.keys():
        domain_data = state[domain]
        # Only the 'semi' slots are read; everything else in the domain is skipped.
        if 'semi' in domain_data:
            domain_data = domain_data['semi']
            for slot in domain_data.keys():
                val = domain_data[slot]
                # Skip unfilled slots (English and Chinese "not mentioned" variants).
                if val is not None and val not in ['', 'not mentioned', '未提及', '未提到', '没有提到']:
                    new_state.append(domain + '-' + slot + '-' + val)
    # lower
    new_state = [item.lower() for item in new_state]
    return new_state

This code is from dst/evaluate.py. I have a question about the dialog state: is it correct to calculate the metrics using only the semi part?
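
To make the question concrete, here is a small sketch of what the function does with a state that has both a `book` and a `semi` part. The function body is copied from the comment above. The example states and the `joint_match` helper are hypothetical and only assume that a turn counts as correct when the flattened sets are equal, which may differ from how dst/evaluate.py actually scores a turn.

```python
def reformat_state(state):
    if 'belief_state' in state:
        state = state['belief_state']
    new_state = []
    for domain in state.keys():
        domain_data = state[domain]
        if 'semi' in domain_data:
            domain_data = domain_data['semi']
            for slot in domain_data.keys():
                val = domain_data[slot]
                if val is not None and val not in ['', 'not mentioned', '未提及', '未提到', '没有提到']:
                    new_state.append(domain + '-' + slot + '-' + val)
    new_state = [item.lower() for item in new_state]
    return new_state


def joint_match(pred_state, gold_state):
    """Hypothetical turn-level check: flattened states must match exactly."""
    return set(reformat_state(pred_state)) == set(reformat_state(gold_state))


# Made-up states that differ only in a 'book' slot.
gold = {'belief_state': {
    'hotel': {'book': {'booked': [], 'people': '2'},
              'semi': {'area': 'North', 'name': 'not mentioned'}},
    'taxi': {'book': {'booked': []}, 'semi': {'destination': ''}},
}}
pred = {'belief_state': {
    'hotel': {'book': {'booked': [], 'people': '4'},
              'semi': {'area': 'North', 'name': 'not mentioned'}},
    'taxi': {'book': {'booked': []}, 'semi': {'destination': ''}},
}}

print(reformat_state(gold))     # ['hotel-area-north']
print(joint_match(pred, gold))  # True: the differing 'book' slot is ignored
```

Under this sketch, a prediction that gets a `book` slot wrong is still counted as a match, which is the behavior the question is asking about.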

@zqwerty
Member

zqwerty commented Jul 17, 2020

@function2-llx, please take a look at this.

@zqwerty
Member

zqwerty commented Jul 18, 2020

How did you get the above results? By running dst/evaluate.py?

@zyds
Author

zyds commented Jul 18, 2020

Yes, by running dst/evaluate.py. The results I reported were not obtained with the pretrained model provided by the project, but I got similar results when I used the provided pretrained model.

@zqwerty
Member

zqwerty commented Aug 6, 2020

update SUMBT & test results #69

@zqwerty zqwerty closed this as completed Aug 6, 2020