Confusion about argument detection evaluation #3
Comments
For the second case: in our original experiments we evaluated on the tuple (event type, role), and a prediction was treated as the N/A class if and only if the event type is N/A or the role is N/A. This code was fully rewritten because the original code was quite messy, and I apologize for missing this point in the rewrite. I have now fixed it; please check the commit history for the modification. I also tested the modified code, and the results remain the same as in the paper. Just a reminder: because evaluating on tuples is stricter, you need to do some extra dataset preprocessing to reach 53.5 with DMCNN, such as cutting sentences whose length exceeds a threshold. Please do that preprocessing yourself. Also, according to our experimental results, once DMCNN reaches 53.5 under each of these two evaluation metrics, the other models perform similarly under each metric as well. In other words, this is just a data preprocessing trick, and it is one of the main disadvantages of ACE2005.

For the first case: when a trigger is classified as N/A, it is not necessary to run the EAE stage with that trigger, because all entities are N/A in that situation and are pure noise when testing EAE. In fact, all models are tested this way. If you want to see what happens when the first case is taken into consideration, you can simply comment out lines 167 and 168 in
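As a rough illustration of the tuple-based evaluation described above, here is a minimal sketch. It is not the repository's code: the names `NA`, `is_na`, and `tuple_f_score` are hypothetical, and it assumes gold and predicted labels are aligned lists of (event type, role) pairs, one per candidate argument.

```python
from typing import List, Tuple

NA = "N/A"  # assumed label for the negative class
Pair = Tuple[str, str]  # (event type, role)


def is_na(pair: Pair) -> bool:
    """A pair counts as N/A if either the event type or the role is N/A."""
    event_type, role = pair
    return event_type == NA or role == NA


def tuple_f_score(gold: List[Pair], pred: List[Pair]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 where a prediction is correct only if
    both the event type and the role match the gold pair."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned")
    # A non-N/A prediction equal to the gold pair implies the gold is non-N/A too.
    correct = sum(1 for g, p in zip(gold, pred) if not is_na(p) and g == p)
    n_pred = sum(1 for p in pred if not is_na(p))
    n_gold = sum(1 for g in gold if not is_na(g))
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [("Attack", "Target"), ("Attack", "Attacker"), (NA, NA)]
pred = [("Attack", "Target"), ("Die", "Attacker"), (NA, NA)]
print(tuple_f_score(gold, pred))
```

In this toy input the second prediction has the right role but the wrong event type, so a role-only comparison would count it as correct while the tuple comparison does not.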
Thank you for your response. I'm okay with the second case, and thank you for the modification. For the first case, if we mislabel a trigger as the None type, the trigger and its arguments are not included in the EAE stage when performance is calculated. I get your point, but it doesn't look like the right way to me. Thank you again for the modification.
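To make the first case concrete, here is a small sketch of how the counts change depending on whether arguments of triggers predicted as N/A are skipped. It is only an illustration under assumed data structures, not the repository's implementation; `eae_counts` and the candidate layout are hypothetical.

```python
from typing import List, Tuple

NA = "N/A"  # assumed label for the negative class
Pair = Tuple[str, str]  # (event type, role)
# Each candidate: (predicted event type of its trigger, gold pair, predicted pair)
Candidate = Tuple[str, Pair, Pair]


def is_na(pair: Pair) -> bool:
    return pair[0] == NA or pair[1] == NA


def eae_counts(cands: List[Candidate], skip_na_triggers: bool) -> Tuple[int, int, int]:
    """Return (correct, n_pred, n_gold) for argument extraction.

    With skip_na_triggers=True, candidates whose trigger was predicted as
    N/A are dropped entirely, so their gold arguments never count as misses.
    """
    correct = n_pred = n_gold = 0
    for pred_event, gold_pair, pred_pair in cands:
        if skip_na_triggers and pred_event == NA:
            continue
        if not is_na(gold_pair):
            n_gold += 1
        if not is_na(pred_pair):
            n_pred += 1
            if pred_pair == gold_pair:
                correct += 1
    return correct, n_pred, n_gold


cands = [
    ("Attack", ("Attack", "Target"), ("Attack", "Target")),
    (NA, ("Die", "Victim"), (NA, NA)),  # trigger missed by the ED stage
]
print(eae_counts(cands, skip_na_triggers=True))
print(eae_counts(cands, skip_na_triggers=False))
```

With skipping enabled, the missed trigger's gold argument is left out of the gold count, so recall is 1/1. With skipping disabled it counts as a false negative, and recall drops to 1/2.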
Another question: which event detection model is used for the HMEAE(CNN) and HMEAE(BERT) results reported in Tables 3 and 4? Is DMCNN used as the ED model? The experimental setting is confusing to me, since you used two event detection models:
Or does HMEAE(CNN) use DMCNN as the ED model, while HMEAE(BERT) uses the model from Wang et al. (2019) as the ED model? Thank you.
The second one, i.e., DMCNN as ED + HMEAE(CNN) as EAE, and DMBERT as ED + HMEAE(BERT) as EAE.
Got it, and thanks!
Thank you very much for releasing the source code for this paper.
However, I notice that you use func/f_score to calculate argument detection performance, which basically checks only whether the predicted roles match the gold roles. The event types are ignored in the evaluation. I think something is wrong, considering that the criteria are as follows:
There are some cases you have probably missed:
Correct me if I'm wrong and thank you again.