
difference between implementation and LCF-paper #10

Closed
fhamborg opened this issue Aug 6, 2020 · 3 comments

Comments


fhamborg commented Aug 6, 2020

There is a difference between LCF-BERT as described in the paper and the implementation here. Specifically, the paper describes three self-attention layers (see, for example, Figure 3 in https://www.mdpi.com/2076-3417/9/16/3389/htm): one on the output of local BERT, one on the output of global BERT, and one in the feature interactive learning layer.

However, in the implementation here there is only one self-attention, and it is applied only to the local BERT output. So the self-attention on the global BERT output and the self-attention in the feature interactive learning layer are currently missing from the code. Could you let me know why that is?
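For reference, here is a minimal sketch (my own illustration, not code from either repository) of where the three MHSA blocks sit in the paper's Figure 3 versus the single MHSA here. The use of `torch.nn.MultiheadAttention`, the hidden size, and the dummy tensor shapes are assumptions made purely for illustration:

```python
# Hypothetical sketch of the two variants; not the authors' actual code.
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Thin wrapper: multi-head self-attention over a (batch, seq, hidden) tensor."""
    def __init__(self, hidden=768, heads=12):
        super().__init__()
        self.mha = nn.MultiheadAttention(hidden, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.mha(x, x, x)
        return out

hidden = 768
sa_local, sa_global, sa_interactive = SelfAttention(), SelfAttention(), SelfAttention()
linear_double = nn.Linear(hidden * 2, hidden)  # "feature interactive learning" projection

def lcf_paper(local_feat, global_feat):
    # Paper (Fig. 3): MHSA on the CDM/CDW-weighted local features, MHSA on the
    # global features, concatenate, project, then a third MHSA.
    local_feat = sa_local(local_feat)
    global_feat = sa_global(global_feat)
    cat = torch.cat([local_feat, global_feat], dim=-1)
    return sa_interactive(linear_double(cat))

def lcf_this_repo(local_feat, global_feat):
    # This repository (per the linked lcf_bert.py): only one MHSA, applied to the
    # local features before linear_double; no MHSA on global output or after projection.
    local_feat = sa_local(local_feat)
    cat = torch.cat([local_feat, global_feat], dim=-1)
    return linear_double(cat)

# Example with dummy BERT-sized features:
local = torch.randn(2, 80, hidden)  # output of BERT-local after CDM/CDW weighting
glob = torch.randn(2, 80, hidden)   # output of BERT-global
print(lcf_paper(local, glob).shape, lcf_this_repo(local, glob).shape)
```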


fhamborg commented Aug 6, 2020

Not sure if it's relevant, but the implementation at https://github.com/songyouwei/ABSA-PyTorch/blob/master/models/lcf_bert.py differs both from the implementation here and from the description in the paper. It also uses only one self-attention (compared to the three mentioned in the paper), but applies it after linear_double (https://github.com/songyouwei/ABSA-PyTorch/blob/b925abcdf43c51e475a64ed04984d4911b88676d/models/lcf_bert.py#L116), whereas in this repository the self-attention is applied before linear_double (https://github.com/yangheng95/LC-ABSA/blob/c945a94e0f86116c5578245aa9ad36c46c7b9c4a/models/lc_apc/lcf_bert.py#L56).
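For comparison, continuing the hypothetical sketch from my previous comment (same assumed modules, not their actual code), the ABSA-PyTorch ordering would look roughly like this:

```python
def lcf_absa_pytorch(local_feat, global_feat):
    # ABSA-PyTorch variant (per the linked line): the single self-attention is
    # applied AFTER linear_double, i.e. to the projected concatenation, rather
    # than to the local features as in this repository.
    cat = torch.cat([local_feat, global_feat], dim=-1)
    return sa_interactive(linear_double(cat))
```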

@yangheng95
Owner

Thank you for your interest. I modified the model when I cleaned and refactored the code, because I found that the model can achieve better results in some cases without the last MHSA, and removing it also reduces computation. Since the paper has already been published, we are only able to revise minor issues in it.

@fhamborg
Copy link
Author

fhamborg commented Aug 7, 2020

Alright, thanks for letting me know. Since you mentioned that the current implementation seems to perform better than the one described in the paper, I'll also stick with the current implementation. FYI: note the other issue opened in the ABSA-PyTorch repository, where there is also a difference between their implementation and your implementation here.

@fhamborg fhamborg closed this as completed Sep 1, 2020