Reproducibility #3
Thanks for reporting these bugs, and sorry for the late response. What you said is correct; apologies for the poor running experience. I have fixed them in this version. Thanks again for the report.
Thanks for the fix! However, I'm still having issues getting the same results as in your paper. Namely, when I try to reproduce WN18RR, I run (as stated in the readme)
Now, averaging these, e.g. Hits@1, does not match the paper: (tail-hits1 + head-hits1) / 2 = 0.24712827058072752, whereas the paper reports 0.459 (Table 4). Could you please provide more information (e.g. the hyperparameter setup for the RotatE model) so I can get the exact results from the paper? Or did I use the commands incorrectly somewhere? (For example, should the last one, ensemble/run.py, be executed twice with different modes, the first time with train and the second time with --init?) I'd like to use the StAR model, but I need correct results as a starting point.
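For reference, this is exactly how I combine the two directions (the helper and variable names below are mine, not from the repo):

```python
# How I average the direction-specific metrics from the two evaluation runs;
# the function and variable names are mine, not the repo's.
def combined_metric(head_value: float, tail_value: float) -> float:
    """Average a head-prediction metric with its tail-prediction counterpart."""
    return (head_value + tail_value) / 2.0

# For my run, combined_metric(head_hits1, tail_hits1)
# == 0.24712827058072752, versus 0.459 in Table 4.
```

best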
Your running commands seem correct, and I just used the official RotatE hyperparameters to train the model on WN18RR. The trained model reported in the paper has been lost, so I will reproduce the results soon and let you know. By the way, what results did you obtain for StAR and RotatE on WN18RR?
Final lines from train.log
And the results of StAR?
Sorry, and thanks for the help. Here is the content of WN18RR_roberta-large/link_prediction_metrics.txt, which seems quite similar to what's in Table 4. RotatE's results are also quite close to Table 4.
Got it. I will try to find out the reason and tell you later. |
Hi, any success reproducing the results? Meanwhile, I have another question about the ensemble model. It is trained twice, once for the tail-prediction task and once for the head-prediction task, right? So, for a slightly different task, scoring a single triple (e1, r2, e3), one would have to average the outputs of the head-trained and the tail-trained model for the same query (e1, r2, e3), right?
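A minimal sketch of what I mean, with placeholder models (`predict` is a stand-in for whatever scoring call the trained models actually expose):

```python
# Placeholder sketch: score one triple by averaging the two direction-specific
# models. head_model / tail_model stand in for the ensemble models trained on
# head prediction and tail prediction respectively.
def score_triple(head_model, tail_model, e1, r, e2):
    head_score = head_model.predict(e1, r, e2)  # hypothetical scoring call
    tail_score = tail_model.predict(e1, r, e2)
    return (head_score + tail_score) / 2.0
```

thx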
Sorry for the very late response. There were some bugs in the code and commands before. Thanks for reporting them.
By the way, the performance of the ensemble model may not be stable. For the command in 5.2, you can just use 'add' for --feature_method and run do_prediction only to get a suboptimal result, which corresponds to StAR (Ensemble) in Table 4 of the paper. As for your second question, I think what you described is one way to approach the triple classification task; alternatively, you can modify the code to adapt it to the task. You can refer to the code of KG-BERT, which implements triple classification.
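Conceptually, an 'add'-style combination is just an element-wise sum of the two models' candidate scores before ranking. A rough illustration (the array names, file paths, and plain unweighted sum are placeholders, not the actual code in this repo):

```python
import numpy as np

# Illustration only: combine two score matrices of shape
# (num_queries, num_candidates) by element-wise addition, then rank
# candidates per query. Paths and names are hypothetical.
star_scores = np.load("star_scores.npy")
rotate_scores = np.load("rotate_scores.npy")
combined = star_scores + rotate_scores
ranking = np.argsort(-combined, axis=1)  # best-scoring candidate first
```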
Sorry, I fixed a small bug just now. If you followed the previous version, the generated files were saved under the wrong paths and names; you can move the files to the correct directory.
Hi guys, good work, but I'm struggling a bit to reproduce your results. Nothing serious, but it would be better to have a clone-and-use approach. So far I have encountered these little obstacles:
Could you please provide a fix for the missing similarity_score_mtx.npy file? I could simply remove the commented line, but there is no mention of how to use get_ensembled_data.py.
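As a stopgap I thought about generating a placeholder file myself, assuming it is just a NumPy score matrix (the shape below is my guess):

```python
import numpy as np

# Stopgap guess: write a placeholder similarity matrix so the pipeline can
# run end to end; the real file should presumably be produced by
# get_ensembled_data.py. The (num_queries, num_entities) shape is my
# assumption; 3,134 test triples and 40,943 entities are WN18RR's sizes.
sim_mtx = np.zeros((3134, 40943), dtype=np.float32)
np.save("similarity_score_mtx.npy", sim_mtx)
```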
best
Martin