
Reproducibility #3

Closed · martinsvat opened this issue Sep 10, 2021 · 10 comments

martinsvat commented Sep 10, 2021

Hi guys, good work, but I'm struggling a bit to reproduce your results. It's nothing serious, but it would be better to have a clone-and-use approach. So far I have run into these obstacles:

  • hardcoded paths, e.g. USER_HOME+"/workspace/StAR/data/" (a sketch of a configurable alternative follows this list)
  • It's handy to have the exact commands to reproduce WN18RR in the readme, but some of the paths there seem incorrect, e.g. run.py in the RotatE/ folder is invoked with ./result/WN18RR_roberta-large/ (that should be ../StAR/results/WN18RR_roberta-large, right?); I'm also guessing there is a typo in --output_dir ./result/FB15k-237_roberta-largel
  • When I followed your readme's commands exactly, I ended up with FileNotFoundError: [Errno 2] No such file or directory: './result/UMLS_model_roberta-large/similarity_score_mtx.npy'. The only place where */similarity_score_mtx.npy is saved is get_ensembled_data.py, which is not invoked anywhere in the project (the relevant call is commented out inside get_ensembled_data.py).
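
For illustration, here is a minimal sketch of how the data directory could be made configurable instead of hardcoded; the --data_dir flag and its default are just my suggestion, not the repo's actual interface:

import argparse
import os

# Sketch only: flag name and default are suggestions, not the repo's interface.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--data_dir",
    default=os.path.join(os.path.expanduser("~"), "workspace", "StAR", "data"),
    help="root directory of the StAR datasets (previously hardcoded)",
)
args = parser.parse_args()

# Hypothetical downstream use, replacing USER_HOME + "/workspace/StAR/data/":
wn18rr_dir = os.path.join(args.data_dir, "WN18RR")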

Please, can you provide a fix for the missing similarity_score_mtx.npy file? I could simply uncomment that line, but there is no mention of how get_ensembled_data.py is meant to be used.

best
Martin

wangbo9719 (Owner) commented

Thanks for reporting these bugs. Sorry for the late response.

You are right on all points; sorry for the poor running experience. I have fixed these issues in the current version.
Now similarity_score_mtx.npy can be obtained by uncommenting the get_similarity() call in get_ensembled_data.py.
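
For context, here is a rough sketch of the kind of computation that produces such a file, i.e. a dense score matrix saved with NumPy; the cosine-similarity scoring and all names are my assumptions, not the actual code:

import numpy as np

def get_similarity(query_emb, candidate_emb, out_path):
    # Sketch under assumptions: cosine similarity between L2-normalized
    # embedding matrices; the real get_similarity() in get_ensembled_data.py
    # may score candidates differently.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = candidate_emb / np.linalg.norm(candidate_emb, axis=1, keepdims=True)
    np.save(out_path, q @ c.T)  # e.g. .../similarity_score_mtx.npy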

Thanks again for your report.

martinsvat (Author) commented Sep 17, 2021

Thanks for the fix! However, I'm still not getting the same results as in your paper. Namely, to reproduce WN18RR I run (as stated in the readme):

  • get_new_dev_dict.py (twice with different parameters)
  • run_link_prediction.py (4.1)
  • learning RotatE using their best hyperparameter setup: bash run.sh train RotatE wn18rr 0 0 512 1024 500 6.0 0.5 0.00005 80000 8 -de
  • run_get_ensemble_data.py
  • ./codes/run.py
  • ./ensemble/run.py once with tail and once with head mode

In the end, I get:

head Hits @1: 0.20357370772176134
head Hits @3: 0.4572431397574984
head Hits @10: 0.6726228462029356
head Mean rank: 57.47383535417996
head Mean reciprocal rank: 0.3644130875627938

---------tail, test, lr=0.001, ep=3.0, nt=5, margin=0.6, bs=32 feature=mix metric ----------
tail Hits @1: 0.2906828334396937
tail Hits @3: 0.5370134014039566
tail Hits @10: 0.7565411614550096
tail Mean rank: 54.801850670070195
tail Mean reciprocal rank: 0.44718844813012876

Now, averaging these does not match the paper: e.g. for Hits@1, (tail-hits1 + head-hits1) / 2 = 0.24712827058072752, while the paper reports 0.459 (Table 4).
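
Concretely, the averaging I mean:

head_hits1 = 0.20357370772176134
tail_hits1 = 0.2906828334396937
print((head_hits1 + tail_hits1) / 2)  # 0.24712827058072752, vs. 0.459 in Table 4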

Please, can you provide more information (e.g. the hyperparameter setup of the RotatE model) so I can get the exact results from the paper? Or did I use the commands incorrectly somewhere? (For example, should the last one, ensemble/run.py, be executed not twice with different modes, but once with train and then once with --init?) I'd like to use the StAR model, but I need correct results as a starting point.

best
Martin

wangbo9719 (Owner) commented

Your commands seem correct. And I simply used RotatE's official hyperparameters to train the model on WN18RR.

The trained model behind the results reported in the paper was lost. I will reproduce the results soon and then let you know.

By the way, what results did you obtain for StAR and RotatE on WN18RR?

martinsvat (Author) commented

> By the way, what results did you obtain for StAR and RotatE on WN18RR?

For RotatE, these are the final lines from train.log:
Valid MRR at step 79999: 0.478470
Valid MR at step 79999: 3284.908372
Valid HITS@1 at step 79999: 0.432597
Valid HITS@3 at step 79999: 0.493243
Valid HITS@10 at step 79999: 0.571523
Evaluating on Test Dataset...
...
Test MRR at step 79999: 0.476083
Test MR at step 79999: 3369.924059
Test HITS@1 at step 79999: 0.428207
Test HITS@3 at step 79999: 0.494416
Test HITS@10 at step 79999: 0.571315

wangbo9719 (Owner) commented

And the results of StAR?

martinsvat (Author) commented Sep 18, 2021

Sorry, and thanks for the help. Here is the content of WN18RR_roberta-large/link_prediction_metrics.txt:
Hits left @1: 0.20261646458200383
Hits right @1: 0.2782386726228462
###Hits @1: 0.240427568602425
Hits left @3: 0.45213784301212506
Hits right @3: 0.5188257817485641
###Hits @3: 0.4854818123803446
Hits left @10: 0.6668793873643906
Hits right @10: 0.7479259731971921
###Hits @10: 0.7074026802807913
Mean rank left: 57.20835992342055
Mean rank right: 53.99298021697511
###Mean rank: 55.60067007019783
Mean reciprocal rank left: 0.3616734820860267
Mean reciprocal rank right: 0.4341342479524534
###Mean reciprocal rank: 0.39790386501924

This seems quite close to Table 4, and RotatE's results are also quite close to the ones reported there.

wangbo9719 (Owner) commented

Got it. I will try to find out the reason and tell you later.

martinsvat (Author) commented

Hi, any success reproducing the results?

Meanwhile, I have another question regarding the ensemble model. It is trained twice, once for the tail prediction task and once for the head prediction task, right? So, given the slightly different task of scoring a whole triple (e1, r2, e3), one has to query both the head-trained and the tail-trained model with (e1, r2, e3) and average their outputs, right?
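
In pseudocode, this is what I have in mind (the names and the .score() call are mine, not the repo's API):

def triple_score(e1, r2, e3, head_model, tail_model):
    # head_model / tail_model stand for the ensemble model trained in head
    # mode and in tail mode; .score() is a placeholder for whatever scoring
    # call the models actually expose.
    return 0.5 * (head_model.score(e1, r2, e3) + tail_model.score(e1, r2, e3))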

thx

wangbo9719 (Owner) commented

Sorry for the very late response.

There were some bugs in the code and the commands before; thanks for reporting them.
I have updated this repo. To reproduce the ensemble results, please follow the new version and rerun the last command in 5.1:

CUDA_VISIBLE_DEVICES=3 python ./codes/run.py \
    --cuda --init ./models/RotatE_wn18rr_0 \
    --test_batch_size 16 \
    --star_info_path /home/wangbo/workspace/StAR_KGC-master/StAR/result/WN18RR_roberta-large \
    --get_scores --get_model_dataset

By the way, the performance of the ensemble model may not be entirely stable. For the command in 5.2, you can just use 'add' for --feature_method and run do_prediction only, which gives a suboptimal result corresponding to StAR (Ensemble) in Table 4 of the paper.

As for your second question: yes, what you describe is one way to handle the triple classification task. Alternatively, you can modify the code to adapt it to that task; you can refer to the code of KG-BERT, which implements triple classification.
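
For instance, a minimal sketch of KG-BERT-style triple classification, with model.score() and the threshold as placeholders rather than this repo's actual API:

def classify_triple(e1, r2, e3, model, threshold=0.5):
    # Thresholding a plausibility score in [0, 1] gives a binary label;
    # KG-BERT obtains the score from a binary classifier over the encoded
    # triple, and the threshold is normally tuned on the dev set.
    return model.score(e1, r2, e3) >= threshold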

wangbo9719 (Owner) commented

Sorry, I fixed a small bug just now. If you followed the previous version, the generated files were saved under wrong paths and names; you can move the files to the correct directories.
