This is to document a bug brought to me by Victoria Lin from Salesforce Research. She noted the following:
The problem is caused by the design of the dictionary keys: for both directions, the relation part of the key is the same. This causes some false positives to be mixed into the ground-truth sets.
Consider relations of the form:
(A, father of, B)*
(B, father of, C)*
The statement d_egraph[(e2, rel)].add(e1) added A as a correct answer for (B, father of, ?). As a result, A does not trigger a rank penalty in evaluation, although it should. A model that predicts the entity ranking [A, C, ...] receives a rank of 1, while the correct rank is 2.
* example altered for clarity.
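A minimal sketch of how the shared relation key produces the false positive (assuming d_egraph is a defaultdict(set), as the statement above suggests; the triples are the illustrative ones from the example):

```python
from collections import defaultdict

# Illustrative triples from the example above.
triples = [("A", "father of", "B"), ("B", "father of", "C")]

d_egraph = defaultdict(set)
for e1, rel, e2 in triples:
    d_egraph[(e1, rel)].add(e2)  # forward direction
    d_egraph[(e2, rel)].add(e1)  # reverse direction -- same key as forward!

# Querying (B, father of, ?): only C is a true answer,
# but A leaks in via the reverse insertion above.
print(sorted(d_egraph[("B", "father of")]))  # ['A', 'C'] -- A is a false positive
```

With A wrongly marked as correct, the filtered evaluation removes it from the ranking instead of penalizing it, which is exactly the rank-1-instead-of-rank-2 effect described above.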
In other words, for the test triples:
(Mike, father of, John)
(John, father of, Tom)
We would have at test time for the masks of existing triples (as computed in wrangle_KG.py):
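The original mask listings did not survive here; the following is a hedged reconstruction of what the buggy masks look like for these two test triples, and what the correct masks should be (the key layout and the "_reverse" suffix are illustrative, not the exact identifiers used in the repository):

```python
# Buggy masks: both directions share the same relation key.
buggy_masks = {
    ("Mike", "father of"): {"John"},
    ("John", "father of"): {"Mike", "Tom"},  # Mike leaks in from the reverse direction
    ("Tom", "father of"): {"John"},
}

# Correct masks keep the two evaluation directions apart.
correct_masks = {
    ("Mike", "father of"): {"John"},
    ("John", "father of"): {"Tom"},
    ("John", "father of_reverse"): {"Mike"},
    ("Tom", "father of_reverse"): {"John"},
}
```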
Fixing the issue was not simple, since ConvE is, unlike other link predictors, directional due to its 1-N scoring. If we want to score (E, rel, e2) in ConvE, where E is the set of all entities, then we can only do this by computing (e2, rel, E). One could simply ignore the directionality of the model and provide different masks for correctness, but this decreases the scores for ConvE, since for a given entity and relation it would predict the same values for the forward query (e1, rel, E) and the inverse query (E, rel, e2), although the labels for the two directions are different.
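The conflict can be made concrete. With 1-N scoring, one forward pass on an (entity, relation) input yields scores for all entities at once, so each input pair can carry only one label vector; a single shared key would need two at the same time (a hypothetical sketch, with entity order and names chosen for illustration):

```python
# Fixed entity vocabulary for the sketch.
entities = ["Mike", "John", "Tom"]

# Labels needed for the left-to-right query (John, father of, ?):
labels_forward = [0, 0, 1]   # only Tom is correct

# Labels needed for the right-to-left query (?, father of, John):
labels_backward = [1, 0, 0]  # only Mike is correct

# Both queries feed ConvE the same input pair ("John", "father of"),
# which produces a single score vector -- but the label vectors differ.
print(labels_forward == labels_backward)  # False -- one key, two label sets
```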
The solution that I opted for was to introduce a "reverse relation" to indicate the direction of evaluation. If ConvE is evaluated from right to left, that is, (E, rel, e2), then we compute the ConvE score with (e2, rel_reverse, E); for evaluations from left to right, the scoring remains the same: (e1, rel, E). This bugfix was implemented in d830ddf.
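A minimal sketch of the fix, using the test triples from above (the rel + "_reverse" naming is illustrative; the actual implementation lives in the repository):

```python
from collections import defaultdict

triples = [("Mike", "father of", "John"), ("John", "father of", "Tom")]

d_egraph = defaultdict(set)
for e1, rel, e2 in triples:
    # Left-to-right evaluation (e1, rel, E) keeps the original relation...
    d_egraph[(e1, rel)].add(e2)
    # ...while right-to-left evaluation (E, rel, e2) is rewritten as
    # (e2, rel_reverse, E), so the two directions no longer share a key.
    d_egraph[(e2, rel + "_reverse")].add(e1)

print(sorted(d_egraph[("John", "father of")]))          # ['Tom']
print(sorted(d_egraph[("John", "father of_reverse")]))  # ['Mike']
```

Because the forward and reverse masks now live under distinct keys, Mike no longer counts as a correct answer for (John, father of, ?).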
New Results
Currently, I do not have the compute resources to run a grid search for new hyperparameter values, but I found the following differences in scores. Here + means an increase in score (good for Hits and MRR) and - means a decrease in score (good for MR).
There seems to be something wrong with the FB15k scores, and I have to investigate what exactly is going on. I am currently still computing the YAGO3-10 scores.
I will update the paper once I have all the scores.