Rewrite entity ranking evaluation to sample sp/po pairs and add loss #2

Open
samuelbroscheit opened this issue Apr 2, 2019 · 11 comments
Labels: enhancement (New feature or request), low priority

Comments

@samuelbroscheit
Member

https://github.com/rufex2001/kge/blob/9d83e43f5085e4a0d30d70536e4c1772389907cd/kge/job/entity_ranking.py#L72

@rgemulla
Member

rgemulla commented Apr 2, 2019

Clarification: for probabilistic models, add computation of cross entropy loss to evaluation

rgemulla added the enhancement label on Apr 2, 2019
@rgemulla
Member

rgemulla commented Apr 4, 2019

On 2nd thought, not sure how useful this is (you'd win by always predicting 1). Samuel, please clarify or close.

@samuelbroscheit
Member Author

How would you win for labels [0,0,1] with prediction [1,1,1] vs prediction [0,0,1] with BCE?
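
A minimal sketch of that comparison with torch (the confidence values here are made up purely for illustration):

import torch
import torch.nn.functional as F

labels = torch.tensor([0.0, 0.0, 1.0])
always_one = torch.tensor([0.99, 0.99, 0.99])  # "always predict 1"
matching = torch.tensor([0.01, 0.01, 0.99])    # close to the labels

print(F.binary_cross_entropy(always_one, labels))  # ~3.07, heavily penalized on the negatives
print(F.binary_cross_entropy(matching, labels))    # ~0.01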

@rgemulla
Member

rgemulla commented Apr 4, 2019

I was referring to the cross entropy of the test triples (all label 1) and their predicted confidence. Anyway, it's not clear to me what "loss" should be added for the evaluation.

(Recall: the evaluation is currently triple-based, not sp/po-based.)

@samuelbroscheit
Member Author

I think usually you want the same loss that is used during training.

@rgemulla
Member

rgemulla commented Apr 4, 2019

Yes, but during evaluation we batch triples, not sp/po pairs. If this is needed, we need a different evaluation job (and another one for negative sampling, I guess).

@samuelbroscheit
Member Author

Or just get the collate func from the Trainer and join it with the eval collate func? Shouldn't be that difficult.

def get_collate_func(trainer_collate_func):
    def my_collate_func(batch):
        my_result = ...  # evaluation-specific collation of the batch
        trainer_result = trainer_collate_func(batch)
        return my_result, trainer_result
    return my_collate_func

I always implement validation loss automatically, but I am not going to fight for it if you are not convinced.
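
For example, the joined collate function from the snippet above could be handed to the evaluation DataLoader (valid_data, batch_size, and trainer_collate_func are placeholders here, not the actual kge names):

from torch.utils.data import DataLoader

eval_loader = DataLoader(
    valid_data,            # placeholder for the evaluation dataset
    batch_size=batch_size,
    collate_fn=get_collate_func(trainer_collate_func),
)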

@rgemulla
Member

rgemulla commented Apr 5, 2019

It's not so easy because (i) the evaluation code has to distinguish two types of things and (ii) we are scoring the same thing multiple times.

The best approach may be: rewrite the evaluation to not use triples but the sp/po approach (i.e., the 1-to-N collate as is). Then compute MRR/Hits on top of this. In addition to enabling the computation of the loss, this should also be faster than our current approach since it computes fewer scores (e.g., if spo_1 and spo_2 both occur in the validation data, we currently compute the sp? scores twice).
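
For concreteness, a rough sketch of such an sp/po-based evaluation (score_sp and score_po are assumed to return scores against all objects/subjects; the actual kge API may differ, and filtering of other known true triples is omitted):

import torch
from collections import defaultdict

def evaluate_sp_po(model, triples):
    # group the evaluation triples so every distinct sp and po pair is scored only once
    sp_groups, po_groups = defaultdict(list), defaultdict(list)
    for s, p, o in triples.tolist():
        sp_groups[(s, p)].append(o)
        po_groups[(p, o)].append(s)

    ranks = []
    for (s, p), objects in sp_groups.items():
        # scores of (s, p, ?) against all candidate objects; the same scores could feed the loss
        scores = model.score_sp(torch.tensor([s]), torch.tensor([p])).squeeze(0)
        for o in objects:
            ranks.append((scores > scores[o]).sum().item() + 1)
    for (p, o), subjects in po_groups.items():
        scores = model.score_po(torch.tensor([p]), torch.tensor([o])).squeeze(0)
        for s in subjects:
            ranks.append((scores > scores[s]).sum().item() + 1)

    ranks = torch.tensor(ranks, dtype=torch.float)
    return {"mrr": (1.0 / ranks).mean().item(),
            "hits@10": (ranks <= 10).float().mean().item()}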

@rufex2001
Member

To me this is similar to including test data during model selection: people do it, but perhaps they shouldn't. While there is nothing wrong with selecting models based on the standard metrics, perhaps selecting based on loss is better. I'd personally like to see the difference, so if it is not too much change, we could keep the option. Then the only required decision is what the default behavior should be.

@rgemulla
Member

rgemulla commented Apr 5, 2019

No, it's different: it's perfectly fine to use the loss on validation data. The problem is that we cannot easily compute it right now without changing the way we implemented validation. I was suggesting to change our implementation to (1) support loss computation and (2) make it faster.

rgemulla changed the title from "What about the loss?" to "Rewrite entity ranking evaluation to sample sp/po pairs and add loss" on Jul 4, 2019
@rgemulla
Member

rgemulla commented Jul 9, 2019

Validation is currently by far the slowest part of training (when multiple jobs are run in parallel so that the GPU is saturated, most of the time is spent in validation). Addressing this issue should also help here. I'll add a priority tag because of this.
