Rewrite entity ranking evaluation to sample sp/po pairs and add loss #2

Open
samuelbroscheit opened this issue Apr 2, 2019 · 11 comments
Labels: enhancement (New feature or request), low priority

Comments

@samuelbroscheit
Member

https://github.com/rufex2001/kge/blob/9d83e43f5085e4a0d30d70536e4c1772389907cd/kge/job/entity_ranking.py#L72

@rgemulla
Member

rgemulla commented Apr 2, 2019

Clarification: for probabilistic models, add computation of cross entropy loss to evaluation

rgemulla added the enhancement label on Apr 2, 2019
@rgemulla
Member

rgemulla commented Apr 4, 2019

On 2nd thought, not sure how useful this is (you'd win by always predicting 1). Samuel, please clarify or close.

@samuelbroscheit
Member Author

How would you win for labels [0,0,1] with prediction [1,1,1] vs prediction [0,0,1] with BCE?
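
A minimal sketch of that comparison with torch (the confidence values here are made up purely for illustration):

import torch
import torch.nn.functional as F

labels = torch.tensor([0.0, 0.0, 1.0])
always_one = torch.tensor([0.99, 0.99, 0.99])  # "always predict 1"
matching = torch.tensor([0.01, 0.01, 0.99])    # close to the labels

print(F.binary_cross_entropy(always_one, labels))  # ~3.07, heavily penalized on the negatives
print(F.binary_cross_entropy(matching, labels))    # ~0.01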

@rgemulla
Member

rgemulla commented Apr 4, 2019

I was referring to the cross entropy of the test triples (all label 1) and their predicted confidence. Anyway, it's not clear to me what "loss" should be added for the evaluation.

(Recall: the evaluation is currently triple-based, not sp/po-based.)

@samuelbroscheit
Member Author

I think usually you want the same loss that is used during training.

@rgemulla
Member

rgemulla commented Apr 4, 2019

Yes, but during evaluation we batch triples, not sp/po pairs. If this is needed, we need a different evaluation job (and another one for negative sampling, I guess).

@samuelbroscheit
Member Author

Or just get the collate func from the Trainer and join it with the eval collate func? Shouldn't be that difficult.

def get_collate_func(trainer_collate_func):
    def my_collate_func(batch):
        my_result = ...  # evaluation-specific collation of the batch
        trainer_result = trainer_collate_func(batch)
        return my_result, trainer_result
    return my_collate_func

I always implement validation loss automatically, but I am not going to fight for it if you are not convinced.
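
For example, the joined collate function from the snippet above could be handed to the evaluation DataLoader (valid_data, batch_size, and trainer_collate_func are placeholders here, not the actual kge names):

from torch.utils.data import DataLoader

eval_loader = DataLoader(
    valid_data,            # placeholder for the evaluation dataset
    batch_size=batch_size,
    collate_fn=get_collate_func(trainer_collate_func),
)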

@rgemulla
Member

rgemulla commented Apr 5, 2019

It's not so easy because (i) the evaluation code has to distinguish two types of things and (ii) we are scoring the same thing multiple times.

The best approach may be: rewrite the evaluation to not use triples but the sp/po approach (i.e., the 1-to-N collate as is). Then compute MRR/Hits on top of this. In addition to enabling the computation of the loss, this should also be faster than our current approach since it computes fewer scores (e.g., if spo_1 and spo_2 both occur in the validation data, we currently compute the sp? scores twice).
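
For concreteness, a rough sketch of such an sp/po-based evaluation (score_sp and score_po are assumed to return scores against all objects/subjects; the actual kge API may differ, and filtering of other known true triples is omitted):

import torch
from collections import defaultdict

def evaluate_sp_po(model, triples):
    # group the evaluation triples so every distinct sp and po pair is scored only once
    sp_groups, po_groups = defaultdict(list), defaultdict(list)
    for s, p, o in triples.tolist():
        sp_groups[(s, p)].append(o)
        po_groups[(p, o)].append(s)

    ranks = []
    for (s, p), objects in sp_groups.items():
        # scores of (s, p, ?) against all candidate objects; the same scores could feed the loss
        scores = model.score_sp(torch.tensor([s]), torch.tensor([p])).squeeze(0)
        for o in objects:
            ranks.append((scores > scores[o]).sum().item() + 1)
    for (p, o), subjects in po_groups.items():
        scores = model.score_po(torch.tensor([p]), torch.tensor([o])).squeeze(0)
        for s in subjects:
            ranks.append((scores > scores[s]).sum().item() + 1)

    ranks = torch.tensor(ranks, dtype=torch.float)
    return {"mrr": (1.0 / ranks).mean().item(),
            "hits@10": (ranks <= 10).float().mean().item()}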

@rufex2001
Member

To me this is similar to including test data during model selection: people do it, but perhaps they shouldn't. While there is nothing wrong with selecting models based on the standard metrics, perhaps selecting based on loss is better. I'd personally like to see the difference, so if it is not too much change, we could keep the option. Then the only required decision is what the default behavior should be.

@rgemulla
Member

rgemulla commented Apr 5, 2019

No, it's different: it's perfectly fine to use the loss on validation data. The problem is that we cannot easily compute it right now without changing the way we implemented validation. I was suggesting to change our implementation to (1) support loss computation and (2) make it faster.

rgemulla changed the title from "What about the loss?" to "Rewrite entity ranking evaluation to sample sp/po pairs and add loss" on Jul 4, 2019
@rgemulla
Member

rgemulla commented Jul 9, 2019

Validation is currently by far the slowest part of training (when multiple jobs are run in parallel so that the GPU is saturated, most of the time is spent in validation). Addressing this issue should also help here. I'll add a priority tag because of this.
