
Embed unseen s/p/o's #205

Closed
MatthewGleeson opened this issue May 11, 2021 · 8 comments
@MatthewGleeson

Thanks for publishing this great repo!

I'm trying to use the pretrained KGE models in this repo to create embeddings for unseen objects and predicates, but I'm having trouble figuring out how to do so.

The "Use a pretrained model in an application" portion of the README has been helpful, but I want to be able to pass a trio of s/p/o strings such as 'dirt' 'component of', 'clay' to a pretrained ComplEx model instead of passing an index to a value in the Wordnet database.

Is there a way to do this? Would I need to do some sort of transfer learning on the embedder first? I've got a dataset I can use if that's the case.

@rufex2001
Member

rufex2001 commented May 11, 2021 via email

@MatthewGleeson
Author

MatthewGleeson commented May 12, 2021

Yep, I'm hoping to adapt some of the pretrained KGE models listed in the README for a knowledge-graph-informed RL application (specifically, https://github.com/minqi/wordcraft). I've already checked, and many of the objects I need embeddings for do not exist in the lookup tables for some of the datasets (which I'm assuming would be required to use the entity_ids.del file you're talking about). So in that case, I'd need to either train the KGE models from scratch or use transfer learning to obtain the embeddings somehow, right? I can't pass these unseen strings to the entity/relation embedders?
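For context, this is roughly how I checked coverage against a dataset's lookup table (the entity set below is illustrative, not the full WordCraft vocabulary, and the path assumes the packaged wnrr dataset):

```python
# Sketch: check which WordCraft entities appear in a dataset's lookup table.
# The "needed" set is illustrative only.
needed = {"dirt", "clay", "water", "fire"}

# entity_ids.del format: "<index>\t<entity name>", one entry per line
with open("data/wnrr/entity_ids.del") as f:
    known = {line.rstrip("\n").split("\t", 1)[1] for line in f}

missing = needed - known
print(f"{len(missing)}/{len(needed)} entities missing:", sorted(missing))
```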

@rufex2001
Member

rufex2001 commented May 12, 2021 via email

@AdrianKs
Collaborator

You could create a new dataset extending the one the model is currently trained on, then train on the new dataset and load the already-trained embeddings with our load_pretrained option.
Additionally, you could also try to freeze the pretrained embeddings, but this option is not yet in the master branch; it's in PR #136.
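A minimal config sketch of that recipe might look as follows; the dataset name is hypothetical, and the exact option keys for loading pretrained embeddings are an assumption (check lookup_embedder.yaml in your LibKGE version):

```yaml
# Sketch only: the pretrain option keys are assumed, not verified
# against a specific LibKGE version.
job.type: train
dataset.name: wordcraft-extended   # hypothetical dataset extending the original
model: complex

complex:
  entity_embedder:
    pretrain:
      model_filename: pretrained-complex.pt   # checkpoint of the original model
  relation_embedder:
    pretrain:
      model_filename: pretrained-complex.pt
```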

@rgemulla
Member

You may either retrain on your dataset or use a KGE model that constructs entity/relation embeddings from textual representations. (We may add such an implementation to LibKGE soon.)

@esulaiman

> You could create a new dataset extending the one the model is currently trained on, then train on the new dataset and load the already-trained embeddings with our load_pretrained option.
> Additionally, you could also try to freeze the pretrained embeddings, but this option is not yet in the master branch; it's in PR #136.

I am training on my own dataset using the models provided by this library to obtain KGEs. Can you kindly help with how to use the load_pretrained option?

@MatthewGleeson
Author

@esulaiman it would be better to open a separate issue for this problem. But take a look at #174 and the code in the README section titled "Use your own dataset"; the basic steps are sketched below.
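Roughly, those steps look like this (the preprocessing script path may differ between LibKGE versions, so check the README):

```sh
# Tab-separated triples, one per line: <subject>\t<relation>\t<object>
mkdir data/mydataset
cp train.txt valid.txt test.txt data/mydataset/

# Generate the index files (entity_ids.del, relation_ids.del, ...);
# script name/location may differ in your LibKGE version
cd data
python preprocess/preprocess_default.py mydataset
```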

@MatthewGleeson
Author

MatthewGleeson commented Jun 6, 2021

I'm closing this issue, but I wanted to leave a note on what I did:

- Created a compatible dataset from the WordCraft environment matching data/toy/*.txt (entities, relations, test, train, valid)
- Trained many KGE models from this repo on this dataset
- Used the kge repo's model.score_spo function in my WordCraft agent to inform its decisions in the MDP (a rough sketch is below)
- This was for a group project for my NLP class; anyone interested can take a look at our demo notebook: https://colab.research.google.com/drive/1bL3U19pmd9l-1nG_lmdGx-AjVKwpmd3N?usp=sharing
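The score_spo call looked roughly like this (the checkpoint path and the example strings are placeholders; score_spo expects long tensors of indices):

```python
import torch
from kge.model import KgeModel
from kge.util.io import load_checkpoint

# Placeholder path: point this at your own trained checkpoint
checkpoint = load_checkpoint("local/experiments/wordcraft-complex/checkpoint_best.pt")
model = KgeModel.create_from(checkpoint)

# Build string -> index lookups from the dataset's id tables
# (assumes the strings were included when the dataset was built)
ent_idx = {e: i for i, e in enumerate(model.dataset.entity_strings())}
rel_idx = {r: i for i, r in enumerate(model.dataset.relation_strings())}

s = torch.tensor([ent_idx["dirt"]])
p = torch.tensor([rel_idx["component of"]])
o = torch.tensor([ent_idx["clay"]])

# Higher score = the model considers the triple more plausible.
# Note: reciprocal-relations models need direction="o" or "s" here.
score = model.score_spo(s, p, o)
print(score.item())
```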
