More Guidance Needed on Training Models on Own Datasets #2

Closed
AlexMRuch opened this issue Mar 8, 2022 · 3 comments
Labels
status: completed topic: documentation Question about documentation

Comments

@AlexMRuch

AlexMRuch commented Mar 8, 2022

Would it be possible for you to provide more details on how to train various models on users' own datasets?

It's not clear what has to go into the config files (e.g., what specifically should be given for env, interpreter, program, or args, or why program appears twice in the config).

One thing that may be helpful: in the docs, you share results of the library on various datasets (https://zjukg.github.io/NeuralKG/result.html). If you could provide the command you used to run each of the pipelines, that would be great.

Also, it's not clear from the docs how the data loaders should be treated for tabular data, as the examples in the docs refer to image datasets: https://zjukg.github.io/NeuralKG/neuralkg.data.html#neuralkg.data.base_data_module.BaseDataModule.train_dataloader. What must the structure of datasets be for various models? What can be done to datasets to better prepare them for different models (e.g., encoding entities/relations, etc.)?

@chenxn2020
Collaborator

Thank you very much for your many useful suggestions. Our responses are as follows.

  1. Would it be possible to provide more details to guide users in training different models on their own datasets, along with details of the config file parameters and the command line used to run each pipeline?
    We are currently improving our documentation to explain the config file parameters. We will soon post a blog on our website to guide users in training models on their own datasets, and we will record animated GIFs to illustrate the process. For each pipeline, we will also shortly provide the scripts and config files needed to reproduce the model results.
  2. Questions about the BaseDataModule.train_dataloader documentation, whose example refers to image data.
    That text is PyTorch Lightning's own description. Thank you for pointing it out; we will change it as soon as possible to avoid any misunderstanding. The dataset is given as triples (h, r, t). If indexed dictionaries of entities and relations are provided, the indices are read from them automatically; otherwise, they are created automatically. See FB15K237 for the dataset structure. The dataset does not need to be processed manually: just select the appropriate pipeline and NeuralKG will preprocess the dataset accordingly.
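
As a rough illustration of the indexing behaviour described in point 2, here is a minimal sketch in plain Python. It is not NeuralKG's actual code. It assumes a tab-separated triples file (`train.txt`) and optional dictionary files (`entities.dict`, `relations.dict`) with one `id<TAB>name` pair per line; those file names and layouts are assumptions, and the helper names `read_triples`, `load_or_build_index`, and `encode` are hypothetical.

```python
from pathlib import Path
import tempfile


def read_triples(path):
    """Read tab-separated (h, r, t) lines, skipping blank lines."""
    triples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            h, r, t = line.split("\t")
            triples.append((h, r, t))
    return triples


def load_or_build_index(dict_path, names):
    """Read an 'id<TAB>name' dictionary if it exists, otherwise build one."""
    dict_path = Path(dict_path)
    if dict_path.exists():
        index = {}
        with open(dict_path, encoding="utf-8") as f:
            for line in f:
                idx, name = line.rstrip("\n").split("\t")
                index[name] = int(idx)
        return index
    # No dictionary supplied: assign ids in order of first appearance.
    index = {}
    for name in names:
        index.setdefault(name, len(index))
    return index


def encode(triples, ent2id, rel2id):
    """Map string triples to integer-id triples."""
    return [(ent2id[h], rel2id[r], ent2id[t]) for h, r, t in triples]


# Tiny demo on a throwaway directory (assumed layout, not a real dataset).
with tempfile.TemporaryDirectory() as tmp:
    train = Path(tmp) / "train.txt"
    train.write_text("a\tlikes\tb\nb\tknows\tc\n", encoding="utf-8")
    triples = read_triples(train)
    entities = [e for h, _, t in triples for e in (h, t)]
    relations = [r for _, r, _ in triples]
    # Neither .dict file exists here, so both indices are built from the data.
    ent2id = load_or_build_index(Path(tmp) / "entities.dict", entities)
    rel2id = load_or_build_index(Path(tmp) / "relations.dict", relations)
    encoded = encode(triples, ent2id, rel2id)
```

In this sketch a supplied dictionary always takes precedence, so ids stay stable across runs; building from first appearance is only the fallback when no dictionary is present.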

@AlexMRuch
Author

Sounds great! Thank you so much for taking the time to share these details. Looking forward to the updates!

AnselCmy added the topic: documentation and status: in progresss labels and removed the documentation label on Mar 18, 2022
@chenxn2020
Collaborator

We have completed the updates promised in our last response.

  • We have released the script files and configuration files used to reproduce the model results.
  • We have written notebooks on Colab to guide users through our tools, and posted blogs on our website with detailed examples of using NeuralKG. The README also includes animated GIFs showing the training and testing process.
  • In addition, we have updated the documentation with some basic parameter descriptions.

For more updates, please see our news. This issue will now be closed.
