More Guidance Needed on Training Models on Own Datasets #2

AlexMRuch · 2022-03-08T22:19:27Z

Would it be possible for you to provide more details on how to train various models on users' own datasets?

It's not clear what has to go into the config files (e.g., what specifically should be specified for env, interpreter, program, or args, or why program appears twice in the config).

One thing that may be helpful is in the docs, you share results of the library on various datasets (https://zjukg.github.io/NeuralKG/result.html). If you could provide the command you used to run each of pipelines, that would be great.

Also, it's not clear from the docs how one must treat the data loaders differently for tabular data, as the examples in the docs refer to image datasets: https://zjukg.github.io/NeuralKG/neuralkg.data.html#neuralkg.data.base_data_module.BaseDataModule.train_dataloader. What must the structure of datasets be for various models? What can be done to datasets to better prepare them for different models (e.g., encoding entities/relations)?


chenxn2020 · 2022-03-09T16:27:20Z

Thank you very much for your many useful suggestions. We respond to them as follows.

Would it be possible to provide more details to guide users in training different models on their own datasets, as well as details of the config file parameters and the command line used to run each pipeline?
Actually, we are in the process of improving our documentation to include explanations of the config file parameters. We will soon post a blog on our website to guide users in training models on their own datasets, and we will record animated GIFs to visualize the process. For each pipeline, we will also shortly provide both scripts and config files to reproduce the models' results.
Question about the BaseDataModule.train_dataloader documentation, whose example processes image data.
This part is PyTorch Lightning's description. Thank you for pointing it out; we will change it as soon as possible to avoid any misunderstanding. The dataset is given as triples (h, r, t). If indexed dictionaries of entities and relations are provided, the indices will be read automatically; otherwise, they will be created automatically. See FB15K237 for an example of the dataset structure. The dataset does not need to be processed manually: just select the appropriate pipeline, and NeuralKG will preprocess the dataset accordingly.
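To make "the index will be created automatically" concrete, here is a minimal sketch of what indexing (h, r, t) triples involves. NeuralKG does this preprocessing itself, so this is not its loader. The tab-separated `head<TAB>relation<TAB>tail` layout, the `id<TAB>name` dictionary format, and the file names `train.txt` and `entities.dict` are assumptions, and every function name below is hypothetical.

```python
import io
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]


def read_triples(lines: Iterable[str]) -> List[Triple]:
    """Parse tab-separated 'head<TAB>relation<TAB>tail' lines (assumed layout)."""
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, rel, tail = line.split("\t")
        triples.append((head, rel, tail))
    return triples


def read_dict(lines: Iterable[str]) -> Dict[str, int]:
    """Read a hypothetical 'id<TAB>name' dictionary file into {name: id}."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if line:
            idx, name = line.split("\t")
            mapping[name] = int(idx)
    return mapping


def build_dict(names: Iterable[str]) -> Dict[str, int]:
    """Assign consecutive ids to names in first-seen order."""
    mapping: Dict[str, int] = {}
    for name in names:
        if name not in mapping:
            mapping[name] = len(mapping)
    return mapping


def index_triples(
    triples: List[Triple],
    ent2id: Optional[Dict[str, int]] = None,
    rel2id: Optional[Dict[str, int]] = None,
) -> Tuple[List[IdTriple], Dict[str, int], Dict[str, int]]:
    """Map string triples to integer ids, building any dictionary not provided."""
    if ent2id is None:
        # Entities are collected from both head and tail positions.
        ent2id = build_dict(e for h, _, t in triples for e in (h, t))
    if rel2id is None:
        rel2id = build_dict(r for _, r, _ in triples)
    indexed = [(ent2id[h], rel2id[r], ent2id[t]) for h, r, t in triples]
    return indexed, ent2id, rel2id


# In-memory "files" stand in for the assumed train.txt and entities.dict.
train = read_triples(io.StringIO("alice\tknows\tbob\nbob\tworks_at\tacme\n"))

# No dictionaries given: ids are created in first-seen order.
indexed, ent2id, rel2id = index_triples(train)
print(indexed)  # [(0, 0, 1), (1, 1, 2)]

# Entity dictionary given: its ids are reused; relation ids are still built.
given = read_dict(io.StringIO("0\tacme\n1\talice\n2\tbob\n"))
indexed2, _, _ = index_triples(train, ent2id=given)
print(indexed2)  # [(1, 0, 2), (2, 1, 0)]
```

Building the dictionaries from the training split alone is a simplification. In practice, the entity and relation dictionaries must also cover the validation and test triples, otherwise the lookups above raise `KeyError`.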

AlexMRuch · 2022-03-10T21:02:44Z

Sounds great! Thank you so much for taking the time to share these details. Looking forward to the updates!

chenxn2020 · 2022-03-18T09:52:33Z

We have completed the updates promised in our last response:

We have released the script files and configuration files used to reproduce the model results.
We have written Colab notebooks to guide users through our tools and posted blog posts on our website presenting detailed examples of using NeuralKG. We also show animated GIFs in the README illustrating the training and testing process.
In addition, we have updated the documentation with some basic parameter descriptions.

For more updates please see our news, this issue will be closed.

wencolani added the documentation label Mar 10, 2022

AnselCmy added the topic: documentation and status: in progresss labels and removed the documentation label Mar 18, 2022

chenxn2020 closed this as completed Mar 18, 2022

chenxn2020 added status: completed and removed status: in progresss labels Mar 18, 2022

RuizhouLiu mentioned this issue Dec 10, 2023

Multi-process for dataloader when set num_workers larger than 0 will raise an error by RGCN model #42

Closed
