About the number of interactions and the substructure dimension #2

Closed
SakuraRiven opened this issue Jul 31, 2020 · 4 comments

Comments

@SakuraRiven

Hi, thanks for your kind reply. I have two questions:

  1. The paper reports 74528 interactions, while the data in the code has 37264. I guess 74528 counts both "drugA-drugB" and "drugB-drugA"? Can we treat "drugA-drugB" and "drugB-drugA" as the same event with the same label?

  2. When I run the code, the substructure dimension is 583 instead of the 881 reported in your paper. Is something wrong?

@YifanDengWHU
Owner

Hi.
For question 1:
Yes, that's right. In this project drugA-drugB is the same as drugB-drugA, so we deleted half of the interactions to remove the duplicates. However, if you use a knowledge graph, the order makes a difference: (drugA, relation, drugB) is different from (drugB, relation, drugA), and the order has to be determined from the dependency relationship.
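
A minimal sketch of the distinction above, assuming each interaction is a `(drug_a, drug_b, label)` tuple. The function names and toy records are hypothetical and do not come from this repo; they only illustrate how counting both orders doubles the total, while a knowledge-graph view keeps the two directions apart.

```python
def canonical_pairs(interactions):
    """Collapse ordered (drug_a, drug_b, label) records into unordered pairs.

    The first label seen for a pair is kept; this assumes both orders
    carry the same label, as stated in the thread.
    """
    seen = {}
    for drug_a, drug_b, label in interactions:
        key = tuple(sorted((drug_a, drug_b)))  # order-independent key
        seen.setdefault(key, label)
    return [(a, b, label) for (a, b), label in seen.items()]


def directed_triples(interactions):
    """Keep direction, as a knowledge graph would: (head, relation, tail)."""
    return {(a, label, b) for a, b, label in interactions}


# Hypothetical toy data: every interaction is listed in both orders.
records = [
    ("drugA", "drugB", 3),
    ("drugB", "drugA", 3),
    ("drugA", "drugC", 7),
    ("drugC", "drugA", 7),
]

unordered = canonical_pairs(records)
triples = directed_triples(records)
print(len(records), len(unordered), len(triples))  # 4 2 4
```

With the unordered view the record count halves, which matches the 74528 versus 37264 relationship; with directed triples it does not.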
For question 2:
The original fingerprint has 881 dimensions. We encode the 881-dimensional result again, which yields 583 dimensions. You can look at the "smile" column of the "drug" table: the maximum number there is 881.
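
The thread does not spell out how the second encoding works. One possible reading, offered purely as an assumption, is that each drug stores the positions of its active fingerprint bits, and the re-encoding keeps one column per position that actually occurs in the dataset. Under that assumption the largest stored position can still be 881 while the matrix has fewer columns. The function name and toy values below are hypothetical.

```python
def reencode(bit_lists):
    """Build a compact binary matrix from per-drug lists of active positions.

    Returns (columns, matrix), where columns are the distinct positions
    seen across all drugs and matrix[i][j] is 1 if drug i has columns[j].
    """
    columns = sorted({bit for bits in bit_lists for bit in bits})
    index = {bit: j for j, bit in enumerate(columns)}
    matrix = []
    for bits in bit_lists:
        row = [0] * len(columns)
        for bit in bits:
            row[index[bit]] = 1
        matrix.append(row)
    return columns, matrix


# Hypothetical toy data: three drugs, positions drawn from 1..881.
toy = [[1, 5, 881], [5, 40], [1, 40, 881]]
columns, matrix = reencode(toy)
print(max(columns), len(columns))  # 881 4
```

In this toy case the maximum position is 881 but only 4 columns survive, which is the same pattern as 881 versus 583.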

@SakuraRiven
Author

If we want to reproduce the results in your paper or develop our own method, should we keep the 37264 interactions in this repo or augment them to 74528? The latter may lead to a different split and test set ...

@YifanDengWHU
Owner

I think you should keep 37264, because it may be easier for the model to predict (drugB, drugA) if (drugA, drugB) already exists in the training set. This would lead to falsely high accuracy.
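
The leakage concern can be checked directly. The sketch below is an illustration under stated assumptions, not the split used in this repo: pairs are `(drug_a, drug_b)` tuples, the split is a simple slice, and `leaked` is a hypothetical helper that lists test pairs whose reversed or identical form is already in the training set.

```python
def leaked(train, test):
    """Return test pairs that also occur in train, ignoring drug order."""
    train_keys = {frozenset(pair) for pair in train}
    return [pair for pair in test if frozenset(pair) in train_keys]


# Hypothetical toy data: four deduplicated pairs.
pairs = [("d1", "d2"), ("d1", "d3"), ("d2", "d3"), ("d3", "d4")]

# Augmented set: every pair also appears in reversed order.
augmented = pairs + [(b, a) for a, b in pairs]

# Split the augmented set: the test pairs are reversed copies of training pairs.
aug_train, aug_test = augmented[:6], augmented[6:]
print(len(leaked(aug_train, aug_test)))  # 2

# Split the deduplicated set: no test pair has a twin in training.
train, test = pairs[:3], pairs[3:]
print(len(leaked(train, test)))  # 0
```

Running the same check on a real split is a cheap way to confirm that no reversed duplicates cross the train/test boundary.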

@SakuraRiven
Author

OK, understood. Thanks for your answer~
