
event_db #6

Open
ShenggengLin opened this issue Nov 10, 2020 · 4 comments

Comments

@ShenggengLin

Hello, Deng Yifan. I'm very interested in your article and think it is very good work, so I'm now trying to replicate it.

I would like to ask you two questions:

First: the article says that you extracted 74528 DDI pairs, but event.db contains only 37264. Did your experiment use only those 37264 pairs?

Second: in the drug table of event.db, the SMILES features of the drugs are stored as lists of numbers. Did you use RDKit to convert a SMILES string into an 881-dimensional fingerprint? I am a fourth-year undergraduate student. I have searched online for a long time but still don't know how to do the conversion. If it's convenient, could you share this code?

I look forward to your reply. Thank you very much!

@YifanDengWHU
Owner

Hi, Shenggeng!
For the first question, this is because the same drug-drug pair is recorded twice in the data, for example as (sildenafil, Isosorbide mononitrate) and again as (Isosorbide mononitrate, sildenafil). They are in fact the same pair, so we deleted half of the records.
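The deduplication described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the interactions are available as a list of (drug, drug) tuples (a hypothetical input format, not the event.db schema) and that both directions of a pair describe the same interaction. It halves the count only if every pair really appears in both orders.

```python
def dedupe_symmetric_pairs(pairs):
    """Keep one record per unordered drug pair.

    (A, B) and (B, A) are treated as the same interaction; the first
    occurrence is kept and its original order is preserved.
    Assumes both directions carry the same interaction information.
    """
    seen = set()
    unique = []
    for a, b in pairs:
        key = frozenset((a, b))  # order-independent key for the pair
        if key in seen:
            continue
        seen.add(key)
        unique.append((a, b))
    return unique


pairs = [
    ("sildenafil", "Isosorbide mononitrate"),
    ("Isosorbide mononitrate", "sildenafil"),
]
print(dedupe_symmetric_pairs(pairs))  # → [('sildenafil', 'Isosorbide mononitrate')]
```

Using a `frozenset` as the key makes the lookup independent of which drug is listed first, so the check stays a single set membership test per record.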
For the second question, you can do this with the RDKit package. For example, for the drug Isosorbide mononitrate, we can collect its SMILES `[H][C@]12OC[C@@H](O[N+]([O-])=O)[C@@]1([H])OC[C@@H]2O` from DrugBank.
Here is the code:

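```python
from rdkit import Chem
from rdkit.Chem import AllChem

# SMILES for Isosorbide mononitrate, collected from DrugBank
smile = '[H][C@]12OC[C@@H](O[N+]([O-])=O)[C@@]1([H])OC[C@@H]2O'
mol = Chem.MolFromSmiles(smile)
# Morgan fingerprint with radius 2, hashed into 881 bits
morgan_hashed = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=881)
print(morgan_hashed.ToBitString())
```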

The result is a bit vector of length 881.
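To line the RDKit output up with the event.db drug table, which stores features as numbers joined by `|`, the bit string can be turned into a list of set-bit positions. This is a minimal sketch in plain Python, assuming (not confirmed here) that event.db lists the positions of 1-bits; whether it counts from 0 or 1 is unknown, so that is a parameter. The helper names are hypothetical. On an RDKit `ExplicitBitVect`, `GetOnBits()` returns the same 0-based positions directly.

```python
def bitstring_to_indices(bits, one_based=False):
    """Return the positions of '1' characters in a fingerprint bit string.

    `bits` is the kind of string returned by ExplicitBitVect.ToBitString().
    Set one_based=True to number positions from 1 instead of 0.
    """
    offset = 1 if one_based else 0
    return [i + offset for i, ch in enumerate(bits) if ch == "1"]


def to_pipe_format(indices):
    """Join indices as 'i|j|k', the layout seen in the event.db drug table."""
    return "|".join(str(i) for i in indices)


example = "0110000001"  # toy 10-bit string, not a real fingerprint
print(to_pipe_format(bitstring_to_indices(example)))  # → 1|2|9
```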

@ShenggengLin
Author

Hello, Yifan!

Thank you very much for your reply. I now understand the answer to the first question.

But I still have questions about the second question.

For drug DB01296, the SMILES is `N[C@H]1C(O)O[C@H](CO)[C@@H](O)[C@@H]1O`. With the code you provided, I did get an 881-dimensional vector. But in event.db, its SMILES features are 9|10|14|18|19|20|178|181|283|284|285|286|299|308|332|338|339|340|341|344|345|346|347|351|352|365|366|367|380|393|405|406|528|563|566|567|571|582|592|614|615|617|637|638|639|643|661|662|663|679|680|681|682|683|689|690|691|701|703.
I wonder what these numbers mean. Do they mark the positions that are 1 in the 881-dimensional vector? If so, for drug DB01296 the 9th bit is 0, yet 9 appears among these numbers, and the 16th bit is 1, yet 16 does not appear.
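One way to test this reading is to expand a stored feature string back into a full 0/1 vector and compare it position by position with the RDKit bit string. This is a minimal sketch, assuming each number marks a bit set to 1 and that the vector has 881 entries; 0- versus 1-based numbering is an assumption exposed as a parameter, and `indices_to_vector` is a hypothetical helper name.

```python
def indices_to_vector(feature_string, length=881, one_based=False):
    """Expand an 'i|j|k' feature string into a 0/1 list of `length` entries.

    Assumes each number marks a bit that is set to 1. Whether event.db
    counts from 0 or 1 is not documented here, so it is a parameter.
    """
    vector = [0] * length
    if not feature_string:
        return vector
    offset = 1 if one_based else 0
    for token in feature_string.split("|"):
        position = int(token) - offset
        if not 0 <= position < length:
            raise ValueError(f"position {token} outside a {length}-bit vector")
        vector[position] = 1
    return vector


vec = indices_to_vector("9|10|14|18")
print(len(vec), sum(vec), vec[9], vec[16])  # → 881 4 1 0
```

Checking both `one_based=False` and `one_based=True` against the RDKit output helps rule out a simple off-by-one before concluding that the two fingerprints come from different methods.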

@YifanDengWHU
Owner

Yes, you are right.
The difference comes from the fingerprint method. The fingerprints in the current dataset were computed by a former student, who used RDKit in Java.
The code I posted uses the Morgan fingerprint, which is the most common method. I have tested the results, and there is little difference between the current dataset's fingerprints and the Morgan fingerprints.
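Because two fingerprint methods give bit positions different meanings, comparing them bit by bit is not informative. One hedged way to check a claim like "little difference" is to compare the drug-to-drug similarities each scheme produces. This minimal sketch uses Jaccard (Tanimoto) similarity as the measure, which is an assumption here and not necessarily how the owner tested; the drug ids and bit sets are toy data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard (Tanimoto) similarity of two sets of on-bit positions."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_similarity_gap(scheme_a, scheme_b):
    """Mean absolute difference in pairwise drug similarity between two schemes.

    scheme_a, scheme_b: dict mapping drug id -> set of on-bit positions.
    Bits are compared only within a scheme, never across schemes, because
    different fingerprint methods give bit positions different meanings.
    """
    drugs = sorted(scheme_a.keys() & scheme_b.keys())
    gaps = [
        abs(jaccard(scheme_a[x], scheme_a[y]) - jaccard(scheme_b[x], scheme_b[y]))
        for x, y in combinations(drugs, 2)
    ]
    return sum(gaps) / len(gaps) if gaps else 0.0


# Hypothetical toy data: three drugs, two fingerprint schemes.
dataset_fp = {"D1": {1, 2, 3}, "D2": {2, 3, 4}, "D3": {7, 8}}
morgan_fp = {"D1": {10, 11}, "D2": {10, 11, 12}, "D3": {20}}
print(round(mean_similarity_gap(dataset_fp, morgan_fp), 3))  # → 0.056
```

A gap near 0 means the two schemes rank drug pairs similarly, which is what matters if the fingerprints feed a similarity-based model.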

@ShenggengLin
Author

OK, I see. Thank you very much for your reply!
