different result about dataset query #10

Open
Laser-Cho opened this issue Jun 15, 2020 · 1 comment

Comments

@Laser-Cho

I tried to reproduce the dataset from ChEMBL22, which the author says was used in the paper, by following the code you created, but the results are different.

I tried the query below.

SELECT DISTINCT canonical_smiles
FROM compound_structures
WHERE molregno IN (
    SELECT DISTINCT molregno
    FROM activities
    WHERE standard_type IN ("Kd", "Ki", "Kb", "IC50", "EC50")
      AND standard_units = "nM"
);
The result is [Result: 802320 rows]

The author describes a "dataset of 677,044 SMILES strings with annotated nanomolar activities(Kd/i/B, IC/EC50) from ChEMBL22", which is 125,276 fewer than my result.

So I used ChEMBL22, with
[standard_units = "nM"] for "nanomolar",
and [standard_type IN ("Kd", "Ki", "Kb", "IC50", "EC50")] for "activities(Kd/i/B, IC/EC50)".

What did I miss?
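For anyone who wants to compare filter variants, the query above can be wrapped in a small Python function. This is only a sketch: it assumes a SQLite copy of ChEMBL, and the toy database built by `build_toy_db` is made up for illustration (its rows and the `activity_id` column are not real ChEMBL data). The function names `count_distinct_smiles` and `build_toy_db` are hypothetical helpers, not part of this repository. Only the table and column names used in the query itself come from the thread.

```python
import sqlite3


def count_distinct_smiles(conn, standard_types, standard_units="nM"):
    """Count distinct canonical_smiles for molecules with a matching activity.

    Mirrors the query posted above, with the filter values passed as
    bound parameters so that variants are easy to compare.
    """
    placeholders = ", ".join("?" for _ in standard_types)
    sql = f"""
        SELECT COUNT(DISTINCT canonical_smiles)
        FROM compound_structures
        WHERE molregno IN (
            SELECT DISTINCT molregno
            FROM activities
            WHERE standard_type IN ({placeholders})
              AND standard_units = ?
        )
    """
    (count,) = conn.execute(sql, (*standard_types, standard_units)).fetchone()
    return count


def build_toy_db():
    """Build a tiny in-memory stand-in for ChEMBL (illustrative rows only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE compound_structures (
            molregno INTEGER PRIMARY KEY,
            canonical_smiles TEXT
        );
        CREATE TABLE activities (
            activity_id INTEGER PRIMARY KEY,
            molregno INTEGER,
            standard_type TEXT,
            standard_units TEXT
        );
        INSERT INTO compound_structures VALUES
            (1, 'CCO'), (2, 'c1ccccc1'), (3, 'CCN'), (4, 'CCO');
        INSERT INTO activities VALUES
            (1, 1, 'IC50', 'nM'),
            (2, 1, 'Ki', 'nM'),
            (3, 2, 'IC50', 'ug.mL-1'),
            (4, 3, 'Potency', 'nM'),
            (5, 4, 'EC50', 'nM');
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_toy_db()
    types = ["Kd", "Ki", "Kb", "IC50", "EC50"]
    # Molecules 1 and 4 match, but share the same SMILES, so DISTINCT gives 1.
    print(count_distinct_smiles(conn, types))
```

To run it against real data, `build_toy_db()` would be replaced by `sqlite3.connect(...)` pointing at a local ChEMBL22 SQLite file, if one is available. Calling the function with different `standard_types` or `standard_units` values then shows how each filter changes the count.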

@topazape
Owner

topazape commented Jul 1, 2020

Hi, @Laser-Cho,

Sorry for the late reply.

You are right. I am aware that the number of molecules used in the paper does not match the number returned by the SQL query described in README.md.
However, I don't know the correct SQL query, because the paper doesn't provide it.
If you have any good ideas, please let me know.
