# A demonstration of the ChEMBL webresource client

The ChEMBL webresource client is the official Python client library for accessing ChEMBL data. It provides a simple interface to the ChEMBL web services, so the ChEMBL database can be queried and data retrieved programmatically. Adapted from the official [demo](http://beta.mybinder.org/v2/gh/chembl/chembl_webresource_client/master?filepath=demo_wrc.ipynb) included in the [`chembl_webresource_client`](https://github.com/chembl/chembl_webresource_client) repository, this Jupyter notebook presents more detailed examples of how to use the client library to access ChEMBL data.

## Available filters
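The ChEMBL webresource client provides a number of lookups for flexible querying of the database, so that specific data can be filtered and retrieved. A lookup is appended to a field name with a double underscore, as in `field__lookup=value`. Here are some notes about each lookup type supported by ChEMBL: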
  - `exact`: Matches an exact value, e.g., `name__exact='Aspirin'` would match if the name is exactly `'Aspirin'`.
  - `iexact`: Case-insensitive exact value, e.g., `name__iexact='aspirin'` would match if the name is `'aspirin'`, `'ASPIRIN'`, etc.
  - `contains`: Checks if a value contains the specified substring, e.g., `description__contains='pain'` would match if the description contains the substring `'pain'`.
  - `icontains`: Case-insensitive contains, e.g., `description__icontains='Pain'` would match if the description contains `'pain'`, `'Pain'`, etc.
  - `in`: Matches if the value is within a specified list, e.g., `id__in=[1, 2, 3]` would match if the id is 1, 2, or 3.
  - `gt`/`gte`/`lt`/`lte`: Greater than/greater than or equal to/less than/less than or equal to, e.g., `value__gt=10` would match if the value is greater than 10.
  - `startswith`/`endswith`: Matches if a value starts/ends with the specified substring, e.g., `name__startswith='Asp'` would match if the name starts with `'Asp'`.
  - `istartswith`/`iendswith`: Case-insensitive starts/ends with, e.g., `name__istartswith='asp'` would match if the name starts with `'asp'`, `'Asp'`, etc.
  - `range`: Matches if a value is within a specified range, e.g., `value__range=(1, 10)` would match if the value is between 1 and 10 inclusive.
  - `isnull`: Matches if a value is null, e.g., `date__isnull=True` would match if the date is null.
  - `regex`: Matches if a value matches the specified regular expression, e.g., `name__regex=r'^[A-Z]'` would match if the name starts with an uppercase letter.
  - `iregex`: Case-insensitive regular expression match, e.g., `name__iregex=r'^[a-z]'` would match if the name starts with a letter, regardless of case.
  - `search`: Full-text search, supported in some backends like PostgreSQL, e.g., `description__search='pain relief'` would match if the description matches the full-text search term `'pain relief'`.
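
To make the lookup semantics above concrete without touching the network, here is a minimal offline sketch. It is not the client's implementation: the `LOOKUPS` table, the `matches` and `select` helpers, and the three sample records are all hypothetical, invented only to mimic how a `field__lookup=value` keyword is interpreted. It covers only top-level fields and a subset of the lookups listed above.

```python
import re

# Toy versions of some lookups (illustrative only, not the client's code).
LOOKUPS = {
    "exact": lambda value, arg: value == arg,
    "iexact": lambda value, arg: value.lower() == arg.lower(),
    "contains": lambda value, arg: arg in value,
    "icontains": lambda value, arg: arg.lower() in value.lower(),
    "in": lambda value, arg: value in arg,
    "gt": lambda value, arg: value > arg,
    "gte": lambda value, arg: value >= arg,
    "lt": lambda value, arg: value < arg,
    "lte": lambda value, arg: value <= arg,
    "startswith": lambda value, arg: value.startswith(arg),
    "istartswith": lambda value, arg: value.lower().startswith(arg.lower()),
    "range": lambda value, arg: arg[0] <= value <= arg[1],
    "isnull": lambda value, arg: (value is None) == arg,
    "regex": lambda value, arg: re.search(arg, value) is not None,
    "iregex": lambda value, arg: re.search(arg, value, re.IGNORECASE) is not None,
}


def matches(record, **conditions):
    """Return True if the record satisfies every field__lookup=value condition."""
    for key, arg in conditions.items():
        field, sep, lookup = key.partition("__")
        if not sep:  # a bare field name is treated as an exact match
            lookup = "exact"
        if not LOOKUPS[lookup](record.get(field), arg):
            return False
    return True


# Hypothetical sample data, unrelated to real ChEMBL records.
records = [
    {"name": "Aspirin", "value": 12, "date": None},
    {"name": "ASPIRIN", "value": 5, "date": "2020-01-01"},
    {"name": "Ibuprofen", "value": 10, "date": None},
]


def select(**conditions):
    """Names of all sample records that satisfy the given conditions."""
    return [r["name"] for r in records if matches(r, **conditions)]


exact_hits = select(name__exact="Aspirin")
iexact_hits = select(name__iexact="aspirin")
range_hits = select(value__range=(1, 10))
null_hits = select(date__isnull=True)
combined_hits = select(name__istartswith="asp", value__gt=10)

print(exact_hits, iexact_hits, range_hits, null_hits, combined_hits)
```

Several conditions passed together are combined with a logical AND in this sketch, which is why `combined_hits` keeps only the record that both starts with `asp` (ignoring case) and has a value above 10.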


## The `only` operator
### Introduction
The `only` method in the ChEMBL webresource client limits the results of a query to a specific set of fields. It takes a single argument: a list of the fields to include in the result. When `only` specifies a subset of fields, the API returns just those fields rather than the full record. This reduces the amount of data transmitted over the network, saving bandwidth and typically making the query faster.

### Example

In [2]:
from chembl_webresource_client.new_client import new_client

# Create a client for molecules
molecule = new_client.molecule

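# Fetch the full record for a single ChEMBL ID
mol_1 = molecule.get('CHEMBL25')
# Filter by ChEMBL ID and request only a subset of fields
mol_2 = molecule.filter(chembl_id='CHEMBL25').only(['chembl_id', 'molecule_properties', 'molecule_structures'])
# Same filter, without restricting the returned fields
mol_3 = molecule.filter(chembl_id='CHEMBL25')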

As can be checked, `mol_1` is a dictionary with 36 keys describing the compound aspirin, including `'atc_classifications'`, `'availability_type'`, `'molecule_properties'`, and `'molecule_structures'`.

In [6]:
mol_1.keys()



On the other hand, `mol_2` is a `chembl_webresource_client.query_set.QuerySet` object. It can be indexed and has only one element in this case. Specifically, `mol_2[0]` is a dictionary containing only two keys (`'molecule_properties'` and `'molecule_structures'`) for the same compound, aspirin.

In [7]:
type(mol_2)

chembl_webresource_client.query_set.QuerySet

In [8]:
mol_2[0]

{'molecule_properties': {'alogp': '1.31',
  'aromatic_rings': 1,
  'cx_logd': '-2.16',
  'cx_logp': '1.24',
  'cx_most_apka': '3.41',
  'cx_most_bpka': None,
  'full_molformula': 'C9H8O4',
  'full_mwt': '180.16',
  'hba': 3,
  'hba_lipinski': 4,
  'hbd': 1,
  'hbd_lipinski': 1,
  'heavy_atoms': 13,
  'molecular_species': 'ACID',
  'mw_freebase': '180.16',
  'mw_monoisotopic': '180.0423',
  'np_likeness_score': '0.12',
  'num_lipinski_ro5_violations': 0,
  'num_ro5_violations': 0,
  'psa': '63.60',
  'qed_weighted': '0.55',
  'ro3_pass': 'N',
  'rtb': 2},
 'molecule_structures': {'canonical_smiles': 'CC(=O)Oc1ccccc1C(=O)O',
  'molfile': '\n     RDKit          2D\n\n 13 13  0  0  0  0  0  0  0  0999 V2000\n   19.8052   -4.2758    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n   19.8040   -5.0953    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n   20.5121   -5.5043    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n   21.2217   -5.0948    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n  

`mol_3` is also a `chembl_webresource_client.query_set.QuerySet` object with only one element, which is a dictionary for the compound aspirin. However, since the `only` method was not used, `mol_3[0]` is equivalent to `mol_1`.

In [9]:
mol_3[0] == mol_1

True

So for `mol_1` and `mol_3`, the query retrieved all information about the molecule with ChEMBL ID `CHEMBL25`, which can be a lot of data. For `mol_2`, by contrast, the query was restricted to the fields requested with `only` (`chembl_id`, `molecule_properties`, and `molecule_structures`) for the same molecule. This reduces the amount of data returned and can speed up the query.
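
The size reduction can be illustrated offline with a rough sketch. The `full_record` dictionary below is a hypothetical, heavily trimmed stand-in for a full molecule record (a real one has many more keys), and `project` is an invented helper that mimics field selection on the client side. In the real client the selection is requested through `only` and this helper plays no part.

```python
import json

# Hypothetical, trimmed stand-in for a full molecule record (illustrative values).
full_record = {
    "molecule_chembl_id": "CHEMBL25",
    "pref_name": "ASPIRIN",
    "molecule_type": "Small molecule",
    "molecule_properties": {"full_molformula": "C9H8O4", "full_mwt": "180.16"},
    "molecule_structures": {"canonical_smiles": "CC(=O)Oc1ccccc1C(=O)O"},
}


def project(record, fields):
    """Keep only the requested top-level fields that exist in the record."""
    return {name: record[name] for name in fields if name in record}


trimmed = project(full_record, ["molecule_properties", "molecule_structures"])

# Serialized length serves as a rough proxy for the payload sent over the network.
full_size = len(json.dumps(full_record))
trimmed_size = len(json.dumps(trimmed))

print(sorted(trimmed))
print(full_size, trimmed_size)
```

The gap between the two sizes grows with the number of fields left out, so the saving on a complete record would be larger than this toy example suggests.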