# Pre-trained Model Library

**XenonPy.MDL** is a library of pre-trained models that were obtained by feeding diverse materials data on structure-property relationships into neural networks and some other supervised learning algorithms.

XenonPy offers a simple-to-use toolchain to perform **transfer learning** with the given **pre-trained models** seamlessly.
In this tutorial, we will focus on model querying and retrieving.

### useful functions

Running this cell loads commonly used packages such as `numpy` and `pandas`.
It also imports some helper functions that we wrote ourselves.
There is no magic; see `samples/tools.ipynb` to check what is imported.

In [1]:
%run tools.ipynb

### access pre-trained models with MDL class

We expose a wide range of APIs for querying and downloading our models.
These APIs can be accessed with ordinary HTTP requests.
For convenience, we implemented the most commonly used ones and wrapped them into XenonPy.
All of these functions are available through `xenonpy.datatools.MDL`.

In [2]:
# --- import necessary libraries

from xenonpy.datatools import MDL

In [3]:
# --- init and check

mdl = MDL()
mdl

MDL(api_key='')

You can see that `MDL` has an optional parameter ``api_key``, which is not used at present. Public model uploading is planned; this parameter is a placeholder that will be used to identify users in the future.

If everything is set up correctly, we can query models by keyword.
Let's say we want to retrieve models that were trained on inorganic compound data and that predict volume.
In this case, we set the parameter `modelset_has` to **Stable inorganic compounds** and `property_has` to **volume**.

If the call succeeds, it returns the query result as a `pandas.DataFrame`. This contains information about which models are available and their download URLs. All available names are listed at https://xenonpy.readthedocs.io/en/latest/features.html#xenonpy-mdl.

In [4]:
# --- query data

summary = mdl(
    modelset_has="Stable inorganic compounds",  # substring of the model set name
    property_has="volume",  # substring of the property name
)

You can append `?` to any function's name to show its docs.

In [5]:
mdl?

Signature:
mdl(
    modelset_has,
    *,
    property_has='',
    descriptor_has='',
    method_has='',
    lang_has='',
    regress_is=True,
    transferred_is=False,
    succeed_is=True,
)
Type:            MDL
String form:     MDL(api_key='')
File:            ~/projects/xenonpy/xenonpy/datatools/mdl.py
Docstring:       <no docstrin

In [6]:
summary.head(5)

| mId | url | modelSet | property | descriptor | method | lang | regress | transferred | succeed | mae | r |
|---|---|---|---|---|---|---|---|---|---|---|---|
| M23001 | http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta... | Stable inorganic compounds in materials projec... | inorganic.crystal.volume | xenonpy.composition | pytorch.nn.neural_network | python | True | False | True | 22.565939 | 0.996093 |
| M23002 | http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta... | Stable inorganic compounds in materials projec... | inorganic.crystal.volume | xenonpy.composition | pytorch.nn.neural_network | python | True | False | True | 295.966614 | 0.991945 |
| M23003 | http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta... | Stable inorganic compounds in materials projec... | inorganic.crystal.volume | xenonpy.composition | pytorch.nn.neural_network | python | True | False | True | 151.815582 | 0.994928 |
| M23004 | http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta... | Stable inorganic compounds in materials projec... | inorganic.crystal.volume | xenonpy.composition | pytorch.nn.neural_network | python | True | False | True | 50.362647 | 0.995265 |
| M23005 | http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta... | Stable inorganic compounds in materials projec... | inorganic.crystal.volume | xenonpy.composition | pytorch.nn.neural_network | python | True | False | True | 87.297256 | 0.993133 |
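Because the query result is an ordinary `pandas.DataFrame`, it can be narrowed down further with standard pandas operations. The sketch below uses a hand-built stand-in for `summary` (only the `property`, `mae`, and `r` columns, with values copied from the output above); in the notebook, `summary` would come from the `mdl(...)` call. The MAE threshold of 100 is an arbitrary value chosen for illustration.

```python
import pandas as pd

# Hand-built stand-in for the query result; in the notebook, `summary`
# is returned by mdl(...). Only three columns are reproduced here.
summary = pd.DataFrame(
    {
        "property": ["inorganic.crystal.volume"] * 5,
        "mae": [22.565939, 295.966614, 151.815582, 50.362647, 87.297256],
        "r": [0.996093, 0.991945, 0.994928, 0.995265, 0.993133],
    },
    index=pd.Index(["M23001", "M23002", "M23003", "M23004", "M23005"], name="mId"),
)

# Keep only the models whose MAE is below an arbitrary, illustrative threshold.
good = summary[summary["mae"] < 100.0]
print(good.index.tolist())
```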


### model downloading

You may have noticed that the result also has a column named `url`. As the name suggests, we can use this information to download models.
Because downloading over HTTP in Python can be awkward for newcomers, we also provide a simple helper function for it.

Suppose we want to download the 5 best-performing models ranked by **MAE**. The procedure is straightforward, as shown below.

#### 1. sort models by the value of **MAE**

In [7]:
summary = summary.sort_values('mae')
summary.head(5)

| mId | url | modelSet | property | descriptor | method | lang | regress | transferred | succeed | mae | r |
|---|---|---|---|---|---|---|---|---|---|---|---|
| M23001 | http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta... | Stable inorganic compounds in materials projec... | inorganic.crystal.volume | xenonpy.composition | pytorch.nn.neural_network | python | True | False | True | 22.565939 | 0.996093 |
| M23265 | http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta... | Stable inorganic compounds in materials projec... | inorganic.crystal.volume | xenonpy.composition | pytorch.nn.neural_network | python | True | False | True | 22.759855 | 0.99542 |
| M24137 | http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta... | Stable inorganic compounds in materials projec... | inorganic.crystal.volume | xenonpy.composition | pytorch.nn.neural_network | python | True | False | True | 22.883263 | 0.995601 |
| M23203 | http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta... | Stable inorganic compounds in materials projec... | inorganic.crystal.volume | xenonpy.composition | pytorch.nn.neural_network | python | True | False | True | 22.954258 | 0.995931 |
| M25054 | http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta... | Stable inorganic compounds in materials projec... | inorganic.crystal.volume | xenonpy.composition | pytorch.nn.neural_network | python | True | False | True | 22.983179 | 0.996287 |


#### 2. get the first 5 **url**s

In [8]:
urls = summary['url'].iloc[:5]
urls

mId
M23001    http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta...
M23265    http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta...
M24137    http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta...
M23203    http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta...
M25054    http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta...
Name: url, dtype: object

#### 3. download models by using ``mdl.pull`` method

In [9]:
results = mdl.pull(urls)

100%|██████████| 5/5 [00:00<00:00,  5.21it/s]


The result is a list of the local paths where the downloaded models are stored. To choose the save location yourself, pass a path string to the `save_to` parameter of `mdl.pull`.

In [10]:
results

['/Users/liuchang/projects/xenonpy/samples/S1/inorganic.crystal.volume/xenonpy.composition/pytorch.nn.neural_network/04cd-290-281-153-75-21@1',
 '/Users/liuchang/projects/xenonpy/samples/S1/inorganic.crystal.volume/xenonpy.composition/pytorch.nn.neural_network/ajc1-290-261-122-66-25-10@1',
 '/Users/liuchang/projects/xenonpy/samples/S1/inorganic.crystal.volume/xenonpy.composition/pytorch.nn.neural_network/rzr8-290-285-177-111-58-27@1',
 '/Users/liuchang/projects/xenonpy/samples/S1/inorganic.crystal.volume/xenonpy.composition/pytorch.nn.neural_network/p0nx-290-285-126-52-22-13@1',
 '/Users/liuchang/projects/xenonpy/samples/S1/inorganic.crystal.volume/xenonpy.composition/pytorch.nn.neural_network/r4t2-290-243-131-67-23@1']
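The sorting and selection steps above can also be written as a single call with `DataFrame.nsmallest`. This is a minimal sketch on a stand-in DataFrame: the `mId` and `mae` values are taken from the tables in this tutorial, while the URLs are placeholders and not real model links. The resulting `urls` Series has the same shape as the one passed to `mdl.pull` above.

```python
import pandas as pd

ids = ["M23002", "M23265", "M23001", "M24137", "M23003"]

# Stand-in for the query result. The URLs are placeholders for illustration only.
summary = pd.DataFrame(
    {
        "url": [f"https://example.invalid/{m}" for m in ids],
        "mae": [295.966614, 22.759855, 22.565939, 22.883263, 151.815582],
    },
    index=pd.Index(ids, name="mId"),
)

# nsmallest orders the rows by 'mae' and keeps the first n of them in one call.
urls = summary.nsmallest(3, "mae")["url"]
print(urls.index.tolist())
```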

Let's look at some other models, for example the `crystal graph CNN` models.

In [11]:
summary = mdl(
    modelset_has="Stable inorganic compounds",
    descriptor_has='crystal_graph',
)

In [12]:
summary.head(5)

| mId | url | modelSet | property | descriptor | method | lang | regress | transferred | succeed | mae | r |
|---|---|---|---|---|---|---|---|---|---|---|---|
| M69008 | http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta... | Stable inorganic compounds in materials projec... | inorganic.crystal.efermi | xenonpy.cgcnn.crystal_graph_cnn | pytorch.nn.neural_network | python | True | False | True | 0.680264 | 0.946909 |
| M69009 | http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta... | Stable inorganic compounds in materials projec... | inorganic.crystal.efermi | xenonpy.cgcnn.crystal_graph_cnn | pytorch.nn.neural_network | python | True | False | True | 0.680264 | 0.946909 |
| M69010 | http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta... | Stable inorganic compounds in materials projec... | inorganic.crystal.efermi | xenonpy.cgcnn.crystal_graph_cnn | pytorch.nn.neural_network | python | True | False | True | 0.734605 | 0.940486 |
| M69011 | http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta... | Stable inorganic compounds in materials projec... | inorganic.crystal.efermi | xenonpy.cgcnn.crystal_graph_cnn | pytorch.nn.neural_network | python | True | False | True | 0.735575 | 0.939878 |
| M69012 | http://xenon.ism.ac.jp/mdl/S1/inorganic.crysta... | Stable inorganic compounds in materials projec... | inorganic.crystal.efermi | xenonpy.cgcnn.crystal_graph_cnn | pytorch.nn.neural_network | python | True | False | True | 0.716477 | 0.944133 |


In [13]:
summary.groupby('property')['r'].max()

property
inorganic.crystal.band_gap                     0.929118
inorganic.crystal.density                      0.995538
inorganic.crystal.efermi                       0.965173
inorganic.crystal.final_energy_per_atom        0.975265
inorganic.crystal.formation_energy_per_atom    0.990847
inorganic.crystal.total_magnetization          0.666429
inorganic.crystal.volume                       0.601476
Name: r, dtype: float64
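The `groupby` call above reports the highest `r` for each property, but not which model achieved it. One way to recover the model IDs is `idxmax`, sketched here on a toy DataFrame. The IDs, property names, and scores in this example are made up for illustration and do not correspond to real MDL entries.

```python
import pandas as pd

# Toy stand-in for a query result; all values here are invented.
summary = pd.DataFrame(
    {
        "property": ["prop_a", "prop_a", "prop_b", "prop_b"],
        "r": [0.91, 0.95, 0.88, 0.80],
    },
    index=pd.Index(["m1", "m2", "m3", "m4"], name="mId"),
)

# For each property, idxmax returns the index label (mId) of the row with the largest r.
best_ids = summary.groupby("property")["r"].idxmax()

# Use those labels to pull the full rows of the best model per property.
best_models = summary.loc[best_ids]
print(best_models)
```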

### retrieve model

You can use `xenonpy.model.utils.Checker` to reload the downloaded models. For example, let's load the first downloaded model into our notebook.

In [14]:
# --- import necessary libraries

from xenonpy.model.utils import Checker

In [15]:
checker = Checker(results[0])

If successful, use ``checker.trained_model`` to retrieve your model.

**<span style="color: red; ">Warning: checker.trained_model is deprecated and will be removed in v0.5.0</span>**

In [16]:
checker.trained_model


Sequential(
  (0): Layer1d(
    (layer): Linear(in_features=290, out_features=281, bias=True)
    (batch_nor): BatchNorm1d(281, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (act_func): ReLU()
    (dropout): Dropout(p=0.1)
  )
  (1): Layer1d(
    (layer): Linear(in_features=281, out_features=153, bias=True)
    (batch_nor): BatchNorm1d(153, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (act_func): ReLU()
    (dropout): Dropout(p=0.1)
  )
  (2): Layer1d(
    (layer): Linear(in_features=153, out_features=75, bias=True)
    (batch_nor): BatchNorm1d(75, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (act_func): ReLU()
    (dropout): Dropout(p=0.1)
  )
  (3): Layer1d(
    (layer): Linear(in_features=75, out_features=21, bias=True)
    (batch_nor): BatchNorm1d(21, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (act_func): ReLU()
    (dropout): Dropout(p=0.1)
  )
  (4): Layer1d(
    (layer): Linear(in_featur
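The layer sizes printed here (290, 281, 153, 75, 21) match the numbers in this model's directory name, `04cd-290-281-153-75-21@1`, which suggests that the name encodes the layer widths. This is an observation from the examples in this tutorial, not a documented naming rule. Under that assumption, the widths can be read from a path without loading the model. The helper `widths_from_model_dir` below is hypothetical and not part of XenonPy.

```python
from pathlib import PurePosixPath


def widths_from_model_dir(path: str) -> list[int]:
    """Parse layer widths from a directory name like '04cd-290-281-153-75-21@1'.

    Assumes the pattern '<id>-<width>-<width>-...@<n>', inferred from the
    examples in this tutorial rather than from any documented convention.
    """
    name = PurePosixPath(path).name      # last path component
    stem = name.split("@", 1)[0]         # drop the '@1' suffix
    parts = stem.split("-")[1:]          # skip the leading identifier
    return [int(p) for p in parts]


path = (
    "S1/inorganic.crystal.volume/xenonpy.composition/"
    "pytorch.nn.neural_network/04cd-290-281-153-75-21@1"
)
widths = widths_from_model_dir(path)
print(widths)
```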

You can list all the information for a downloaded model by printing the ``checker`` object.

In [17]:
checker

<Checker> includes:
"init_model.@1": /Users/liuchang/projects/xenonpy/samples/S1/inorganic.crystal.volume/xenonpy.composition/pytorch.nn.neural_network/04cd-290-281-153-75-21@1/init_model.@1.pkl.z
"describe.@1": /Users/liuchang/projects/xenonpy/samples/S1/inorganic.crystal.volume/xenonpy.composition/pytorch.nn.neural_network/04cd-290-281-153-75-21@1/describe.@1.pkl.z
"y_true.@1": /Users/liuchang/projects/xenonpy/samples/S1/inorganic.crystal.volume/xenonpy.composition/pytorch.nn.neural_network/04cd-290-281-153-75-21@1/y_true.@1.pkl.z
"y_pred.@1": /Users/liuchang/projects/xenonpy/samples/S1/inorganic.crystal.volume/xenonpy.composition/pytorch.nn.neural_network/04cd-290-281-153-75-21@1/y_pred.@1.pkl.z
"runner.@1": /Users/liuchang/projects/xenonpy/samples/S1/inorganic.crystal.volume/xenonpy.composition/pytorch.nn.neural_network/04cd-290-281-153-75-21@1/runner.@1.pkl.z
"y_indices.@1": /Users/liuchang/projects/xenonpy/samples/S1/inorganic.crystal.volume/xenonpy.composition/pytorch.nn.neural_

You can use ``checker[<file name without extension>]`` to load any data under the model directory. An example is below:

In [18]:
checker['scores.@1']

{'mae': 22.565939,
 'rmse': 46.829636,
 'r2': 0.991713183810694,
 'pearsonr': 0.9960926,
 'spearmanr': 0.9976251253835112,
 'p_value': 0.0}
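The keys in this dictionary are common regression metrics. The sketch below shows how `mae`, `rmse`, `r2`, and `pearsonr` are conventionally computed from true and predicted values with NumPy, using toy arrays. Whether XenonPy uses exactly these definitions is an assumption, and `regression_scores` is a hypothetical helper, not a XenonPy function. The rank-based `spearmanr` and its `p_value` are left out.

```python
import numpy as np


def regression_scores(y_true, y_pred):
    """Compute common regression metrics using their textbook definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "r2": float(1.0 - ss_res / ss_tot),
        "pearsonr": float(np.corrcoef(y_true, y_pred)[0, 1]),
    }


# Toy values for illustration only.
scores = regression_scores([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(scores)
```

With real data, the two arrays would presumably come from entries such as `checker['y_true.@1']` and `checker['y_pred.@1']`, which appear in the file listing above.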

### download R model

There are also many R models in **XenonPy.MDL**. Downloading them works exactly as shown above; just use ``lang_has='r'`` when querying.

In [19]:
from xenonpy.datatools import MDL

mdl = MDL()

summary = mdl(
    modelset_has="QM9",  # substring of the model set name
    property_has="hartree",  # substring of the property name
    lang_has='r',
)

In [20]:
summary.head(3)

| mId | url | modelSet | property | descriptor | method | lang | regress | transferred | succeed | mae | r |
|---|---|---|---|---|---|---|---|---|---|---|---|
| M18001 | http://xenon.ism.ac.jp/mdl/S3/organic.nonpolym... | QM9 Dataset from Quantum-Machine project | organic.nonpolymer.g_hartree | rcdk.fp.fingerprint | mxnet.nn.neural_network | r | True | False | True | 15.8705 | 0.8282 |
| M18002 | http://xenon.ism.ac.jp/mdl/S3/organic.nonpolym... | QM9 Dataset from Quantum-Machine project | organic.nonpolymer.g_hartree | rcdk.fp.fingerprint | mxnet.nn.neural_network | r | True | False | True | 19.7939 | 0.789 |
| M18003 | http://xenon.ism.ac.jp/mdl/S3/organic.nonpolym... | QM9 Dataset from Quantum-Machine project | organic.nonpolymer.g_hartree | rcdk.fp.fingerprint | mxnet.nn.neural_network | r | True | False | True | 14.9261 | 0.851 |


In [21]:
urls = summary['url'].iloc[:3]
results = mdl.pull(urls)

100%|██████████| 3/3 [00:00<00:00, 20.92it/s]


In [22]:
results

['/Users/liuchang/projects/xenonpy/samples/S3/organic.nonpolymer.g_hartree/rcdk.fp.fingerprint/mxnet.nn.neural_network/shotgun_G_Hartree_randFP1021_corr-0.8282_mxnet_400-81-10-1_2018-04-20',
 '/Users/liuchang/projects/xenonpy/samples/S3/organic.nonpolymer.g_hartree/rcdk.fp.fingerprint/mxnet.nn.neural_network/shotgun_G_Hartree_randFP1026_corr-0.789_mxnet_266-127-94-10-1_2018-04-20',
 '/Users/liuchang/projects/xenonpy/samples/S3/organic.nonpolymer.g_hartree/rcdk.fp.fingerprint/mxnet.nn.neural_network/shotgun_G_Hartree_randFP1033_corr-0.851_mxnet_150-65-21-1_2018-04-20']

**R tutorials will be released later.**