### Preparing the sample data

This exercise needs about 1,000 chemical formulas. Run the blocks below to fetch them from the `Materials Project`.

#### API Key

To run the blocks below, you must first generate a `Materials Project` API key. For how to generate one, see [The Materials API](https://materialsproject.org/open).

In [1]:
# your api key

api_key = '1vRmHNP6w40CzaiO'
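
As an alternative to writing the key directly into the notebook, the key could be read from an environment variable. The following is only a minimal sketch: the helper `load_api_key` and the variable name `MP_API_KEY` are hypothetical names chosen for this illustration, not something defined by the Materials Project or by `pymatgen`.

```python
import os


def load_api_key(env_var="MP_API_KEY"):
    """Return the API key stored in an environment variable.

    `MP_API_KEY` is an assumed, illustrative variable name.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # fail early with a readable message instead of a failed request later
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Materials Project API key."
        )
    return key
```

With the variable set in the shell that launches the notebook, `api_key = load_api_key()` would replace the hard-coded assignment above.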

#### import packages

In [2]:
from itertools import zip_longest
from pathlib import Path

from pymatgen.ext.matproj import MPRester
from tqdm import tqdm

import pandas as pd
import numpy as np

#### fetch function

In [3]:
def data_fetcher(api_key, mp_ids):
    """Fetch the given Materials Project entries as a pandas.DataFrame."""

    # split requests into fixed-size groups
    # e.g. grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    def grouper(iterable, n, fillvalue=None):
        """Collect data into fixed-length chunks or blocks"""
        args = [iter(iterable)] * n
        return zip_longest(*args, fillvalue=fillvalue)

    # the following props will be fetched
    mp_props = [
        'band_gap',
        'density',
        'volume',
        'material_id',
        'pretty_formula',
        'elements',
        'efermi',
        'e_above_hull',
        'formation_energy_per_atom',
        'final_energy_per_atom',
        'unit_cell_formula',
        'structure'
    ]



    entries = []
    mpid_groups = list(grouper(mp_ids, 10))

    # send one query per group of 10 ids
    with MPRester(api_key) as mpr:
        for group in tqdm(mpid_groups):
            # drop the None padding that grouper adds to the last group
            mpid_list = [mp_id for mp_id in group if mp_id]
            chunk = mpr.query({"material_id": {"$in": mpid_list}}, mp_props)
            entries.extend(chunk)


    df = pd.DataFrame(entries, index=[e['material_id'] for e in entries])
    df = df.drop('material_id', axis=1)
    df = df.rename(columns={'unit_cell_formula': 'composition'})
    df = df.reindex(columns=sorted(df.columns))

    return df
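
To see what the nested `grouper` helper does on its own, the sketch below chunks a short list of ids and strips the padding, the same way `data_fetcher` prepares each request. The ids here are made-up placeholders for illustration, and the chunk size of 3 is chosen only to keep the output short.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks; the last chunk is padded."""
    args = [iter(iterable)] * n  # n references to the same iterator
    return zip_longest(*args, fillvalue=fillvalue)


# placeholder ids, not taken from data/mp_ids.txt
ids = [f"mp-{i}" for i in range(1, 8)]

# drop the None padding so that only real ids remain in each chunk
chunks = [[x for x in group if x is not None] for group in grouper(ids, 3)]
print(chunks)
# [['mp-1', 'mp-2', 'mp-3'], ['mp-4', 'mp-5', 'mp-6'], ['mp-7']]
```

Because every position in a chunk pulls from the same iterator, the input is consumed in order and nothing is repeated or skipped.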

#### Fetch the data

In [4]:
# read ids
mp_ids = [s.decode('utf-8') for s in np.loadtxt('data/mp_ids.txt', 'S20')]

# fetch data as pandas.DataFrame
df = data_fetcher(api_key, mp_ids)
df.head(5)

100%|██████████| 100/100 [00:41<00:00,  2.44it/s]


| | band_gap | composition | density | e_above_hull | efermi | elements | final_energy_per_atom | formation_energy_per_atom | pretty_formula | structure | volume |
|---|---|---|---|---|---|---|---|---|---|---|---|
| mp-20866 | 0.0 | {'Ge': 4.0, 'Rh': 4.0} | 9.755533 | 0.042011 | 6.337905 | [Ge, Rh] | -6.496779 | -0.518768 | GeRh | [[0.80283734 1.66009566 3.26577034] Ge, [1.660... | 119.52198 |
| mp-30759 | 0.0 | {'Li': 1.0, 'Mg': 2.0, 'Tl': 1.0} | 5.022909 | 0.079248 | 4.280492 | [Li, Mg, Tl] | -1.919116 | -0.053024 | LiMg2Tl | [[3.502481 3.502481 3.502481] Li, [1.7512405 1... | 85.932483 |
| mp-3416 | 6.7252 | {'Na': 6.0, 'Al': 2.0, 'F': 12.0} | 2.844097 | 0.0 | -1.611541 | [Na, Al, F] | -5.035871 | -3.415316 | Na3AlF6 | [[3.14484661 5.38110567 2.00280763] Na, [0.304... | 245.150323 |
| mp-505412 | 2.0103 | {'K': 8.0, 'In': 8.0, 'S': 16.0} | 3.080923 | 0.0 | 1.459451 | [In, K, S] | -3.972722 | -1.270707 | KInS2 | [[-3.49606564 7.70034675 13.24465715] K, [-3.... | 940.171139 |
| mp-684652 | 6.805 | {'Be': 3.0, 'F': 6.0} | 1.246979 | 0.222983 | -5.836445 | [Be, F] | -5.542443 | -3.348938 | BeF2 | [[0.86831036 6.79832225 6.01464299] Be, [0.238... | 187.79857 |


#### Save the sample data

The sample data is kept as a `pandas.DataFrame` and saved to `data/mp_samples.pd.xz`.

In [5]:
df.to_pickle('data/mp_samples.pd.xz')
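
The saved file can later be loaded back with `pandas.read_pickle`. The sketch below shows the round trip on a tiny stand-in frame written to a temporary directory, so it runs without the real `data/` folder. The two rows are copied from the output above, and the reduced column set is an assumption made only to keep the example short. By default pandas infers the compression from the `.xz` suffix on both write and read.

```python
import tempfile
from pathlib import Path

import pandas as pd

# tiny stand-in for the fetched data (two rows from the output above)
sample = pd.DataFrame(
    {"band_gap": [0.0, 6.7252], "pretty_formula": ["GeRh", "Na3AlF6"]},
    index=["mp-20866", "mp-3416"],
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mp_samples.pd.xz"
    sample.to_pickle(path)            # xz compression inferred from the suffix
    restored = pd.read_pickle(path)   # decompression inferred the same way

# the restored frame should match the original exactly
print(restored.equals(sample))
```

For the real data, `pd.read_pickle('data/mp_samples.pd.xz')` would be the corresponding call.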