In [1]:
from pymatgen.core import Structure

# user-friendly print
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Crystal structure generation

To generate a legal structure under a given space group with a specific chemical composition, four basic steps are needed:

1. Calculate the possible Wyckoff configurations under the given space group for the chemical composition.
2. Randomly generate fractional positions for each element from a Wyckoff configuration calculated in step 1.
3. Randomly generate a lattice for the space group used in step 1.
4. Combine the results of steps 2 and 3 to obtain a legal structure.
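To make step 2 concrete, here is a minimal sketch that fills a Wyckoff position template with random free parameters. It is not how `crystallus` does it internally; the helper name `fill_wyckoff_template` is hypothetical, and the template string is the position `e` of space group `167` as printed by `SpaceGroupDB` later in this notebook.

```python
import random
import re


def fill_wyckoff_template(template, rng):
    """Expand a Wyckoff position string into fractional coordinates.

    Hypothetical helper for illustration; not part of crystallus.
    """
    # Draw the free parameters once: all equivalent sites share them.
    free = {"x": rng.random(), "y": rng.random(), "z": rng.random()}
    sites = []
    for triple in re.findall(r"\(([^()]*)\)", template):
        # Each component is a small arithmetic expression such as "-x+1/2".
        # eval is acceptable here only because the template is trusted input.
        coords = [eval(expr, {"__builtins__": {}}, free) % 1.0
                  for expr in triple.split(",")]
        sites.append(tuple(coords))
    return sites


# Wyckoff position 'e' of space group 167 (multiplicity 6).
template_e = ("(x,-x+1/2,1/4), (1/4,x,-x+1/2), (-x+1/2,1/4,x), "
              "(-x,x+1/2,3/4), (3/4,-x,x+1/2), (x+1/2,3/4,-x)")

sites = fill_wyckoff_template(template_e, random.Random(0))
for site in sites:
    print(site)
```

Every coordinate is wrapped into the unit cell with `% 1.0`, and the number of generated sites equals the multiplicity of the position.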

Usually, we also have to check the `volume` and `atomic distances` of each generated structure and keep only the structures for which both are reasonable.
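Such a filter can be sketched in plain Python as follows. The function names, the volume range, and the distance threshold are all illustrative assumptions, and checking only the 26 neighbouring cells is a simplification that holds for cells that are not strongly skewed.

```python
import itertools
import math


def cell_volume(lattice):
    """Volume of the cell spanned by the three row vectors a, b, c."""
    a, b, c = lattice
    cross = (b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0])
    return abs(sum(ai * xi for ai, xi in zip(a, cross)))


def min_distance(lattice, frac_coords):
    """Shortest interatomic distance, including periodic images."""
    shifts = list(itertools.product((-1, 0, 1), repeat=3))
    best = math.inf
    for i, p in enumerate(frac_coords):
        for j in range(i, len(frac_coords)):
            q = frac_coords[j]
            for s in shifts:
                if i == j and s == (0, 0, 0):
                    continue  # skip the distance of an atom to itself
                d_frac = [q[k] + s[k] - p[k] for k in range(3)]
                # Convert the fractional difference to Cartesian coordinates.
                cart = [sum(d_frac[k] * lattice[k][m] for k in range(3))
                        for m in range(3)]
                best = min(best, math.sqrt(sum(v * v for v in cart)))
    return best


def is_reasonable(lattice, frac_coords, vol_range, min_dist):
    """Keep a structure only if volume and shortest distance are acceptable."""
    lo, hi = vol_range
    return (lo <= cell_volume(lattice) <= hi
            and min_distance(lattice, frac_coords) >= min_dist)


cubic = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
good = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
bad = [(0.0, 0.0, 0.0), (0.95, 0.0, 0.0)]  # close through the periodic boundary

print(is_reasonable(cubic, good, vol_range=(50.0, 80.0), min_dist=2.0))
print(is_reasonable(cubic, bad, vol_range=(50.0, 80.0), min_dist=2.0))
```

The second structure is rejected because the two atoms are only 0.2 apart across the cell boundary, which a check without periodic images would miss.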

To facilitate these jobs, our `crystallus` library provides four components:

* `WyckoffCfgGenerator`: generates possible Wyckoff configurations for a given space group and primitive-cell composition.
* `CrystalGenerator`: generates crystal structures for a given space group and Wyckoff configurations.
* `WyckoffDB`, `SpaceGroupDB`: databases containing space group information and the corresponding Wyckoff information.

We will show how to use `crystallus` to generate legal structures.

### 1. Generate Wyckoff configurations

As an example, we will try to generate structures for `Ca2C2O6`. The true space group of this structure is `167`, and the Wyckoff configuration is `{Ca: 2b, C: 2a, O: 6e}`.
You can use `SpaceGroupDB` to get the Wyckoff position information for space group `167`.

In [2]:
from crystallus import SpaceGroupDB

wys = SpaceGroupDB.get(spacegroup_num=167).wyckoffs
[{'Wyckoff letter': w.letter, 'multiplicity': w.multiplicity, 'reusable': w.reuse, 'Wyckoff position': w.positions} for w in wys ]

[{'Wyckoff letter': 'f',
  'multiplicity': 12,
  'reusable': True,
  'Wyckoff position': '(x,y,z), (z,x,y), (y,z,x), (-y+1/2,-x+1/2,-z+1/2), (-x+1/2,-z+1/2,-y+1/2), (-z+1/2,-y+1/2,-x+1/2), (-x,-y,-z), (-z,-x,-y), (-y,-z,-x), (y+1/2,x+1/2,z+1/2), (x+1/2,z+1/2,y+1/2), (z+1/2,y+1/2,x+1/2)'},
 {'Wyckoff letter': 'e',
  'multiplicity': 6,
  'reusable': True,
  'Wyckoff position': '(x,-x+1/2,1/4), (1/4,x,-x+1/2), (-x+1/2,1/4,x), (-x,x+1/2,3/4), (3/4,-x,x+1/2), (x+1/2,3/4,-x)'},
 {'Wyckoff letter': 'd',
  'multiplicity': 6,
  'reusable': False,
  'Wyckoff position': '(1/2,0,0), (0,1/2,0), (0,0,1/2), (1/2,0,1/2), (0,1/2,1/2), (1/2,1/2,0)'},
 {'Wyckoff letter': 'c',
  'multiplicity': 4,
  'reusable': True,
  'Wyckoff position': '(x,x,x), (-x+1/2,-x+1/2,-x+1/2), (-x,-x,-x), (x+1/2,x+1/2,x+1/2)'},
 {'Wyckoff letter': 'b',
  'multiplicity': 2,
  'reusable': False,
  'Wyckoff position': '(0,0,0), (1/2,1/2,1/2)'},
 {'Wyckoff letter': 'a',
  'multiplicity': 2,
  'reusable': False,
  'Wyckoff position

Let's generate some possible Wyckoff configurations for the composition `Ca2C2O6` under space group `167`.

In [3]:
from crystallus import WyckoffCfgGenerator

WyckoffCfgGenerator?

Init signature: WyckoffCfgGenerator(*, max_recurrent=1000, n_jobs=-1, **composition)
Docstring:      <no docstring>
Init docstring:
A generator for possible Wyckoff configuration generation.

Parameters
----------
max_recurrent : int, optional
    Max recurrent until generate a reasonable structure, by default 5_000
n_jobs : int, optional
    Number of cpu cores when parallel calculation, by default -1
composition: Dict
    Composition of compounds in the primitive cell; should be formated
    as {<element symbol>: <ratio in float>}.
File:           /usr/local/miniconda3/envs/crystallus/lib/python3.7/site-packages/crystallus/wyckoff_cfg_generator.py
Type:           type
Subclasses:     


In [4]:
composition = {'Ca': 2, 'C': 2, 'O': 6}

wyg = WyckoffCfgGenerator(**composition)
wyg

WyckoffCfgGenerator(            
    max_recurrent=1000,            
    n_jobs=-1            
    composition={'Ca': 2, 'C': 2, 'O': 6}            
)

As you may have noticed, the minimum input needed to initialize a `WyckoffCfgGenerator` is just a composition.
Now we can use this generator to generate Wyckoff configurations. First, let's generate a single one with the `gen_one` method.

In [5]:
wyg.gen_one?

Signature: wyg.gen_one(spacegroup_num: int)
Docstring:
Try to generate a possible Wyckoff configuration under the given space group.

Parameters
----------
spacegroup_num : int
    Space group number.

Returns
-------
Dict
    Wyckoff configuration set, which is a dict with format like:
    {"Li": ["a", "c"], "O": ["i"]}. Here, the "Li" is an available element
    symbol and ["a", "c"] is a list which contains coresponding Wyckoff
    letters. For convenience, dict will be sorted by keys.
File:      /usr/local/miniconda3/envs/crystallus/lib/python3.7/site-packages/crystallus/wyckoff_cfg_generator.py
Type:      method


In [6]:
cfg = wyg.gen_one(spacegroup_num=167)
cfg

{'C': ['a'], 'Ca': ['b'], 'O': ['d']}

If everything goes well, the cell above returns a dict like `{'C': ['b'], 'Ca': ['a'], 'O': ['d']}`.
Here, `C`, `Ca`, and `O` are the element symbols, sorted alphabetically, and `['b']`, `['a']`, and `['d']` are the corresponding Wyckoff positions, given as Wyckoff letters.

You may be confused that this cell does not always return the same result. That is expected: there are four possible configurations for the composition `Ca2C2O6` under space group `167`, and each call returns one of them, so running the cell many times will eventually produce all four. Because almost every case has more than one possible configuration, you should usually use `gen_many` instead of `gen_one`.
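The count of four can be checked by brute force. The sketch below is not the `crystallus` algorithm; it is a plain enumeration using the multiplicities and `reusable` flags printed by `SpaceGroupDB` above, under the assumption that `reusable: False` means a position can be occupied at most once per cell. The names `letter_sets` and `enumerate_cfgs` are hypothetical.

```python
from itertools import combinations_with_replacement, product

# (letter, multiplicity, reusable) for space group 167, from the output above.
WYCKOFFS_167 = [("a", 2, False), ("b", 2, False), ("c", 4, True),
                ("d", 6, False), ("e", 6, True), ("f", 12, True)]


def letter_sets(count, wyckoffs):
    """All multisets of letters whose multiplicities sum to `count`."""
    found = []
    max_len = count // min(m for _, m, _ in wyckoffs)
    for n in range(1, max_len + 1):
        for combo in combinations_with_replacement(wyckoffs, n):
            if sum(m for _, m, _ in combo) == count:
                found.append(tuple(letter for letter, _, _ in combo))
    return found


def enumerate_cfgs(composition, wyckoffs):
    """Enumerate every configuration that respects the reuse flags."""
    reusable = {letter: flag for letter, _, flag in wyckoffs}
    elements = sorted(composition)
    per_element = [letter_sets(composition[e], wyckoffs) for e in elements]
    cfgs = []
    for choice in product(*per_element):
        used = [letter for letters in choice for letter in letters]
        # A non-reusable position may appear only once in the whole cell.
        if all(reusable[l] or used.count(l) == 1 for l in set(used)):
            cfgs.append({e: list(ls) for e, ls in zip(elements, choice)})
    return cfgs


cfgs_167 = enumerate_cfgs({"Ca": 2, "C": 2, "O": 6}, WYCKOFFS_167)
for cfg in cfgs_167:
    print(cfg)
```

`Ca` and `C` must share the two non-reusable positions `a` and `b`, which leaves only `d` or `e` for `O`, giving 2 × 2 = 4 configurations.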

In [7]:
wyg.gen_many?

Signature: wyg.gen_many(size: int, *spacegroup_num: int)
Docstring:
Try to generate possible Wyckoff configuration sets.

Parameters
----------
size : int
    How many times to try for one space group.
spacegroup_num: int
    The spacegroup numbers.

Returns
-------
Dict[int, List[Dict]], List[Dict]
    A collection contains spacegroup number and it's corresponding Wyckoff
    configurations (wy_cfg). If only one spacegroup number was given,
    will only return the list of wy_cfgs, otherwise return in dict with
    spacegroup number as key. wy_cfgs will be formated as
    {element 1: [Wyckoff_letter, Wyckoff_letter, ...], element 2: [...], ...}.
File:      /usr/local/miniconda3/envs/crystallus/lib/python3.7/site-packages/crystallus/wyckoff_cfg_generator.py
Type:      method


In [8]:
cfgs = wyg.gen_many(100, 167)
cfgs

[{'C': ['a'], 'Ca': ['b'], 'O': ['d']},
 {'C': ['b'], 'Ca': ['a'], 'O': ['e']},
 {'C': ['b'], 'Ca': ['a'], 'O': ['d']},
 {'C': ['a'], 'Ca': ['b'], 'O': ['e']}]
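As the docstring notes, `gen_many` returns a bare list for a single space group and a dict for several. If downstream code should not care about the difference, a small wrapper can normalize both shapes. The helper `as_cfg_dict` below is a hypothetical convenience, not part of `crystallus`.

```python
def as_cfg_dict(result, *spacegroup_nums):
    """Return {spacegroup_num: [cfg, ...]} for either return shape."""
    if isinstance(result, dict):
        return result
    if len(spacegroup_nums) != 1:
        raise ValueError("a bare list is only expected for a single space group")
    return {spacegroup_nums[0]: list(result)}


# Shapes copied from the outputs in this notebook (shortened).
single = [{'C': ['a'], 'Ca': ['b'], 'O': ['d']},
          {'C': ['b'], 'Ca': ['a'], 'O': ['e']}]
multi = {194: [{'C': ['a'], 'Ca': ['d'], 'O': ['g']}],
         167: [{'C': ['a'], 'Ca': ['b'], 'O': ['e']}]}

normalized_single = as_cfg_dict(single, 167)
normalized_multi = as_cfg_dict(multi, 194, 167)
print(sorted(normalized_single), sorted(normalized_multi))
```

With the real generator, this would be called as `as_cfg_dict(wyg.gen_many(100, 167), 167)`.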

You can process more than one space group in one call by passing the space group numbers as additional positional arguments. In this case, the return value is a dict with the space group number as key and the configuration list as value. For example, if the space group candidates are `[194, 148, 167]`, call `gen_many` like this:

In [9]:
%%time

cfgs = wyg.gen_many(20, 194, 148, 167)
cfgs

CPU times: user 7.56 ms, sys: 3.85 ms, total: 11.4 ms
Wall time: 2.4 ms


{194: [{'C': ['a'], 'Ca': ['d'], 'O': ['c', 'f']},
  {'C': ['a'], 'Ca': ['c'], 'O': ['b', 'e']},
  {'C': ['a'], 'Ca': ['d'], 'O': ['g']},
  {'C': ['d'], 'Ca': ['b'], 'O': ['a', 'e']},
  {'C': ['b'], 'Ca': ['d'], 'O': ['e', 'c']},
  {'C': ['c'], 'Ca': ['b'], 'O': ['d', 'f']},
  {'C': ['d'], 'Ca': ['c'], 'O': ['b', 'e']},
  {'C': ['b'], 'Ca': ['d'], 'O': ['f', 'a']},
  {'C': ['d'], 'Ca': ['a'], 'O': ['f', 'c']},
  {'C': ['d'], 'Ca': ['b'], 'O': ['g']},
  {'C': ['b'], 'Ca': ['c'], 'O': ['f', 'a']},
  {'C': ['b'], 'Ca': ['c'], 'O': ['g']},
  {'C': ['a'], 'Ca': ['d'], 'O': ['h']},
  {'C': ['b'], 'Ca': ['a'], 'O': ['g']},
  {'C': ['c'], 'Ca': ['b'], 'O': ['g']},
  {'C': ['c'], 'Ca': ['b'], 'O': ['d', 'e']},
  {'C': ['c'], 'Ca': ['a'], 'O': ['h']},
  {'C': ['b'], 'Ca': ['c'], 'O': ['d', 'e']}],
 148: [{'C': ['c'], 'Ca': ['c'], 'O': ['f']},
  {'C': ['a', 'b'], 'Ca': ['c'], 'O': ['f']},
  {'C': ['a', 'b'], 'Ca': ['c'], 'O': ['d', 'e']},
  {'C': ['b', 'a'], 'Ca': ['c'], 'O': ['e', 'd']},
  {'C':

An iterative version of `gen_many`, named `gen_many_iter`, is also provided. You can use it to render a progress bar during generation.

In [17]:
%%time

from tqdm.notebook import tqdm

space_group_cans = [194, 148, 167, 161, 11, 12, 65, 140, 225]

with tqdm(total=len(space_group_cans)) as pbar:
    for spacegroup_num, cfg_list in wyg.gen_many_iter(5000, *space_group_cans):
        print(f'space group: {spacegroup_num}, size of generated samples: {len(cfg_list)}')
        pbar.update()

space group: 194, size of generated samples: 120
space group: 148, size of generated samples: 56
space group: 167, size of generated samples: 4
space group: 161, size of generated samples: 2
space group: 11, size of generated samples: 630
space group: 12, size of generated samples: 4365
space group: 65, size of generated samples: 4789
space group: 140, size of generated samples: 192
space group: 225, size of generated samples: 8

CPU times: user 22.5 s, sys: 117 ms, total: 22.7 s
Wall time: 3.28 s
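The multi-space-group output earlier lists entries such as `{'C': ['a', 'b'], ...}` and `{'C': ['b', 'a'], ...}` that differ only in the order of letters within a list. If that order is not significant for your purposes (an assumption; the library may treat it differently), such entries can be merged before structure generation. The helpers `canonical` and `unique_cfgs` are hypothetical.

```python
def canonical(cfg):
    """Hashable key that ignores the order of letters within each element."""
    return tuple((el, tuple(sorted(letters)))
                 for el, letters in sorted(cfg.items()))


def unique_cfgs(cfgs):
    """Drop configurations that are equal up to letter order, keeping the first."""
    seen = set()
    out = []
    for cfg in cfgs:
        key = canonical(cfg)
        if key not in seen:
            seen.add(key)
            out.append(cfg)
    return out


# Entries copied from the space group 148 output above.
sample = [
    {'C': ['a', 'b'], 'Ca': ['c'], 'O': ['d', 'e']},
    {'C': ['b', 'a'], 'Ca': ['c'], 'O': ['e', 'd']},
    {'C': ['a', 'b'], 'Ca': ['c'], 'O': ['f']},
]
deduped = unique_cfgs(sample)
print(len(sample), '->', len(deduped))
```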


### 2. Generate crystal structures

In [None]:
from crystallus import CrystalGenerator

In [None]:
composition = 'Ti4O8'
estimated_volume = 146.706
estimated_variance = 20
sp_num = 12

In [None]:
cg = CrystalGenerator(sp_num, estimated_volume, estimated_variance)
cg

In [None]:
len(cfgs[sp_num])

In [None]:
%%time

ret = cg.gen_many(10, *cfgs[sp_num])

In [None]:
%%time

import joblib

len(ret)
joblib.dump(ret, f"{composition}_space_group_{sp_num}.pkl.z")

In [None]:
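import numpy as np

# convert each generated entry to a pymatgen Structure; coords are reshaped to (n_atoms, 3)
for i, tmp in enumerate(ret):
    s = Structure(lattice=tmp['lattice'], species=tmp['species'], coords=np.asarray(tmp['coords']).reshape(-1, 3))
    s.to(fmt='cif', filename=f'generated_cifs/{i}.cif', symprec=0.01)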

---------

In [None]:
import pandas as pd

structure_table = pd.read_pickle('Ti4O8_structure_table_old.pd.xz')
structure_table

In [None]:
tmp = structure_table.groupby('spacegroup_num')['wy_letters'].value_counts()

In [None]:
from matplotlib import pyplot as plt

f, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 8),dpi=100)
structure_table.shape
ax = structure_table.groupby('spacegroup_num').count().sort_values('formula', ascending=False).plot.bar(y=['structure'], ax=ax1)
ax.text(25,190000,'Ca2C2O6\nsize: 624899', fontsize=15, ha='right')

ax = structure_table.volume.hist(ax=ax2)
ax.xaxis.grid(False)
ax.grid(linestyle='--', linewidth=1, axis='y')
ax.set_xlabel('volume')

plt.tight_layout()