In [1]:
# user-friendly print
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Crystal structure generation

Generating a reasonable crystal structure for a given space group and chemical composition takes four basic steps:

1. Calculate the possible Wyckoff configurations for the chemical composition under the given space group.
2. Randomly generate fractional coordinates for each element from a Wyckoff configuration calculated in step 1.
3. Randomly generate a lattice for the space group used in step 1.
4. Combine the results of steps 2 and 3 to obtain a crystal structure.

Usually, we also have to check the `volume` and `atomic distances` of each generated structure and keep only the structures for which both are reasonable.

To facilitate all these jobs, our `crystallus` library provides three modules:

* `WyckoffCfgGenerator`: generates the possible Wyckoff configurations for a given space group and primitive-cell composition.
* `CrystalGenerator`: generates crystal structures for a given space group and Wyckoff configurations.
* `WyckoffDB`, `SpaceGroupDB`: databases of space groups and their Wyckoff information.

The following sections show how to use `crystallus`.

### 1. Spacegroup information DB

You can use `SpaceGroupDB` to look up information about a given space group, such as its Wyckoff positions. For example, to get the information for space group `167`:

In [2]:
from crystallus import SpaceGroupDB

wys = SpaceGroupDB.get(spacegroup_num=167).wyckoffs
[{'Wyckoff letter': w.letter, 'multiplicity': w.multiplicity, 'reusable': w.reuse, 'Wyckoff position': w.positions} for w in wys ]

[{'Wyckoff letter': 'f',
  'multiplicity': 12,
  'reusable': True,
  'Wyckoff position': '(x,y,z), (z,x,y), (y,z,x), (-y+1/2,-x+1/2,-z+1/2), (-x+1/2,-z+1/2,-y+1/2), (-z+1/2,-y+1/2,-x+1/2), (-x,-y,-z), (-z,-x,-y), (-y,-z,-x), (y+1/2,x+1/2,z+1/2), (x+1/2,z+1/2,y+1/2), (z+1/2,y+1/2,x+1/2)'},
 {'Wyckoff letter': 'e',
  'multiplicity': 6,
  'reusable': True,
  'Wyckoff position': '(x,-x+1/2,1/4), (1/4,x,-x+1/2), (-x+1/2,1/4,x), (-x,x+1/2,3/4), (3/4,-x,x+1/2), (x+1/2,3/4,-x)'},
 {'Wyckoff letter': 'd',
  'multiplicity': 6,
  'reusable': False,
  'Wyckoff position': '(1/2,0,0), (0,1/2,0), (0,0,1/2), (1/2,0,1/2), (0,1/2,1/2), (1/2,1/2,0)'},
 {'Wyckoff letter': 'c',
  'multiplicity': 4,
  'reusable': True,
  'Wyckoff position': '(x,x,x), (-x+1/2,-x+1/2,-x+1/2), (-x,-x,-x), (x+1/2,x+1/2,x+1/2)'},
 {'Wyckoff letter': 'b',
  'multiplicity': 2,
  'reusable': False,
  'Wyckoff position': '(0,0,0), (1/2,1/2,1/2)'},
 {'Wyckoff letter': 'a',
  'multiplicity': 2,
  'reusable': False,
  'Wyckoff position

### 2. generate Wyckoff configurations

As an example, we will try to generate structures for `Ca2C2O6`. The true space group of this structure is `167`, and the Wyckoff configuration is `{Ca: 2b, C: 2a, O: 6e}`.
First, let's generate some possible Wyckoff configurations for the composition `Ca2C2O6` under space group `167`.

In [3]:
from crystallus import WyckoffCfgGenerator

WyckoffCfgGenerator?

Init signature:
WyckoffCfgGenerator(
    composition,
    *,
    max_recurrent: int = 1000,
    n_jobs: int = -1,
    priority: Optional[Dict[int, Dict[str, float]]] = None,
)
Docstring:      <no docstring>
Init docstring:
A generator for possible Wyckoff configuration generation.

Parameters
----------
max_recurrent:
    Max recurrent until generate a reasonable structure, by default 5_000
n_jobs:
    Number of cpu

In [5]:
composition = {'Ca': 2, 'C': 2, 'O': 6}

wyg = WyckoffCfgGenerator(composition)
wyg

WyckoffCfgGenerator(            
    max_recurrent=1000,            
    n_jobs=-1            
    priority=None            
    composition={'Ca': 2, 'C': 2, 'O': 6}            
)

As you may have noticed, the only required input for initializing a `WyckoffCfgGenerator` is the chemical composition, given as a Python dict.
We can then use the `wyg.gen_one` or `wyg.gen_many` method to generate Wyckoff configuration(s).

First, let's try the `gen_one` method.

In [6]:
wyg.gen_one?

Signature: wyg.gen_one(*, spacegroup_num: int)
Docstring:
Try to generate a possible Wyckoff configuration under the given space group.

Parameters
----------
spacegroup_num:
    Space group number.

Returns
-------
Dict
    Wyckoff configuration set, which is a dict with format like:
    {"Li": ["a", "c"], "O": ["i"]}. Here, the "Li" is an available element
    symbol and ["a", "c"] is a list which contains coresponding Wyckoff
    letters. For convenience, dict will be sorted by keys.
File:      ~/projects/crystallus/crystallus/wyckoff_cfg_generator.py
Type:      method


In [7]:
cfg = wyg.gen_one(spacegroup_num=167)
cfg

{'C': ['a'], 'Ca': ['b'], 'O': ['e']}

If everything goes well, the cell above returns a dict such as `{'C': ['b'], 'Ca': ['a'], 'O': ['d']}`.
Here `C`, `Ca`, and `O` are the element symbols, sorted alphabetically, and `['b']`, `['a']`, and `['d']` are the corresponding Wyckoff positions, given by their Wyckoff letters.

You may be confused that the return value of this method is not unique. This is expected: under space group `167` there are four possible configurations for the composition `Ca2C2O6`. Calling `gen_one` runs a random search over all possible configurations; as soon as it finds one, it returns the result and stops searching. To get more configurations, you therefore have to call `gen_one` many times.
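
The count of four can be checked by brute force from the multiplicities and `reusable` flags printed in section 1. The sketch below is plain Python and does not use `crystallus`; it is not the library's search algorithm (which samples randomly). The helper names `letter_sets` and `enumerate_cfgs` are made up for this example, and it assumes that a non-reusable letter may be occupied at most once across all elements.

```python
from itertools import combinations_with_replacement, product

# (multiplicity, reusable) for space group 167, copied from the SpaceGroupDB output
WYCKOFFS_167 = {
    'a': (2, False), 'b': (2, False), 'c': (4, True),
    'd': (6, False), 'e': (6, True), 'f': (12, True),
}

def letter_sets(n_atoms, wyckoffs):
    """All letter combinations whose multiplicities sum to n_atoms."""
    letters = sorted(wyckoffs)
    max_len = n_atoms // min(m for m, _ in wyckoffs.values())
    found = []
    for size in range(1, max_len + 1):
        for combo in combinations_with_replacement(letters, size):
            if sum(wyckoffs[l][0] for l in combo) != n_atoms:
                continue
            # assumption: a non-reusable letter may appear at most once
            if any(combo.count(l) > 1 and not wyckoffs[l][1] for l in combo):
                continue
            found.append(list(combo))
    return found

def enumerate_cfgs(composition, wyckoffs):
    """All configurations in which no non-reusable letter is shared."""
    elements = sorted(composition)
    per_element = [letter_sets(composition[e], wyckoffs) for e in elements]
    cfgs = []
    for choice in product(*per_element):
        used = [l for letters in choice for l in letters]
        if any(used.count(l) > 1 and not wyckoffs[l][1] for l in set(used)):
            continue
        cfgs.append(dict(zip(elements, choice)))
    return cfgs

all_cfgs = enumerate_cfgs({'Ca': 2, 'C': 2, 'O': 6}, WYCKOFFS_167)
print(len(all_cfgs))
```

Because `a` and `b` are the only letters with multiplicity 2 and neither is reusable, `Ca` and `C` must take one each, which leaves only `d` or `e` for the six oxygen atoms.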

In almost all cases there is more than one possible configuration, so to simplify your work we provide the `gen_many` method.

In [8]:
wyg.gen_many?

Signature: wyg.gen_many(size: int, *, spacegroup_num: Union[int, Sequence[int]])
Docstring:
Try to generate possible Wyckoff configuration sets.

Parameters
----------
size:
    How many times to try for one space group.
spacegroup_num:
    Spacegroup numbers to generate Wyckoff configurations.

Returns
-------
Dict[int, List[Dict]], List[Dict]
    A collection contains spacegroup number and it's corresponding Wyckoff
    configurations (wy_cfg). If only one spacegroup number was given,
    will only return the list of wy_cfgs, otherwise return in dict with
    spacegroup number as key. wy_cfgs will be formated as
    {element 1: [Wyckoff_letter, Wyckoff_letter, ...], element 2: [...], ...}.
File:      ~/projects/cry

In [9]:
cfgs = wyg.gen_many(100, spacegroup_num=167)
cfgs

[{'C': ['a'], 'Ca': ['b'], 'O': ['d']},
 {'C': ['b'], 'Ca': ['a'], 'O': ['d']},
 {'C': ['a'], 'Ca': ['b'], 'O': ['e']},
 {'C': ['b'], 'Ca': ['a'], 'O': ['e']}]

You can also process multiple space groups in one call: pass a sequence of space group numbers as the `spacegroup_num` parameter. In this case, the return value is a dict with the space group number as key and the configuration list as value. For example, if the space group candidates are `[194, 148, 167]`, you can call `gen_many` like this:

In [10]:
%%time

cfgs = wyg.gen_many(20, spacegroup_num=(194, 148, 167))
cfgs

CPU times: user 83.4 ms, sys: 23.5 ms, total: 107 ms
Wall time: 8.54 ms


{194: [{'C': ['c'], 'Ca': ['d'], 'O': ['b', 'e']},
  {'C': ['b'], 'Ca': ['d'], 'O': ['c', 'f']},
  {'C': ['b'], 'Ca': ['d'], 'O': ['h']},
  {'C': ['d'], 'Ca': ['c'], 'O': ['h']},
  {'C': ['d'], 'Ca': ['a'], 'O': ['b', 'e']},
  {'C': ['a'], 'Ca': ['c'], 'O': ['g']},
  {'C': ['d'], 'Ca': ['c'], 'O': ['g']},
  {'C': ['b'], 'Ca': ['d'], 'O': ['a', 'e']},
  {'C': ['d'], 'Ca': ['c'], 'O': ['a', 'f']},
  {'C': ['c'], 'Ca': ['b'], 'O': ['g']},
  {'C': ['a'], 'Ca': ['c'], 'O': ['b', 'e']},
  {'C': ['c'], 'Ca': ['b'], 'O': ['h']},
  {'C': ['b'], 'Ca': ['c'], 'O': ['d', 'f']},
  {'C': ['c'], 'Ca': ['a'], 'O': ['b', 'f']},
  {'C': ['a'], 'Ca': ['b'], 'O': ['d', 'f']},
  {'C': ['c'], 'Ca': ['a'], 'O': ['g']},
  {'C': ['a'], 'Ca': ['c'], 'O': ['h']},
  {'C': ['a'], 'Ca': ['b'], 'O': ['c', 'e']},
  {'C': ['d'], 'Ca': ['a'], 'O': ['c', 'e']}],
 148: [{'C': ['c'], 'Ca': ['c'], 'O': ['f']},
  {'C': ['c'], 'Ca': ['a', 'b'], 'O': ['f']},
  {'C': ['a', 'b'], 'Ca': ['c'], 'O': ['d', 'e']},
  {'C': ['a', 'b'
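
Because `gen_many` returns a list for a single space group and a dict for several, downstream code may want one uniform shape. The helper below is a small sketch in plain Python; `as_cfg_dict` is a hypothetical name and not part of `crystallus`.

```python
def as_cfg_dict(result, spacegroup_num):
    """Return {spacegroup_num: [cfg, ...]} for either return shape of gen_many."""
    if isinstance(result, dict):
        # several space groups were requested: already keyed by number
        return result
    if isinstance(spacegroup_num, int):
        # a single space group was requested: wrap the bare list
        return {spacegroup_num: list(result)}
    raise TypeError("a list result needs a single integer spacegroup_num")

# sample values copied from the outputs above
single = [{'C': ['a'], 'Ca': ['b'], 'O': ['d']}]
multi = {194: [{'C': ['c'], 'Ca': ['d'], 'O': ['b', 'e']}], 167: single}

print(as_cfg_dict(single, 167))
print(sorted(as_cfg_dict(multi, (194, 167))))
```

With this wrapper, an expression such as `cfgs[167][0]` works regardless of how many space groups were requested.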

`gen_many_iter` is an iterative version of `gen_many`. You can use it to render a progress bar during generation, or for anything else you need.

In [11]:
%%time

from tqdm.notebook import tqdm

space_group_cans = [194, 148, 167, 161, 11, 12, 65, 140, 225]

with tqdm(total=len(space_group_cans)) as pbar:
    for spacegroup_num, cfg_list in wyg.gen_many_iter(5000, spacegroup_num=space_group_cans):
        print(f'space group: {spacegroup_num}, size of generated samples: {len(cfg_list)}')
        pbar.update()

  0%|          | 0/9 [00:00<?, ?it/s]

space group: 194, size of generated samples: 72
space group: 148, size of generated samples: 14
space group: 167, size of generated samples: 4
space group: 161, size of generated samples: 2
space group: 11, size of generated samples: 199
space group: 12, size of generated samples: 2395
space group: 65, size of generated samples: 4358
space group: 140, size of generated samples: 96
space group: 225, size of generated samples: 4
CPU times: user 3min 56s, sys: 723 ms, total: 3min 57s
Wall time: 9.35 s


### 3. generate Wyckoff configurations with prior probability

Sometimes we would like to generate Wyckoff configurations according to a given priority list of Wyckoff letters. For example, in the case of `Ca2C2O6` with space group `167`, we may expect the possible Wyckoff positions [`a`, `b`, `d`, `e`] to be sampled with probabilities of [40%, 20%, 40%, 0%], respectively.
The `priority` option is provided for exactly these cases.

In [21]:
composition = {'Ca': 2, 'C': 2, 'O': 6}

wyg = WyckoffCfgGenerator(
    composition,
    priority={
        167: {'a': 0.4, 'b': 0.2, 'd': 0.4, 'e': 0}
    },
)
wyg

WyckoffCfgGenerator(            
    max_recurrent=1000,            
    n_jobs=-1            
    priority={167: {'a': 0.4, 'b': 0.2, 'd': 0.4, 'e': 0}}            
    composition={'Ca': 2, 'C': 2, 'O': 6}            
)

During generation, the priority list is normalized. Because the letter `e` has zero probability, we can only get the two configurations that do not contain `e`.

In [22]:
cfgs = wyg.gen_many(100, spacegroup_num=167)
cfgs

[{'C': ['a'], 'Ca': ['b'], 'O': ['d']}, {'C': ['b'], 'Ca': ['a'], 'O': ['d']}]
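
To make the effect of normalization and of a zero weight concrete, here is a minimal sketch in plain Python. It only illustrates weighted sampling in general; how `crystallus` normalizes and samples internally is not shown in this notebook, and the `normalize` helper is a name invented for this example.

```python
import random

def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {letter: w / total for letter, w in weights.items()}

# unnormalized weights with the same ratios as the example above
priority = {'a': 4, 'b': 2, 'd': 4, 'e': 0}
probs = normalize(priority)

# draw letters with these probabilities; a zero-weight letter is never drawn
rng = random.Random(0)
draws = rng.choices(list(probs), weights=list(probs.values()), k=1000)
print(probs)
print('e' in draws)
```

Weights of `4, 2, 4, 0` and `0.4, 0.2, 0.4, 0` describe the same distribution once normalized.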

You can also verify that if we set the priority of letter `a` to zero, no configuration can be generated, because both `a` and `b` must be used.

In [23]:
wyg = WyckoffCfgGenerator(
    composition,
    priority={
        167: {'a': 0.0}
    },
)
wyg.gen_many(100, spacegroup_num=167)

[]

### 4. generate crystal structures

Now that we have generated some Wyckoff configurations, the next step is to consume them to generate crystal structures. To facilitate this task, we provide the `CrystalGenerator` class.

In [11]:
from crystallus import CrystalGenerator

CrystalGenerator?

Init signature:
CrystalGenerator(
    spacegroup_num: int,
    volume_of_cell: float,
    variance_of_volume: float,
    *,
    angle_range: Tuple[float, float] = (30.0, 150.0),
    angle_tolerance: float = 20.0,
    lattice: Optional[Tuple[float]] = None,
    empirical_coords: Optional[

To initialize a `CrystalGenerator`, at least the parameters `spacegroup_num`, `volume_of_cell` (the volume of the primitive cell), and `variance_of_volume` are needed.

In [12]:
volume_of_cell = 127.170256
variance_of_volume = 20.
sp_num = 167

cg = CrystalGenerator(sp_num, volume_of_cell, variance_of_volume)
cg

CrystalGenerator(            
    spacegroup_num=167,            
    volume_of_cell=127.170256,            
    variance_of_volume=20.0,            
    angle_range=(30.0, 150.0),            
    angle_tolerance=20.0,            
    max_attempts_number=5000,            
    lattice=None,            
    empirical_coords=None,            
    empirical_coords_variance=0.01,            
    empirical_coords_sampling_rate=1.0,            
    empirical_coords_loose_sampling=True,            
    verbose=False            
    n_jobs=-1            
)

Like `WyckoffCfgGenerator`, a `CrystalGenerator` object has `gen_one`, `gen_many`, and `gen_many_iter` methods. Let's use the `gen_one` method for a quick try.

In [13]:
cg.gen_one?

Signature:
cg.gen_one(
    *,
    check_distance: bool = True,
    distance_scale_factor: float = 0.1,
    **cfg: Dict[str, Tuple[str]],
)
Docstring:
Try to generate a legal crystal structure with given configuration set.

Parameters
----------
check_distance: bool, optional
    Whether the atomic distance should be checked. default ``True``
distance_scale_factor : float, optional
    Scale factor to determine the tolerance of atomic distances when distance checking. Unit is Å,
    When ``check_distance`` is ``True``,

All `gen_xxx` methods consume Wyckoff configurations to generate crystal structures. Note the `distance_scale_factor` parameter: the generator uses it to determine the acceptable atomic distances. The acceptance condition is:
> distance between atoms `a` and `b` > (radius of `a` + radius of `b`) × (1 - `distance_scale_factor`).

If the generator cannot generate any structure after multiple attempts, you can try to relax the atomic distance constraint by increasing this parameter.
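
As a rough illustration of this condition, the check can be written as a small plain-Python function. This is a sketch, not the `crystallus` implementation: the function name `is_acceptable` and the radii values are assumptions made for the example, and the library may use a different radius table.

```python
def is_acceptable(distance, radius_a, radius_b, distance_scale_factor=0.1):
    """Return True if two atoms are far enough apart under the condition above."""
    threshold = (radius_a + radius_b) * (1 - distance_scale_factor)
    return distance > threshold

# illustrative radii in Å (assumed values, not read from crystallus)
RADIUS_C, RADIUS_O = 0.76, 0.66

# the same C-O distance of 1.30 Å, tested with two scale factors
strict = is_acceptable(1.30, RADIUS_C, RADIUS_O, distance_scale_factor=0.0)
relaxed = is_acceptable(1.30, RADIUS_C, RADIUS_O, distance_scale_factor=0.1)
print(strict, relaxed)
```

A larger `distance_scale_factor` lowers the threshold, so a pair that is rejected with a factor of `0.0` can be accepted with `0.1`.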

In [14]:
%%time

cfgs[sp_num][0]

raw_s = cg.gen_one(**cfgs[sp_num][0])
raw_s

CPU times: user 1.9 ms, sys: 191 µs, total: 2.09 ms
Wall time: 2.1 ms


{'spacegroup_num': 167,
 'volume': 94.25287702045583,
 'lattice': [[4.301584035379682, 0.0, 3.458960900453139],
  [1.657136727181531, 3.9695746725385375, 3.458960900453139],
  [0.0, 0.0, 5.519785840437738]],
 'species': ['C', 'C', 'Ca', 'Ca', 'O', 'O', 'O', 'O', 'O', 'O'],
 'wyckoff_letters': ['a', 'a', 'b', 'b', 'd', 'd', 'd', 'd', 'd', 'd'],
 'coords': [[0.25, 0.25, 0.25],
  [0.75, 0.75, 0.75],
  [0.0, 0.0, 0.0],
  [0.5, 0.5, 0.5],
  [0.5, 0.0, 0.0],
  [0.0, 0.5, 0.0],
  [0.0, 0.0, 0.5],
  [0.5, 0.0, 0.5],
  [0.0, 0.5, 0.5],
  [0.5, 0.5, 0.0]]}
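
The reported `volume` can be cross-checked against `lattice`, since the cell volume is the absolute value of the scalar triple product of the three lattice vectors. The sketch below uses plain Python; `cell_volume` is a helper written for this example, and the lattice is copied from the output above.

```python
def cell_volume(lattice):
    """Volume of the cell spanned by three lattice vectors: |a . (b x c)|."""
    a, b, c = lattice
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return abs(sum(ai * xi for ai, xi in zip(a, cross)))

# lattice vectors (rows) copied from the gen_one output
lattice = [
    [4.301584035379682, 0.0, 3.458960900453139],
    [1.657136727181531, 3.9695746725385375, 3.458960900453139],
    [0.0, 0.0, 5.519785840437738],
]

volume = cell_volume(lattice)
print(volume)  # should agree with the 'volume' entry of the result dict
```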

The result is a dict containing `species`, `lattice`, `coords`, and other information. This information can be used to build a `pymatgen.core.Structure` or `ase.Atoms` object, as follows.

In [26]:
# ASE Atoms

from ase import Atoms

atoms = Atoms(
    symbols=raw_s['species'],
    cell=raw_s['lattice'],
    scaled_positions=raw_s['coords'],
    pbc=True,
)
atoms

Atoms(symbols='C2Ca2O6', pbc=True, cell=[[4.301584035379682, 0.0, 3.458960900453139], [1.657136727181531, 3.9695746725385375, 3.458960900453139], [0.0, 0.0, 5.519785840437738]])

In [30]:
from pymatgen.core import Structure

structure = Structure(
    lattice=raw_s['lattice'],
    species=raw_s['species'],
    coords=raw_s['coords'],
)
structure

Structure Summary
Lattice
    abc : 5.519785840437738 5.519785840437738 5.519785840437738
 angles : 51.19677597568162 51.19677597568162 51.19677597568162
 volume : 94.2528770204558
      A : 4.301584035379682 0.0 3.458960900453139
      B : 1.657136727181531 3.9695746725385375 3.458960900453139
      C : 0.0 0.0 5.519785840437738
PeriodicSite: C (1.4897, 0.9924, 3.1094) [0.2500, 0.2500, 0.2500]
PeriodicSite: C (4.4690, 2.9772, 9.3283) [0.7500, 0.7500, 0.7500]
PeriodicSite: Ca (0.0000, 0.0000, 0.0000) [0.0000, 0.0000, 0.0000]
PeriodicSite: Ca (2.9794, 1.9848, 6.2189) [0.5000, 0.5000, 0.5000]
PeriodicSite: O (2.1508, 0.0000, 1.7295) [0.5000, 0.0000, 0.0000]
PeriodicSite: O (0.8286, 1.9848, 1.7295) [0.0000, 0.5000, 0.0000]
PeriodicSite: O (0.0000, 0.0000, 2.7599) [0.0000, 0.0000, 0.5000]
PeriodicSite: O (2.1508, 0.0000, 4.4894) [0.5000, 0.0000, 0.5000]
PeriodicSite: O (0.8286, 1.9848, 4.4894) [0.0000, 0.5000, 0.5000]
PeriodicSite: O (2.9794, 1.9848, 3.4590) [0.5000, 0.5000, 0.0000]
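
The Cartesian positions printed in parentheses follow from the fractional `coords` and the lattice: each position is the sum of the lattice vectors weighted by the fractional coordinates. The plain-Python sketch below reproduces the first carbon site; `frac_to_cart` is a helper name chosen for this example.

```python
def frac_to_cart(frac, lattice):
    """Cartesian position = sum_i frac[i] * lattice[i], with lattice vectors as rows."""
    return [sum(frac[i] * lattice[i][k] for i in range(3)) for k in range(3)]

# lattice vectors (rows) copied from the gen_one output
lattice = [
    [4.301584035379682, 0.0, 3.458960900453139],
    [1.657136727181531, 3.9695746725385375, 3.458960900453139],
    [0.0, 0.0, 5.519785840437738],
]

# first carbon atom, Wyckoff letter 'a'
cart = frac_to_cart([0.25, 0.25, 0.25], lattice)
print([round(x, 4) for x in cart])
```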

Structures can also be generated in batches.

In [30]:
%%time

raw_ss = cg.gen_many(100, *cfgs[sp_num])

print(f"type of raw_ss: {raw_ss.__class__}, size: {len(raw_ss)}")

type of raw_ss: <class 'tuple'>, size: 143
CPU times: user 455 ms, sys: 37.6 ms, total: 493 ms
Wall time: 28 ms


An iterative version is also available:

In [31]:
%%time

with tqdm(total=len(cfgs[sp_num])) as pbar:
    for cfg, structures in cg.gen_many_iter(500, *cfgs[sp_num]):
        print(f'configuration: {cfg}, size of structures: {len(structures)}')
        pbar.update()

  0%|          | 0/4 [00:00<?, ?it/s]

configuration: {'C': ['a'], 'Ca': ['b'], 'O': ['e']}, size of structures: 136
configuration: {'C': ['b'], 'Ca': ['a'], 'O': ['e']}, size of structures: 110
configuration: {'C': ['a'], 'Ca': ['b'], 'O': ['d']}, size of structures: 434
configuration: {'C': ['b'], 'Ca': ['a'], 'O': ['d']}, size of structures: 19
CPU times: user 2.14 s, sys: 75.3 ms, total: 2.22 s
Wall time: 139 ms
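
The counts above differ widely between configurations. Assuming the first argument of `gen_many_iter` (500 here) is the number of attempts per configuration, which this notebook does not state explicitly, the fraction of attempts that passed the checks can be summarized as follows. The counts are copied from the output above, and the string labels are made up for readability.

```python
attempts = 500  # first argument passed to gen_many_iter above (assumed: attempts per configuration)

# number of accepted structures per configuration, copied from the output
counts = {
    "C:a Ca:b O:e": 136,
    "C:b Ca:a O:e": 110,
    "C:a Ca:b O:d": 434,
    "C:b Ca:a O:d": 19,
}

# fraction of attempts that produced an accepted structure
rates = {cfg: n / attempts for cfg, n in counts.items()}
best = max(rates, key=rates.get)

for cfg, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{cfg}: {rate:.1%}")
```

Under that assumption, a low rate suggests that the configuration rarely satisfies the volume and distance checks with the current settings.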
