# Si-Ge cluster expansion workflow - part 1

This is a CASM project tutorial to generate a phase diagram using a Si-Ge binary alloy cluster expansion fit to DFT calculations. The overall workflow is split into two parts.

Topics covered in part 1:

1. **Project initialization**: Define the primitive crystal structure and allowed atoms on each crystal site
2. **Enumeration**: Enumerate crystal structures which are symmetrically distinct orderings of the atoms allowed by the prim occupation DoF
3. **Calculation**: Calculate the energies of the enumerated structures using DFT
4. **Import and mapping**: Import calculation results, mapping to orderings on the prim
5. **Set reference states**: Choose reference states to define a formation energy for each structure
6. **Query**: Query calculation properties

Topics covered in part 2:

7. Construct a cluster expansion basis
8. Fit cluster expansion coefficients based on the calculated formation energies
9. Perform Monte Carlo calculations using the cluster expansion at a range of chemical potentials and temperatures
10. Perform thermodynamic integration to calculate free energies as a function of composition and temperature
11. Construct a phase diagram



- Project initialization
- Supercell and configuration enumeration
- Parametric composition
- VASP input file setup

In [1]:
import pathlib
import libcasm.xtal as xtal
from casm.project import Project

project_path = pathlib.Path("SiGe")
project_path.mkdir(parents=True, exist_ok=True)

## Project initialization

### Specify the "prim"

A primitive crystal structure and allowed degrees of freedom (the "prim") specifies:

- lattice vectors
- crystal basis sites
- global degrees of freedom
- site degrees of freedom, including allowed occupant species on each basis site.

When combined with a choice of basis function type, order, and truncation, the prim provides all the information needed to generate cluster expansion basis functions.

Here is the prim for the Si-Ge binary alloy project, which we write to a JSON-formatted file named "prim.json":

In [2]:
prim_data = {
    "title" : "SiGe",
    "lattice_vectors" : [
        [ 0.000000000000, 2.800000000000, 2.800000000000 ], # 1st lattice vector 
        [ 2.800000000000, 0.000000000000, 2.800000000000 ], # 2nd lattice vector
        [ 2.800000000000, 2.800000000000, 0.000000000000 ], # 3rd lattice vector
    ],
    "coordinate_mode" : "Fractional",
    "basis" : [
        {
            "coordinate" : [ 0.0, 0.0, 0.0 ],
            "occupant_dof" : [ "Si", "Ge" ],
        },
        {
            "coordinate" : [ 0.25, 0.25, 0.25 ],
            "occupant_dof" : [ "Si", "Ge" ],
        }
    ],
}

with open(project_path / "prim.json", 'w') as f:
    f.write(xtal.pretty_json(prim_data))

For this particular project, the prim contains:

- **lattice_vectors**: A list of crystal lattice vectors. Units are typically Angstrom, but are ultimately determined by the method used to perform calculations. 
- **basis**: A list of crystal basis sites, including coordinate and allowed degrees of freedom. For this Si-Ge project, the basis sites contain:
  - **coordinate**: The location of the basis site, according to the "coordinate_mode".
  - **occupants**: A list of the possible occupant species that may reside at each site. The names are case sensitive, and "Va" is reserved for vacancies.
- **coordinate_mode**: Defines the units of basis site coordinates. May be one of:
  - "Cartesian": To specify basis coordinates using Cartesian coordinates:
    $$ r_{cart} = (x, y, z) $$
  - "Fractional" or "Direct": To specify basis coordinates defined in terms of the lattice vectors:
    $$ r_{cart} = L r_{frac}, $$
    where:
    - $r_{frac}$ are the coordinates in the fractional representation
    - $r_{cart}$ are the coordinates in the Cartesian representation
    - $L$ is the lattice as a column-vector matrix. 
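
The conversion $r_{cart} = L r_{frac}$ can be checked by hand for the second basis site of the prim above. The following is a minimal sketch in plain Python that does not use CASM; the helper name `frac_to_cart` is hypothetical and introduced only for illustration.

```python
# Lattice vectors from the prim above, one vector per row (as in prim.json).
lattice_vectors = [
    [0.0, 2.8, 2.8],  # 1st lattice vector
    [2.8, 0.0, 2.8],  # 2nd lattice vector
    [2.8, 2.8, 0.0],  # 3rd lattice vector
]


def frac_to_cart(r_frac, lattice_vectors):
    """Return r_cart = L r_frac, where the columns of L are the lattice vectors.

    This is the same as summing the lattice vectors, each weighted by the
    corresponding fractional coordinate.
    """
    return [
        sum(r_frac[i] * lattice_vectors[i][k] for i in range(3))
        for k in range(3)
    ]


# Second basis site of the prim, in fractional coordinates
r_cart = frac_to_cart([0.25, 0.25, 0.25], lattice_vectors)
print(r_cart)
```

Each Cartesian component comes out to approximately 1.4, in the same length units as the lattice vectors.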
  
<div class="alert alert-info">
For "lattice_vectors", it is common, but not required, to use the results of a fully relaxed calculation of the structure with the default occupation values. The default occupation on each site is the species listed first in "occupants". For occupation cluster expansions, ideal supercells of the prim lattice are used for the initial state of DFT calculations and are the default reference for strain.
</div>

### Initialize a CASM project

A CASM project is a directory containing data related to a particular prim. The CASM project directory structure standardizes the location of various files used by multiple CASM methods. This makes it easier to perform the most common operations and easier to share a project with others.

A CASM project is initialized by defining a prim and using [Project.init](). This will:

1. Check if the prim has a primitive unit cell with a CASM standard lattice orientation
2. Perform a symmetry analysis
3. Generate default directories
4. Generate default composition axes
5. Generate a default strain reference lattice
6. Perform a configuration check 


Notes:

- Project files that the user should not typically modify directly, including a copy of the prim, are stored in a hidden `.casm` sub-directory of the CASM project directory. The presence or absence of the `.casm` directory is used by CASM to detect a CASM project.


In [3]:
project = Project.init(path=project_path)
print(project.path.resolve())
assert project.chemical_composition.calculator is not None

CASM project already exists at SiGe
Using existing project
/Users/bpuchala/codes/CASM_v2_source/CASMcode_modules/CASMcode_project/notebooks/SiGe_occ/SiGe


coming: 
- Show what happens with non-primitive prim or non-standard lattice?
- Visualize the prim

### Selecting composition axes

In [4]:
# project.chemical_composition_axes.init(
#     xtal_prim=project.prim.xtal_prim,
# )

data = project.chemical_composition_axes.to_dict()
print(xtal.pretty_json(data))
project.chemical_composition_axes.print_table()
print(project.chemical_composition.calculator)
print(project.chemical_composition.converter)

{
  "allowed_occs": [
    ["Si", "Ge"],
    ["Si", "Ge"]
  ],
  "components": ["Ge", "Si"],
  "current_axes": "0",
  "enumerated": ["0", "1"],
  "possible_axes": {
    "0": {
      "a": [2.0, 0.0],
      "components": ["Ge", "Si"],
      "independent_compositions": 1,
      "mol_formula": "Ge(2a)Si(2-2a)",
      "origin": [0.0, 2.0],
      "param_formula": "a(0.5+0.25Ge-0.25Si)"
    },
    "1": {
      "a": [0.0, 2.0],
      "components": ["Ge", "Si"],
      "independent_compositions": 1,
      "mol_formula": "Ge(2-2a)Si(2a)",
      "origin": [2.0, 0.0],
      "param_formula": "a(0.5-0.25Ge+0.25Si)"
    }
  }
}

  KEY  ORIGIN    a      GENERAL FORMULA
-----  --------  -----  -----------------
    0  Si(2)     Ge(2)  Ge(2a)Si(2-2a)
    1  Ge(2)     Si(2)  Ge(2-2a)Si(2a)
{
  "allowed_occs": [
    ["Si", "Ge"],
    ["Si", "Ge"]
  ],
  "components": ["Ge", "Si"]
}
{
  "a": [2.0, 0.0],
  "components": ["Ge", "Si"],
  "independent_compositions": 1,
  "mol_formula": "Ge(2a)Si(2-2a)",
  "origi

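The relation between the parametric composition `a` and the number of atoms per primitive cell can be read off the axes labeled "0" in the output above. The following is a minimal sketch in plain Python that reuses only the `origin`, `a`, and `param_formula` values printed there; the function names `param_to_mol` and `mol_to_param` are hypothetical and are not part of the CASM API.

```python
# Values copied from axes "0" in the output above.
# The component order is ["Ge", "Si"].
origin = [0.0, 2.0]      # Si(2): composition per primitive cell at a = 0
end_member = [2.0, 0.0]  # Ge(2): composition per primitive cell at a = 1


def param_to_mol(a):
    """Atoms per primitive cell: n = origin + a * (end_member - origin)."""
    return [o + a * (e - o) for o, e in zip(origin, end_member)]


def mol_to_param(n_Ge, n_Si):
    """Parametric composition from "param_formula": a = 0.5 + 0.25*Ge - 0.25*Si."""
    return 0.5 + 0.25 * n_Ge - 0.25 * n_Si


for a in (0.0, 0.25, 0.5, 1.0):
    n_Ge, n_Si = param_to_mol(a)
    # The last column should reproduce the input value of a
    print(a, n_Ge, n_Si, mol_to_param(n_Ge, n_Si))
```

With these axes, `a` is the fraction of sites occupied by Ge, which matches the general formula Ge(2a)Si(2-2a) in the table.
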
## Enumeration

### Introduction

To fit a cluster expansion for the Si-Ge system, we need a set of calculated energies for Si-Ge crystal structures with various orderings to use as training data. To begin, we use CASM to enumerate symmetrically distinct [Supercell]() and [Configuration]():

- A [Supercell]() defines the three-dimensional translations that repeat a crystal structure. 

  -  A supercell can be specified by the integer transformation matrix, $T$, relating the superstructure lattice vectors, $S$, to the unit structure lattice vectors, $L$, according to $S = L T$, where $S$ and $L$ are shape=(3,3) matrices with lattice vectors as columns.
  
  TODO: figure
 
- A [Configuration]() is a compact representation of the unit cell for a crystal structure that is allowed by the DoF specified in the prim. For this Si-Ge project, a configuration can be specified by:
 
  - the supercell that is the unit cell for the crystal structure, and
  - the occupant on each site in the supercell (i.e. whether Si or Ge occupies each site).
  
  TODO: figure
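
The relation $S = L T$ can be worked through by hand for a small case. The following is a minimal sketch in plain Python that does not use CASM: it uses the prim lattice vectors from this project and an example integer matrix $T$ with determinant 2, so the resulting supercell has twice the prim volume. The helper names `matmul3` and `det3` are hypothetical.

```python
def matmul3(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Prim lattice vectors, one vector per row (as in prim.json)
lattice_vectors = [
    [0.0, 2.8, 2.8],
    [2.8, 0.0, 2.8],
    [2.8, 2.8, 0.0],
]

# L holds the lattice vectors as columns, so transpose the row-wise list
L = [[lattice_vectors[j][i] for j in range(3)] for i in range(3)]

# Example integer transformation matrix
T = [
    [1, 0, 1],
    [0, 1, 1],
    [-1, -1, 0],
]

# Superlattice vectors are the columns of S
S = matmul3(L, T)

print("det(T):", det3(T))
print("volume ratio:", det3(S) / det3(L))
```

The supercell volume, in multiples of the prim unit cell volume, is $|\det T|$.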

The Supercell object holds symmetry representations that allow symmetry operations to be applied efficiently to a configuration, enabling the comparisons and checks that determine whether a configuration is symmetrically distinct. The same symmetry representations can be used to transform any configuration in the same supercell, so a Supercell object can be shared by multiple Configuration objects.

### Supercell enumeration

#### Enumerating supercells by volume

The method [enum.supercells_by_volume]() enumerates symmetrically distinct supercells from a minimum to a maximum volume, specified as integer multiples of the prim unit cell volume. It also has additional options, described in the reference documentation, for more complex use cases:

- Enumerate supercells of another supercell
- Enumerate 1d or 2d supercells
- Enumerate supercells with a fixed shape but different sizes

In [5]:
# Enumerate supercells with volume 1 to 4
project.enum.supercells_by_volume(
    max=4, 
    min=1, 
    id="supercells_by_volume.1", 
    verbose=True,
)

-- Begin: Enumerating supercells by volume --

  Generated: SCEL1_1_1_1_0_0_0 (already existed)
  Generated: SCEL2_2_1_1_0_1_1 (already existed)
  Generated: SCEL2_2_1_1_0_0_1 (already existed)
  Generated: SCEL3_3_1_1_0_2_2 (already existed)
  Generated: SCEL3_3_1_1_0_2_1 (already existed)
  Generated: SCEL3_3_1_1_0_0_2 (already existed)
  Generated: SCEL4_4_1_1_0_0_0 (already existed)
  Generated: SCEL4_4_1_1_0_1_0 (already existed)
  Generated: SCEL4_4_1_1_0_0_2 (already existed)
  Generated: SCEL4_4_1_1_0_0_3 (already existed)
  Generated: SCEL4_4_1_1_0_2_1 (already existed)
  Generated: SCEL4_2_2_1_0_1_0 (already existed)
  Generated: SCEL4_2_2_1_1_1_0 (already existed)
  DONE

-- Summary --

  Initial number of supercells: 13
  Final number of supercells: 13
  Enumerated 13 supercells (0 new, 13 existing).

overwrite: SiGe/enumerations/enum.supercells_by_volume.1/meta.json
overwrite: SiGe/enumerations/enum.supercells_by_volume.1/scel_set.json


#### Enumeration Data

The results of an enumeration can be accessed using the [EnumData]() class. The last enumeration is saved in [project.enum.last](). We can put additional information, including a text description of the enumeration, in the [EnumData.meta]() dict and save the updated EnumData using [commit](). Subsequently, if "desc" exists in [EnumData.meta](), it will be printed along with summary information such as the number of supercells.

In [6]:
enum_data = project.enum.last
enum_data.meta = {"desc": "Initial supercell enumeration"}
enum_data.commit()
print(enum_data)

overwrite: SiGe/enumerations/enum.supercells_by_volume.1/meta.json
overwrite: SiGe/enumerations/enum.supercells_by_volume.1/scel_set.json
EnumData:
- id: supercells_by_volume.1
- desc: "Initial supercell enumeration"
- supercell_set: 13 supercells


Later, the [EnumData]() may also be accessed by id string using the [enum.get]() method.

In [7]:
enum_data = project.enum.get("supercells_by_volume.1")
print(enum_data)

EnumData:
- id: supercells_by_volume.1
- desc: "Initial supercell enumeration"
- supercell_set: 13 supercells


#### SupercellSet

The supercells enumerated by [enum.supercells_by_volume]() are stored as [SupercellRecord](https://prisms-center.github.io/CASMcode_pydocs/libcasm/configuration/2.0/reference/libcasm/_autosummary/libcasm.configuration.SupercellRecord.html#supercellrecord) in a [SupercellSet](https://prisms-center.github.io/CASMcode_pydocs/libcasm/configuration/2.0/reference/libcasm/_autosummary/libcasm.configuration.SupercellSet.html#supercellset). Each SupercellRecord includes a [Supercell](https://prisms-center.github.io/CASMcode_pydocs/libcasm/configuration/2.0/reference/libcasm/_autosummary/libcasm.configuration.Supercell.html#supercell) and some additional information about the supercell, including a [supercell_name]() string which is used as an identifier.

A SupercellSet:

- does not keep multiple SupercellRecord for supercells that have the same superlattice vectors;
- does allow storing separate SupercellRecord for supercells which are distinct (have different superlattice vectors) but are symmetrically equivalent (superlattice points are mapped by a crystal group operation).

Iterating over a [SupercellSet](https://prisms-center.github.io/CASMcode_pydocs/libcasm/configuration/2.0/reference/libcasm/_autosummary/libcasm.configuration.SupercellSet.html#supercellset) yields [SupercellRecord](https://prisms-center.github.io/CASMcode_pydocs/libcasm/configuration/2.0/reference/libcasm/_autosummary/libcasm.configuration.SupercellRecord.html#supercellrecord).

In [8]:
# Iterate over the first three SupercellRecord
# in the SupercellSet and print the record
for i, record in enumerate(project.enum.last.supercell_set):
    print(record)
    if i == 2:
        break


{
  "canonical_supercell_name": "SCEL1_1_1_1_0_0_0",
  "is_canonical": true,
  "supercell": {
    "supercell_name": "SCEL1_1_1_1_0_0_0",
    "transformation_matrix_to_supercell": [
      [1, 0, 0],
      [0, 1, 0],
      [0, 0, 1]
    ]
  },
  "supercell_name": "SCEL1_1_1_1_0_0_0"
}
{
  "canonical_supercell_name": "SCEL2_2_1_1_0_1_1",
  "is_canonical": true,
  "supercell": {
    "supercell_name": "SCEL2_2_1_1_0_1_1",
    "transformation_matrix_to_supercell": [
      [1, 0, 1],
      [0, 1, 1],
      [-1, -1, 0]
    ]
  },
  "supercell_name": "SCEL2_2_1_1_0_1_1"
}
{
  "canonical_supercell_name": "SCEL2_2_1_1_0_0_1",
  "is_canonical": true,
  "supercell": {
    "supercell_name": "SCEL2_2_1_1_0_0_1",
    "transformation_matrix_to_supercell": [
      [0, -1, -1],
      [0, 1, -1],
      [1, 0, 1]
    ]
  },
  "supercell_name": "SCEL2_2_1_1_0_0_1"
}


#### Storing multiple enumerations

Enumerations are stored in directories based on their id string. If an id is not given, or has value None, a new enumeration is automatically generated in sequential order. If the id of an existing enumeration is given, that enumeration is updated with any additional supercells generated.


In [9]:
# Enumerate supercells with volume 3 to 5
project.enum.supercells_by_volume(max=5, min=3, id="supercells_by_volume.2")
print()
print(project.enum.last)

-- Begin: Enumerating supercells by volume --

  Generated: SCEL3_3_1_1_0_2_2 (already existed)
  Generated: SCEL3_3_1_1_0_2_1 (already existed)
  Generated: SCEL3_3_1_1_0_0_2 (already existed)
  Generated: SCEL4_4_1_1_0_0_0 (already existed)
  Generated: SCEL4_4_1_1_0_1_0 (already existed)
  Generated: SCEL4_4_1_1_0_0_2 (already existed)
  Generated: SCEL4_4_1_1_0_0_3 (already existed)
  Generated: SCEL4_4_1_1_0_2_1 (already existed)
  Generated: SCEL4_2_2_1_0_1_0 (already existed)
  Generated: SCEL4_2_2_1_1_1_0 (already existed)
  Generated: SCEL5_1_1_5_0_0_0 (already existed)
  Generated: SCEL5_5_1_1_0_4_3 (already existed)
  Generated: SCEL5_5_1_1_0_0_3 (already existed)
  Generated: SCEL5_5_1_1_0_0_4 (already existed)
  Generated: SCEL5_5_1_1_0_1_3 (already existed)
  DONE

-- Summary --

  Initial number of supercells: 15
  Final number of supercells: 15
  Enumerated 15 supercells (0 new, 15 existing).

overwrite: SiGe/enumerations/enum.supercells_by_volume.2/scel_set.json

EnumD

### Configuration enumeration

#### Enumerating configurations by supercell

The method [enum.occ_by_supercell]() enumerates all occupations in supercells ranging from a minimum to a maximum volume. All configurations are guaranteed to be in a canonical supercell. By default it:

- only outputs primitive configurations,
- only outputs configurations in canonical form (the configuration that compares greatest among all configurations in a supercell that can be mapped onto each other by symmetry operations).

With these defaults, if enumeration proceeds without skipping supercells, all symmetrically distinct configurations will be enumerated.
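
The ideas of a canonical form and a primitive configuration can be illustrated with a toy model. The following is a sketch only, not the algorithm CASM uses: it assumes a one-dimensional ring of four binary sites whose only symmetry operations are cyclic shifts, and all function names are hypothetical.

```python
from itertools import product


def rotations(occ):
    """All cyclic shifts of an occupation tuple (the toy symmetry group)."""
    return [occ[i:] + occ[:i] for i in range(len(occ))]


def canonical_form(occ):
    """The equivalent occupation that compares greatest."""
    return max(rotations(occ))


def is_canonical(occ):
    return occ == canonical_form(occ)


def is_primitive(occ):
    """True if no non-trivial shift maps the occupation onto itself,
    i.e. it cannot be described by a shorter repeat unit."""
    return all(r != occ for r in rotations(occ)[1:])


n_sites = 4
all_occ = list(product((0, 1), repeat=n_sites))

# One representative per set of symmetrically equivalent occupations
canonical = [occ for occ in all_occ if is_canonical(occ)]

# Representatives that are also primitive
primitive_canonical = [occ for occ in canonical if is_primitive(occ)]

print(len(all_occ), "occupations in total")
print(len(canonical), "canonical:", canonical)
print(len(primitive_canonical), "primitive and canonical:", primitive_canonical)
```

In this toy model the 16 possible occupations reduce to 6 canonical forms, of which 3 are primitive. The non-primitive ones, such as `(1, 0, 1, 0)`, would already have been found on a shorter ring, which is why keeping only primitive, canonical configurations avoids duplicates across supercells.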

As with enum.supercells_by_volume it also has additional options, described in the reference documentation, for more complex use cases:

- Enumerate occupations in supercells of another supercell
- Enumerate occupations in 1d or 2d supercells
- Enumerate occupations in supercells with a fixed shape but different sizes

**Warning**: The number of possible occupations in an $n$-component alloy with $m$ sites is $n^m$. Take care not to request too large an enumeration.
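
To get a feel for how quickly $n^m$ grows, the following minimal sketch counts the occupations in a single supercell of this project's prim (2 components, 2 sites per primitive cell) before any symmetry reduction. The function name `n_occupations` is hypothetical.

```python
n_components = 2    # Si and Ge
sites_per_prim = 2  # basis sites in the prim


def n_occupations(volume):
    """Number of occupations, n**m, in one supercell of the given volume."""
    m = sites_per_prim * volume
    return n_components ** m


for volume in range(1, 9):
    print(volume, sites_per_prim * volume, n_occupations(volume))
```

This is a count per supercell, and the number of distinct supercells also grows with volume, so the total work increases faster than this estimate suggests.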
    

In [10]:
# Enumerate configurations in supercells with volume 1 to 4
project.enum.occ_by_supercell(
    max=4, 
    min=1, 
    id="occ_by_supercell.1",
)

-- Begin: Enumerating occupations by supercell --

Enumerate configurations for: SCEL1_1_1_1_0_0_0
3 configurations (0 new, 0 excluded by filter)

Enumerate configurations for: SCEL2_2_1_1_0_1_1
4 configurations (0 new, 0 excluded by filter)

Enumerate configurations for: SCEL2_2_1_1_0_0_1
3 configurations (0 new, 0 excluded by filter)

Enumerate configurations for: SCEL3_3_1_1_0_2_2
13 configurations (0 new, 0 excluded by filter)

Enumerate configurations for: SCEL3_3_1_1_0_2_1
10 configurations (0 new, 0 excluded by filter)

Enumerate configurations for: SCEL3_3_1_1_0_0_2
10 configurations (0 new, 0 excluded by filter)

Enumerate configurations for: SCEL4_4_1_1_0_0_0
36 configurations (0 new, 0 excluded by filter)

Enumerate configurations for: SCEL4_4_1_1_0_1_0
27 configurations (0 new, 0 excluded by filter)

Enumerate configurations for: SCEL4_4_1_1_0_0_2
36 configurations (0 new, 0 excluded by filter)

Enumerate configurations for: SCEL4_4_1_1_0_0_3
24 configurations (0 new, 0 exc

#### ConfigurationSet

The configurations enumerated by [enum.occ_by_supercell]() are stored as [ConfigurationRecord](https://prisms-center.github.io/CASMcode_pydocs/libcasm/configuration/2.0/reference/libcasm/_autosummary/libcasm.configuration.ConfigurationRecord.html#configurationrecord) in a [ConfigurationSet](https://prisms-center.github.io/CASMcode_pydocs/libcasm/configuration/2.0/reference/libcasm/_autosummary/libcasm.configuration.ConfigurationSet.html#configurationset). Each ConfigurationRecord includes a [Configuration](https://prisms-center.github.io/CASMcode_pydocs/libcasm/configuration/2.0/reference/libcasm/_autosummary/libcasm.configuration.Configuration.html#configuration) and some additional information about the configuration, including a [configuration_name]() string which is used as an identifier.

A ConfigurationSet:

- requires that configurations be in a canonical supercell;
- does not keep multiple ConfigurationRecord for configurations that have the same DoF values;
- does allow storing separate ConfigurationRecord for configurations which are distinct (have different DoF values) but are symmetrically equivalent (DoF values are mapped by a symmetry operation);
- leaves it to users to place any other constraints (canonical configurations only, primitive configurations only, etc.) on which configurations are added.

**Warning**: ConfigurationSet is optimized for keeping unique configurations in canonical supercells. Users must ensure that configurations added to a ConfigurationSet are in a canonical supercell. This is not checked by ConfigurationSet, but it is required to ensure proper configuration naming, serialization, and deserialization. Configurations that are not in a canonical supercell should be stored in a list or some other data structure.

Iterating over a [ConfigurationSet](https://prisms-center.github.io/CASMcode_pydocs/libcasm/configuration/2.0/reference/libcasm/_autosummary/libcasm.configuration.ConfigurationSet.html#configurationset) yields [ConfigurationRecord](https://prisms-center.github.io/CASMcode_pydocs/libcasm/configuration/2.0/reference/libcasm/_autosummary/libcasm.configuration.ConfigurationRecord.html#configurationrecord).

In [11]:
# Iterate over the first three ConfigurationRecord
# in the ConfigurationSet and print the record
for i, record in enumerate(project.enum.last.configuration_set):
    print(record)
    if i == 2:
        break


{
  "configuration": {
    "basis": "standard",
    "dof": {
      "occ": [0, 0]
    },
    "supercell_name": "SCEL1_1_1_1_0_0_0",
    "transformation_matrix_to_supercell": [
      [1, 0, 0],
      [0, 1, 0],
      [0, 0, 1]
    ]
  },
  "configuration_id": "0",
  "configuration_name": "SCEL1_1_1_1_0_0_0/0",
  "supercell_name": "SCEL1_1_1_1_0_0_0"
}
{
  "configuration": {
    "basis": "standard",
    "dof": {
      "occ": [1, 0]
    },
    "supercell_name": "SCEL1_1_1_1_0_0_0",
    "transformation_matrix_to_supercell": [
      [1, 0, 0],
      [0, 1, 0],
      [0, 0, 1]
    ]
  },
  "configuration_id": "1",
  "configuration_name": "SCEL1_1_1_1_0_0_0/1",
  "supercell_name": "SCEL1_1_1_1_0_0_0"
}
{
  "configuration": {
    "basis": "standard",
    "dof": {
      "occ": [1, 1]
    },
    "supercell_name": "SCEL1_1_1_1_0_0_0",
    "transformation_matrix_to_supercell": [
      [1, 0, 0],
      [0, 1, 0],
      [0, 0, 1]
    ]
  },
  "configuration_id": "2",
  "configuration_name": "SCEL1_1_

#### Conversion to structure

The [Configuration.to_structure]() method converts a CASM [Configuration]() to a CASM [Structure](). A Structure:

- represents a crystal structure with a 3d lattice,
- is not restricted to the DoF values allowed by a prim,
- has built in methods for conversions to and from VASP POSCAR format.

In [12]:
# Convert the first three configurations in the ConfigurationSet
# to structures and print each structure along with its POSCAR
for i, record in enumerate(project.enum.last.configuration_set):
    name = record.configuration_name
    structure = record.configuration.to_structure()
    poscar_str = structure.to_poscar_str(title=name)
    
    print("~~~")
    print(f"Configuration: {name}")
    print(f"Structure: {structure}")
    print(f"POSCAR:\n{poscar_str}", end="")
    if i == 2:
        break

~~~
Configuration: SCEL1_1_1_1_0_0_0/0
Structure: {
  "atom_coords": [
    [0.0, 0.0, 0.0],
    [0.25000000000000006, 0.25, 0.25000000000000006]
  ],
  "atom_type": ["Si", "Si"],
  "coordinate_mode": "Fractional",
  "lattice_vectors": [
    [0.0, 2.8, 2.8],
    [2.8, 0.0, 2.8],
    [2.8, 2.8, 0.0]
  ]
}
POSCAR:
SCEL1_1_1_1_0_0_0/0
1.00000000
0.00000000 2.80000000 2.80000000
2.80000000 0.00000000 2.80000000
2.80000000 2.80000000 0.00000000
Si 
2 
Direct
0.00000000 0.00000000 0.00000000 Si
0.25000000 0.25000000 0.25000000 Si

~~~
Configuration: SCEL1_1_1_1_0_0_0/1
Structure: {
  "atom_coords": [
    [0.0, 0.0, 0.0],
    [0.25000000000000006, 0.25, 0.25000000000000006]
  ],
  "atom_type": ["Ge", "Si"],
  "coordinate_mode": "Fractional",
  "lattice_vectors": [
    [0.0, 2.8, 2.8],
    [2.8, 0.0, 2.8],
    [2.8, 2.8, 0.0]
  ]
}
POSCAR:
SCEL1_1_1_1_0_0_0/1
1.00000000
0.00000000 2.80000000 2.80000000
2.80000000 0.00000000 2.80000000
2.80000000 2.80000000 0.00000000
Ge Si 
1 1 
Direct
0.000000

#### Filtered enumeration

A custom filter function may be used to filter configurations during enumeration. Here we:

- use [ConfigCompositionCalculator]() to calculate the number of each type of atom in the supercell,
- keep configurations that have exactly 2 Ge,
- use ``dry_run=True`` so the enumeration is not committed automatically.

In [13]:
# Enumerate configurations:
# - in supercells with volume 1 to 3
# - with exactly 2 Ge atoms in the supercell

from libcasm.configuration import (
    Configuration,
    SupercellRecord,
)
from casm.project import EnumData

# Get the casm.project.ConfigCompositionCalculator
comp = project.chemical_composition

# Get the index of Ge in the composition arrays
i_Ge = comp.components.index("Ge")

# Print each check?
verbose_checks = True


def filter_f(config: Configuration, enum_data: EnumData) -> bool:
    """Return True to include; False to exclude"""
    
    # Get number of Ge in the supercell
    N_Ge = comp.per_supercell(config)[i_Ge]
    
    # Print info about the config being checked
    if verbose_checks:
        record = SupercellRecord(config.supercell)
        print(
            f"~check~ {record.supercell_name}",
            config.occupation,
            comp.param_composition(config),
            f"include?: {N_Ge == 2}",
        )
    return N_Ge == 2

project.enum.occ_by_supercell(
    max=3, 
    min=1, 
    filter_f=filter_f,
    verbose=True,
    dry_run=True,
)

-- Begin: Enumerating occupations by supercell --

Enumerate configurations for: SCEL1_1_1_1_0_0_0
~check~ SCEL1_1_1_1_0_0_0 [0 0] [0.] include?: False
~check~ SCEL1_1_1_1_0_0_0 [1 0] [0.5] include?: False
~check~ SCEL1_1_1_1_0_0_0 [1 1] [1.] include?: True
3 configurations (1 new, 2 excluded by filter)

Enumerate configurations for: SCEL2_2_1_1_0_1_1
~check~ SCEL2_2_1_1_0_1_1 [1 0 0 0] [0.25] include?: False
~check~ SCEL2_2_1_1_0_1_1 [1 0 1 0] [0.5] include?: True
~check~ SCEL2_2_1_1_0_1_1 [1 1 1 0] [0.75] include?: False
~check~ SCEL2_2_1_1_0_1_1 [1 0 0 1] [0.5] include?: True
4 configurations (2 new, 2 excluded by filter)

Enumerate configurations for: SCEL2_2_1_1_0_0_1
~check~ SCEL2_2_1_1_0_0_1 [1 0 0 0] [0.25] include?: False
~check~ SCEL2_2_1_1_0_0_1 [1 0 1 0] [0.5] include?: True
~check~ SCEL2_2_1_1_0_0_1 [1 1 1 0] [0.75] include?: False
3 configurations (1 new, 2 excluded by filter)

Enumerate configurations for: SCEL3_3_1_1_0_2_2
~check~ SCEL3_3_1_1_0_2_2 [1 0 0 0 0 0] [0.1666
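
The include/exclude decisions printed above can be reproduced without CASM. The following is a minimal sketch that assumes every site uses the occupant order `["Si", "Ge"]` from the prim, so an occupation value of 1 means Ge; the helper names are hypothetical, and the expected values are copied from the complete `~check~` lines in the output.

```python
occupants = ["Si", "Ge"]  # occupation value 0 -> Si, 1 -> Ge


def count_Ge(occupation):
    """Number of Ge atoms in the supercell."""
    return sum(1 for value in occupation if occupants[value] == "Ge")


def ge_site_fraction(occupation):
    """Fraction of sites occupied by Ge (the parametric composition a)."""
    return count_Ge(occupation) / len(occupation)


def keep(occupation):
    """Same criterion as the filter above: exactly 2 Ge in the supercell."""
    return count_Ge(occupation) == 2


# (occupation, parametric composition, include?) from the output above
printed_checks = [
    ([0, 0], 0.0, False),
    ([1, 0], 0.5, False),
    ([1, 1], 1.0, True),
    ([1, 0, 0, 0], 0.25, False),
    ([1, 0, 1, 0], 0.5, True),
    ([1, 1, 1, 0], 0.75, False),
    ([1, 0, 0, 1], 0.5, True),
]

for occupation, a, include in printed_checks:
    print(occupation, ge_site_fraction(occupation), keep(occupation))
```

Because the filter counts atoms per supercell rather than per primitive cell, configurations with the same composition can be treated differently: `[1, 0]` and `[1, 0, 1, 0]` both have `a = 0.5`, but only the second contains exactly 2 Ge.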

### Acting on enumeration data

This section provides a reference for various actions that can be performed on enumerations, and may be skipped for the Si-Ge demonstration project.

#### Get an enumeration by id

- Also, update enumeration metadata and commit.

In [14]:
enum_data = project.enum.get("supercells_by_volume.1")
enum_data.meta = {"desc": "Initial supercell enumeration"}
enum_data.commit()
print(enum_data)

overwrite: SiGe/enumerations/enum.supercells_by_volume.1/meta.json
overwrite: SiGe/enumerations/enum.supercells_by_volume.1/scel_set.json
EnumData:
- id: supercells_by_volume.1
- desc: "Initial supercell enumeration"
- supercell_set: 13 supercells


#### List all enumerations

- Print a summary of each enumeration in the project

In [15]:
project.enum.list()

EnumData:
- id: occ_by_supercell.1
- supercell_set: 13 supercells
- configuration_set: 214 configurations
EnumData:
- id: supercells_by_volume.1
- desc: "Initial supercell enumeration"
- supercell_set: 13 supercells
EnumData:
- id: supercells_by_volume.2
- supercell_set: 15 supercells


#### Copy an enumeration

- Will raise if the destination enumeration already exists

In [16]:
project.enum.copy(
    src_id="supercells_by_volume.2", 
    dest_id="supercells_by_volume.3",
)
project.enum.list()

write: SiGe/enumerations/enum.supercells_by_volume.3/scel_set.json
EnumData:
- id: occ_by_supercell.1
- supercell_set: 13 supercells
- configuration_set: 214 configurations
EnumData:
- id: supercells_by_volume.1
- desc: "Initial supercell enumeration"
- supercell_set: 13 supercells
EnumData:
- id: supercells_by_volume.2
- supercell_set: 15 supercells
EnumData:
- id: supercells_by_volume.3
- supercell_set: 15 supercells


#### Merge enumerations

- Supercells and configurations in source enumeration sets are inserted into the destination enumeration sets.
- Supercells and configurations in source enumeration lists are appended to the destination enumeration lists if they are not already present.

In [17]:
project.enum.merge(
    src_id="supercells_by_volume.1",
    dest_id="supercells_by_volume.3",
)
project.enum.list()

overwrite: SiGe/enumerations/enum.supercells_by_volume.3/scel_set.json
EnumData:
- id: occ_by_supercell.1
- supercell_set: 13 supercells
- configuration_set: 214 configurations
EnumData:
- id: supercells_by_volume.1
- desc: "Initial supercell enumeration"
- supercell_set: 13 supercells
EnumData:
- id: supercells_by_volume.2
- supercell_set: 15 supercells
EnumData:
- id: supercells_by_volume.3
- supercell_set: 18 supercells
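
The supercell count after the merge follows from set union. The following is a minimal sketch in which plain Python sets of supercell names stand in for the stored supercell sets (an illustration only, not how CASM stores them); the names are copied from the enumeration output shown earlier.

```python
# Supercell names copied from the enumeration output shown earlier
volume_1_to_2 = [
    "SCEL1_1_1_1_0_0_0",
    "SCEL2_2_1_1_0_1_1",
    "SCEL2_2_1_1_0_0_1",
]
volume_3_to_4 = [
    "SCEL3_3_1_1_0_2_2",
    "SCEL3_3_1_1_0_2_1",
    "SCEL3_3_1_1_0_0_2",
    "SCEL4_4_1_1_0_0_0",
    "SCEL4_4_1_1_0_1_0",
    "SCEL4_4_1_1_0_0_2",
    "SCEL4_4_1_1_0_0_3",
    "SCEL4_4_1_1_0_2_1",
    "SCEL4_2_2_1_0_1_0",
    "SCEL4_2_2_1_1_1_0",
]
volume_5 = [
    "SCEL5_1_1_5_0_0_0",
    "SCEL5_5_1_1_0_4_3",
    "SCEL5_5_1_1_0_0_3",
    "SCEL5_5_1_1_0_0_4",
    "SCEL5_5_1_1_0_1_3",
]

source = set(volume_1_to_2 + volume_3_to_4)  # supercells_by_volume.1
dest = set(volume_3_to_4 + volume_5)         # supercells_by_volume.3 before the merge

# Inserting the source into the destination keeps one entry per supercell
merged = dest | source

print(len(source), len(dest), len(merged))  # 13 15 18
```

The 10 supercells with volume 3 and 4 appear in both enumerations, so the merged set has 13 + 15 - 10 = 18 supercells.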


#### Remove an enumeration

- Will raise if the enumeration does not exist

In [18]:
project.enum.remove("supercells_by_volume.3")
project.enum.list()

EnumData:
- id: occ_by_supercell.1
- supercell_set: 13 supercells
- configuration_set: 214 configurations
EnumData:
- id: supercells_by_volume.1
- desc: "Initial supercell enumeration"
- supercell_set: 13 supercells
EnumData:
- id: supercells_by_volume.2
- supercell_set: 15 supercells
