# Prepare Protein

The `prepare_protein` module repairs protein structures by filling in missing residues and adding hydrogen atoms:

1. **Fills in Missing Residues**  
   The module reconstructs missing segments of the protein backbone and sidechains, ensuring structural completeness.
   
2. **Adds Hydrogen Atoms**  
   Protonation states are assigned based on physiological conditions or user-defined pH values, ensuring a chemically valid structure.

Under the hood, `prepare_protein` relies on two widely used tools:

1. **[PDBFixer](https://github.com/openmm/pdbfixer)**  
   PDBFixer is a tool that identifies and corrects issues in PDB files, such as missing atoms and residues, while preserving the overall structure.

2. **[PDB2PQR](https://pdb2pqr.readthedocs.io/en/latest/)**  
   PDB2PQR is used for assigning protonation states and optimizing hydrogen bonding networks based on the specified pH conditions.
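
To illustrate why the chosen pH matters for protonation, here is a minimal sketch based on the Henderson-Hasselbalch relation. It is not how PDB2PQR or this module assigns protonation states; the function name and the histidine model pKa of 6.0 are assumptions made for illustration only.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of a titratable group that is protonated."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumed model pKa for a histidine side chain (illustrative, not taken from PDB2PQR).
HIS_MODEL_PKA = 6.0

# Compare an acidic pH, the module's default pH, and the pH used in the usage example.
for ph in (5.0, 7.0, 7.4):
    fraction = protonated_fraction(ph, HIS_MODEL_PKA)
    print(f"pH {ph}: {fraction:.2f} of histidine side chains protonated")
```

Under this simplified model, a residue whose pKa is close to the chosen pH is the most sensitive to the `ph` option; real pKa values also depend on the residue's local environment.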

## Module Specification

The module takes a `dict` of module-specific options and a list of `TRC` tuples representing proteins, prepares those proteins, and outputs them as a new list of `TRC` tuples.

### Inputs

* options: `dict`
* protein_trcs: `[TRC]`

### Outputs

* prepped_protein_trcs: `[TRC]`

### Options

The options `dict` has the following fields:

| Name | Type | Default | Description |
|------|------|---------|-------------|
| ph | `Option<float>` | `7.0` | The pH at which hydrogen atoms are assigned. |
| naming_scheme | `Option<str>` | `"Amber"` | The naming scheme to use for the amino acids. Use `none` for standardized names, or `"Amber"` or `"Charmm"` if the outputs will be used for MD simulation with that force field. |
| truncation_threshold | `Option<uint>` | `2` | Extending chain ends with long runs of amino acids is not accurate. If this many or more residues are missing from the end of a chain, no amino acids are added to that end. |
| capping_style | `Option<str>` | `"Truncated"` | Whether to add caps at the ends of the protein chains. Capping is generally good for MD, but may not be necessary if the chain is fully complete. One of: `"Never"`, `"Truncated"`, `"Always"`. |
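
The defaults and the truncation rule in the table can be sketched in Python as follows. This is an illustration of the documented semantics, not the module's implementation: `resolve_options` and `rebuilds_terminus` are hypothetical helpers, and the choice to apply defaults only to omitted fields (so that an explicit `None` is preserved) is an assumption.

```python
from typing import Any, Dict, Optional

# Defaults copied from the options table.
DEFAULTS: Dict[str, Any] = {
    "ph": 7.0,
    "naming_scheme": "Amber",
    "truncation_threshold": 2,
    "capping_style": "Truncated",
}

# Capping styles named in the table.
CAPPING_STYLES = {"Never", "Truncated", "Always"}


def resolve_options(options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: fill omitted fields with the table defaults."""
    options = options or {}
    unknown = set(options) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown options: {sorted(unknown)}")
    # Assumption: only omitted keys are defaulted; explicit values are kept as given.
    resolved = {**DEFAULTS, **options}
    if resolved["capping_style"] not in CAPPING_STYLES:
        raise ValueError(f"capping_style must be one of {sorted(CAPPING_STYLES)}")
    return resolved


def rebuilds_terminus(missing_at_end: int, truncation_threshold: int) -> bool:
    """Chain ends are rebuilt only when fewer residues than the threshold are missing."""
    return 0 < missing_at_end < truncation_threshold


opts = resolve_options({"ph": 7.4})
print(opts)
print(rebuilds_terminus(1, opts["truncation_threshold"]))  # one missing residue is rebuilt
print(rebuilds_terminus(2, opts["truncation_threshold"]))  # two or more are left truncated
```

With the default threshold of `2`, a single missing terminal residue is added back, while a chain end missing two or more residues is left as it is.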

## Function Usage

```haskell
let
    options = {
        ph = some 7.4,
        naming_scheme = some "Amber",
    },
    prepare_protein = \protein_conformer_trc ->
        map to_data (get 0 (
            prepare_protein_rex_s default_runspec options [protein_conformer_trc]
        ))
in
    \unprepped_trc -> prepare_protein unprepped_trc {- outputs the prepped TRC -}
```