# Rush Entities

Rush workflows operate on **entities**, which are **typed objects** that serve as the core data structures in Rush.  
These entities represent key biological and chemical data, ranging from **molecular structures** to **docking results**, and are passed between functions in `rex` expressions.

The table below summarizes the key entities used in **Rush**.

| Entity | Description |
| --- | --- |
| `Smol` | A small molecule (without any 3-D information). |
| `Protein` | A protein representation (without any 3-D information). |
| `Structure` | A molecular structure (either protein or small molecule). |
| `SmolConformer` | TODO. |
| `ProteinConformer` | A processed protein structure used in docking. |
| `BoundingBox` | A predicted **binding site** on a protein, used for docking. |
| `BindingAffinity` | A processed **affinity result**, used for benchmarking. |
| `BenchmarkArg` | An **argument used for benchmarking**, linking an entity type to a specific instance. |

---

Every entity contains **internal attributes** that define its structure and behavior. By default, all Rush entities have the attributes `id`, `metadata`, `project_id`, and `account_id`, which are generated automatically. The remaining attributes are described below.


## Entity attributes


### **Smol**
A **small molecule** representation, containing **SMILES, InChI, and associated metadata**.

| Attribute | Type | Description |
| --- | --- | --- |
| `smi` | `Option<str>` | The **SMILES string** representation, if available. |
| `inchi` | `Option<str>` | The **InChI string** representation, if available. |
| `data_blocks` | `Option<list<(str, str)>>` | Additional **data blocks** (e.g., molecular properties, annotations). |

#### **Example Usage**
```rex
smiles = smi (load smol_id "Smol")
```
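
As a rough mental model only, the default attributes and the `Smol`-specific ones can be pictured as a base record that each entity extends. The Python sketch below is not Rush's implementation or API: the class layout is an assumption, the `Metadata` fields are inferred from the `Metadata` literals used in the examples on this page, and the identifiers are made up. `Option<str>` is modeled as `Optional[str]`, so a missing value is `None`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Metadata:
    # Fields inferred from the Metadata literals shown on this page.
    name: str
    description: Optional[str] = None
    tags: List[str] = field(default_factory=list)


@dataclass
class Entity:
    # The four attributes every Rush entity has by default.
    # Rush generates them automatically; here they are passed by hand.
    id: str
    metadata: Metadata
    project_id: str
    account_id: str


@dataclass
class Smol(Entity):
    # Smol-specific attributes; each is optional and defaults to None.
    smi: Optional[str] = None
    inchi: Optional[str] = None
    data_blocks: Optional[List[tuple]] = None


# Hypothetical instance: only the SMILES string is known.
aspirin = Smol(
    id="smol_001",
    metadata=Metadata(name="Aspirin"),
    project_id="proj_9876",
    account_id="account_5432",
    smi="CC(=O)OC1=CC=CC=C1C(=O)O",
)
```

Code that reads `smi` or `inchi` should handle the missing case, since either attribute may be absent on a given `Smol`.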

### **Protein**
Represents a **biological protein sequence**, including metadata and external references.

| Attribute | Type | Description |
| --- | --- | --- |
| `id` | `str` | Unique identifier for the protein. |
| `metadata` | `Metadata` | Contains metadata about the protein. |
| `sequence` | `str` | The **amino acid sequence** of the protein. |
| `uniprot_id` | `Option<str>` | The **UniProt ID** for the protein, if available. |
| `project_id` | `str` | The **project** associated with this protein. |
| `account_id` | `str` | The **account** that owns this protein. |

#### **Example Usage**
```rex
protein = Protein {
    id = "protein_ABC123",
    metadata = Metadata {
        name = "Example Protein",
        description = "Hypothetical protein from UniProt",
        tags = ["enzyme", "binding site"]
    },
    sequence = "MKTWLLALIFAVFNTLLPVTTIGVSPTAYGNRIT",
    uniprot_id = "P12345",
    project_id = "proj_9876",
    account_id = "account_5432"
}
```

### **Structure**
A **molecular structure** that represents either a **protein** or a **small molecule** in Rush workflows.

| Attribute | Type | Description |
| --- | --- | --- |
| `rcsb_id` | `Option<str>` | The **RCSB PDB ID**, if available. |
| `topology` | `Object<Topology>` | Filepath to a `Topology` object (stored in Rush's S3 object store), which describes the **connectivity and molecular topology** of the structure's atoms. |
| `residues` | `Object<Residues>` | Filepath to a `Residues` object, which contains **residue-level information** for the structure. |
| `chains` | `Object<Chains>` | Filepath to a `Chains` object, which contains **chain information** for the structure. |

#### **Example Usage**
```rex
structure = load (structure_id protein) "Structure"
```

### **ProteinConformer**
A **protein structure** used in docking calculations.

| Attribute | Type | Description |
| --- | --- | --- |
| `protein_id` | `str` | Reference to the **original protein**. |
| `structure_id` | `str` | Reference to the **Structure** associated with this conformer. |
| `residues` | `list<int>` | List of residue indices in the protein. |
| `pdb_id` | `Option<str>` | The **PDB ID** if available. |

#### **Example Usage**
```rex
protein = load (id (get 0 input)) "ProteinConformer"
```


### **BoundingBox**
A **predicted binding site** on a protein, defined by its **minimum and maximum coordinates** in space.

| Attribute | Type | Description |
| --- | --- | --- |
| `min` | `list<float>` | The **minimum coordinates** `[x, y, z]` defining one corner of the bounding box. |
| `max` | `list<float>` | The **maximum coordinates** `[x, y, z]` defining the opposite corner of the bounding box. |

In the **OpenFF Protein-Ligand Binding Benchmark**, a bounding box is typically obtained using `p2rank`:

```rex
bounding_box = get 0 (get 0 (p2rank trc))
```
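
Some docking tools (AutoDock Vina, for example) describe the search region by a center and edge lengths rather than by two corners. The Python sketch below shows that conversion, assuming `min` and `max` are plain `[x, y, z]` lists as in the table above. The function name `box_center_and_size` is hypothetical and not part of Rush.

```python
from typing import List, Tuple


def box_center_and_size(
    box_min: List[float], box_max: List[float]
) -> Tuple[List[float], List[float]]:
    """Convert corner coordinates into a (center, size) pair."""
    if len(box_min) != 3 or len(box_max) != 3:
        raise ValueError("min and max must each hold [x, y, z]")
    if any(lo > hi for lo, hi in zip(box_min, box_max)):
        raise ValueError("each min coordinate must not exceed its max")
    # Center is the midpoint of the two corners along each axis.
    center = [(lo + hi) / 2 for lo, hi in zip(box_min, box_max)]
    # Size is the edge length of the box along each axis.
    size = [hi - lo for lo, hi in zip(box_min, box_max)]
    return center, size


center, size = box_center_and_size([0.0, 2.0, -4.0], [10.0, 6.0, 4.0])
print(center)  # [5.0, 4.0, 0.0]
print(size)    # [10.0, 4.0, 8.0]
```

The check on corner ordering catches a swapped `min` and `max` early, which would otherwise produce negative edge lengths.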


### **BindingAffinity**
Represents a **processed docking affinity result**, including metadata and molecule identifiers.

| Attribute | Type | Description |
| --- | --- | --- |
| `protein_id` | `str` | Identifier for the **protein** involved in docking. |
| `smol_id` | `str` | Identifier for the **small molecule (ligand)** involved in docking. |
| `affinity` | `float` | The **predicted binding affinity** value. |
| `affinity_metric` | `str` | The unit of measurement for binding affinity (e.g., `"kcal/mol"`). |

#### **Example Usage**
```rex
binding_affinity = BindingAffinity {
    affinity = -8.32,
    affinity_metric = "kcal/mol",
    protein_id = protein_123,
    smol_id = smol_id,
    metadata = Metadata {
        name = "Binding affinity for smol:" + smol_id + " and protein " + (protein_id protein),
        description = none,
        tags = []
    }
}
```

### **BenchmarkArg**
Represents an **argument used for benchmarking**, linking an entity type to a specific instance.

| Attribute | Type | Description |
| --- | --- | --- |
| `entity` | `str` | The type of entity being benchmarked (e.g., `"BindingAffinity"`). |
| `id` | `str` | The unique identifier of the specific entity instance. |

#### **Example Usage**
```rex
BenchmarkArg {
    entity = "BindingAffinity",
    id = save binding_affinity
}
```
