# Prepare a protein system from scratch

- BAT.py requires a protein system in PDB format. Atom names must be generic or match the names in the AMBER force field. PDB files with CHARMM-style names will not work, e.g. a snapshot from a CHARMM simulation or the PDB that `dabble` writes when the `charmm` force field is used.
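
As a quick pre-check of the naming requirement above, the sketch below scans PDB `ATOM`/`HETATM` records for a few well-known CHARMM-style names (the histidine variants `HSD`/`HSE`/`HSP`, and `TIP3` water with its `OH2` oxygen). The function name `find_charmm_names` and the name lists are illustrative assumptions, not part of BAT.py or dabble, and the lists are far from exhaustive; an empty result does not guarantee that tleap will accept the file.

```python
# Illustrative, non-exhaustive CHARMM-style names that AMBER libraries do not use.
CHARMM_RESNAMES = {"HSD", "HSE", "HSP", "TIP3"}
CHARMM_WATER_ATOMS = {"OH2"}


def find_charmm_names(pdb_lines):
    """Return the sorted CHARMM-style residue/atom names found in PDB records."""
    hits = set()
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atom_name = line[12:16].strip()  # PDB columns 13-16
        # Columns 18-20 hold the residue name; column 21 is included because
        # four-letter names such as TIP3 commonly spill into it.
        resname = line[17:21].strip()
        if resname in CHARMM_RESNAMES:
            hits.add(resname)
        if atom_name in CHARMM_WATER_ATOMS:
            hits.add(atom_name)
    return sorted(hits)
```

Any iterable of lines works, so an open file handle for `protein_input.pdb` can be passed directly. A non-empty list suggests the file should be re-exported with AMBER-compatible names before continuing.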

We need two PDB files to prepare a protein system.

1. `protein_input.pdb`: the protein exported from Maestro, so that its protonation states are already assigned (tleap should recognize protonation-specific residue names such as ASP vs. ASH, but this has not been verified). Water and ligand may be present in `protein_input.pdb`, but the ligand will be removed during preparation.

2. `system_input.pdb`: a simulation system prepared with dabble. The ligand does not need to be present in `system_input.pdb`.

We also need PDB files for the ligands that will be used in the simulation. Each ligand should be in its docking pose for `system_input.pdb`.

To get the anchor atoms for the protein, prepare a PDB with the ligand docked into the protein, `prot_lig_input.pdb`. Its protein residues should have the same resids as those in `protein_input.pdb`, and the ligand should be in the docking pose.

In [17]:
import MDAnalysis as mda
import numpy as np

In [18]:
protein_input = 'protein_input.pdb'
system_input = 'system_input.pdb'
prot_lig_input = 'prot_lig_input.pdb'

In [19]:
u_prot = mda.Universe(protein_input)
u_sys = mda.Universe(system_input)
u_prot_lig = mda.Universe(prot_lig_input)
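
The requirement that `prot_lig_input.pdb` shares resids with `protein_input.pdb` can be checked before picking anchors. The helper below is a minimal sketch; `compare_residues` is a hypothetical name, and it works on plain `(resid, resname)` pairs so it does not depend on any particular library. It assumes resids are unique within each file, so duplicate resids across chains would be collapsed.

```python
def compare_residues(res_a, res_b):
    """Compare two iterables of (resid, resname) pairs.

    Returns (only_in_a, only_in_b, renamed): resids present on one side only,
    and resids present on both sides with different residue names.
    """
    map_a = dict(res_a)
    map_b = dict(res_b)
    only_in_a = sorted(set(map_a) - set(map_b))
    only_in_b = sorted(set(map_b) - set(map_a))
    renamed = sorted(r for r in set(map_a) & set(map_b) if map_a[r] != map_b[r])
    return only_in_a, only_in_b, renamed


# Possible use with the universes loaded above (not executed here):
# res_p = u_prot.select_atoms('protein').residues
# res_pl = u_prot_lig.select_atoms('protein').residues
# print(compare_residues(zip(res_p.resids, res_p.resnames),
#                        zip(res_pl.resids, res_pl.resnames)))
```

Entries in `renamed` are not necessarily errors: a protonation-state difference such as ASP vs. ASH would show up there and is worth a manual look.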



## Clean up protein_input.pdb

- convert OW to O for all waters
- anything else?

In [11]:
water = u_prot.select_atoms('resname HOH')
print(f'Number of water molecules: {water.n_residues}')
print(f'Water atom names: {water.residues[0].atoms.names}')

Number of water molecules: 68
Water atom names: ['OW' 'H1' 'H2']


In [12]:
# set OW to O
# Otherwise tleap cannot recognize the water molecules
water.select_atoms('name OW').names = 'O'

In [14]:
# save as *_docked.pdb so the file name matches `input-dd-amber.in`
u_prot.atoms.write('MOR_docked.pdb')

## Generate reference structure

In [15]:
protein_ref = u_sys.select_atoms('protein')

In [16]:
protein_ref.write('../build_files/reference.pdb')



## Get protein and ligand anchors
Follow the guidelines in section 7 of https://github.com/GHeinzelmann/BAT.py/blob/master/doc/User-guide.pdf

Visualize and select anchor atoms with VMD.

Save the final l1_x, l1_y, and l1_z values in `input-dd-amber.in`.

In [21]:
P1_atom = u_prot_lig.select_atoms('name CA and resid 149')
P2_atom = u_prot_lig.select_atoms('name CA and resid 119')
P3_atom = u_prot_lig.select_atoms('name CA and resid 328')
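# each selection must match exactly one atom (zero matches is also an error)
if P1_atom.n_atoms != 1 or P2_atom.n_atoms != 1 or P3_atom.n_atoms != 1:
    raise ValueError('Each protein anchor selection must match exactly one atom')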

In [22]:
potential_lig_l1 = u_prot_lig.select_atoms('resname MP and name C12')
if potential_lig_l1.n_atoms != 1:
    raise ValueError('There should be exactly one atom named C12 in the ligand')
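
The single-atom checks above can be folded into one small helper that also reports which selection failed and how many atoms it matched. This is a sketch only; `require_single` is a hypothetical name, and it relies on nothing more than `len()`, which MDAnalysis atom groups support.

```python
def require_single(selection, label):
    """Return `selection` unchanged if it holds exactly one item, else raise."""
    n = len(selection)
    if n != 1:
        raise ValueError(f'{label}: expected exactly 1 atom, found {n}')
    return selection


# Possible use with the selections above (not executed here):
# P1_atom = require_single(u_prot_lig.select_atoms('name CA and resid 149'), 'P1')
```

Reporting the count makes it easier to tell a wrong resid (0 atoms) from a multi-chain file where the same resid appears more than once (2 or more atoms).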

In [27]:
# get l1_x, l1_y, l1_z: offset of the ligand anchor from protein anchor P1

r_vect = potential_lig_l1.positions - P1_atom.positions
print(f'l1_x: {r_vect[0][0]:.2f}')
print(f'l1_y: {r_vect[0][1]:.2f}')
print(f'l1_z: {r_vect[0][2]:.2f}')

l1_x: 2.08
l1_y: -6.83
l1_z: 3.94
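
For reuse across ligands, the same offset calculation can be wrapped in a function that also returns the P1-to-ligand-anchor distance as a sanity check. This is a minimal sketch: `l1_offsets` and the `distance` key are hypothetical names introduced here, and only the `l1_x`, `l1_y`, `l1_z` labels come from the notebook.

```python
import math


def l1_offsets(ligand_xyz, p1_xyz, ndigits=2):
    """Return the ligand-anchor offset from P1 and its length, rounded."""
    # Convert to plain floats so NumPy scalars and Python numbers behave alike.
    dx, dy, dz = (float(l) - float(p) for l, p in zip(ligand_xyz, p1_xyz))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    return {
        'l1_x': round(dx, ndigits),
        'l1_y': round(dy, ndigits),
        'l1_z': round(dz, ndigits),
        'distance': round(distance, ndigits),
    }


# Possible use with the selections above (not executed here):
# print(l1_offsets(potential_lig_l1.positions[0], P1_atom.positions[0]))
```

An unexpectedly large distance usually means the ligand is not in its docked pose or the wrong atom was selected, so it is worth checking before copying the values into `input-dd-amber.in`.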
