# Prepare a protein system from scratch

- BAT.py requires a protein system in PDB format. Atom names should be generic or match those in the AMBER force field. PDB files that follow other naming conventions, such as a snapshot from a CHARMM simulation or the PDB written by `dabble` with the `charmm` force field, will not work.

We need two PDB files for preparing a protein system.

1. protein_input.pdb: A PDB of the protein exported from Maestro, so that the protonation states are already assigned (tleap should recognize protonation-specific residue names such as ASP vs. ASH, but this has not been verified).
2. system_input.pdb: A simulation system prepared with dabble. The ligand does not need to be present in this file.

We also need a PDB file for each ligand used in the simulation. Each ligand should be in its docked pose relative to `system_input.pdb`.

In [14]:
import MDAnalysis as mda
import numpy as np

In [2]:
protein_input = 'protein_input.pdb'
system_input = 'system_input.pdb'

In [None]:
u_prot = mda.Universe(protein_input)
u_sys = mda.Universe(system_input)

## Clean up PDB

- Rename the water oxygen atoms from OW to O.

In [24]:
u = mda.Universe('./MOR.pdb')

water = u.select_atoms('resname HOH')

In [25]:
water.select_atoms('name OW').names = 'O'

In [26]:
u.atoms.write('MOR_docked.pdb')
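The same renaming can be sketched without MDAnalysis by editing the fixed-width PDB columns directly, which may be handy for a quick check of the written file. This is a minimal sketch, not part of the original workflow: the helper name `rename_water_oxygen` and the example lines are hypothetical, and it assumes standard PDB column positions (atom name in columns 13-16, residue name in columns 18-20).

```python
def rename_water_oxygen(pdb_lines, water_resname="HOH", old="OW", new="O"):
    """Return a copy of pdb_lines with water oxygens renamed from `old` to `new`.

    Hypothetical helper: works on plain text lines and assumes `new` is at
    most three characters long.
    """
    fixed = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        # PDB fixed columns: atom name = 13-16, residue name = 18-20 (1-based)
        if (is_atom
                and line[17:20].strip() == water_resname
                and line[12:16].strip() == old):
            # One-letter element names start in column 14, hence the leading space
            line = line[:12] + f" {new:<3}" + line[16:]
        fixed.append(line)
    return fixed


# Made-up example records, for illustration only
example = [
    "ATOM      1  CA  ALA A   2      11.000  12.000  13.000  1.00  0.00           C",
    "HETATM    2  OW  HOH A 401      10.000  10.000  10.000  1.00  0.00           O",
]
for line in rename_water_oxygen(example):
    print(line)
```

Only the atom-name field of matching water records changes; all other columns, and all non-water records, pass through untouched.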

## Get protein and ligand anchors

In [3]:
u = mda.Universe('MOR_lig.pdb')



In [8]:
P1_atom = u.select_atoms('name CA and resid 149')
P2_atom = u.select_atoms('name CA and resid 119')
P3_atom = u.select_atoms('name CA and resid 328')
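# Each anchor selection must match exactly one atom (zero matches is also an error)
if P1_atom.n_atoms != 1 or P2_atom.n_atoms != 1 or P3_atom.n_atoms != 1:
    raise ValueError('Each protein anchor selection must match exactly one atom')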

In [9]:
potential_lig_l1 = u.select_atoms('resname MP and name C12')
if potential_lig_l1.n_atoms != 1:
    raise ValueError('There should be exactly one atom named C12 in the ligand')
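The protein and ligand anchor checks follow the same pattern, so they could be folded into one small helper that reports which selection failed and how many atoms it matched. This is a sketch only; `select_one` is a hypothetical name and relies on nothing beyond the `select_atoms` method and `n_atoms` attribute already used above.

```python
def select_one(universe, selection):
    """Return the atoms matching `selection`, requiring exactly one match.

    Hypothetical helper: `universe` is any object with a select_atoms()
    method whose result exposes n_atoms (e.g. an MDAnalysis Universe).
    """
    atoms = universe.select_atoms(selection)
    if atoms.n_atoms != 1:
        raise ValueError(
            f"Expected exactly 1 atom for '{selection}', found {atoms.n_atoms}"
        )
    return atoms


# Usage with the universe loaded above (commented out because it needs the PDB file):
# P1_atom = select_one(u, 'name CA and resid 149')
# potential_lig_l1 = select_one(u, 'resname MP and name C12')
```

Reporting the match count makes it easier to tell a wrong residue number (0 atoms) from an ambiguous selection (2 or more atoms).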

In [12]:
# Get the x, y, z offsets (l1_x, l1_y, l1_z) of the ligand anchor L1 relative to the protein anchor P1

r_vect = potential_lig_l1.positions - P1_atom.positions

In [13]:
r_vect

array([[ 2.077    , -6.828    ,  3.9369998]], dtype=float32)
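To carry these offsets into an input file, the three components can be rounded and written out one per line. The sketch below assumes the values are wanted as `l1_x`, `l1_y`, `l1_z` entries in a `key = value` format; both the parameter names and the format are assumptions here and should be checked against the BAT.py input file you are using. `format_l1_offsets` is a hypothetical helper.

```python
# Offset values copied from the r_vect output above (L1 minus P1)
r_vect = [[2.077, -6.828, 3.9369998]]


def format_l1_offsets(vec, decimals=2):
    """Format one (x, y, z) offset as 'l1_<axis> = <value>' lines.

    Hypothetical helper; the l1_x/l1_y/l1_z names are an assumption.
    """
    x, y, z = (float(v) for v in vec)
    return [
        f"l1_{axis} = {value:.{decimals}f}"
        for axis, value in zip("xyz", (x, y, z))
    ]


for line in format_l1_offsets(r_vect[0]):
    print(line)
# l1_x = 2.08
# l1_y = -6.83
# l1_z = 3.94
```

With the array from the notebook, `r_vect[0]` picks out the single row, since `positions` has shape `(1, 3)` for a one-atom selection.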