### Remove all heteroatoms using Biopython (basic)

#### We can know if a residue is a heteroatom with the following code.
- First, create a PDBParser object.
- Then create a structure object from a PDB file
- Create a new structure to store only the non-hetero atoms

In [1]:
from Bio.PDB import *

parser = PDBParser(QUIET=True)

structure = parser.get_structure("6m0j", "6m0j.pdb")

# Create a new structure to store only the non-hetero atoms
filtered_structure = structure.copy() 

1. Iterate through the structure and add the non-hetero (the remaining ones) into a list. When accessing the resdidues, it provides a tuple where ("H_FLAG",res_nº,"insertion_code"). (The insertion code is usually empty)

    - If the H_FLAG is an empty string, then the residue is part of the main chain.
    - If the H_FLAG is non-empty string, the residue is a heteroatom


2. Remove all heteroatoms from the structure

3. Save the result (filtered structure)

We want to remove all the non-heteroatoms residues from the list of residues, add them in a list. Afterwards we have to remove the  atom itslef , since we only "selected" the parent residues of the atoms.

*Residues contain multiple atoms 

Here's an explanation of what each part does:

- atom : is the specific atom you want to remove from its parent residue. 
- atom.parent: retrieves the parent of the atom, which is the residue to which the atom belongs. 
- atom.detach_parent() removes the atom from its parent residue. 

https://biopython.org/docs/1.76/api/Bio.PDB.Entity.html


In [5]:
for model in structure:
    for chain in model:
        for residue in chain:
            if residue.id[0] == "":
                filtered_structure.add(residue)


It technically does the same thing, since the code above returns some ERRORS

In [3]:
# Iterate through the residues and collect non-hetero residues

non_hetero_residues = []

for residue in filtered_structure.get_residues():
    R = residue.id
    if R[0] == "":
        non_hetero_residues.append(residue)    # Remove all heteroatoms from the filtered structure
for residue in non_hetero_residues:
    for atom in residue.get_atoms():
        atom.detach_parent()

To save the filtered structure using PDBIO, a tool needed to write structures in PDB format

In [6]:
# Save the filtered structure using PDBIO
io = PDBIO()
io.set_structure(filtered_structure)
io.save("6m0j_no_hetero.pdb")

### Analyse the result

After running the code, if we take a look at the resulting pdb file, we can see it is not entirely prepared for our energy analysis. If we compare it with the fixed_pdb of JLGELPI there are some differences:

- there are still some residues left on the exterior of the protein comples, mostly HOH (waters) but also NAG residues

- theres a ZN molecule in almost in the middle of the complex