# Protein Folding with PyRosetta

In this tutorial, we fold a protein from its amino-acid sequence using PyRosetta and compare the folded structure with the solved crystal structure of the protein.

### Importing relevant libraries

We begin by importing the required Python libraries. If running the following cell produces any errors or warnings, make sure you have followed all the steps in the "Setting up Pyrosetta" section.

In [1]:
import os
import glob
import shutil
import pandas as pd
import nglview as ngl
import pyrosetta as prs
prs.init()
from pyrosetta import rosetta

core.init: Checking for fconfig files in pwd and ./rosetta/flags
core.init: Rosetta version: PyRosetta4.Release.python36.mac r213 2019.10+release.fd1bdffb01b fd1bdffb01b7866da84942b9bf1b06e96270656e http://www.pyrosetta.org 2019-03-05T15:28:05
core.init: command: PyRosetta -ex1 -ex2aro -database /anaconda3/envs/pyrosetta/lib/python3.6/site-packages/pyrosetta-2019.10+release.fd1bdffb01b-py3.6-macosx-10.7-x86_64.egg/pyrosetta/database
core.init: 'RNG device' seed mode, using '/dev/urandom', seed=1563846545 seed_offset=0 real_seed=1563846545
core.init.random: RandomGenerator:init: Normal mode, seed=1563846545 RG_type=mt19937


### Setting up the score functions used throughout the tutorial

In [2]:
scorefxn_low = prs.create_score_function('score3')
scorefxn_high = prs.get_fa_scorefxn()

basic.io.database: Database file opened: scoring/score_functions/EnvPairPotential/env_log.txt
basic.io.database: Database file opened: scoring/score_functions/EnvPairPotential/cbeta_den.txt
basic.io.database: Database file opened: scoring/score_functions/EnvPairPotential/pair_log.txt
basic.io.database: Database file opened: scoring/score_functions/EnvPairPotential/cenpack_log.txt
basic.io.database: Database file opened: scoring/score_functions/SecondaryStructurePotential/phi.theta.36.HS.resmooth
basic.io.database: Database file opened: scoring/score_functions/SecondaryStructurePotential/phi.theta.36.SS.resmooth
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
core.scoring.etable: Starting energy table calculation
core.scoring.etable: smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: F

### Loading the native (solved crystal) structure

In [3]:
native_pose = prs.pose_from_pdb('test_in.pdb')

core.import_pose.import_pose: File 'test_in.pdb' automatically determined to be of type PDB


In [4]:
native_pose.sequence()

'DAITIHSILDWIEDNLESPLSLEKVSERSGYSKWHLQRMFKKETGHSLGQYIRSRKMTEIAQKLKESNEPILYLAERYGFESQQTLTRTFKNYFDVPPHKYRMTNMQGESRFLHPL'

In [4]:
view = ngl.show_rosetta(native_pose)
view.update_cartoon(color='sstruct', component=0)

In [5]:
view

NGLWidget()

In [6]:
print(native_pose.residue(1))

Residue 1: ASP:NtermProteinFull (ASP, D):
Base: ASP
 Properties: POLYMER PROTEIN CANONICAL_AA LOWER_TERMINUS SC_ORBITALS POLAR CHARGED NEGATIVE_CHARGE METALBINDING ALPHA_AA L_AA
 Variant types: LOWER_TERMINUS_VARIANT
 Main-chain atoms:  N    CA   C  
 Backbone atoms:    N    CA   C    O   1H   2H   3H    HA 
 Side-chain atoms:  CB   CG   OD1  OD2 1HB  2HB 
Atom Coordinates:
   N  : 0.229, 36.012, 74.172
   CA : 0.041, 35.606, 75.594
   C  : -0.096, 36.849, 76.498
   O  : -0.951, 36.895, 77.382
   CB : 1.225, 34.718, 76.092
   CG : 2.159, 34.156, 74.999
   OD1: 1.688, 33.361, 74.151
   OD2: 3.378, 34.497, 75.007
  1H  : 1.056, 35.74, 73.68
  2H  : -0.43, 35.723, 73.478
  3H  : 0.251, 36.981, 73.928
   HA : -0.884, 35.037, 75.696
  1HB : 1.839, 35.199, 76.854
  2HB : 0.67, 33.892, 76.539
Mirrored relative to coordinates in ResidueType: FALSE



In [7]:
pose = prs.pose_from_sequence(native_pose.sequence())
test_pose = prs.Pose()
test_pose.assign(pose)
test_pose.pdb_info().name('Linearized Pose')

In [8]:
view = ngl.show_rosetta(test_pose)
view.add_ball_and_stick()
view

NGLWidget()

In [9]:
to_centroid = prs.SwitchResidueTypeSetMover('centroid')
to_full_atom = prs.SwitchResidueTypeSetMover('fa_standard')

In [10]:
to_full_atom.apply(test_pose)
print('Full Atom Score:', scorefxn_high(test_pose))
to_centroid.apply(test_pose)
print('Centroid Score:', scorefxn_low(test_pose))

basic.io.database: Database file opened: scoring/score_functions/elec_cp_reps.dat
core.scoring.elec.util: Read 40 countpair representative atoms
core.pack.dunbrack.RotamerLibrary: shapovalov_lib_fixes_enable option is true.
core.pack.dunbrack.RotamerLibrary: shapovalov_lib::shap_dun10_smooth_level of 1( aka lowest_smooth ) got activated.
core.pack.dunbrack.RotamerLibrary: Binary rotamer library selected: /anaconda3/envs/pyrosetta/lib/python3.6/site-packages/pyrosetta-2019.10+release.fd1bdffb01b-py3.6-macosx-10.7-x86_64.egg/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin
core.pack.dunbrack.RotamerLibrary: Using Dunbrack library binary file '/anaconda3/envs/pyrosetta/lib/python3.6/site-packages/pyrosetta-2019.10+release.fd1bdffb01b-py3.6-macosx-10.7-x86_64.egg/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin'.
core.pack.dunbrack.RotamerLibrary: Dunbrack 2010 library took 0.255034 seconds to load from binary
Full Atom Score: 46617.13925908482
cor

In [11]:
view = ngl.show_rosetta(test_pose)
view.add_ball_and_stick()
view

NGLWidget()

In [12]:
print(test_pose.residue(1))

Residue 1: ASP:NtermProteinFull (ASP, D):
Base: ASP
 Properties: POLYMER PROTEIN CANONICAL_AA LOWER_TERMINUS POLAR CHARGED NEGATIVE_CHARGE ALPHA_AA L_AA
 Variant types: LOWER_TERMINUS_VARIANT
 Main-chain atoms:  N    CA   C  
 Backbone atoms:    N    CA   C    O    H  
 Side-chain atoms:  CB   CEN
Atom Coordinates:
   N  : 0, 0, 0
   CA : 1.458, 0, 0
   C  : 2.00885, 1.42017, 0
   O  : 1.25096, 2.39022, -2.58987e-16
   CB : 1.99452, -0.771871, -1.208
   CEN: 2.35051, -1.69379, -1.45468
   H  : -0.5, -0.433013, -0.75
Mirrored relative to coordinates in ResidueType: FALSE



In [13]:
long_frag_filename = 'test9_fragments.txt'
long_frag_length = 9
short_frag_filename = 'test3_fragments.txt'
short_frag_length = 3

long_inserts=5
short_inserts=5

kT = 3.0
cycles = 1000
jobs = 50
job_output = 'fold_output/structure'

In [14]:
movemap = prs.MoveMap()
movemap.set_bb(True)

fragset_long = rosetta.core.fragment.ConstantLengthFragSet(long_frag_length, long_frag_filename)
long_frag_mover = rosetta.protocols.simple_moves.ClassicFragmentMover(fragset_long, movemap)

fragset_short = rosetta.core.fragment.ConstantLengthFragSet(short_frag_length, short_frag_filename)
short_frag_mover = rosetta.protocols.simple_moves.ClassicFragmentMover(fragset_short, movemap)

insert_long_frag = prs.RepeatMover(long_frag_mover, long_inserts)
insert_short_frag = prs.RepeatMover(short_frag_mover, short_inserts)

core.fragments.ConstantLengthFragSet: finished reading top 200 9mer fragments from file test9_fragments.txt
core.fragments.ConstantLengthFragSet: finished reading top 200 3mer fragments from file test3_fragments.txt
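
To make the idea behind the fragment movers concrete, here is a minimal pure-Python sketch of a fragment insertion: a window of backbone torsion angles is overwritten with the angles stored in a fragment. This is only an illustration under simplifying assumptions, not PyRosetta's implementation. The names `insert_fragment`, `random_fragment_move`, and `library`, and all the angle values, are hypothetical.

```python
import random


def insert_fragment(torsions, fragment, start):
    """Return a copy of `torsions` with `fragment` written over a window.

    `torsions` and `fragment` are lists of (phi, psi, omega) tuples and
    `start` is the 0-based index of the first residue in the window.
    """
    if start < 0 or start + len(fragment) > len(torsions):
        raise ValueError("fragment does not fit at this position")
    updated = list(torsions)
    updated[start:start + len(fragment)] = fragment
    return updated


def random_fragment_move(torsions, fragments_by_start, rng):
    """Pick a random window, then a random fragment for it, and insert it."""
    start = rng.choice(sorted(fragments_by_start))
    fragment = rng.choice(fragments_by_start[start])
    return insert_fragment(torsions, fragment, start)


# Illustrative values only: a 12-residue extended chain and one
# helix-like 3-mer that may be inserted at position 4.
extended = [(-150.0, 150.0, 180.0)] * 12
helix_3mer = [(-60.0, -45.0, 180.0)] * 3
library = {4: [helix_3mer]}

rng = random.Random(0)
moved = random_fragment_move(extended, library, rng)
print(moved[3:8])
```

In the notebook, each `RepeatMover` applies its fragment mover `long_inserts` or `short_inserts` times, so one folding step changes several windows at once.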


In [15]:
folding_mover = prs.SequenceMover()
folding_mover.add_mover(insert_long_frag)
folding_mover.add_mover(insert_short_frag)

In [16]:
test_pose.assign(pose)
to_centroid.apply(test_pose)

In [17]:
mc = prs.MonteCarlo(test_pose, scorefxn_low, kT)
trial = prs.TrialMover(folding_mover, mc)
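
The `MonteCarlo` object decides whether each trial move is kept, and `kT` controls how often a move that raises the score is accepted. The sketch below shows the standard Metropolis criterion in plain Python as an illustration. The function name `metropolis_accept` and the example scores are hypothetical, and PyRosetta's own bookkeeping is more involved.

```python
import math
import random


def metropolis_accept(old_score, new_score, kT, rng):
    """Accept downhill moves always, uphill moves with Boltzmann probability."""
    if new_score <= old_score:
        return True
    return rng.random() < math.exp(-(new_score - old_score) / kT)


rng = random.Random(1)
kT = 3.0  # same value as in the notebook

# A move that lowers the score is always accepted.
downhill = metropolis_accept(10.0, 8.0, kT, rng)

# A move that raises the score by exactly kT is accepted with probability exp(-1).
uphill_probability = math.exp(-(13.0 - 10.0) / kT)
print(downhill, round(uphill_probability, 3))
```

A larger `kT` lets the search escape local minima more easily, at the cost of accepting more unfavorable moves.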

In [18]:
folding = prs.RepeatMover(trial, cycles)

In [19]:
scores = [0] * (jobs + 1)
scores[0] = scorefxn_low(test_pose)

In [20]:
if os.path.isdir(os.path.dirname(job_output)):
    shutil.rmtree(os.path.dirname(job_output), ignore_errors=True)
os.makedirs(os.path.dirname(job_output))
jd = prs.PyJobDistributor(job_output, nstruct=jobs, scorefxn=scorefxn_high)

Working on decoy: fold_output/structure_22.pdb


In [21]:
counter = 0
while not jd.job_complete:
    # a. reset state for the new trajectory
    # -reload the extended starting pose and switch it to centroid mode
    test_pose.assign(pose)
    to_centroid.apply(test_pose)
    # -label the pose with the trajectory number; counter follows execution
    #    order and need not match the decoy number in the output file name
    counter += 1
    test_pose.pdb_info().name(job_output + '_' + str(counter))
    # -reset the MonteCarlo object (sets lowest_score to that of test_pose)
    mc.reset(test_pose)

    # if you create a custom protocol, you may have additional variables
    #    to reset here, such as kT

    # b. apply the folding protocol
    # if you create a custom protocol, this section will most likely change;
    #    many protocols exist as single Movers or can be chained together in
    #    a sequence (see above), so you need only apply the final Mover
    folding.apply(test_pose)

    # c. export the lowest-scoring decoy structure for this trajectory
    # -recover the lowest-scoring decoy structure
    mc.recover_low(test_pose)
    # -convert the decoy to full-atom; the side-chain conformations will all
    #    be default. Normally, decoys would NOT be converted to full-atom
    #    before writing them to PDB (a large number of trajectories would be
    #    considered and their full-atom scores are unnecessary). Here the
    #    conversion makes the output easier to understand and manipulate:
    #    PyRosetta can load PDB files of centroid structures, but you must
    #    convert to full-atom for nearly any other application
    to_full_atom.apply(test_pose)
    # -store the full-atom score for this trajectory
    scores[counter] = scorefxn_high(test_pose)
    # -write the full-atom decoy structure to a PDB file
    jd.output_decoy(test_pose)
    # -rename the pose to mark it as the full-atom version
    test_pose.pdb_info().name(job_output + '_' + str(counter) + '_fa')

Working on decoy: fold_output/structure_23.pdb
Working on decoy: fold_output/structure_25.pdb
Working on decoy: fold_output/structure_15.pdb
Working on decoy: fold_output/structure_20.pdb
Working on decoy: fold_output/structure_36.pdb
Working on decoy: fold_output/structure_10.pdb
Working on decoy: fold_output/structure_44.pdb
Working on decoy: fold_output/structure_11.pdb
Working on decoy: fold_output/structure_38.pdb
Working on decoy: fold_output/structure_35.pdb
Working on decoy: fold_output/structure_41.pdb
Working on decoy: fold_output/structure_46.pdb
Working on decoy: fold_output/structure_5.pdb
Working on decoy: fold_output/structure_12.pdb
Working on decoy: fold_output/structure_39.pdb
Working on decoy: fold_output/structure_26.pdb
Working on decoy: fold_output/structure_14.pdb
Working on decoy: fold_output/structure_17.pdb
Working on decoy: fold_output/structure_7.pdb
Working on decoy: fold_output/structure_45.pdb
Working on decoy: fold_output/structure_4.pdb
Working on decoy
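
The loop above follows a common pattern: each trajectory restarts from the same state, runs a fixed number of Monte Carlo cycles, and then recovers the lowest-scoring state it visited. The sketch below reproduces that pattern on a toy one-dimensional problem so the control flow is easy to follow. `run_trajectory`, `toy_score`, and `toy_propose` are hypothetical stand-ins for the mover, the score function, and the `MonteCarlo` object, and they do not call PyRosetta.

```python
import math
import random


def run_trajectory(start, score, propose, kT, cycles, rng):
    """Run one Monte Carlo trajectory and return its lowest-scoring state."""
    current, current_score = start, score(start)
    lowest, lowest_score = current, current_score  # analogous to mc.reset
    for _ in range(cycles):
        candidate = propose(current, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Metropolis criterion: keep downhill moves, sometimes keep uphill ones
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, current_score = candidate, candidate_score
            if current_score < lowest_score:
                lowest, lowest_score = current, current_score
    return lowest, lowest_score  # analogous to mc.recover_low


def toy_score(x):
    """Toy energy with its minimum at x = 2."""
    return (x - 2.0) ** 2


def toy_propose(x, rng):
    """Toy move: a small random step."""
    return x + rng.uniform(-0.5, 0.5)


rng = random.Random(42)
# Five independent trajectories, each restarted from the same point.
results = [
    run_trajectory(10.0, toy_score, toy_propose, kT=3.0, cycles=1000, rng=rng)
    for _ in range(5)
]
for low_state, low_score in results:
    print(f"lowest state {low_state:.3f} with score {low_score:.3f}")
```

Because every trajectory restarts from the same state, the decoys are independent samples, which is why the notebook generates `jobs` of them rather than a single long run.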

In [22]:
decoy_poses = [prs.pose_from_pdb(f) for f in glob.glob(job_output + '*.pdb')]

core.import_pose.import_pose: File 'fold_output/structure_4.pdb' automatically determined to be of type PDB
core.import_pose.import_pose: File 'fold_output/structure_5.pdb' automatically determined to be of type PDB
core.import_pose.import_pose: File 'fold_output/structure_7.pdb' automatically determined to be of type PDB
core.import_pose.import_pose: File 'fold_output/structure_6.pdb' automatically determined to be of type PDB
core.import_pose.import_pose: File 'fold_output/structure_2.pdb' automatically determined to be of type PDB
core.import_pose.import_pose: File 'fold_output/structure_3.pdb' automatically determined to be of type PDB
core.import_pose.import_pose: File 'fold_output/structure_1.pdb' automatically determined to be of type PDB
core.import_pose.import_pose: File 'fold_output/structure_48.pdb' automatically determined to be of type PDB
core.import_pose.import_pose: File 'fold_output/structure_49.pdb' automatically determined to be of type PDB
core.import_pose.import_po
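
As the log shows, `glob.glob` returns the decoy files in no particular order. If a reproducible, numerically sorted order is wanted, the file names can be sorted by their decoy number before loading. The helper below is a small sketch of that idea. `decoy_index` is a hypothetical name, and the pattern assumes file names of the form `structure_<number>.pdb` as seen in the output above.

```python
import re


def decoy_index(path):
    """Extract the integer decoy number from a name like 'structure_23.pdb'."""
    match = re.search(r'_(\d+)\.pdb$', path)
    if match is None:
        raise ValueError(f"no decoy index found in {path!r}")
    return int(match.group(1))


# Example paths in the arbitrary order a directory listing might give.
paths = [
    'fold_output/structure_4.pdb',
    'fold_output/structure_48.pdb',
    'fold_output/structure_5.pdb',
    'fold_output/structure_1.pdb',
]
ordered = sorted(paths, key=decoy_index)
print(ordered)
```

A plain `sorted(paths)` would place `structure_48.pdb` before `structure_5.pdb`, which is why the key function converts the number to an integer.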

In [23]:
def align_and_get_rmsds(native_pose, decoy_poses):
    # align the native pose with each decoy and collect the reported RMSDs
    prs.rosetta.core.pose.full_model_info.make_sure_full_model_info_is_setup(native_pose)
    rmsds = []
    for p in decoy_poses:
        prs.rosetta.core.pose.full_model_info.make_sure_full_model_info_is_setup(p)
        rmsds.append(prs.rosetta.protocols.stepwise.modeler.align.superimpose_with_stepwise_aligner(native_pose, p))
    return rmsds
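
For reference, the RMSD between two sets of matched atoms is the square root of the mean squared distance between corresponding coordinates. The sketch below computes only that quantity for coordinates that are assumed to be superimposed already. It does not perform the alignment step that the stepwise aligner carries out, and the function name `rmsd` and the example coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom is displaced by 2 units along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x, y, z + 2.0) for x, y, z in reference]
print(rmsd(reference, shifted))
```

Without a prior superposition, a rigid translation like the one in this example inflates the value, which is why the notebook aligns the poses before reading off the RMSD.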

In [24]:
rmsds = align_and_get_rmsds(native_pose, decoy_poses)

protocols.stepwise.modeler.align.StepWisePoseAligner: RMSD 0.000 (0 atoms in ), superimposed on 349 atoms in 1-116 (RMSD 15.2121494)
protocols.stepwise.modeler.align.StepWisePoseAligner: RMSD 0.000 (0 atoms in ), superimposed on 349 atoms in 1-116 (RMSD 17.0218803)
protocols.stepwise.modeler.align.StepWisePoseAligner: RMSD 0.000 (0 atoms in ), superimposed on 349 atoms in 1-116 (RMSD 17.0667456)
protocols.stepwise.modeler.align.StepWisePoseAligner: RMSD 0.000 (0 atoms in ), superimposed on 349 atoms in 1-116 (RMSD 13.3502234)
protocols.stepwise.modeler.align.StepWisePoseAligner: RMSD 0.000 (0 atoms in ), superimposed on 349 atoms in 1-116 (RMSD 11.6867353)
protocols.stepwise.modeler.align.StepWisePoseAligner: RMSD 0.000 (0 atoms in ), superimposed on 349 atoms in 1-116 (RMSD 11.0417020)
protocols.stepwise.modeler.align.StepWisePoseAligner: RMSD 0.000 (0 atoms in ), superimposed on 349 atoms in 1-116 (RMSD 16.1847525)
protocols.stepwise.modeler.align.StepWisePoseAligner: RMSD 0.000 (0 a

In [25]:
rmsd_data = []
for decoy, rmsd in zip(decoy_poses, rmsds):  # pair every decoy with its RMSD to the native
    rmsd_data.append({'structure': decoy.pdb_info().name(), 'rmsd': rmsd})

In [26]:
rmsd_df = pd.DataFrame(rmsd_data)

In [None]:
rmsd_df.sort_values('rmsd')