## About
This notebook builds a baseline against which RMSDs can be compared. It first generates a random, non-clashing RNA pose by moving residues randomly (Monte Carlo-style moves, but with no score-based acceptance); the score function is used only to undo moves that push the energy above zero.<br/>
Starting from that pose, it keeps making random moves and records snapshots of the resulting poses at regular intervals. Each snapshot is aligned to the reference pose, and its RMSD to the reference is computed.<br/>
The resulting RMSD distribution shows what a random guess would score, which makes it a useful baseline for judging a predicted structure.

**Note:** Some parts need to be changed to match your system. They are labelled with a *'TODO: Change accordingly'* tag.
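The "align, then score by RMSD" step described above can be illustrated with plain NumPy. This is only a minimal sketch of the standard Kabsch superposition, assuming both structures are given as `(N, 3)` coordinate arrays with atoms in the same order; `kabsch_rmsd` is a hypothetical name, and the notebook itself relies on its own `align_poses` and `get_rmsd` helpers, whose internals may differ.

```python
import numpy as np

def kabsch_rmsd(ref, mobile):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition.

    Hypothetical helper for illustration; not the notebook's get_rmsd.
    """
    ref = np.asarray(ref, dtype=float)
    mobile = np.asarray(mobile, dtype=float)
    # Remove the translation by centring both sets on their centroids.
    ref_c = ref - ref.mean(axis=0)
    mob_c = mobile - mobile.mean(axis=0)
    # 3x3 covariance matrix and its singular value decomposition.
    H = mob_c.T @ ref_c
    U, _, Vt = np.linalg.svd(H)
    # Flip one axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate the mobile coordinates onto the reference and measure the deviation.
    aligned = mob_c @ R.T
    return float(np.sqrt(np.mean(np.sum((aligned - ref_c) ** 2, axis=1))))
```

A rigidly rotated and translated copy of a structure should give an RMSD of essentially zero, which is a quick sanity check for any alignment routine.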

In [1]:
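import os

import numpy as np  # needed below for np.array / np.save (harmless if General.ipynb already imports it)
import MDAnalysis as mda
from MDAnalysis.analysis import align

# Helper notebooks; presumably these define pyrosetta, LoadedPDB,
# generate_random_perturbations, align_poses and get_rmsd used below.
%run ../PyRosetta/General.ipynb
%run ../PyRosetta/RNAFolder.ipynb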



┌──────────────────────────────────────────────────────────────────────────────┐
│                                 PyRosetta-4                                  │
│              Created in JHU by Sergey Lyskov and PyRosetta Team              │
│              (C) Copyright Rosetta Commons Member Institutions               │
│                                                                              │
│ NOTE: USE OF PyRosetta FOR COMMERCIAL PURPOSES REQUIRE PURCHASE OF A LICENSE │
│              See LICENSE.md or email license@uw.edu for details              │
└──────────────────────────────────────────────────────────────────────────────┘
PyRosetta-4 2024 [Rosetta PyRosetta4.Release.python310.linux 2024.08+release.717d2e8232174371f0c672564f23a097062db88a 2024-02-21T10:16:44] retrieved from: http://www.pyrosetta.org



In [2]:
PDB_PATH="/home/venkata/python/PyRosetta/R1107/7qr4_clean_noprotein.pdb" #TODO: Change accordingly (path to the reference RNA crystal structure)

true_pose=LoadedPDB(PDB_PATH)
new_pose=LoadedPDB(PDB_PATH)

# Load the high-resolution RNA scoring function
scorefxn = pyrosetta.rosetta.core.scoring.ScoreFunctionFactory.create_score_function("rna/denovo/rna_hires")
scorefxn(true_pose) # Should be negative

-152.12682711052625

### Moving to a random pose

In [3]:
N_PERT=20000

# Generate a set of random perturbations
perts=generate_random_perturbations(new_pose,k=N_PERT)
for i,p in enumerate(perts):
    new_pose=p.apply(new_pose)
    # If there is a clash (positive energy), reverse the change.
    if scorefxn(new_pose)>0:
        new_pose=p.apply(new_pose,inverse=True)
    # Progress report
    if i%500==0:
        print(i,"of",N_PERT)

0 of 20000
500 of 20000
1000 of 20000
1500 of 20000
2000 of 20000
2500 of 20000
3000 of 20000
3500 of 20000
4000 of 20000
4500 of 20000
5000 of 20000
5500 of 20000
6000 of 20000
6500 of 20000
7000 of 20000
7500 of 20000
8000 of 20000
8500 of 20000
9000 of 20000
9500 of 20000
10000 of 20000
10500 of 20000
11000 of 20000
11500 of 20000
12000 of 20000
12500 of 20000
13000 of 20000
13500 of 20000
14000 of 20000
14500 of 20000
15000 of 20000
15500 of 20000
16000 of 20000
16500 of 20000
17000 of 20000
17500 of 20000
18000 of 20000
18500 of 20000
19000 of 20000
19500 of 20000
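The loop above applies each perturbation and undoes it whenever the score becomes positive. The same accept-or-revert pattern can be tried without PyRosetta on a toy system. The sketch below is an assumption-laden stand-in: it uses a simple minimum-distance check between points instead of the Rosetta score, and `has_clash`, `random_walk_no_clash`, `step_size` and `min_dist` are hypothetical names chosen for illustration.

```python
import numpy as np

def has_clash(coords, min_dist=1.0):
    """Return True if any two points are closer than min_dist (toy clash check)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    upper = np.triu_indices(len(coords), k=1)  # each pair once, no self-pairs
    return bool(np.any(dist[upper] < min_dist))

def random_walk_no_clash(coords, n_steps, step_size=0.5, min_dist=1.0, seed=0):
    """Randomly displace one point per step; revert any move that causes a clash."""
    rng = np.random.default_rng(seed)
    coords = np.array(coords, dtype=float)  # work on a copy
    accepted = 0
    for _ in range(n_steps):
        i = rng.integers(len(coords))
        old = coords[i].copy()
        coords[i] = old + rng.normal(scale=step_size, size=3)
        if has_clash(coords, min_dist):
            coords[i] = old  # reverse the change, as in the loop above
        else:
            accepted += 1
    return coords, accepted

# Five points on a line, 2 units apart, randomised for 200 steps.
start = np.array([[2.0 * k, 0.0, 0.0] for k in range(5)])
final, n_accepted = random_walk_no_clash(start, n_steps=200)
print("accepted moves:", n_accepted, "of 200")
```

Because every clashing move is reverted, the final configuration is clash-free by construction, just as the randomised pose never ends a step with a positive score.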


In [4]:
ref_pose=LoadedPDB(PDB_PATH)

N_PERT=1000000 # TODO: Change accordingly (Total number of perturbation steps. Larger RNAs will generally need more.)
TAKE_EVERY=N_PERT//1000 # TODO: Change accordingly (Record a snapshot every TAKE_EVERY steps. N_PERT//1000 gives 1000 snapshots; integer division keeps this an int.)

# Pick the "new" random pose
pick_pose=new_pose

# Generate a set of random perturbations
perts=generate_random_perturbations(new_pose,k=N_PERT)
rmsds=[]
for i,p in enumerate(perts):
    new_pose=p.apply(new_pose)

    # If there is a clash (positive energy) reverse the change.
    if scorefxn(new_pose)>0:
        new_pose=p.apply(new_pose,inverse=True)

    # Every TAKE_EVERY steps, align the current pose to the reference and record its RMSD
    if i%TAKE_EVERY==0:
        print(i,"of",N_PERT)
        aligned_pose=align_poses(ref_pose,new_pose,silent=True)
        rmsd=get_rmsd(ref_pose,aligned_pose)
        print("RMSD at step",i,":",rmsd)
        rmsds.append(rmsd)
rmsds=np.array(rmsds)

0 of 1000000
RMSD at step 0 : 31.61536447736471
1000 of 1000000
RMSD at step 1000 : 35.808484233949834
2000 of 1000000
RMSD at step 2000 : 44.29514292016612
3000 of 1000000
RMSD at step 3000 : 46.63615213972265
4000 of 1000000
RMSD at step 4000 : 37.48365824506734
5000 of 1000000
RMSD at step 5000 : 34.865869944054744
6000 of 1000000
RMSD at step 6000 : 37.992461280921795
7000 of 1000000
RMSD at step 7000 : 39.187736992847476
8000 of 1000000
RMSD at step 8000 : 29.092893383840956
9000 of 1000000
RMSD at step 9000 : 27.23925772029659
10000 of 1000000
RMSD at step 10000 : 24.83048097943314
11000 of 1000000
RMSD at step 11000 : 32.91081972596904
12000 of 1000000
RMSD at step 12000 : 30.28751936925461
13000 of 1000000
RMSD at step 13000 : 31.716809867709078
14000 of 1000000
RMSD at step 14000 : 34.02175766442806
15000 of 1000000
RMSD at step 15000 : 40.725094076986544
16000 of 1000000
RMSD at step 16000 : 34.46495712775113
17000 of 1000000
RMSD at step 17000 : 28.361367436303276
18000 of 1

In [5]:
#np.save("R1107/baseline_rmsds.npy",rmsds)
np.save(os.path.dirname(PDB_PATH)+"/baseline_rmsds.npy",rmsds)
print("Saved baseline to:",os.path.dirname(PDB_PATH)+"/baseline_rmsds.npy")

Saved baseline to: /home/venkata/python/PyRosetta/R1107/baseline_rmsds.npy


In [6]:
print("Baseline spread:",np.mean(rmsds),"+/-",np.std(rmsds))
print("Baseline median:",np.median(rmsds))
print("Baseline range:",np.min(rmsds),"to",np.max(rmsds))

Baseline spread: 25.904679410913992 +/- 3.540152352964534
Baseline median: 25.654822643330746
Baseline range: 17.48876861129199 to 57.198255960222035
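To use the saved baseline, a predicted structure's RMSD can be placed within the random distribution. The sketch below shows one possible way to do that: it reports the fraction of random poses that score at least as well as the prediction, plus a z-score. `compare_to_baseline` and `pred_rmsd` are hypothetical names, and the function assumes the baseline has a non-zero standard deviation.

```python
import numpy as np

def compare_to_baseline(pred_rmsd, baseline_rmsds):
    """Place a predicted RMSD within the random-pose baseline distribution.

    Returns (fraction of baseline poses with RMSD <= pred_rmsd, z-score).
    Hypothetical helper for illustration only.
    """
    baseline = np.asarray(baseline_rmsds, dtype=float)
    # Fraction of random poses that are at least as close to the reference.
    frac_at_least_as_good = float(np.mean(baseline <= pred_rmsd))
    # Distance from the baseline mean in units of the baseline standard deviation.
    z_score = float((pred_rmsd - baseline.mean()) / baseline.std())
    return frac_at_least_as_good, z_score

# Usage with the array computed in this notebook (or reloaded via np.load):
# frac, z = compare_to_baseline(pred_rmsd, rmsds)
```

A small fraction together with a strongly negative z-score would suggest the prediction is better than what random non-clashing poses typically reach.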
