# DeepCGSA: an accurate tool for calculating solvent-accessible surface area (SASA) from coarse-grained (CG) structures
**This notebook shows how to use DeepCGSA.py to predict SASA for the example files.**
```
1. Create a bash script (test.sh) that converts the all-atom example PDB files to different CG structures using the -c option of DeepCGSA.py.
2. Use DeepCGSA.py to predict SASA from the CG structures and write the results to CSV files.
3. Compare the predictions with the reference SASA calculated by NACCESS and summarize all results in this notebook below.
P.S.
1. Copy model_weight into the current working directory alongside DeepCGSA.py (and martinize.py, if you use -c martini).
2. To run predictions on your own CG structures, make sure the input file follows the same format as the example CG files.
3. This script was tested on Ubuntu 18, so we recommend running DeepCGSA.py on a Linux system.
```
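
The convert-then-predict steps listed above can also be generated from Python rather than written out by hand. The sketch below only builds the command lines and runs nothing. It uses the flags that appear in this notebook (`-f`, `-c`, `-t`, `-o`). The helper name `build_commands` and its `out_prefix` argument are hypothetical, and the `<stem>_<CG type>.pdb` naming of converted files is an assumption inferred from the example file names.

```python
from pathlib import Path


def build_commands(aa_pdb, cg_types, out_prefix, script="DeepCGSA.py"):
    """Return a list of (convert, predict) argument lists, one pair per CG type.

    Assumption: the converter writes <stem>_<cg>.pdb next to the input file,
    as the example file names in this notebook suggest.
    """
    stem = Path(aa_pdb).stem
    commands = []
    for cg in cg_types:
        # Step 1: convert the all-atom structure to the chosen CG representation.
        convert = ["python", script, "-f", aa_pdb, "-c", cg]
        # Step 2: predict SASA from the converted CG structure.
        predict = ["python", script, "-f", f"{stem}_{cg}.pdb",
                   "-t", cg, "-o", f"{out_prefix}_{cg}"]
        commands.append((convert, predict))
    return commands


for convert, predict in build_commands("protein_example.pdb",
                                       ["CA", "CACB", "martini"], "protein"):
    print(" ".join(convert))
    print(" ".join(predict))
```

Each argument list can be passed to `subprocess.run(..., check=True)` if you prefer to drive the tool from Python instead of a bash script.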

In [1]:
%%file test.sh
#!/bin/bash

# convert all-atom PDB files to different CG structures
cp protein_example.pdb protein_example_AA.pdb
python DeepCGSA.py -f protein_example.pdb -c CA
python DeepCGSA.py -f protein_example.pdb -c CACB
python DeepCGSA.py -f protein_example.pdb -c martini   # or use martinize.py directly
cp RNA_example.pdb RNA_example_AA.pdb
python DeepCGSA.py -f RNA_example.pdb -c P
python DeepCGSA.py -f RNA_example.pdb -c 3SPN

# predict SASA with DeepCGSA (for CG structures) or NACCESS (for all-atom structures)
python DeepCGSA.py -f protein_example_CA.pdb -t CA -o protein_CA
python DeepCGSA.py -f protein_example_CACB.pdb -t CACB -o protein_CACB
python DeepCGSA.py -f protein_example_martini.pdb -t martini -o protein_martini
python DeepCGSA.py -f RNA_example_P.pdb -t P -o RNA_P
python DeepCGSA.py -f RNA_example_3SPN.pdb -t 3SPN -o RNA_3SPN
python DeepCGSA.py -f protein_example_AA.pdb -t AA -o protein_AA
python DeepCGSA.py -f RNA_example_AA.pdb -t AA -o RNA_AA

Overwriting test.sh
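
Before loading the results, it can help to confirm that every prediction run actually produced its CSV file, since a failed step in test.sh would otherwise surface only as a file-not-found error later. The sketch below assumes each `-o <prefix>` run writes `<prefix>.csv` in the working directory, which matches the file names read in the next cell. The helper `missing_outputs` is hypothetical and not part of DeepCGSA.

```python
from pathlib import Path


def missing_outputs(prefixes, directory="."):
    """Return the expected CSV file names that are not present in `directory`."""
    base = Path(directory)
    return [f"{p}.csv" for p in prefixes if not (base / f"{p}.csv").is_file()]


# Output prefixes passed to -o in test.sh.
expected = ["protein_CA", "protein_CACB", "protein_martini",
            "RNA_P", "RNA_3SPN", "protein_AA", "RNA_AA"]

missing = missing_outputs(expected)
if missing:
    print("Missing result files:", ", ".join(missing))
else:
    print("All result files found.")
```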


In [2]:
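import os
import subprocess
import numpy as np
import pandas as pd
import scipy.stats

# predict SASA: run the conversion and prediction script, and stop if it fails
subprocess.run(['bash', 'test.sh'], check=True)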
df_protaa = pd.read_csv('protein_AA.csv')
df_protca = pd.read_csv('protein_CA.csv')
df_protcacb = pd.read_csv('protein_CACB.csv')
df_protmartini = pd.read_csv('protein_martini.csv')
df_rnaaa = pd.read_csv('RNA_AA.csv')
df_rnap = pd.read_csv('RNA_P.csv')
df_rna3spn = pd.read_csv('RNA_3SPN.csv')

# summarize results: Pearson correlation of each CG prediction with the all-atom reference
dic = {'CG type': [], 'pearson-R': []}
for name,df_pred in zip(['Cα', 'Cα-Cβ', 'Martini'],[df_protca, df_protcacb, df_protmartini]):
    dic['CG type'].append(name)
    dic['pearson-R'].append(scipy.stats.pearsonr(df_pred['SASA'], df_protaa['SASA'])[0])
for name,df_pred in zip(['P-based', '3SPN'],[df_rnap, df_rna3spn]):
    dic['CG type'].append(name)
    dic['pearson-R'].append(scipy.stats.pearsonr(df_pred['SASA'], df_rnaaa['SASA'])[0])
print('Compared to all-atom calculation:')
pd.DataFrame.from_dict(dic)

Compared to all-atom calculation:


| | CG type | pearson-R |
|---|---|---|
| 0 | Cα | 0.962651 |
| 1 | Cα-Cβ | 0.984619 |
| 2 | Martini | 0.987261 |
| 3 | P-based | 0.875123 |
| 4 | 3SPN | 0.981319 |
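
The pearson-R column measures how linearly the CG predictions track the all-atom reference. As a sanity check on what `scipy.stats.pearsonr` reports, the coefficient can be written out by hand. The sketch below is a plain-Python illustration that also rejects inputs of different length, which is how misaligned prediction and reference rows would show up in the simplest case. The function name `pearson_r` is hypothetical and the numbers are synthetic, not DeepCGSA output.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("prediction and reference must have the same length")
    if len(x) < 2:
        raise ValueError("at least two values are required")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Synthetic values for illustration only.
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 19.0, 33.0, 41.0]
print(f"pearson-R = {pearson_r(predicted, reference):.4f}")
```

A high correlation alone does not rule out a systematic offset or scale error, so it is worth looking at absolute differences as well when judging a prediction.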
