<a href="https://colab.research.google.com/github/tetsufmbio/remolog/blob/main/notebooks/remolog_pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Remolog: identifying remote homologs using structural alignment data

## Quick start

1. Click "Runtime" --> "Run all".
2. In the next cell (*Upload protein structure*), a "Choose files" button will appear. Click it and select one or more PDB files to be uploaded and analyzed by the pipeline*.
3. When the run finishes, a file named "remolog_final_result.tab" is downloaded. Its columns are described at the end of this notebook.
4. The pipeline takes ~10 min to run in Colab when foldseek is used for screening**.

\* If you have only the amino acid sequence, you can predict its structure with the [ColabFold (AlphaFold2) notebook](https://github.com/sokrypton/ColabFold).

\** If no remote homolog is found, try increasing the parameter "n" or changing the screening method.


In [1]:
%cd /content
! if [ -d input ]; then \
    rm -rf input; \
  fi
! mkdir input


%cd input

from google.colab import files

uploaded = files.upload()
#@markdown  When this cell runs, a "Choose files" button may appear. Click it and choose the PDB file to be analyzed (you may upload multiple files).

%cd /content


/content
/content/input


Saving WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb to WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb
/content


## Setting parameters

In [2]:
screening = "foldseek" #@param ["foldseek","tmalign", "fatcat"]
#@markdown  - software used to retrieve structurally similar proteins from the SCOPe database.

n = 20 #@param {type:"integer"}
#@markdown  - analyze the n most similar protein structures.

database = "scope40" #@param ["scope40", "scope95"]
#@markdown  - protein structure database to be used.

jobName = "remolog_final_result" #@param {type: "string"}
#@markdown  - prefix for the final output name.


## Installing dependencies and downloading database

In [3]:
# setup environment variables
import os
os.environ['FATCAT'] = '/content/programs/FATCAT-dist'
os.environ['PATH'] += ':/content/programs/FATCAT-dist/FATCATMain:/content/bin'
os.environ['HEADN'] = str(n)
os.environ['SCREEN'] = screening
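
The parameters chosen above reach the shell cells through environment variables: a child process inherits whatever is stored in `os.environ`, and the values must be strings (hence `str(n)`). A minimal sketch of that mechanism, using `subprocess` in place of the notebook's `!` syntax; the values are examples, not read from the form:

```python
import os
import subprocess

# Example values standing in for the form parameters `n` and `screening`.
env = dict(os.environ, HEADN=str(20), SCREEN="foldseek")

# The child shell sees the variables the same way the later `!` cells do.
result = subprocess.run(
    ["sh", "-c", 'echo "$SCREEN $HEADN"'],
    env=env, capture_output=True, text=True, check=True,
)
shell_view = result.stdout.strip()
print(shell_view)
```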

In [4]:
%cd /content/
! if [ ! -d bin ]; then mkdir bin; fi
! if [ ! -d programs ]; then mkdir programs; fi
! if [ ! -d view ]; then mkdir view; fi

/content


In [5]:
# download and install TMalign
%cd /content/bin
! if [ ! -e TMalign ]; then \
    wget "https://zhanggroup.org/TM-align/TMalign.cpp"; \
    g++ -static -O3 -ffast-math -lm -o TMalign TMalign.cpp; \
  fi


/content/bin
--2023-04-17 21:15:04--  https://zhanggroup.org/TM-align/TMalign.cpp
Resolving zhanggroup.org (zhanggroup.org)... 141.213.137.249
Connecting to zhanggroup.org (zhanggroup.org)|141.213.137.249|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 182097 (178K) [text/plain]
Saving to: ‘TMalign.cpp’


2023-04-17 21:15:05 (1.85 MB/s) - ‘TMalign.cpp’ saved [182097/182097]



In [6]:
# download and install FATCAT
%cd /content/programs
! if [ ! -d FATCAT-dist ]; then \
    git clone https://github.com/GodzikLab/FATCAT-dist.git; \
    cd FATCAT-dist/; ./Install; \
  fi


/content/programs
Cloning into 'FATCAT-dist'...
remote: Enumerating objects: 119, done.
remote: Counting objects: 100% (119/119), done.
remote: Compressing objects: 100% (104/104), done.
remote: Total 119 (delta 17), reused 109 (delta 13), pack-reused 0
Receiving objects: 100% (119/119), 2.37 MiB | 15.20 MiB/s, done.
Resolving deltas: 100% (17/17), done.
g++ -O2 -Wall -I./ -c SAlnOpt.C
g++ -O2 -Wall -I./ -c Prot.C
g++ -O2 -Wall -I./ -c AFPchain.C
  872 | //       j1 -|---------------------\
      | ^
AFPchain.C: In member function ‘void AFPCHAIN::MergeAfp(int, int*, int*)’:
  445 |  int i, j, k, f1, f2, i1, i2, j1, j2, frag;
      |                                   ^~
AFPchain.C: In member function ‘void AFPCHAIN::UpdateScore()’:
 1154 |  double t, conn, dvar;
      |         ^
AFPchain.C: In member func

In [7]:
# download and install lovoalign
%cd /content/bin
! if [ ! -e lovoalign ]; then \
    wget "https://github.com/m3g/lovoalign/archive/refs/tags/22.0.0.tar.gz"; \
    tar -xzf 22.0.0.tar.gz; cd lovoalign-22.0.0/src; make; cp ../bin/lovoalign /content/bin; \
  fi

/content/bin
--2023-04-17 21:15:21--  https://github.com/m3g/lovoalign/archive/refs/tags/22.0.0.tar.gz
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/m3g/lovoalign/tar.gz/refs/tags/22.0.0 [following]
--2023-04-17 21:15:21--  https://codeload.github.com/m3g/lovoalign/tar.gz/refs/tags/22.0.0
Resolving codeload.github.com (codeload.github.com)... 140.82.112.9
Connecting to codeload.github.com (codeload.github.com)|140.82.112.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘22.0.0.tar.gz’

22.0.0.tar.gz           [ <=>                ]  31.57K  --.-KB/s    in 0.01s   

2023-04-17 21:15:21 (2.34 MB/s) - ‘22.0.0.tar.gz’ saved [32325]

 ------------------------------------------------------ 
 Compiling LovoAlign with gfortran 
 Flags: -O3 -ffast-math -llapack  
 -

In [8]:
# download some scripts and model
%cd /content/programs
! if [ ! -d remolog ]; then \
    git clone https://github.com/tetsufmbio/remolog.git; \
  fi

/content/programs
Cloning into 'remolog'...
remote: Enumerating objects: 140, done.
remote: Counting objects: 100% (140/140), done.
remote: Compressing objects: 100% (117/117), done.
remote: Total 140 (delta 63), reused 86 (delta 23), pack-reused 0
Receiving objects: 100% (140/140), 16.52 MiB | 17.21 MiB/s, done.
Resolving deltas: 100% (63/63), done.


In [9]:
%cd /content/programs
! if [ ! -e /content/bin/foldseek ]; then \
    wget https://mmseqs.com/foldseek/foldseek-linux-sse2.tar.gz; tar xvzf foldseek-linux-sse2.tar.gz; \
    cp foldseek/bin/foldseek /content/bin; \
  fi

/content/programs
--2023-04-17 21:15:30--  https://mmseqs.com/foldseek/foldseek-linux-sse2.tar.gz
Resolving mmseqs.com (mmseqs.com)... 141.5.100.26
Connecting to mmseqs.com (mmseqs.com)|141.5.100.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 41044169 (39M) [application/octet-stream]
Saving to: ‘foldseek-linux-sse2.tar.gz’


2023-04-17 21:15:32 (19.5 MB/s) - ‘foldseek-linux-sse2.tar.gz’ saved [41044169/41044169]

foldseek/
foldseek/README.md
foldseek/bin/
foldseek/bin/foldseek


In [10]:
# download and format scope database
%cd /content
! if [ ! -d database ]; then \
    mkdir database; \
  fi;
%cd /content/database
! if [[ $database = "scope40" ]]; then \
    wget https://scop.berkeley.edu/downloads/pdbstyle/pdbstyle-sel-gs-bib-40-2.08.tgz; \
    tar -zxf pdbstyle-sel-gs-bib-40-2.08.tgz; mv pdbstyle-2.08/*/*.ent . ; rm -rf pdbstyle*; \
  elif [[ $database = "scope95" ]]; then \
    wget https://scop.berkeley.edu/downloads/pdbstyle/pdbstyle-sel-gs-bib-95-2.08.tgz; \
    tar -zxf pdbstyle-sel-gs-bib-95-2.08.tgz; mv pdbstyle-2.08/*/*.ent . ; rm -rf pdbstyle*; \
  fi; 
! ls *.ent > ../list_scope.tab; for i in *.ent; do mv $i $i.pdb; done;


/content
/content/database
--2023-04-17 21:15:35--  https://scop.berkeley.edu/downloads/pdbstyle/pdbstyle-sel-gs-bib-40-2.08.tgz
Resolving scop.berkeley.edu (scop.berkeley.edu)... 128.32.236.13
Connecting to scop.berkeley.edu (scop.berkeley.edu)|128.32.236.13|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1040288513 (992M) [application/x-gzip]
Saving to: ‘pdbstyle-sel-gs-bib-40-2.08.tgz’


2023-04-17 21:16:09 (29.6 MB/s) - ‘pdbstyle-sel-gs-bib-40-2.08.tgz’ saved [1040288513/1040288513]



In [11]:
# creating foldseek database
%cd /content
! if [ ! -d foldseek_data ]; then \
  mkdir foldseek_data; cd /content/foldseek_data; \
  foldseek createdb /content/database/ fs_data ; \
fi


/content
createdb /content/database/ fs_data 

MMseqs Version:        	543191f6dc9674607fdd6c0ecc173d9d748150ac
Chain name mode        	0
Mask b-factor threshold	0
Coord store mode       	2
Write lookup file      	1
Tar Inclusion Regex    	.*
Tar Exclusion Regex    	^$
Threads                	2
Verbosity              	3

Output file: fs_data
Time for merging to fs_data_ss: 0h 0m 0s 20ms
Time for merging to fs_data_h: 0h 0m 0s 11ms
Time for merging to fs_data_ca: 0h 0m 0s 55ms
Time for merging to fs_data: 0h 0m 0s 21ms
Ignore 0 out of 39611.
Too short: 0, incorrect  0.
Time for processing: 0h 0m 44s 87ms


## Running protein structure alignment

In [12]:
%cd /content
! if [ -d result ]; then rm -rf result ; fi
! mkdir result;
! mkdir result/screening;

/content


In [13]:
# screening for similar proteins using foldseek
%cd /content/input

! if [ $SCREEN = "foldseek" ]; then \
    for f in *; do \
      foldseek easy-search $f /content/foldseek_data/fs_data /content/result/screening/tmp.tab.fmt /content/tmpFolder --max-seqs $HEADN -e inf; \
      cut -f 1,2 /content/result/screening/tmp.tab.fmt | sort | uniq | perl -ne '@a = split(/\./, $_); print join(".", @a[0 .. $#a-2])."\n";' > /content/result/screening/$f.tab.fmt; \
      rm /content/result/screening/tmp.tab.fmt; \
    done; \
  fi

/content/input
Create directory /content/tmpFolder
easy-search WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb /content/foldseek_data/fs_data /content/result/screening/tmp.tab.fmt /content/tmpFolder --max-seqs 20 -e inf 

MMseqs Version:              	543191f6dc9674607fdd6c0ecc173d9d748150ac
Seq. id. threshold           	0
Coverage threshold           	0
Coverage mode                	0
Max reject                   	2147483647
Max accept                   	2147483647
Add backtrace                	false
TMscore threshold            	0
TMalign hit order            	0
TMalign fast                 	1
Preload mode                 	0
Threads                      	2
Verbosity                    	3
LDDT threshold               	0
Sort by structure bit score  	1
Substitution matrix          	aa:3di.out,nucl:3di.out
Alignment mode               	3
Alignment mode               	0
E-value threshold            	inf
Min alignment length         	0
Seq. id. mode             
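
The Perl one-liner in the foldseek cell splits each `query<TAB>target` line on "." and keeps everything except the last two fields, so a trailing `.ent.pdb` on the target name is removed. A Python sketch of the same trimming follows; the example line is hypothetical, since the names foldseek actually writes are not shown in this notebook:

```python
def drop_last_two_dot_fields(line: str) -> str:
    # Same idea as the Perl code: split on ".", drop the last two fields,
    # and join the rest back together with ".".
    fields = line.rstrip("\n").split(".")
    return ".".join(fields[:-2])


# Hypothetical foldseek output line (columns 1 and 2 only).
example = "query.pdb\td1guia_.ent.pdb\n"
trimmed = drop_last_two_dot_fields(example)
print(repr(trimmed))
```

The later cells read the second column of this trimmed file and append `.ent.pdb` again to locate the structure in `/content/database`.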

In [14]:
# screening for similar proteins using FATCAT

%cd /content/input
! if [ $SCREEN = "fatcat" ]; then \
    for f in *; do \
      FATCATSearch.pl $f /content/list_scope.tab -b -i1 /content/input -i2 /content/database | \
      sort -k11nr | \
      head -n $HEADN | \
      perl /content/programs/remolog/scripts/format_result_FATCAT.pl - /content/programs/remolog/data/maxScore_fatcat.tab > /content/result/screening/$f.tab.fmt; \
    done; \
    cat /content/result/screening/*.fmt > /content/result/fatcat_formatted.tab; \
  fi

/content/input


In [15]:
# screening for similar proteins using tmalign

%cd /content/input
! if [ $SCREEN = "tmalign" ]; then \
  for f in *; do \
    if [ -f /content/result/screening/${f}.tab ]; then \
      rm /content/result/screening/${f}.tab; \
    fi; \
    for l in $(cat /content/list_scope.tab); \
      do TMalign /content/input/$f /content/database/${l}.pdb | perl /content/programs/remolog/scripts/parser_TMalign.pl - >> /content/result/screening/${f}.tab ; \
      done; \
    sort -k3nr /content/result/screening/${f}.tab | grep ${f} | head -n $HEADN > /content/result/screening/${f}.tab.fmt; \
    done; \
  cat /content/result/screening/*.fmt > /content/result/tmalign_formatted.tab; \
fi

/content/input
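
In the TM-align branch, `sort -k3nr | head -n $HEADN` keeps the n hits with the largest value in the third column. A pandas sketch of that selection, assuming a three-column layout (query, subject, score); the column names and scores are invented, because the output of `parser_TMalign.pl` is not shown here:

```python
import pandas as pd

# Invented screening hits for a single query.
hits = pd.DataFrame({
    "query": ["q.pdb"] * 4,
    "subject": ["dA", "dB", "dC", "dD"],
    "score": [0.41, 0.68, 0.52, 0.65],
})

n = 2  # plays the role of $HEADN
top_hits = hits.sort_values("score", ascending=False).head(n)
print(top_hits)
```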


In [16]:
%cd /content/input
! if [ $SCREEN != "fatcat" ]; then \
  if [ -f /content/result/fatcat_formatted.tab ]; then \
    rm /content/result/fatcat_formatted.tab; \
  fi; \
  for f in *; do \
    for l in $(cut -f2 /content/result/screening/$f.tab.fmt); do \
      FATCAT -p1 $f -p2 $l.ent.pdb -i1 /content/input -i2 /content/database -b | \
      perl /content/programs/remolog/scripts/format_result_FATCAT.pl - /content/programs/remolog/data/maxScore_fatcat.tab >>  /content/result/fatcat_formatted.tab ; \
    done; \
  done; \
  fi

/content/input


In [17]:
%cd /content/input
! if [ $SCREEN != "tmalign" ]; then \
    if [ -f /content/result/tmalign_formatted.tab ]; then \
      rm /content/result/tmalign_formatted.tab; \
    fi; \
    for f in *; do \
      for l in $(cut -f2 /content/result/screening/$f.tab.fmt); do \
        TMalign /content/input/$f /content/database/$l.ent.pdb | perl /content/programs/remolog/scripts/parser_TMalign.pl - >> /content/result/tmalign_formatted.tab ; \
      done; \
    done;\
  fi

/content/input


In [18]:
%cd /content/input
! if [ $SCREEN != "lovoalign" ]; then \
    if [ -f /content/result/lovoalign_formatted.tab ]; then \
      rm /content/result/lovoalign_formatted.tab; \
    fi; \
    for f in *; do \
      for l in $(cut -f2 /content/result/screening/$f.tab.fmt); do \
        lovoalign -p1 /content/input/$f -p2 /content/database/$l.ent.pdb | perl /content/programs/remolog/scripts/parser_lovoalign.pl - >> /content/result/lovoalign_formatted.tab; \
      done; \
    done; \
  fi

/content/input


In [19]:
%cd /content/result
! perl /content/programs/remolog/scripts/join_table.pl /content/result/fatcat_formatted.tab /content/result/tmalign_formatted.tab | \
  perl /content/programs/remolog/scripts/join_table.pl - /content/result/lovoalign_formatted.tab | \
  perl /content/programs/remolog/scripts/add_scope_class.pl - /content/programs/remolog/data/dir.cla.scope.2.08-stable_filtered40.txt > result.tab

/content/result
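
The cell above chains `join_table.pl` twice to combine the FATCAT, TM-align and Lovoalign tables. The script itself is not shown, so the following is only an analogue under the assumption that rows are matched on the (query, subject) pair with an inner join. Column names are borrowed from the final table, the values from the result table shown later in this notebook, and the column order here need not match the script's output:

```python
import pandas as pd

key = ["query", "subject"]
fatcat = pd.DataFrame({"query": ["q.pdb", "q.pdb"],
                       "subject": ["d1guia_", "d1cx1a1"],
                       "fatcat_Score": [120.75, 106.51]})
tmalign = pd.DataFrame({"query": ["q.pdb", "q.pdb"],
                        "subject": ["d1guia_", "d1cx1a1"],
                        "tm_RMSD": [3.29, 3.35]})
lovo = pd.DataFrame({"query": ["q.pdb", "q.pdb"],
                     "subject": ["d1guia_", "d1cx1a1"],
                     "lovo_rmsd": [3.26252, 3.505395]})

# Two successive merges, mirroring the two join_table.pl calls.
merged = fatcat.merge(tmalign, on=key).merge(lovo, on=key)
print(merged)
```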


## Making prediction and writing the results

In [20]:
from joblib import load
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
import pandas as pd
import numpy as np

clf = load('/content/programs/remolog/model.joblib')

https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations


In [21]:
header = ["query", "subject",
          "lovo_finalScore", "lovo_coverage", "lovo_rmsd", "lovo_gaps", "lovo_relCov", "lovo_relGaps", "lovo_finalScoreNorm",
          "tm_AliLen", "tm_RMSD", "tm_n_ident/n_aln", "tm_TM-score (chain 2)", "tm_d0 (chain 2)","tm_cov",
          "fatcat_subject-len", "fatcat_Twists", "fatcat_ini-len", "fatcat_ini-rmsd", "fatcat_opt-equ", "fatcat_opt-rmsd", "fatcat_chain-rmsd", "fatcat_Score", "fatcat_align-len", "fatcat_Gaps", "fatcat_rel_score", "fatcat_rel_align",
          'cl=46456', 'cl=48724', 'cl=51349', 'cl=53931', 'cl=56572', 'cl=56835', 'cl=56992',
            ]
data = pd.read_csv("/content/result/result.tab", sep="\t", header=None)
data.columns = header

annot = pd.read_csv("/content/programs/remolog/data/dir.cla.scope.2.08-stable_filtered40.txt", header=None, sep="\t")
annot[['cl','cf','sf','fa','dm','sp','px']] = annot[5].str.split(',',expand=True)
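
The last line splits the comma-separated SCOPe hierarchy string in column 5 into one column per level. A small sketch on a single made-up row: the `cl`, `cf` and `sf` identifiers match the d1guia_ row of the result table, while the `fa`, `dm`, `sp` and `px` values are placeholders:

```python
import pandas as pd

# One toy annotation row; only columns 0 and 5 are needed for the sketch.
toy_annot = pd.DataFrame({
    0: ["d1guia_"],
    5: ["cl=48724,cf=49784,sf=49785,fa=1,dm=2,sp=3,px=4"],  # fa..px are placeholders
})

levels = ["cl", "cf", "sf", "fa", "dm", "sp", "px"]
toy_annot[levels] = toy_annot[5].str.split(",", expand=True)

# The numeric identifier is whatever follows the "=".
sf_id = toy_annot["sf"].str.split("=").str[-1]
print(toy_annot[["cl", "cf", "sf"]], sf_id.iloc[0])
```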


In [22]:
X = data.iloc[:,2:]
pred = clf.predict(X)
pred_proba = clf.predict_proba(X)
data["pred"] = pred
data["pred_proba"] = pred_proba[:,1]
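
The imports in the loading cell suggest that `model.joblib` holds a scikit-learn `Pipeline` combining a `StandardScaler` and an `SVC`; that is an assumption, as the file's contents are not shown. The sketch below trains a toy pipeline of that shape on synthetic data to illustrate what `predict` and `predict_proba` return and why column 1 is taken as the remote-homolog probability:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, well-separated training data: class 0 around 0, class 1 around 3.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(3, 1, (40, 3))])
y_train = np.array([0] * 40 + [1] * 40)

toy_clf = Pipeline([
    ("scale", StandardScaler()),
    ("svc", SVC(probability=True, random_state=0)),  # probability=True enables predict_proba
])
toy_clf.fit(X_train, y_train)

X_new = np.array([[0.1, -0.2, 0.3], [3.2, 2.9, 3.1]])
toy_pred = toy_clf.predict(X_new)         # hard labels
toy_proba = toy_clf.predict_proba(X_new)  # one column per class, ordered as toy_clf.classes_
p_class1 = toy_proba[:, 1]                # probability of class 1
print(toy_pred, p_class1)
```

Columns of `predict_proba` follow `classes_`, so index 1 corresponds to label 1 only when the classes are `[0, 1]`, as in this toy model.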



In [23]:
annot.index = annot.loc[:, 0]
annot = annot.loc[:,["cl","cf","sf"]]
data = data.join(annot, on="subject")
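
`DataFrame.join(..., on="subject")` looks up each value of the `subject` column in the index of `annot` and performs a left join, so a subject without an annotation keeps its row and receives missing values. A toy illustration (the second subject name is invented):

```python
import pandas as pd

toy_hits = pd.DataFrame({
    "query": ["q.pdb", "q.pdb"],
    "subject": ["d1guia_", "dunknown"],  # "dunknown" is a made-up, unannotated subject
})
toy_annot = pd.DataFrame(
    {"cl": ["cl=48724"], "cf": ["cf=49784"], "sf": ["sf=49785"]},
    index=["d1guia_"],
)

joined = toy_hits.join(toy_annot, on="subject")
print(joined)
```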

In [24]:
data = data.sort_values(by=["query", "pred_proba"], ascending=[True, False])
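
After this sort, the rows of each query are ordered from the most to the least probable remote homolog. A short sketch on invented rows shows the resulting order and two common follow-ups, keeping only positive predictions and taking the best hit per query:

```python
import pandas as pd

toy = pd.DataFrame({
    "query": ["b.pdb", "a.pdb", "a.pdb", "b.pdb"],
    "subject": ["d1guia_", "d1cx1a1", "d4leja1", "d3rf3a1"],
    "pred": [1, 1, 0, 0],
    "pred_proba": [0.78, 0.59, 0.41, 0.31],
})

ranked = toy.sort_values(by=["query", "pred_proba"], ascending=[True, False])
predicted_homologs = ranked[ranked["pred"] == 1]  # rows predicted as remote homologs
best_per_query = ranked.groupby("query").head(1)  # first (highest) row of each query
print(ranked)
```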

In [25]:
# write and download result file
from google.colab import files

# write a tab-separated file, matching the ".tab" extension
data.to_csv("/content/result/"+jobName+".tab", sep="\t", index=False)
files.download("/content/result/"+jobName+".tab")


In [26]:
# function to display table
from google.colab import data_table
data_table.enable_dataframe_formatter()

def hyperlink(path):
    # keep only the text after the last "=" (e.g. "cf=49784" -> "49784");
    # values without "=" (e.g. "d1guia_") are used as they are
    key = path.split("=")[-1]
    url = "https://scop.berkeley.edu/search/?ver=2.08&key=" + key

    # wrap the key in a clickable link to the SCOPe search page
    return '<a target="_blank" href="{}">{}</a>'.format(url, key)


In [27]:
# functions to display 3d alignment
! pip install py3Dmol
import py3Dmol
import glob
import matplotlib.pyplot as plt 

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting py3Dmol
  Downloading py3Dmol-2.0.1.post1-py2.py3-none-any.whl (12 kB)
Installing collected packages: py3Dmol
Successfully installed py3Dmol-2.0.1.post1


In [28]:
import ipywidgets as widgets

def print_query(subject):
    print(subject)

def select_subject(query):
    subject_picker.options = list(data[data["query"] == query].subject)

query_list = data.loc[:,"query"].unique()
query_picker = widgets.Dropdown(options=query_list, value=query_list[0])
subject_list = list(data[data["query"] == query_list[0]].subject)
subject_picker = widgets.Dropdown(options=subject_list, value=subject_list[0])
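# keep references to the interactive wrappers under descriptive names; a Python
# variable named `i` would be substituted for `$i` in the `!` shell loops on a rerun
subject_printer = widgets.interactive(print_query, subject=subject_picker)
query_watcher = widgets.interactive(select_subject, query=query_picker)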

#button = widgets.Button(description="Submit")
#output = widgets.Output()
#display(button)

def display_str():
  # row of the result table for the selected query/subject pair
  selected_subject = data[(data["subject"] == subject_picker.value) & (data["query"] == query_picker.value)]

  # empty the working directory, then rerun FATCAT for the selected pair
  os.chdir("/content/view")
  os.system("rm -f *")
  os.system("FATCAT -p1 "+query_picker.value+" -p2 "+subject_picker.value+".ent.pdb -i1 /content/input -i2 /content/database -t")

  # read the superposed structures written by the FATCAT call above
  with open("/content/view/tmp.opt.twist.pdb") as ifile:
    system = ifile.read()

  view = py3Dmol.view(width=400, height=300)
  view.addModelsAsFrames(system)

  # chain A (query) in blue, chain B (subject) in yellow
  view.setStyle({'chain':'A'}, {'cartoon':{'color':'blue'}})
  view.setStyle({'chain':'B'}, {'cartoon':{'color':'yellow'}})
  view.zoomTo()
  view.show()

  display(pd.DataFrame(selected_subject))


#button.on_click(on_button_clicked)



## Display table 

In [29]:
data2 = data.style.format({'subject': hyperlink, 'sf': hyperlink})
data2

Unnamed: 0,query,subject,lovo_finalScore,lovo_coverage,lovo_rmsd,lovo_gaps,lovo_relCov,lovo_relGaps,lovo_finalScoreNorm,tm_AliLen,tm_RMSD,tm_n_ident/n_aln,tm_TM-score (chain 2),tm_d0 (chain 2),tm_cov,fatcat_subject-len,fatcat_Twists,fatcat_ini-len,fatcat_ini-rmsd,fatcat_opt-equ,fatcat_opt-rmsd,fatcat_chain-rmsd,fatcat_Score,fatcat_align-len,fatcat_Gaps,fatcat_rel_score,fatcat_rel_align,cl=46456,cl=48724,cl=51349,cl=53931,cl=56572,cl=56835,cl=56992,pred,pred_proba,cl,cf,sf
3,WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb,d1guia_,0.26642,135,3.26252,16,0.871,0.1185,0.654878,136,3.29,0.14,0.65425,4.64,0.8774,155,0,80,3.8,131,3.03,3.8,120.75,194,63,0.2648,0.8452,0,1,0,0,0,0,0,1,0.777703,cl=48724,cf=49784,49785
1,WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb,d1cx1a1,0.270919,141,3.505395,21,0.9338,0.1489,0.683576,140,3.35,0.114,0.68259,4.58,0.9272,151,2,112,3.23,134,3.19,4.64,106.51,188,54,0.2466,0.8874,0,1,0,0,0,0,0,1,0.591836,cl=48724,cf=49784,49785
2,WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb,d1gu3a_,0.254955,129,3.294609,18,0.9085,0.1395,0.684069,131,3.29,0.099,0.68157,4.43,0.9225,142,0,96,3.31,125,3.11,3.31,110.86,175,50,0.2717,0.8803,0,1,0,0,0,0,0,1,0.527407,cl=48724,cf=49784,49785
12,WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb,d4bj0a1,0.294247,145,3.147365,21,0.8788,0.1448,0.679444,146,3.16,0.089,0.67823,4.79,0.8848,165,0,104,3.9,139,3.18,3.9,131.01,195,56,0.2729,0.8424,0,1,0,0,0,0,0,1,0.513047,cl=48724,cf=49784,49785
14,WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb,d4leja1,0.186949,167,24.820392,13,1.0,0.0778,0.426513,102,4.45,0.108,0.41472,4.82,0.6108,167,3,96,6.2,120,3.18,16.42,105.18,221,101,0.2191,0.7186,0,1,0,0,0,0,0,0,0.40761,cl=48724,cf=51181,51182
11,WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb,d3rf3a1,0.176324,117,10.754425,16,0.9213,0.1368,0.528972,99,3.76,0.101,0.50176,4.18,0.7795,127,1,104,3.01,112,3.26,13.51,224.42,124,12,0.6234,0.8819,1,0,0,0,0,0,0,0,0.313741,cl=46456,cf=47161,47220
5,WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb,d1nc7a1,0.148697,99,13.884749,12,0.9167,0.1212,0.524569,87,3.35,0.138,0.52684,3.82,0.8056,108,1,56,3.31,78,3.1,4.41,81.29,117,39,0.2605,0.7222,0,1,0,0,0,0,0,0,0.290532,cl=48724,cf=89231,89232
4,WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb,d1j2ha_,0.160505,110,11.048299,8,0.9821,0.0727,0.546005,79,2.67,0.063,0.53397,3.9,0.7054,112,1,88,1.57,99,2.2,7.2,247.82,107,8,0.7943,0.8839,1,0,0,0,0,0,0,0,0.216126,cl=46456,cf=46965,89009
6,WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb,d1nzea_,0.152333,111,16.598669,11,0.9911,0.0991,0.518204,84,3.49,0.083,0.50936,3.9,0.75,112,2,96,3.03,102,1.8,15.56,231.77,106,4,0.7429,0.9107,1,0,0,0,0,0,0,0,0.211696,cl=46456,cf=47161,101112
13,WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000.pdb,d4f01a2,0.157462,95,10.251438,10,0.9896,0.1053,0.624928,81,2.67,0.086,0.623,3.57,0.8438,96,1,88,3.11,89,2.44,7.35,217.47,110,21,0.8237,0.9271,1,0,0,0,0,0,0,0,0.193445,cl=46456,cf=46996,100934


In [30]:
data_table.disable_dataframe_formatter()

## Description of the columns
- query: query name;
- subject: subject name.

Lovoalign
- lovo_finalScore: final score;
- lovo_coverage: alignment coverage;
- lovo_rmsd: RMSD;
- lovo_gaps: # of gaps;
- lovo_relCov: ratio of the coverage to the subject length;
- lovo_relGaps: ratio of the # of gaps to the coverage;
- lovo_finalScoreNorm: normalized score.

TM-align
- tm_AliLen: alignment length;
- tm_RMSD: RMSD;
- tm_n_ident/n_aln: ratio of the # of identical residues to the aligned length;
- tm_TM-score (chain 2): TM-score normalized by the subject length;
- tm_d0 (chain 2): distance scale factor (d0) used to calculate the TM-score;
- tm_cov: alignment coverage relative to the subject length.

FATCAT
- fatcat_subject-len: subject length;
- fatcat_Twists: # of twists;
- fatcat_ini-len: initial alignment length;
- fatcat_ini-rmsd: initial RMSD;
- fatcat_opt-equ: # of equivalent positions in the alignment;
- fatcat_opt-rmsd: RMSD of aligned Cα atoms of the input structures with structural rearrangement;
- fatcat_chain-rmsd: RMSD of aligned Cα atoms of the input structures without structural rearrangement;
- fatcat_Score: alignment score;
- fatcat_align-len: alignment length;
- fatcat_Gaps: # of gaps in the alignment;
- fatcat_rel_score: ratio of the alignment score to the maximum score;
- fatcat_rel_align: ratio of the # of aligned positions to the subject length.

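Prediction
- pred: prediction (0: not a remote homolog; 1: remote homolog);
- pred_proba: predicted probability that the subject is a remote homolog.

SCOPe annotation
- cl=46456 ... cl=56992: indicator columns (0/1), one per SCOPe class, set to 1 for the class of the subject;
- cl: subject SCOPe class;
- cf: subject SCOPe fold;
- sf: subject SCOPe superfamily.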
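
As a worked check of the relative columns, the values in the d1guia_ row of the table above can be recomputed from the raw counts. The relationships below are inferred from the displayed numbers rather than taken from the pipeline scripts. The d0 expression is the standard TM-score scale, d0 = 1.24 (L - 15)^(1/3) - 1.8, evaluated with L set to the subject length:

```python
# Values copied from the d1guia_ row of the result table.
row = {
    "lovo_coverage": 135, "lovo_gaps": 16,
    "tm_AliLen": 136,
    "fatcat_subject-len": 155, "fatcat_opt-equ": 131,
}
subject_len = row["fatcat_subject-len"]

lovo_relCov = round(row["lovo_coverage"] / subject_len, 4)
lovo_relGaps = round(row["lovo_gaps"] / row["lovo_coverage"], 4)
tm_cov = round(row["tm_AliLen"] / subject_len, 4)
fatcat_rel_align = round(row["fatcat_opt-equ"] / subject_len, 4)


def tm_d0(length):
    # TM-score distance scale; this sketch only covers lengths well above 15.
    return 1.24 * (length - 15) ** (1 / 3) - 1.8


d0 = round(tm_d0(subject_len), 2)
print(lovo_relCov, lovo_relGaps, tm_cov, fatcat_rel_align, d0)
```

The printed values agree with the table entries 0.871, 0.1185, 0.8774, 0.8452 and 4.64 for that row.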

## Display protein structure alignment

In [31]:
print("Choose a query")
display(query_picker)
print("Choose the Subject")
display(subject_picker)

Choose a query


Dropdown(description='query', options=('WP_0004663221_495e4_unrelaxed_rank_001_alphafold2_ptm_model_5_seed_000…

Choose the Subject


Dropdown(description='subject', options=('d1guia_', 'd1cx1a1', 'd1gu3a_', 'd4bj0a1', 'd4leja1', 'd3rf3a1', 'd1…

In [32]:
# Choose a query and a subject in the cell above and run this cell to display
# the structure alignment performed by FATCAT.
# chain in blue: Query
# chain in yellow: Subject
display_str()

Unnamed: 0,query,subject,lovo_finalScore,lovo_coverage,lovo_rmsd,lovo_gaps,lovo_relCov,lovo_relGaps,lovo_finalScoreNorm,tm_AliLen,...,cl=51349,cl=53931,cl=56572,cl=56835,cl=56992,pred,pred_proba,cl,cf,sf
3,WP_0004663221_495e4_unrelaxed_rank_001_alphafo...,d1guia_,0.26642,135,3.26252,16,0.871,0.1185,0.654878,136,...,0,0,0,0,0,1,0.777703,cl=48724,cf=49784,sf=49785
